APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm

Abstract

This paper proposes a novel neural audio codec, named APCodec+, which is an improved version of APCodec. APCodec+ takes the audio amplitude and phase spectra as the coding object and employs an adversarial training strategy. As its main innovation, we propose a two-stage joint-individual training paradigm for APCodec+. In the joint training stage, the encoder, quantizer, decoder and discriminator are trained jointly with the complete spectral loss, quantization loss and adversarial loss. In the individual training stage, the parameters of the encoder and quantizer are frozen, and these two modules provide high-quality training data for the decoder and discriminator, which are trained individually from scratch without the quantization loss. Individual training is introduced to reduce the learning difficulty of the decoder, thereby further improving the fidelity of the decoded audio. Experimental results confirm that, thanks to the proposed staged training paradigm, APCodec+ at low bitrates achieves performance comparable to that of baseline codecs at higher bitrates.
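The two-stage schedule described in the abstract can be summarized as a small configuration sketch. This is not the authors' training code: the `Stage` class, the `trace_schedule` helper and the loss names are hypothetical, and the individual stage is assumed to keep the spectral and adversarial losses, since the abstract only states that the quantization loss is dropped. `trace_schedule` records which stage produced each module's final weights.

```python
from dataclasses import dataclass

MODULES = ("encoder", "quantizer", "decoder", "discriminator")


@dataclass(frozen=True)
class Stage:
    """One stage of a hypothetical staged training schedule."""
    name: str
    trainable: frozenset  # modules whose parameters are updated
    reinit: frozenset     # modules re-initialized (trained from scratch)
    losses: frozenset     # loss terms used in this stage

    @property
    def frozen(self) -> frozenset:
        # Every module that is not updated keeps its parameters fixed.
        return frozenset(MODULES) - self.trainable


# Stage 1: everything is trained jointly with all three loss terms.
JOINT = Stage(
    name="joint",
    trainable=frozenset(MODULES),
    reinit=frozenset(),
    losses=frozenset({"spectral", "quantization", "adversarial"}),
)

# Stage 2: encoder and quantizer are frozen and only supply quantized
# features; decoder and discriminator restart from scratch.
INDIVIDUAL = Stage(
    name="individual",
    trainable=frozenset({"decoder", "discriminator"}),
    reinit=frozenset({"decoder", "discriminator"}),
    # Assumption: spectral + adversarial losses are kept; the abstract
    # only says the quantization loss is removed.
    losses=frozenset({"spectral", "adversarial"}),
)


def trace_schedule(stages):
    """Record which stages contributed to each module's final weights."""
    history = {m: [] for m in MODULES}
    for stage in stages:
        for m in stage.reinit:
            history[m] = []                # from scratch: discard earlier training
        for m in stage.trainable:
            history[m].append(stage.name)  # stand-in for gradient updates
    return history
```

Calling `trace_schedule([JOINT, INDIVIDUAL])` shows that the encoder and quantizer keep their joint-stage weights, while the decoder and discriminator end up with weights learned only in the individual stage.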



Experiments on the Expresso test set. Naming convention: in Model_x, x is the bitrate in kbps; in Model_x_y, y is the number of iterations.
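As a reading aid for the system labels below, the naming convention can be parsed mechanically. This is a hypothetical helper, not part of the original demo: `parse_label` and `SystemLabel` are illustrative names, annotations such as " (proposed)" are stripped, and labels without a bitrate (e.g. GroundTruth) return `None` fields.

```python
import re
from typing import NamedTuple, Optional


class SystemLabel(NamedTuple):
    model: str
    kbps: Optional[float]      # x in Model_x; None if the label has no bitrate
    iterations: Optional[int]  # y in Model_x_y; None if absent


# Non-greedy model name, then "_x" (integer or decimal), then optional "_y".
_LABEL = re.compile(r"^(?P<model>.+?)_(?P<kbps>\d+(?:\.\d+)?)(?:_(?P<iters>\d+))?$")


def parse_label(label: str) -> SystemLabel:
    # Drop trailing annotations such as " (proposed)" before matching.
    name = label.split(" (")[0].strip()
    m = _LABEL.match(name)
    if m is None:
        return SystemLabel(name, None, None)
    iters = m.group("iters")
    return SystemLabel(
        m.group("model"),
        float(m.group("kbps")),
        int(iters) if iters is not None else None,
    )
```

For example, `parse_label("APCodec+_4.5_2")` yields the APCodec+ model at 4.5 kbps with 2 iterations.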


ex01_confused_00371



APCodec+_4.5 (proposed) APCodec+_6 APCodec+_3 SoundStream_12 Encodec_12 HiFi-Codec_12 AudioDec_12
APCodec*_6 APCodec*_4.5 APCodec_6 APCodec_4.5 GroundTruth APCodec+_4.5_1 APCodec+_4.5_2

ex01_enunciated_00373




ex01_sad_00372




ex02_sad_00369




ex03_happy_00379




ex03_happy_00380




ex04_happy_00373




ex04_happy_00377

