Academic Homepage

Hui-Peng Du

I am a PhD student at the University of Science and Technology of China (USTC). My research focuses on speech synthesis, neural vocoders, neural audio codecs, speech enhancement, and robust speech generation.

Speech Synthesis · Neural Vocoder · Neural Audio Codec · Speech Enhancement · Robust Speech Modeling
Position: PhD Student
School: University of Science and Technology of China (USTC)
Research Group: National Engineering Research Center of Speech and Language Information Processing
Profiles: DBLP · Google Scholar

About

I work on audio and speech processing, with a particular interest in efficient and high-fidelity neural waveform generation, low-bitrate neural speech and audio coding, and robust speech modeling. I also work on speech quality prediction, speech enhancement, and environment-aware speech generation.

24 Consolidated publications
7 First-author papers
17 Co-authored papers

This page consolidates journal/conference papers and matching arXiv versions into single title-level entries to avoid duplicate listings.

Research Interests

Neural Vocoders

High-quality and efficient waveform generation with explicit amplitude-phase modeling and low-latency design.

Neural Audio and Speech Codecs

Low-bitrate, high-fidelity, and streamable speech/audio coding with strong reconstruction quality.

Speech Synthesis

Environment-aware, zero-shot, and robust speech generation for realistic usage conditions.

Speech Assessment and Enhancement

Speech quality prediction, denoising, bandwidth extension, and universal enhancement in difficult conditions.

Publications

Publications are split into first-author and co-authored papers. arXiv links are provided whenever a public preprint is available.

First-Author Publications

    2026
  • CodeSep: Low-Bitrate Codec-Driven Speech Separation with Base-Token Disentanglement and Auxiliary-Token Serial Prediction
    Hui-Peng Du, Yang Ai, Xiao-Hang Jiang, Rui-Chen Zheng, Zhen-Hua Ling
    arXiv preprint, 2026
    2025
  • A Distilled Low-Latency Neural Vocoder with Explicit Amplitude and Phase Prediction
    Hui-Peng Du, Yang Ai, Zhen-Hua Ling
    APSIPA ASC, 2025
  • Is GAN Necessary for Mel-Spectrogram-Based Neural Vocoder?
    Hui-Peng Du, Yang Ai, Rui-Chen Zheng, Ye-Xin Lu, Zhen-Hua Ling
    IEEE Signal Processing Letters, 2025
    2024
  • A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram Based on Amplitude and Phase Predictions
    Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
    NCMMSC, 2024
  • APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
    Hui-Peng Du, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling
    ISCSLP, 2024
  • BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
    Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
    INTERSPEECH, 2024
    2023
  • APNet2: High-Quality and High-Efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra
    Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
    NCMMSC, 2023

Co-Authored Publications

    2025
  • A High-Quality and Low-Complexity Streamable Neural Speech Codec with Knowledge Distillation
    En-Wei Zhang, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling
    APSIPA ASC, 2025
  • CASC-XVC: Zero-Shot Cross-Lingual Voice Conversion with Content Accordant and Speaker Contrastive Losses
    Han-Jie Guo, Hui-Peng Du, Zheng-Yan Sheng, Li-Ping Chen, Yang Ai, Zhen-Hua Ling
    ICASSP, 2025
  • DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
    Ye-Xin Lu, Yu Gu, Kun Wei, Hui-Peng Du, Yang Ai, Zhen-Hua Ling
    arXiv preprint, 2025
  • ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs
    Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Yang Ai, Zhen-Hua Ling
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  • Improving Noise Robustness of LLM-Based Zero-Shot TTS via Discrete Acoustic Token Denoising
    Ye-Xin Lu, Hui-Peng Du, Fei Liu, Yang Ai, Zhen-Hua Ling
    INTERSPEECH, 2025
  • Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis
    Ye-Xin Lu, Hui-Peng Du, Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling
    ICASSP, 2025
  • Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding
    Rui-Chen Zheng, Wenrui Liu, Hui-Peng Du, Qinglin Zhang, Chong Deng, Qian Chen, Wen Wang, Yang Ai, Zhen-Hua Ling
    arXiv preprint, 2025
  • Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
    Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  • Universal Discrete-Domain Speech Enhancement
    Fei Liu, Yang Ai, Ye-Xin Lu, Rui-Chen Zheng, Hui-Peng Du, Zhen-Hua Ling
    arXiv preprint, 2025
  • Vision-Integrated High-Quality Neural Speech Coding
    Yao Guo, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Zhen-Hua Ling
    INTERSPEECH, 2025
    2024
  • APCodec: A Neural Audio Codec With Parallel Amplitude and Phase Spectrum Encoding and Decoding
    Yang Ai, Xiao-Hang Jiang, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
  • Considering Temporal Connection Between Turns for Conversational Speech Synthesis
    Kangdi Mei, Zhaoci Liu, Hui-Peng Du, Hengyu Li, Yang Ai, Liping Chen, Zhen-Hua Ling
    ICASSP, 2024
  • ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
    Xiao-Hang Jiang, Hui-Peng Du, Yang Ai, Ye-Xin Lu, Zhen-Hua Ling
    NCMMSC, 2024
  • MDCTCodec: A Lightweight MDCT-Based Neural Audio Codec Towards High Sampling Rate and Low Bitrate Scenarios
    Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Ye-Xin Lu, Zhen-Hua Ling
    SLT, 2024
  • Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion
    Yu-Fei Shi, Yang Ai, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling
    SLT, 2024
  • SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features
    Yu-Fei Shi, Yang Ai, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling
    ISCSLP, 2024
  • Stage-Wise and Prior-Aware Neural Speech Phase Prediction
    Fei Liu, Yang Ai, Hui-Peng Du, Ye-Xin Lu, Rui-Chen Zheng, Zhen-Hua Ling
    SLT, 2024