Jinlong Xue


Hi there! I am currently a master’s student at the Beijing University of Posts and Telecommunications (BUPT), majoring in Artificial Intelligence. I am supervised by Associate Professor Ya Li, and my research focuses on speech synthesis, NLP, and multimodal generation.

I have a keen interest in all aspects of multimodal generation, including speech synthesis, multimodal LLMs, and AIGC. I am also interested in developing intelligent, interactive AI systems that exhibit human-like emotions.

I plan to pursue a Ph.D. abroad after completing my master’s degree in 2025. If any professors or researchers are interested in my work or see potential for collaboration, please do not hesitate to contact me!


Jun 05, 2024 Two papers are accepted at Interspeech 2024! Demos: MMCE-Qformer-TTS and RAG-TTS 🎉
Jan 02, 2024 The paper, code, and project for our Text-to-Audio model Auffusion are released! 🎉
Dec 14, 2023 Our ICASSP 2024 paper CONCSS is accepted! 🎉
Jul 30, 2023 Our ACM MM 2023 paper CMCU-CSS is accepted! 🎉
Feb 17, 2023 Our ICASSP 2023 paper M2-CTTS is accepted! 🎉

selected publications


  1. Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
    Jinlong Xue, Yayue Deng, Yicheng Han, and 2 more authors
    Interspeech, 2024
  2. Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining
    Jinlong Xue, Yayue Deng, Yingming Gao, and 1 more author
    Interspeech, 2024
  3. Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
    Jinlong Xue, Yayue Deng, Yingming Gao, and 1 more author
    arXiv preprint arXiv:2401.01044, 2024
  4. CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis
    Yayue Deng, Jinlong Xue, Yukang Jia, and 6 more authors
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024


  1. CMCU-CSS: Enhancing Naturalness via Commonsense-based Multi-modal Context Understanding in Conversational Speech Synthesis
    Yayue Deng, Jinlong Xue, Yingming Gao, and 1 more author
    In Proceedings of the 31st ACM International Conference on Multimedia (MM), 2023
  2. M2-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis
    Jinlong Xue, Yayue Deng, Fengping Wang, and 5 more authors
    In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023


  1. ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis
    Jinlong Xue, Yayue Deng, Yichen Han, and 3 more authors
    In 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022