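Supported Models
Speech Recognition
Name in config: asr
SherpaOnnx [Recommended]
Dependencies: pip install "xtalk[sherpa-onnx-asr] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/asr/sherpa_onnx_asr.py
A high-performance speech recognition framework whose scope extends beyond ASR.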
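After installing the extra, you can quickly confirm that the backend is importable before starting X-Talk. This is a minimal sketch that assumes the `sherpa-onnx-asr` extra pulls in the `sherpa-onnx` package, whose Python import name is `sherpa_onnx`. The helper `is_installed` is hypothetical and not part of X-Talk.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be imported.

    Hypothetical helper, not an X-Talk API. importlib.util.find_spec
    returns None when the module cannot be located.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: "sherpa_onnx" is the import name installed by the extra.
    print("sherpa_onnx installed:", is_installed("sherpa_onnx"))
```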
Qwen3ASRFlashRealtime
Dependencies: pip install "xtalk[ali] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/asr/qwen3_asr_flash_realtime.py
Zipformer
Dependencies: pip install "xtalk[zipformer-local] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/asr/zipformer_local.py
ElevenLabs
Dependencies: pip install "xtalk[elevenlabs] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/asr/elevenlabs.py
Text-to-Speech
Name in config: tts
IndexTTS [Recommended]
Dependencies: pip install "xtalk[index-tts] @ git+https://github.com/xcc-zach/xtalk.git@main"
Paths:
- src/xtalk/speech/tts/index_tts.py
- src/xtalk/speech/tts/index_tts2.py
GPT-SoVITS
Experimental support. If you run into problems, please open an issue.
Dependencies: pip install "xtalk[gpt-sovits] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/tts/gpt_sovits.py
CosyVoice
Dependencies: pip install "xtalk[ali] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/tts/cosyvoice.py
ElevenLabs
Dependencies: pip install "xtalk[elevenlabs] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/tts/elevenlabs.py
Voice Activity Detection
Name in config: vad
X-Talk already runs VAD on the client side, so you may not need to deploy a separate VAD model.
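To give some intuition for what a VAD does, here is a minimal energy-threshold sketch that labels each audio frame as speech or non-speech. It is a generic illustration only. It is not X-Talk's client-side VAD, and it is not Silero VAD, which is a neural model. The 0.02 RMS threshold and the 160-sample frames (10 ms at 16 kHz) are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(frames: Sequence[Sequence[float]], threshold: float = 0.02) -> List[bool]:
    """Label each frame as speech (True) when its RMS energy exceeds `threshold`."""
    return [frame_rms(f) > threshold for f in frames]


if __name__ == "__main__":
    silence = [0.001] * 160        # near-silent 10 ms frame at 16 kHz
    speech = [0.3, -0.3] * 80      # loud 10 ms frame
    print(energy_vad([silence, speech, silence]))  # [False, True, False]
```

A fixed energy threshold breaks down easily under background noise. That weakness is the main reason learned VAD models are used in practice.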
Silero VAD
Dependencies: pip install "xtalk[silero-vad] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/vad/silero_vad.py
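Turn Detection
Name in config: turn_detector
The turn detector decides whether the user has finished speaking, and therefore when the system should start generating a reply.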
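The simplest baseline for turn detection is a silence timeout applied to per-frame VAD output: the turn is declared over once a fixed stretch of silence follows speech. The sketch below shows only that baseline. It is not how SoulxDuplug or TurnSense work, since this page does not describe their internals. The 10 ms frame size, the 500 ms timeout, and the function name `find_turn_end` are assumptions made for illustration.

```python
from typing import Optional, Sequence


def find_turn_end(
    is_speech: Sequence[bool],
    frame_ms: int = 10,
    silence_timeout_ms: int = 500,
) -> Optional[int]:
    """Return the frame index at which the user's turn is considered over.

    The turn ends once speech has been heard and is followed by at least
    `silence_timeout_ms` of continuous non-speech. Returns None if that
    never happens within `is_speech`.
    """
    needed = max(1, silence_timeout_ms // frame_ms)  # silent frames required
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


if __name__ == "__main__":
    # 5 silent frames, 20 speech frames, then 600 ms of silence.
    frames = [False] * 5 + [True] * 20 + [False] * 60
    print(find_turn_end(frames))  # 74
```

A fixed timeout either cuts in on mid-sentence pauses or makes replies feel sluggish. This trade-off is the usual motivation for model-based turn detectors like the ones listed below.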
SoulxDuplug
Dependencies: pip install "xtalk[soulx-duplug] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/turn_detector/soulx_duplug.py
TurnSense
Dependencies: pip install "xtalk[turn-sense] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/turn_detector/turn_sense.py
Speech Enhancement
Name in config: speech_enhancer
FastEnhancer
Dependencies: pip install onnxruntime
Path: src/xtalk/speech/speech_enhancer/speech_enhancer.py
Speaker Recognition
Name in config: speaker_encoder
Wespeaker-Voxceleb-Resnet34-LM
Dependencies: pip install "xtalk[pyannote] @ git+https://github.com/xcc-zach/xtalk.git@main"
Path: src/xtalk/speech/speaker_encoder/pyannote_embedding.py
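A speaker encoder maps an utterance to a fixed-length embedding vector. Two utterances are then commonly compared by the cosine similarity of their embeddings. The sketch assumes you already have embeddings as plain float lists, for example from the model above. The 3-dimensional toy vectors and the 0.5 decision threshold are made up for illustration. Real embeddings are much higher-dimensional, and the threshold has to be tuned on your own data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: same speaker if similarity >= threshold."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    enrolled = [0.9, 0.1, 0.2]   # toy embedding of a known speaker
    probe = [0.8, 0.2, 0.1]      # toy embedding of a new utterance
    other = [-0.1, 0.9, -0.4]    # toy embedding of a different speaker
    print(same_speaker(enrolled, probe))  # True
    print(same_speaker(enrolled, other))  # False
```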
Audio Captioning
Name in config: captioner
The captioner generates text descriptions of audio segments.
Qwen3-Omni-30B-A3B-Captioner
Dependencies: none
Path: src/xtalk/speech/captioner/qwen3_omni_captioner.py
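Each "Name in config" entry above is the key under which that component is configured. This page does not show X-Talk's configuration schema. The sketch below therefore assumes only a flat mapping from those keys to some model setting, and it checks for misspelled keys. `unknown_components` and the placeholder values are hypothetical, not X-Talk identifiers.

```python
from typing import Dict, List

# Component keys as listed on this page under "Name in config".
COMPONENT_KEYS = {
    "asr",
    "tts",
    "vad",
    "turn_detector",
    "speech_enhancer",
    "speaker_encoder",
    "captioner",
}


def unknown_components(config: Dict[str, object]) -> List[str]:
    """Return the top-level keys of `config` that are not known component names."""
    return sorted(k for k in config if k not in COMPONENT_KEYS)


if __name__ == "__main__":
    # HYPOTHETICAL config: values are placeholders, not real X-Talk settings.
    config = {"asr": "<asr model>", "tts": "<tts model>", "turn_detect": "<turn detector>"}
    print(unknown_components(config))  # ['turn_detect']
```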