
Supported Models

Speech Recognition

Name in config: asr

SherpaOnnx [Recommended]

Dependency: pip install "xtalk[sherpa-onnx-asr] @ git+https://github.com/xcc-zach/xtalk.git@main"
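The Dependency line above installs a single extra. Every Dependency line on this page points at the same Git URL, so several extras can be combined into one requirement using standard pip syntax (`name[extra1,extra2] @ url`). The helper below is a hypothetical convenience for building that string. It is not part of X-Talk, and it does not check that the extras you pass actually exist, so take them from the Dependency lines.

```python
import shlex

# Git URL shared by every Dependency line on this page.
XTALK_URL = "git+https://github.com/xcc-zach/xtalk.git@main"


def build_requirement(extras: list[str], url: str = XTALK_URL) -> str:
    """Combine several xtalk extras into one direct-URL requirement string.

    Hypothetical helper: it only formats a string and does not validate
    that the extras are defined by the package.
    """
    if not extras:
        return f"xtalk @ {url}"
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(extras))
    return f"xtalk[{','.join(unique)}] @ {url}"


if __name__ == "__main__":
    req = build_requirement(["sherpa-onnx-asr", "index-tts", "silero-vad"])
    # Quote the requirement so the brackets and spaces survive the shell.
    print("pip install " + shlex.quote(req))
```

Passing the resulting string to `python -m pip install` installs all the listed backends in one step instead of running one command per backend.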

Path: src/xtalk/speech/asr/sherpa_onnx_asr.py

A high-performance framework for speech recognition and other speech tasks.

Repo

Models

Tutorial: starting a speech recognition server

Qwen3ASRFlashRealtime

Dependency: pip install "xtalk[ali] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/asr/qwen3_asr_flash_realtime.py

Details

Zipformer

Dependency: pip install "xtalk[zipformer-local] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/asr/zipformer_local.py

Details

ElevenLabs

Dependency: pip install "xtalk[elevenlabs] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/asr/elevenlabs.py

API Reference

Text to Speech

Name in config: tts

IndexTTS [Recommended]

Dependency: pip install "xtalk[index-tts] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path:
- src/xtalk/speech/tts/index_tts.py
- src/xtalk/speech/tts/index_tts2.py

Repo

Installation (vLLM acceleration)

GPT-SoVITS

Experimental. Please open an issue if you run into any problems.

Dependency: pip install "xtalk[gpt-sovits] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/tts/gpt_sovits.py

Repo

CosyVoice

Dependency: pip install "xtalk[ali] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/tts/cosyvoice.py

Details

ElevenLabs

Dependency: pip install "xtalk[elevenlabs] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/tts/elevenlabs.py

API Reference

Voice Activity Detection

Name in config: vad

X-Talk already runs VAD on the client side, so you may not need a server-side VAD.

Silero VAD

Dependency: pip install "xtalk[silero-vad] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/vad/silero_vad.py

Model Details

VAD-Web

Turn Detection

Name in config: turn_detector

Turn detectors decide when the user has finished speaking and the system should start generating a response.
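To make the idea concrete, here is a minimal silence-timeout turn detector. It is a generic heuristic for illustration only, not how SoulxDuplug or TurnSense work (both are model-based). The class name, the 20 ms frame length, and the 700 ms and 200 ms thresholds are all hypothetical. It consumes per-frame speech/non-speech flags, such as a VAD would produce, and reports the end of a turn once the user has spoken and then stayed silent long enough.

```python
from dataclasses import dataclass, field


@dataclass
class SilenceTurnDetector:
    """Hypothetical end-of-turn detector based only on trailing silence."""

    frame_ms: int = 20          # duration of one audio frame (assumed)
    silence_ms: int = 700       # trailing silence that ends a turn (assumed)
    min_speech_ms: int = 200    # speech shorter than this is treated as a blip (assumed)

    # Running state, not set by the caller.
    speech_ms: int = field(default=0, init=False)
    trailing_silence_ms: int = field(default=0, init=False)

    def update(self, is_speech: bool) -> bool:
        """Feed one frame's VAD decision; return True when a turn has ended."""
        if is_speech:
            self.speech_ms += self.frame_ms
            self.trailing_silence_ms = 0  # a short pause inside a sentence is forgiven
            return False
        self.trailing_silence_ms += self.frame_ms
        if self.trailing_silence_ms >= self.silence_ms:
            ended = self.speech_ms >= self.min_speech_ms
            # Whether the turn ended or it was only a blip, start fresh.
            self.speech_ms = 0
            self.trailing_silence_ms = 0
            return ended
        return False


if __name__ == "__main__":
    detector = SilenceTurnDetector()
    # 0.5 s of speech followed by 1 s of silence, in 20 ms frames.
    frames = [True] * 25 + [False] * 50
    for i, flag in enumerate(frames):
        if detector.update(flag):
            print(f"end of turn detected at frame {i}")  # frame 59
            break
```

A fixed silence timeout trades latency for robustness: a shorter `silence_ms` responds faster but interrupts users who pause mid-sentence, which is the problem model-based detectors aim to solve.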

SoulxDuplug

Dependency: pip install "xtalk[soulx-duplug] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/turn_detector/soulx_duplug.py

Repo

TurnSense

Dependency: pip install "xtalk[turn-sense] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/turn_detector/turn_sense.py

Original Repo

X-Talk-adapted service deployment reference

Speech Enhancement

Name in config: speech_enhancer

FastEnhancer

Dependency: pip install onnxruntime

Path: src/xtalk/speech/speech_enhancer/speech_enhancer.py

Model Details

Speaker Recognition

Name in config: speaker_encoder

Wespeaker-Voxceleb-Resnet34-LM

Dependency: pip install "xtalk[pyannote] @ git+https://github.com/xcc-zach/xtalk.git@main"

Path: src/xtalk/speech/speaker_encoder/pyannote_embedding.py

Wespeaker Model Details
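A speaker encoder maps an utterance to a fixed-length embedding. A common way to use such embeddings is to compare them by cosine similarity and treat scores at or above a threshold as the same speaker. The sketch below assumes the embeddings are already available as plain lists of floats (how X-Talk exposes them is not shown here). The function names and the 0.5 threshold are hypothetical placeholders, and a real threshold would need tuning on your own data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical verification rule: accept if similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real speaker embeddings.
    enrolled = [0.9, 0.1, 0.3]
    probe = [0.8, 0.2, 0.25]
    other = [-0.2, 0.9, -0.4]
    print(same_speaker(enrolled, probe))  # True
    print(same_speaker(enrolled, other))  # False
```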

Captioner

Name in config: captioner

Captioners generate a text description of an audio clip.

Qwen3-Omni-30B-A3B-Captioner

Dependency: None

Path: src/xtalk/speech/captioner/qwen3_omni_captioner.py

HuggingFace

ModelScope