Configure the service
Customize models
As described in Start the Service, an X-Talk instance can be created from a JSON config. The config selects and customizes the models in use.
Inspect the supported model categories:
from xtalk import Xtalk
print(Xtalk.MODEL_REGISTRY)
# Prints something like:
# {
#     "asr": ["xtalk.speech.asr"],
#     "llm_agent": ["xtalk.llm_agent"],
#     "tts": ["xtalk.speech.tts"],
#     "embeddings": ["xtalk.embeddings"],
#     "speaker_encoder": ["xtalk.speech.speaker_encoder"],
#     "captioner": ["xtalk.speech.captioner"],
#     "caption_rewriter": ["xtalk.rewriter"],
#     "thought_rewriter": ["xtalk.rewriter"],
#     "vad": ["xtalk.speech.vad"],
#     "speech_enhancer": ["xtalk.speech.speech_enhancer"],
#     "speech_speed_controller": ["xtalk.speech.speech_speed_controller"],
#     "turn_detector": ["xtalk.speech.turn_detector"],
# }
For model configuration, each config item names the model's Python class in "type" and supplies that class's initialization arguments in "params".
For example, DefaultAgent is defined in src/xtalk/llm_agent/default.py:
class DefaultAgent(Agent):
    def __init__(
        self,
        model: BaseChatModel | dict,
        system_prompt: str = _BASE_PROMPT,
        voice_names: Optional[List[str]] = None,
        emotions: Optional[List[str]] = None,
        tools: Optional[List[Union[BaseTool, Callable[[], BaseTool]]]] = None,
    ):
        ...
To match those initialization arguments, the config item should look like:
"llm_agent": {
    "type": "DefaultAgent",
    "params": {
        "model": {
            "api_key": "none",
            "base_url": "http://127.0.0.1:8000/v1",
            "model": "cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit"
        },
        "voice_names": [
            "Man",
            "Woman",
            "Child"
        ],
        "emotions": [
            "happy",
            "angry",
            "sad",
            "fear",
            "disgust",
            "depressed",
            "surprised",
            "calm",
            "normal"
        ]
    }
},
Optional keys such as voice_names, emotions, and tools can be omitted. tools is not supported in config yet.
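The matching rule above can be checked mechanically before starting the service. The following is a minimal sketch, not part of X-Talk: DemoAgent is a hypothetical stand-in that only mirrors the parameter names of DefaultAgent, and check_params is an illustrative helper that compares the keys under "params" with the constructor signature.

```python
import inspect
from typing import List, Optional


class DemoAgent:
    """Hypothetical stand-in mirroring DefaultAgent's parameter names."""

    def __init__(
        self,
        model: dict,
        system_prompt: str = "You are a helpful assistant.",
        voice_names: Optional[List[str]] = None,
        emotions: Optional[List[str]] = None,
        tools: Optional[list] = None,
    ):
        self.model = model


def check_params(cls, params: dict) -> List[str]:
    """Return the problems found when matching params against cls.__init__."""
    sig = inspect.signature(cls.__init__)
    accepted = [name for name in sig.parameters if name != "self"]
    problems = []
    # Keys that the constructor does not accept.
    for key in params:
        if key not in accepted:
            problems.append(f"unknown param: {key}")
    # Arguments without a default must be present in the config.
    for name in accepted:
        if sig.parameters[name].default is inspect.Parameter.empty and name not in params:
            problems.append(f"missing required param: {name}")
    return problems


item = {
    "type": "DemoAgent",
    "params": {
        "model": {
            "api_key": "none",
            "base_url": "http://127.0.0.1:8000/v1",
            "model": "cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit",
        },
        "voice_names": ["Man", "Woman", "Child"],
    },
}

print(check_params(DemoAgent, item["params"]))
```

An empty list means every key under "params" maps to a constructor argument and no required argument is missing. Optional arguments such as emotions can be left out without being reported.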
See Supported Models for the full list of model types, their optional dependencies, and where they are adapted in the source code.
Note: Most model implementations are client-side adapters. You may also need to start the model server itself by following that model's own instructions.
Customize service behavior
You can also customize service behavior, such as whether to save session audio under logs/ and whether to send full session audio back to the client:
"service_config": {
    "recording": true,
    "send_full_audio_to_client": true
}
See all service configuration for the full list of options.
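As a rough illustration, a complete config file can be assembled in Python and written out as JSON. This sketch assumes that "llm_agent" and "service_config" are sibling keys at the top level of the config, which the fragments above suggest but do not state. The file name config.json is arbitrary.

```python
import json
import os
import tempfile

# Assumption: model items and "service_config" sit side by side at the top level.
config = {
    "llm_agent": {
        "type": "DefaultAgent",
        "params": {
            "model": {
                "api_key": "none",
                "base_url": "http://127.0.0.1:8000/v1",
                "model": "cpatonn/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit",
            }
        },
    },
    "service_config": {
        "recording": True,  # serialized as JSON true
        "send_full_audio_to_client": True,
    },
}


def write_config(cfg: dict, path: str) -> None:
    """Write the config dict to path as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)


# Write to a temporary directory and read the file back as a sanity check.
path = os.path.join(tempfile.mkdtemp(), "config.json")
write_config(config, path)
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```

Building the config as a dict and serializing it with json.dump avoids hand-written JSON mistakes such as trailing commas or Python-style True/False literals.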
Frontend configuration
The frontend accepts a SessionConfig through createSession(wsUrl, config). It currently supports three groups of options:
inputConfig
outputConfig
serviceURLs
For example:
const session = createSession(wsUrl, {
  inputConfig: {
    sampleRate: 16000,
    enableVAD: true,
    enableEnhancer: true,
    vadRedemptionMs: 500,
  },
  outputConfig: {
    sampleRate: 48000,
  },
  serviceURLs: {
    login: "/api/auth/login",
    sessions: "/api/sessions",
    sessionDetail: (sessionId) => `/api/sessions/${sessionId}`,
    upload: "/api/upload",
  },
});
inputConfig
inputConfig controls the frontend input audio session. In normal browser microphone mode, the most commonly used fields are:
sampleRate: Input audio sample rate. The default is 16000.
enableVAD: Whether to enable frontend VAD. The default is true.
enableEnhancer: Whether to enable frontend speech enhancement. The default is true.
vadRedemptionMs: VAD redemption window in milliseconds.
Bridge mode with mode: "web_bridge" also supports:
mode: Either "microphone" or "web_bridge".
participantId: Participant identifier for bridge-backed input.
bridge: Shared audio bridge instance.
autoEmitVad: Whether to auto-broadcast frontend VAD events in bridge mode.
outputConfig
The main field that outputConfig currently supports is:
sampleRate: Output playback sample rate. The default is 48000.
serviceURLs
serviceURLs overrides auxiliary HTTP endpoints. It currently supports:
login
sessions
sessionDetail: This can be either a fixed URL or a function of the form (sessionId) => URL.
upload
If omitted, the frontend derives default URLs automatically from wsUrl:
POST /api/auth/login
GET /api/sessions
GET /api/sessions/{session_id}
POST /api/upload
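The exact derivation rule is not spelled out here. As a sketch only, the mapping could look like the following Python function, assuming the frontend reuses the host and port of wsUrl and swaps ws for http (and wss for https). The function name derive_default_urls is hypothetical and is not part of the frontend API.

```python
from urllib.parse import urlsplit


def derive_default_urls(ws_url: str) -> dict:
    """Map a WebSocket URL to the default auxiliary HTTP endpoints.

    Assumption: same host and port, with ws -> http and wss -> https.
    """
    parts = urlsplit(ws_url)
    scheme = "https" if parts.scheme == "wss" else "http"
    origin = f"{scheme}://{parts.netloc}"
    return {
        "login": f"{origin}/api/auth/login",        # POST
        "sessions": f"{origin}/api/sessions",       # GET
        # GET; a function because the path depends on the session id
        "sessionDetail": lambda session_id: f"{origin}/api/sessions/{session_id}",
        "upload": f"{origin}/api/upload",           # POST
    }


urls = derive_default_urls("wss://example.com/ws")
print(urls["login"])
print(urls["sessionDetail"]("abc"))
```

If the backend is served from a different origin or behind a path prefix, passing explicit serviceURLs is likely the safer choice than relying on derived defaults.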