Recipe

The examples below show how to extend the framework by modifying it directly.

Introduce a New ASR Model

Assume you want to add Qwen3ASRFlashRealtime, whose implementation currently lives in src/xtalk/speech/asr/qwen3_asr_flash_realtime.py.

Create qwen3_asr_flash_realtime.py under src/xtalk/speech/asr.
Prepare the class skeleton and implement the required methods. For model interfaces, refer to src/xtalk/speech/interfaces.py.

from typing import Optional

from ..interfaces import ASR

# Qwen3ASRFlashConfig is assumed to be defined earlier in this module.


class Qwen3ASRFlashRealtime(ASR):
    def __init__(
        self,
        *,
        api_key: Optional[str] = None,
        config: Optional[Qwen3ASRFlashConfig] = None,
    ) -> None:
        ...

    def recognize(self, audio: bytes) -> str:
        ...

    def recognize_stream(
        self,
        audio: bytes,
        *,
        is_final: bool = False,
        chat_history: str | None = None,
    ) -> str:
        ...

    def stream_chunk_bytes_hint(self) -> int | None:
        ...

    def reset(self) -> None:
        ...

    def clone(self) -> "ASR":
        ...

    async def async_recognize(self, audio: bytes) -> str:
        ...

    async def async_recognize_stream(
        self,
        audio: bytes,
        *,
        is_final: bool = False,
        chat_history: str | None = None,
    ) -> str:
        ...

Register the new implementation in src/xtalk/speech/asr/__init__.py.
Use it in the configuration:

"asr": {
    "type": "Qwen3ASRFlashRealtime",
    "params": {
        "api_key": "your key"
    }
}
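
To illustrate how a "type" / "params" entry like the one above can map onto a constructor call, here is a minimal sketch. ASR_REGISTRY, register_asr, build_asr, and DummyASR are hypothetical names made up for this example; the actual registration in src/xtalk/speech/asr/__init__.py may be organized differently.

```python
from typing import Any, Callable, Dict

# Hypothetical registry, for illustration only.
ASR_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_asr(cls):
    """Record a class under its own name so a config "type" can find it."""
    ASR_REGISTRY[cls.__name__] = cls
    return cls


@register_asr
class DummyASR:
    """Stand-in for a real ASR implementation."""

    def __init__(self, *, api_key=None):
        self.api_key = api_key


def build_asr(config: dict):
    """Instantiate the class named by "type", passing "params" as keyword arguments."""
    entry = config["asr"]
    try:
        cls = ASR_REGISTRY[entry["type"]]
    except KeyError:
        raise ValueError(f"unknown ASR type: {entry['type']!r}") from None
    return cls(**entry.get("params", {}))


asr = build_asr({"asr": {"type": "DummyASR", "params": {"api_key": "your key"}}})
```

Because "params" is unpacked as keyword arguments, its keys need to match the keyword-only parameters of the constructor, such as api_key in the skeleton above.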

Introduce a New Agent

Refer to src/xtalk/llm_agent/experimental.py. The implementation and configuration process is similar to the section above, but remember to register the agent in src/xtalk/llm_agent/__init__.py.

`accept` Logic

async def async_accept(self, context: AgentContext) -> AsyncIterator[AgentOutput]:
    pass

The accept method subscribes to external inputs and starts the related processing logic. AgentContext comes from src/xtalk/serving/modules/llm_agent_context_manager.py. The currently stable context types include asr_partial, asr_final, and loop.

The loop event is triggered once when the connection is established. It can be used for any proactive logic, or to start an output loop. src/xtalk/llm_agent/experimental.py uses it to trigger proactive dialogue.

AgentOutput can be a string, a tool call, or a tool call result. After a tool call returns, the Manager can use the result to trigger related logic. For example, in src/xtalk/serving/modules/llm_agent_context_manager.py, the direct_audio tool call triggers downstream logic in src/xtalk/serving/modules/direct_audio_manager.py to generate directly playable audio events.
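
As a rough sketch of the dispatch described above, the following agent reacts to the three stable context types and yields plain strings. SketchContext and its fields (type, text) are hypothetical stand-ins; the real AgentContext and AgentOutput may be structured differently.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class SketchContext:
    """Hypothetical stand-in for AgentContext; the real fields may differ."""

    type: str  # "asr_partial", "asr_final", or "loop"
    text: str = ""


class EchoAgent:
    async def async_accept(self, context: SketchContext) -> AsyncIterator[str]:
        if context.type == "loop":
            # Triggered once when the connection is established:
            # a natural place for proactive output.
            yield "Hello, how can I help?"
        elif context.type == "asr_final":
            yield f"You said: {context.text}"
        # asr_partial is ignored in this sketch, so nothing is yielded for it.


async def collect(agent: EchoAgent, context: SketchContext) -> list:
    """Drain the async iterator into a list."""
    return [out async for out in agent.async_accept(context)]


outputs = asyncio.run(collect(EchoAgent(), SketchContext("asr_final", "hi")))
```

Writing async_accept as an async generator lets the agent emit output incrementally instead of returning one complete response.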

From a design perspective, the Agent is expected to be the main reasoning core of the whole system, integrating information from other components into its output.

Introduce a New Manager

The Agent in the previous section relies on a new Manager, src/xtalk/serving/modules/direct_audio_manager.py, to turn tool-call output into audio events. Create Manager implementations directly under src/xtalk/serving/modules, then register them in src/xtalk/serving/service.py and src/xtalk/serving/module_types.py.

Manager uses the observer pattern for event subscription and publishing. All events are defined in src/xtalk/serving/events.py. src/xtalk/serving/modules/input_gateway.py and src/xtalk/serving/modules/output_gateway.py are special cases responsible for receiving frontend input and sending output back to the frontend.

To invoke models inside a Manager, refer to src/xtalk/serving/modules/asr_manager.py. You can use methods such as pipeline.get_asr_model.

Note that the wait_for_completion parameter of the event-publishing method publish controls whether awaiting publish blocks until the listeners directly triggered by that event have completed. Enabling wait_for_completion at every step of an event chain ensures that every handler in that chain finishes before control returns to the original event source.
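
The effect of wait_for_completion along a chain can be seen in a small observer-style sketch. SketchBus, subscribe, and the event names are hypothetical and only mimic the behavior described here; they are not the framework's actual event classes or signatures.

```python
import asyncio
from collections import defaultdict


class SketchBus:
    """Hypothetical observer-style bus, not the framework's implementation."""

    def __init__(self):
        self._listeners = defaultdict(list)
        self._background = set()

    def subscribe(self, event_type, handler):
        self._listeners[event_type].append(handler)

    async def publish(self, event_type, payload, *, wait_for_completion=False):
        tasks = [asyncio.create_task(h(payload)) for h in self._listeners[event_type]]
        if wait_for_completion:
            # Block until every listener directly triggered by this event is done.
            await asyncio.gather(*tasks)
        else:
            # Fire and forget; keep references so tasks are not garbage-collected.
            for task in tasks:
                self._background.add(task)
                task.add_done_callback(self._background.discard)


async def demo():
    bus = SketchBus()
    log = []

    async def on_text(payload):
        # Second link of the chain: publish a follow-up event and wait for it.
        await bus.publish("audio", payload.upper(), wait_for_completion=True)
        log.append("text handled")

    async def on_audio(payload):
        await asyncio.sleep(0)
        log.append(f"audio {payload}")

    bus.subscribe("text", on_text)
    bus.subscribe("audio", on_audio)
    await bus.publish("text", "hi", wait_for_completion=True)
    return log


result = asyncio.run(demo())
```

Here both publish calls wait, so the audio handler and the text handler have both finished by the time the outer publish returns. If the inner call omitted wait_for_completion, the outer call would only guarantee that on_text itself had finished.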

Introduce a New Model Type

Create the interface in src/xtalk/speech/interfaces.py, then create the corresponding folder and model file under src/xtalk/speech. The workflow is similar to introducing a new ASR model. After that, register the new type in src/xtalk/model_loader.py and src/xtalk/model_types.py.
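
As an illustration of the first step, an interface for a new model type could be written as an abstract base class. Denoiser and PassthroughDenoiser are invented for this sketch and are not part of the framework; the method set is an assumption loosely modeled on the ASR skeleton above.

```python
from abc import ABC, abstractmethod


class Denoiser(ABC):
    """Hypothetical interface for a new model type."""

    @abstractmethod
    def denoise(self, audio: bytes) -> bytes:
        """Return a cleaned copy of the input audio."""

    @abstractmethod
    def clone(self) -> "Denoiser":
        """Return an independent instance (assumed to mirror ASR.clone)."""


class PassthroughDenoiser(Denoiser):
    """Trivial implementation that returns the audio unchanged."""

    def denoise(self, audio: bytes) -> bytes:
        return audio

    def clone(self) -> "Denoiser":
        return PassthroughDenoiser()


cleaned = PassthroughDenoiser().denoise(b"\x00\x01")
```

Declaring the methods abstract means an implementation that forgets one of them fails at instantiation time rather than at first use.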