Roadmap

Turn detection
- Goal: approach the feel of natural human conversation
- Basic capability: decide when the AI should start replying, when it should give a backchannel, and when it should treat itself as interrupted
- Advanced capability: let the AI proactively interrupt and jump in
Multi-speaker support
- Multi-speaker identification models
- How the LLM should use speaker information
TTS control
- Style consistency across multi-sentence synthesis
- Paralinguistic control, TTS API design, and trigger conditions
Applications
- Voice workspace: users trigger the AI by voice to call tools and perform tasks, similar to a voice-first OpenClaw
- Robot conversation
- Psychological counseling
API testing
Benchmarking
More model integrations
- Turn detection models: easyturn, smartturn, turnsense
- TTS models: Doubao TTS, Alibaba Bailian TTS, ...
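
As a rough illustration of the basic turn-detection capability listed at the top of this roadmap (reply, backchannel, or treat the AI as interrupted), the decision could be sketched as a small policy function. This is a minimal sketch under stated assumptions, not the project's implementation: `TurnState`, `TurnAction`, `decide`, the `end_of_turn_prob` score, and all thresholds are hypothetical names and values chosen for illustration. Proactive interruption is not covered.

```python
from dataclasses import dataclass
from enum import Enum


class TurnAction(Enum):
    WAIT = "wait"                # keep listening, do nothing yet
    BACKCHANNEL = "backchannel"  # short acknowledgement; the user keeps the floor
    REPLY = "reply"              # the user's turn is over; start the full reply
    YIELD = "yield"              # the AI was interrupted; stop speaking


@dataclass
class TurnState:
    user_speaking: bool      # voice activity detected on the user side
    ai_speaking: bool        # AI audio is currently playing
    silence_ms: int          # time since the user last spoke
    end_of_turn_prob: float  # hypothetical turn-detection model score in [0, 1]


def decide(
    state: TurnState,
    reply_threshold: float = 0.7,       # assumed value, would need tuning
    min_silence_ms: int = 300,          # assumed value, would need tuning
    backchannel_silence_ms: int = 600,  # assumed value, would need tuning
) -> TurnAction:
    """Map the current conversation state to one turn-taking action."""
    # The user talks over the AI: treat the AI as interrupted.
    if state.ai_speaking and state.user_speaking:
        return TurnAction.YIELD
    # Someone holds the floor without overlap: keep waiting.
    if state.user_speaking or state.ai_speaking:
        return TurnAction.WAIT
    # Both sides are silent and the model believes the turn has ended.
    if (state.silence_ms >= min_silence_ms
            and state.end_of_turn_prob >= reply_threshold):
        return TurnAction.REPLY
    # A longer pause without a confident end of turn: acknowledge only.
    if state.silence_ms >= backchannel_silence_ms:
        return TurnAction.BACKCHANNEL
    return TurnAction.WAIT
```

For example, `decide(TurnState(user_speaking=True, ai_speaking=True, silence_ms=0, end_of_turn_prob=0.0))` yields `TurnAction.YIELD`. A real policy would also need to tell a user's own backchannel apart from a genuine interruption, which this sketch does not attempt.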