Roadmap

  • Turn detection: the goal is to approach natural human conversation. The basic capability is deciding when the AI should start replying, when it should give backchannel responses, and when it should treat itself as interrupted. The advanced capability is letting the AI proactively interrupt and join the conversation.
  • Multi-speaker support
    • Multi-speaker identification models
    • How the LLM should use speaker information
  • TTS control
    • Style consistency across multi-sentence synthesis
    • Paralinguistic control, TTS API design, and trigger conditions
  • Applications
    • Voice workspace: users trigger the AI to call tools and perform tasks, similar to a voice-first OpenClaw
    • Robot conversation
    • Psychological counseling
  • API testing
  • Benchmarking
  • More model integrations
    • Turn detection models: easyturn, smartturn, turnsense
    • TTS models: Doubao TTS, Alibaba Bailian TTS, ...