Qwen3-Omni-Flash-Realtime
Real-time multimodal model with streaming audio input and built-in voice activity detection (VAD) for live, interactive use.
Model Details
Provider
Alibaba Cloud
Model Type
multimodal
Context Window
65,536 tokens
Pricing
Input (per 1M tokens): $0.52
Output (per 1M tokens): $1.99
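As a rough illustration of how the listed prices and context window translate into a per-request budget, the sketch below multiplies token counts by the rates above. It assumes a flat per-token rate for every modality, which this listing does not confirm, and the helper names `estimate_cost` and `fits_context` are hypothetical.

```python
# Figures copied from the Model Details listing above.
INPUT_PRICE_PER_M = 0.52    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.99   # USD per 1M output tokens
CONTEXT_WINDOW = 65_536     # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: every token is billed at the flat listed rate,
    regardless of whether it came from text, audio, or an image.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check whether a request stays within the listed context window.

    Assumption: input and output tokens share the same window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(f"${estimate_cost(50_000, 2_000):.4f}")
    print(fits_context(50_000, 2_000))
```

Under these assumptions, a request with 50,000 input tokens and 2,000 output tokens fits in the context window and costs about three cents.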
Capabilities
1. Real-time audio streaming
- Built-in VAD detects when a user is speaking.
2. Multimodal reasoning
- Accepts text, audio, and image inputs.
3. Suited to live agents
- Call centers, tutoring, interactive systems.
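To make the VAD item above more concrete, here is a minimal sketch of the general idea: split audio into short frames and flag the ones whose energy exceeds a threshold. This is a generic energy-based technique shown for illustration only, not the model's built-in VAD, whose internals this listing does not describe. The function names, frame size, and threshold are all assumptions.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    samples: Sequence[float],
    frame_size: int = 160,     # assumed: 10 ms at a 16 kHz sample rate
    threshold: float = 0.05,   # assumed energy cutoff, tuned per microphone
) -> List[bool]:
    """Return one flag per frame: True where the frame is loud enough
    to be treated as speech by this simple energy heuristic."""
    flags = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) >= threshold)
    return flags


if __name__ == "__main__":
    silence = [0.0] * 160
    # A 440 Hz tone stands in for voiced audio in this toy example.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
    print(detect_speech(silence + tone))
```

A client would normally use flags like these only to decide when to start or stop sending audio. With a built-in VAD, that decision is made on the model side instead.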