Qwen3-Omni-Flash
Hybrid-thinking multimodal model with upgraded vision, audio, and agent capabilities.
Model Details
Provider
Alibaba Cloud
Model Type
Multimodal
Context Window
65,536 tokens
Pricing
Input (per 1M tokens): $0.43
Output (per 1M tokens): $1.66
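The listed rates can be turned into a per-request estimate. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, not part of any Alibaba Cloud SDK. It assumes flat, linear billing at the two rates shown above, with no tiering or separate rates per modality (the page lists only one input and one output price).

```python
from decimal import Decimal

# Rates copied from the Pricing section (USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("0.43")
OUTPUT_PRICE_PER_M = Decimal("1.66")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: billing is linear in token count at the listed rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = INPUT_PRICE_PER_M * input_tokens + OUTPUT_PRICE_PER_M * output_tokens
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens.
    print(estimate_cost(10_000, 2_000))  # 0.00762
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift when many requests are summed.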
Capabilities
1. Advanced multimodal reasoning
- Accepts vision, audio, and video inputs.
2. Supports thinking mode
- A distinguishing feature among multimodal models.
3. 17 voices, 10 languages
- Well suited to voice agents.
4. Designed for real-world interactions
- Examples: recognition, teaching, and analysis.