Qwen3-Omni-Flash

Hybrid thinking multimodal model with upgraded vision, audio, and agent abilities.

Model Details

Provider

Alibaba Cloud

Model Type

multimodal

Context Window

65,536 tokens

Pricing

Input (per 1M tokens): $0.43
Output (per 1M tokens): $1.66
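
To show how the per-million-token prices above translate into a per-request cost, here is a minimal sketch. It assumes the listed rates apply uniformly to all input and output tokens regardless of modality, which this page does not confirm. The function name `estimate_cost` and the constants are hypothetical, chosen for illustration only.

```python
# Prices in USD per 1M tokens, copied from the Pricing section.
INPUT_PRICE_PER_M = 0.43
OUTPUT_PRICE_PER_M = 1.66


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: a flat per-token rate for every modality (text, audio,
    image, video). Actual billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 10,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost(10_000, 2_000):.5f}")  # prints "$0.00762"
```

At these rates, output tokens cost roughly four times as much as input tokens, so long generated responses tend to dominate the bill.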

Capabilities

1. Advanced multimodal reasoning

  • Vision, audio, video inputs.

2. Supports thinking mode

  • A distinguishing feature among multimodal models.

3. 17 voices, 10 languages

  • Great for voice agents.

4. Designed for real-world interactions

  • Recognition, teaching, analysis.
