Qwen3-Omni-Flash-Realtime

Real-time multimodal model with streaming audio input and built-in voice activity detection (VAD) for live, interactive applications.

Model Details

Provider

Alibaba Cloud

Model Type

multimodal

Context Window

65,536 tokens

Pricing

Input (per 1M tokens): $0.52
Output (per 1M tokens): $1.99
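
As a rough illustration of how the listed rates translate into per-request cost, the sketch below multiplies token counts by the prices above. It assumes billing is linear in tokens and that the same rates apply to all input modalities, which this page does not confirm; the function name `estimate_cost` is hypothetical and not part of any provider SDK.

```python
# Rates copied from the Pricing section (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.52
OUTPUT_PRICE_PER_M = 1.99
# Context window from the Model Details section, in tokens.
CONTEXT_WINDOW = 65_536


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes simple linear per-token billing with no minimums,
    discounts, or modality-specific rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check whether a request stays within the listed context window.

    Assumes input and output tokens share the same window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    cost = estimate_cost(10_000, 2_000)
    print(f"Estimated cost: ${cost:.5f}")
    print("Fits in context:", fits_context(10_000, 2_000))
```

Actual charges may differ if the provider bills audio or image input at separate rates, so treat the result as an estimate only.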

Capabilities

1. Real-time audio streaming

  • Built-in VAD for detecting speech.

2. Multimodal reasoning

  • Text, audio, and image inputs.

3. Great for live agents

  • Call centers, tutoring, interactive systems.
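
This page does not describe how the model's built-in VAD works. Purely to illustrate what voice activity detection does in a streaming setting, here is a minimal energy-threshold sketch: it splits audio into fixed-size frames and flags frames whose RMS energy exceeds a threshold. The frame size, threshold, and function names are illustrative assumptions, not the model's actual method or parameters.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples, frame_size=160, threshold=0.02):
    """Return one boolean per full frame: True where RMS exceeds threshold.

    Illustrative energy-based VAD only. Samples are assumed to be floats
    in [-1.0, 1.0]; 160 samples is 10 ms at a 16 kHz sample rate.
    Trailing samples that do not fill a whole frame are ignored.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) > threshold)
    return flags


if __name__ == "__main__":
    # Two frames of silence followed by two frames of a 440 Hz tone.
    silence = [0.0] * 320
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
    print(detect_speech(silence + tone))
```

Production VADs are usually model-based and more robust to noise than a fixed energy threshold; a client of a real-time model would normally rely on the built-in detection rather than a detector like this one.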
