OpenAI

GPT-4o Audio

Preview multimodal model that accepts and outputs audio, optimized for natural voice interactions and real-time conversational experiences.

audio 128K tokens context From $2.5 / 1M tokens

Use in Appaca

GPT-4o Audio at a glance

Context window

128K tokens

Input price

$2.5

per 1M tokens

Output price

$10

per 1M tokens

Why use GPT-4o Audio

Strengths, benchmarks, and where GPT-4o Audio fits in your team's workflow.

1. True multimodal audio model

Accepts raw audio as input and produces audio or text as output.
Enables hands-free, voice-first app experiences.

2. Natural real-time speech interaction

Low-latency audio generation suitable for conversational agents.
Great for voice assistants, phone bots, and interactive voice UI.

3. Large 128K context window

Supports long conversations, call transcripts, instructions, or multi-part interactions.
Ideal for building persistent voice agents or phone workflows.

4. High-output capacity

Up to 16,384 max output tokens for extended responses or long explanations.
Suitable for complex reasoning tasks in voice format.

5. Hybrid text + audio workloads

Combine audio input/output with text prompts, instructions, or structured control.
Useful for customer support bots, spoken form systems, IVR replacements, etc.

6. Compatible with the latest APIs

Works with Chat Completions, Responses API, Realtime API, and Assistants.
Supports streaming, function calling, and advanced developer tooling.

7. Strong performance for a preview model

High reasoning and expression abilities relative to most audio-capable models.
Designed for production-style experimentation prior to full release.

8. Ideal for next-gen voice applications

Build lifelike AI agents, interview bots, tutoring systems, and spoken knowledge tools.
Perfect for startups building audio-first user experiences.

Appaca

Power your team with GPT-4o Audio

Appaca is the AI workspace for operators. Build internal tools and AI co-workers powered by GPT-4o Audio - connected to your real data and ready for your whole team. No code, no deployment.

Internal transcription tools

Build internal tools that process audio with GPT-4o Audio - call summaries, meeting notes, voice search. Describe what you need, no code needed.

Automate audio processing

Schedule transcription to run automatically inside your workspace, then push summaries to Slack or your built-in database. No servers to manage.

One workspace for the whole team

Give everyone on your team access to audio-powered tools and AI co-workers from a single shared workspace, with team access built in.

Describe it, and it's built

Tell the Appaca agent what your team needs and it builds a working app powered by GPT-4o Audio - connected to the tools you already use.

Get started free Learn about Appaca

Solutions

Where teams use GPT-4o Audio

See how teams put GPT-4o Audio to work inside Appaca - internal tools and AI agents that work around you.

Sales

Transcribe and summarise sales calls with GPT-4o Audio, then push next steps straight into your pipeline.

Explore Appaca for Sales

HR

Turn interviews and meetings into structured notes your people team can search and share.

Explore Appaca for HR

Operations

Process recorded standups, briefings, and voice notes into reports automatically.

Explore Appaca for Operations

Explore all solutions

GPT-4o Audio pricing

Audio pricing

Text input

$2.5

per 1M tokens

Text output

$10

per 1M tokens

Audio input

$40

per 1M tokens

Audio output

$80

per 1M tokens

More OpenAI models

Compare other OpenAI models in the Appaca AI models directory - specs, pricing, and use cases for each.

text

GPT-5.5

OpenAI's smartest and most capable model yet for agentic coding, knowledge work, and computer use, delivering a new class of intelligence at GPT-5.4 latency.

View model

text

GPT-5.4

OpenAI's frontier model for complex professional work with best intelligence at scale for agentic, coding, and professional workflows.

View model

text

GPT-5.2

Previous frontier model for complex professional work with configurable reasoning effort.

View model

Browse all models

Compare GPT-4o Audio with other models

See how GPT-4o Audio stacks up against other audio models - pricing, context windows, and strengths side by side.

GPT-4o Audio vs GPT-4o mini Audio

Browse all comparisons

Build internal AI tools with GPT-4o Audio

Describe the tool your team needs and get a working app powered by GPT-4o Audio - with a built-in database, team access, and integrations. No code, no deployment.

Get started free

✦