SPEECH TO TEXT
Transcribe audio to text using Sarvam AI's saaras model, which supports 22 Indic languages.
Overview
The Speech to Text API transcribes audio files into text. It is powered by Sarvam AI's saaras:v3 model, which supports 22 Indian languages plus English and detects the language automatically when none is specified.
Endpoint: POST /v1/audio/transcriptions
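Because the endpoint is an ordinary HTTP POST, it can also be called without an SDK. The sketch below builds such a request with only the Python standard library. It assumes the base URL from the Basic Usage example, a multipart/form-data body with `model` and `file` fields, and a Bearer token in the `Authorization` header (which is what the OpenAI client sends). The helper names `build_multipart`, `build_request`, and `send` are hypothetical, and the `"text"` field in the response is inferred from `response.text` in Basic Usage rather than confirmed.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.callmissed.com/v1"  # base URL from the Basic Usage example


def build_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, file_name, file_bytes, model="saaras:v3"):
    """Build (but do not send) a POST to /v1/audio/transcriptions."""
    body, content_type = build_multipart({"model": model}, file_name, file_bytes)
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": content_type,
        },
    )


def send(request):
    """Perform the call; assumes the JSON response carries a "text" field."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["text"]
```

To use it, read the audio file into bytes, pass them to `build_request`, and hand the result to `send`. Keeping request construction separate from sending makes the request easy to inspect before any network call is made.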
Basic Usage
from openai import OpenAI

client = OpenAI(
    api_key="cm_your_key",
    base_url="https://api.callmissed.com/v1"
)

with open("audio.wav", "rb") as f:
    response = client.audio.transcriptions.create(
        model="saaras:v3",
        file=f
    )

print(response.text)
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | saaras:v3 |
| file | file | Audio file (WAV, MP3, etc.) |
| language | string | Language code (auto-detected if omitted) |
| mode | string | Output mode (see Output Modes below) |
| response_format | string | json, text, or verbose_json |
| timestamp_granularities[] | array | ["word"] for word-level timestamps (OpenAI-compatible) |
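The optional parameters in the table can be combined in one call. The following is a minimal sketch, not a definitive recipe: `transcription_kwargs` and `transcribe` are hypothetical helpers, the language code `"hi"` in the usage note is a placeholder (the accepted code format is not stated here), and the rule that word timestamps need `verbose_json` is taken from OpenAI's own API and only assumed to apply to this endpoint.

```python
ALLOWED_FORMATS = ("json", "text", "verbose_json")


def transcription_kwargs(file, language=None, response_format=None,
                         word_timestamps=False):
    """Assemble keyword arguments for client.audio.transcriptions.create()."""
    if response_format is None:
        # Assumption: as in OpenAI's API, word timestamps need verbose_json.
        response_format = "verbose_json" if word_timestamps else "json"
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {ALLOWED_FORMATS}")
    if word_timestamps and response_format != "verbose_json":
        raise ValueError("word timestamps are assumed to require verbose_json")

    kwargs = {"model": "saaras:v3", "file": file,
              "response_format": response_format}
    if language is not None:
        # Omit the key entirely to let the service auto-detect the language.
        kwargs["language"] = language
    if word_timestamps:
        kwargs["timestamp_granularities"] = ["word"]
    return kwargs


def transcribe(client, path, **options):
    """Call the API with a client configured as in Basic Usage."""
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(
            **transcription_kwargs(f, **options)
        )
```

For example, `transcribe(client, "audio.wav", language="hi", word_timestamps=True)` would request a verbose JSON response with word-level timestamps, while `transcribe(client, "audio.wav")` leaves language detection to the service.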
Output Modes
| Mode | Description |
|---|---|
| transcribe | Standard transcription (default) |
| translate | Transcribe and translate to English |
| verbatim | Exact transcription including filler words |
| translit | Transliteration to Latin script |
| codemix | Code-mixed output (Indic + English) |
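Selecting one of these modes from the OpenAI Python SDK might look like the sketch below. `mode` is not a named argument of the SDK's `transcriptions.create()`, so this example passes it through the SDK's generic `extra_body` option. That the service reads `mode` from the request body in this way is an assumption, and `mode_extra_body` and `transcribe_with_mode` are hypothetical helper names.

```python
OUTPUT_MODES = ("transcribe", "translate", "verbatim", "translit", "codemix")


def mode_extra_body(mode="transcribe"):
    """Validate an output mode and wrap it as an extra request field."""
    if mode not in OUTPUT_MODES:
        raise ValueError(f"mode must be one of {OUTPUT_MODES}, got {mode!r}")
    return {"mode": mode}


def transcribe_with_mode(client, path, mode="transcribe"):
    """Transcribe a file with a client configured as in Basic Usage."""
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(
            model="saaras:v3",
            file=f,
            # Assumption: the server accepts "mode" as an additional body field.
            extra_body=mode_extra_body(mode),
        )
```

Calling `transcribe_with_mode(client, "audio.wav", mode="translate")` would then ask for an English translation of the audio. Validating the mode locally catches typos before a request is sent.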