Transcription
RealtimeKit provides two transcription modes powered by Cloudflare Workers AI:
| Mode | Model | Processing time | Use case |
|---|---|---|---|
| Real-time | Deepgram Nova-3 | During the meeting | Live captions for attendees |
| Post-meeting | Whisper Large v3 Turbo | After the meeting ends | Transcript files and webhooks |
RealtimeKit processes each participant audio stream separately. This helps identify each speaker in the final transcript.
We recommend upgrading to the Workers Paid plan to avoid Workers AI processing limits on the Free plan. Learn more in Billing and Free plan limits.
Real-time transcription streams participant audio to Deepgram Nova-3 on Workers AI and sends transcript events to meeting participants during the meeting.
You can turn on real-time transcription for participants by setting permissions.transcription_enabled: true in the participant's preset. This lets you decide which participant audio is transcribed. For example, you can transcribe speaker audio without transcribing audience audio.
To update an existing preset, use the Update a preset API:
curl -X PATCH "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/realtime/kit/$APP_ID/presets/$PRESET_ID" \ -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "permissions": { "transcription_enabled": true } }'To create a preset, refer to the Create a preset API reference.
RealtimeKit transcribes audio only for participants who join with a preset that has permissions.transcription_enabled: true.
During the meeting, RealtimeKit streams transcript updates to the client SDK. To access existing transcripts from meeting.ai.transcripts or listen for new transcript events with meeting.ai.on("transcript", ...), refer to Consume real-time transcripts.
The preset controls whose audio is transcribed. The meeting configuration controls how RealtimeKit transcribes that audio. Use ai_config.transcription to set the spoken language, boost recognition for custom terms, and control profanity filtering for a specific meeting.
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/realtime/kit/$APP_ID/meetings" \ -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "title": "Weekly product review", "ai_config": { "transcription": { "language": "en-US", "keywords": ["RealtimeKit", "Cloudflare"], "profanity_filter": false } } }'| Option | Type | Default | Description |
|---|---|---|---|
language | string | en-US | Language code for transcription |
keywords | string[] | [] | Terms to boost recognition (names, jargon) |
profanity_filter | boolean | false | Filter offensive language |
Real-time transcription is powered by Deepgram Nova-3 on Workers AI.
Nova-3 on Workers AI supports the following languages for transcription:
| Language | Code(s) |
|---|---|
| English | en, en-US, en-AU, en-GB, en-IN, en-NZ |
| Spanish | es, es-419 |
| French | fr, fr-CA |
| German | de, de-CH |
| Hindi | hi |
| Russian | ru |
| Portuguese | pt, pt-BR, pt-PT |
| Japanese | ja |
| Italian | it |
| Dutch | nl |
Use multi for automatic multilingual detection across all of the languages listed above.
If no language is specified, the model defaults to en-US. For best accuracy, explicitly set the language code matching your audio.
Real-time transcription sends interim and final transcript updates to the client SDK. Use interim updates for live captions, and use final updates for transcript history or saved UI state.
// Get transcript entries already received by the client.const transcripts = meeting.ai.transcripts;
// Listen for transcript updates during the meeting.meeting.ai.on("transcript", (transcript) => { if (transcript.isPartialTranscript) { updateLiveCaption(transcript.peerId, transcript.transcript); return; }
appendFinalTranscript(transcript);});{ "id": "1a2b3c4d-5678-90ab-cdef-1234567890ab", "name": "Alice", "peerId": "4f5g6h7i-8j9k-0lmn-opqr-1234567890st", "userId": "uvwxyz-1234-5678-90ab-cdefghijklmn", "customParticipantId": "abc123xyz", "transcript": "Hello everyone", "isPartialTranscript": false, "timestamp": 1716700000000}| Field | Description |
|---|---|
id | Unique transcript entry ID |
name | Display name of the participant who spoke |
peerId | Peer ID of the participant who spoke. Changes if they rejoin. |
userId | Persistent participant ID |
customParticipantId | Participant identifier set when the participant was added |
transcript | Transcribed text |
isPartialTranscript | true for interim updates, false for final updates |
timestamp | Unix epoch timestamp in milliseconds |
Post-meeting transcription generates a transcript after the meeting ends using Whisper Large v3 Turbo on Workers AI and delivers it through a webhook or REST API. To identify speakers, RealtimeKit processes each participant's audio separately before creating the final transcript.
You can turn on post-meeting transcription when you create a meeting. Set transcribe_on_end: true to generate a transcript after the meeting ends. To also generate a summary automatically after the transcript is available, set summarize_on_end: true.
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/realtime/kit/$APP_ID/meetings" \ -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "title": "Weekly product review", "transcribe_on_end": true, "summarize_on_end": true, "ai_config": { "transcription": { "language": "en" } } }'Use ai_config.transcription.language to set the transcript language. For supported values, refer to Post-meeting supported languages. If transcribe_on_end is not set, RealtimeKit does not generate a post-meeting transcript.
Post-meeting transcription supports Whisper Large v3 Turbo language codes. Omit ai_config.transcription.language to let Whisper detect the spoken language.
Common language codes include:
| Language | Code | Language | Code | Language | Code |
|---|---|---|---|---|---|
| English | en | Spanish | es | French | fr |
| German | de | Hindi | hi | Portuguese | pt |
| Japanese | ja | Italian | it | Dutch | nl |
| Russian | ru | Chinese | zh | Cantonese | yue |
Additional post-meeting language codes
| Language | Code |
|---|---|
| Afrikaans | af |
| Albanian | sq |
| Amharic | am |
| Arabic | ar |
| Assamese | as |
| Azerbaijani | az |
| Bashkir | ba |
| Basque | eu |
| Belarusian | be |
| Bengali | bn |
| Bosnian | bs |
| Breton | br |
| Bulgarian | bg |
| Catalan | ca |
| Croatian | hr |
| Czech | cs |
| Danish | da |
| Estonian | et |
| Faroese | fo |
| Finnish | fi |
| Galician | gl |
| Georgian | ka |
| Greek | el |
| Gujarati | gu |
| Haitian Creole | ht |
| Hausa | ha |
| Hawaiian | haw |
| Hebrew | he |
| Hungarian | hu |
| Icelandic | is |
| Indonesian | id |
| Javanese | jw |
| Kannada | kn |
| Kazakh | kk |
| Khmer | km |
| Korean | ko |
| Lao | lo |
| Latin | la |
| Latvian | lv |
| Lingala | ln |
| Lithuanian | lt |
| Luxembourgish | lb |
| Macedonian | mk |
| Malagasy | mg |
| Malay | ms |
| Malayalam | ml |
| Maltese | mt |
| Maori | mi |
| Marathi | mr |
| Mongolian | mn |
| Myanmar | my |
| Nepali | ne |
| Norwegian | no |
| Norwegian Nynorsk | nn |
| Occitan | oc |
| Pashto | ps |
| Persian | fa |
| Polish | pl |
| Punjabi | pa |
| Romanian | ro |
| Sanskrit | sa |
| Serbian | sr |
| Shona | sn |
| Sindhi | sd |
| Sinhala | si |
| Slovak | sk |
| Slovenian | sl |
| Somali | so |
| Sundanese | su |
| Swahili | sw |
| Swedish | sv |
| Tagalog | tl |
| Tajik | tg |
| Tamil | ta |
| Tatar | tt |
| Telugu | te |
| Thai | th |
| Tibetan | bo |
| Turkmen | tk |
| Turkish | tr |
| Ukrainian | uk |
| Urdu | ur |
| Uzbek | uz |
| Vietnamese | vi |
| Welsh | cy |
| Yiddish | yi |
| Yoruba | yo |
Post-meeting transcripts are available in multiple formats. Use CSV or JSON for application workflows, and use SRT or VTT when you need subtitle files.
| Format | Use case |
|---|---|
| CSV | Spreadsheets and data analysis |
| JSON | Programmatic access |
| SRT | Video subtitle files |
| VTT | Web video captions (<track> element) |
"1000","peer-123","user-456","cust-789","Alice","Hello everyone""3000","peer-234","user-567","cust-890","Bob","Hi Alice"CSV rows use the following field order: start time in milliseconds, peer ID, user ID, custom participant ID, participant name, and transcript text.
[ { "startTime": 1000, "endTime": 2500, "sentence": "Hello everyone", "peerData": { "id": "peer-123", "userId": "user-456", "displayName": "Alice", "cpi": "cust-789", "joinedAt": "2024-08-07T10:15:29.000Z", "leftAt": "" } }]100:00:01,000 --> 00:00:02,500Alice: Hello everyone
200:00:03,000 --> 00:00:04,500Bob: Hi AliceAfter RealtimeKit finishes processing a post-meeting transcript, you can receive the transcript download URL through a webhook or fetch it with the REST API. Use webhooks for asynchronous backend workflows, and use the REST API when you need to retrieve the transcript for a specific session.
Configure the meeting.transcript event in RealtimeKit webhooks:
{ "event": "meeting.transcript", "meeting": { "id": "bbb8940e-1b97-402a-97d6-2708b7feca41", "title": "Weekly sync", "endedAt": "2026-06-03T10:30:00.000Z", "createdAt": "2026-06-03T10:00:00.000Z", "sessionId": "05e57591-d89e-45c9-ae44-08dc1eaad0e0", "startedAt": "2026-06-03T10:00:00.000Z", "status": "LIVE", "organizedBy": { "id": "c94c437b-592a-4a39-b9e2-47ef1451e43b", "name": "Example organization" } }, "transcriptDownloadUrl": "https://example.com/transcript.csv", "transcriptDownloadUrlExpiry": "2026-06-10T10:30:00.000Z"}Refer to Fetch the complete transcript for a session.
curl -X GET "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/realtime/kit/$APP_ID/sessions/$SESSION_ID/transcript" \ -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"Transcripts are available for 7 days after the meeting ends. Download or copy transcript files before the URL expiry time returned by the webhook or REST API.
RealtimeKit's default transcription records each participant's audio track and processes it with Workers AI. Workers AI usage is billed to your Cloudflare account using audio model pricing, which scales by participant audio minutes, not meeting duration.
On the Workers Free plan, Workers AI includes 10,000 Neurons per day. To use more than 10,000 Neurons per day, upgrade to the Workers Paid plan. Workers Paid includes the same 10,000 daily free Neurons, then bills additional usage at $0.011 per 1,000 Neurons.
You can upgrade to the Workers Paid plan in the Cloudflare dashboard under Manage account.
RealtimeKit transcription uses these Workers AI audio model rates:
| Transcription mode | Workers AI model | Neurons per audio minute |
|---|---|---|
| Post-meeting | @cf/openai/whisper-large-v3-turbo | 46.63 |
| Real-time | @cf/deepgram/nova-3 WebSocket | 836.36 |
RealtimeKit transcription is a managed transcription workflow. When transcription is turned on, RealtimeKit processes participant audio with Workers AI and stores transcript outputs in RealtimeKit-managed storage.
For real-time transcription, RealtimeKit streams audio from participants with transcription turned on to Workers AI and sends transcript updates to meeting participants during the meeting.
For post-meeting transcription, RealtimeKit processes each participant's audio separately after the meeting ends, creates the final transcript files, and makes them available through a webhook or REST API.