Transcription

RealtimeKit provides two transcription modes powered by Cloudflare Workers AI:

| Mode | Model | Processing time | Use case |
| --- | --- | --- | --- |
| Real-time | Deepgram Nova-3 | During the meeting | Live captions for attendees |
| Post-meeting | Whisper Large v3 Turbo | After the meeting ends | Transcript files and webhooks |

RealtimeKit processes each participant's audio stream separately, which helps identify each speaker in the final transcript.

We recommend upgrading to the Workers Paid plan to avoid Workers AI processing limits on the Free plan. Learn more in Billing and Free plan limits.

Real-time transcription

Real-time transcription streams participant audio to Deepgram Nova-3 on Workers AI and sends transcript events to meeting participants during the meeting.

Turn on real-time transcription

You can turn on real-time transcription for participants by setting permissions.transcription_enabled: true in the participant's preset. This lets you decide which participant audio is transcribed. For example, you can transcribe speaker audio without transcribing audience audio.

To update an existing preset, use the Update a preset API:

Terminal window
curl -X PATCH "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/realtime/kit/$APP_ID/presets/$PRESET_ID" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "permissions": {
      "transcription_enabled": true
    }
  }'

To create a preset, refer to the Create a preset API reference.

RealtimeKit transcribes audio only for participants who join with a preset that has permissions.transcription_enabled: true.

During the meeting, RealtimeKit streams transcript updates to the client SDK. To access existing transcripts from meeting.ai.transcripts or listen for new transcript events with meeting.ai.on("transcript", ...), refer to Consume real-time transcripts.

Configure transcription settings

The preset controls whose audio is transcribed. The meeting configuration controls how RealtimeKit transcribes that audio. Use ai_config.transcription to set the spoken language, boost recognition for custom terms, and control profanity filtering for a specific meeting.

Terminal window
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/realtime/kit/$APP_ID/meetings" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Weekly product review",
    "ai_config": {
      "transcription": {
        "language": "en-US",
        "keywords": ["RealtimeKit", "Cloudflare"],
        "profanity_filter": false
      }
    }
  }'

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | en-US | Language code for transcription |
| keywords | string[] | [] | Terms to boost recognition (names, jargon) |
| profanity_filter | boolean | false | Filter offensive language |
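
As a rough illustration of how these defaults combine, the following sketch builds the ai_config.transcription object for a create-meeting request body. The helper name buildTranscriptionConfig is hypothetical and is not part of RealtimeKit; the defaults come from the options table above, and the type checks are an assumption about what a caller might want to validate before sending the request.

```javascript
// Hypothetical helper (not part of RealtimeKit): fills in the documented
// defaults for any ai_config.transcription option the caller leaves out.
function buildTranscriptionConfig(options = {}) {
  const {
    language = "en-US",
    keywords = [],
    profanity_filter = false,
  } = options;

  if (typeof language !== "string" || language.length === 0) {
    throw new TypeError("language must be a non-empty string");
  }
  if (!Array.isArray(keywords) || !keywords.every((k) => typeof k === "string")) {
    throw new TypeError("keywords must be an array of strings");
  }
  if (typeof profanity_filter !== "boolean") {
    throw new TypeError("profanity_filter must be a boolean");
  }

  // Copy keywords so later changes to the caller's array do not leak in.
  return { language, keywords: [...keywords], profanity_filter };
}

// Request body for the create-meeting call shown earlier.
const body = {
  title: "Weekly product review",
  ai_config: {
    transcription: buildTranscriptionConfig({
      keywords: ["RealtimeKit", "Cloudflare"],
    }),
  },
};
console.log(JSON.stringify(body, null, 2));
```

Validating on the client is optional; it only surfaces obvious mistakes, such as passing a single string for keywords, before the request is sent.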

Real-time supported languages

Real-time transcription uses Deepgram Nova-3 on Workers AI, which supports the following languages:

| Language | Code(s) |
| --- | --- |
| English | en, en-US, en-AU, en-GB, en-IN, en-NZ |
| Spanish | es, es-419 |
| French | fr, fr-CA |
| German | de, de-CH |
| Hindi | hi |
| Russian | ru |
| Portuguese | pt, pt-BR, pt-PT |
| Japanese | ja |
| Italian | it |
| Dutch | nl |

Use multi for automatic multilingual detection across all of the languages listed above.

If no language is specified, the model defaults to en-US. For best accuracy, explicitly set the language code matching your audio.
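
If you accept a language code from user input, it can help to check it against this list before creating the meeting. The sketch below is one way to do that. The helper name resolveRealtimeLanguage is hypothetical, and the exact, case-sensitive match is an assumption; the codes themselves are copied from the list above.

```javascript
// Codes copied from the real-time supported languages list, plus "multi"
// for automatic multilingual detection.
const REALTIME_LANGUAGE_CODES = new Set([
  "en", "en-US", "en-AU", "en-GB", "en-IN", "en-NZ",
  "es", "es-419",
  "fr", "fr-CA",
  "de", "de-CH",
  "hi", "ru",
  "pt", "pt-BR", "pt-PT",
  "ja", "it", "nl",
  "multi",
]);

// Hypothetical helper: returns the code to send in
// ai_config.transcription.language, falling back to the documented
// default when no code is given.
function resolveRealtimeLanguage(code) {
  if (code === undefined) {
    return "en-US";
  }
  // Assumption: codes must match exactly, including letter case.
  if (!REALTIME_LANGUAGE_CODES.has(code)) {
    throw new RangeError(`Unsupported real-time language code: ${code}`);
  }
  return code;
}

console.log(resolveRealtimeLanguage("pt-BR"));
console.log(resolveRealtimeLanguage());
```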

Consume real-time transcripts

Real-time transcription sends interim and final transcript updates to the client SDK. Use interim updates for live captions, and use final updates for transcript history or saved UI state.

Client SDK

JavaScript
// Get transcript entries already received by the client.
const transcripts = meeting.ai.transcripts;

// Listen for transcript updates during the meeting.
meeting.ai.on("transcript", (transcript) => {
  if (transcript.isPartialTranscript) {
    updateLiveCaption(transcript.peerId, transcript.transcript);
    return;
  }
  appendFinalTranscript(transcript);
});

Transcript payload

{
  "id": "1a2b3c4d-5678-90ab-cdef-1234567890ab",
  "name": "Alice",
  "peerId": "4f5g6h7i-8j9k-0lmn-opqr-1234567890st",
  "userId": "uvwxyz-1234-5678-90ab-cdefghijklmn",
  "customParticipantId": "abc123xyz",
  "transcript": "Hello everyone",
  "isPartialTranscript": false,
  "timestamp": 1716700000000
}

| Field | Description |
| --- | --- |
| id | Unique transcript entry ID |
| name | Display name of the participant who spoke |
| peerId | Peer ID of the participant who spoke. Changes if they rejoin. |
| userId | Persistent participant ID |
| customParticipantId | Participant identifier set when the participant was added |
| transcript | Transcribed text |
| isPartialTranscript | true for interim updates, false for final updates |
| timestamp | Unix epoch timestamp in milliseconds |
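
The interim and final handling described in this section can be sketched as a small in-memory store. This is an illustration only: createCaptionStore and its methods are hypothetical names, and the store assumes that a final update for a peer replaces that peer's latest interim text. Final entries keep userId because peerId changes when a participant rejoins.

```javascript
// Hypothetical caption state: one in-progress caption per peer, plus a
// list of final entries. The entry shape follows the transcript payload.
function createCaptionStore() {
  const liveByPeer = new Map(); // peerId -> latest interim text
  const finals = [];            // final entries, in arrival order

  return {
    handle(entry) {
      if (entry.isPartialTranscript) {
        // Interim update: overwrite the live caption for this peer.
        liveByPeer.set(entry.peerId, entry.transcript);
        return;
      }
      // Assumption: a final update supersedes the peer's interim caption.
      liveByPeer.delete(entry.peerId);
      finals.push({
        userId: entry.userId,
        name: entry.name,
        text: entry.transcript,
        timestamp: entry.timestamp,
      });
    },
    liveCaption(peerId) {
      return liveByPeer.get(peerId) ?? "";
    },
    finalEntries() {
      return finals.slice(); // Return a copy so callers cannot mutate state.
    },
  };
}

// Usage with the SDK event, assuming `meeting` is an initialized client:
// const store = createCaptionStore();
// meeting.ai.on("transcript", (entry) => store.handle(entry));
```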

Post-meeting transcription

Post-meeting transcription generates a transcript after the meeting ends using Whisper Large v3 Turbo on Workers AI and delivers it through a webhook or REST API. To identify speakers, RealtimeKit processes each participant's audio separately before creating the final transcript.

Turn on post-meeting transcription

You can turn on post-meeting transcription when you create a meeting. Set transcribe_on_end: true to generate a transcript after the meeting ends. To also generate a summary automatically after the transcript is available, set summarize_on_end: true.

Terminal window
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/realtime/kit/$APP_ID/meetings" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Weekly product review",
    "transcribe_on_end": true,
    "summarize_on_end": true,
    "ai_config": {
      "transcription": {
        "language": "en"
      }
    }
  }'

Use ai_config.transcription.language to set the transcript language. For supported values, refer to Post-meeting supported languages. If transcribe_on_end is not set, RealtimeKit does not generate a post-meeting transcript.

Post-meeting supported languages

Post-meeting transcription supports Whisper Large v3 Turbo language codes. Omit ai_config.transcription.language to let Whisper detect the spoken language.

Common language codes include:

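| Language | Code | Language | Code | Language | Code |
| --- | --- | --- | --- | --- | --- |
| English | en | Spanish | es | French | fr |
| German | de | Hindi | hi | Portuguese | pt |
| Japanese | ja | Italian | it | Dutch | nl |
| Russian | ru | Chinese | zh | Cantonese | yue |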

Additional post-meeting language codes

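| Language | Code |
| --- | --- |
| Afrikaans | af |
| Albanian | sq |
| Amharic | am |
| Arabic | ar |
| Assamese | as |
| Azerbaijani | az |
| Bashkir | ba |
| Basque | eu |
| Belarusian | be |
| Bengali | bn |
| Bosnian | bs |
| Breton | br |
| Bulgarian | bg |
| Catalan | ca |
| Croatian | hr |
| Czech | cs |
| Danish | da |
| Estonian | et |
| Faroese | fo |
| Finnish | fi |
| Galician | gl |
| Georgian | ka |
| Greek | el |
| Gujarati | gu |
| Haitian Creole | ht |
| Hausa | ha |
| Hawaiian | haw |
| Hebrew | he |
| Hungarian | hu |
| Icelandic | is |
| Indonesian | id |
| Javanese | jw |
| Kannada | kn |
| Kazakh | kk |
| Khmer | km |
| Korean | ko |
| Lao | lo |
| Latin | la |
| Latvian | lv |
| Lingala | ln |
| Lithuanian | lt |
| Luxembourgish | lb |
| Macedonian | mk |
| Malagasy | mg |
| Malay | ms |
| Malayalam | ml |
| Maltese | mt |
| Maori | mi |
| Marathi | mr |
| Mongolian | mn |
| Myanmar | my |
| Nepali | ne |
| Norwegian | no |
| Norwegian Nynorsk | nn |
| Occitan | oc |
| Pashto | ps |
| Persian | fa |
| Polish | pl |
| Punjabi | pa |
| Romanian | ro |
| Sanskrit | sa |
| Serbian | sr |
| Shona | sn |
| Sindhi | sd |
| Sinhala | si |
| Slovak | sk |
| Slovenian | sl |
| Somali | so |
| Sundanese | su |
| Swahili | sw |
| Swedish | sv |
| Tagalog | tl |
| Tajik | tg |
| Tamil | ta |
| Tatar | tt |
| Telugu | te |
| Thai | th |
| Tibetan | bo |
| Turkmen | tk |
| Turkish | tr |
| Ukrainian | uk |
| Urdu | ur |
| Uzbek | uz |
| Vietnamese | vi |
| Welsh | cy |
| Yiddish | yi |
| Yoruba | yo |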

Output formats

Post-meeting transcripts are available in multiple formats. Use CSV or JSON for application workflows, and use SRT or VTT when you need subtitle files.

| Format | Use case |
| --- | --- |
| CSV | Spreadsheets and data analysis |
| JSON | Programmatic access |
| SRT | Video subtitle files |
| VTT | Web video captions (<track> element) |

Examples

"1000","peer-123","user-456","cust-789","Alice","Hello everyone"
"3000","peer-234","user-567","cust-890","Bob","Hi Alice"

CSV rows use the following field order: start time in milliseconds, peer ID, user ID, custom participant ID, participant name, and transcript text.
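
To show how this field order maps onto objects, here is a minimal parsing sketch. The function and property names are hypothetical, and the code assumes RFC 4180-style quoting (a literal quote inside a field is written as two quotes) with one record per line. For production use, an established CSV library is likely a safer choice.

```javascript
// Property names are illustrative; the order matches the documented CSV
// field order.
const FIELDS = ["startMs", "peerId", "userId", "customParticipantId", "name", "text"];

// Splits one CSV line into fields.
// Assumption: RFC 4180-style quoting, no line breaks inside fields.
function parseCsvLine(line) {
  const out = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // Escaped quote inside a quoted field.
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      out.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  out.push(field);
  return out;
}

// Converts transcript CSV text into an array of row objects.
function parseTranscriptCsv(csv) {
  return csv
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const values = parseCsvLine(line);
      const row = Object.fromEntries(FIELDS.map((key, i) => [key, values[i] ?? ""]));
      row.startMs = Number(row.startMs); // Start time in milliseconds.
      return row;
    });
}
```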

Consume post-meeting transcripts

After RealtimeKit finishes processing a post-meeting transcript, you can receive the transcript download URL through a webhook or fetch it with the REST API. Use webhooks for asynchronous backend workflows, and use the REST API when you need to retrieve the transcript for a specific session.

Webhook

Subscribe to the meeting.transcript event in RealtimeKit webhooks. When the transcript is ready, RealtimeKit sends a payload like the following:

{
  "event": "meeting.transcript",
  "meeting": {
    "id": "bbb8940e-1b97-402a-97d6-2708b7feca41",
    "title": "Weekly sync",
    "endedAt": "2026-06-03T10:30:00.000Z",
    "createdAt": "2026-06-03T10:00:00.000Z",
    "sessionId": "05e57591-d89e-45c9-ae44-08dc1eaad0e0",
    "startedAt": "2026-06-03T10:00:00.000Z",
    "status": "LIVE",
    "organizedBy": {
      "id": "c94c437b-592a-4a39-b9e2-47ef1451e43b",
      "name": "Example organization"
    }
  },
  "transcriptDownloadUrl": "https://example.com/transcript.csv",
  "transcriptDownloadUrlExpiry": "2026-06-10T10:30:00.000Z"
}
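
A backend that receives this webhook might extract the download details along these lines. This is a sketch under stated assumptions: readTranscriptWebhook is a hypothetical helper, the payload is assumed to be already parsed from JSON, and any request authentication your endpoint needs is out of scope here. The field names follow the example payload above.

```javascript
// Hypothetical handler: pulls the download details out of a parsed
// meeting.transcript webhook body and reports whether the URL has expired.
function readTranscriptWebhook(payload, now = new Date()) {
  if (!payload || payload.event !== "meeting.transcript") {
    return null; // Not a transcript event; ignore it.
  }
  const expiresAt = new Date(payload.transcriptDownloadUrlExpiry);
  if (Number.isNaN(expiresAt.getTime())) {
    throw new Error("transcriptDownloadUrlExpiry is missing or not a valid date");
  }
  return {
    meetingId: payload.meeting?.id,
    sessionId: payload.meeting?.sessionId,
    downloadUrl: payload.transcriptDownloadUrl,
    expiresAt,
    expired: now.getTime() >= expiresAt.getTime(),
  };
}

// Example with a trimmed version of the payload above.
const details = readTranscriptWebhook(
  {
    event: "meeting.transcript",
    meeting: {
      id: "bbb8940e-1b97-402a-97d6-2708b7feca41",
      sessionId: "05e57591-d89e-45c9-ae44-08dc1eaad0e0",
    },
    transcriptDownloadUrl: "https://example.com/transcript.csv",
    transcriptDownloadUrlExpiry: "2026-06-10T10:30:00.000Z",
  },
  new Date("2026-06-04T00:00:00.000Z"),
);
console.log(details.downloadUrl, details.expired);
```

Checking the expiry before downloading lets a delayed job fall back to the REST API instead of requesting a URL that is no longer valid.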

REST API

Refer to Fetch the complete transcript for a session.

Terminal window
curl -X GET "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/realtime/kit/$APP_ID/sessions/$SESSION_ID/transcript" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"

Transcript availability

Transcripts are available for 7 days after the meeting ends. Download or copy transcript files before the URL expiry time returned by the webhook or REST API.

Billing and Free plan limits

RealtimeKit's default transcription records each participant's audio track and processes it with Workers AI. Workers AI usage is billed to your Cloudflare account using audio model pricing, which scales by participant audio minutes, not meeting duration.

On the Workers Free plan, Workers AI includes 10,000 Neurons per day. To use more than 10,000 Neurons per day, upgrade to the Workers Paid plan. Workers Paid includes the same 10,000 daily free Neurons, then bills additional usage at $0.011 per 1,000 Neurons.

You can upgrade to the Workers Paid plan in the Cloudflare dashboard under Manage account.

RealtimeKit transcription uses these Workers AI audio model rates:

| Transcription mode | Workers AI model | Neurons per audio minute |
| --- | --- | --- |
| Post-meeting | @cf/openai/whisper-large-v3-turbo | 46.63 |
| Real-time | @cf/deepgram/nova-3 (WebSocket) | 836.36 |
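
To make the arithmetic concrete, the sketch below estimates Neurons and overage cost for one day from the rates above. The function name estimateDailyUsage is hypothetical, and the estimate assumes all usage falls within a single day and that no other Workers AI usage draws on the same 10,000 free Neurons.

```javascript
// Rates and prices copied from this section.
const NEURONS_PER_AUDIO_MINUTE = new Map([
  ["post-meeting", 46.63],
  ["real-time", 836.36],
]);
const FREE_NEURONS_PER_DAY = 10000;
const USD_PER_1000_NEURONS = 0.011;

// Hypothetical estimator. participantAudioMinutes is the sum of audio
// minutes across all transcribed participants, not the meeting length.
function estimateDailyUsage(mode, participantAudioMinutes) {
  const rate = NEURONS_PER_AUDIO_MINUTE.get(mode);
  if (rate === undefined) {
    throw new RangeError(`Unknown transcription mode: ${mode}`);
  }
  const neurons = rate * participantAudioMinutes;
  // Only usage beyond the daily free allocation is billed.
  const billableNeurons = Math.max(0, neurons - FREE_NEURONS_PER_DAY);
  const usd = (billableNeurons / 1000) * USD_PER_1000_NEURONS;
  return { neurons, billableNeurons, usd };
}

// Four transcribed participants in a 30-minute meeting:
// 4 x 30 = 120 participant audio minutes.
const estimate = estimateDailyUsage("real-time", 4 * 30);
console.log(estimate.neurons.toFixed(1), estimate.usd.toFixed(2));
```

Under the same assumptions, the daily free allocation covers roughly 12 participant audio minutes of real-time transcription or roughly 214 minutes of post-meeting transcription.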

Data processing and storage

RealtimeKit transcription is a managed transcription workflow. When transcription is turned on, RealtimeKit processes participant audio with Workers AI and stores transcript outputs in RealtimeKit-managed storage.

For real-time transcription, RealtimeKit streams audio from participants with transcription turned on to Workers AI and sends transcript updates to meeting participants during the meeting.

For post-meeting transcription, RealtimeKit processes each participant's audio separately after the meeting ends, creates the final transcript files, and makes them available through a webhook or REST API.