Speech-to-Text Guide
Discover the core transcription features of Salad Transcription API. Learn how to utilize parameters like return_as_file, sentence_level_timestamps, word_level_timestamps, diarization, sentence_diarization, and how to specify language_code for optimal performance.
Transcription API Features Guide
Introduction
Salad Transcription API offers a suite of powerful features to help you get the most out of your audio and video content. This guide covers the key transcription parameters you can use to customize your transcription outputs:
- Transcription Output Options: return_as_file, sentence_level_timestamps, word_level_timestamps
- Speaker Identification: diarization, sentence_diarization
- Language Specification: language_code
By properly utilizing these parameters, you can enhance the accuracy, efficiency, and usability of your transcriptions.
Transcription Parameters
1. return_as_file
Description
The return_as_file parameter allows you to receive your transcription output as a downloadable file URL. This is particularly useful when dealing with large responses, as it helps avoid issues with response size limitations.
- Default: false
- Type: boolean
Usage
Set "return_as_file": true in your request to receive the transcription output as a file URL.
Example:
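A minimal sketch of a request body with this parameter enabled. Only return_as_file comes from this guide; the "input" wrapper, the "url" field, and the build_request helper are assumptions made for illustration, so check the API reference for the actual request shape.

```python
import json


def build_request(audio_url: str, return_as_file: bool = False) -> str:
    """Serialize a request body that can ask for file-based output.

    The "input" wrapper and the "url" field are assumptions;
    only "return_as_file" is documented in this guide.
    """
    payload = {
        "input": {
            "url": audio_url,                  # hypothetical field name
            "return_as_file": return_as_file,  # documented parameter
        }
    }
    return json.dumps(payload, indent=2)


body = build_request("https://example.com/meeting.mp3", return_as_file=True)
print(body)
```

Leaving return_as_file out of the call keeps the documented default of false.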
Note:
- If the response size exceeds 1 MB, the output will automatically be returned as a file URL, even if return_as_file is set to false.
- This helps ensure reliable delivery of large transcription outputs.
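The rule in the note above can be modeled roughly as follows. This is only a sketch of the decision logic: it assumes "1 MB" means 1,000,000 bytes and that size is measured on the JSON-serialized output, neither of which is stated in this guide. The delivered_as_file helper is hypothetical.

```python
import json

ONE_MB = 1_000_000  # assumption: the guide says "1 MB" without defining it


def delivered_as_file(output: dict, return_as_file: bool) -> bool:
    """Return True if the output would arrive as a file URL under the stated rule.

    Either the caller opted in, or the serialized output exceeds 1 MB.
    """
    size_in_bytes = len(json.dumps(output).encode("utf-8"))
    return return_as_file or size_in_bytes > ONE_MB


small = {"text": "hello"}
large = {"text": "a" * (2 * ONE_MB)}

print(delivered_as_file(small, return_as_file=False))  # False: small and not requested
print(delivered_as_file(large, return_as_file=False))  # True: size rule overrides the flag
print(delivered_as_file(small, return_as_file=True))   # True: explicitly requested
```

In practice this means client code should be prepared to handle a file URL even when it did not ask for one.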
2. sentence_level_timestamps
Description
Include timestamps at the sentence level in your transcription output.
- Default: false
- Type: boolean
Usage
Set "sentence_level_timestamps": true to include sentence-level timestamps. Set to false if you do not need them.
Example:
Output Format
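The exact output schema is not reproduced here, so the following is a sketch under stated assumptions: each entry of a segments array is taken to carry start and end times in seconds plus a text field. Those field names and the sample data are hypothetical; the helpers show one way to turn such entries into readable, timestamped lines.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def render_segments(segments: list[dict]) -> list[str]:
    """Render one line per sentence, prefixed with its time range."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text']}"
        for s in segments
    ]


# Hypothetical sample; field names are assumptions, not the documented schema.
segments = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to the call."},
    {"start": 2.5, "end": 65.25, "text": "Let's begin."},
]

for line in render_segments(segments):
    print(line)
# [00:00.000 - 00:02.500] Welcome to the call.
# [00:02.500 - 01:05.250] Let's begin.
```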
3. word_level_timestamps
Description
Include timestamps for each word in your transcription output.
- Default: false
- Type: boolean
Usage
Set "word_level_timestamps": true to include word-level timestamps.
Example:
Output Format
Word-level timestamps are provided in the word_segments array of the output.
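As an illustration of what word-level timestamps make possible, the sketch below selects the words spoken inside a time window, for example to caption a short clip. It assumes each word_segments entry has word, start, and end fields (times in seconds); those field names and the sample data are hypothetical.

```python
def words_between(word_segments: list[dict], start: float, end: float) -> list[str]:
    """Return the words whose time span lies entirely inside [start, end]."""
    return [
        w["word"]
        for w in word_segments
        if w["start"] >= start and w["end"] <= end
    ]


# Hypothetical sample; field names are assumptions, not the documented schema.
word_segments = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "and", "start": 0.5, "end": 0.6},
    {"word": "welcome", "start": 0.7, "end": 1.2},
    {"word": "back", "start": 1.3, "end": 1.6},
]

print(words_between(word_segments, 0.5, 1.2))  # ['and', 'welcome']
```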
4. diarization
Description
Enable speaker separation and identification at the word level.
- Default: false
- Type: boolean
Usage
Set "diarization": true to label each word with the speaker who said it.
Example:
Output Format
Speaker labels are included in both segments and word_segments when diarization is enabled.
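With word-level speaker labels, consecutive words from the same speaker can be merged into turns. The sketch below assumes each word_segments entry has word and speaker fields and that labels look like "SPEAKER_00"; both are assumptions for illustration rather than the documented schema.

```python
from itertools import groupby


def speaker_turns(word_segments: list[dict]) -> list[tuple[str, str]]:
    """Merge consecutive words spoken by the same speaker into (speaker, text) turns."""
    turns = []
    # groupby only groups adjacent items, which is exactly what a "turn" is.
    for speaker, words in groupby(word_segments, key=lambda w: w.get("speaker", "UNKNOWN")):
        turns.append((speaker, " ".join(w["word"] for w in words)))
    return turns


# Hypothetical sample; field names and label format are assumptions.
word_segments = [
    {"word": "How", "speaker": "SPEAKER_00"},
    {"word": "are", "speaker": "SPEAKER_00"},
    {"word": "you?", "speaker": "SPEAKER_00"},
    {"word": "Fine,", "speaker": "SPEAKER_01"},
    {"word": "thanks.", "speaker": "SPEAKER_01"},
    {"word": "Great.", "speaker": "SPEAKER_00"},
]

for speaker, text in speaker_turns(word_segments):
    print(f"{speaker}: {text}")
# SPEAKER_00: How are you?
# SPEAKER_01: Fine, thanks.
# SPEAKER_00: Great.
```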
5. sentence_diarization
Description
Include speaker information at the sentence level.
- Default: false
- Type: boolean
Usage
Set "sentence_diarization": true to include a speaker label for each sentence.
Example:
Output Format
Speaker labels are included in the segments array when sentence_diarization is enabled.
Note: If several speakers are identified in one sentence, the speaker who appears most often in that sentence is returned.
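The "most frequent speaker wins" rule can be illustrated with a short sketch. It assumes frequency is counted per word and that ties go to the speaker seen first; the guide does not specify either detail, and the service may count differently (for example by speaking time). The dominant_speaker helper and the sample data are hypothetical.

```python
from collections import Counter


def dominant_speaker(words: list[dict]) -> str:
    """Return the speaker label attached to the most words in one sentence.

    Ties resolve to the speaker encountered first (Counter preserves
    insertion order), which is an assumption of this sketch.
    """
    counts = Counter(w["speaker"] for w in words)
    return counts.most_common(1)[0][0]


# Hypothetical sentence in which a second speaker interjects one word.
sentence_words = [
    {"word": "I", "speaker": "SPEAKER_00"},
    {"word": "think", "speaker": "SPEAKER_00"},
    {"word": "yes", "speaker": "SPEAKER_01"},
    {"word": "so.", "speaker": "SPEAKER_00"},
]

print(dominant_speaker(sentence_words))  # SPEAKER_00
```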
6. language_code
Description
Specify the language of the audio to improve the accuracy of diarization and timestamps.
- Default: en (English)
- Type: string
Usage
Set "language_code" to the code of the language spoken in the audio content, for example "language_code": "fr".
Example:
Supported Languages
Providing the correct language_code enhances transcription accuracy, especially for features like diarization. Below is a list of supported language codes:
"fr" (French), "de" (German), "es" (Spanish), "it" (Italian), "ja" (Japanese), "zh" (Chinese), "nl" (Dutch), "uk" (Ukrainian), "pt" (Portuguese), "ar" (Arabic), "cs" (Czech), "ru" (Russian), "pl" (Polish), "hu" (Hungarian), "fi" (Finnish), "fa" (Persian), "el" (Greek), "tr" (Turkish), "da" (Danish), "he" (Hebrew), "vi" (Vietnamese), "ko" (Korean), "ur" (Urdu), "te" (Telugu), "hi" (Hindi), "ca" (Catalan), "ml" (Malayalam)
If the language you use is not in the list, omit the parameter.
Note:
- Required for diarization: When using diarization or sentence_diarization, specify language_code for better performance.
- Default language: If language_code is not specified, it defaults to English ("en").
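The guidance above (send a listed code, otherwise omit the parameter) can be sketched as a small helper. The with_language function is hypothetical, and treating the default "en" as valid to send explicitly is an assumption, since "en" is the documented default but does not appear in the list.

```python
from typing import Optional

# Codes copied from the supported-languages list in this guide.
SUPPORTED_LANGUAGE_CODES = {
    "fr", "de", "es", "it", "ja", "zh", "nl", "uk", "pt", "ar", "cs", "ru",
    "pl", "hu", "fi", "fa", "el", "tr", "da", "he", "vi", "ko", "ur", "te",
    "hi", "ca", "ml",
}
DEFAULT_LANGUAGE = "en"  # documented default; assumed safe to send explicitly


def with_language(params: dict, language: Optional[str]) -> dict:
    """Return a copy of params with language_code set only when the code is usable.

    Unlisted languages are left out, following the guide's advice to
    skip the parameter in that case.
    """
    result = dict(params)
    if language == DEFAULT_LANGUAGE or language in SUPPORTED_LANGUAGE_CODES:
        result["language_code"] = language
    return result


print(with_language({"diarization": True}, "fr"))
# {'diarization': True, 'language_code': 'fr'}
print(with_language({"diarization": True}, "sv"))
# {'diarization': True}
```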