Accounts & Credentials
Create SaladCloud Account
To start, visit https://portal.salad.com/ to log in or create a free account. New users
receive 5 free audio hours to transcribe at no cost!
Watch our quick tutorial on how to use your free credits
effectively.
Create SaladCloud Organization
Next, create a new organization or select an existing one you’d like to use for transcription. Click on Switch to
this Organization for the org you want to use.
Get Transcription API URL
Then, click on Inference Endpoints in the left-hand nav and open up the Transcription API.
Copy your Transcription API URL
Authentication
Copy your SaladCloud API key from your account here.
API Reference
Example cURL Post Transcript
curl -X POST https://api.salad.com/api/public/organizations/{my-organization}/inference-endpoints/transcribe/jobs \
-H "Salad-Api-Key: {your-api-key}" \
-H "Content-Type: application/json" \
-d '{
"input": {
"url": "https://example.com/path/to/file.mp3",
"language_code": "en",
"return_as_file": false,
"sentence_level_timestamps": "true"
"word_level_timestamps": false,
"diarization": false,
"sentence_diarization": true,
"srt": false,
"translate": "",
"summarize": 50,
"llm_translation": "french, german, portuguese",
"srt_translation": "spanish, Hindi",
"custom_prompt": "",
"custom_vocabulary": ""
},
"webhook": {your webhook url},
"metadata": {
"my-job-id": 1234
}
}'
Webhook is a method that enables application to transmit data to another application immediately upon the occurrence of
an event. Instead of having your application periodically poll for updates, a webhook delivers data to a specified URL
as soon as the event occurs, facilitating instant communication and minimizing latency. In our case transcription result
gets sent to the webhook.
Example cURL Get Transcript
curl https://api.salad.com/api/public/organizations/{my-organization}/inference-endpoints/transcribe/jobs/54e84442-3576-45ca-904c-a1d90bc77baf \
-H "Salad-Api-Key: {your-api-key}"
Configure you header to include your unique SaladCloud API URL which can be found above as the variable “API URL”. Also
include your unique SaladCloud API key which can be found here as the variable
“Salad-Api-Key”.
To find more information about each parameter, please, visit “Features” option in the menu on the left.
Update the JSON input parameters to customize your transcription output.
”url”
The url parameter should be a downloadable link to your audio or video file. We recommend using a file service like AWS
S3, Azure Blob Storage, Dropbox, or Google Drive that offers secure, downloadable links. For instructions on how to
generate downloadable links for these services, please refer to
How to Create Downloadable File Links
Send media in any of these formats.
Audio: AIFF, FLAC, M4A, MP3, WAV
Video: MKV, MOV, WEBM, WMA, MP4 (most codecs), AVI
”return_as_file”
Set to true to receive the transcription output as a downloadable file URL. This is useful for large outputs. Default is
false.
“language_code”
Transcription is available in 97 languages. We automatically identify the source language. “language_code” need to be
specified only if diarization is required.
“sentence_level_timestamps”
Sentence level timestamps are returned on default. Set to false if not needed.
“word_level_timestamps”
Set to “true” for word level timestamps. Set to “false” on default.
“diarization”
Set to “true” for speaker separation and identification. If you don’t require it, set it to “false”.
Diarization requires the language_code
to be defined. By default, it is set to "en"
(English).
You can also diarize in “fr” (French), “de” (German), “es” (Spanish), “it” (Italian), “ja” (Japanese), “zh”
(Chinese), “nl” (Dutch), “uk” (Ukrainian), “pt” (Portuguese), “ar” (Arabic), “cs” (Czech), “ru” (Russian),
“pl” (Polish), “hu” (Hungarian), “fi” (Finnish), “fa” (Persian), “el” (Greek), “tr” (Turkish), “da”
(Danish), “he” (Hebrew), “vi” (Vietnamese), “ko” (Korean), “ur” (Urdu), “te” (Telugu), “hi” (Hindi), “ca”
(Catalan), “ml” (Malayalam).
“sentence_diarization”
Set to true
to include speaker information at the sentence level. Requires language_code to be specified. Default is
false
.
”translate”
Set "translate": "to_eng"
to translate the transcription into English. This replaces the original transcription with
the English translation. All additional parameters may also be used alongside translation.
”llm_translation”
Provide a comma-separated list of target languages to translate the transcription into multiple languages using our LLM
integration. The original transcription will be returned as normal, along with the translations. Supported
languages: English, French, German, Italian, Portuguese, Hindi, Spanish, Thai
“srt”
Set to “true” to generate a .srt output for caption and subtitles. If you don’t require it, set it to “false”.
“srt_translation”
Provide a comma-separated list of target languages to translate the generated SRT captions into multiple languages. The
original transcription and SRT content will be returned as normal. Supported languages: English, French, German,
Italian, Portuguese, Hindi, Spanish, Thai
”summarize”
Set to an integer value representing the maximum number of words for the summary of the transcription.
”custom_vocabulary” (preview)
Provide a comma-separated list of custom words or phrases to improve transcription accuracy for domain-specific terms.
”custom_prompt” (preview)
Provide a custom instruction to perform specific analyses on the transcription using our LLM integration.
”webhook”
*Optional. Webhook is a method that enables application to transmit data to another application immediately when
process is finished. Specify your webhook url to use.
”my-job-id”
*Optional. If you need an identifier from your system you can the job id as desired.
Step 3 | Make Post Request
Make your POST transcript request. If successful, you will receive a confirmation response that will include the job
“id” *(ex: fb724cc9-d0f7-44a1-86c4-180c7fab975e).
Step 4 | Make Get Request
Make your GET transcript request. If successful, you will receive a JSON transcript output that will include the inputs
you requested.
Example Transcript Output
{
"id": "fb724cc9-d0f7-44a1-86c4-180c7fab975e",
"input": {
"url": "https://example.com/path/to/file.mp3",
"language_code": "en",
"word_level_timestamps": true,
"return_as_file": false,
"diarization": true,
"sentence_diarization": true,
"srt": true,
"sentence_level_timestamps": true,
"summarize": 50,
"llm_translation": "french, german, portuguese",
"srt_translation": "spanish, Hindi"
},
"status": "succeeded",
"events": [
{
"action": "created",
"time": "2024-10-05T19:22:20.1115308+00:00"
},
{
"action": "started",
"time": "2024-10-05T21:38:40.1424816+00:00"
},
{
"action": "succeeded",
"time": "2024-10-05T21:39:49.1000268+00:00"
}
],
"output": {
"sentence_level_timestamps": [
{
"text": "You know that little voice in your head?",
"timestamp": [
0.0,
5.9
],
"start": 0.0,
"end": 5.9,
"speaker": "SPEAKER_02"
},
...
{
"text": "Because when we change the dialogue, we can change our world.",
"timestamp": [
115.14,
121.42
],
"start": 115.14,
"end": 121.42,
"speaker": "SPEAKER_02"
}
],
"word_segments": [
{
"word": "You",
"start": 4.274,
"end": 4.395,
"score": 0.89,
"speaker": "SPEAKER_02"
},
{
"word": "know",
"start": 4.435,
"end": 4.555,
"score": 0.87,
"speaker": "SPEAKER_02"
},
{
"word": "that",
"start": 4.576,
"end": 4.696,
"score": 0.773,
"speaker": "SPEAKER_02"
},
{
"word": "little",
"start": 4.756,
"end": 4.977,
"score": 0.833,
"speaker": "SPEAKER_02"
},
{
"word": "voice",
"start": 5.037,
"end": 5.378,
"score": 0.869,
"speaker": "SPEAKER_02"
}
...
{
"word": "world.",
"start": 120.938,
"end": 121.199,
"score": 0.219,
"speaker": "SPEAKER_02"
}
],
"srt_content": "1\n00:00:04,274 --> 00:00:05,880\nYou know that little voice in your head?\n\n2\n00:00:10,506 --> 00:00:13,240\nThe one that tells you to ignore a tasteless joke?\n\n3\n00:00:21,650 --> 00:00:25,920\nThe one that tells you to keep quiet when a client makes a racist comment?\n\n4\n00:00:31,604 --> 00:00:36,197\nThe little voice that tells you you're not smart enough because English isn't your\n\n5\n00:00:36,278 --> 00:00:37,100\nfirst language.\n\n6\n00:00:46,284 --> 00:00:50,138\nOr that it's okay to judge someone for leaving early because they have family\n\n7\n00:00:50,158 --> 00:00:50,660\ncommitments.\n\n8\n00:01:07,800 --> 00:01:11,880\nOr that it's not a big deal when everyone's opinion isn't considered.\n\n9\n00:01:21,129 --> 00:01:23,880\nYou may think you're the only one that hears that voice.\n\n10\n00:01:24,901 --> 00:01:27,659\nBut that voice also speaks to other people.\n\n11\n00:01:28,601 --> 00:01:30,819\nIt says, You're different.\n\n12\n00:01:32,202 --> 00:01:33,177\nYou're an outsider.\n\n13\n00:01:34,843 --> 00:01:35,899\nYou lack commitment.\n\n14\n00:01:37,020 --> 00:01:38,719\nYour opinion doesn't matter.\n\n15\n00:01:39,881 --> 00:01:45,619\nInstead of listening to that little voice, you need to find yours and make it heard.\n\n16\n00:01:51,651 --> 00:01:53,438\nIt's time for all of us to speak up.\n\n17\n00:01:56,143 --> 00:02:01,199\nBecause when we change the dialogue, we can change our world.\n\n",
"summary": "Silence oppressive voices; amplify your own.",
"llm_translations": {
"French": "Connaissez-vous cette petite voix dans votre tête ? \n\nLa voix qui vous dit de ignorer une blague de mauvais goût ?\n\nLa voix qui vous dit de garder le silence quand un client fait un commentaire raciste ?\n\nLa voix qui vous dit que vous ne êtes pas intelligent(e) car l'anglais n'est pas votre langue maternelle ?\n\nOu que c'est normal de juger quelqu'un pour être parti tôt parce qu'il a des engagements familiaux ?\n\nLa voix qui vous dit de ne rien dire quand les autres sont réduits à des stéréotypes ?\n\nOu que ce n'est pas important de prendre en compte toutes les opinions ?\n\nVous pensez peut-être que vous êtes le seul à entendre cette voix. Mais cette voix parle également aux autres.\n\nElle dit : Vous êtes différent(e). Vous êtes un(e) extérieur(e). Vous manquez de détermination. Votre opinion n'a pas d'importance.\n\nAu lieu d'écouter cette petite voice, vous devez trouver la vôtre et faire entendre votre voix. \n\nC'est l'heure pour nous tous de parler. \n\nCar lorsque nous changeons le dialogue, nous pouvons changer notre monde.",
"German": "Du weißt, diese kleine Stimme in deinem Kopf? Die eine, die dir sagt, ein unfassbarer Witz zu ignorieren? Die eine, die dich auffordert, leise zu sein, wenn jemand einen rassistischen Kommentar abgibt? Die kleine Stimme, die dir sagt, du seist nicht intelligent genug, weil Englisch nicht deine Muttersprache ist. Oder dass es in Ordnung ist, jemanden für früh zu gehen zu verurteilen, weil er Familienverpflichtungen hat. Die eine, die sagt, nichts sagen zu sollen, wenn andere zu Stereotypen reduziert werden. Oder dass das nicht großes Problem ist, wenn nicht alle Meinungen berücksichtigt werden. Du denkst vielleicht, du wärst der Einzige, der diese Stimme hört. Aber diese Stimme spricht auch andere Menschen an. Sie sagt: \"Du bist anders.\" \"Du bist ein Außenseiter.\" \"Du hast keine Verpflichtung.\" \"Deine Meinung zählt nicht.\" Anstatt dieser kleinen Stimme zu lauschen, musst du deine eigene finden und laut werden lassen. Es ist Zeit für uns alle aufzuschreien. Denn wenn wir die Konversation ändern, können wir unsere Welt verändern. Danke.",
"Portuguese": "Você sabe aquele pequeno voz na sua cabeça? A que diz para ignorar uma piada sem graça? A que diz para ficar calado quando um cliente faz um comentário racista? A pequena voz que diz você não é inteligente o suficiente porque inglês não é a sua língua materna. Ou que está bem julgar alguém por sair cedo porque têm compromissos familiares. A que diz para não dizer nada quando os outros são reduzidos a estereótipos. Ou que não é um grande problema quando nenhuma opinião é considerada. Você pode pensar que você é o único que ouve aquela voz. Mas essa voz também fala com outras pessoas. Diz: Você é diferente. Você é estranho. Você falta de compromisso. Sua opinião não importa. Em vez de ouvir aquele pequeno voz, você precisa encontrar a sua e fazê-la ser ouvida. É hora para todos nós falarmos alto. Porque quando mudamos o diálogo, podemos mudar nosso mundo. Obrigado."
},
"srt_translation": {
"Spanish": "1\n00:00:04,274 --> 00:00:05,880\n¿Sabes esa pequeña voz en tu cabeza?\n\n2\n00:00:10,506 --> 00:00:13,240\nLa que te dice ignorar un chiste de mal gusto?\n\n3\n00:00:21,650 --> 00:00:25,920\nLa que te dice callar cuando alguien hace un comentario racista?\n\n4\n00:00:31,604 --> 00:00:36,197\nEsa voz pequeña que te dice que no eres lo suficientemente inteligente porque el inglés no es tu\n\n5\n00:00:36,278 --> 00:00:37,100\nprimer idioma.\n\n6\n00:00:46,284 --> 00:00:50,138\nO que está bien juzgar a alguien por irse temprano porque tienen compromisos familiares.\n\n7\n00:00:50,158 --> 00:00:50,660\nO que no es un gran problema cuando no se considera la opinión de todos.\n\n8\n00:01:07,800 --> 00:01:11,880\nO que está bien con que nadie tenga la última palabra.\n\n9\n00:01:21,129 --> 00:01:23,880\nPuede parecer que solo tú escuchas esa voz.\n\n10\n00:01:24,901 --> 00:01:27,659\nPero esa voz también habla a otras personas.\n\n11\n00:01:28,601 --> 00:01:30,819\nDice: Eres diferente.\n\n12\n00:01:32,202 --> 00:01:33,177\nEres un forastero.\n\n13\n00:01:34,843 --> 00:01:35,899\nFaltas de compromiso.\n\n14\n00:01:37,020 --> 00:01:38,719\nTu opinión no importa.\n\n15\n00:01:39,881 --> 00:01:45,619\nEn lugar de escuchar esa voz pequeña, necesitas encontrar la tuya y hacer que se escuche.\n\n16\n00:01:51,651 --> 00:01:53,438\nEs hora de que todos hablamos.\n\n17\n00:01:56,143 --> 00:02:01,199\nPorque cuando cambiamos el diálogo, podemos cambiar nuestro mundo.",
},
"text": " You know that little voice in your head? The one that tells you to ignore a tasteless joke? The one that tells you to keep quiet when a client makes a racist comment? The little voice that tells you you're not smart enough because English isn't your first language. Or that it's okay to judge someone for leaving early because they have family commitments. The one that tells you not to say anything when others are being reduced to stereotypes. Or that it's not a big deal when everyone's opinion isn't considered. You may think you're the only one that hears that voice. But that voice also speaks to other people. It says, You're different. You're an outsider. You lack commitment. Your opinion doesn't matter. Instead of listening to that little voice, you need to find yours and make it heard. It's time for all of us to speak up. Because when we change the dialogue, we can change our world. Thank you.",
"duration": 0.04,
"processing_time": 68.46978211402893
},
"create_time": "2024-10-05T19:22:20.1115308+00:00",
"update_time": "2024-10-05T21:39:49.1000268+00:00"
}
”status”
If your transcription request is “pending”, it has not yet been picked up for processing.
“created” the transcription job is now in our queue to be processed.
“running” the transcription is processing.
“failed” the transcript was not created. We will automatically re-try the transcription process until it fails three
times. If we are unable to transcribe your media, you will get a 200 error with one of the following messages. Send us a
support request at [email protected] if this happens.
*“succeeded” the transcript is ready. If we were not able to pull your url, or there was another issue with the audio
file you will see an “error” in the response and one of the reasons:
*File size is more than 3GB *File can not be downloaded or duration is missing. Please check your file and try again.
*File duration is more than 2.5 hours
Transcription Automation
Webhooks are already available, and you can check the documentation above (under Example cURL Post Transcript). Using
webhooks is a better way to get results as soon as possible. However, if your use case better fits with polling, use the
following code. Make sure you update it as needed:
Pass 1: Submitting Audio Files for Transcription
This part of the script submits audio files for transcription and collects job IDs. The job IDs are used in the second
pass to check the status and retrieve the results.
import requests
import json
organization_name = "your_organization_name"
url = f"https://api.salad.com/api/public/organizations/{organization_name}/inference-endpoints/transcribe/jobs"
salad_key = "your_salad_key"
language_code = "en"
list_of_audio_files = [
"audio_url",
"audio_url",
"audio_url",
]
list_of_job_ids = []
headers = {
"Salad-Api-Key": salad_key,
"Content-Type": "application/json",
"accept": "application/json",
}
for audio_file in list_of_audio_files:
data = {
"input": {
"url": audio_file,
"language_code": language_code,
"word_level_timestamps": True,
"diarization": True,
"srt": True,
}
}
response = requests.post(url, headers=headers, json=data)
job_id = response.json()["id"]
list_of_job_ids.append(job_id)
Pass 2: Checking Job Status and Retrieving Results
This part of the script checks the status of each transcription job periodically and retrieves the results once the
transcription is completed.
import time
while len(list_of_job_ids) > 0:
for job in list_of_job_ids:
response_url = f"https://api.salad.com/api/public/organizations/salad/inference-endpoints/transcribe/jobs/{job}"
response = requests.get(response_url, headers=headers)
if response.json()['status'] == 'running' or response.json()['status'] == 'pending':
print(f"Job {job} is still {response.json()['status']}, waiting for retry...")
continue
elif response.json()['status'] == 'succeeded':
with open(f'result_{job}.json', 'w', encoding='utf-8') as f:
json.dump(response.json(), f, ensure_ascii=False)
list_of_job_ids.remove(job)
else:
print(f"Job {job} failed, waiting for retry...")
time.sleep(30)
continue