Zoom Media Speech to Text API (v2)


Authentication

api_key

Zoom Media Speech to Text API Token

Security scheme type: API Key
Header parameter name: x-zoom-s2t-key
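A minimal sketch of attaching the API key to a request with the Python 3 standard library. Only the header name x-zoom-s2t-key comes from the scheme above; the helper authorized_request is hypothetical and not part of any Zoom Media SDK. The request is built but not sent.

```python
import urllib.request
from typing import Optional

API_KEY_HEADER = "x-zoom-s2t-key"  # header name from the security scheme above


def authorized_request(url: str, api_key: str, method: str = "GET",
                       data: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) a request that carries the API key header.

    Hypothetical helper for illustration only.
    """
    if not api_key:
        raise ValueError("an API key is required")
    req = urllib.request.Request(url, data=data, method=method)
    # HTTP header names are case-insensitive; urllib stores this one as "X-zoom-s2t-key".
    req.add_header(API_KEY_HEADER, api_key)
    return req
```

Every endpoint below lists api_key under Authorizations, so each request needs this header.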

Batch

The asynchronous HTTP interface provides a non-blocking POST method for transcribing audio. Using it takes three steps:

  1. Create a session with the required language model.
  2. Upload your files.
  3. Check whether the service has finished processing your audio file.
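Step 3 amounts to polling the session until it reports completion. The sketch below assumes a caller-supplied fetch_status function that performs GET /api/v2/speech-to-text/session/{zoom_id} and returns the decoded JSON; the helper name, the polling interval and the timeout are hypothetical choices, not values prescribed by the API. Only the "done" field is taken from the response sample further down.

```python
import time
from typing import Any, Callable, Dict


def wait_until_done(fetch_status: Callable[[], Dict[str, Any]],
                    interval_s: float = 5.0,
                    timeout_s: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll a session until its "done" flag is true, then return the last status.

    fetch_status is assumed to GET the session and return its JSON body.
    sleep is injectable so the loop can be tested without real waiting.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("done"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("session did not finish before the timeout")
        sleep(interval_s)
```

A callback_url set when the session is created can replace polling, since the service then notifies the endpoint once the transcription is done.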

Initiate batch session

Initiate a new asynchronous Speech to Text session for a specific language.

Authorizations: api_key
Request Body schema: application/json
language
required
string

Sets the language for this session, for example "en-us".

callback_url
string

If set, an HTTPS callback is made to this web endpoint once the transcription is done.

callback_method
string
Default: "POST"
Enum: "POST" "PUT"

Specify the method to use for the HTTP callback. Requires callback_url to be set.

callback_format
string
Default: "application/json"

Specifies the format of the transcription delivered in the callback. Requires callback_url to be set.

callback_headers
Array of strings

Array of headers that need to be present in the callback request. Requires callback_url to be set.

punctuation
boolean
Default: false

If set to true, punctuation is enabled.
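The rule that every callback_* option requires callback_url can be enforced on the client before the request is sent. This is a sketch under that reading of the schema; build_session_body is a hypothetical helper, and leaving an option out (rather than sending its default) is a design choice that lets the server apply the documented defaults ("POST", "application/json").

```python
import json
from typing import List, Optional

CALLBACK_METHODS = {"POST", "PUT"}  # enum documented for callback_method


def build_session_body(language: str,
                       callback_url: Optional[str] = None,
                       callback_method: Optional[str] = None,
                       callback_format: Optional[str] = None,
                       callback_headers: Optional[List[str]] = None,
                       punctuation: bool = False) -> str:
    """Serialize a session-creation body, checking the documented constraints."""
    if not language:
        raise ValueError("language is required")
    body = {"language": language, "punctuation": punctuation}
    # Keep only the callback options the caller actually set; omitted ones
    # fall back to the server-side defaults.
    given = {
        k: v for k, v in {
            "callback_method": callback_method,
            "callback_format": callback_format,
            "callback_headers": callback_headers,
        }.items() if v is not None
    }
    if given and not callback_url:
        raise ValueError("callback options require callback_url to be set")
    if callback_method is not None and callback_method not in CALLBACK_METHODS:
        raise ValueError("callback_method must be POST or PUT")
    if callback_url:
        body["callback_url"] = callback_url
    body.update(given)
    return json.dumps(body)
```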

Responses

200

New session initialized

400

invalid-language

401

unauthorized

415

content-invalid

422

body-invalid

500

server-error

post /api/v2/speech-to-text/session

Zoom Media API Endpoint

https://api.zoommedia.ai/api/v2/speech-to-text/session
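A sketch of the session-creation request using Python's urllib. The URL, header name and JSON fields come from this page; create_session_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

SESSION_URL = "https://api.zoommedia.ai/api/v2/speech-to-text/session"


def create_session_request(api_key: str, language: str,
                           punctuation: bool = False) -> urllib.request.Request:
    """Build the POST request that initiates a batch session."""
    body = json.dumps({"language": language, "punctuation": punctuation}).encode("utf-8")
    req = urllib.request.Request(SESSION_URL, data=body, method="POST")
    req.add_header("x-zoom-s2t-key", api_key)
    # The endpoint consumes application/json; the 415 content-invalid response
    # presumably covers other content types.
    req.add_header("Content-Type", "application/json")
    return req
```

To send it, `with urllib.request.urlopen(req) as resp: session = json.load(resp)` works; urlopen raises urllib.error.HTTPError for the 4xx and 5xx responses listed above.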

Request samples

Content type
application/json
{
  "language": "en-us",
  "callback_method": "POST",
  "callback_format": "application/json",
  "callback_headers": [],
  "punctuation": false
}

Response samples

Content type
application/json
{
  "language": "en-us",
  "sessionId": "49b5b257000004000006ccdfc8",
  "zoom_id": "49b5b257000004000006ccdfc8"
}
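The response carries the session ID twice, as sessionId and zoom_id. This sketch reads zoom_id (the name used by the path parameters below), falls back to sessionId, and checks the documented 26-character length; extract_zoom_id is a hypothetical helper.

```python
import json


def extract_zoom_id(response_body: str) -> str:
    """Return the session ID from a 200 response of the session endpoint."""
    data = json.loads(response_body)
    zoom_id = data.get("zoom_id") or data.get("sessionId")
    if not isinstance(zoom_id, str) or len(zoom_id) != 26:
        # Path parameters are documented as 26-character strings.
        raise ValueError(f"unexpected session ID: {zoom_id!r}")
    return zoom_id
```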

Get transcript session status or result

Authorizations: api_key
path Parameters
zoom_id
required
string 26 characters

Zoom Media Speech to Text session ID. This is the unique identifier for the transcript session.

query Parameters
format
string
Enum: "application/json" "application/ttml+xml" "text/sbv" "text/srt" "text/vtt"

Controls the output format of the result. It works the same way as the Accept header; when both are set, the query parameter takes precedence.

Output formats

  • application/json - Zoom Media JSON format
  • application/ttml+xml - TTML subtitle format
  • text/sbv - SBV subtitle format
  • text/srt - SRT subtitle format
  • text/vtt - WebVTT subtitle format

The value must be URL-encoded; for example, text/vtt becomes text%2Fvtt.
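Encoding matters most for application/ttml+xml: in a query string an unencoded "+" is commonly decoded as a space. A small sketch with the standard library; encode_format is a hypothetical helper, and the list of formats is the enum above.

```python
from urllib.parse import quote

OUTPUT_FORMATS = (
    "application/json",
    "application/ttml+xml",
    "text/sbv",
    "text/srt",
    "text/vtt",
)


def encode_format(fmt: str) -> str:
    """Validate an output format and URL-encode it for the format query parameter."""
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # safe="" makes quote() encode "/" as %2F as well; "+" is encoded as %2B.
    return quote(fmt, safe="")
```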

max_line_length
integer [ 0 .. 200 ]
Default: 37

Controls the maximum length of a line in a segment block. When the maximum is reached, a new line is added to the segment. Set the value to 0 to remove this limit. This parameter only affects subtitle output and is ignored for JSON.

max_line_words
integer [ 0 .. 100 ]
Default: 12

Controls the maximum number of words on a line in a segment block. When the maximum is reached, a new line is added to the segment. Set the value to 0 to remove this limit. This parameter only affects subtitle output and is ignored for JSON.

max_segment_duration
integer [ 0 .. 20000 ]
Default: 7000

Controls the maximum number of milliseconds a segment stays visible. When the maximum is reached, a new subtitle segment is created. Set the value to 0 to remove this limit. This parameter only affects subtitle output and is ignored for JSON.

max_segment_lines
integer [ 0 .. 4 ]
Default: 2

Controls the maximum number of lines in a segment. When the maximum is reached, a new subtitle segment is created. Set the value to 0 to remove this limit. This parameter only affects subtitle output and is ignored for JSON.

max_segment_words
integer [ 0 .. 100 ]
Default: 24

Controls the maximum number of words in a segment. When the maximum is reached, a new subtitle segment is created. Set the value to 0 to remove this limit. This parameter only affects subtitle output and is ignored for JSON.

break_on_silence
integer [ 0 .. 20000 ]
Default: 1500

Controls the maximum silence permitted between two words in a segment. When the maximum is reached, a new subtitle segment is created. Set the value to 0 to remove this limit. This parameter only affects subtitle output and is ignored for JSON.
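The documented ranges and defaults can be checked client-side before a request is made. This sketch (subtitle_query is a hypothetical helper) rejects out-of-range values and keeps only parameters that differ from their defaults, which keeps URLs short without changing the result; 0 is accepted everywhere because it disables a limit.

```python
from typing import Dict, Tuple

# (minimum, maximum, default) for each subtitle parameter, as documented above.
SUBTITLE_PARAMS: Dict[str, Tuple[int, int, int]] = {
    "max_line_length": (0, 200, 37),
    "max_line_words": (0, 100, 12),
    "max_segment_duration": (0, 20000, 7000),
    "max_segment_lines": (0, 4, 2),
    "max_segment_words": (0, 100, 24),
    "break_on_silence": (0, 20000, 1500),
}


def subtitle_query(**overrides: int) -> Dict[str, int]:
    """Range-check subtitle parameters and drop those equal to their defaults."""
    query: Dict[str, int] = {}
    for name, value in overrides.items():
        if name not in SUBTITLE_PARAMS:
            raise ValueError(f"unknown subtitle parameter: {name}")
        lo, hi, default = SUBTITLE_PARAMS[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not lo <= value <= hi:
            raise ValueError(f"{name} must be an integer in [{lo}, {hi}]")
        if value != default:
            query[name] = value
    return query
```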

Responses

200

Successful session state response or subtitle

get /api/v2/speech-to-text/session/{zoom_id}

Zoom Media API Endpoint

https://api.zoommedia.ai/api/v2/speech-to-text/session/{zoom_id}
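A sketch of assembling the status URL with optional query parameters. status_url is a hypothetical helper; urlencode's default quoting (quote_plus with no safe characters) already encodes "/" and "+", which satisfies the URL-encoding requirement for format.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://api.zoommedia.ai/api/v2/speech-to-text/session"


def status_url(zoom_id: str, fmt: Optional[str] = None, **subtitle_params: int) -> str:
    """Build GET /api/v2/speech-to-text/session/{zoom_id} with optional query parameters."""
    if len(zoom_id) != 26:
        raise ValueError("zoom_id must be 26 characters")
    params = {}
    if fmt:
        params["format"] = fmt
    params.update(subtitle_params)
    url = f"{BASE}/{zoom_id}"
    return f"{url}?{urlencode(params)}" if params else url
```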

Response samples

Content type
application/json
{
  "language": "en-us",
  "sessionId": "49b5b257000004000006ccdfc8",
  "zoom_id": "49b5b257000004000006ccdfc8",
  "done": true,
  "results": []
}
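When the JSON format is requested, the done flag tells whether results are ready. This sketch (parse_status is a hypothetical helper) returns None while the session is still processing; the structure of individual entries in results is not documented on this page, so they are passed through unchanged. Subtitle formats return subtitle text rather than JSON and are not handled here.

```python
import json
from typing import Any, List, Optional


def parse_status(body: str) -> Optional[List[Any]]:
    """Return the results list once the session is done, else None."""
    data = json.loads(body)
    if not data.get("done", False):
        return None
    return data.get("results", [])
```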

Start media file transcription

Start a transcription by uploading a media file (video/* or audio/*) or by setting a media URL (video_url).

Authorizations: api_key
path Parameters
zoom_id
required
string 26 characters

Zoom Media Speech to Text session ID. This is the unique identifier for the transcript session.

Request Body schema:
upload
string <binary>

Responses

200

Session state response

post /api/v2/speech-to-text/session/{zoom_id}

Zoom Media API Endpoint

https://api.zoommedia.ai/api/v2/speech-to-text/session/{zoom_id}
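No request sample is given for this endpoint. The sketch below assumes the file is sent as multipart/form-data in a form field named "upload", which is the only field the schema names; treat that encoding as an assumption to verify. multipart_upload_request is a hypothetical helper built on the standard library, and the filename is inserted unescaped, so it should not contain double quotes.

```python
import urllib.request
import uuid

BASE = "https://api.zoommedia.ai/api/v2/speech-to-text/session"


def multipart_upload_request(api_key: str, zoom_id: str, filename: str,
                             media: bytes, media_type: str) -> urllib.request.Request:
    """Build a POST that sends the media file as the "upload" form field.

    Assumption: the endpoint accepts multipart/form-data.
    """
    boundary = uuid.uuid4().hex  # random token, very unlikely to occur in the media
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="upload"; filename="{filename}"\r\n'
        f"Content-Type: {media_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    req = urllib.request.Request(f"{BASE}/{zoom_id}", data=head + media + tail, method="POST")
    req.add_header("x-zoom-s2t-key", api_key)
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return req
```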


Response samples

Content type
application/json
{
  "language": "en-us",
  "sessionId": "49b5b257000004000006ccdfc8",
  "zoom_id": "49b5b257000004000006ccdfc8"
}

Start media file transcription

Start a transcription by uploading a media file.

Authorizations: api_key
path Parameters
zoom_id
required
string 26 characters

Zoom Media Speech to Text session ID. This is the unique identifier for the transcript session.

Request Body schema: *
string <binary>

Responses

200

Session state response

put /api/v2/speech-to-text/session/{zoom_id}

Zoom Media API Endpoint

https://api.zoommedia.ai/api/v2/speech-to-text/session/{zoom_id}
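The PUT variant accepts any body type ("*"), which suggests sending the raw media bytes directly. A sketch under that assumption; raw_upload_request is a hypothetical helper, and setting Content-Type to the media's own type (for example "audio/wav") is also an assumption.

```python
import urllib.request

BASE = "https://api.zoommedia.ai/api/v2/speech-to-text/session"


def raw_upload_request(api_key: str, zoom_id: str, media: bytes,
                       media_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build a PUT whose body is the raw media bytes."""
    if len(zoom_id) != 26:
        raise ValueError("zoom_id must be 26 characters")
    req = urllib.request.Request(f"{BASE}/{zoom_id}", data=media, method="PUT")
    req.add_header("x-zoom-s2t-key", api_key)
    # Assumption: Content-Type should describe the media, e.g. "audio/wav".
    req.add_header("Content-Type", media_type)
    return req
```

Compared with the multipart POST sketch, this avoids any form encoding, so the body is exactly the file's bytes.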

Response samples

Content type
application/json
{
  "language": "en-us",
  "sessionId": "49b5b257000004000006ccdfc8",
  "zoom_id": "49b5b257000004000006ccdfc8"
}