This code sample shows how to use the Amazon Transcribe streaming SDK to convert speech to text. The code takes a stereo audio file as input and outputs the transcriptions.
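// SDK names used by the snippets in this sample
import {
    TranscribeStreamingClient,
    TranscribeStreamingClientConfig,
    StartStreamTranscriptionCommand
} from '@aws-sdk/client-transcribe-streaming';

const clientconfig: TranscribeStreamingClientConfig = {
    region: region
};
try {
    this._client = new TranscribeStreamingClient(clientconfig);
} catch (error) {
    // Without a client nothing else can run, so exit
    console.error('Error creating Transcribe Streaming client', error);
    process.exit(1);
}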
Use TranscribeStreamingClient to create an instance of the Transcribe streaming client.
const audiopipeline: chain = new chain([
    fs.createReadStream(this._mediafilename, { highWaterMark: CHUNK_SIZE }),
    async data => {
        // add timer delay if required
        return data;
    }
]);
const transcribeInput = async function* () {
    for await (const chunk of audiopipeline) {
        yield { AudioEvent: { AudioChunk: chunk } };
    }
};
The Transcribe streaming SDK requires an async iterable as the audio input parameter, and each payload must be
in the format { AudioEvent: { AudioChunk: chunk } }, where
chunk is a buffer of audio bytes of a certain length (e.g. 200 ms).
First, set up an audio streaming pipeline that uses createReadStream to read chunks of audio bytes from
the file and passes those bytes through to the output (adding timer delays if required). Then iterate through this
pipeline to build the async iterable that returns the required payload for each chunk of data read from the file stream.
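The chunk length and the optional timer delay mentioned above can be sketched as follows. This is a minimal illustration, not part of the original sample: chunkSizeBytes and pacedAudioEvents are hypothetical helper names, the audio is assumed to be raw 16-bit PCM already held in memory, and the delay simply approximates real-time pacing.

```typescript
// Bytes needed to hold `durationMs` of raw PCM audio, aligned to whole frames.
function chunkSizeBytes(
    sampleRateHz: number,
    channels: number,
    bytesPerSample: number,
    durationMs: number
): number {
    const samplesPerChannel = Math.floor((sampleRateHz * durationMs) / 1000);
    return samplesPerChannel * channels * bytesPerSample;
}

const sleep = (ms: number): Promise<void> =>
    new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical helper: slices in-memory PCM into fixed-size chunks and wraps
// each one in the { AudioEvent: { AudioChunk } } payload shape.
async function* pacedAudioEvents(
    pcm: Uint8Array,
    chunkBytes: number,
    delayMs = 0
): AsyncGenerator<{ AudioEvent: { AudioChunk: Uint8Array } }> {
    if (chunkBytes <= 0) {
        throw new Error('chunkBytes must be positive');
    }
    for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
        if (delayMs > 0) {
            await sleep(delayMs); // optional pacing between chunks
        }
        // subarray clamps the end index, so the last chunk may be shorter
        yield { AudioEvent: { AudioChunk: pcm.subarray(offset, offset + chunkBytes) } };
    }
}
```

For 8000 Hz, 2-channel, 16-bit audio, chunkSizeBytes(8000, 2, 2, 200) gives 6400 bytes per 200 ms chunk, which could serve as the CHUNK_SIZE passed to createReadStream.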
const response = await this._client.send(
    new StartStreamTranscriptionCommand({
        LanguageCode: 'en-US',
        MediaSampleRateHertz: 8000,
        MediaEncoding: 'pcm',
        EnableChannelIdentification: true,
        NumberOfChannels: 2,
        AudioStream: transcribeInput()
    })
);
StartStreamTranscriptionCommand starts the Transcribe streaming session.
The parameters shown in the code snippet above are some of the commonly used parameters
for standard transcriptions.
MediaSampleRateHertz - 8000 Hz is a typical sampling rate for telephony audio.
MediaEncoding - PCM L16 is one of the supported encodings. If the audio is in another format (e.g. MuLaw), it must first be converted to one of the supported formats, such as PCM L16.
NumberOfChannels - This sample code assumes that the input audio files are stereo (2 channels).
EnableChannelIdentification - Set to true to separate the transcriptions by channel (i.e. turn-by-turn in a two-person conversation recording).
AudioStream - The streaming audio source. This must be set up before calling StartStreamTranscriptionCommand.
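With EnableChannelIdentification set to true, each result in the response stream carries a channel identifier. The following is a minimal sketch of how the final transcripts could be grouped by channel. It is not part of the original sample: collectByChannel is a hypothetical helper, and the small interfaces are local stand-ins that only cover the fields read here (TranscriptEvent, Results, IsPartial, ChannelId, Alternatives), which should be structurally compatible with the SDK's event types.

```typescript
// Local structural types covering only the fields this sketch reads.
interface Alternative { Transcript?: string }
interface Result {
    IsPartial?: boolean;
    ChannelId?: string;
    Alternatives?: Alternative[];
}
interface StreamEvent {
    TranscriptEvent?: { Transcript?: { Results?: Result[] } };
}

// Hypothetical helper: collects final (non-partial) transcripts per channel.
async function collectByChannel(
    events: AsyncIterable<StreamEvent>
): Promise<Map<string, string[]>> {
    const byChannel = new Map<string, string[]>();
    for await (const event of events) {
        for (const result of event.TranscriptEvent?.Transcript?.Results ?? []) {
            if (result.IsPartial) continue; // skip interim results
            const text = result.Alternatives?.[0]?.Transcript;
            if (!text) continue;
            const channel = result.ChannelId ?? 'unknown';
            const lines = byChannel.get(channel) ?? [];
            lines.push(text);
            byChannel.set(channel, lines);
        }
    }
    return byChannel;
}
```

Assuming the response object returned by the send call exposes TranscriptResultStream, a call such as collectByChannel(response.TranscriptResultStream) would return one list of transcript segments per channel once the stream ends.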
Transcribe supports a number of customizations, such as Custom Vocabulary and Custom Language Model,
that can be specified as parameters to StartStreamTranscriptionCommand.
Refer to for more details.
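As a rough illustration of the customizations mentioned above, the sketch below adds optional VocabularyName and LanguageModelName fields to the base parameters. It is an assumption-laden example, not part of the original sample: withCustomizations and the TranscriptionParams interface are hypothetical, the resource names are placeholders, and AudioStream is left out so that the snippet stands alone (it would be added when constructing the command).

```typescript
// Subset of the StartStreamTranscriptionCommand input used in this sketch.
interface TranscriptionParams {
    LanguageCode: string;
    MediaSampleRateHertz: number;
    MediaEncoding: string;
    EnableChannelIdentification?: boolean;
    NumberOfChannels?: number;
    VocabularyName?: string;
    LanguageModelName?: string;
}

interface Customizations {
    vocabularyName?: string;
    languageModelName?: string;
}

// Hypothetical helper: returns a copy of the base parameters with any
// customizations that were actually provided.
function withCustomizations(
    base: TranscriptionParams,
    custom: Customizations
): TranscriptionParams {
    return {
        ...base,
        ...(custom.vocabularyName ? { VocabularyName: custom.vocabularyName } : {}),
        ...(custom.languageModelName ? { LanguageModelName: custom.languageModelName } : {})
    };
}

const baseParams: TranscriptionParams = {
    LanguageCode: 'en-US',
    MediaSampleRateHertz: 8000,
    MediaEncoding: 'pcm',
    EnableChannelIdentification: true,
    NumberOfChannels: 2
};

// 'my-custom-vocabulary' is a placeholder name, not a real resource.
const customParams = withCustomizations(baseParams, {
    vocabularyName: 'my-custom-vocabulary'
});
```

Keeping the customizations optional means the same code path works whether or not a custom vocabulary or language model has been created for the account.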