AI/ML
Transcribe streaming node client
Babu Srinivasan
May 04, 2023

This code sample shows how to use the Amazon Transcribe streaming SDK to convert speech to text. It takes a stereo audio file as input and outputs the transcriptions.

GitHub repo.

Transcribe streaming client

import {
    TranscribeStreamingClient,
    TranscribeStreamingClientConfig
} from '@aws-sdk/client-transcribe-streaming';

const clientconfig: TranscribeStreamingClientConfig = {
    region: region
};
try {
    this._client = new TranscribeStreamingClient(clientconfig);
} catch (error) {
    console.error('Error creating Transcribe Streaming client', error);
    process.exit(1);
}

Use TranscribeStreamingClient to create an instance of the Transcribe streaming client, passing the AWS region in the client configuration.

Set up streaming audio input for Transcribe

const audiopipeline: chain = new chain([
    // read the audio file in fixed-size chunks
    fs.createReadStream(this._mediafilename, { highWaterMark: CHUNK_SIZE }),
    // pass-through step; add a timer delay here if required
    async data => {
        return data;
    }
]);

const transcribeInput = async function* () {
    for await (const chunk of audiopipeline) {
        yield { AudioEvent: { AudioChunk: chunk } };
    }
};

The Transcribe streaming SDK requires an async iterable as the audio input parameter, and the payload must be in the format { AudioEvent: { AudioChunk: chunk } }, where chunk is a buffer of audio bytes covering a short duration (e.g. 200 ms).

First, set up an audio streaming pipeline that uses createReadStream to read chunks of audio bytes from the file and passes them through to the output (adding timer delays if required). Then iterate over this pipeline in an async generator that yields the required payload for each chunk read from the file stream.
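
As a rough guide, CHUNK_SIZE follows from the audio format. The sketch below (the constant names are illustrative, not part of the sample) computes the size of a 200 ms chunk of 8 kHz, 16-bit PCM stereo audio and shows one way to pace chunks at roughly real time in the pass-through step.

// Illustrative only: ~200 ms of 8 kHz, 16-bit (2 bytes/sample), 2-channel PCM audio
const SAMPLE_RATE_HZ = 8000;
const BYTES_PER_SAMPLE = 2;
const CHANNELS = 2;
const CHUNK_DURATION_MS = 200;

const CHUNK_SIZE =
    (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * CHUNK_DURATION_MS) / 1000; // 6400 bytes

// Optional pacing for the pass-through step: wait ~200 ms before
// forwarding each chunk to approximate real-time capture.
const delay = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

async function paceChunk(data: Buffer): Promise<Buffer> {
    await delay(CHUNK_DURATION_MS);
    return data;
}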

Start streaming audio to Transcribe

const response = await this._client.send(
    new StartStreamTranscriptionCommand({
        LanguageCode: 'en-US',
        MediaSampleRateHertz: 8000,
        MediaEncoding: 'pcm',
        EnableChannelIdentification: true,
        NumberOfChannels: 2,
        AudioStream: transcribeInput()
    })
);

StartStreamTranscriptionCommand starts the Transcribe streaming session. The parameters shown in the code snippet above are some of the commonly used parameters for standard transcriptions; a sketch of reading the resulting transcripts follows the parameter descriptions below.

MediaSampleRateHertz - 8000 Hz is a typical sampling rate for telephony audio.

MediaEncoding - PCM L16 is one of the supported encodings. If the audio is in another format (e.g. Mu-Law), it must first be converted to one of the supported formats such as PCM L16.

NumberOfChannels - This sample code assumes that the input audio files are stereo (2 channel).

EnableChannelIdentification - set to true to separate the transcriptions by channel (i.e. turn-by-turn in a two-person conversation recording).

AudioStream - the streaming audio source. This must be set up before calling StartStreamTranscriptionCommand.
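
The response from StartStreamTranscriptionCommand exposes the transcripts through TranscriptResultStream, an async iterable of transcript events. A minimal sketch of consuming it (variable names are illustrative) might look like this:

if (response.TranscriptResultStream) {
    // Iterate over the transcript events emitted by the streaming session.
    for await (const event of response.TranscriptResultStream) {
        const results = event.TranscriptEvent?.Transcript?.Results ?? [];
        for (const result of results) {
            // Partial results are refined as more audio arrives; print only final ones.
            if (result.IsPartial) {
                continue;
            }
            const text = result.Alternatives?.[0]?.Transcript ?? '';
            // ChannelId (typically 'ch_0' / 'ch_1') is present when
            // channel identification is enabled.
            console.log(`[${result.ChannelId}] ${text}`);
        }
    }
}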

Additional parameters and customization

Transcribe supports a number of customizations, such as Custom Vocabulary and Custom Language Model, that can be specified as parameters to StartStreamTranscriptionCommand. Refer to the Amazon Transcribe SDK documentation for more details.
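
For example, a custom vocabulary and a custom language model can be attached by adding the corresponding parameters to the command. The resource names below are placeholders; these resources must already exist in your account.

const response = await this._client.send(
    new StartStreamTranscriptionCommand({
        LanguageCode: 'en-US',
        MediaSampleRateHertz: 8000,
        MediaEncoding: 'pcm',
        // Placeholder names for illustration only
        VocabularyName: 'my-custom-vocabulary',
        LanguageModelName: 'my-custom-language-model',
        AudioStream: transcribeInput()
    })
);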


