This code sample shows how to use the AudioWorklet interface of the Web Audio API to ingest an audio stream, and then use the Amazon Transcribe streaming API to convert the speech to text.
Instructions for running the code sample are available in the README file of the repo linked above.
In this blog post, we walk through the important sections of the code.
First, we create a MediaStream object using either getDisplayMedia or getUserMedia, as shown below.
if (audiosource === 'ScreenCapture') {
  stream = await window.navigator.mediaDevices.getDisplayMedia({
    video: true,
    audio: true,
  });
} else {
  stream = await window.navigator.mediaDevices.getUserMedia({
    video: false,
    audio: true,
  });
}
Next, we load the AudioWorklet custom audio processing script. The script itself is placed inside
the public/ directory.
await audioContext.audioWorklet.addModule('./worklets/recording-processor.js');
recording-processor.js contains the JavaScript code implementing the custom audio processing. AudioWorklet processors run in a separate thread to provide low-latency audio processing.
In this web application example, the AudioWorklet processor implements simple buffering of audio frames (accumulating roughly 200 ms of audio) before sending them to Transcribe streaming for transcription.
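The buffering idea can be sketched as a small accumulator: collect 128-sample render quanta until roughly 200 ms of audio is available, then flush. The actual implementation lives in public/worklets/recording-processor.js in the repo; the class and parameter names below (FrameBuffer, targetMs) are illustrative only.

```typescript
// Illustrative sketch of the buffering done inside the worklet processor.
// The real implementation is in public/worklets/recording-processor.js.
class FrameBuffer {
  private samples: number[] = [];
  private readonly targetSamples: number;

  constructor(sampleRate: number, targetMs = 200) {
    // e.g. 48000 Hz * 0.2 s = 9600 samples per flush
    this.targetSamples = Math.round((sampleRate * targetMs) / 1000);
  }

  // Called once per 128-sample render quantum; returns a full buffer
  // once ~targetMs of audio has accumulated, otherwise null.
  push(frame: Float32Array): Float32Array | null {
    this.samples.push(...frame);
    if (this.samples.length >= this.targetSamples) {
      return Float32Array.from(this.samples.splice(0, this.targetSamples));
    }
    return null;
  }
}
```

Inside the processor's process() method, each render quantum would be pushed into such a buffer, and whenever a full buffer comes back it would be posted to the main thread with this.port.postMessage({ message: 'SHARE_RECORDING_BUFFER', buffer: [buf] }).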
The code below sets up an async iterable on the audio source (the output from the AudioWorklet processor) that is used as the input to the Transcribe streaming API.
const audioDataIterator = pEvent.iterator<'message', MessageEvent<MessageDataType>>(mediaRecorder.port, 'message');

const getAudioStream = async function* () {
  for await (const chunk of audioDataIterator) {
    if (chunk.data.message === 'SHARE_RECORDING_BUFFER') {
      const abuffer = pcmEncode(chunk.data.buffer[0]);
      const audiodata = new Uint8Array(abuffer);
      yield {
        AudioEvent: {
          AudioChunk: audiodata,
        },
      };
    }
  }
};
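The pcmEncode helper used above converts the worklet's Float32 samples (in the range [-1, 1]) into the 16-bit little-endian PCM that Transcribe streaming expects. The repo ships its own version; a minimal sketch of such an encoder looks like this:

```typescript
// Sketch of a Float32 -> 16-bit little-endian PCM encoder.
// The repo's pcmEncode may differ in detail; this illustrates the idea.
function pcmEncode(input: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(input.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < input.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, input[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}
```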
The code below shows how the Transcribe streaming SDK is used to convert speech to text.
The async iterable getAudioStream is passed as the AudioStream parameter of the StartStreamTranscriptionCommand.
const transcribeClient = new TranscribeStreamingClient({
  region: 'us-east-1',
  credentials: currentCredentials,
});

const command = new StartStreamTranscriptionCommand({
  LanguageCode: language,
  MediaEncoding: 'pcm',
  MediaSampleRateHertz: sampleRate,
  AudioStream: getAudioStream(),
});

const data = await transcribeClient.send(command);
The web application then displays the transcriptions returned by the Transcribe streaming SDK.
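To get at those transcriptions, the response's TranscriptResultStream is itself an async iterable of TranscriptEvents, where each Result with IsPartial set to false is final. The helper below is a hypothetical sketch of that consumption loop, with the event types simplified for illustration:

```typescript
// Sketch: pull finalized transcript lines out of a Transcribe streaming
// response. Property names follow the Transcribe streaming API shape;
// the interface here is a simplified stand-in for the SDK types.
interface TranscribeEvent {
  TranscriptEvent?: {
    Transcript?: {
      Results?: Array<{
        IsPartial?: boolean;
        Alternatives?: Array<{ Transcript?: string }>;
      }>;
    };
  };
}

async function extractFinalTranscripts(
  stream: AsyncIterable<TranscribeEvent>,
): Promise<string[]> {
  const lines: string[] = [];
  for await (const event of stream) {
    const results = event.TranscriptEvent?.Transcript?.Results ?? [];
    for (const result of results) {
      // Partial results are interim hypotheses; keep only final ones.
      if (!result.IsPartial && result.Alternatives?.[0]?.Transcript) {
        lines.push(result.Alternatives[0].Transcript);
      }
    }
  }
  return lines;
}
```

In the sample app, the equivalent loop would iterate over data.TranscriptResultStream and update the UI state as each event arrives, rather than collecting lines into an array.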