AI/ML
Speech Recognition - Speaker independent isolated word recognition
Babu Srinivasan
March 08, 2024

As a specialist AI/ML solutions architect, I focus on ASR (automatic speech recognition) and related AI/ML domains. I have been working with Transcribe for the last 2.5 years; however, my first experience with speech recognition goes all the way back to the 90s.

We (three of my classmates and I) did a project on isolated word recognition for our B.Tech degree in Computer Science & Engineering. I recently got hold of the project report (hardcopy), and it brought back memories of us sitting in the lab in front of a PC with an 8086 processor running MS-DOS, tinkering with the hardware, and testing the system. The report covered both the hardware components and the software techniques used for speech recognition.

While the ASR field has advanced a lot, some of the foundational concepts and signal processing techniques still hold good.

I have provided a few snippets and screenshots from the project report below.

The system consisted of two parts - hardware for signal processing and software for training/inference.

The hardware implementation consisted of an amplifier, a 3 kHz low-pass filter, and an analog-to-digital converter (ADC) card that plugged into the PC's expansion slot.

[Image: hardware block diagram]

The software component had two parts: 1. the code that programmed the analog-to-digital converter card to capture the speech signal and store the digitized samples on the PC, and 2. feature extraction.
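
For flavor, here is a minimal sketch of what the capture loop might have looked like in C. The 8 kHz sample rate, buffer length, output file name, and the read_adc_sample() helper are illustrative assumptions, not the original code; the real DOS program did direct port I/O against the ADC card.

```c
/* Sketch of an isolated-word capture loop for a port-mapped ADC card.
 * Sample rate, buffer size, and file name are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>

#define SAMPLE_RATE 8000                 /* assumed; the 3 kHz LPF keeps us under Nyquist */
#define MAX_SAMPLES (SAMPLE_RATE * 2)    /* up to 2 s per utterance */

/* Hypothetical ADC read; on the original card this would have been an IN
 * instruction on the card's data port after polling its status port. */
static unsigned char read_adc_sample(void)
{
    return 128; /* placeholder: mid-scale (silence) for an 8-bit ADC */
}

int main(void)
{
    unsigned char *buf = malloc(MAX_SAMPLES);
    FILE *fp;
    long i;

    if (buf == NULL)
        return 1;

    /* Capture a fixed-length window; the real program would also gate on an
     * energy threshold to find the start and end of the spoken word. */
    for (i = 0; i < MAX_SAMPLES; i++)
        buf[i] = read_adc_sample();

    /* Store the raw digitized utterance for the feature-extraction stage. */
    fp = fopen("utterance.raw", "wb");
    if (fp != NULL) {
        fwrite(buf, 1, MAX_SAMPLES, fp);
        fclose(fp);
    }
    free(buf);
    return 0;
}
```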

Feature extraction used standard techniques such as windowing and calculating cepstrum coefficients. The flow diagram below also shows the training and recognition (inference) phases.

[Image: software flow diagram]
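
As an illustration of the feature extraction step, here is a small C sketch that applies a Hamming window to one frame and computes a handful of real-cepstrum coefficients via a naive DFT. The frame length, number of coefficients, and the toy sine input are assumptions for the sketch, not the parameters from the report.

```c
/* Sketch of per-frame feature extraction: Hamming window, then a
 * real-cepstrum computation (IDFT of the log magnitude spectrum). */
#include <math.h>
#include <stdio.h>

#define N    256   /* frame length (assumed) */
#define NCEP 12    /* cepstral coefficients kept per frame (assumed) */
#define PI   3.14159265358979323846

/* Apply a Hamming window in place. */
static void hamming(double x[N])
{
    int n;
    for (n = 0; n < N; n++)
        x[n] *= 0.54 - 0.46 * cos(2.0 * PI * n / (N - 1));
}

/* Real cepstrum: c = IDFT( log |DFT(x)| ), keeping the first NCEP terms.
 * Since log|X[k]| is real and even, the IDFT reduces to a cosine sum. */
static void cepstrum(const double x[N], double c[NCEP])
{
    double logmag[N];
    int k, n;

    for (k = 0; k < N; k++) {
        double re = 0.0, im = 0.0;
        for (n = 0; n < N; n++) {
            re += x[n] * cos(2.0 * PI * k * n / N);
            im -= x[n] * sin(2.0 * PI * k * n / N);
        }
        logmag[k] = log(sqrt(re * re + im * im) + 1e-12);
    }
    for (n = 0; n < NCEP; n++) {
        double sum = 0.0;
        for (k = 0; k < N; k++)
            sum += logmag[k] * cos(2.0 * PI * k * n / N);
        c[n] = sum / N;
    }
}

int main(void)
{
    double frame[N], c[NCEP];
    int n;

    /* Toy input: a 500 Hz sinusoid at an assumed 8 kHz sample rate. */
    for (n = 0; n < N; n++)
        frame[n] = sin(2.0 * PI * 500.0 * n / 8000.0);

    hamming(frame);
    cepstrum(frame, c);
    for (n = 0; n < NCEP; n++)
        printf("c[%d] = %f\n", n, c[n]);
    return 0;
}
```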

During training, reference patterns for the trained words were stored in a file. During the recognition (inference) phase, sample utterances were brought into time registration with the reference patterns before calculating the distance to determine pattern similarity.
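
Time registration between patterns of different lengths was typically done with dynamic time warping (DTW) in isolated word recognizers of that era. The sketch below shows a plain DTW distance between a test utterance and one reference pattern, each a sequence of per-frame feature vectors; the dimensions, the Euclidean local distance, and the toy data in main() are illustrative assumptions rather than the report's actual parameters. The recognized word would simply be the reference pattern with the smallest distance.

```c
/* Sketch of time registration via dynamic time warping (DTW) between a
 * test utterance and one reference pattern of cepstral feature vectors.
 * Dimensions and the Euclidean local distance are illustrative assumptions. */
#include <math.h>
#include <stdio.h>

#define NCEP   12    /* coefficients per frame (assumed) */
#define MAXFRM 100   /* maximum frames per pattern (assumed) */

/* Euclidean distance between two feature vectors. */
static double frame_dist(double a[NCEP], double b[NCEP])
{
    double d = 0.0;
    int i;
    for (i = 0; i < NCEP; i++)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return sqrt(d);
}

/* DTW distance: cumulative cost of the best alignment path, normalized by
 * the combined pattern length so different-length patterns are comparable. */
static double dtw(double test[][NCEP], int nt, double ref[][NCEP], int nr)
{
    static double D[MAXFRM][MAXFRM];
    int i, j;

    D[0][0] = frame_dist(test[0], ref[0]);
    for (i = 1; i < nt; i++)
        D[i][0] = D[i - 1][0] + frame_dist(test[i], ref[0]);
    for (j = 1; j < nr; j++)
        D[0][j] = D[0][j - 1] + frame_dist(test[0], ref[j]);

    for (i = 1; i < nt; i++) {
        for (j = 1; j < nr; j++) {
            double best = D[i - 1][j - 1];
            if (D[i - 1][j] < best) best = D[i - 1][j];
            if (D[i][j - 1] < best) best = D[i][j - 1];
            D[i][j] = best + frame_dist(test[i], ref[j]);
        }
    }
    return D[nt - 1][nr - 1] / (nt + nr);
}

int main(void)
{
    /* Toy patterns just to exercise the code; in the real system "ref"
     * would be loaded from the file of trained reference patterns and
     * "test" would come from the current utterance. */
    static double test[MAXFRM][NCEP], ref[MAXFRM][NCEP];
    int i, k;

    for (i = 0; i < 40; i++)
        for (k = 0; k < NCEP; k++) {
            test[i][k] = 0.1 * k;
            ref[i][k]  = 0.1 * k + 0.05;
        }
    printf("DTW distance = %f\n", dtw(test, 40, ref, 40));
    return 0;
}
```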

The software components (including the UI) were written in C and 8086 assembly language. I no longer have access to the source code.

And below are the screenshots of the UI :-)

[Image: UI screenshot 1]

[Image: UI screenshot 2]


Tags

ASR, AI/ML, Speech Recognition, Signal Processing
