Convert Speech to Text using Python | Speech Recognition | Machine Learning Project Tutorial

Updated: May 31, 2023

Convert speech to text with Python! This hands-on project tutorial uses the SpeechRecognition library and the Google Speech Recognition API to transcribe live microphone input and audio files into written text.

Convert Speech to Text using Speech Recognition

In this project tutorial we will install the SpeechRecognition module, convert real-time audio to text using Google Speech Recognition, and convert an audio file to text data.


Project Information

The objective of the project is to convert speech to text in real time and to convert an audio file to text. It uses the Google Speech API to perform the transcription.

Libraries

speech_recognition
PyAudio (required by sr.Microphone for microphone input)
Google Speech API

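First, we install the required modules. Installing PyAudio with pip builds it from source, which can fail, as the log below shows; conda installs a prebuilt package instead. The -y flag answers conda's confirmation prompt, which a notebook cell cannot do interactively.

# install the modules
!pip install speechrecognition
!conda install -y pyaudio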

Requirement already satisfied: speechrecognition in c:\programdata\anaconda3\lib\site-packages (3.8.1)
Collecting PyAudio
Using cached PyAudio-0.2.11.tar.gz (37 kB)
Building wheels for collected packages: PyAudio
Building wheel for PyAudio (setup.py): started
Building wheel for PyAudio (setup.py): finished with status 'error'
Running setup.py clean for PyAudio
Failed to build PyAudio
Installing collected packages: PyAudio
Running setup.py install for PyAudio: started
Running setup.py install for PyAudio: finished with status 'error'
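Because the install log above ends with an error, it is worth confirming which modules can actually be imported before going further. A small sketch using the standard library's importlib.util.find_spec; check_modules is a hypothetical helper name, not part of either library:

```python
import importlib.util

def check_modules(names):
    """Map each top-level module name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    # PyAudio is imported as "pyaudio", SpeechRecognition as "speech_recognition"
    for name, ok in check_modules(["speech_recognition", "pyaudio"]).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

If pyaudio shows up as missing, sr.Microphone will not work, but the audio file example further down only needs speech_recognition.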

Now we import the module

# import the module
import speech_recognition as sr

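Next, we create a Recognizer instance; its methods capture the audio and send it to the recognition service.

# create a recognizer instance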
r = sr.Recognizer()

Convert Speech to Text in Real time

We will convert real-time audio from a microphone into text.

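while True:
    with sr.Microphone() as source:
        # calibrate the energy threshold to the current background noise
        r.adjust_for_ambient_noise(source, duration=0.3)

        print("Speak now")
        # capture the audio until the speaker pauses
        audio = r.listen(source)

        try:
            # send the audio to the Google Speech Recognition API
            text = r.recognize_google(audio)
            print("Speaker:", text)
            if text == 'quit':
                break
        except sr.UnknownValueError:
            # the speech could not be understood
            print('Please say again!!!')
        except sr.RequestError as e:
            # the API could not be reached (e.g. no internet connection)
            print('Could not reach the Google API:', e)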

Speak now
Speaker: welcome to the channel
Speak now
Speaker: testing speech recognition
Speak now
Speaker: quit
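The loop calls adjust_for_ambient_noise before each listen. As far as I understand the library, this does not remove noise from the audio: it measures the loudness (RMS energy) of the background sound for the given duration and sets the recognizer's energy_threshold a margin above it, and listen() then treats louder audio as speech. The sketch below is a simplified, hypothetical illustration of that idea on raw 16-bit sample values. It averages the noise energy and multiplies it by an assumed ratio of 1.5, whereas the library uses a damped running average; rms, calibrate_threshold and is_speech are made-up names.

```python
import math

def rms(samples):
    """Root-mean-square energy of a frame of 16-bit PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibrate_threshold(noise_frames, ratio=1.5):
    """Return a speech threshold `ratio` times the average noise energy.

    Simplified: a plain average, not the library's damped running average.
    """
    if not noise_frames:
        raise ValueError("need at least one frame of background noise")
    average = sum(rms(frame) for frame in noise_frames) / len(noise_frames)
    return ratio * average

def is_speech(frame, threshold):
    """Treat a frame as speech when its energy exceeds the calibrated threshold."""
    return rms(frame) > threshold

# Hypothetical frames: quiet background hiss, then a louder spoken segment
noise = [[120, -90, 100, -110], [80, -100, 95, -85]]
threshold = calibrate_threshold(noise)
print(is_speech([2000, -1800, 2100, -1900], threshold))  # True
```

A longer calibration duration gives a steadier estimate of the background level; 0.3 seconds keeps the delay before "Speak now" short.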

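Microphone() - Receive audio input from the default microphone
adjust_for_ambient_noise(source, duration=0.3) - Sample 0.3 seconds of background noise and calibrate the recognizer's energy threshold, so that quiet background sounds are not treated as speech
listen(source) - Record audio from the source until the speaker pauses, and return it
recognize_google(audio) - Send the audio to the Google Speech Recognition API and return the transcript (requires an internet connection)
text == 'quit' - Condition to exit the while loop when the speaker says "quit"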
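The exit condition text == 'quit' only matches an exact lowercase transcript. If the API were to return "Quit" or "quit.", the loop would keep running. A minimal sketch of a more forgiving check; is_quit_command is a hypothetical helper and the set of quit words is an assumption:

```python
import string

# Assumed set of words that should end the loop
QUIT_WORDS = ("quit", "exit", "stop")

def is_quit_command(text, keywords=QUIT_WORDS):
    """True if the transcript is only a quit keyword, ignoring case, spaces and punctuation."""
    cleaned = text.strip().lower().strip(string.punctuation + " ")
    return cleaned in keywords

print(is_quit_command("Quit."))             # True
print(is_quit_command("quit the program"))  # False
```

In the loop, `if text == 'quit':` would become `if is_quit_command(text):`. Matching the whole transcript, rather than searching for the word anywhere, avoids stopping on a sentence that merely mentions "quit".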

Convert Audio to Text

Now we will convert an audio file into text. sr.AudioFile accepts WAV, AIFF and FLAC files; here we use test.wav.

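with sr.AudioFile('test.wav') as source:
    print("listening to audio")
    # record() reads the whole file; listen() would stop at the first pause in speech
    audio = r.record(source)

    try:
        # send the audio to the Google Speech Recognition API
        text = r.recognize_google(audio)
        print("Audio:", text)
    except sr.UnknownValueError:
        print('Error: the speech could not be understood')
    except sr.RequestError as e:
        print('Error: could not reach the Google API:', e)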

listening to audio
Audio: welcome to speech recognition
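Comparing the transcript with the spoken words by eye works for one short clip. For a more objective check, a common metric is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the transcript into the reference text, divided by the number of reference words. A minimal sketch, assuming you know what was actually said; word_error_rate is a hypothetical helper name:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            cost = 0 if r_word == h_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)

print(word_error_rate("welcome to speech recognition",
                      "welcome to speech recognition"))  # 0.0
```

A WER of 0.0 means every word matched; the comparison here ignores case but not punctuation, so transcripts with punctuation may need extra cleanup first.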

The displayed text matches the speech in the audio file.
For longer audio files, split the audio into smaller segments and transcribe them one at a time for more reliable results.
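A minimal sketch of that splitting step, using only the standard library's wave module and fixed-length cuts; split_wav, the chunk_NNN.wav file naming and the 30-second default are assumptions for illustration:

```python
import os
import wave

def split_wav(path, out_dir, chunk_seconds=30):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds each.

    Returns the list of chunk file paths in playback order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        if frames_per_chunk < 1:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of file
            chunk_path = os.path.join(out_dir, f"chunk_{index:03d}.wav")
            with wave.open(chunk_path, "wb") as dst:
                # same channels, sample width and rate; the frame count is
                # corrected in the header when the chunk is written
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Each chunk can then be opened with sr.AudioFile and transcribed as in the example above, and the partial transcripts joined in order. Fixed-length cuts can split a word in half; splitting on silence instead (for example with pydub's split_on_silence) avoids that at the cost of an extra dependency.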

Final Thoughts

This is a useful tool for converting real-time recordings into text, for example for chats, interviews, narration and captions.
You can also take the transcribed text further, for example by running sentiment analysis on it as a step toward recognizing the speaker's emotions.
Google Speech Recognition worked well in this example, but you can use any other speech-to-text engine you prefer; the speech_recognition module also offers others, such as recognize_sphinx for offline recognition.

In this project tutorial we explored converting speech to text with the Google Speech Recognition module. We installed the module and converted both a real-time audio recording and an audio file into text data.


Thanks for reading the article!

Check out more project videos on the YouTube channel Hackers Realm.
