Convert Speech to Text using Python | Speech Recognition | Machine Learning Project Tutorial
Converting speech to text is a basic speech recognition task that turns audio input into text using machine learning. It is useful for transcribing customer service calls, interview calls, or any other audio file. It enables the recognition and translation of spoken language into text through computational linguistics.
In this project tutorial, we will install the Google Speech Recognition module, convert real-time audio to text, and convert an audio file to text.
The objective of the project is to convert speech to text in real time and to convert an audio file to text. It uses the Google Speech API to convert the audio to text.
Google Speech API
We install the module to proceed
# install the module
!pip install speechrecognition
!conda install pyaudio
Requirement already satisfied: speechrecognition in c:\programdata\anaconda3\lib\site-packages (3.8.1)
Collecting PyAudio
  Using cached PyAudio-0.2.11.tar.gz (37 kB)
Building wheels for collected packages: PyAudio
  Building wheel for PyAudio (setup.py): started
  Building wheel for PyAudio (setup.py): finished with status 'error'
  Running setup.py clean for PyAudio
Failed to build PyAudio
Installing collected packages: PyAudio
    Running setup.py install for PyAudio: started
    Running setup.py install for PyAudio: finished with status 'error'
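The log above shows the PyAudio build failing, so it is worth confirming that both modules can actually be imported before continuing. The following is a minimal sketch that uses only the standard library; the helper name `missing_modules` is made up for illustration and is not part of the original project. Note that the import names differ from the package names: SpeechRecognition is imported as `speech_recognition` and PyAudio as `pyaudio`.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical check for the two modules this tutorial relies on.
missing = missing_modules(["speech_recognition", "pyaudio"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All modules are available")
```

If `pyaudio` is reported as missing, `sr.Microphone()` will not work, although transcribing an audio file with `sr.AudioFile` does not need it.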
Now we import the module
# import the module
import speech_recognition as sr
We initialize the module
# initialize
r = sr.Recognizer()
Convert Speech to Text in Real time
We will convert real-time audio from a microphone into text
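while True:
    with sr.Microphone() as source:
        # calibrate for background noise
        r.adjust_for_ambient_noise(source, duration=0.3)
        print("Speak now")
        # capture the audio
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
            print("Speaker:", text)
            if text == 'quit':
                break
        except sr.UnknownValueError:
            # the speech could not be understood
            print('Please say again!!!')
        except sr.RequestError as e:
            # the API could not be reached or returned an error
            print('Request failed:', e)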
Speak now
Speaker: welcome to the channel
Speak now
Speaker: testing speech recognition
Speak now
Speaker: quit
Microphone() - Receive audio input from microphone
adjust_for_ambient_noise(source, duration=0.3) - Sample 0.3 seconds of the real time input to calibrate the recognizer's energy threshold for background noise
listen(source) - Capture the audio from the source
recognize_google(audio) - Google Speech recognition function to convert audio into text
text == 'quit' - Condition to quit the while loop
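The quit condition above is an exact string comparison, so it only fires when the recognized text is precisely `quit`. In case the recognizer returns different capitalization or stray spaces, a slightly more forgiving check can be sketched as follows. The helper name `is_stop_command` and its `stop_word` parameter are illustrative assumptions, not part of the original code.

```python
def is_stop_command(text, stop_word="quit"):
    """Return True when `text` is the stop word, ignoring case and surrounding spaces."""
    return text.strip().lower() == stop_word.lower()


# The whole utterance must be the stop word; longer sentences do not match.
print(is_stop_command(" Quit "))
print(is_stop_command("quit the game"))
```

Inside the loop, `if is_stop_command(text): break` would replace the direct comparison.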
Convert Audio to Text
Now we will process and convert an audio file into text
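with sr.AudioFile('test.wav') as source:
    print("listening to audio")
    # capture the audio; listen() stops at the first pause,
    # r.record(source) would read the entire file instead
    audio = r.listen(source)
    try:
        text = r.recognize_google(audio)
        print("Audio:", text)
    except sr.UnknownValueError:
        # the speech could not be understood
        print('Error: could not understand the audio')
    except sr.RequestError as e:
        # the API could not be reached or returned an error
        print('Error:', e)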
listening to audio
Audio: welcome to speech recognition
The displayed text matches the speech in the audio file
For larger audio files, you need to split them into smaller segments for better processing
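One way to do the splitting is sketched below with the standard library's `wave` module, assuming the input is an uncompressed WAV file. The function name `split_wav`, the `segment` file prefix, and the fixed segment length are illustrative choices and not part of the original project.

```python
import wave


def split_wav(path, segment_seconds, prefix="segment"):
    """Split a WAV file into consecutive fixed-length segments.

    Returns the list of segment file names in playback order.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")

    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = max(1, int(src.getframerate() * segment_seconds))
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # end of the input file
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and frame rate from the input;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each returned path can then be opened with `sr.AudioFile` and passed to `r.recognize_google` in turn. Fixed-length cuts can split a word at a segment boundary, so a transcript assembled this way may need a quick review at the joins.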
This is a very useful tool for converting real-time recordings into text, which can help with chats, interviews, narration, captions, etc.
You can also use this process for emotional speech recognition and further analyze the text for sentiment analysis.
Google Speech Recognition is a very effective and precise module, but you may use any other module to convert speech into text as per your preference.
In this project tutorial, we have explored the speech-to-text conversion process using the Google Speech Recognition module. We installed the module and converted both a real-time audio recording and an audio file into text.
Thanks for reading the article!!!
Check out more project videos from the YouTube channel Hackers Realm