Hackers Realm

Convert Speech to Text using Python | Speech Recognition | Machine Learning Project Tutorial

Converting speech to text is a basic speech recognition task in which an audio input is converted into text using machine learning. It is useful for turning customer service recordings, interview calls, or any other audio file into text. It enables the recognition and translation of spoken language into text through computational linguistics.

In this project tutorial, we will install the SpeechRecognition module, use the Google Speech Recognition API to convert real-time audio from a microphone to text, and also convert an audio file to text.


Project Information

The objective of the project is to convert speech to text in real time and to convert an audio file to text. It uses the Google Speech API to perform the conversion.

Tools used:
  • speech_recognition

  • Google Speech API

First, we install the SpeechRecognition module and PyAudio, which is required for microphone input.

# install the SpeechRecognition module
!pip install speechrecognition
# PyAudio is needed for sr.Microphone(); -y skips conda's confirmation prompt
!conda install -y pyaudio

The output below shows pip failing to build PyAudio from source. This usually happens when no prebuilt PyAudio package matches your Python version, so pip tries to compile it; installing PyAudio with conda, as above, is a common workaround.

Requirement already satisfied: speechrecognition in c:\programdata\anaconda3\lib\site-packages (3.8.1)
Collecting PyAudio
  Using cached PyAudio-0.2.11.tar.gz (37 kB)
Building wheels for collected packages: PyAudio
  Building wheel for PyAudio (setup.py): started
  Building wheel for PyAudio (setup.py): finished with status 'error'
  Running setup.py clean for PyAudio
Failed to build PyAudio
Installing collected packages: PyAudio
    Running setup.py install for PyAudio: started
    Running setup.py install for PyAudio: finished with status 'error'
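Before moving on, it can help to confirm that both packages are actually importable, since a failed build leaves PyAudio missing. The sketch below uses only the standard library's importlib.util.find_spec; the helper name check_modules is our own, and note that the import names (speech_recognition, pyaudio) differ from the package names used with pip and conda.

```python
import importlib.util

# Import names of the packages installed above; they differ from the
# install names (SpeechRecognition -> speech_recognition, PyAudio -> pyaudio).
REQUIRED = ["speech_recognition", "pyaudio"]


def check_modules(names):
    """Return {module_name: True/False} depending on whether Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED).items():
        print(f"{name}: {'OK' if found else 'missing'}")
```

Only the microphone example needs pyaudio; converting an audio file with sr.AudioFile works without it.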

Now we import the module

# import the module
import speech_recognition as sr

Next, we initialize a Recognizer, the object that performs the speech recognition

# initialize
r = sr.Recognizer()

Convert Speech to Text in Real Time

We will convert real-time audio from a microphone into text

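while True:
    try:
        with sr.Microphone() as source:
            # calibrate the energy threshold to the background noise level
            r.adjust_for_ambient_noise(source, duration=0.3)
            print("Speak now")
            # capture the audio until the speaker pauses
            audio = r.listen(source)
        # send the audio to Google's speech API for transcription
        text = r.recognize_google(audio)
        print("Speaker:", text)
        # saying "quit" ends the loop
        if text.lower() == 'quit':
            break
    except sr.UnknownValueError:
        # the speech could not be understood
        print('Please say again!!!')
    except sr.RequestError as e:
        # the API could not be reached (e.g. no internet connection)
        print("API request failed:", e)
        break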

Speak now
Speaker: welcome to the channel
Speak now
Speaker: testing speech recognition
Speak now
Speaker: quit

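  • Microphone() - Opens the default microphone as the audio source (requires PyAudio)

  • adjust_for_ambient_noise(source, duration=0.3) - Listens to up to 0.3 seconds of background sound and calibrates the recognizer's energy threshold to it, so ambient noise is less likely to be mistaken for speech

  • listen(source) - Records from the source until it detects a pause and returns the captured audio

  • recognize_google(audio) - Sends the audio to the Google Speech Recognition API and returns the transcribed text; it raises UnknownValueError if the speech cannot be understood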

  • text == 'quit' - Condition to quit the while loop
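To make the role of adjust_for_ambient_noise more concrete: roughly speaking, the recognizer compares the energy of incoming audio against its energy_threshold attribute, and calibration moves that threshold above the measured background level. The sketch below is a simplified, standalone model of that idea, not the library's actual implementation. The helper names and the ratio of 1.5 are illustrative assumptions, and it assumes raw signed 16-bit little-endian mono PCM frames.

```python
import math
import struct


def rms_16bit(frame: bytes) -> float:
    """Root-mean-square energy of a frame of signed 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def calibrate_threshold(noise_frames, ratio=1.5):
    """Place the threshold a fixed ratio above the mean background energy."""
    energies = [rms_16bit(f) for f in noise_frames]
    return ratio * sum(energies) / len(energies)


def is_speech(frame: bytes, threshold: float) -> bool:
    """Treat a frame as speech when its energy exceeds the threshold."""
    return rms_16bit(frame) > threshold


if __name__ == "__main__":
    # synthetic frames: quiet "background" and louder "speech"
    quiet = struct.pack("<4h", 100, -100, 100, -100)
    loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
    threshold = calibrate_threshold([quiet, quiet])
    print(threshold, is_speech(quiet, threshold), is_speech(loud, threshold))
    # prints: 150.0 False True
```

A longer calibration duration gives a steadier estimate of the background level, at the cost of a longer wait before the user can speak.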

Convert Audio to Text

Now we will process and convert an audio file into text
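sr.AudioFile reads WAV (PCM), AIFF/AIFF-C and FLAC files. Before transcribing a WAV file, it can be useful to check its channel count, sample rate and duration. Here is a minimal sketch using Python's built-in wave module; describe_wav is a hypothetical helper name, and it assumes an uncompressed PCM WAV file (wave raises wave.Error for formats it cannot read).

```python
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, print(describe_wav('test.wav')) shows whether the file is mono or stereo and how long it is.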

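with sr.AudioFile('test.wav') as source:
    print("listening to audio")
    # read the entire file (listen() would stop at the first pause)
    audio = r.record(source)

try:
    # send the audio to Google's speech API for transcription
    text = r.recognize_google(audio)
    print("Audio:", text)
except sr.UnknownValueError:
    print("Could not understand the audio")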

listening to audio
Audio: welcome to speech recognition
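Comparing the output with the audio by ear works for a short clip. For a more objective measure, speech-to-text output is commonly scored with word error rate (WER): the word-level edit distance between a correct reference transcript and the recognized text, divided by the number of reference words. A minimal pure-Python sketch follows; word_error_rate is a hypothetical helper, and it assumes you have a reference transcript to compare against.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

For example, word_error_rate("welcome to speech recognition", "welcome to the speech recognition") returns 0.25 (one inserted word out of four reference words), and a perfect transcript scores 0.0.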

  • The recognized text matches the speech in the audio file

  • For longer audio files, split them into smaller segments before transcribing; the free Google API used by recognize_google is meant for short clips, and long requests may fail or time out
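One way to do this splitting is sketched below: split_wav cuts a PCM WAV file into fixed-length chunk files with the standard wave module, and transcribe_chunks runs recognize_google on each chunk and joins the results. Both helper names, the 30-second default and the "chunks" output folder are illustrative assumptions, not part of the original project.

```python
import os
import wave


def split_wav(path, chunk_seconds=30, out_dir="chunks"):
    """Split a PCM WAV file into consecutive fixed-length chunk files.

    Returns the chunk paths in order; the last chunk may be shorter.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            data = src.readframes(frames_per_chunk)
            if not data:
                break
            chunk_path = os.path.join(out_dir, f"chunk_{index:03d}.wav")
            with wave.open(chunk_path, "wb") as dst:
                # copy the source format; wave patches the header's
                # frame count to match the data actually written
                dst.setparams(params)
                dst.writeframes(data)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths


def transcribe_chunks(chunk_paths):
    """Transcribe each chunk with recognize_google and join the results."""
    import speech_recognition as sr  # only needed when transcribing

    r = sr.Recognizer()
    parts = []
    for chunk_path in chunk_paths:
        with sr.AudioFile(chunk_path) as source:
            audio = r.record(source)
        try:
            parts.append(r.recognize_google(audio))
        except sr.UnknownValueError:
            pass  # skip chunks with no recognizable speech
    return " ".join(parts)
```

Usage: transcribe_chunks(split_wav('test.wav')). Fixed-length cuts can split a word across two chunks; splitting on silence instead (for example with pydub's split_on_silence) is a common alternative. Recognizer.record also accepts duration and offset arguments, so consecutive segments can be read from a single sr.AudioFile without writing chunk files.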

Final Thoughts

  • A very useful tool for converting real-time recordings into text, which can help with chats, interviews, narration, captions, etc.

  • You can also combine this process with emotional speech recognition, or further analyze the transcribed text with sentiment analysis.

  • The Google Speech Recognition API is effective and accurate, but you can use another engine if you prefer; the speech_recognition module also supports others, such as the offline CMU Sphinx engine through recognize_sphinx().

In this project tutorial, we explored converting speech to text using the Google Speech Recognition module. We installed the module and converted both real-time audio from a microphone and an audio file into text.


Thanks for reading the article!!!

Check out more project videos from the YouTube channel Hackers Realm