Speech recognition with Python & Google


After working with text, images and video, I absolutely had to take a look at the audio side. That is the goal of this article, which is first and foremost a hands-on exercise. The idea of this post is very simple: how do you capture speech and transcribe it into text with your computer?

If you want to create your own digital assistant, this is clearly how and where to start anyway…

Python is a language, or rather a rich ecosystem, well suited to this voice-to-text transcription, so we just need a few libraries:

pyAudio (https://pypi.org/project/PyAudio/)

pip install PyAudio

Be careful: if, like me, you run Ubuntu, the installation via pip may fail, because building PyAudio requires the PortAudio development headers. Instead, prefer the distribution packages:

sudo apt-get install python-pyaudio python3-pyaudio

speech_recognition

pip install SpeechRecognition

This library performs speech-to-text transcription and can rely on several engines:

CMU Sphinx (offline)
Google Speech Recognition (the one we’re going to use here)
Google Cloud Speech API
Wit.ai
Microsoft Bing Voice Recognition
Houndify API
IBM Speech to Text
Snowboy Hotword Detection (offline)
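Before going further, it can be handy to confirm that both libraries are actually importable. The sketch below uses only the standard library; check_modules is a helper name chosen for this example. Note that the import names (pyaudio, speech_recognition) differ from the pip package names (PyAudio, SpeechRecognition).

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Use the import names, not the pip package names
    for name, ok in check_modules(["pyaudio", "speech_recognition"]).items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```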


Microphone management

First of all, import speech_recognition and initialize a recognizer:

import speech_recognition as sr

r = sr.Recognizer()

Then we can list the microphones available on the computer:

sr.Microphone.list_microphone_names()

['HDA Intel HDMI: 0 (hw:0,3)',
 'HDA Intel HDMI: 1 (hw:0,7)',
 'HDA Intel HDMI: 2 (hw:0,8)',
 'HDA Intel HDMI: 3 (hw:0,9)',
 'HDA Intel HDMI: 4 (hw:0,10)',
 'HDA Intel PCH: ALC3232 Analog (hw:1,0)',
 'HDA NVidia: HDMI 0 (hw:2,3)',
 'HDA NVidia: HDMI 1 (hw:2,7)',
 'HDA NVidia: HDMI 2 (hw:2,8)',
 'HDA NVidia: HDMI 3 (hw:2,9)',
 'hdmi',
 'pulse',
 'default']

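At this point you must choose the right microphone by passing its position in this list (counting from 0) as the device_index parameter. Here, index 5 is the built-in analog input, 'HDA Intel PCH: ALC3232 Analog (hw:1,0)':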

micro = sr.Microphone(device_index=5)

Or just use the default one:

micro = sr.Microphone()
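Rather than hard-coding the index 5, you can look the device up by part of its name. This is a minimal sketch; find_microphone_index is a hypothetical helper, not part of speech_recognition, and the list below simply copies the device names printed above.

```python
# Device names as returned by sr.Microphone.list_microphone_names() in the listing above
EXAMPLE_NAMES = [
    'HDA Intel HDMI: 0 (hw:0,3)',
    'HDA Intel HDMI: 1 (hw:0,7)',
    'HDA Intel HDMI: 2 (hw:0,8)',
    'HDA Intel HDMI: 3 (hw:0,9)',
    'HDA Intel HDMI: 4 (hw:0,10)',
    'HDA Intel PCH: ALC3232 Analog (hw:1,0)',
    'HDA NVidia: HDMI 0 (hw:2,3)',
    'HDA NVidia: HDMI 1 (hw:2,7)',
    'HDA NVidia: HDMI 2 (hw:2,8)',
    'HDA NVidia: HDMI 3 (hw:2,9)',
    'hdmi',
    'pulse',
    'default',
]


def find_microphone_index(names, keyword):
    """Return the index of the first device whose name contains keyword
    (case-insensitive), or None if no device matches."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if keyword in name.lower():
            return index
    return None


if __name__ == "__main__":
    print(find_microphone_index(EXAMPLE_NAMES, "ALC3232"))  # → 5
    # With a real microphone this could become, for example:
    # micro = sr.Microphone(device_index=find_microphone_index(
    #     sr.Microphone.list_microphone_names(), "ALC3232"))
```

Since sr.Microphone treats device_index=None as "use the default device", a failed lookup simply falls back to the default microphone.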

First live recording

Running your first speech recognition is extremely easy and takes only a few lines of Python. We open the microphone as an audio source (line 1) and listen…

with micro as source:
    print("Speak!")
    audio_data = r.listen(source)
    print("End!")
result = r.recognize_google(audio_data)
print(">", result)

Note on line 5 the call to recognize_google(), which sends your audio to Google's speech recognition service and returns the transcribed text. If you said “good morning”, the result should be:

Speak!
End!
> good morning

How does it work?

Once Speak! is displayed, say “Good morning” (we’ll see later how to handle other languages such as French). End! appears as soon as listen() detects that you have stopped talking.
Once End! is displayed, you will notice that the program seems to be waiting for something: it has sent the recording to Google's service and is waiting for the text transcription to come back.
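How does listen() know you have stopped talking? Roughly, it keeps recording until the audio energy stays below the recognizer's energy_threshold (300 by default, adjusted automatically unless dynamic_energy_threshold is turned off) for pause_threshold seconds (0.8 by default). The sketch below illustrates that idea on raw 16-bit mono samples. It is a simplified illustration written for this article, not the library's actual implementation; rms and end_of_speech are hypothetical helper names.

```python
import math
from array import array


def rms(samples):
    """Root-mean-square energy of a sequence of 16-bit sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def end_of_speech(samples, rate, chunk=1024, energy_threshold=300, pause_threshold=0.8):
    """Return the sample index at which the phrase is considered finished,
    i.e. once pause_threshold seconds of consecutive quiet chunks have
    followed some speech, or None if that never happens."""
    quiet_needed = math.ceil(pause_threshold * rate / chunk)  # quiet chunks required
    heard_speech = False
    quiet_chunks = 0
    for start in range(0, len(samples), chunk):
        if rms(samples[start:start + chunk]) > energy_threshold:
            heard_speech = True   # loud chunk: speech is (still) going on
            quiet_chunks = 0
        elif heard_speech:
            quiet_chunks += 1     # quiet chunk after some speech
            if quiet_chunks >= quiet_needed:
                return start + chunk
    return None


if __name__ == "__main__":
    rate = 16000
    # Synthetic signal: 0.5 s of silence, 1 s of a loud 440 Hz tone, 1 s of silence
    tone = [int(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
    samples = array("h", [0] * (rate // 2) + tone + [0] * rate)
    print(end_of_speech(samples, rate))  # → 37888 (about 2.37 s)
```

The tone ends at 1.5 s; the phrase is declared finished about 0.8 s later, rounded up to whole chunks.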

Saving a wav file

It can be useful to record your voice in a WAV file so you can transcribe it later.

For that we will use PyAudio like this:

import pyaudio
import wave

chunk = 1024  # Record in chunks of 1024 frames
sample_format = pyaudio.paInt16  # 16 bits per sample
channels = 2
fs = 44100  # Record at 44100 samples per second
seconds = 10
filename = "output.wav"

p = pyaudio.PyAudio()  # Create an interface to PortAudio

print('Start Recording ...')

stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)

frames = []  # Initialize list to store frames

# Store data in chunks for the requested number of seconds
for _ in range(int(fs / chunk * seconds)):
    frames.append(stream.read(chunk))

# Stop and close the stream
stream.stop_stream()
stream.close()
# Sample width in bytes (2 for paInt16), then terminate the PortAudio interface
sample_width = p.get_sample_size(sample_format)
p.terminate()

print('... Finished recording')

# Save the recorded data as a WAV file
with wave.open(filename, 'wb') as wf:
    wf.setnchannels(channels)
    wf.setsampwidth(sample_width)
    wf.setframerate(fs)
    wf.writeframes(b''.join(frames))

This code records from the default input device for 10 seconds and stores the result in the output.wav file.
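A quick check of the numbers: the loop reads int(fs / chunk * seconds) = int(44100 / 1024 * 10) = 430 chunks, i.e. 430 × 1024 = 440,320 frames, so the file holds about 9.98 s rather than exactly 10 s. With 2 channels of 16-bit samples, that is 440,320 × 2 × 2 = 1,761,280 bytes of audio data. The sketch below, which relies only on the standard wave module, computes that frame count and reads the header of a WAV file back so you can check a recording before transcribing it; expected_frames and wav_info are helper names chosen for this example.

```python
import os
import tempfile
import wave


def expected_frames(fs, chunk, seconds):
    """Frames captured by the recording loop above, which reads whole chunks only."""
    return int(fs / chunk * seconds) * chunk


def wav_info(path):
    """Return the main header fields of a WAV file and its duration in seconds."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),  # bytes per sample
            "frame_rate": rate,
            "frames": frames,
            "duration": frames / rate,
        }


if __name__ == "__main__":
    print(expected_frames(44100, 1024, 10))  # → 440320

    # Write 0.5 s of silence in the same format as the recording above
    # (2 channels, 16-bit samples, 44100 Hz), then read its header back
    path = os.path.join(tempfile.mkdtemp(), "silence.wav")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(44100)
        wf.writeframes(b"\x00\x00" * 2 * 22050)  # 22050 frames of 2 samples each
    print(wav_info(path))
```

Called on the output.wav produced above, wav_info should report 2 channels, a sample width of 2 bytes and 440,320 frames.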

Voice recognition in French with Google

Imagine that you recorded yourself saying the following sentence (“The story begins one fine morning, everyone is doing well, the pupils are happy”):

"L'histoire commence un beau matin tout le monde va bien les élèves sont heureux"

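r = sr.Recognizer()
with sr.AudioFile(filename) as source:
    audio = r.record(source)
    try:
        data = r.recognize_google(audio, language="fr-FR")
        print(data)
    except sr.UnknownValueError:
        print("Please try again")  # the speech could not be understood
    except sr.RequestError as e:
        print("Could not reach the Google service:", e)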

Histoire commence un beau matin tout le monde va bien les élèves sont heureux

Note on line 5 the language="fr-FR" option, which selects Google's French speech recognition model.
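recognize_google() also accepts show_all=True, in which case it returns Google's raw result instead of a single string. The shape assumed here, to be checked against your installed version, is a dict with an "alternative" list of candidate transcripts, some of which may carry a "confidence" score, or an empty list when nothing was recognized. The helper best_transcript is a name made up for this example, and the sample response is invented for illustration.

```python
def best_transcript(response):
    """Pick the most confident transcript from a show_all=True response.

    Assumes the shape {"alternative": [{"transcript": ..., "confidence": ...}, ...]};
    returns None for an empty response (nothing recognized).
    """
    if not response or not response.get("alternative"):
        return None
    # Alternatives without a confidence score are treated as 0.0
    best = max(response["alternative"], key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"]


# Invented sample response, for illustration only
SAMPLE_RESPONSE = {
    "alternative": [
        {"transcript": "histoire commence un beau matin", "confidence": 0.91},
        {"transcript": "l'histoire commence un beau matin"},
    ],
    "final": True,
}

if __name__ == "__main__":
    print(best_transcript(SAMPLE_RESPONSE))  # → histoire commence un beau matin
    # With a real recording this could be, for example:
    # best_transcript(r.recognize_google(audio, language="fr-FR", show_all=True))
```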

And there you have it: in this article we saw how to transcribe voice to text with Python and Google Speech Recognition. In a future article we may add a touch of NLP to start building a simple voice assistant, much as we did for the analysis of movie reviews.