Speech recognition with Python & Google

After exploring text, images and video, I absolutely had to take a look at the audio side. That is the goal of this article, which is above all a practical one. The idea is very simple: how do you capture speech and transcribe it into text with your computer?

If you want to create your own digital assistant, this is clearly how and where to start …

Python offers a rich environment for this voice-to-text transcription; we just need a couple of libraries:

  • pyAudio (https://pypi.org/project/PyAudio/)
pip install PyAudio

Be careful: if, like me, you run Ubuntu, the installation via pip may not work. In that case, prefer:

sudo apt-get install python-pyaudio python3-pyaudio
  • speech_recognition
pip install SpeechRecognition

This library performs the voice-to-text transcription. It can rely on several recognition engines; this article uses Google's, through the recognize_google() function.

Microphone management

First of all, import and initialize speech_recognition:

import speech_recognition as sr
r = sr.Recognizer()

Then we can list the microphones available on the computer:

sr.Microphone.list_microphone_names()

['HDA Intel HDMI: 0 (hw:0,3)',
 'HDA Intel HDMI: 1 (hw:0,7)',
 'HDA Intel HDMI: 2 (hw:0,8)',
 'HDA Intel HDMI: 3 (hw:0,9)',
 'HDA Intel HDMI: 4 (hw:0,10)',
 'HDA Intel PCH: ALC3232 Analog (hw:1,0)',
 'HDA NVidia: HDMI 0 (hw:2,3)',
 'HDA NVidia: HDMI 1 (hw:2,7)',
 'HDA NVidia: HDMI 2 (hw:2,8)',
 'HDA NVidia: HDMI 3 (hw:2,9)',

At this point you must choose the right microphone by passing its position in the list (counted from 0) as the device_index parameter. Here index 5 is the analog input:

micro = sr.Microphone(device_index=5)

Or just use the default one:

micro = sr.Microphone()
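
Counting entries by hand is error-prone, and the order can change from one machine to another. As a minimal sketch, the index can be looked up by name instead. The helper find_microphone_index is hypothetical (it is not part of speech_recognition), and the names are the first six entries of the listing above:

```python
def find_microphone_index(names, keyword):
    """Return the index of the first device whose name contains keyword.

    The comparison ignores case. Returns None when nothing matches.
    """
    for index, name in enumerate(names):
        if name is not None and keyword.lower() in name.lower():
            return index
    return None


# Device names copied from the listing above (first six entries).
# With the library you would use: names = sr.Microphone.list_microphone_names()
names = [
    'HDA Intel HDMI: 0 (hw:0,3)',
    'HDA Intel HDMI: 1 (hw:0,7)',
    'HDA Intel HDMI: 2 (hw:0,8)',
    'HDA Intel HDMI: 3 (hw:0,9)',
    'HDA Intel HDMI: 4 (hw:0,10)',
    'HDA Intel PCH: ALC3232 Analog (hw:1,0)',
]
print(find_microphone_index(names, "analog"))  # 5
```

The result can be passed straight to sr.Microphone(device_index=...). Returning None when nothing matches is convenient here, because device_index=None selects the default microphone.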

First live recording

Doing your first speech recognition is extremely easy and takes only a few lines of Python. We open the microphone channel (line 1) and listen …

with micro as source:
    print("Speak!")
    audio_data = r.listen(source)
    print("End!")
result = r.recognize_google(audio_data)
print(">", result)

Note the call to recognize_google(), which sends your audio to the Google service and returns the transcribed text. If you said “good morning”, the result should be:

> good morning

How does it work?

  • Between the display of Speak! and End!, say “good morning” (we’ll see later how to handle other languages such as French).
  • Once End! is displayed, you will notice that the execution pauses. The program is calling the Google service and waiting for the text transcription of the audio.
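
Because the transcription depends on a remote service, that call can fail. A minimal sketch of how the two usual failure cases might be handled is given below. The transcribe helper is hypothetical; sr.UnknownValueError and sr.RequestError are the exceptions speech_recognition raises when the speech is unintelligible and when the service cannot be reached.

```python
def transcribe(recognizer, audio_data, language="en-US"):
    """Return the transcript, or None if recognition failed."""
    # Imported here so the helper can be defined before the library is needed.
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        # Google received the audio but could not make out any words.
        print("Could not understand the audio, please try again")
    except sr.RequestError as error:
        # The service could not be reached (no network, quota exceeded, ...).
        print("Google request failed:", error)
    return None


# Usage with the objects defined earlier in the article:
#   result = transcribe(r, audio_data)
#   if result is not None:
#       print(">", result)
```

Returning None rather than raising keeps the calling loop simple: a voice assistant can just ask the user to repeat.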

Saving a wav file

It can be useful to record your voice in a wav file in order to transcribe it later.

For that we will use pyAudio like this:

import pyaudio
import wave
chunk = 1024  # Record in chunks of 1024 samples
sample_format = pyaudio.paInt16  # 16 bits per sample
channels = 2
fs = 44100  # Record at 44100 samples per second
seconds = 10
filename = "output.wav"
p = pyaudio.PyAudio()  # Create an interface to PortAudio
print('Start Recording ...')
stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)
frames = []  # Initialize array to store frames
# Store data in chunks for 10 seconds
for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    frames.append(data)
# Stop and close the stream
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()
print('... Finished recording')
# Save the recorded data as a WAV file
wf = wave.open(filename, 'wb')
wf.setnchannels(channels)
wf.setsampwidth(p.get_sample_size(sample_format))
wf.setframerate(fs)
wf.writeframes(b''.join(frames))
wf.close()

This portion of code records your microphone for 10 seconds and stores the result in the output.wav file.
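
Before sending a recording to Google, it can be worth checking that the file really contains what you expect. The sketch below reads the header with the standard wave module. The wav_info helper is hypothetical, and a one-second file of silence stands in for output.wav so that the example runs without a microphone.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (channels, sample_width_bytes, frame_rate, duration_seconds)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return wf.getnchannels(), wf.getsampwidth(), rate, frames / rate


# Stand-in for output.wav: one second of stereo 16-bit silence.
demo_path = os.path.join(tempfile.gettempdir(), "wav_info_demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)       # 2 bytes = 16 bits, like pyaudio.paInt16
    wf.setframerate(44100)
    wf.writeframes(b"\x00" * (44100 * 2 * 2))  # frames * channels * bytes

print(wav_info(demo_path))  # (2, 2, 44100, 1.0)
```

On a file recorded with the settings above, the duration should come out slightly under 10 seconds: the loop reads a whole number of 1024-sample chunks (430 of them), which is about 9.98 seconds of audio.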

Voice recognition in French with Google

Imagine that you recorded your voice saying the following words:

"L'histoire commence un beau matin tout le monde va bien les élèves sont heureux"

r = sr.Recognizer()
with sr.AudioFile(filename) as source:
    audio = r.record(source)
    try:
        data = r.recognize_google(audio, language="fr-FR")
        print(data)
    except (sr.UnknownValueError, sr.RequestError):
        print("Please try again")

Histoire commence un beau matin tout le monde va bien les élèves sont heureux

Note the language="fr-FR" option, which selects a French speech recognition model.

And there you have it: we saw in this article how to transcribe voice to text with Python and Google Speech Recognition. In a future article we may add a touch of NLP in order to start a simple voice assistant, much like we did for the analysis of movie reviews.

Benoit Cayla

In more than 15 years, I have built up solid experience in various integration projects (data & applications). I have worked in nine different companies and successively adopted the vision of the service provider, the customer and the software editor. This experience, which made me almost omniscient in my field, naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in sectors such as insurance and finance. Really passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject matter skills with automation to help my customers automate complex business processes more efficiently. In parallel with my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr. Learning, convincing by argument and passing on my knowledge could be my characteristic triptych.
