Google Colaboratory

Why a cloud solution?

In a previous article, I strongly suggested that you use Jupyter to design and work on your machine learning models. I have not changed my mind, quite the contrary. Nevertheless, Jupyter as it stands has one big drawback: it must be installed! Admittedly, with Anaconda that is no trouble, since it takes a single click.

But unless you have a war machine at your disposal (lucky you), you will need more power as soon as you start dealing with large volumes of data. What’s more, not everyone has a GPU at hand!

In short, a simple answer is to move to a cloud solution!

Several solutions let you run Jupyter in the cloud. How about a 100% free one? For that, I highly recommend Google Colaboratory.

Google Colaboratory in brief

Colaboratory is a Google incubation project created for collaboration (as the name suggests), training and research related to Machine Learning. It is a Jupyter notebook environment that requires no configuration and runs entirely in the Google cloud.
One constraint: Colaboratory notebooks are saved in Google Drive. In return, they can be shared just like Google Docs or Sheets documents. A GitHub gateway (maybe not for long) is also available.

Colaboratory is of course free: all you need is a Google account.

For more information on Colaboratory, I invite you to read the FAQ.

What’s really great is that you can use a GPU for free for 12 hours (continuous use)!
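
To check from a notebook cell whether a GPU is actually attached to your session, here is a minimal sketch. It assumes the runtime exposes the NVIDIA command-line tool nvidia-smi, which is not stated in this article; the helper name gpu_available is made up for the illustration.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is present and lists at least one GPU."""
    # Assumption: an NVIDIA GPU runtime puts nvidia-smi on the PATH.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_available())
```

If this prints False in Colaboratory, the notebook is probably running on a CPU-only runtime and the hardware accelerator has to be changed in the runtime settings.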

Getting started with Google Colaboratory

Ok, type this URL into your browser: https://colab.research.google.com/

Right away, Google Colaboratory offers to create a notebook, open one from Google Drive or GitHub, or upload one directly from your computer.

You even have some interesting examples to consult to get the most out of the tool.

So create a notebook or, like me, open one from Google Drive. Good news: Jupyter notebooks are of course compatible with Google Colaboratory.

Getting started is quick for anyone already used to Jupyter. The environment is almost identical, apart from a few tricks and some small (and very useful) added features.

Just a catch: the data files!

And yes, it was all a sweet dream until now. The cloud does make life easier when it comes to installation and, more generally, machine power, but one problem remains:

You have to be able to interact with the rest of the world, and therefore be able to read and write flat files at a minimum!

That sounds like plain common sense. Unfortunately, this is not the most fun part of the solution. I told you that your notebooks are stored in Google Drive. That is one thing, but data can come from many places. For this article, I suggest you place your files in Google Drive. We will see how to retrieve them in Google Colaboratory… because unfortunately it is not automatic!

You will find several examples and ways of doing it (via PyDrive, the API, etc.) in the examples provided by Google. Despite these examples, I had a hard time with the file retrieval step. Here’s how to do it easily with PyDrive.

Download a file from Google Drive -> Colaboratory

First of all, I have a file (here sample1000.csv) stored in Google Drive.

To retrieve this file I need its Google ID. Here’s how to get it:

  • Right-click on the file
  • Choose Get Shareable Link from the drop-down menu
  • Copy the URL, but keep only the ID

NB: For example, in the URL below we keep only the part after id=: https://drive.google.com/open?id=1Pl-GxINYFcXL2ASaQjo_BFFiRVIZUObB
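
If you would rather not cut the ID out of the link by hand, the extraction can be sketched with the standard library. The function name extract_drive_id is made up for this illustration, and the second URL form it handles (/file/d/<ID>/view) is an assumption about other share links you may meet, not something shown in this article.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_drive_id(url: str) -> Optional[str]:
    """Return the file ID found in a Google Drive share URL, or None."""
    parsed = urlparse(url.strip())
    # Form used in this article: https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0].strip()
    # Assumed alternative form: https://drive.google.com/file/d/<ID>/view
    parts = [p for p in parsed.path.split("/") if p]
    if "d" in parts and parts.index("d") + 1 < len(parts):
        return parts[parts.index("d") + 1]
    return None


url = "https://drive.google.com/open?id=1Pl-GxINYFcXL2ASaQjo_BFFiRVIZUObB"
print(extract_drive_id(url))  # prints 1Pl-GxINYFcXL2ASaQjo_BFFiRVIZUObB
```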

Now, back in our notebook, enter this code in a cell:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Google authentication
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download the file, identified by its Google Drive ID
file_id = '1Pl-GxINYFcXL2ASaQjo_BFFiRVIZUObB'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('sample1000.csv')

There it is: the file is now present in the Colaboratory environment. As you may have noticed, there is no need to specify the directory: the ID lets Google find the file wherever it is in your Drive.

You then just have to read it as usual, with Pandas for example:

import pandas as pd
pd.read_csv('sample1000.csv').head()

Be careful: before anything else, you have to install the PyDrive library. This is done with the pip command, directly in a notebook cell. For example, you can add this line at the very start of the previous code:

!pip install -U -q PyDrive
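
If you want to avoid reinstalling the library every time the cell runs, one possible variant is to test for the module first. This is only a sketch: the helper names is_installed and ensure_pydrive are made up, and it assumes the package is imported under the name pydrive, as in the code above.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def ensure_pydrive() -> None:
    """Install PyDrive with pip only when the pydrive module is missing."""
    if not is_installed("pydrive"):
        # Same options as the notebook command: upgrade (-U) and quiet (-q).
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-U", "-q", "PyDrive"]
        )


print("pydrive installed:", is_installed("pydrive"))
```

Calling ensure_pydrive() at the top of the notebook then plays the same role as the pip line, without the leading exclamation mark that only works in notebook cells.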

Upload a file from Colaboratory -> Google Drive

Now that you can work with your data, you will probably want to retrieve the results of your work (your predictions, for example).

This piece of code will help you do that:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# 2. Create a text file and upload it from here to Google Drive.
uploaded = drive.CreateFile({'title': 'sample1000_resultat.csv'})
uploaded.SetContentString('File content here :-)')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
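
The example above uploads a hard-coded string. To send a file that already exists in the Colaboratory environment (a CSV of predictions, say), a small variant is sketched below. It relies on PyDrive's SetContentFile method in place of SetContentString; the helper name upload_local_file and the file name predictions.csv are made up for the illustration.

```python
def upload_local_file(drive, local_path: str, title: str) -> str:
    """Upload an existing local file to Google Drive and return its Drive ID.

    `drive` is expected to be an authenticated PyDrive GoogleDrive client,
    built as in the code above.
    """
    uploaded = drive.CreateFile({'title': title})
    # SetContentFile reads the content from a file on disk,
    # whereas SetContentString takes it from a Python string.
    uploaded.SetContentFile(local_path)
    uploaded.Upload()
    return uploaded.get('id')


# Hypothetical usage, once `drive` exists and predictions.csv has been written:
# file_id = upload_local_file(drive, 'predictions.csv', 'sample1000_resultat.csv')
```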

And that is enough to get your foot in the door with this tool.
Please feel free to share your thoughts with me in the comments below.

Benoit Cayla

In more than 15 years, I have built up solid experience across various integration projects (data & applications). I have worked in nine different companies and successively adopted the point of view of the service provider, the customer and the software vendor. This experience, which made me almost omniscient in my field, naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in sectors such as insurance and finance. Really passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject matter skills with automation to help my customers automate complex business processes more efficiently. In parallel with my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr. Learning, convincing through arguments and passing on my knowledge could be my characteristic triptych.
