The Matplotlib library

The Python matplotlib library lets you create 2D data visualizations in an extremely convenient way. It is not essential, of course, but it can be very practical when working on Machine Learning projects. The objective of this tutorial is to get you started with this famous library.

Note: This library comes with the Anaconda distribution.

Many chart types are supported, so I won't detail them all in this tutorial. Just remember that this library covers many kinds of graphs: lines, curves, histograms, points, bars, pie charts, tables, polar plots, etc.

This tutorial explains how to draw curves and histograms. For the other chart types, I suggest you consult the documentation on the official site here.

The dataset used is simply the Titanic dataset from Kaggle.com: train.csv.

Draw curves

To begin with, you obviously have to import the Python library with the import statement. By convention, we use the alias plt. We also take the opportunity to load the dataset, which we limit to its first 5 rows to keep the graphs readable (for this tutorial):

import pandas as pd
import matplotlib.pyplot as plt
titanic = pd.read_csv("./data/train.csv")[:5]
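If you do not have train.csv at hand, a minimal sketch like the following can stand in for it. The rows are typed by hand (an assumption: they are meant to resemble the first rows of the Kaggle file, but treat them as sample values) and only the columns used in this tutorial are kept; the variable names are illustrative. It also shows that [:5] on a DataFrame selects rows by position, like head(5).

```python
import pandas as pd

# Hand-typed sample rows (assumed to resemble train.csv); only the columns used here
full = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583],
    "Embarked": ["S", "C", "S", "S", "S", "Q"],
})

# On a DataFrame, an integer slice selects rows by position
titanic = full[:5]
# head(5) is a more explicit way to get the same first five rows
first_rows = full.head(5)
```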

Labels on the axes

Here we plot the ticket prices (Fare column) and label the axes with the xlabel and ylabel functions:

plt.plot(titanic.Fare)
plt.ylabel('Ticket price')
plt.xlabel('Passengers')
plt.show()

We get:
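The plt functions above act on the "current" graph. Matplotlib also offers an object-oriented style in which you create the figure and its axes explicitly and call methods on them; set_xlabel() and set_ylabel() are the Axes counterparts of xlabel() and ylabel(). A minimal sketch, with a few hand-typed fares standing in for the dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for the Fare column of the first five rows
titanic = pd.DataFrame({"Fare": [7.25, 71.2833, 7.925, 53.1, 8.05]})

# Object-oriented style: create a Figure and an Axes, then call methods on the Axes
fig, ax = plt.subplots()
ax.plot(titanic.Fare)
ax.set_ylabel("Ticket price")  # Axes counterpart of plt.ylabel()
ax.set_xlabel("Passengers")    # Axes counterpart of plt.xlabel()
```

In a script, call plt.show() afterwards to open the window; in a notebook the figure is usually displayed automatically.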

Assigning the abscissa data

So far we have only given values on the ordinate (y-axis). Let us now also specify the values on the abscissa (x-axis), here the passenger number:

plt.plot(titanic.PassengerId, titanic.Fare)
plt.ylabel('Ticket price')
plt.xlabel('Passengers')
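To see what "giving only ordinate values" means, the sketch below plots the same fares twice: once with y values only (a plain NumPy array), in which case matplotlib generates the abscissa as the positions 0, 1, 2, ..., and once with the passenger numbers as x. The fares are hand-typed sample values and the variable names are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

fig, ax = plt.subplots()
# y only (plain NumPy array): x defaults to the positions 0, 1, 2, ...
implicit_x, = ax.plot(titanic.Fare.to_numpy())
# x and y: the first argument supplies the abscissa explicitly
explicit_x, = ax.plot(titanic.PassengerId, titanic.Fare)
```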

Stacking lines

It is often useful to overlay several curves to analyze the data better. With matplotlib this is very simple: just chain several calls to the plot function:

plt.plot(titanic.PassengerId, titanic.Fare)
plt.plot(titanic.PassengerId, titanic.Survived)
plt.plot(titanic.PassengerId, titanic.Age)
plt.ylabel('Ticket price')
plt.xlabel('Passengers')

Here is the result of the 3 lines on the same graph:
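Each call to plot() simply adds one more line to the same axes. If you give each curve a label when you create it, legend() can later build the legend without extra arguments. A minimal sketch with hand-typed sample values (the loop over column names is only a convenience):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

fig, ax = plt.subplots()
# Each plot() call adds one more Line2D to the same Axes
for column in ["Fare", "Survived", "Age"]:
    ax.plot(titanic.PassengerId, titanic[column], label=column)
ax.set_xlabel("Passengers")
ax.legend()  # uses the label= given to each curve
```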

Changing the look

Once again, the library makes our life easier: it lets us change the color, the line style or the line width in the blink of an eye, simply by passing options to the plot function.

Here are the available codes/options:

  • Line style codes: - (solid), -- (dashed) or -. (dash-dot)
  • Color codes: b, g, r, c, m, y, k, w (blue, green, red, cyan, magenta, yellow, black, white)
  • Line width: the linewidth parameter
plt.plot(titanic.PassengerId, titanic.Fare, "k-.", linewidth=5)
plt.plot(titanic.PassengerId, titanic.Survived, "r", linewidth=1)
plt.plot(titanic.PassengerId, titanic.Age, "c", linewidth=10)
plt.ylabel('Ticket price')
plt.xlabel('Passengers')

The result:
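The short format string is just a compact spelling of keyword arguments. In the sketch below (sample values again, illustrative variable names), "k-." gives the same style as color="black" and linestyle="dashdot"; the long form is easier to read when you are not used to the codes.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

fig, ax = plt.subplots()
# Short format string: "k" = black, "-." = dash-dot line
short_form, = ax.plot(titanic.PassengerId, titanic.Fare, "k-.", linewidth=5)
# Same style spelled out with keyword arguments
long_form, = ax.plot(titanic.PassengerId, titanic.Fare,
                     color="black", linestyle="dashdot", linewidth=5)
```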

Add a grid

Simply add a call to the grid() function:

plt.grid(True)

Here is the result:
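grid() also accepts options. As a sketch with sample values, the call below draws only the horizontal lines (axis="y"), dashed and semi-transparent, which often keeps the values readable without cluttering the graph; the style values are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

fig, ax = plt.subplots()
ax.plot(titanic.PassengerId, titanic.Fare)
# Horizontal grid lines only, dashed and semi-transparent
ax.grid(True, axis="y", linestyle="--", alpha=0.5)
```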

Going further in your graph

The matplotlib library has several functions for this:

  • text() and annotate() to add text to the graph (for example, let's add a label marking the maximum price)
  • legend() to add a legend
  • etc.
plt.text(2, 70, '  Max price!')
plt.legend(['Fare', 'Survived', 'Age'])  # one label per curve plotted above, in plotting order

The result:
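text() places a string at fixed coordinates, so the position (2, 70) has to be chosen by hand. annotate() can instead point an arrow at a data point; in the sketch below that point is found with idxmax() rather than hard-coded. The values are hand-typed samples and the text offsets are arbitrary.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

fig, ax = plt.subplots()
ax.plot(titanic.PassengerId, titanic.Fare)

# Locate the row with the highest fare instead of hard-coding its coordinates
row = titanic.loc[titanic.Fare.idxmax()]
# annotate() writes the text at xytext and draws an arrow to the point xy
note = ax.annotate("Max price!",
                   xy=(row.PassengerId, row.Fare),
                   xytext=(row.PassengerId + 1, row.Fare - 10),
                   arrowprops=dict(arrowstyle="->"))
```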

Create graph grids

Just as you can stack several curves in the same graph, you can create grids of graphs. This is very practical for graphs that must be looked at together but do not have the same orders of magnitude on the abscissa and ordinate. To do so, we use the subplot() function, which describes the grid. It takes several arguments:

  1. Number of rows in the graph grid
  2. Number of columns in the graph grid
  3. Index of the graph in the grid (starting at 1)
  4. Options (here we specify a background color)

In the example below we define a grid of 1 row and 2 columns:

plt.subplot(1, 2, 1)
plt.plot(titanic.PassengerId, titanic.Fare, "k-.", linewidth=5)
plt.subplot(1, 2, 2, facecolor='y')
plt.plot(titanic.PassengerId, titanic.Age, "c", linewidth=10)

Note: we also specify that the second graph has a yellow background.

Here is the result:
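The same grid of 1 row and 2 columns can be built in one call with plt.subplots(), which returns the figure and an array of axes (left to right here). A sketch with sample values: set_facecolor() plays the role of the facecolor option passed to subplot(), and tight_layout() is only optional spacing.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# 1 row, 2 columns: axes is a 1-D array of two Axes, left to right
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(titanic.PassengerId, titanic.Fare, "k-.", linewidth=5)
axes[1].plot(titanic.PassengerId, titanic.Age, "c", linewidth=10)
axes[1].set_facecolor("y")  # yellow background, like facecolor='y' in subplot()
fig.tight_layout()          # avoid overlapping labels between the two graphs
```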

Some other available charts

Histogram

The histogram is a very practical chart because it automatically groups your data into intervals (bins) and counts the values in each one. This is very useful, for example, to see the frequency distribution of your values:

plt.hist(titanic.Fare)
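hist() sorts the values into bins (10 by default) and counts how many fall into each. It also returns those counts and the bin edges, which is handy to check what is drawn. A sketch with five hand-typed fares and 4 bins:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic.Fare (first five rows)
titanic = pd.DataFrame({"Fare": [7.25, 71.2833, 7.925, 53.1, 8.05]})

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges and the drawn bars
counts, edges, bars = ax.hist(titanic.Fare, bins=4)
```

With 4 bins there are 5 edges, evenly spaced between the smallest and the largest fare.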

Warning: this function does not handle string values (character labels on the abscissa), at least in the matplotlib version used for this tutorial. To work around this, count the values “manually” and use the bar() function instead.

Here are two Python functions, published as a Gist, that implement this:

Simple histogram

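# Workaround for string labels, which the hist() matplotlib function does not handle
from collections import Counter
import pandas as pd

def hist_string(values):
    # Count the occurrences of each distinct label
    distincts_count = Counter(values)
    # One row per label, with its count in a single column
    df = pd.DataFrame.from_dict(distincts_count, orient='index')
    df.plot(kind='bar', legend=False)

hist_string(titanic.Embarked.dropna())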
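If you prefer to stay within pandas, value_counts() does the same counting as Counter in hist_string() and returns a Series that can be plotted directly as bars. A sketch with sample labels (an alternative, not the Gist's code):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic.Embarked (first five rows)
titanic = pd.DataFrame({"Embarked": ["S", "C", "S", "S", "S"]})

# Count each distinct string label (missing values are dropped by default)
counts = titanic.Embarked.value_counts()

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)  # one bar per label
ax.set_ylabel("Passengers")
```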

Double histogram

# Goal: a double histogram with string labels, which the hist() matplotlib function does not support

import matplotlib.pyplot as plt
import numpy as np
from collections import Counter

# Embarkation ports of the passengers who died
l1 = titanic[titanic["Survived"] == 0]["Embarked"].dropna()
# Embarkation ports of the survivors
l2 = titanic[titanic["Survived"] == 1]["Embarked"].dropna()

# Draw two bar series side by side, one per list of string values
def hist2Str(_l1_Values, _l2_Values, _l1_Label, _l2_Label, _X_Label, _Y_Label):
    step = 0.3  # width of each bar
    # Count the occurrences of each label in both lists
    count1 = Counter(_l1_Values)
    count2 = Counter(_l2_Values)
    # Use the union of the labels so both series share the same x positions
    # (a Counter returns 0 for a label it has never seen)
    labels = sorted(set(count1) | set(count2))
    counts1 = [count1[label] for label in labels]
    counts2 = [count2[label] for label in labels]

    # Draw the histogram
    bar_x = np.arange(len(labels))
    plt.bar(bar_x, counts1, step, align='center')
    plt.bar(bar_x + step, counts2, step, color='r', align='center')
    plt.xticks(bar_x + step / 2, labels)
    # Axis labels, legend & display
    plt.ylabel(_Y_Label)
    plt.xlabel(_X_Label)
    plt.legend([_l1_Label, _l2_Label])
    plt.show()
hist2Str(l1, l2, "Dead", "Survived", "Embarked", "Passenger")
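The same double histogram can also be obtained with pandas: pd.crosstab() counts every (port, outcome) combination and fills missing combinations with 0, and DataFrame.plot(kind="bar") draws one bar per column side by side. This is a swapped-in alternative to hist2Str(), sketched with sample values:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Embarked": ["S", "C", "S", "S", "S"],
})

# One row per port, one column per outcome; missing combinations become 0
table = pd.crosstab(titanic.Embarked, titanic.Survived)
table = table.rename(columns={0: "Dead", 1: "Survived"})

fig, ax = plt.subplots()
table.plot(kind="bar", ax=ax)  # side-by-side bars, legend taken from the columns
ax.set_ylabel("Passenger")
```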

Bar chart

Visually it looks like a histogram, except that, as with the curves above, you specify the data yourself (abscissa and ordinate):

plt.bar(titanic.PassengerId, titanic.Fare, color="b")
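Because bar() takes the heights directly, it is also the tool for the “manual” count mentioned in the histogram warning. A sketch with sample labels: count them with Counter, then pass the labels and counts to bar(). Matplotlib 2.1 and later accept strings as x values and treat them as categories.

```python
from collections import Counter
import matplotlib.pyplot as plt

embarked = ["S", "C", "S", "S", "S"]  # sample values standing in for titanic.Embarked

counts = Counter(embarked)            # "manual" count of each label
labels = sorted(counts)               # fixed, alphabetical order on the x-axis
fig, ax = plt.subplots()
bars = ax.bar(labels, [counts[label] for label in labels], color="b")
```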

Pie chart

Very useful to visualize the share of each value in a whole (like percentages):

plt.pie(titanic.Fare, autopct='%1.1f%%', startangle=180)
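pie() divides each value by the total, so every wedge shows that value's share of the whole; autopct only formats that share as text. The sketch below (sample fares) computes the same percentages with pandas to make this explicit. When autopct is given, pie() returns the wedges, the label texts and the percentage texts.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Share of each fare in the total, i.e. the percentage autopct displays
shares = titanic.Fare / titanic.Fare.sum() * 100

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(titanic.Fare,
                                  labels=titanic.PassengerId.astype(str),
                                  autopct="%1.1f%%", startangle=180)
```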

Scatter

To simply plot a set of points:

plt.scatter(titanic.PassengerId, titanic.Fare)
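scatter() can also map a column to the color of each point through c=, with a colormap and a colorbar to read it. A sketch with sample values that colors the fares by the Survived column; the "coolwarm" colormap is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

fig, ax = plt.subplots()
# c= gives one value per point, converted to a color by the colormap
points = ax.scatter(titanic.PassengerId, titanic.Fare,
                    c=titanic.Survived, cmap="coolwarm")
fig.colorbar(points, ax=ax, label="Survived")
ax.set_xlabel("Passengers")
ax.set_ylabel("Ticket price")
```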

Combining charts into one

As we saw earlier, you simply chain the calls to plot(), scatter(), bar(), etc.:

plt.scatter(titanic.PassengerId, titanic.Fare)
plt.bar(titanic.PassengerId, titanic.Fare, color="y")
plt.plot(titanic.PassengerId, titanic.Age, "c", linewidth=10)
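When you combine chart types, the drawing order matters: bars and scatter points share the same default zorder, so whichever is added last is drawn on top, which here partly hides the points behind the bars. A sketch with sample values that raises the points' zorder and labels each element for the legend:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values standing in for titanic (first five rows)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

fig, ax = plt.subplots()
# A higher zorder keeps the points above the bars drawn afterwards
points = ax.scatter(titanic.PassengerId, titanic.Fare, label="Fare (points)", zorder=3)
bars = ax.bar(titanic.PassengerId, titanic.Fare, color="y", label="Fare (bars)")
line, = ax.plot(titanic.PassengerId, titanic.Age, "c", linewidth=10, label="Age")
ax.legend()
```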

Summary

The matplotlib documentation is available here.

Summary of diagram functions in this tutorial:

  • plot(): to plot curves
  • scatter(): to plot points
  • bar(): for bar charts
  • pie(): for pie charts
  • hist(): for histograms

Benoit Cayla

In more than 15 years, I have built up solid experience around various integration projects (data & applications). I have, indeed, worked in nine different companies and successively adopted the vision of the service provider, the customer and the software editor. This experience, which made me almost omniscient in my field, naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in sectors such as insurance and finance. Really passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject matter skills with automation to help my customers automate complex business processes more efficiently. In parallel with my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr. Learning, convincing with arguments and passing on my knowledge could be my characteristic triptych.
