Retrieve historical meteo data

For an analysis I wanted to do, I realized after several searches that historical weather data is not that easy to get. Being French, I naturally went to Meteo France Open Data and tried other open data sites, but I found nothing really usable, or at least nothing without a paid subscription. So I decided to retrieve the data with a Python program, using web scraping.

Where to get this data and how?

Never mind: instead of looking for ready-made datasets, I found a site (historique-meteo.net) that offers a good amount of meteorological data at several levels of granularity:

  • Country (France, Europe, etc.), Region, Department, City
  • Year, month, day

Which data is offered depends on these two axes (location and time), but at the very least we get:

  • Maximum temperature (°C)
  • Minimum temperature (°C)
  • Wind speed (km/h)
  • Humidity (%)
  • Cloud cover (%)
  • Day length (hr)

That is enough for me, at least for now! In this article I walk through a small Python program that retrieves this data and writes it to a CSV file. To do this I use the scraping technique that I explained in detail in a previous article.

If you are not interested in the program itself, I have made the data I already collected available on GitHub (this saves you from querying the site unnecessarily):

Meteo data extraction program

I wrote this program in Python 3.7 using common libraries. It is built around several functions, which I describe here in case you want to improve them. That should not be difficult, since I am not much of a developer 😉

You can check out the Python code here.

Program beginning

Libraries needed

import pandas as pd
import numpy as np
import requests
import lxml.html as lh
from datetime import datetime, timedelta
import sys
import getopt

Next, a urlbase variable specifies the base URL used to access the site. Change this value if, for example, you want to extract weather data for another country (for example africa/cameroon/). By default, historical weather data for France is used.

A second variable, the labels array, specifies which data to extract from the page. These are exactly the labels shown on the web page: the program walks through the page and, when it finds one of these captions in a table, takes the value displayed next to it.
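The label lookup described above can be sketched as follows. This is only an illustration and not the program's actual code: the sample HTML, the label strings and the find_value_for_label helper are hypothetical, and the sketch uses the standard library's xml.etree.ElementTree on a well-formed snippet rather than lxml on the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a table on the weather page.
SAMPLE_PAGE = """
<html><body><table>
  <tr><td>Maximum temperature</td><td>24°</td></tr>
  <tr><td>Humidity</td><td>61%</td></tr>
</table></body></html>
"""

# Hypothetical labels; the real ones must match the captions shown on the site.
labels = ["Maximum temperature", "Humidity"]


def find_value_for_label(root, label):
    """Return the text of the cell that follows the cell holding `label`."""
    for row in root.iter("tr"):
        cells = ["".join(cell.itertext()).strip() for cell in row.findall("td")]
        for position, text in enumerate(cells[:-1]):
            if text == label:
                return cells[position + 1]
    return None  # label not present on this page


root = ET.fromstring(SAMPLE_PAGE.strip())
raw_values = {label: find_value_for_label(root, label) for label in labels}
print(raw_values)
```

The values come back as raw strings (with their unit characters still attached), which is why a separate cleaning step is needed afterwards.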

Then come the regions, listed in a regions array. This program extracts the data by region, but it could easily be modified to extract it by city, department or even country. There is one subtlety with the regions: the site presents its historical data according to the old French regional division. So I added a second array, reg_target, that references the new regions. A function at the end converts the old regions to the new ones.

Utility functions

Some utility functions follow:

getValue(): retrieves a weather value and, above all, strips unnecessary characters from it
convTimeInMinute(): the day length is given in HH:MM:SS format; this function converts it to minutes (the data never includes seconds)
getValueFromXPath(): returns the raw data found at an XPath path in the page
getXPath(): builds the XPath path by scanning an array

The getOneMeteoFeature() function retrieves one piece of weather data from a given page (there is one page per region and per day).
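To make the two cleaning helpers more concrete, here is a minimal sketch of what they might look like. The names clean_value and time_to_minutes are hypothetical stand-ins for getValue() and convTimeInMinute(), and the exact characters stripped by the real program are an assumption.

```python
import re


def clean_value(raw):
    """Keep only the numeric part of a scraped value such as '24°' or '61%'."""
    match = re.search(r"-?\d+(?:[.,]\d+)?", raw)
    if match is None:
        return None  # nothing numeric in the cell
    return float(match.group().replace(",", "."))


def time_to_minutes(hhmmss):
    """Convert an 'HH:MM:SS' duration to whole minutes, ignoring the seconds."""
    hours, minutes, _seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 60 + minutes


print(clean_value("24°"))           # 24.0
print(clean_value("12 km/h"))       # 12.0
print(time_to_minutes("15:42:00"))  # 942
```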

The get1RegionMeteoByDay() function retrieves all the weather data for a day and a region.

getAllRegionByDay(), for its part, retrieves all the weather information for all regions for a given day.

GetMeteoData() retrieves all weather data for all regions between two given dates. Dates must be specified in YYYY/MM/DD format.
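Walking through a date range day by day could be sketched like this. The iter_days helper is hypothetical, and treating the end date as inclusive is an assumption rather than something the program is known to do.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y/%m/%d"  # the YYYY/MM/DD format expected for the dates


def iter_days(start, end):
    """Yield every date from `start` to `end`, both included (an assumption)."""
    current = datetime.strptime(start, DATE_FORMAT).date()
    last = datetime.strptime(end, DATE_FORMAT).date()
    while current <= last:
        yield current
        current += timedelta(days=1)


days = list(iter_days("2019/12/30", "2020/01/02"))
print([day.isoformat() for day in days])
# ['2019-12-30', '2019-12-31', '2020-01-01', '2020-01-02']
```

Each yielded day would then be passed to the per-day function that collects all regions.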

Finally, the convertRegionData() function converts all the retrieved data (at day / old-region granularity) to the new regional division. To do this, it groups together the old regions that were merged and averages their data.
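The grouping and averaging step can be illustrated with a small sketch. It uses plain Python instead of pandas to stay dependency-free; the convert_regions helper, the column names and the temperature values are all made up for the example, and the mapping is deliberately partial.

```python
from collections import defaultdict
from statistics import mean

# Partial, illustrative mapping from old regions to new ones.
OLD_TO_NEW = {
    "Alsace": "Grand Est",
    "Lorraine": "Grand Est",
    "Champagne-Ardenne": "Grand Est",
    "Bretagne": "Bretagne",
}

# Made-up sample rows at day / old-region granularity.
rows = [
    {"day": "2019-06-01", "region": "Alsace", "max_temp": 27.0},
    {"day": "2019-06-01", "region": "Lorraine", "max_temp": 25.0},
    {"day": "2019-06-01", "region": "Champagne-Ardenne", "max_temp": 26.0},
    {"day": "2019-06-01", "region": "Bretagne", "max_temp": 19.0},
]


def convert_regions(rows, mapping, column):
    """Average `column` per (day, new region), merging the old regions."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["day"], mapping[row["region"]])].append(row[column])
    return {key: mean(values) for key, values in grouped.items()}


averages = convert_regions(rows, OLD_TO_NEW, "max_temp")
print(averages[("2019-06-01", "Grand Est")])  # 26.0
```

With a DataFrame, the same idea maps onto adding a new-region column and calling groupby(...).mean() on the day and new-region columns. Note that this is an unweighted average: each old region counts equally, whatever its size.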

Main()

The main program (main) accepts several arguments so that the data can be extracted over a date range. To launch it from the command line, type:

GetFRMeteoData.py -s <Start Date> -e <End Date> -f <Target Folder>

-s sets the start date of the extraction (YYYY/MM/DD format)
-e sets the end date of the extraction (YYYY/MM/DD format)
-f sets the directory in which the result is stored in CSV format. The file is named MeteoFR_<Start Date>_<End Date>.csv (e.g. MeteoFR_2019-06-01_2019-12-31.csv)
-h shows how to use the command line.
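Since the program imports getopt, the option handling could look roughly like the sketch below. The parse_args function and the exit codes are assumptions made for the illustration; only the option letters and the usage line come from the description above.

```python
import getopt
import sys

USAGE = "GetFRMeteoData.py -s <Start Date> -e <End Date> -f <Target Folder>"


def parse_args(argv):
    """Return (start, end, folder) parsed from a list of command-line arguments."""
    start = end = folder = None
    try:
        # "h" takes no value; "s:", "e:" and "f:" each expect one.
        opts, _remaining = getopt.getopt(argv, "hs:e:f:")
    except getopt.GetoptError:
        print(USAGE)
        sys.exit(2)
    for opt, value in opts:
        if opt == "-h":
            print(USAGE)
            sys.exit(0)
        elif opt == "-s":
            start = value
        elif opt == "-e":
            end = value
        elif opt == "-f":
            folder = value
    return start, end, folder


args = parse_args(["-s", "2019/06/01", "-e", "2019/12/31", "-f", "./data"])
print(args)  # ('2019/06/01', '2019/12/31', './data')
```

In a real script the list passed to parse_args would be sys.argv[1:].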

Once the program is launched, you should get a file like this one (here opened with Excel):

Once again, please feel free to download directly the files I have already published on GitHub:

You can also fork the project and improve it 😉 …

Benoit Cayla

In more than 15 years, I have built up solid experience across various integration projects (data and applications). I have worked in nine different companies and successively adopted the point of view of the service provider, the customer and the software vendor. This experience, which made me almost omniscient in my field, naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in sectors such as insurance and finance. Really passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject matter skills with automation to help my customers automate complex business processes more efficiently. In parallel with my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr. Learning, convincing through argument and passing on my knowledge could be my characteristic triptych.
