For an analysis I wanted to run, I realized after several searches that historical weather data is not that easy to get. Of course, being French, I went to Meteo France Open Data and tried other open data sites, but I found nothing really usable, or at least nothing available without a paid subscription. So I decided to retrieve the data myself with a Python program and web scraping.
Where to get this data and how?
Instead of looking for ready-made datasets, I found a site (historique-meteo.net) that offers a good amount of meteorological data at several levels of granularity:
- Country (France, Europe, etc.), Region, Department, City
- Year, month, day
The available data can be browsed along these two axes (geography and time), and at the very least we can get:
- Maximum temperature (° C)
- Minimum temperature (° C)
- Wind speed (km / h)
- Humidity (%)
- Cloud cover (%)
- Day length (hr)
That is fine, I don't need more for now! In this article I walk through a small Python program that retrieves this data and writes it to a csv file. To do this I use the scraping technique that I explained in detail in a previous article.
If you are not interested in the program, I have made the data I already collected available on GitHub (this will save you from querying the site unnecessarily):
Meteo data extraction program
I wrote this program in Python 3.7 using basic libraries. It revolves around several functions, which I describe here in case you want to improve them. That should not be difficult, since I am not a big developer 😉
You can check out the Python code here.
import pandas as pd
import numpy as np
import lxml.html as lh
from datetime import datetime, timedelta
Next, a urlbase variable specifies the base URL for accessing the site. Change this value if you want to extract weather data for another country (for example africa/cameroon/). By default, historical weather data for France is used.
A labels array then specifies the data to extract from the page. These are exactly the labels shown on the web page: the program walks through the page and, when it finds one of these labels in a table, takes the value displayed next to it.
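To make the label lookup more concrete, here is a minimal sketch of the idea, not the actual code of the program. The HTML fragment, the English labels and the value_next_to_label() helper are all hypothetical: the real site's markup and captions may well differ. It only uses lxml.html, which the program already imports.

```python
import lxml.html as lh

# Hypothetical page fragment: the real site's markup may differ.
SAMPLE_HTML = """
<table>
  <tr><td>Maximum temperature</td><td><b>24°</b></td></tr>
  <tr><td>Minimum temperature</td><td><b>13°</b></td></tr>
</table>
""".strip()

# Hypothetical labels, standing in for the program's labels array.
labels = ["Maximum temperature", "Minimum temperature"]


def value_next_to_label(tree, label):
    """Return the text of the cell that follows the cell holding `label`."""
    cells = tree.xpath(
        "//td[normalize-space(.) = $label]/following-sibling::td[1]",
        label=label,  # passed as an XPath variable, so no manual quoting
    )
    if not cells:
        return None  # label not present on this page
    return cells[0].text_content().strip()


tree = lh.fromstring(SAMPLE_HTML)
raw_values = {label: value_next_to_label(tree, label) for label in labels}
print(raw_values)
```

The values come back raw (here with the degree sign still attached), so they still need cleaning before being written to the csv file.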
Then come the regions, listed in a regions table. This program extracts the data by region, but it could easily be modified to extract it by city, department or even country. There is one subtlety for the regions: the site presents historical data according to the old French regional division. So I added a second table, reg_target, that references the new regions. A function at the end converts the old regions to the new ones.
Some utility functions follow:
- getValue(): retrieves a weather value and, above all, strips the unnecessary characters from it
- convTimeInMinute(): the day length is given in HH:MM:SS format; this function converts it into minutes (the seconds are never filled in)
- getValueFromXPath(): returns the raw data found at a given XPath in the page
- getXPath(): builds the XPath by scanning an array
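As an illustration of the first two utilities, here is a small sketch of what such cleaning and conversion can look like. The names clean_value() and duration_to_minutes() are mine and the sample strings are invented; the real getValue() and convTimeInMinute() may be written differently.

```python
import re


def clean_value(raw):
    """Strip units and stray characters, keeping only the number."""
    match = re.search(r"-?\d+(?:[.,]\d+)?", raw)
    if match is None:
        return None  # nothing numeric in the cell
    return float(match.group(0).replace(",", "."))


def duration_to_minutes(hhmmss):
    """Convert an 'HH:MM:SS' day length to whole minutes (seconds ignored)."""
    hours, minutes, _seconds = (int(part) for part in hhmmss.strip().split(":"))
    return hours * 60 + minutes


print(clean_value("24°"))               # 24.0
print(clean_value("12 km/h"))           # 12.0
print(duration_to_minutes("15:42:00"))  # 942
```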
The getOneMeteoFeature() function retrieves a weather value from a given page (there is one page per region and per day).
The get1RegionMeteoByDay() function retrieves all the weather data for a day and a region.
getAllRegionByDay(), for its part, retrieves all the weather information for all regions on a given day.
GetMeteoData() retrieves all weather data for all regions between two given dates. Dates must be specified in YYYY/MM/DD format.
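Walking through a date range like this is easy with datetime and timedelta, which the program imports. The sketch below is only an assumption about how the loop could be written; iter_days() is a hypothetical helper, and in the real program each day would be passed on to the per-day extraction.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y/%m/%d"  # assumed to match the date format the program expects


def iter_days(start, end):
    """Yield every day from start to end, both included."""
    current = datetime.strptime(start, DATE_FORMAT)
    last = datetime.strptime(end, DATE_FORMAT)
    while current <= last:
        yield current
        current += timedelta(days=1)


days = [d.strftime(DATE_FORMAT) for d in iter_days("2019/12/30", "2020/01/02")]
print(days)  # ['2019/12/30', '2019/12/31', '2020/01/01', '2020/01/02']
```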
Finally, the convertRegionData() function converts all the retrieved data (at day / old-region granularity) to the new regional division. To do this, the old regions that were merged together are aggregated and their values averaged.
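With pandas, this kind of conversion comes down to a mapping followed by a groupby and a mean. Here is a minimal sketch under my own assumptions: the column names, the region spellings and the figures are invented for the example, and the lookup only covers a few regions.

```python
import pandas as pd

# Hypothetical excerpt of an old-region -> new-region lookup.
old_to_new = {
    "alsace": "grand-est",
    "lorraine": "grand-est",
    "champagne-ardenne": "grand-est",
    "bretagne": "bretagne",
}

# Made-up daily values at old-region granularity.
daily = pd.DataFrame(
    {
        "day": ["2019-06-01"] * 4,
        "region": ["alsace", "lorraine", "champagne-ardenne", "bretagne"],
        "temp_max": [24.0, 22.0, 23.0, 19.0],
        "humidity": [60.0, 66.0, 63.0, 80.0],
    }
)

# Attach the new region, then average the old regions that were merged.
daily["new_region"] = daily["region"].map(old_to_new)
by_new_region = daily.groupby(["day", "new_region"], as_index=False)[
    ["temp_max", "humidity"]
].mean()
print(by_new_region)
```

Note that a plain mean gives each old region the same weight, whatever its size, which is good enough for a first analysis.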
The main program (main) accepts several arguments so that the data can be extracted over a date range. To launch the program from the command line, type:
GetFRMeteoData.py -s <Start Date> -e <End Date> -f <Target Folder>
-s indicates the start date of the extraction (YYYY/MM/DD format)
-e indicates the end date of the extraction (YYYY/MM/DD format)
-f indicates the directory in which the result will be stored in csv format. For information, the file will be named MeteoFR_<Start Date>_<End Date>.csv (e.g. MeteoFR_2019-06-01_2019-12-31.csv)
-h shows how to use the command line.
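If you want to rebuild this command line yourself, the same options can be declared with the standard argparse module. This is a sketch of one possible way, not necessarily how the program does it; parse_args() and the dest names are my own, and argparse adds the -h option on its own.

```python
import argparse


def parse_args(argv=None):
    """Declare the -s, -e and -f options (-h is provided by argparse)."""
    parser = argparse.ArgumentParser(
        prog="GetFRMeteoData.py",
        description="Extract historical weather data for a date range.",
    )
    parser.add_argument("-s", dest="start", required=True,
                        help="start date (YYYY/MM/DD)")
    parser.add_argument("-e", dest="end", required=True,
                        help="end date (YYYY/MM/DD)")
    parser.add_argument("-f", dest="folder", default=".",
                        help="target folder for the csv file")
    return parser.parse_args(argv)


# Passing an explicit list here instead of reading the real command line.
args = parse_args(["-s", "2019/06/01", "-e", "2019/12/31", "-f", "out"])
print(args.start, args.end, args.folder)
```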
Once the program is launched, you should get a file like this one (here opened with Excel):
Once again, please feel free to directly download the files I have already extracted, available on GitHub:
You can also fork the project and improve it 😉 …