Open Data may well be a reality, but usable datasets are not always easy to find. They are often too aggregated, poorly prepared or, worse, hard to access, so shopping around is difficult. In any case, building a dataset, or simply supplementing or enriching an existing one, quickly becomes an obstacle course.
The idea of this post is therefore simple: provide a (non-exhaustive) list of sites that offer free data. The rest is up to you!
- GeoNames (geonames.org) is the essential site for all geolocated data. In its dump section you can download lists of countries, regions, cities, etc.
- The OECD data are also very practical and can be downloaded as CSV from the site. Employment, agriculture, education, etc.: in short, you will find everything here. One regret: these data are often too aggregated.
- Kaggle, of course, is a must!
- Tableau, via Tableau Public, also offers a large number of datasets.
- The UCI Machine Learning Repository also publishes several hundred datasets.
- Gapminder offers some really interesting datasets. A reference site for those curious about society!
- Inevitably you will go to the French Open Data site to retrieve “official” data.
- Its American counterpart data.gov
- The UN
- And of course the European one!
- By the way, who said that Paris didn’t have its Open Data?
- I also really like Framasoft, a site which, in addition to open data, references software and other useful, free services. It also offers an Open Data directory: Framalibre.
- If you are looking for food data, go to OpenFoodFacts.
- Météo France also has its Open Data!
- So does the SNCF, for that matter.
- The World Bank also publishes its data.
- For data on companies, the obvious reference is the registry of the commercial court.
- Wine
- Beer (unfortunately this site is being updated, see here)
- And for movie fans, IMDb is the unmissable reference.
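To make the GeoNames entry at the top of the list more concrete, here is a minimal Python sketch of reading one of its dump files. It assumes the tab-separated, header-less layout of 19 columns described in the dump's readme. The function name `read_geonames` is hypothetical, the column names are lightly normalized, and the sample row is invented for illustration.

```python
import csv
import io

# Column names following the GeoNames dump readme (the files have no header row).
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def read_geonames(handle):
    """Yield one dict per well-formed row of a tab-separated GeoNames dump."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(GEONAMES_COLUMNS):
            continue  # skip malformed lines rather than guess at their layout
        record = dict(zip(GEONAMES_COLUMNS, row))
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        record["population"] = int(record["population"] or 0)
        yield record


# Invented sample row, NOT real GeoNames data.
SAMPLE = "\t".join([
    "1", "Exampleville", "Exampleville", "", "48.85", "2.35", "P", "PPL",
    "FR", "", "11", "75", "", "", "12345", "", "35", "Europe/Paris",
    "2020-01-01",
]) + "\n"

cities = list(read_geonames(io.StringIO(SAMPLE)))
```

With a real dump you would pass an opened file instead of the `StringIO` object, for example `open(path, encoding="utf-8", newline="")`, where `path` points to the file you downloaded.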
Besides these main sites, there are of course a multitude of others.
You will find many of them through your search engine, but to find the richest veins you will also have to search GitHub.
For example:
There are also portals like (be careful: these do not always provide data for free, and some services may become paid):
The phenomenon is growing day by day. You can find anything and everything, so take care to check the credibility of your data sources!
Another frequent problem is not being able to retrieve the data you want so badly. You have it in front of you, in your browser, but you cannot download it. Either the format of the retrieved data is unsuitable, or the retrieval mode is (synchronous, asynchronous, web services to call, etc.). Ouch! Sometimes you have to be creative to reach the holy grail.
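One of those creative routes, when a table is visible in the browser but there is no download button, is to parse the page's HTML yourself. The sketch below uses only the Python standard library. The class name `TableExtractor` and the sample HTML are invented for illustration, and it assumes a simple, non-nested table that is present in the page source (pages that build their tables with JavaScript will need another approach).

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


# Invented sample page fragment, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Exampleville</td><td>12345</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(SAMPLE_HTML)
parser.close()
table_rows = parser.rows
```

The resulting list of rows can then be written out with the `csv` module, which gives you the downloadable file the site did not offer. Do check the site's terms of use before retrieving its pages this way.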
In any case, you will not escape data preparation!
In short, the world of Open Data is and remains a jungle: venture in well equipped, or you won't get far!
Good luck, then, and do not hesitate to send me your links if you find new gems.