You said Open Data… yes, but where?


While Open Data is indeed a reality, it is not always easy to find usable datasets. Often too aggregated, poorly prepared, or worse, hard to access… finding what you need is no easy task. In any case, building a dataset, or simply supplementing or enriching an existing one, quickly turns into an obstacle course.

The idea of this post is therefore simple: provide a (non-exhaustive) list of sites that offer free data. The rest is up to you!

Let’s go:

  • GeoNames (geonames.org) is the go-to site for geolocated data. In its dump section you can download lists of countries, regions, cities, etc.
  • OECD data are also very handy and can be downloaded as CSV from the site: employment, agriculture, education, and so on. In short, you will find everything here. One regret: these data are often too aggregated…
  • Kaggle, of course, is a must!
  • Tableau via Tableau Public also offers a large number of datasets
  • The UCI also publishes several hundred data sets
  • Gapminder offers some really interesting datasets. A reference site for anyone curious about society!
  • Inevitably, you will turn to the French Open Data site to retrieve “official” data.
  • Its American counterpart data.gov
  • The UN
  • And of course the European one!
  • By the way, who said that Paris didn’t have its Open Data?
  • I also really like Framasoft, which, in addition to open data, references software and other useful free services. It also offers an Open Data referencing service: Framalibre.
  • If you are looking for food data, go to OpenFoodFacts.
  • Météo France also has its Open Data!
  • So does the SNCF, for that matter.
  • The World Bank also publishes its data.
  • For company data, the obvious place to look is the commercial court registry.
  • Wine data
  • Beer data (unfortunately this site is being updated; in the meantime, see here)
  • And for movie fans, the IMDb datasets are an unmissable reference.
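
To give an idea of what working with one of these downloads involves, here is a minimal Python sketch that reads a GeoNames dump file. It assumes the layout described in the GeoNames dump readme: UTF-8, tab-separated, no header row, 19 columns, with the name in column 1, coordinates in columns 4 and 5, the country code in column 8 and the population in column 14 (0-based). The `City` class, the `load_cities` and `make_row` functions and the sample rows are hypothetical illustrations, not part of any GeoNames tooling.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO

# 0-based column positions assumed from the GeoNames dump readme
# (19 tab-separated columns, no header row, UTF-8).
N_COLUMNS = 19
COL_NAME = 1
COL_LAT = 4
COL_LON = 5
COL_COUNTRY = 8
COL_POPULATION = 14


@dataclass
class City:
    name: str
    latitude: float
    longitude: float
    country: str
    population: int


def load_cities(stream: TextIO, country: str = "") -> List[City]:
    """Parse GeoNames-style rows, optionally keeping only one country code."""
    # GeoNames fields are not quoted, so quote handling is disabled.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    cities = []
    for row in reader:
        if len(row) <= COL_POPULATION:
            continue  # skip blank or truncated lines
        if country and row[COL_COUNTRY] != country:
            continue
        cities.append(
            City(
                name=row[COL_NAME],
                latitude=float(row[COL_LAT]),
                longitude=float(row[COL_LON]),
                country=row[COL_COUNTRY],
                population=int(row[COL_POPULATION] or 0),  # empty -> 0
            )
        )
    return cities


def make_row(name: str, lat: str, lon: str, country: str, population: str) -> str:
    """Build a hypothetical 19-column line for testing (other fields empty)."""
    fields = [""] * N_COLUMNS
    fields[0] = "1"
    fields[COL_NAME] = name
    fields[COL_LAT], fields[COL_LON] = lat, lon
    fields[COL_COUNTRY], fields[COL_POPULATION] = country, population
    return "\t".join(fields)


if __name__ == "__main__":
    sample = "\n".join([
        make_row("Townville", "48.8", "2.3", "FR", "1000"),
        make_row("Villageton", "52.5", "13.4", "DE", ""),
    ])
    for city in load_cities(io.StringIO(sample), country="FR"):
        print(city.name, city.population)
```

With a real file, you would open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `load_cities`. Reading row by row keeps memory use low, which matters for the larger GeoNames exports.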

Besides these main sites, there are of course a multitude of others.

Many of them can be found through your search engine, but to find the richest veins, you will also need to search GitHub.

For example:

There are also portals such as these (be careful: they do not always provide data for free, and some services may become paid):

And the phenomenon is growing day by day. You can find anything and everything… so be careful to check the credibility of your data sources!

Another common problem is not being able to retrieve the data you want so badly. It is right there in front of you, in your browser… but you cannot download it. Either the format of the data is unsuitable, or the retrieval mode is (synchronous, asynchronous, requires calling services, etc.). Ouch! Sometimes you have to get creative to retrieve the holy grail.
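
When the data displayed in the browser is loaded from a JSON service (which you can sometimes spot in the network tab of the browser's developer tools), one workaround is to call that service yourself and flatten the result into a CSV file. Below is a minimal Python sketch using only the standard library. `fetch_json` and `records_to_csv` are hypothetical helper names, no real endpoint is assumed, and it only handles a list of flat JSON objects; nested structures would need extra flattening.

```python
import csv
import io
import json
import urllib.request
from typing import Any, Dict, List


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """Download and decode a JSON document from a URL you have identified."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def records_to_csv(records: List[Dict[str, Any]]) -> str:
    """Turn a list of flat JSON objects into CSV text.

    Columns are every key seen, in order of first appearance;
    keys missing from a record become empty cells.
    """
    columns: List[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical usage; replace the URL with the endpoint you found:
    # text = records_to_csv(fetch_json("https://example.org/api/records"))
    demo = [{"city": "A", "value": 1}, {"city": "B", "note": "x"}]
    print(records_to_csv(demo), end="")
```

Keeping the download and the conversion in separate functions makes the conversion testable offline and reusable when the data arrives by other means (a saved response, an export, etc.).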

In any case, you will not escape data preparation!
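
As a small taste of that preparation step, here is a hedged Python sketch that cleans a semicolon-separated CSV of the kind often found on French portals, with accented headers and decimal commas. The helper names (`normalize_header`, `parse_number`, `prepare`) are hypothetical, and the number parsing assumes French-style formatting (spaces as thousands separators, a comma as the decimal mark).

```python
import csv
import io
import unicodedata
from typing import Dict, List, Optional


def normalize_header(name: str) -> str:
    """'Code Région ' -> 'code_region': lowercase, accents removed, underscores."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    return "_".join(ascii_name.lower().split())


def parse_number(text: str) -> Optional[float]:
    """Parse French-style numbers like '1 234,5'; None if empty or invalid."""
    cleaned = (
        text.strip()
        .replace("\u00a0", "")  # no-break space used as thousands separator
        .replace("\u202f", "")  # narrow no-break space, same role
        .replace(" ", "")
        .replace(",", ".")
    )
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def prepare(raw: str, numeric: List[str], delimiter: str = ";") -> List[Dict[str, object]]:
    """Normalize headers, trim cells, drop blank/duplicate rows, convert numbers."""
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter)
    header = [normalize_header(h) for h in next(reader)]
    seen = set()
    rows: List[Dict[str, object]] = []
    for values in reader:
        cells = [v.strip() for v in values]
        key = tuple(cells)
        if not any(cells) or key in seen:
            continue  # drop blank lines and exact duplicates
        seen.add(key)
        row: Dict[str, object] = dict(zip(header, cells))
        for col in numeric:
            row[col] = parse_number(str(row.get(col, "")))
        rows.append(row)
    return rows
```

Exact-duplicate removal is deliberately simple here; real datasets often call for fuzzier matching and source-specific rules.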

In short, the world of Open Data is and remains a jungle that you must enter well equipped, otherwise you won’t get very far!

Good luck, and do not hesitate to send me your links if you find new gems.


Benoit Cayla

Over more than 15 years, I have built up solid experience in various integration projects (data & applications). I have worked in nine different companies and successively adopted the perspective of the service provider, the customer and the software vendor. This experience, which made me almost omniscient in my field, naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in sectors such as insurance and finance. Passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject-matter skills with automation to help my customers automate complex business processes more efficiently. Alongside my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr. Learning, convincing through argument, and passing on my knowledge could be my characteristic triptych.
