10 Open Data resources to use with Python

Recently, I asked on Twitter whether there are any good sources of free and open data for learning Python (and R).

In this post, I list the suggestions I have received so far.

  • Awesome Public Datasets: A huge collection of public datasets, categorized by field (e.g., biology, economics, and machine learning).
  • UCI Machine Learning Repository: "…currently maintain 349 data sets as a service to the machine learning community."
  • Kaggle Datasets (https://www.kaggle.com/datasets): Another list of publicly available datasets.
  • Government data: Governmental data covering everything from agriculture to science and research. Very interesting.
  • Google Public Data: A huge collection of public data from many different sources. Seems really nice.
  • Amazon public data sets: "AWS hosts a variety of public data sets that anyone can access for free." Seems interesting.
  • MovieLens: "Learn more about movies with rich data, images, and trailers. Browse movies by community-applied tags, or apply your own tags. Explore the database with expressive search tools." MovieLens is not really a data source in the sense I asked about. However, the suggestion was that one could use the movie ratings to learn Hadoop/Spark/MapReduce. I may give this a try, if I ever get the time.
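To give a feel for the MapReduce idea mentioned for the MovieLens ratings, here is a minimal pure-Python sketch that computes the average rating per movie in explicit map, shuffle, and reduce steps. It assumes a ratings file laid out as `userId,movieId,rating,timestamp`; the sample rows below are made up for illustration, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical, not part of Hadoop or Spark.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample rows, assuming a "userId,movieId,rating,timestamp" layout
SAMPLE = """userId,movieId,rating,timestamp
1,10,4.0,964982703
2,10,3.0,964981247
1,20,5.0,964982224
3,20,4.0,964983815
2,30,2.5,964982931
"""


def map_phase(lines):
    """Emit one (movieId, (rating, 1)) pair per rating row."""
    for line in lines:
        _user, movie, rating, _ts = line.split(",")
        yield int(movie), (float(rating), 1)


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum ratings and counts per movie, then divide to get the mean rating."""
    result = {}
    for movie, values in groups.items():
        total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
        result[movie] = total / count
    return result


lines = SAMPLE.strip().splitlines()[1:]  # drop the header row
averages = reduce_phase(shuffle(map_phase(lines)))
print(averages)  # {10: 3.5, 20: 4.5, 30: 2.5}
```

In Hadoop or Spark, the framework would distribute the map and reduce steps across machines and handle the shuffle; here everything runs in one process, which keeps the structure of the computation visible.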

These were the data sources that people on Twitter suggested in reply to my tweet. I have also found this one very interesting myself: Open Psychology Data. This is a journal that describes open and reusable psychology data. If you are interested in playing around with personality data, it can be found here. Finally, the APA has links to open data sets: Data Links.

I know, the title is wrong: I have given you a huge number of different data sources. Some may contain overlapping links to data, but I assume that we now have data to play around with for quite some time. Do you know of any more data sources that are open and free? Please leave a comment!



