Recently, I asked on Twitter whether there are any good sources of free and open data to use when learning Python (and R):
Got some good suggestions but still looking for more. #Python #OpenData #DataScience #rstats #BigData https://t.co/WdZ7U7sPpu
— freddy (@freddy1876) April 30, 2016
In this post, I list the suggestions I have received so far.
- Awesome Public Datasets: A huge collection of public datasets, categorized by field (e.g., biology, economics, and machine learning).
- UCI Machine Learning Repository: “…currently maintain 349 data sets as a service to the machine learning community”
- Kaggle Datasets (https://www.kaggle.com/datasets): Another collection of publicly available datasets.
- Government data: Governmental data covering everything from agriculture to science and research. Very interesting.
- Google Public Data: A huge collection of public data sources. Seems really nice.
- Amazon Public Data Sets: “AWS hosts a variety of public data sets that anyone can access for free.” Seems interesting.
- MovieLens: “Learn more about movies with rich data, images, and trailers. Browse movies by community-applied tags, or apply your own tags. Explore the database with expressive search tools.” MovieLens is not really a data source in the sense I asked about. However, the suggestion was that one could use its movie ratings to learn Hadoop, Spark, or MapReduce. I may give this a try, if I ever find the time.
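Since the MovieLens suggestion was about learning MapReduce, here is a minimal sketch of the idea in plain Python: it computes each movie's average rating through explicit map, shuffle, and reduce steps. It assumes a MovieLens-style ratings file with the columns userId,movieId,rating,timestamp and uses a small made-up sample instead of real MovieLens data. The function names are hypothetical, and a real Hadoop or Spark job would distribute each phase across many machines rather than run it in one process.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sample in an assumed MovieLens-like CSV layout
# (userId,movieId,rating,timestamp). These are NOT real MovieLens ratings.
SAMPLE = """userId,movieId,rating,timestamp
1,10,4.0,964982703
1,20,3.0,964981247
2,10,5.0,964982224
2,30,2.5,964983815
3,20,4.0,964982931
"""


def map_phase(lines: Iterable[str]) -> Iterator[tuple[int, float]]:
    """Map: emit one (movieId, rating) pair per rating line."""
    for line in lines:
        _user, movie, rating, _ts = line.split(",")
        yield int(movie), float(rating)


def shuffle_phase(pairs: Iterable[tuple[int, float]]) -> dict[int, list[float]]:
    """Shuffle: group all ratings by movieId, as the framework would."""
    groups: dict[int, list[float]] = defaultdict(list)
    for movie, rating in pairs:
        groups[movie].append(rating)
    return groups


def reduce_phase(groups: dict[int, list[float]]) -> dict[int, float]:
    """Reduce: collapse each movie's ratings to their mean."""
    return {movie: sum(ratings) / len(ratings) for movie, ratings in groups.items()}


def average_ratings(csv_text: str) -> dict[int, float]:
    """Run the three phases over CSV text, skipping the header row."""
    lines = csv_text.strip().splitlines()[1:]
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    for movie, avg in sorted(average_ratings(SAMPLE).items()):
        print(movie, avg)
```

Running it on the sample prints `10 4.5`, `20 3.5`, and `30 2.5`, one movie per line. Keeping the three phases as separate functions mirrors how the work is split up in MapReduce, which makes it easier to map the same logic onto Hadoop or Spark later.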
These were the data sources people on Twitter suggested in reply to my tweet. I have also found this one very interesting: Open Psychology Data, a journal that describes open and reusable psychology data. If you are interested in playing around with personality data, it can be found here. Finally, the APA has links to open data sets: Data Links.
I know, the title is wrong: I have given you a huge number of different data sources to use. Some may contain overlapping links to data, but I assume we now have enough data to play around with for quite some time. Do you know of any more free and open data sources? Please leave a comment!