10 Open Data resources to use with Python

Recently, I asked on Twitter whether there are any good sources of free and open data to use when learning Python (and R).

In this post I will list the suggestions I have received so far.

  • Awesome Public Datasets: A huge collection of public datasets, categorized by field (e.g., biology, economics, machine learning).
  • UCI Machine Learning Repository: “…currently maintain 349 data sets as a service to the machine learning community.”
  • Kaggle Datasets (https://www.kaggle.com/datasets): Another list of publicly available datasets.
  • Government data: Governmental data covering everything from agriculture to science and research. Very interesting.
  • Google Public Data: A huge collection of different public data sources. Seems really nice.
  • Amazon public data sets: “AWS hosts a variety of public data sets that anyone can access for free.” Seems interesting.
  • MovieLens: “Learn more about movies with rich data, images, and trailers. Browse movies by community-applied tags, or apply your own tags. Explore the database with expressive search tools.” MovieLens is not really a data source in the sense I asked for. However, the suggestion was that one could use the movie ratings to learn Hadoop/Spark/MapReduce. I may give this a try, if I ever get the time.

These were the data sources that people on Twitter suggested in reply to my tweet. I have also found a very interesting one myself: Open Psychology Data, a journal that describes open and reusable psychology data. If you are interested in playing around with personality data, it can be found here. Finally, the APA has links to open data sets: Data Links.

I know, the title is wrong: I gave you a huge number of different data sources to use. Some may contain overlapping links to data, but I would assume that we now have data to play around with for quite some time. Do you know of any more data sources that are open and free? Please leave a comment!

 

 


Delta Plots on Response time data using Python

In this post we are going to learn how to create delta plots for response (reaction) time data. Response time data are often used in experimental psychology, where they are the dependent variable in many experiments that aim to draw inferences about cognitive processes.

Delta plots are a visualization method (Pratte, Rouder, Morey, & Feng, 2010; Speckman, Rouder, Morey, & Pratte, 2008). These plots are created using the quantiles of the response time distributions. Research has indicated that, even without a precise statistical inference test, delta plots can give the researcher key information about the mechanisms underlying tasks thought to assess constructs such as cognitive control and inhibition (Pratte, Rouder, Morey, & Feng, 2010).
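To make "the quantiles of the response time distribution" concrete, here is a minimal sketch of a percentile computed by linear interpolation between sorted observations, which is also the default behaviour of NumPy's np.percentile. The function name percentile and the five example values are made up for illustration and are not part of the original analysis.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    # Fractional position of the percentile in the sorted sample
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Hypothetical response times in seconds (illustration only)
rts = [0.40, 0.50, 0.60, 0.80, 1.20]
median = percentile(rts, 50)  # the middle observation
p90 = percentile(rts, 90)     # lies between 0.80 and 1.20
```

A delta plot uses several such percentiles (for instance the 10th to the 90th) from each of two conditions.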

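First, we import Matplotlib and NumPy, create some example response times (in seconds) for two conditions, A and B, and look at the two distributions with a boxplot:

import matplotlib.pyplot as plt
import numpy as np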

data = {"x1":[0.794, 0.629, 0.597, 0.57, 0.524, 0.891, 0.707, 0.405, 0.808, 0.733,
    0.616, 0.922, 0.649, 0.522, 0.988, 0.489, 0.398, 0.412, 0.423, 0.73,
    0.603, 0.481, 0.952, 0.563, 0.986, 0.861, 0.633, 1.002, 0.973, 0.894,
    0.958, 0.478, 0.669, 1.305, 0.494, 0.484, 0.878, 0.794, 0.591, 0.532,
    0.685, 0.694, 0.672, 0.511, 0.776, 0.93, 0.508, 0.459, 0.816, 0.595],

    "x2":[0.503, 0.5, 0.868, 0.54, 0.818, 0.608, 0.389, 0.48, 1.153, 0.838,
    0.526, 0.81, 0.584, 0.422, 0.427, 0.39, 0.53, 0.411, 0.567, 0.806,
    0.739, 0.655, 0.54, 0.418, 0.445, 0.46, 0.537, 0.53, 0.499, 0.512,
    0.444, 0.611, 0.713, 0.653, 0.727, 0.649, 0.547, 0.463, 0.35, 0.689,
    0.444, 0.431, 0.505, 0.676, 0.495, 0.652, 0.566, 0.629, 0.493, 0.428]}

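labels = list('AB')

fig = plt.figure(figsize=(10, 10), dpi=100)
ax = fig.add_subplot(111)
bp = ax.boxplot([data['x1'], data['x2']])
ax.set_xticklabels(labels)  # label the two boxes A and B
ax.set_ylabel('Response time (sec)')
plt.show()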

Figure: Boxplot of the response time data in the two conditions.

Here is the code that will create a Delta plot on our response time data above:

# Percentiles to compare: 10th, 20th, ..., 90th
p = np.arange(10, 100, 10)

# Difference between and average of the two conditions at each percentile
df = np.percentile(data['x1'], p) - np.percentile(data['x2'], p)
av = (np.percentile(data['x1'], p) + np.percentile(data['x2'], p)) / 2

fig = plt.figure(figsize=(12, 9), dpi=100)

plt.plot(av, df, 'ro')
plt.ylim(-.05, .25)
plt.ylabel('Response Time Difference (sec)')
plt.xlabel('Mean Response Time (sec)')
plt.show()

Figure: Delta plot using Python.
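If you want to create delta plots for several pairs of conditions, the computation can be wrapped in a small function. The following is only a sketch: the name delta_plot_points and the toy values are my own for illustration, and it assumes both conditions are plain sequences of response times in seconds.

```python
import numpy as np


def delta_plot_points(cond_a, cond_b, probs=None):
    """Return (averages, differences) of the two conditions' percentiles."""
    if probs is None:
        probs = np.arange(10, 100, 10)  # 10th, 20th, ..., 90th percentile
    q_a = np.percentile(cond_a, probs)
    q_b = np.percentile(cond_b, probs)
    return (q_a + q_b) / 2, q_a - q_b


# Toy check with made-up values: every response time in `slow` is 0.1 s
# longer than in `fast`, so every percentile differs by 0.1 s and the
# delta plot is a flat line at 0.1.
slow = [0.5, 0.6, 0.7, 0.8, 0.9]
fast = [0.4, 0.5, 0.6, 0.7, 0.8]
av, df = delta_plot_points(slow, fast)
```

Calling delta_plot_points(data['x1'], data['x2']) on the data from this post should return the same points that were plotted above.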

That was pretty simple. In this tutorial you have learned how to create a delta plot that can support inferences about things such as inhibition or cognitive control. Drawing the inferences is something you will have to do yourself! You can have a look at the references below for more information.

References

  • Pratte, M. S., Rouder, J. N., Morey, R. D., & Feng, C. (2010). Exploring the differences in distributional properties between Stroop and Simon effects using delta plots. Attention, Perception, & Psychophysics, 72(7), 2013–2025. http://doi.org/10.3758/APP.72.7.2013
  • Speckman, P. L., Rouder, J. N., Morey, R. D., & Pratte, M. S. (2008). Delta Plots and Coherent Distribution Ordering. The American Statistician, 62(3), 262–266. http://doi.org/10.1198/000313008X333493

Introduction Video to Statsmodels

I found this introduction to Statsmodels. For those of you who don’t know it, Statsmodels is a great Python library for conducting statistical analysis. Many common methods are covered by the package. If you want to learn more about Python and data analysis, you will most likely enjoy this YouTube video:

I certainly learned more about data analysis in Python by watching it. Statsmodels makes some tasks a lot easier and makes Python more similar to R.
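As a small illustration of that R-like feel (my own sketch, not taken from the video), Statsmodels accepts R-style model formulas through its formula API. The data frame toy and its values are made up for this example, and the code assumes that pandas and Statsmodels are installed.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data lying exactly on the line y = 1 + 2 * x
toy = pd.DataFrame({"x": [0, 1, 2, 3, 4], "y": [1, 3, 5, 7, 9]})

# R-style formula: regress y on x (an intercept is added automatically)
model = smf.ols("y ~ x", data=toy).fit()
intercept = model.params["Intercept"]
slope = model.params["x"]
```

The formula "y ~ x" is the same one you would pass to lm() in R.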