10 Open Data resources to use with Python

Recently, I asked on Twitter whether there are any good sources of free and open data to use for learning Python (and R):

In this post I will list the suggestions I have received so far.

  • Awesome Public Datasets: A huge collection of public datasets, categorized by field (e.g., biology, economics, machine learning).
  • UCI Machine Learning Repository: “…currently maintain 349 data sets as a service to the machine learning community”
  • https://www.kaggle.com/datasets: Also a list of publicly available datasets.
  • Government data: governmental data. Everything from agriculture to science & research. Very interesting.
  • Google Public Data: Huge collection of different data sources that are public. Seems really nice.
  • Amazon public data sets: ”AWS hosts a variety of public data sets that anyone can access for free.” Seems interesting.
  • MovieLens: “Learn more about movies with rich data, images, and trailers. Browse movies by community-applied tags, or apply your own tags. Explore the database with expressive search tools.” MovieLens is not really a data source in the way that I asked. However, the suggestion was that one could use the movie ratings to learn Hadoop/Spark/MapReduce. I may give this a try. If I ever get time.

These were the data sources people on Twitter replied with to my tweet. I have myself found this very interesting: Open Psychology Data. This is a journal that describes open and re-usable psychology data. If you are interested in playing around with personality data it can be found here. Finally, the APA has a link to open data sets: Data Links.

I know, the title is wrong: I gave you a huge number of different data sources to use. Some may contain overlapping links to data, but I would assume that we now have data to play around with for quite some time. Do you know of any more data sources that are open and free? Please leave a comment!



Some notes on experimental design in Psychology

Major Confounding Factors

Maturation – Mainly concerns longitudinal studies (and children): as subjects grow older between pre- and post-treatment/test, it may affect the results. Children, for instance, might get more sophisticated, gain more experience, and get bigger and stronger as they age. Natural maturation also happens in other subjects. When in a new environment, adults make predictable changes or adjustments over time. Diseases usually have predictable courses. Observed changes over time may thus be due to maturation rather than the independent variable.

History – During the course of a study, independent events that affect the outcome can occur. Generally, history threatens internal validity when there are long intervals between pre- and posttest measurements.

Testing – Repeated testing of participants can threaten internal validity, because participants might become more skilled through repeated practice with the measurement instrument.

Instrumentation – Findings can be due to changes in the measuring instrument over time rather than to the independent variable (IV).

Regression to the Mean – When subjects are selected because their scores on a measure are extremely high or low, they are usually not as extreme on a second testing. That is, their scores will regress toward the mean. The amount of regression is contingent upon how much performance on the test is due to variable factors (e.g., amount of studying). More variable factors equal more regression.
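Regression to the mean can be illustrated with a small simulation. This is a hypothetical sketch (all numbers made up): each test score is a stable ability component plus variable factors, and subjects selected for extreme first-test scores come out less extreme the second time.

```python
import numpy as np

# Hypothetical simulation: observed score = stable ability + variable factors.
rng = np.random.default_rng(1)
ability = rng.normal(100, 10, 10000)        # stable component
test1 = ability + rng.normal(0, 10, 10000)  # first testing
test2 = ability + rng.normal(0, 10, 10000)  # second testing

# Select subjects with extremely high scores on the first testing.
extreme = test1 > 120

# On the second testing their mean is lower: regression toward the mean.
print(test1[extreme].mean(), test2[extreme].mean())
```

Making the variable component larger relative to the stable component increases the amount of regression, just as described above.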

Selection – This confounding factor appears when, for instance, comparing groups that are not equivalent before the manipulation begins.

Attrition – Attrition occurs when participants drop out of the study due to some biasing factor. For instance, if participants drop out from one group but not from another (or not as much), one can lose important characteristics, etc. It is important not to create situations or use procedures that can bias some participants against completing the study, changing the outcome.

Diffusion of Treatment – If participants in different experimental conditions are able to talk with each other, some can expose the procedures to others. Test participants might talk to control participants who might not be aware that they are in a control group. These types of information exchange are called diffusion of treatment and can affect the data such that the differences between groups disappear.

Sequence effects – Experience with one condition might affect responses to later conditions. If the condition order is always ABC, systematic confounding can occur. For instance, performance in B or C might reflect both the effect of the condition and the effect of already having been exposed to A. To get rid of sequence effects, one uses more than one order (counterbalancing).
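For full counterbalancing, every possible order of the conditions is used equally often. A minimal sketch of generating the orders (hypothetical condition labels A, B, C):

```python
from itertools import permutations

# All possible presentation orders of three conditions.
conditions = ['A', 'B', 'C']
orders = list(permutations(conditions))

# With three conditions there are 3! = 6 orders; assign participants
# to orders so that each order is used equally often.
for order in orders:
    print(order)
```

With many conditions, full counterbalancing quickly becomes impractical (k! orders), and partial counterbalancing schemes such as Latin squares are used instead.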

Subject and Experimenter Effects

Expectations and biases of both the experimenter and the subjects can systematically affect the results of a study in subtle ways, thus reducing validity of the study.

Subject Effects – Participants in an experiment are not completely naïve. That is, they will have understandings, ideas, and maybe misunderstandings about what to expect in the study. Different people have different reasons for participating. These reasons can be money, course credit, etc. Others might participate because they hope to learn something. Participants volunteer and carry out their role based on different motivations, understandings, expectations, and biases, all of which can affect the outcome of a study. An experimental setting is not natural. When being observed, people might behave differently than if they were not observed. This can lead to subject effects. Subject effects refer to any changes in behavior that are due to being part of an experiment rather than to the experimental variables. Demand characteristics occur when participants get cues on how they are expected to behave (according to hypotheses, etc.). Demand characteristics usually occur unintentionally. The placebo effect, a related phenomenon, occurs when participants are expecting a specific effect.

Experimenter effects – concern any biasing effects that are due to actions of the researcher. Experimenter expectancies are the experimenter’s expectations about the outcome of the study. These expectations might cause researchers to bias results in many ways: the experimenter can influence the participants’ behavior in favor of the hypotheses, cherry-pick data and statistical methods, and interpret results in a biased manner.

Examples of ways the experimenter can influence the participant: presenting cues in the form of intonation, facial expressions, or changes in posture; verbally reinforcing some responses and not others; or incorrectly recording participants’ responses.

Experimental designs

Pre-posttest with control group controls for history and maturation.


  • Systematic between-groups variance
    • Difference between groups could be due to
      1. Effect of the independent variable (experimental variance which is what we want!)
      2. Effects of confounding variables (extraneous variance)
      3. A combination of (1) and (2)

Natural variability that is due to sampling error will increase the group variability somewhat.

  • Nonsystematic Within-Groups Variance
    • Error Variance – nonsystematic within-groups variability.
      Due to random factors affecting some participants more than others within a group, rather than systematically affecting all members of a group. Error variance can be increased by factors that are not stable, such as a participant feeling ill or uncomfortable participating. Experimenter and equipment variations can also cause measurement errors for some participants.

In experimentation, each study is designed so as to maximize experimental variance, control extraneous variance, and minimize error variance.

Maximizing experimental variance. Experimental variance is due to the independent variable’s (IV) effect on the dependent variable (DV). At least two levels of the IV should be present in an experiment. Experimental conditions need to be distinct! It can be useful to have a manipulation check to see that the manipulation had the planned effect on participants. One way to check is to use ratings.

To efficiently control for extraneous variables and minimize their possible different effects on the groups we must be sure that (1) the two groups (experimental and control) are AS similar as possible, (2) the groups are treated in exactly the same way EXCEPT for the IV manipulation.

Ways to control extraneous variance:

  1. Random assignment to groups decreases the probability that the groups will differ – best method
  2. Homogeneous sample
  3. Confounding variables can be built into the experiment as an additional IV
  4. Matching or within-subjects design
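Random assignment (method 1 above) is straightforward to do in code. A minimal sketch with made-up participant IDs and group sizes:

```python
import random

# Hypothetical example: randomly assign 20 participants to two groups.
participants = list(range(20))
random.shuffle(participants)        # randomize the order
group_a = participants[:10]         # first half -> experimental group
group_b = participants[10:]         # second half -> control group
```

Because assignment is random, any pre-existing participant characteristic is (in expectation) spread equally across the two groups, which is exactly why this is the best method for controlling extraneous variance.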

Minimizing Error Variance.

Large error variance can hide differences between conditions that are due to the experimental manipulations. Measurement error is one source of error variance. If participants do not respond consistently from trial to trial due to such factors, the instrument is unreliable. To minimize sources of error variance, use carefully controlled conditions of measurement and reliable instruments. Another source of error variance is individual differences. These types of variance are minimized by within-subjects designs.

Experimental designs – Randomize when possible!

The four basic designs to test single IV using independent groups:

  1. Randomized, posttest-only, control-group design
    Here we have two groups: Group A and Group B. The treatments in the groups are compared in the posttest only.
    This design tests the hypothesis that the IV affects the dependent measurements.
    Random selection protects external validity. Furthermore, attrition and regression to the mean are also reduced by random assignment of participants (i.e., both groups will have [roughly] the same number of extremes). Threats to internal validity from instrumentation, history, and maturation are minimized due to the inclusion of a control group.
  2. Randomized, pretest-posttest, control-group design
    An improvement of the randomized, posttest-only, control-group design (the one above): a pretest is added before the treatment.
  3. Multilevel, completely randomized, between-subjects design
  4. Solomon’s four-group design. A pretest may affect participants’ responses to the treatment or to the posttest; the pretest can interact with the experimental manipulation, which produces confounding interaction effects. Solomon’s four-group design controls for this.

The t-test evaluates the size of the difference between the means of two groups. The difference between the two means is divided by an error term, which is a function of the variance of scores within each group and the sample sizes. It is easy to apply, common, and useful for testing differences between two groups.
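As a quick illustration (on simulated data, not from any real experiment), an independent-samples t-test can be run with scipy.stats:

```python
import numpy as np
from scipy import stats

# Hypothetical data: response times (sec) for two independent groups.
rng = np.random.default_rng(42)
group_a = rng.normal(0.6, 0.1, 50)  # e.g., distractor condition
group_b = rng.normal(0.5, 0.1, 50)  # e.g., control condition

# ttest_ind divides the mean difference by the pooled standard error.
t, p = stats.ttest_ind(group_a, group_b)
print(t, p)
```

The sign of t tells you the direction of the difference, and p is compared against your alpha level (e.g., .05).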

Analysis of Variance (ANOVA)
For multilevel designs with more than two groups. One-way ANOVAQ – only one independent variable. ANOVA uses both the within-groups variance and the between-group variance. Within-groups variance is a measure of nonsystematic variation within a group – error or chance variation among individual participants within a group. Due to factors such as individual differences and measurement errors. Between-groups variance is representing how variable group means are. Is a measurement of both systematic factors that affect the groups differently and of variation due to sampling error. The systematic factors include experimental variance and extraneous variance. Furthermore it also represents how variable the group means are. Approx. same means = small between-groups variance -> large difference in group means = between-groups variance is large.

The F-test is used to get statistical significance from an ANOVA. The F-test involves the ratio of the between-group mean square to the within-groups mean square.

F= mean square between groups/mean square within groups

The ratio can be increased either by increasing the between-groups mean square or by decreasing the within-groups mean square. The between-groups mean square is increased by maximizing the differences between groups. The within-groups mean square is minimized by controlling as many potential sources of random error as possible. Maximization of experimental variance and minimization of error variance is what we want!

The hypothesis that there are no systematic differences between groups is not rejected unless the F-ratio is larger than we would expect by chance alone.
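A minimal one-way ANOVA sketch using scipy.stats (with simulated scores for three hypothetical groups; the group means and spreads are made up):

```python
import numpy as np
from scipy import stats

# Hypothetical data: scores for three groups of 30 participants each.
rng = np.random.default_rng(0)
g1 = rng.normal(10, 2, 30)
g2 = rng.normal(12, 2, 30)
g3 = rng.normal(11, 2, 30)

# f_oneway computes F = (between-groups mean square) / (within-groups mean square).
f, p = stats.f_oneway(g1, g2, g3)
print(f, p)
```

If p falls below your alpha level, you reject the null hypothesis that all group means are equal; follow-up comparisons are then needed to locate the differences.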

UPDATE: I found an exceptional post on how to do a one-way ANOVA using Python. In fact, there are four different Python methods for doing an ANOVA: One-Way ANOVA in Python.

Planned comparisons are done to probe possible significant differences between the means. The F-ratio will only tell us that there IS a difference, not in which direction or between which groups. This is done by means of planned comparisons (also called a priori comparisons or contrasts).


Delta Plots on Response time data using Python

In this post we are going to learn how to make delta plots for response (reaction) time data. Response time data are often used in experimental psychology; they are the dependent variable in many experiments that aim to draw inferences about cognitive processes.

Delta plots are a visualization method (Pratte, Rouder, Morey, & Feng, 2010; Speckman, Rouder, Morey, & Pratte, 2008). These visualizations (i.e., the plots) are created using the quantiles of the response time distribution. Research has indicated that even without a precise statistical inference test, delta plots can give the researcher key information concerning the underlying mechanisms of tasks thought to assess constructs such as, for instance, cognitive control and inhibition (Pratte, Rouder, Morey, & Feng, 2010).

import matplotlib.pyplot as plt

import numpy as np

data = {"x1":[0.794, 0.629, 0.597, 0.57, 0.524, 0.891, 0.707, 0.405, 0.808, 0.733,
    0.616, 0.922, 0.649, 0.522, 0.988, 0.489, 0.398, 0.412, 0.423, 0.73,
    0.603, 0.481, 0.952, 0.563, 0.986, 0.861, 0.633, 1.002, 0.973, 0.894,
    0.958, 0.478, 0.669, 1.305, 0.494, 0.484, 0.878, 0.794, 0.591, 0.532,
    0.685, 0.694, 0.672, 0.511, 0.776, 0.93, 0.508, 0.459, 0.816, 0.595],

    "x2":[0.503, 0.5, 0.868, 0.54, 0.818, 0.608, 0.389, 0.48, 1.153, 0.838,
    0.526, 0.81, 0.584, 0.422, 0.427, 0.39, 0.53, 0.411, 0.567, 0.806,
    0.739, 0.655, 0.54, 0.418, 0.445, 0.46, 0.537, 0.53, 0.499, 0.512,
    0.444, 0.611, 0.713, 0.653, 0.727, 0.649, 0.547, 0.463, 0.35, 0.689,
    0.444, 0.431, 0.505, 0.676, 0.495, 0.652, 0.566, 0.629, 0.493, 0.428]}

labels = list('AB')

fig = plt.figure(figsize=(10, 10), dpi=100)

ax = fig.add_subplot(111)

bp = ax.boxplot([data['x1'], data['x2']])
ax.set_xticklabels(labels)
plt.show()


Here is the code that will create a Delta plot on our response time data above:

p = np.arange(10, 100, 10)

df = np.percentile(data['x1'], p) - np.percentile(data['x2'], p)
av = (np.percentile(data['x1'], p) + np.percentile(data['x2'], p)) / 2

fig = plt.figure(figsize=(12, 9), dpi=100)

plt.plot(av, df, 'ro')
plt.ylabel('Response Time Difference (sec)')
plt.xlabel('Mean Response Time (sec)')
plt.show()

Delta Plot using Python

That was pretty simple. In this tutorial you have learned how to create a delta plot that will lend you support for drawing inferences about things such as inhibition or cognitive control. Drawing the inferences is something you will have to do yourself! But have a look at the references below for more information; they will probably help.


  • Pratte, M. S., Rouder, J. N., Morey, R. D., & Feng, C. (2010). Exploring the differences in distributional properties between Stroop and Simon effects using delta plots. Attention, Perception & Psychophysics, 72(7), 2013–25. http://doi.org/10.3758/APP.72.7.2013
  • Speckman, P. L., Rouder, J. N., Morey, R. D., & Pratte, M. S. (2008). Delta Plots and Coherent Distribution Ordering. The American Statistician, 62(3), 262–266. http://doi.org/10.1198/000313008X333493

Inverse Efficiency Score

In this post I will briefly discuss a way to deal with speed-accuracy trade-offs in response time (RT) experiments. When conducting RT experiments and collecting responses such as correct and incorrect responses to visual stimuli, one can at times find that under certain conditions people respond more slowly but more accurately. For instance, if you have a condition with distractors and people are responding more slowly, everything may seem fine. However, if you look at the accuracy data (proportion of correct responses) you may see that they were also more accurate.

The inverse efficiency score (IES) combines speed and error. IES has been suggested to be an “observable measure that gauges the average energy consumed by the system over time”. It is calculated by dividing RT by 1 minus the proportion of errors (PE), i.e., by the proportion of correct responses (PC). If two conditions have the same mean RT but differ in PE, the IES of the condition with the higher PE will increase more than the IES of the condition with the lower PE. Interestingly, if there is a speed-accuracy trade-off, the IES will even out the PE differences. It is not always better to use IES. Seemingly, a lot can change when using IES, because it includes two variables and their sampling error; therefore, the variability of the measure increases. Furthermore, whether dividing RT by PC is a good reflection of the relative weights of speed and accuracy is unclear.
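The calculation itself is a one-liner. A minimal sketch with made-up numbers for two hypothetical conditions that share the same mean RT but differ in error rate:

```python
# Hypothetical example: inverse efficiency score (IES) for two conditions.
rt_a, pe_a = 0.600, 0.05   # mean RT (sec) and proportion of errors, condition A
rt_b, pe_b = 0.600, 0.20   # same mean RT, higher error rate, condition B

ies_a = rt_a / (1 - pe_a)  # IES = RT / PC, where PC = 1 - PE
ies_b = rt_b / (1 - pe_b)

print(ies_a, ies_b)
```

As described above, even though the mean RTs are identical, the condition with the higher error rate gets the larger (worse) IES.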