Basic Python Data Visualization with Matplotlib

January 7, 2020 by programminginpsychology

In this short post, you will learn how to create a basic plot in Python using Matplotlib.

What is Matplotlib?

Matplotlib is a package for drawing charts in Python. It is often used from Jupyter Notebook, a web application where you can interactively write and run Python code and see the charts directly in the browser. However, you will often work with scripts that save the visualizations as files instead, which is what we do in this post.

The data to be plotted is often stored in data types whose structure is optimized for performing mathematical operations quickly. The package that provides these is NumPy, which is usually imported with:

import numpy as np
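As a small illustration of why NumPy is convenient here, arithmetic on a NumPy array is applied to every element at once, without writing a loop. The values are made up for the example (they happen to match the plot further down).

```python
import numpy as np

# Made-up values to plot later
values = np.array([1, 2, 75, 6, 7])

# Arithmetic is applied element-wise (vectorized), no loop needed
doubled = values * 2
centered = values - values.mean()  # each value minus the mean

print(doubled)        # [  2   4 150  12  14]
print(values.mean())  # 18.2
print(centered)
```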

Data Visualization in Python

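Below you will find an example of how Matplotlib is used to draw a diagram and save it to a file. First, import matplotlib:

import matplotlib

Second, select the non-interactive Agg backend, which renders plots to image files such as .png without opening a window (useful when a script runs on a machine without a display). Select the backend before importing pyplot, the module that provides the plotting functions:

matplotlib.use('Agg')
import matplotlib.pyplot as plt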

Third, add the following to your script to create the plot and save it:

plt.figure()  # start a new, empty figure

plt.plot([1, 2, 75, 6, 7])  # draw a line through these y-values
plt.ylabel('The label is useless in this plot example')

plt.savefig("data_visualization_in_Python_example.png")  # write the figure to a PNG file
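For reference, the steps above can be collected into one self-contained script. This sketch swaps in Matplotlib's object-oriented interface (plt.subplots() returns a Figure and an Axes) and plots NumPy data instead of a list; the sine curve and the file name sine_example.png are illustrative choices, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")
ax.legend()

# Write the figure to a PNG file, then release it
fig.savefig("sine_example.png")
plt.close(fig)
```

Calling plt.close() matters mostly in scripts that create many figures in a loop, since pyplot otherwise keeps every figure in memory.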

Briefly about the plot() function

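The matplotlib package contains a module called pyplot, which defines the plot() function that we use to draw simple line graphs. When plot() is given a single list, as in the example above, the values are used as y-coordinates and their positions (0, 1, 2, ...) as x-coordinates.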
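To see this default in action, the sketch below plots the same list as the example and inspects the Line2D object that plot() returns. The format string "o-" (markers joined by lines) is an illustrative extra, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

plt.figure()

# plot() returns a list of Line2D objects, one per plotted line
lines = plt.plot([1, 2, 75, 6, 7], "o-")
line = lines[0]

# Only y-values were given, so x defaults to the indices 0 through 4
print(line.get_xdata())
print(line.get_ydata())

plt.close()
```

Passing both lists explicitly, as in plt.plot([0, 1, 2, 3, 4], [1, 2, 75, 6, 7]), draws the same line.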

Other libraries to use when doing data science:

Pandas is great for data wrangling, plotting, and descriptive statistics.
Seaborn is a wrapper around Matplotlib that is easier to use, especially for statistical plots.
Statsmodels is used for statistical modeling and data analysis.
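As a rough illustration of the Pandas entry, a DataFrame can compute descriptive statistics with describe() and draw a plot through its plot() method, which uses Matplotlib under the hood. The column names, reaction-time values, and file name below are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render to files only
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: reaction times (ms) in two conditions
df = pd.DataFrame({
    "congruent": [512, 498, 530, 505, 521],
    "incongruent": [601, 588, 615, 597, 610],
})

# Count, mean, std, min, quartiles, and max for each column
stats = df.describe()
print(stats)

# DataFrame.plot() draws with Matplotlib and returns an Axes
ax = df.plot(kind="box")
ax.set_ylabel("Reaction time (ms)")
plt.savefig("reaction_times.png")
plt.close()
```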

Make sure to check this site for Pandas tutorials, data visualization, data analysis, and many more neat programming guides and how-tos.

That’s all! Let me know if you need anything!

An Introduction to Data Analysis with Python and Pandas

January 6, 2019 by programminginpsychology

Here’s a great Pandas tutorial for beginners:

How to Use MultiIndex in Pandas Dataframe

December 7, 2018 by programminginpsychology

Support Vector Machines: A Visual Explanation with Sample Python Code

October 25, 2018 by programminginpsychology

I found this very pedagogical, and it made SVMs easier for me to understand! Hope you enjoy it!

Getting started with Machine Learning using Python and Scikit-Learn

October 17, 2018 by programminginpsychology

Machine Learning and Go

October 3, 2018 by programminginpsychology

Negative Binomial Regression in R

September 5, 2018 by programminginpsychology

Hey! Here’s a tutorial video for you R enthusiasts out there! Eh, well, I assume that data scientists use whatever tools the job requires?! Anyway, in this very nice R tutorial you will learn how to carry out negative binomial regression using the R statistical programming environment. Enjoy!

K-Means Clustering – Methods using Scikit-learn in Python

July 12, 2018 by programminginpsychology

10 Open Data resources to use with Python

May 11, 2016 by programminginpsychology

Recently, I asked on Twitter whether there are any good sources of free and open data to use when learning Python (and R):

Got some good suggestions but still looking for more. #Python #OpenData #DataScience #rstats #BigData https://t.co/WdZ7U7sPpu

— freddy (@freddy1876) April 30, 2016

In this post I will list the suggestions I have received so far.

Awesome Public Datasets: A huge collection of public datasets, categorized by field (e.g., biology, economics, machine learning).
UCI Machine Learning Repository: “…currently maintain 349 data sets as a service to the machine learning community”
https://www.kaggle.com/datasets: Also a list of publicly available datasets.
Government data: Governmental data on everything from agriculture to science & research. Very interesting.
Google Public Data: A huge collection of different public data sources. Seems really nice.
Amazon public data sets: “AWS hosts a variety of public data sets that anyone can access for free.” Seems interesting.
MovieLens: “Learn more about movies with rich data, images, and trailers. Browse movies by community-applied tags, or apply your own tags. Explore the database with expressive search tools.” MovieLens is not really a data source in the sense I asked about. However, the suggestion was that one could use the movie ratings to learn Hadoop/Spark/MapReduce. I may give this a try, if I ever get the time.

These are the data sources that people on Twitter suggested in reply to my tweet. I have also found this very interesting myself: Open Psychology Data, a journal that describes open and reusable psychology data. If you are interested in playing around with personality data, it can be found here. Finally, the APA has links to open data sets: Data Links.

I know, the title is wrong: I gave you a huge number of different data sources to use. Some may contain overlapping links to data, but I would assume that we now have data to play around with for quite some time. Do you know of any more data sources that are open and free? Please leave a comment!

Bands Incorporated — OUseful.Info, the blog…

May 1, 2016 by programminginpsychology

A few weeks ago, as I was doodling with some Companies House director network mapping code and simple Companies House chatbot ideas, I tweeted an example of Iron Maiden’s company structure based on co-director relationships. Depending on how the original search is seeded, the maps may also include elements of band members’ own personal holdings/interests. The […]

via Bands Incorporated — OUseful.Info, the blog…

Programming in Psychology

Statistics, Programming, and Psychology