*This is part of my coursework for Data Management and Visualization.*

I am using Python to analyse the data available from Gapminder. Following on from last week’s assignment, I had to create graphs to display the variables and the relationships between them.

# Python program

#import data analysis package

import pandas

import numpy

import seaborn

import matplotlib.pyplot as plt

#bug fix for display formats to avoid run time errors

pandas.set_option('display.float_format', lambda x: '%f' % x)

#import the entire data set to memory

data = pandas.read_csv('mynewgapminder.csv', low_memory=False)

#ensure that variables are numeric

#entries that cannot be parsed as numbers are coerced to NaN

data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')

data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')

data['polityscore'] = pandas.to_numeric(data['polityscore'], errors='coerce')

#I only want to look at countries where stats exist for suicide rate

#get subset where suicide rates exist

sub1 = data[data['suicideper100th'] > 0]

sub2 = sub1.copy()

#first look at suicide rates

print("Suicide rate per 100,000 population, age adjusted")

print('-------------------------')

#print a description, including count, mean, standard deviation, min, max, and percentiles

desc1 = sub2['suicideper100th'].describe()

print(desc1)

print()

#Univariate histogram for suicide rate

#give the figure a unique number so it will display as a separate graph

plt.figure(101)

#plot suicide rate as a histogram

#histplot needs seaborn 0.11 or later; older versions used distplot

seaborn.histplot(sub2['suicideper100th'].dropna(), kde=False)

plt.xlabel('Suicides per 100,000 population')

plt.ylabel('Number of countries')

plt.title('Histogram for suicide rate')

#next look at alcohol consumption

print("Alcohol consumption per adult (age 15+), in litres")

print('-------------------------')

#print a description, including count, mean, standard deviation, min, max, and percentiles

desc2 = sub2['alcconsumption'].describe()

print(desc2)

print()

#Univariate histogram for alcohol consumption

#give the figure a unique number so it will display as a separate graph

plt.figure(102)

#plot alcohol consumption as a histogram

seaborn.histplot(sub2['alcconsumption'].dropna(), kde=False)

plt.xlabel('Alcohol consumption per adult in litres')

plt.ylabel('Number of countries')

plt.title('Alcohol consumption')

#next look at polity score

print("Polity (democracy) score")

print('-------------------------')

#print a description, including count, mean, standard deviation, min, max, and percentiles

desc3 = sub2['polityscore'].describe()

print(desc3)

print()

#Univariate histogram for polity score

#give the figure a unique number so it will display as a separate graph

plt.figure(103)

#plot polity score as a histogram

seaborn.histplot(sub2['polityscore'].dropna(), kde=False)

plt.xlabel('Polity score')

plt.ylabel('Number of countries')

plt.title('Polity score')

#Now look at relationship between suicide and alcohol

#basic scatterplot: Q->Q

#give the figure a unique number so it will display as a separate graph

plt.figure(104)

#plot alcohol consumption and suicide rate as a scatterplot

scat1 = seaborn.regplot(x="alcconsumption", y="suicideper100th", data=data)

plt.xlabel('Alcohol consumption per adult in litres')

plt.ylabel('Suicide rate per 100,000 population')

plt.title('Scatterplot for the Association Between Alcohol Consumption and Suicide Rate')

#quartile split (use qcut function & ask for 4 groups - gives you quartile split)

print('Alcohol rate - 4 categories - quartiles')

data['AlcoholGRP'] = pandas.qcut(data.alcconsumption, 4, labels=["25th%tile", "50%tile", "75%tile", "100%tile"])

c10 = data['AlcoholGRP'].value_counts(sort=False, dropna=True)

print(c10)

print()

#bivariate bar graph C->Q

#catplot (formerly factorplot) creates its own figure, so no plt.figure call is needed here

#plot alcohol consumption as quartiles and display against suicide rate

seaborn.catplot(x='AlcoholGRP', y='suicideper100th', data=data, kind="bar", ci=None)

plt.xlabel('Alcohol consumption per adult in litres')

plt.ylabel('Suicide rate per 100,000 population')

plt.title('Bar Chart for the Association Between Alcohol Consumption and Suicide Rate')
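The numeric conversion step in the program matters because blank entries in a CSV are read in as text. As a rough sketch of what that step does, the example below runs `pandas.to_numeric` with `errors='coerce'` on a small made-up column (the values are hypothetical, not taken from the Gapminder file) and then applies the same `> 0` filter used to build the subset.

```python
import pandas

# Hypothetical raw column as it might arrive from a CSV: numbers stored as
# text, with a blank entry where no statistic is available.
raw = pandas.Series(['9.64', '12.3', ' ', '4.99'])

# errors='coerce' turns anything that cannot be parsed into NaN
numeric = pandas.to_numeric(raw, errors='coerce')

print(numeric)
print('missing values:', numeric.isna().sum())

# NaN > 0 evaluates to False, so this filter also drops the missing row
subset = numeric[numeric > 0]
print('rows kept:', len(subset))
```

Because comparisons with NaN are always false, filtering on `> 0` removes the countries with no recorded value without a separate `dropna` call.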

# Suicide rate

Description of the suicideper100th variable:

Suicide rate per 100,000 population, age adjusted

-------------------------

count 191.000000

mean 9.640839

std 6.300178

min 0.201449

25% 4.988449

50% 8.262893

75% 12.328551

max 35.752872

Name: suicideper100th, dtype: float64

The univariate graph for suicide rate:

This graph is unimodal, with its highest peak at around 7.5 to 10 suicides per 100,000 population, which is close to the mean of 9.64. It is skewed to the right: most countries have low suicide rates, and a long tail extends towards the few countries with much higher rates. This is consistent with the mean (9.64) being higher than the median (8.26).
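The direction of skew can also be checked numerically rather than by eye. The sketch below uses a made-up set of rates (hypothetical values, not the Gapminder data) and compares the mean with the median, then computes the sample skewness with `Series.skew()`. A positive skewness indicates a longer tail towards high values.

```python
import pandas

# Hypothetical rates: most values are low, a few are much higher
rates = pandas.Series([3.1, 4.5, 5.0, 6.2, 7.8, 8.3, 9.0, 12.4, 20.5, 35.8])

mean = rates.mean()
median = rates.median()

# Sample skewness: positive means a longer tail towards high values,
# negative means a longer tail towards low values
skewness = rates.skew()

print('mean:', round(mean, 2))
print('median:', round(median, 2))
print('skewness:', round(skewness, 2))
```

The same two checks could be run on `sub2['suicideper100th']`, `sub2['alcconsumption']` and `sub2['polityscore']` in the program above.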

# Alcohol consumption

Description of the alcconsumption variable:

Alcohol consumption per adult (age 15+), in litres

-------------------------

count 185.000000

mean 6.689730

std 4.908841

min 0.030000

25% 2.560000

50% 5.920000

75% 9.860000

max 23.010000

Name: alcconsumption, dtype: float64

The univariate graph for alcohol consumption:

This graph is bimodal, with one peak at 0 to 2.5 litres consumed per adult, and a second peak at 7.5 to 10 litres. The mean is 6.69 litres. It is skewed to the right: more countries have a low alcohol intake than a high one, and the tail extends towards the highest consumers (the maximum is 23.01 litres).
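Peaks in a histogram can be confirmed by counting the values that fall in each bin. The sketch below does this with `numpy.histogram` on made-up consumption values (hypothetical, not the Gapminder data). The bin width of 2.5 litres is an assumption chosen to match the ranges quoted in the text; seaborn may choose different bins for the actual plot.

```python
import numpy

# Hypothetical consumption values in litres (not the Gapminder data)
litres = numpy.array([0.3, 0.9, 1.5, 2.1, 2.4, 3.8, 5.9,
                      7.7, 8.2, 8.9, 9.5, 9.8, 11.2, 14.0])

# Bin edges every 2.5 litres from 0 to 25 (assumed bin width)
edges = numpy.arange(0, 25.1, 2.5)
counts, _ = numpy.histogram(litres, bins=edges)

# Print the number of values in each bin
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f'{left:4.1f} to {right:4.1f}: {n}')
```

Two bins with clearly higher counts than their neighbours, separated by a dip, are what a bimodal histogram looks like in tabular form.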

# Polity score

Description of the polityscore variable:

Polity (democracy) score

-------------------------

count 159.000000

mean 3.616352

std 6.320350

min -10.000000

25% -2.000000

50% 6.000000

75% 9.000000

max 10.000000

Name: polityscore, dtype: float64

The univariate graph for polity score:

This graph is unimodal, with a peak at 10. It is highly skewed to the left: most countries have high polity scores (democracies), and the tail extends towards the lower scores (autocracies). This is consistent with the mean (3.62) being well below the median (6).

# Relationship between suicide rate and alcohol intake

My initial hypothesis was that a high rate of alcohol consumption would be correlated with a high suicide rate. In this case, alcohol consumption is the explanatory variable that goes on the X axis, while the suicide rate is the response variable that goes on the Y axis.

Scatter plot for alcohol consumption and suicide:

This scatterplot shows a weak positive relationship between alcohol consumption and suicide rate. In other words, countries that consume high quantities of alcohol show a slight tendency to have a high suicide rate.
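The strength of a linear relationship like this one can be summarised with Pearson's correlation coefficient. The sketch below computes it with `Series.corr` on a small made-up data frame (the values are hypothetical, not the Gapminder data), so the number it prints says nothing about the real result.

```python
import pandas

# Hypothetical values for a handful of countries (not the Gapminder data)
df = pandas.DataFrame({
    'alcconsumption': [0.5, 2.0, 4.1, 6.3, 8.0, 10.2, 12.5, 15.0],
    'suicideper100th': [6.0, 9.5, 4.2, 11.0, 7.3, 8.1, 14.6, 12.9],
})

# Drop rows where either variable is missing before correlating
clean = df[['alcconsumption', 'suicideper100th']].dropna()

# Pearson's r ranges from -1 (perfect negative) to 1 (perfect positive)
r = clean['alcconsumption'].corr(clean['suicideper100th'])
print('Pearson r:', round(r, 3))
```

Running the same two lines on the `data` frame from the program would put a number on how weak the relationship is.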

Bar chart for alcohol consumption (in quartiles) and suicide:

The bar chart shows the relationship in further detail. In the lower three quartiles, there is little relationship between alcohol consumption and suicide rate. However, there is a definite trend for countries in the highest quartile for alcohol consumption to have a high rate of suicide.
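By default, a seaborn bar plot shows the mean of the response variable for each category, so the heights of the bars can be reproduced as a table. The sketch below splits made-up data (hypothetical values, not the Gapminder data) into quartiles with `pandas.qcut` and takes the mean suicide rate per group. The labels `Q1` to `Q4` are placeholders chosen for this example.

```python
import pandas

# Hypothetical values for twelve countries (not the Gapminder data)
df = pandas.DataFrame({
    'alcconsumption': [0.5, 1.2, 2.0, 3.4, 4.1, 5.9,
                       6.3, 8.0, 9.9, 10.2, 12.5, 15.0],
    'suicideper100th': [6.0, 7.1, 9.5, 5.0, 4.2, 8.8,
                        11.0, 7.3, 9.0, 8.1, 14.6, 12.9],
})

# Split alcohol consumption into four equal-sized groups
df['AlcoholGRP'] = pandas.qcut(df['alcconsumption'], 4,
                               labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Mean suicide rate in each quartile: the quantity a bar chart would plot
group_means = df.groupby('AlcoholGRP', observed=False)['suicideper100th'].mean()
print(group_means)
```

Printing the group means next to the chart makes it easier to state how much higher the top quartile is than the other three.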

Further research is required before any definite conclusions can be drawn.