*This is part of my coursework for Data Analysis Tools.*

I am using Python to analyse the data available from Gapminder. I want to compare suicide rates with alcohol consumption across the countries in the data set. Both the response variable (suicide rate per 100,000 population) and the explanatory variable (alcohol consumption per adult, in litres) are quantitative, so the Pearson correlation coefficient (r) can be used to measure the strength of their linear association.
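To make the statistic concrete, here is a minimal sketch of Pearson's r computed directly from its definition (the sum of products of deviations divided by the square root of the product of the two sums of squares). The function name `pearson_r` and the five data points are made up for illustration; they are not Gapminder values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed directly from its definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up illustrative values, not Gapminder data
alcohol = [1.0, 3.0, 5.0, 7.0, 9.0]
suicide = [4.0, 6.0, 5.0, 9.0, 8.0]
r_example = pearson_r(alcohol, suicide)
print(round(r_example, 3))
```

In the actual analysis this calculation is delegated to `scipy.stats.pearsonr`, which also returns the p-value.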

The scatterplot of the two variables suggests a positive linear relationship.

The correlation coefficient is 0.35, indicating a weak positive linear correlation.

The r² value is 0.12, meaning that only about 12% of the variability in the suicide rate is accounted for by its linear relationship with alcohol consumption.
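The r² figure is simply the square of the correlation coefficient. A small sketch of the arithmetic, using the rounded r = 0.35 reported above (the variable names are my own):

```python
r = 0.35                      # rounded correlation coefficient from the text
r_squared = r ** 2            # proportion of variance shared linearly
unexplained = 1 - r_squared   # proportion left unaccounted for

print(f"r^2 = {r_squared:.4f}")
print(f"linearly explained: {r_squared:.0%}, unexplained: {unexplained:.0%}")
```

So roughly 88% of the variation in suicide rates between countries is not captured by this linear relationship.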

The p-value is 7.31e-07 (0.000000731), so the correlation is statistically significant at any conventional threshold.
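For reference, the p-value of a Pearson correlation can be obtained from a t statistic with n − 2 degrees of freedom. The sketch below shows that calculation. The helper name `pearson_p_value` is my own, and the sample size n = 100 is a placeholder chosen purely for illustration (the number of countries in the cleaned data is not stated here), so the printed p-value will not match the one reported above.

```python
import math
from scipy import stats

def pearson_p_value(r, n):
    """Two-sided p-value for the null hypothesis of no linear correlation."""
    df = n - 2
    # t statistic for a correlation coefficient r from n paired observations
    t_stat = r * math.sqrt(df / (1.0 - r ** 2))
    # sf() is the upper-tail probability; double it for a two-sided test
    p = 2.0 * stats.t.sf(abs(t_stat), df)
    return t_stat, p

# n=100 is a hypothetical sample size, not the real number of countries
t_stat, p = pearson_p_value(r=0.35, n=100)
print(f"t = {t_stat:.2f}, p = {p:.2g}")
```

The same r gives a smaller p-value as n grows, which is why a weak correlation can still be highly significant in a data set covering many countries.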

This suggests that higher alcohol consumption in a country is associated with a higher recorded suicide rate, although the relationship is not very strong.

From a correlation alone it is impossible to say whether the relationship is causal (high consumption of alcohol leads to an increase in suicide) or whether other factors are involved (for example, a high rate of depression could cause both alcoholism and suicide).

## Python program

    # Import the data analysis packages
    import pandas
    import numpy
    import seaborn
    import scipy.stats
    import matplotlib.pyplot as plt

    # Bug fix for display formats to avoid run-time errors
    pandas.set_option('display.float_format', lambda x: '%f' % x)

    # Import the entire data set to memory
    data = pandas.read_csv('mynewgapminder.csv', low_memory=False)

    # Ensure that the variables are numeric (unparseable entries become NaN)
    data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
    data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')

    # Plot alcohol consumption and suicide rate as a scatterplot
    scat1 = seaborn.regplot(x="alcconsumption", y="suicideper100th", data=data)
    plt.xlabel('Alcohol consumption per adult in litres')
    plt.ylabel('Suicide rate per 100,000 population')
    plt.title('Scatterplot for the Association Between Alcohol Consumption and Suicide Rate')

    # Clean NAs (this drops every row with a missing value in any column)
    data_clean = data.dropna()

    # Get the correlation coefficient and p-value
    print('association between alcohol consumption and suicide rate')
    print(scipy.stats.pearsonr(data_clean['alcconsumption'], data_clean['suicideper100th']))
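As a self-contained variant of the cleaning and correlation steps, the sketch below runs them on a tiny made-up frame rather than the Gapminder CSV. It makes two assumptions that are mine rather than part of the original program: numeric conversion is done with `pandas.to_numeric(..., errors="coerce")`, and `dropna` is restricted to the two analysis columns so that gaps in unrelated columns do not discard countries. The `country` and `othervar` columns and all values are hypothetical.

```python
import pandas as pd
from scipy import stats

# Tiny made-up frame standing in for the CSV; values are illustrative only
data = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "alcconsumption": ["1.5", "6.2", " ", "9.8", "12.1"],
    "suicideper100th": ["4.1", "8.3", "7.0", "", "14.6"],
    "othervar": [1.0, None, 3.0, 4.0, 5.0],  # hypothetical unrelated column
})

cols = ["alcconsumption", "suicideper100th"]

# errors="coerce" turns unparseable entries such as blanks into NaN
data[cols] = data[cols].apply(pd.to_numeric, errors="coerce")

# Keep rows where both analysis variables are present
data_clean = data.dropna(subset=cols)
# For comparison: dropping on every column also loses country B
data_strict = data.dropna()

r, p_value = stats.pearsonr(data_clean["alcconsumption"],
                            data_clean["suicideper100th"])
print(len(data_clean), len(data_strict))
print(r, p_value)
```

Whether to drop on all columns or only on the analysis columns changes the sample size, and therefore r and the p-value, so it is worth reporting which rows were kept.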