This is part of my coursework for Data Analysis Tools.
I am using Python to analyse the data available from Gapminder. I want to compare suicide rates against alcohol consumption for the countries in the data set. Both the response variable (suicide rate per 100,000 population) and the explanatory variable (alcohol consumption per adult in litres) are quantitative, so the Pearson correlation coefficient (r) can be used.
The scatterplot for the two variables seems to show a positive linear correlation:
The correlation coefficient is 0.35, indicating a weak positive linear correlation.
The r² value is 0.12, indicating that only 12% of the variability in the suicide rate can be explained by its linear relationship with alcohol consumption.
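As a rough check on how r and r² relate, the sketch below computes Pearson's r directly from its definition and then squares it. The `pearson_r` helper and the six data points are made up for illustration and are not taken from the Gapminder data; only the value 0.35 comes from the analysis above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from its definition (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Made-up illustrative values, NOT the Gapminder data
alcohol = [1.0, 3.5, 6.0, 8.5, 11.0, 13.5]
suicide = [6.0, 4.0, 11.0, 7.0, 9.0, 14.0]

r = pearson_r(alcohol, suicide)
r_squared = r ** 2  # share of variance explained by the linear fit
print('r =', round(r, 2), 'r squared =', round(r_squared, 2))

# The coefficient reported in this post, squared and rounded to two places
reported_r = 0.35
reported_r_squared = round(reported_r ** 2, 2)
print('reported r squared =', reported_r_squared)
```

Squaring the reported coefficient in this way gives the 12% figure quoted above, after rounding.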
The p-value is 7.31e-07, or 0.000000731, indicating that the correlation is highly significant.
This suggests that higher alcohol consumption in a country is associated with a higher recorded suicide rate, although the relationship is not very strong.
It’s impossible to say whether this is causal (high consumption of alcohol leads to an increase in suicide) or other factors are involved (for example, a high rate of depression could cause both alcoholism and suicide).
#import data analysis packages
import pandas
import seaborn
import scipy.stats
import matplotlib.pyplot as plt
# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)
#import the entire data set to memory
data = pandas.read_csv('mynewgapminder.csv', low_memory=False)
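#ensure that variables are numeric (convert_objects has been removed from pandas;
#to_numeric with errors='coerce' turns non-numeric entries into NaN)
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')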
#plot alcohol consumption and suicide rate as a scatterplot
scat1 = seaborn.regplot(x="alcconsumption", y="suicideper100th", data=data)
plt.xlabel('Alcohol consumption per adult in litres')
plt.ylabel('Suicide rate per 100,000 population')
plt.title('Scatterplot for the Association Between Alcohol Consumption and Suicide Rate')
#get the correlation coefficient
print('association between alcohol consumption and suicide rate')
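#data_clean is not defined in the code shown; assumed here to be the rows where both variables are present
data_clean = data.dropna(subset=['alcconsumption', 'suicideper100th'])
print(scipy.stats.pearsonr(data_clean['alcconsumption'], data_clean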