Testing a Potential Moderator

This is part of my coursework for Data Analysis Tools.

I am using Python to analyse the data available from Gapminder. Using Pearson’s correlation coefficient (r), I found that alcohol consumption has a significant positive correlation with suicide rate for the whole data set (r=0.35, p=7.31e-07). However, I want to test whether geographical region moderates this relationship, that is, whether the strength of the correlation differs from one region of the world to another.

I created a new variable called “region” and assigned a number for each geographical area:

  1. Asia (34 countries)
  2. Europe (50 countries)
  3. Africa (50 countries)
  4. Middle East (17 countries)
  5. Americas (43 countries)
  6. Oceania (19 countries)
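
As a rough illustration of this coding step, a numeric region variable could be built in pandas along the following lines. This is only a sketch: the `country` column name, the `COUNTRY_TO_REGION` lookup and the example countries are assumptions made for the example, and how the coding was actually done for this data set is not shown in this post.

```python
import pandas

# Hypothetical lookup from country name to region label (illustration only).
COUNTRY_TO_REGION = {
    'Japan': 'Asia',
    'France': 'Europe',
    'Kenya': 'Africa',
    'Jordan': 'Middle East',
    'Brazil': 'Americas',
    'Fiji': 'Oceania',
}

# Numeric codes matching the numbered list above.
REGION_CODES = {
    'Asia': 1, 'Europe': 2, 'Africa': 3,
    'Middle East': 4, 'Americas': 5, 'Oceania': 6,
}

def add_region(frame, country_col='country'):
    """Return a copy of frame with a numeric 'region' column added.

    Countries missing from the lookup get NaN, so they are easy to spot.
    """
    out = frame.copy()
    out['region'] = out[country_col].map(COUNTRY_TO_REGION).map(REGION_CODES)
    return out

example = pandas.DataFrame({'country': ['Japan', 'France', 'Fiji', 'Atlantis']})
coded = add_region(example)
print(coded)
```

Leaving unmatched countries as NaN, rather than giving them a default code, means they are dropped with the other missing values instead of being counted in the wrong region.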

I then tested the correlation coefficient for each region:

Association between alcohol consumption and suicide rate for ASIA
(0.10174016798317355, 0.59947647029652795)

Association between alcohol consumption and suicide rate for EUROPE
(0.60505083925984826, 1.7244830114896035e-05)

Association between alcohol consumption and suicide rate for AFRICA
(0.22021753857669515, 0.12839491207675466)

Association between alcohol consumption and suicide rate for MIDDLE EAST
(-0.056124327060797188, 0.83644085383899258)

Association between alcohol consumption and suicide rate for AMERICAS
(0.070065617449753828, 0.69376393774656941)

Association between alcohol consumption and suicide rate for OCEANIA
(0.33344561639190951, 0.24400145879606708)
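
Each pair printed above is (r, p): the Pearson coefficient followed by its two-tailed p-value. The p-value depends on the sample size as well as on r, so the same coefficient can be significant in a large region and not in a small one. The sketch below shows the standard calculation, t = r·sqrt((n-2)/(1-r²)) with n-2 degrees of freedom. The function name `pearson_p_value` and the sample sizes in the loop are assumptions chosen for illustration, not values taken from this data set.

```python
import math
from scipy import stats

def pearson_p_value(r, n):
    """Two-tailed p-value for a Pearson r observed in a sample of size n."""
    if n < 3:
        raise ValueError('need at least 3 observations')
    if abs(r) >= 1.0:
        return 0.0
    # t statistic with n - 2 degrees of freedom
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * stats.t.sf(abs(t), df=n - 2)

# The same r evaluated at three hypothetical sample sizes:
# the p-value shrinks as n grows.
for n in (15, 40, 150):
    print(n, pearson_p_value(0.33, n))
```

Note that the program below drops rows with missing values before testing, so the number of countries that enters each regional test may be smaller than the counts listed above.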

In Europe, the correlation between alcohol consumption and suicide rate is significant (r=0.61, p=1.72e-05, or 0.0000172). A coefficient of this size indicates a fairly strong positive linear relationship: within Europe, countries with higher alcohol consumption tend to have higher suicide rates.

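For all other regions, the correlation between alcohol consumption and suicide rate is not significant, with a p-value greater than 0.05 in every case. For these regions, we cannot reject the null hypothesis that there is no linear relationship between alcohol consumption and suicide rate. A non-significant result is not proof that no relationship exists, but the contrast with Europe suggests that region moderates the association.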

This is further illustrated by looking at the scatterplots for the different regions:

Scatterplots

Python program

#import data analysis packages
import pandas
import numpy
import seaborn
import scipy.stats  # "import scipy" alone does not guarantee scipy.stats is loaded
import matplotlib.pyplot as plt

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

#import the entire data set to memory
data = pandas.read_csv('mynewgapminder.csv', low_memory=False)

#ensure that variables are numeric
#(convert_objects has been removed from pandas; to_numeric with errors='coerce'
#turns blanks and other non-numeric entries into NaN)
data['suicideper100th'] = pandas.to_numeric(data['suicideper100th'], errors='coerce')
data['alcconsumption'] = pandas.to_numeric(data['alcconsumption'], errors='coerce')
data['region'] = pandas.to_numeric(data['region'], errors='coerce')

#clean NAs
data_clean=data.dropna()

#get the correlation coefficient for all regions
print('Association between alcohol consumption and suicide rate for ALL REGIONS')
print(scipy.stats.pearsonr(data_clean['alcconsumption'], data_clean['suicideper100th']))
print('')

#display counts for regions
print('counts for region')
print('Asia=1, Europe=2, Africa=3, Middle East=4, Americas=5, Oceania=6')
ct3 = data.groupby('region').size()
print(ct3)
print('')

#break into subgroups depending on region
subAsia = data_clean[data_clean['region'] == 1]
subEurope = data_clean[data_clean['region'] == 2]
subAfrica = data_clean[data_clean['region'] == 3]
subMiddleEast = data_clean[data_clean['region'] == 4]
subAmericas = data_clean[data_clean['region'] == 5]
subOceania = data_clean[data_clean['region'] == 6]

#get the correlation coefficient for Asia
print('Association between alcohol consumption and suicide rate for ASIA')
print(scipy.stats.pearsonr(subAsia['alcconsumption'], subAsia['suicideper100th']))
print('')

#plot alcohol consumption and suicide rate as a scatterplot for Asia
plt.figure(401)
scat1 = seaborn.regplot(x='alcconsumption', y='suicideper100th', data=subAsia)
plt.xlabel('Alcohol consumption per adult in litres')
plt.ylabel('Suicide rate per 100,000 population')
plt.title('Association Between Alcohol Consumption and Suicide Rate in ASIA')

#get the correlation coefficient for Europe
print('Association between alcohol consumption and suicide rate for EUROPE')
print(scipy.stats.pearsonr(subEurope['alcconsumption'], subEurope['suicideper100th']))
print('')

#plot alcohol consumption and suicide rate as a scatterplot for Europe
plt.figure(402)
scat1 = seaborn.regplot(x='alcconsumption', y='suicideper100th', data=subEurope)
plt.xlabel('Alcohol consumption per adult in litres')
plt.ylabel('Suicide rate per 100,000 population')
plt.title('Association Between Alcohol Consumption and Suicide Rate in EUROPE')

#get the correlation coefficient for Africa
print('Association between alcohol consumption and suicide rate for AFRICA')
print(scipy.stats.pearsonr(subAfrica['alcconsumption'], subAfrica['suicideper100th']))
print('')

#plot alcohol consumption and suicide rate as a scatterplot for Africa
plt.figure(403)
scat1 = seaborn.regplot(x='alcconsumption', y='suicideper100th', data=subAfrica)
plt.xlabel('Alcohol consumption per adult in litres')
plt.ylabel('Suicide rate per 100,000 population')
plt.title('Association Between Alcohol Consumption and Suicide Rate in AFRICA')

#get the correlation coefficient for Middle East
print('Association between alcohol consumption and suicide rate for MIDDLE EAST')
print(scipy.stats.pearsonr(subMiddleEast['alcconsumption'], subMiddleEast['suicideper100th']))
print('')

#plot alcohol consumption and suicide rate as a scatterplot for the Middle East
plt.figure(404)
scat1 = seaborn.regplot(x='alcconsumption', y='suicideper100th', data=subMiddleEast)
plt.xlabel('Alcohol consumption per adult in litres')
plt.ylabel('Suicide rate per 100,000 population')
plt.title('Association Between Alcohol Consumption and Suicide Rate in MIDDLE EAST')

#get the correlation coefficient for Americas
print('Association between alcohol consumption and suicide rate for AMERICAS')
print(scipy.stats.pearsonr(subAmericas['alcconsumption'], subAmericas['suicideper100th']))
print('')

#plot alcohol consumption and suicide rate as a scatterplot for the Americas
plt.figure(405)
scat1 = seaborn.regplot(x='alcconsumption', y='suicideper100th', data=subAmericas)
plt.xlabel('Alcohol consumption per adult in litres')
plt.ylabel('Suicide rate per 100,000 population')
plt.title('Association Between Alcohol Consumption and Suicide Rate in AMERICAS')

#get the correlation coefficient for Oceania
print('Association between alcohol consumption and suicide rate for OCEANIA')
print(scipy.stats.pearsonr(subOceania['alcconsumption'], subOceania['suicideper100th']))
print('')

#plot alcohol consumption and suicide rate as a scatterplot for Oceania
plt.figure(406)
scat1 = seaborn.regplot(x='alcconsumption', y='suicideper100th', data=subOceania)
plt.xlabel('Alcohol consumption per adult in litres')
plt.ylabel('Suicide rate per 100,000 population')
plt.title('Association Between Alcohol Consumption and Suicide Rate in OCEANIA')
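
The six regional blocks above differ only in the region code and its label, so the correlation step could be collapsed into a loop. The following is a sketch, not the program that produced the results in this post. The helper name `correlation_by_region`, the `REGION_LABELS` dictionary and the small synthetic data frame are assumptions for illustration; only the column names are taken from the program above.

```python
import numpy
import pandas
import scipy.stats

# Labels for the numeric region codes used in this post.
REGION_LABELS = {1: 'ASIA', 2: 'EUROPE', 3: 'AFRICA',
                 4: 'MIDDLE EAST', 5: 'AMERICAS', 6: 'OCEANIA'}

def correlation_by_region(frame, x='alcconsumption', y='suicideper100th'):
    """Return a DataFrame with n, r and p for each region code in frame."""
    rows = []
    for code, group in frame.groupby('region'):
        if len(group) < 3:
            continue  # too few rows for a meaningful test
        r, p = scipy.stats.pearsonr(group[x], group[y])
        rows.append({'region': REGION_LABELS.get(int(code), str(code)),
                     'n': len(group), 'r': r, 'p': p})
    return pandas.DataFrame(rows, columns=['region', 'n', 'r', 'p'])

# Synthetic data for demonstration only (not Gapminder values):
# region 1 has no built-in relationship, region 2 a strong positive one.
rng = numpy.random.RandomState(0)
region = numpy.repeat([1, 2], 20)
alc = rng.uniform(0, 15, 40)
noise = rng.normal(0, 1, 40)
suicide = numpy.where(region == 2, alc + noise, 10 + noise)
demo = pandas.DataFrame({'region': region,
                         'alcconsumption': alc,
                         'suicideper100th': suicide})

summary = correlation_by_region(demo)
print(summary)
```

With the real data, the call would be `correlation_by_region(data_clean)`. Returning n alongside r and p makes it easier to see how many countries each regional test is based on.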
