## Testing Moderation in the Context of Correlation (Coursera – Data Analysis Tools)

This week's material again focused in part on the relationship between smoking and nicotine dependence, and on whether that relationship is moderated by lifetime depression status. This led me to wonder whether other mental health conditions might also be moderators. For this example I chose lifetime social phobia (SOCPDLIFE).

First we need to establish the relationship between smoking quantity and nicotine dependence, using a chi-square test of independence.

Code

import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
import statsmodels.stats.proportion as sm

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

# set the variables we will be working with to numeric
# (pandas.to_numeric replaces convert_objects, which has been removed from pandas)
data['TAB12MDX'] = pandas.to_numeric(data['TAB12MDX'], errors='coerce')
data['CHECK321'] = pandas.to_numeric(data['CHECK321'], errors='coerce')
data['S3AQ3B1'] = pandas.to_numeric(data['S3AQ3B1'], errors='coerce')
data['S3AQ3C1'] = pandas.to_numeric(data['S3AQ3C1'], errors='coerce')
data['AGE'] = pandas.to_numeric(data['AGE'], errors='coerce')
# the moderator used further down is converted here as well
data['SOCPDLIFE'] = pandas.to_numeric(data['SOCPDLIFE'], errors='coerce')

# subset data to young adults age 18 to 25 who have smoked in the past 12 months
sub1 = data[(data['AGE'] >= 18) & (data['AGE'] <= 25) & (data['CHECK321'] == 1)]

# make a copy of the subsetted data
sub2 = sub1.copy()

# recode missing values to Python missing (NaN)
sub2['S3AQ3B1'] = sub2['S3AQ3B1'].replace(9, numpy.nan)
sub2['S3AQ3C1'] = sub2['S3AQ3C1'].replace(99, numpy.nan)

# recode S3AQ3B1 into a new variable, USFREQMO
# (the original post mapped this column twice; only the second mapping,
# with category 4 coded as 5, took effect, so that is the one kept here)
recode1 = {1: 30, 2: 22, 3: 14, 4: 5, 5: 2.5, 6: 1}
sub2['USFREQMO'] = sub2['S3AQ3B1'].map(recode1)

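# bin usual cigarettes per day (S3AQ3C1) into a quantity category;
# anyone who is not a daily smoker (S3AQ3B1 != 1) is coded 0
def USQUAN(row):
    if row['S3AQ3B1'] != 1:
        return 0
    elif row['S3AQ3C1'] <= 5:
        return 3
    elif row['S3AQ3C1'] <= 10:
        return 8
    elif row['S3AQ3C1'] <= 15:
        return 13
    elif row['S3AQ3C1'] <= 20:
        return 18
    elif row['S3AQ3C1'] > 20:
        return 37
    # S3AQ3C1 is missing: every comparison above is False
    return numpy.nan

sub2['USQUAN'] = sub2.apply(USQUAN, axis=1)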

# contingency table of observed counts
ct1 = pandas.crosstab(sub2['TAB12MDX'], sub2['USQUAN'])
print(ct1)

# column percentages
colsum = ct1.sum(axis=0)
colpct = ct1 / colsum
print(colpct)

# chi-square
print('chi-square value, p value, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)

/Code

Result

/Result

With a large chi-square value (194.42) and a significant p value (4.22e-40), we see that smoking quantity and nicotine dependence are significantly associated.
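To make it easier to pull out the numbers quoted above, here is a minimal sketch of how the tuple returned by `scipy.stats.chi2_contingency` can be unpacked, and how column percentages can be produced in one step with `normalize='columns'`. The small data frame and its column names (`dependent`, `quantity`) are made up for illustration and are not NESARC data.

```python
import pandas
import scipy.stats

# Hypothetical toy data, NOT from NESARC: 1 = dependent, 0 = not dependent
toy = pandas.DataFrame({
    'dependent': [0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1],
    'quantity':  [3, 3, 3, 3, 8, 8, 8, 8, 18, 18, 18, 18],
})

# observed counts, and the same table as column proportions
counts = pandas.crosstab(toy['dependent'], toy['quantity'])
col_pct = pandas.crosstab(toy['dependent'], toy['quantity'], normalize='columns')

# the result unpacks into statistic, p value, degrees of freedom, expected counts
chi2, p_value, dof, expected = scipy.stats.chi2_contingency(counts)
print('chi-square = %.2f, p = %.4f, dof = %d' % (chi2, p_value, dof))
print(col_pct)
```

Unpacking the result this way avoids reading the values out of the printed tuple by eye.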

Next we bring in the third variable, lifetime social phobia (SOCPDLIFE), and split the data into two subsets: 0 = no social phobia, 1 = social phobia.

Code

sub3 = sub2[(sub2['SOCPDLIFE'] == 0)]
sub4 = sub2[(sub2['SOCPDLIFE'] == 1)]

print('association between smoking quantity and nicotine dependence for those W/O social phobia')

# contingency table of observed counts
ct2 = pandas.crosstab(sub3['TAB12MDX'], sub3['USQUAN'])
print(ct2)

# column percentages (computed from ct2, the table for this subgroup)
colsum = ct2.sum(axis=0)
colpct = ct2 / colsum
print(colpct)

# chi-square
print('chi-square value, p value, expected counts')
cs2 = scipy.stats.chi2_contingency(ct2)
print(cs2)

print('association between smoking quantity and nicotine dependence for those WITH social phobia')

# contingency table of observed counts
ct3 = pandas.crosstab(sub4['TAB12MDX'], sub4['USQUAN'])
print(ct3)

# column percentages (computed from ct3, the table for this subgroup)
colsum = ct3.sum(axis=0)
colpct = ct3 / colsum
print(colpct)

# chi-square
print('chi-square value, p value, expected counts')
cs3 = scipy.stats.chi2_contingency(ct3)
print(cs3)

/Code

Result

/Result

From these results we find:

Those without social phobia have a high chi-square value (182.56) and a significant p value (1.51e-37).

Those with social phobia also have a significant result, with a chi-square value of 15.59 and a p value of 0.008.

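Because the association is significant in both groups, we would say that social phobia does not appear to moderate the relationship between smoking quantity and nicotine dependence. The chi-square value is much smaller in the social phobia group, but the statistic grows with sample size, so the two values are not directly comparable as measures of strength.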
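The subgroup analysis above repeats the same three steps for each level of the moderator. As a rough sketch of how that could be written once and looped, the helper below (a hypothetical `chi2_by_group`, not part of the course code) runs the chi-square test within each level and also reports the subgroup size. It is run here on synthetic data with made-up column names, so the numbers it prints say nothing about NESARC.

```python
import numpy
import pandas
import scipy.stats

def chi2_by_group(df, row_var, col_var, moderator):
    """Chi-square test of row_var vs col_var within each level of moderator."""
    results = {}
    for level, group in df.groupby(moderator):
        table = pandas.crosstab(group[row_var], group[col_var])
        chi2, p_value, dof, _ = scipy.stats.chi2_contingency(table)
        results[level] = {'n': int(table.values.sum()),
                          'chi2': chi2, 'p': p_value, 'dof': dof}
    return pandas.DataFrame.from_dict(results, orient='index')

# Synthetic illustration data (hypothetical columns, not NESARC)
rng = numpy.random.default_rng(0)
n = 400
demo = pandas.DataFrame({
    'moderator': rng.integers(0, 2, size=n),
    'quantity': rng.choice([0, 3, 8], size=n),
})
# probability of dependence rises with quantity, identically in both groups
prob = 0.2 + 0.07 * demo['quantity']
demo['dependent'] = (rng.random(n) < prob).astype(int)

summary = chi2_by_group(demo, 'dependent', 'quantity', 'moderator')
print(summary)
```

Assuming `sub2` has been built as in the code above, the equivalent call would be `chi2_by_group(sub2, 'TAB12MDX', 'USQUAN', 'SOCPDLIFE')`. Having `n` next to each chi-square value makes it easier to see when a smaller statistic simply reflects a smaller subgroup.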