Finally, on to the second assignment in this course.

As in the first assignment, I am using the NESARC dataset. The chi-square test of independence, rather than pairing a categorical explanatory variable with a quantitative response variable, works with two categorical variables. So this time I have chosen alcohol dependence and the category of how often a respondent drank alcohol.

Hypothesis Test Question:

Is the number of days per month on which alcohol is consumed associated with alcohol dependence?

When examining this association, the chi-square test of independence reveals that the null hypothesis (no association) can be rejected.

Code:

import pandas

import numpy

import scipy.stats

import seaborn

import matplotlib.pyplot as plt

data = pandas.read_csv('nesarc_pds.csv', low_memory=False)

# set the variables you will be working with to numeric
# (convert_objects has been removed from pandas, so to_numeric is used throughout)

data['ALCABDEP12DX'] = pandas.to_numeric(data['ALCABDEP12DX'], errors='coerce')

data['CONSUMER'] = pandas.to_numeric(data['CONSUMER'], errors='coerce')

data['S2AQ8A'] = pandas.to_numeric(data['S2AQ8A'], errors='coerce')

data['S2AQ8B'] = pandas.to_numeric(data['S2AQ8B'], errors='coerce')

data['AGE'] = pandas.to_numeric(data['AGE'], errors='coerce')

#subset data to young adults age 18 to 25 who have CONSUMED ALCOHOL in the past 12 months

sub1 = data[(data['AGE'] >= 18) & (data['AGE'] <= 25) & (data['CONSUMER'] == 1)]

#make a copy of my new subsetted data

sub2 = sub1.copy()

# recode missing values to python missing (NaN)

sub2['S2AQ8A'] = sub2['S2AQ8A'].replace(9, numpy.nan)

sub2['S2AQ8B'] = sub2['S2AQ8B'].replace(99, numpy.nan)

# recoding values for S2AQ8A into a new variable, USFREQMO (approximate drinking days per month)

recode1 = {1: 30, 2: 30, 3: 14, 4: 6, 5: 6, 6: 2.5, 7: 1, 8: 0.5, 9: 0.5, 10: 0.5}

sub2['USFREQMO'] = sub2['S2AQ8A'].map(recode1)

#recoding values for ALCABDEP12DX into a new variable, ALCDEP

recode2 = {0: 0, 1: 0, 2: 1, 3: 1}

sub2['ALCDEP'] = sub2['ALCABDEP12DX'].map(recode2)

# contingency table of observed counts

ct1 = pandas.crosstab(sub2['ALCDEP'], sub2['USFREQMO'])

print (ct1)

# column percentages

colsum=ct1.sum(axis=0)

colpct=ct1/colsum

print(colpct)

# chi-square

print('chi-square value, p value, degrees of freedom, expected counts')

cs1= scipy.stats.chi2_contingency(ct1)

print (cs1)

/Code

Result:

/Result

Alcohol Dependence being:

0 No alcohol diagnosis, or alcohol abuse only

1 Alcohol dependence only or alcohol abuse and dependence

Frequency of use being number of days per month.

A chi-square test of independence revealed that among young adults (ages 18 to 25) who drank alcohol in the past 12 months, the number of days per month on which alcohol was consumed (collapsed into 6 ordered categories) and past-year alcohol dependence (a binary categorical variable) were significantly associated, X2 = 466.55, 5 df, p = 1.32e-98.
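To make it clearer where the reported statistic, degrees of freedom, and p value come from, here is a minimal sketch of unpacking the output of scipy.stats.chi2_contingency. The counts in toy_ct are made up for illustration and are not the NESARC results; the name toy_ct is also an assumption of this sketch.

```python
import pandas
import scipy.stats

# Hypothetical observed counts (NOT the NESARC results):
# rows are ALCDEP (0, 1), columns are USFREQMO categories.
toy_ct = pandas.DataFrame(
    [[300, 250, 200, 150, 120, 80],
     [10, 15, 30, 60, 90, 100]],
    index=[0, 1],
    columns=[0.5, 1.0, 2.5, 6.0, 14.0, 30.0],
)

# chi2_contingency returns the statistic, the p value,
# the degrees of freedom, and the table of expected counts.
chi2, p_value, dof, expected = scipy.stats.chi2_contingency(toy_ct)

# degrees of freedom = (rows - 1) * (columns - 1) = 1 * 5
print('chi-square:', chi2)
print('p value:', p_value)
print('degrees of freedom:', dof)
```

With 2 rows and 6 columns the test has 5 degrees of freedom, which matches the 5 df reported above.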

Post hoc comparisons of rates of alcohol dependence by pairs of drinking-frequency categories revealed that higher rates of alcohol dependence were seen among those drinking on more days, up to 14 to 13 days per month. In comparison, the prevalence of alcohol dependence was statistically similar among the groups drinking 6 or fewer days per month.
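The code above does not include the post hoc step, so here is a sketch of one common way to do it: a chi-square test on each pair of frequency categories, with a Bonferroni-adjusted significance level. With 6 categories there are 15 pairs, so the adjusted level is 0.05 / 15, or about 0.0033. The function name pairwise_chi_square and the counts in toy_ct are hypothetical, and this is not necessarily the exact procedure used for the results reported here.

```python
import itertools
import pandas
import scipy.stats


def pairwise_chi_square(ct, alpha=0.05):
    """Run a chi-square test on every pair of columns of a contingency table.

    Returns a DataFrame of results and the Bonferroni-adjusted alpha.
    """
    pairs = list(itertools.combinations(ct.columns, 2))
    adjusted_alpha = alpha / len(pairs)  # Bonferroni adjustment
    rows = []
    for a, b in pairs:
        # 2x2 sub-table; scipy applies the Yates correction by default here
        chi2, p, dof, expected = scipy.stats.chi2_contingency(ct[[a, b]])
        rows.append({
            'group_a': a,
            'group_b': b,
            'chi2': chi2,
            'p': p,
            'significant': bool(p < adjusted_alpha),
        })
    return pandas.DataFrame(rows), adjusted_alpha


# Hypothetical observed counts (NOT the NESARC results)
toy_ct = pandas.DataFrame(
    [[300, 250, 200, 150, 120, 80],
     [10, 15, 30, 60, 90, 100]],
    index=[0, 1],
    columns=[0.5, 1.0, 2.5, 6.0, 14.0, 30.0],
)

results, adjusted_alpha = pairwise_chi_square(toy_ct)
print('Bonferroni-adjusted alpha:', adjusted_alpha)
print(results)
```

On the real data the same function could be called as pairwise_chi_square(ct1). A pair counts as significantly different only if its p value is below the adjusted level.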