file chisquare assignment.doc                                                                                     June 26, 2002

 

Statistical Analysis of Field Project #1

due July 3rd

 

 

Examine at least one four-cell relationship in the results of Field Project #1 that seems to call for statistical analysis, using the chi-square test described below.  Present your results in a brief report, showing the table being evaluated (like the table under 1 below), including the corresponding table of expected values, and giving the value of χ² and the corresponding probability (P value), that is, the probability of getting a deviation this large if the null hypothesis of no effect were true.  Explain what your results mean in terms of the effect of your independent variable(s) on the conditioning of /r/.

 

 

How to Calculate the Chi-Square Statistic

 

1.       Organize the data into a table.  Use actual numbers of tokens, NOT percentages.  This is the “observed” data.  Include the marginals (row and column totals) and total N (number in sample) in your table:

 

               unvoc    voc    total
AA women         32      29      61
AA men           33      45      78
total            65      74     139

 

In this example (L-vocalization in Philadelphia, where /l/ is not pronounced, at least not as a consonant, much like /r/-vocalization or deletion), the percentages of vocalization (deletion) are 48% for women and 58% for men.  You might wonder whether this difference is significant.
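If you keep your counts in a script rather than a spreadsheet, the marginals and percentages from step 1 can be checked with a few lines of Python.  This is a minimal sketch, not part of the assignment; the variable names are my own, and it assumes the two-row, two-column layout of the example table.

```python
# Observed counts from the /l/-vocalization example.
# Each row is a speaker group; columns are [unvocalized, vocalized].
observed = {
    "AA women": [32, 29],
    "AA men": [33, 45],
}

# Marginals: row totals, column totals, and total N.
row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
total_n = sum(col_totals)

# Percentage vocalized (column index 1) within each row.
pct_vocalized = {
    group: 100 * counts[1] / row_totals[group]
    for group, counts in observed.items()
}

print(row_totals)            # {'AA women': 61, 'AA men': 78}
print(col_totals, total_n)   # [65, 74] 139
for group, pct in pct_vocalized.items():
    print(f"{group}: {pct:.0f}% vocalized")
```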

 

2.       Find the “expected” distribution of data (i.e. the distribution that you would get if there were no effect of the variable) using the following formula.  Note that the marginals in the “expected” table are the same as in the “observed” table.

 

expected value of a cell = row total x column total ÷ total number

OR:      expected value = column total ÷ total N x row total

 

This is the share of the total that the marginals predict for that cell.  That is, if the independent variable had no effect, the top left cell (AA women, unvoc) would get the same share of its row total as the unvoc column gets of the whole sample: 65/139 of the row total of 61.  The other three cells are calculated the same way.

 

women, unvoc   = 65 × 61 ÷ 139 = 28.53

women, voc     = 74 × 61 ÷ 139 = 32.47

men, unvoc     = 65 × 78 ÷ 139 = 36.47

men, voc       = 74 × 78 ÷ 139 = 41.53

(I'm writing out only two decimal places; the computer will give you a lot more.)


Table of expected values:

               unvoc    voc    total
women          28.53   32.47     61
men            36.47   41.53     78
total            65      74     139
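The expected-value formula above can be sketched in Python as follows.  The function name `expected_table` is hypothetical; it takes the observed counts only (no marginals) as a list of rows and returns the expected counts at full precision.

```python
def expected_table(observed):
    """Expected count for each cell: row total x column total / total N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total_n = sum(row_totals)
    return [[r * c / total_n for c in col_totals] for r in row_totals]


observed = [[32, 29],   # women: unvoc, voc
            [33, 45]]   # men:   unvoc, voc
expected = expected_table(observed)
for row in expected:
    print([round(x, 2) for x in row])
# [28.53, 32.47]
# [36.47, 41.53]
```

Because each expected cell is built from the marginals, the rows and columns of the expected table add up to the same totals as the observed table.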

 

 

3.       Calculate χ².  The formula is:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

To do this, calculate the χ² for each cell.  This is a measure of the lack of fit between the observed and expected values.  Get the value of each cell by computing:

$$\frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

 

 

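$$\text{women, unvoc} = \frac{(32 - 28.53)^2}{28.53} = \frac{3.47^2}{28.53} = \frac{12.07}{28.53} = 0.423$$

$$\text{women, voc} = \frac{(29 - 32.47)^2}{32.47} = 0.372$$

$$\text{men, unvoc} = \frac{(33 - 36.47)^2}{36.47} = 0.331$$

$$\text{men, voc} = \frac{(45 - 41.53)^2}{41.53} = 0.291$$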

 

χ² per cell:

               unvoc    voc
women          0.423   0.372
men            0.331   0.291

 

Next, add up the numbers in the third table.  This gives you the χ² statistic.

$$\text{total } \chi^2 = 0.423 + 0.372 + 0.331 + 0.291 = 1.417$$
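Step 3's per-cell values and their sum can be sketched the same way.  This Python sketch, with a hypothetical `chi_square` helper, uses full-precision expected values, so its results match the rounded figures above to three decimal places.

```python
def chi_square(observed, expected):
    """Return the per-cell (O - E)^2 / E values and their sum."""
    cells = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    return cells, sum(sum(row) for row in cells)


observed = [[32, 29], [33, 45]]
# Full-precision expected values: row total x column total / 139.
expected = [[65 * 61 / 139, 74 * 61 / 139],
            [65 * 78 / 139, 74 * 78 / 139]]

cells, total = chi_square(observed, expected)
for row in cells:
    print([round(x, 3) for x in row])
# [0.423, 0.372]
# [0.331, 0.291]
print(round(total, 3))   # 1.417
```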

 

4.       Finally, check to see if the c2 statistic is significant for your particular data set.  This depends on the number of degrees of freedom of your table.  Find the degrees of freedom:

 

degrees of freedom = (number of rows - 1)(number of columns - 1), counting only the data rows and columns, not the totals

 

In our example:

 

df = (2-1)(2-1) = (1)(1) = 1

 

Now check the significance.  You can do this by hand or in Excel.

 

By hand:  Find where your calculated χ² statistic falls in a χ² table for the appropriate number of degrees of freedom.  If the corresponding P value (probability) is less than .05, then your χ² is significant.  Read this as, "If the independent variable had no effect, a deviation this large would arise by chance less than 5% of the time."  The deviation here is the difference between the observed data and the expected values.  That is, this tells us whether the independent variable (e.g. ethnicity, gender, neighborhood) has a significant effect on the dependent variable (in this case, /l/-vocalization).

 

For one degree of freedom (df=1), we need a χ² of 3.841 or greater in order to say that the data is significant at P < 0.05.  In our example, χ² is 1.42, so the difference is not statistically significant.  We conclude that these data show no significant effect of gender on /l/-vocalization for the African-American women and men who were studied.
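The by-hand check amounts to computing df and comparing χ² with the critical value for P = 0.05.  In this sketch the small table of critical values is copied from a standard χ² table; `degrees_of_freedom` and `significant_at_05` are hypothetical helper names.

```python
# Critical chi-square values at P = 0.05, from a standard chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """df = (rows - 1)(columns - 1), counting data cells only, not totals."""
    return (n_rows - 1) * (n_cols - 1)


def significant_at_05(chi2, df):
    """True if chi2 meets or exceeds the 0.05 critical value for this df."""
    return chi2 >= CRITICAL_05[df]


df = degrees_of_freedom(2, 2)
print(df, significant_at_05(1.417, df))   # 1 False
```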

 

In Excel: In an empty cell, use the following function:

 

            =chidist(χ², degrees of freedom)

 

Put the name of the cell with the χ² statistic in the first position and the number of degrees of freedom in the second position.  The number you get is the P value: the probability that chance alone would produce a deviation from the expected values at least this large if the independent variable had no effect.  If P = 0.05 or less, you can say that the effect of the independent variable is significant.

 

For our χ² of 1.417, this formula produces a probability of 0.234.  This means that, if gender had no effect, differences at least as large as those in the table would arise by chance about 23% of the time.  This is not statistically significant.
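For one degree of freedom, the probability that chidist returns can also be computed outside Excel, because a χ² variable with 1 df is the square of a standard normal variable.  A minimal Python sketch using only the standard library's `math.erfc`; the function name is hypothetical:

```python
import math


def chi2_pvalue_df1(chi2):
    """Upper-tail probability of chi-square with 1 df.

    With one degree of freedom, chi-square is the square of a standard
    normal variable Z, so P(X >= chi2) = P(|Z| >= sqrt(chi2))
    = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


p = chi2_pvalue_df1(1.417)
print(round(p, 3))   # 0.234
```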

 

5.       EXCEL SHORTCUT—Once you’ve entered the “observed” values and calculated the “expected” values, Excel can do the rest for you.  Enter the following function in an empty cell:

 

=chitest(range of cells of observed values, range of expected values)

 

If your table is in cells A3, A4, B3, B4, then the range is A3:B4 (select only the four data cells, not the totals).  Again, the number you get is the P value: the probability of a deviation at least this large arising by chance if the independent variable had no effect.  If P = 0.05 or less, you can say that the effect of the independent variable is significant.
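Outside Excel, a rough stand-in for chitest can be written in Python.  This sketch is not Excel's implementation: it assumes a table with at least two rows and two columns, takes df = (rows - 1)(columns - 1), and evaluates the upper tail of the χ² distribution with the closed-form series that exist for whole-number df.  The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1.

    Even df: exp(-x/2) times a finite Poisson-type sum.
    Odd df: a normal tail term plus a finite sum.
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # normal density at chi
    total, term = 0.0, chi
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * total


def chitest(observed, expected):
    """P value from an observed and an expected table of the same shape."""
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2_sf(chi2, df)


observed = [[32, 29], [33, 45]]
expected = [[65 * 61 / 139, 74 * 61 / 139],
            [65 * 78 / 139, 74 * 78 / 139]]
print(round(chitest(observed, expected), 3))   # 0.234
```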

 

Given the probability, you can get the χ² value with the function chiinv(P, df).
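Going the other way, as chiinv does, means searching for the χ² value that produces a given probability.  This sketch handles only df = 1 and uses simple bisection; the function names are hypothetical.

```python
import math


def chi2_pvalue_df1(chi2):
    """Upper-tail chi-square probability for 1 df: erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


def chi2_inverse_df1(p, lo=0.0, hi=100.0, tol=1e-10):
    """Find the 1-df chi-square value whose upper-tail probability is p.

    The tail probability shrinks as chi-square grows, so bisection keeps
    the answer bracketed in [lo, hi] until the bracket is narrower than tol.
    Assumes 0 < p < 1.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_pvalue_df1(mid) > p:
            lo = mid  # tail still too large: the answer lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(chi2_inverse_df1(0.05), 3))   # 3.841
```

The result for P = 0.05 is the same 3.841 critical value used in the by-hand check.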


6.       ALTERNATIVE COMPUTATION—For a four-cell table set up like this:

 

a        b        m1        where  m1 = a + b
c        d        m2               m2 = c + d
m3       m4       T                m3 = a + c
                                   m4 = b + d
                                   T  = a + b + c + d

 

you can use the following formula, which is very easily computed on a hand calculator:

 

$$\chi^2 = \frac{(ad - bc)^2 \, T}{m_1 m_2 m_3 m_4}$$

 

This gives you the χ² statistic, with one degree of freedom as before.  Determine its statistical significance either by looking it up in a table of the χ² distribution or by using the chidist function in Excel.
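The shortcut formula is easy to check in code as well.  This hypothetical `chi_square_2x2` function takes the four cell counts in the a, b, c, d layout of step 6:

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a four-cell table laid out as

         a   b  | m1
         c   d  | m2
        --------+---
        m3  m4  | T

    chi2 = (ad - bc)^2 * T / (m1 * m2 * m3 * m4)
    """
    m1, m2 = a + b, c + d   # row totals
    m3, m4 = a + c, b + d   # column totals
    t = a + b + c + d       # total N
    return (a * d - b * c) ** 2 * t / (m1 * m2 * m3 * m4)


print(round(chi_square_2x2(32, 29, 33, 45), 3))   # 1.417
```

For the example data it gives the same 1.417 as the cell-by-cell method in step 3; for a four-cell table the two formulas are algebraically equivalent.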