file chisquare assignment.doc June 26, 2002
Statistical Analysis of Field Project #1
due July 3rd
Examine at least one four-cell relationship in the results of Field
Project #1 that seems to call for statistical analysis, using the chi-square
test described below. Present your results in a brief report, showing the
table being evaluated (like the observed-data table in step 1 below),
including the corresponding table of expected values, and giving the value of
χ² and the corresponding probability (the chance of getting a χ² at least
this large if the null hypothesis were true). Explain what your results mean
in terms of the effect of your independent variable(s) on the conditioning
of /r/.
How to Calculate the Chi-Square Statistic
1. Organize the data into a table. Use actual numbers of tokens, NOT
percentages. This is the “observed” data. Include the marginals (row and
column totals) and total N (number in sample) in your table:

              unvoc     voc   total
AA women         32      29      61
AA men           33      45      78
total            65      74     139
In this example (/l/-vocalization in Philadelphia: /l/ is not pronounced, at
least not as a consonant, similar to /r/-vocalization or deletion), the
percentages of vocalization (deletion) are 48% and 58% for women and men,
respectively; you might wonder whether this difference is significant.
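As a quick check on the marginals and the percentages quoted above, the observed table can be tallied in a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the counts are the ones from the example table.

```python
# Observed token counts from the example table
# (rows: speaker group, columns: unvoc / voc).
observed = {
    "AA women": {"unvoc": 32, "voc": 29},
    "AA men": {"unvoc": 33, "voc": 45},
}

# Row totals (marginals) and the percentage of vocalized tokens in each row.
row_totals = {group: sum(cells.values()) for group, cells in observed.items()}
voc_percent = {
    group: 100 * cells["voc"] / row_totals[group]
    for group, cells in observed.items()
}

# Column totals and the grand total N.
col_totals = {
    col: sum(cells[col] for cells in observed.values())
    for col in ("unvoc", "voc")
}
total_n = sum(row_totals.values())

for group in observed:
    print(f"{group}: {voc_percent[group]:.0f}% vocalized "
          f"({row_totals[group]} tokens)")
print("column totals:", col_totals, "N =", total_n)
```

The percentages are only for describing the data; the chi-square calculation itself uses the raw counts.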
2. Find the “expected” distribution of data (i.e. the distribution that you
would get if there were no effect of the variable) using the following
formula. Note that the marginals in the “expected” table are the same as in
the “observed” table.
expected value of a cell = row total × column total ÷ total N
OR: expected value = column total ÷ total N × row total
This is the number of tokens that the marginals alone would predict for that
cell. That is, if the data were distributed exactly in proportion to the
marginals, you would expect 65/139 of the 61 tokens in the first row to fall
in the top left cell (AA women, unvoc), and similarly for the other three
cells.
women, unvoc = 65 × 61 ÷ 139 = 28.53
women, voc   = 74 × 61 ÷ 139 = 32.47
men, unvoc   = 65 × 78 ÷ 139 = 36.47
men, voc     = 74 × 78 ÷ 139 = 41.53
(I'm writing out only two decimal places; the computer will give you a lot
more.)
Table of expected values:

              unvoc     voc   total
women         28.53   32.47      61
men           36.47   41.53      78
total            65      74     139
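The expected-value step can also be sketched in Python. The function name `expected_table` is an illustrative choice, not part of any library; it simply applies row total × column total ÷ total N to every cell.

```python
# Observed counts: rows are women / men, columns are unvoc / voc.
observed = [[32, 29], [33, 45]]

def expected_table(table):
    """Return the expected counts implied by the marginals of `table`."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total_n = sum(row_totals)
    # expected value of a cell = row total * column total / total N
    return [[r * c / total_n for c in col_totals] for r in row_totals]

expected = expected_table(observed)
for label, row in zip(("women", "men"), expected):
    print(label, [round(value, 2) for value in row])
```

Each row of the expected table adds up to the same row total as in the observed table, which is a handy way to catch arithmetic slips.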
3. Calculate χ². The formula is:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

To do this, calculate the χ² for each cell. This is a measure of the lack of
fit between the observed and expected values. Get the value of each cell by
computing:

$$\frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
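women, unvoc = (32 − 28.53)² ÷ 28.53 = 3.47² ÷ 28.53 = 12.07 ÷ 28.53 = 0.423
women, voc   = (29 − 32.47)² ÷ 32.47 = 0.372
men, unvoc   = (33 − 36.47)² ÷ 36.47 = 0.331
men, voc     = (45 − 41.53)² ÷ 41.53 = 0.291
(The 12.07 comes from carrying the unrounded difference, 3.4748; squaring the
rounded 3.47 gives 12.04.)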
χ² per cell:

              unvoc     voc
women         0.423   0.372
men           0.331   0.291

Σ = total χ² = 1.417
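The per-cell values and their total can be reproduced with a short Python sketch. The helper name `chi_square_cells` is hypothetical; it computes the expected values from the marginals and then (observed − expected)² ÷ expected for each cell, without rounding along the way.

```python
# Observed counts: rows are women / men, columns are unvoc / voc.
observed = [[32, 29], [33, 45]]

def chi_square_cells(table):
    """Return (observed - expected)^2 / expected for every cell of `table`."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total_n = sum(row_totals)
    cells = []
    for row, r in zip(table, row_totals):
        cell_row = []
        for obs, c in zip(row, col_totals):
            exp = r * c / total_n  # expected count from the marginals
            cell_row.append((obs - exp) ** 2 / exp)
        cells.append(cell_row)
    return cells

cells = chi_square_cells(observed)
chi_square = sum(sum(row) for row in cells)  # total chi-square for the table

for label, row in zip(("women", "men"), cells):
    print(label, [round(value, 3) for value in row])
print("total chi-square:", round(chi_square, 3))
```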
4. Finally, check to see if the χ² statistic is significant for your
particular data set. This depends on the number of degrees of freedom of your
table. Find the degrees of freedom:
degrees of freedom = (# of rows − 1)(# of columns − 1)
In our example:
df = (2 − 1)(2 − 1) = (1)(1) = 1
Now check the significance. You can do this by hand or in Excel.
For one degree of freedom (df = 1), we need a χ² of 3.841 or greater in order
to say that the result is significant at P < 0.05. In our example, χ² is
1.42, so it is not statistically significant. We conclude that gender does
not significantly affect /l/-vocalization for the African-American women and
men who were studied in this experiment.
In Excel: In an empty cell, use the following function:
=chidist(χ², degrees of freedom)
Put the name of the cell with the χ² statistic in the first position and the
number of degrees of freedom in the second position. The number you get is
the probability of getting a χ² at least this large purely by chance, if the
independent variable had no effect. If P = 0.05 or less, you can say that the
effect of the independent variable is significant.
For our χ² of 1.417, this formula produces a probability of 0.234. This means
that, if gender had no effect, differences at least this large would turn up
by chance about 23% of the time. This is not statistically significant.
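If Excel is not at hand, the same probability can be sketched in plain Python for the one-degree-of-freedom case. This relies on a standard identity rather than anything in the handout: with df = 1, the upper-tail probability of χ² equals erfc(√(χ² ÷ 2)). The function name is an illustrative assumption, and the sketch does not cover tables with more than one degree of freedom.

```python
import math

def chi_square_p_value_df1(chi_square):
    """Upper-tail probability of a chi-square value with 1 degree of freedom.

    Valid for df = 1 only: P(X >= x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2))

p_example = chi_square_p_value_df1(1.417)   # the chi-square from the example
p_critical = chi_square_p_value_df1(3.841)  # the df = 1 cutoff for P < 0.05

print(f"P for chi-square 1.417: {p_example:.3f}")
print(f"P for chi-square 3.841: {p_critical:.3f}")
print("significant at 0.05?", p_example <= 0.05)
```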
5. EXCEL SHORTCUT: Once you’ve entered the “observed” values and calculated
the “expected” values, Excel can do the rest for you. Enter the following
function in an empty cell:
=chitest(range of cells of observed values, range of expected values)
If your table is in cells A3, A4, B3, B4, then the range is A3:B4. Again, the
number you get is the probability of getting a distribution at least this far
from the expected values purely by chance, if the independent variable had no
effect. If P = 0.05 or less, you can say that the effect of the independent
variable is significant.
Given the probability, you can get the χ² value with the function
chiinv(P, df).
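For comparison, here is a rough Python counterpart to the two Excel functions, restricted to four-cell tables (df = 1). The names `chitest_2x2` and `chiinv_df1` are made up for this sketch and are not Excel or library functions; the inverse uses the fact that a χ² with one degree of freedom is the square of a standard normal value.

```python
import math
from statistics import NormalDist

def chitest_2x2(observed, expected):
    """Probability for a four-cell table, from observed and expected counts."""
    chi_square = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # df = 1 only: upper-tail probability is erfc(sqrt(chi_square / 2)).
    return math.erfc(math.sqrt(chi_square / 2))

def chiinv_df1(p):
    """Chi-square value whose upper-tail probability is p, for df = 1 only."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z

observed = [[32, 29], [33, 45]]
# Expected counts from the marginals: row total * column total / total N.
expected = [[65 * 61 / 139, 74 * 61 / 139],
            [65 * 78 / 139, 74 * 78 / 139]]

p = chitest_2x2(observed, expected)
print(f"P = {p:.3f}")
print(f"chi-square recovered from P: {chiinv_df1(p):.3f}")
print(f"cutoff for P = 0.05: {chiinv_df1(0.05):.3f}")
```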
6. ALTERNATIVE COMPUTATION: For a four-cell table set up like this:

    a      b      m_1
    c      d      m_2
    m_3    m_4    T

where m_1 = a+b, m_2 = c+d, m_3 = a+c, m_4 = b+d, and T = a+b+c+d,
you can use the following formula, which is very easily computed on a hand
calculator:

$$\chi^2 = \frac{(ad - bc)^2 \, T}{m_1 \, m_2 \, m_3 \, m_4}$$

This gives you the χ² statistic. Again, you have one degree of freedom.
Again, determine the statistical significance either by looking it up in a
table of the χ² distribution or by using the chidist function in Excel.
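The shortcut formula is easy to check against the example data. A minimal Python sketch, with a hypothetical function name, using the same a, b, c, d layout as the table above:

```python
def chi_square_shortcut(a, b, c, d):
    """Chi-square for a four-cell table via (ad - bc)^2 * T / (m1 m2 m3 m4)."""
    m1, m2 = a + b, c + d   # row totals
    m3, m4 = a + c, b + d   # column totals
    total = a + b + c + d   # T, the grand total
    return (a * d - b * c) ** 2 * total / (m1 * m2 * m3 * m4)

# Example table: a = 32, b = 29 (women); c = 33, d = 45 (men).
shortcut_value = chi_square_shortcut(32, 29, 33, 45)
print(round(shortcut_value, 3))
```

For the example table this agrees with the cell-by-cell total from step 3, so either route can be used for a four-cell table.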