file chisquare assignment.doc June 26, 2002
Statistical Analysis of Field Project #1
due July 3rd
Examine at least one four-cell relationship in the results of Field
Project #1 that seems to call for statistical analysis, using the chi-square
test described below. Present your results in a brief report that shows the
table being evaluated (like the table of observed values under step 1 below)
and the corresponding table of expected values, and that gives the value of χ²
and the corresponding probability, i.e. the probability of getting a χ² at
least that large if the null hypothesis is correct. Explain what your results
mean in terms of the effect of your independent variable(s) on the conditioning
of /r/.
How to Calculate the Chi-Square Statistic
1. Organize the data into a table. Use actual numbers of tokens, NOT
percentages. This is the “observed” data. Include the marginals (row and
column totals) and total N (number in sample) in your table:

|          | unvoc | voc | total |
| AA women |    32 |  29 |    61 |
| AA men   |    33 |  45 |    78 |
| total    |    65 |  74 |   139 |
In this example (L-vocalization in Philadelphia: /l/ is not pronounced, at
least not as a consonant, similar to /r/-vocalization or deletion), the
percentages of vocalization (deletion) are 48% for women and 58% for men; you
might wonder whether this difference is significant.
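The table above can also be set up outside of Excel. The following is a minimal Python sketch, assuming the counts from the example table; Python is not part of the handout, and the variable names (observed, row_totals, col_totals, total_n, pct_voc) are chosen here only for illustration. It computes the marginals and the percentages of vocalization quoted above.

```python
# Observed token counts from the example table
# (rows: speaker groups; columns: unvoc, voc).
observed = {
    "AA women": {"unvoc": 32, "voc": 29},
    "AA men": {"unvoc": 33, "voc": 45},
}

# Marginals: row totals, column totals, and total N.
row_totals = {group: sum(cells.values()) for group, cells in observed.items()}
col_totals = {
    col: sum(cells[col] for cells in observed.values())
    for col in ("unvoc", "voc")
}
total_n = sum(row_totals.values())

# Percentage of vocalized tokens per group. This is for description only;
# the chi-square test itself must use the raw counts.
pct_voc = {
    group: 100 * cells["voc"] / row_totals[group]
    for group, cells in observed.items()
}

for group, pct in pct_voc.items():
    print(f"{group}: {pct:.0f}% vocalized out of {row_totals[group]} tokens")
print("column totals:", col_totals, "N =", total_n)
```

The percentages are printed only as a description of the data; every later step works from the counts and the marginals.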
2. Find the “expected” distribution of data (i.e. the distribution that you
would get if there were no effect of the variable) using the following formula.
Note that the marginals in the “expected” table are the same as in the
“observed” table.

expected value of a cell = row total × column total ÷ total N

OR: expected value = column total ÷ total N × row total
This is the number of tokens that the marginals alone would predict for that
cell. That is, if the data were distributed in exact proportion to the
marginals, you would expect 65/139 of the 61 tokens from AA women to fall in
the top left cell (AA women, unvoc), and similarly for the other three cells.
women, unvoc = 65 × 61 ÷ 139 = 28.53
women, voc   = 74 × 61 ÷ 139 = 32.47
men, unvoc   = 65 × 78 ÷ 139 = 36.47
men, voc     = 74 × 78 ÷ 139 = 41.53

(I'm writing out only two decimal places; the computer will give you a lot
more.)
Table of expected values:
|       | unvoc | voc   | total |
| women | 28.53 | 32.47 |    61 |
| men   | 36.47 | 41.53 |    78 |
| total |    65 |    74 |   139 |
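The expected-value formula (row total × column total ÷ total N) can be sketched in a few lines of Python. This is only an illustration using the example counts; the list layout and variable names are assumptions made here, not something the handout prescribes.

```python
# Observed counts from the example table.
observed = [
    [32, 29],  # AA women: unvoc, voc
    [33, 45],  # AA men:   unvoc, voc
]

row_totals = [sum(row) for row in observed]        # 61, 78
col_totals = [sum(col) for col in zip(*observed)]  # 65, 74
total_n = sum(row_totals)                          # 139

# expected value of a cell = row total x column total / total N
expected = [[r * c / total_n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 2) for value in row])
```

Each row of the expected table still adds up to the same row total as in the observed table, which is a quick way to check the arithmetic.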
3. Calculate χ². The formula is:

χ² = Σ [ (observed − expected)² ÷ expected ]

To do this, calculate the χ² for each cell. This is a measure of the lack of
fit between the observed and expected values. Get the value of each cell by
computing:

(observed − expected)² ÷ expected
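women, unvoc = (32 − 28.53)² ÷ 28.53 = 3.47² ÷ 28.53 = 12.07 ÷ 28.53 = 0.423
women, voc   = (29 − 32.47)² ÷ 32.47 = 0.372
men, unvoc   = (33 − 36.47)² ÷ 36.47 = 0.331
men, voc     = (45 − 41.53)² ÷ 41.53 = 0.291

(The squared difference comes out as 12.07 rather than 12.04 because it is
computed from the unrounded difference, 3.4748.)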
χ² per cell:

|       | unvoc | voc   |
| women | 0.423 | 0.372 |
| men   | 0.331 | 0.291 |

Σ = total χ² = 1.417
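The per-cell calculation and the final sum can be checked with a short Python sketch. It assumes the example counts and recomputes the expected values without rounding; the names used (contributions, chi_square) are illustrative only.

```python
# Observed counts from the example table.
observed = [
    [32, 29],  # AA women: unvoc, voc
    [33, 45],  # AA men:   unvoc, voc
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total_n = sum(row_totals)
expected = [[r * c / total_n for c in col_totals] for r in row_totals]

# Per-cell value: (observed - expected)^2 / expected
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Total chi-square is the sum over all four cells.
chi_square = sum(sum(row) for row in contributions)

for row in contributions:
    print([round(value, 3) for value in row])
print("chi-square =", round(chi_square, 3))
```

Because the sketch keeps all the decimal places, its per-cell values agree with the hand calculation to three decimals even though the handout shows the expected values rounded to two.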
4. Finally, check to see if the χ² statistic is significant for your
particular data set. This depends on the number of degrees of freedom of your
table. Find the degrees of freedom (do not count the total row and total
column):

degrees of freedom = (# of rows − 1)(# of columns − 1)

In our example:

df = (2 − 1)(2 − 1) = (1)(1) = 1
Now check the significance. You can do this by hand or in Excel.

For one degree of freedom (df = 1), we need a χ² of 3.841 or greater in order
to say that the difference is significant at P < 0.05. In our example, χ² is
1.42, so it is not statistically significant. We conclude that gender does not
significantly affect /l/-vocalization for the African-American women and men
who were studied in this experiment.
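The by-hand check amounts to a degrees-of-freedom calculation followed by a comparison with the critical value. A minimal Python sketch, assuming the handout's critical value of 3.841 for df = 1 at P < 0.05 (the function and constant names are hypothetical):

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of a table, not counting the total row and column."""
    return (n_rows - 1) * (n_cols - 1)


# Critical value quoted in the handout for df = 1 at P < 0.05.
# Tables with other degrees of freedom need a different critical value.
CRITICAL_VALUE_DF1_P05 = 3.841

chi_square = 1.417  # total chi-square from the worked example
df = degrees_of_freedom(2, 2)
significant = chi_square >= CRITICAL_VALUE_DF1_P05

print("df =", df)
print("significant at P < 0.05:", significant)
```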
In Excel: In an empty cell, use the following function:

=chidist(χ², degrees of freedom)

Put the name of the cell with the χ² statistic in the first position and the
number of degrees of freedom in the second position. The number you get is the
probability of getting a χ² this large or larger purely by chance, i.e. if the
independent variable had no effect. If P = 0.05 or less, you can say that the
effect of the independent variable is significant.
For our χ² of 1.417, this formula produces a probability of 0.234. This means
that there is a 23% chance of getting differences at least this large as the
result of chance alone. This is not statistically significant.
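If Excel is not at hand, the same probability can be approximated in Python for the one-degree-of-freedom case. This sketch relies on the fact that, for df = 1, the upper tail of the χ² distribution equals erfc(√(χ² ÷ 2)); it is not how Excel itself computes chidist, it does not apply to tables with more degrees of freedom, and the function name is hypothetical.

```python
import math


def chi_square_p_value_df1(chi_square):
    """Upper-tail probability of the chi-square distribution, for df = 1 only."""
    return math.erfc(math.sqrt(chi_square / 2))


p = chi_square_p_value_df1(1.417)
print(f"P = {p:.3f}")
print("significant at P <= 0.05:", p <= 0.05)
```

For the example it should agree with the 0.234 that chidist produces, and a χ² of 3.841 should come out very close to 0.05.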
5. EXCEL SHORTCUT: Once you’ve entered the “observed” values and calculated
the “expected” values, Excel can do the rest for you. Enter the following
function in an empty cell:

=chitest(range of cells of observed values, range of expected values)

If your table is in cells A3, A4, B3, B4, then the range is A3:B4. Each range
should contain only the four data cells, not the totals. Again, the number you
get is the probability of getting differences this large or larger purely by
chance. If P = 0.05 or less, you can say that the effect of the independent
variable is significant.

Given the probability, you can get the χ² value with the function
chiinv(P, df).
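For comparison, here is a rough Python counterpart of the two Excel functions in this step, restricted to four-cell tables (df = 1). It is a sketch under that assumption, not a reimplementation of Excel: chitest_2x2 and chiinv_df1 are hypothetical names, and the inverse uses the relation between χ² with one degree of freedom and the square of a standard normal value.

```python
import math
from statistics import NormalDist


def chitest_2x2(observed, expected):
    """Probability for a four-cell table, given observed and expected values."""
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Upper tail of the chi-square distribution; valid for df = 1 only.
    return math.erfc(math.sqrt(chi_square / 2))


def chiinv_df1(p):
    """Chi-square value whose upper-tail probability is p; df = 1 only."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z


observed = [[32, 29], [33, 45]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total_n = sum(row_totals)
expected = [[r * c / total_n for c in col_totals] for r in row_totals]

p = chitest_2x2(observed, expected)
print(f"P = {p:.3f}")
print(f"chi-square recovered from P = {chiinv_df1(p):.3f}")
```

Going from the probability back to χ² should return the 1.417 of the worked example, and chiinv_df1(0.05) should return approximately the critical value 3.841.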
6. ALTERNATIVE COMPUTATION: For a four-cell table set up like this:

| a  | b  | m1 |
| c  | d  | m2 |
| m3 | m4 | T  |

where m1 = a+b
      m2 = c+d
      m3 = a+c
      m4 = b+d
      T = a+b+c+d

you can use the following formula, which is very easily computed on a hand
calculator:

χ² = (ad − bc)² × T ÷ (m1 × m2 × m3 × m4)
This gives you the χ² statistic. Again, you have one degree of freedom. Again,
determine the statistical significance either by looking it up in a table of
the χ² distribution or by using the chidist function in Excel.
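The shortcut formula is easy to check against the worked example. A minimal Python sketch, with a, b, c, d taken from the observed table in step 1 (the function name is hypothetical):

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a four-cell table: (ad - bc)^2 * T / (m1 * m2 * m3 * m4)."""
    m1, m2 = a + b, c + d  # row totals
    m3, m4 = a + c, b + d  # column totals
    total = a + b + c + d  # T
    return (a * d - b * c) ** 2 * total / (m1 * m2 * m3 * m4)


# a = women/unvoc, b = women/voc, c = men/unvoc, d = men/voc
chi_square = chi_square_2x2(32, 29, 33, 45)
print(f"chi-square = {chi_square:.3f}")
```

The result should match the 1.417 obtained cell by cell in step 3, since the two formulas are algebraically equivalent for a four-cell table.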