You can download the script here and source it locally, or source it directly from this location with the following command:
source("http://www.ling.upenn.edu/~joseff/scripts/recontrast.R")
recontrast(x, type = "sum")
x can be a data frame or a factor. If given a data frame, recontrast() changes the contrasts of every unordered factor with fewer than 1000 levels.
At present, recontrast() generates sum or treatment contrasts when you pass "sum" or "treatment" to type.
The output of recontrast() is either a factor with the new contrasts, or a data frame in which every eligible factor column has been given the specified contrasts.
See here for a discussion of contrasts with sample data.
By default, R sets up treatment contrasts for categorical factors. Treatment contrasts assume that some level (the first one) is untreated, or default with respect to the response. The reported effects for the treated levels are their differences from the default group. For experimental data, where there is one control group and many experimental groups, treatment contrasts are obviously the most appropriate. For some linguistic data, treatment contrasts can also be appropriate. For example, if you assume that for some variation a lingual articulation will introduce a bias, but a labial articulation will not, you can treat the labial articulation as "untreated."
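To make the "difference from the default group" reading concrete, here is a small sketch in base R. The data frame, the variable names (place, y), and the numbers are invented for illustration only; labial is placed first so that it acts as the untreated level.

```r
# Hypothetical toy data: three places of articulation, two tokens each.
d <- data.frame(
  place = factor(rep(c("labial", "coronal", "dorsal"), each = 2),
                 levels = c("labial", "coronal", "dorsal")),
  y = c(1, 3, 4, 6, 7, 9)
)

# Request treatment contrasts explicitly, so the example does not depend on
# the session's global options.
contrasts(d$place) <- contr.treatment(levels(d$place))

fit_trt <- lm(y ~ place, data = d)
group_means <- tapply(d$y, d$place, mean)  # labial 2, coronal 5, dorsal 8

# The intercept is the mean of the reference level (labial); the other two
# coefficients are each level's difference from that reference mean.
coef(fit_trt)
group_means - group_means["labial"]
```

With these made-up numbers, the intercept is 2 (the labial mean) and the coefficients for coronal and dorsal are 3 and 6, their distances from labial.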
However, the assumption of an untreated group is inappropriate for a lot of linguistic data, and the comparison of all levels to some reference level is uninteresting. Sum contrasts are another way of setting up comparisons in the model, where the reported coefficient for each level represents the difference of that level from the grand mean of the response (the unweighted mean of the level means). Sum contrasts are the standard kind of contrasts used in sociolinguistic research. In fact, they are the only kind of contrast one can use in GoldVarb (with centered factors). They are so called because the effects of all levels of a factor sum to 0 when reported in log-odds (and are centered on 0.5 when reported as "factor weights," the inverse logit of the log-odds coefficients).
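The same idea can be sketched for sum contrasts. Again, the data and the names (place, y) are invented for illustration, and a linear model is used only to keep the arithmetic easy to follow; the coding works the same way in a logistic model.

```r
# Hypothetical toy data: three places of articulation, two tokens each.
d <- data.frame(
  place = factor(rep(c("labial", "coronal", "dorsal"), each = 2),
                 levels = c("labial", "coronal", "dorsal")),
  y = c(1, 3, 4, 6, 7, 9)
)

# Sum contrasts from base R; the columns of this matrix are unnamed.
contrasts(d$place) <- contr.sum(levels(d$place))

fit_sum <- lm(y ~ place, data = d)
b <- coef(fit_sum)

group_means <- tapply(d$y, d$place, mean)  # labial 2, coronal 5, dorsal 8
grand_mean <- mean(group_means)            # 5

# The intercept is the grand mean. Coefficients are reported for all levels
# but the last; the last level's effect is minus the sum of the others, which
# is what makes the effects sum to zero.
effects <- c(b[-1], -sum(b[-1]))
names(effects) <- levels(d$place)
effects
sum(effects)
```

Here the intercept is 5, and the effects for labial, coronal, and dorsal are -3, 0, and 3. For coefficients on the log-odds scale, plogis() is the inverse logit that converts them to factor weights.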
So why write a script to change the contrasts, when we could change R's default options for which contrasts are generated automatically, or just use contr.sum() and contr.treatment() on a factor-by-factor basis? The primary reason is that contr.sum() returns a contrast matrix whose columns have no names, so the coefficients in a fitted model are labeled only by number.
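For instance, with three level names chosen only for illustration, base R returns a matrix whose rows are labeled by level but whose columns are not:

```r
# Three hypothetical levels, chosen only for illustration.
m <- contr.sum(c("labial", "coronal", "dorsal"))
m
#         [,1] [,2]
# labial     1    0
# coronal    0    1
# dorsal    -1   -1

# The columns carry no names, so a model fit with this matrix labels its
# coefficients by position (e.g. place1, place2) rather than by level.
is.null(colnames(m))
```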
recontrast() names the columns of the sum contrast matrix, so that models fit with these factors are more readable. It also operates over factors and data frames, which are the important conceptual units of model fitting.
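The behavior described here can be approximated in a few lines of base R. The function below is a hypothetical stand-in, not the code of recontrast() itself: the name recontrast_sketch, the max_levels argument, and the choice to name each column after the level it codes are all assumptions made for illustration.

```r
# Hypothetical stand-in for the behavior described above; this is NOT the
# code of recontrast() itself.
recontrast_sketch <- function(x, type = c("sum", "treatment"),
                              max_levels = 1000) {
  type <- match.arg(type)

  if (is.data.frame(x)) {
    # Recode each column in turn; ineligible columns come back unchanged.
    for (j in seq_along(x)) {
      x[[j]] <- recontrast_sketch(x[[j]], type = type, max_levels = max_levels)
    }
    return(x)
  }

  # Leave alone anything that is not an unordered factor of manageable size.
  if (!is.factor(x) || is.ordered(x) ||
      nlevels(x) < 2 || nlevels(x) >= max_levels) {
    return(x)
  }

  lev <- levels(x)
  m <- switch(type,
              sum = contr.sum(lev),
              treatment = contr.treatment(lev))
  if (type == "sum") {
    # Assumed naming scheme: label each column with the level it codes as 1.
    colnames(m) <- lev[-length(lev)]
  }
  contrasts(x) <- m
  x
}

# Usage on invented data: coefficient names now mention levels, not numbers.
d <- data.frame(
  place = factor(rep(c("labial", "coronal", "dorsal"), each = 2),
                 levels = c("labial", "coronal", "dorsal")),
  y = c(1, 3, 4, 6, 7, 9)
)
d2 <- recontrast_sketch(d, type = "sum")
coef_names <- names(coef(lm(y ~ place, data = d2)))
coef_names
```

Under this naming assumption the coefficients come out as placelabial and placecoronal instead of place1 and place2. The actual script may label its columns differently.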
If you change, reorder, or eliminate levels of a factor, you will need to rebuild the contrasts for that factor with recontrast().
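A short base-R sketch of why this matters, using invented level names. With the script loaded, the rebuilding step would be a call such as recontrast(g, type = "sum"); base R's contr.sum() is used here only so that the example runs on its own.

```r
# Hypothetical factor with sum contrasts set for three levels.
f <- factor(c("labial", "coronal", "dorsal", "labial"),
            levels = c("labial", "coronal", "dorsal"))
contrasts(f) <- contr.sum(levels(f))

# Remove the "dorsal" tokens and drop the now-empty level.
g <- droplevels(f[f != "dorsal"])

# The matrix stored on f was built for three levels and is not carried over
# to g, so g falls back to the session's default contrasts.
is.null(attr(g, "contrasts"))

# Rebuild the contrasts for the two remaining levels.
contrasts(g) <- contr.sum(levels(g))
contrasts(g)
```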