Recontrast

recontrast()


Download

You can download the script here and source it locally, or source it directly from this location with the following command:

source("http://www.ling.upenn.edu/~joseff/scripts/recontrast.R")

Usage:

recontrast(x, type = "sum")

x can be a data frame or a factor. If given a data frame, recontrast() will change the contrasts for all unordered factors with fewer than 1000 levels.

As of now, recontrast() will generate sum and treatment contrasts if you pass "sum" or "treatment" to type.

The output of recontrast() is either a factor with the new contrasts, or a data frame with all of its factor columns given the specified contrasts.

Motivations

See here for a discussion of contrasts with sample data.

By default, R sets up treatment contrasts for categorical factors. Treatment contrasts assume that some level (the first one) is untreated, or default with respect to the response. The reported effects for the treated levels are their differences from the default group. For experimental data, where there is one control group and many experimental groups, treatment contrasts are obviously the most appropriate. For some linguistic data, treatment contrasts can also be appropriate. For example, if you assume that for some variation a lingual articulation will introduce a bias, but a labial articulation will not, you can treat the labial articulation as "untreated."

However, the assumption of an untreated group is inappropriate for a lot of linguistic data, and the comparison of all levels to some reference level is uninteresting. Sum contrasts are another way of setting up comparisons in the model, where the reported coefficient for each level represents the difference of that level from the grand mean. Sum contrasts are the standard kind of contrasts used in sociolinguistic research. In fact, they are the only kind of contrast one can use in GoldVarb (with centered factors). They are so called because the sum of all effects is 0 when reported in log-odds (they center on 0.5 when reported as "factor weights," the inverse logit of the log-odds coefficients).
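A small numeric sketch of the sum-to-zero property, using base R's contr.sum() rather than recontrast(). The data frame d and its values are made up for illustration, and the design is balanced so that the grand mean and the mean of the response coincide:

```r
# Toy data (hypothetical): three groups with means 2, 5 and 11.
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 2)),
  y = c(1, 3, 4, 6, 10, 12)
)

# Give the factor base R sum contrasts.
contrasts(d$g) <- contr.sum(nlevels(d$g))

fit <- lm(y ~ g, data = d)
group_means <- tapply(d$y, d$g, mean)
grand_mean  <- mean(group_means)

# The intercept is the grand mean; each coefficient is one level's
# deviation from it.
coef(fit)

# The last level gets no coefficient of its own: its effect is minus
# the sum of the others, so all effects together sum to zero.
effects <- c(coef(fit)[-1], -sum(coef(fit)[-1]))
sum(effects)
```

Note that the coefficients here are still labeled g1 and g2, which is the readability problem discussed next.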

So why write a script to change the contrasts, when we could change the default options in R about what kinds of contrasts to automatically generate, or we could just use contr.sum() and contr.treatment() on a factor by factor basis? The primary reason is that the output of contr.sum() is this:

> contr.sum(levels(td$PreSeg))
          [,1] [,2] [,3] [,4]
fricative    1    0    0    0
l            0    1    0    0
nasal        0    0    1    0
obstruent    0    0    0    1
sibilant    -1   -1   -1   -1

Whereas the output of contr.treatment() is this:

> contr.treatment(levels(td$PreSeg))
          l nasal obstruent sibilant
fricative 0     0         0        0
l         1     0         0        0
nasal     0     1         0        0
obstruent 0     0         1        0
sibilant  0     0         0        1

The crucial difference here is that contr.treatment() names the columns of the contrasts matrix, but contr.sum() doesn't. This means that if we were to fit a model with PreSeg with treatment contrasts, the coefficients would be meaningfully named, like PreSegnasal. Fitting the model with sum contrasts would just number the coefficients, returning a non-meaningfully named PreSeg3.

recontrast() names the columns of the sum contrasts matrix, so that models fit with these factors will be more readable. It also operates over factors and data frames, which are the important conceptual units of model fitting.
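The column-naming idea can be sketched in a few lines of base R. This is not the source of recontrast(); named_sum_contrasts() is a hypothetical helper written for illustration, and td below is a toy stand-in with only a PreSeg column:

```r
# Hypothetical helper, NOT the actual recontrast() code: build sum
# contrasts for a factor and label each column with its level.
named_sum_contrasts <- function(f) {
  lev <- levels(f)
  m <- contr.sum(lev)
  # contr.sum() leaves the columns unnamed. The last level has no
  # column of its own, so it is left out of the names.
  colnames(m) <- lev[-length(lev)]
  contrasts(f) <- m
  f
}

# Toy stand-in for the td data (assumed values).
td <- data.frame(
  PreSeg = factor(c("fricative", "l", "nasal", "obstruent", "sibilant"))
)
td$PreSeg <- named_sum_contrasts(td$PreSeg)

# Model matrix columns, and hence coefficient names, now carry the
# level names instead of numbers.
colnames(model.matrix(~ PreSeg, data = td))
```

With the columns named this way, the third sum-contrast coefficient would appear as PreSegnasal rather than PreSeg3.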

Words of Warning

If you change, reorder, or eliminate levels of a factor, it will be necessary to reconstruct the contrasts for that factor again with recontrast().
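One way to see why, sketched with base R only (the factor f is made up for illustration): contrasts are stored as an attribute on the factor, and rebuilding the factor, for instance by dropping a level, discards that attribute. Models then fall back on whatever options("contrasts") specifies.

```r
# Toy factor (hypothetical) with sum contrasts set explicitly.
f <- factor(c("fricative", "l", "nasal", "obstruent", "sibilant"))
contrasts(f) <- contr.sum(levels(f))

# The contrasts matrix is stored on the factor itself.
had_contrasts <- !is.null(attr(f, "contrasts"))

# Dropping a level rebuilds the factor from scratch, so the stored
# contrasts do not survive.
g <- droplevels(f[f != "sibilant"])
lost_contrasts <- is.null(attr(g, "contrasts"))

c(had_contrasts = had_contrasts, lost_contrasts = lost_contrasts)
```

After a step like this, calling recontrast() on the new factor again restores the intended contrasts.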

This software is licensed under the CC-GNU GPL version 2.0 or later.