Computational Linguistics
Computational linguistics is a field at the intersection of linguistics and computer science concerned with applying methods from the fields of artificial intelligence and machine learning to problems involving language.
Computational linguistics is exceptionally well represented at Penn,
both in the Department of Linguistics and in the Department of Computer and
Information Science. Weekly meetings such as "Clunch" (computational
linguistics and lunch) and XTAG (for ongoing work in tree adjoining grammar),
together with the Institute for Research in Cognitive Science, give
students and faculty the opportunity to work together and exchange
ideas on current research topics. Penn also benefits from its
proximity to the Linguistic Data Consortium.
Faculty in computational linguistics often hold joint positions in
Linguistics and Computer and Information Science. The
inventor of tree adjoining grammar (TAG) has been working with us since the
1950s and has done seminal work in a very wide variety of subfields.
developed the first computationally tractable
parser that reflects the findings of syntactic theory. He also
participated in creating the first hand-parsed corpus, the English Penn Treebank,
which had a significant impact on the field of computational
linguistics. The project has continued ever since, branching out to
include a number of other languages (such as Chinese) within the past
decade; the Treebank corpora have been used to train automatic taggers
and parsers as well as in linguistic research.
CIS professor Fernando Pereira, whose earlier work highlighted
the connections between parsing and deduction, is now a leading figure
in the field of machine learning.
These colleagues have designed and teach a full program of courses in
computational linguistics, attended by students from both linguistics and computer science.
Other colleagues also teach relevant
courses, and the programs in linguistics and computer science have
trained large numbers of graduate students with substantial expertise
in both areas.
In addition to holding a secondary appointment in the Computer Science department,
Mark Liberman is director of the Linguistic Data Consortium. The LDC constructs online corpora
of diverse types in many languages, maintains a digital archive of
research papers in computational linguistics, and hosts a variety of
seminars and conferences. Liberman has published extensively on the
theoretical and practical underpinnings of the LDC's work, especially on
the construction of corpora and of formal frameworks for linguistic
annotation.
is interested in computational models of language acquisition and language change. Specifically, he studies the interaction between the representation of linguistic information and the mechanisms of language processing and learning, with a strong commitment to empirical findings in the psychology of language.