Fall 2023 Wednesday 3:45pm

Zoom link:

https://upenn.zoom.us/j/98946041799?pwd=ZW5PdkNFdW5IWVFTUlQyR01SakJxUT09

The fundamental goal of linguistics is to provide a discovery procedure for language: “a practical and mechanical method for actually constructing the grammar, given a corpus of utterances.” (Syntactic Structures, Chomsky 1957, 50-51). A discovery procedure is what children use to learn the properties of their native language.

Chomsky’s 1955 manuscript The Logical Structure of Linguistic Theory (LSLT) made the first, and perhaps the last, systematic attempt at a discovery procedure. By 1957, a discovery procedure was already deemed premature. Indeed, virtually nothing was known about child language at the time, the theory of computation (“a practical and mechanical method”) was still in its infancy, and even the landmark Brown Corpus was still a decade away. Rather, the weaker decision/evaluation procedure was adopted: to determine whether a grammar is the best one for a corpus, putting aside the question of where the grammar comes from in the first place. Much of the subsequent development in generative grammar has been to characterize the specific properties of language, i.e., its possible and impossible structures, so that the child only needs to pick and choose.

Almost seventy years later, we believe that a discovery procedure is within sight. This class aims to develop a discovery procedure in the spirit of LSLT: a constructional program that builds the representations and processes (e.g., phonemes, allophones, syntactic categories and their combinations, transformations) on a distributional basis from language-specific data. To do so, however, we must abandon the traditional conception of the grammar as the best description of the corpus, and analogously, the traditional conception of learning as providing the optimal account of the learning data (e.g., minimizing prediction error or description length, maximizing posterior probability). The grammar only needs to be adequate, in a sense that can be made precise, to ensure a largely uniform outcome across children with significant individual differences in linguistic and learning experiences.

The discovery procedure developed in this class is in line with the traditional hypothesis-testing approach in linguistics and psychology, which keeps learning simple, online, incremental, and, quite importantly these days, interpretable. The key differences are that (a) the procedure not only evaluates hypotheses but also generates and revises them, and (b) a hypothesis needn’t be rejected when it’s contradicted by evidence as long as it remains adequate. The goal is to derive all linguistic regularities on a distributional basis, with only Merge, the composition of hierarchical structures, as the ingredient of Universal Grammar.

The development of this class reflects significant changes in the convener’s own thinking about language and learning, as well as insights from developmental psychology, machine learning, and comparative linguistics, including recent work by colleagues, collaborators, and especially students.

There will be 3-4 problem sets. They involve computational implementations of language learning algorithms and/or quantitative analyses of distributional data from realistic corpora. The problem sets are required for registered students, who will also write a term paper.

Provisional Schedule

8/30

Discovery procedure and its empirical constraints.
Discovery procedure in a historical context. Variability and uniformity in child language. Case study: null subjects in child language and machine learning.

Readings: Chomsky (1951, p1-6), Chomsky (1957, Chapter 6), Chomsky (2021), Labov (2012, Sections 2 & 3), Lewontin (1983, p85-89), Yang (2002, Sec 1.2, Sec 4.3)

9/6

The learning of word meanings
Words before phonemes. Hypothesis testing and global statistical learning in word learning. Pursuit: exploration and exploitation. Memory constraints on word learning.

Readings: Gleitman & Trueswell (2020), Stevens et al. (2017), Koehne et al. (2013), Yurovsky & Yu (2008), Soh & Yang (2021)
PS1: Local and global word learning models (Stevens et al. 2017)

9/13

Functional load and phonemic categories
The persistent failure of information theory in language. Lessons from sound change in dialect contact. The limit of homophony and the motivation for categories.

Readings: Behrend & Bitterman (1961), Surendran & Niyogi (2006), Yang (2009)

9/20

The great vowel conspiracy
Word segmentation. The developmental sequence of phonemic and word learning. Toward an unsupervised, online, incremental, and non-parametric model.

Readings: Swingley (2022), Cui (2020, Ch1-3, briefly Ch4), Labov (1987). Dresher (2020) and commentary on 9/20.

9/27

The emergence of phonological representations
Non-monotonic learning of phonology. The Alternation Condition and the abstractness debate: what you hear is not what you learn. The motivation for allophony and the construction of local and nonlocal representations and processes.

Readings: Richter (2021, Ch1-5), Belth (2023a), Belth (2023b), Hyman (2018), Liberman (2018)
PS2: Word segmentation scaling up (Frank et al. 2013)

10/4

The formation of syntactic categories
Varieties of distributional learning. Psychological models of category formation. Formal regularities in child language. Grammaticalization in the first two years. Note: Many of the papers from Rushen Shi's lab are relevant for this unit.

Readings: Shi (2014), Reeder et al. (2013), Liang et al. (2022), Mintz (2003), Medin et al. (1987)

10/11

A calculus for rules
The history of words vs. rules. Psychological motivation for the Tolerance Principle and its formal consequences.

Readings: Yang (2016, Ch1-3), Schuler et al. (2016), Shi & Emond (2023)
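
For orientation before the readings, here is a minimal sketch of the Tolerance Principle as it is usually stated: a rule that applies to N items is productive only if its exceptions number at most N / ln N. The function names below are hypothetical and chosen for illustration only. The sketch is not part of the course materials.

```python
import math


def tolerance_threshold(n: int) -> float:
    """Largest number of exceptions a rule over n items can tolerate: n / ln(n)."""
    if n < 2:
        # ln(1) = 0, so the threshold is undefined for fewer than two items.
        raise ValueError("n must be at least 2")
    return n / math.log(n)


def is_productive(n: int, exceptions: int) -> bool:
    """True if a rule over n items with this many exceptions stays within the threshold."""
    return exceptions <= tolerance_threshold(n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(tolerance_threshold(n), 2))
```

Note that the threshold grows more slowly than N itself, so a rule over a small vocabulary can tolerate a proportionally larger share of exceptions than the same rule over a large one.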

10/18

An adequate discovery procedure
Abductive discovery of productivity. Hypothesis formation and testing in a unified framework. The search for global and local rules.

Readings: Belth et al. (2021), POP (Chapter 4), Bjornsdottir (2021), Gorman & Yang (2019), Kirov & Cotterell (2018), van Tuijl (2021), Yang (2023)

10/25

11/1

The overestimation of linguistic regularity
The origin of the “linguistic wars” and lasting lessons from Remarks. Learning-theoretic alternatives to architectural design of morphology and syntax.

Readings:

11/8

How I stopped worrying and learned to love the Subset Problem
Why overgeneralization should be embraced, not avoided. The perils of indirect negative evidence. Generalize and retreat.

Readings:
PS4: Everyone learns Arabic plurals.

11/15

Argument structure and recursion
Syntax and semantics without linking rules. Recursive structures as distributional learning.

Readings:

11/22

No class; Thanksgiving break

11/29

Variation and probabilistic learning
Children vs. adults in learning and generalization and computational resources. Why linguistic variation is above all categorical.

Readings:

12/6

Learning impossible languages: The case of syntactic islands
The learning of strong and weak islands across languages. UG is dead, long live UG.

Readings:

Readings

Beech, C. and Swingley, D. (2023). Consequences of phonological variation for algorithmic word segmentation. Cognition, 235:105401.

Behrend, E. R. and Bitterman, M. (1961). Probability-matching in the fish. The American Journal of Psychology, 74(4):542–551.

Belth, C. (2023a). A learning-based account of local phonological processes. Phonology (in press).

Belth, C. (2023b). Towards a learning-based account of underlying forms: A case study in Turkish. Proceedings of the Society for Computation in Linguistics, 6(1):332–342.

Belth, C., Payne, S., Beser, D., Kodner, J., and Yang, C. (2021). The greedy and recursive search for morphological productivity. In Proceedings of CogSci 2021.

Bjornsdottir, S. M. (2021). Productivity and the acquisition of gender. Journal of Child Language, 48:1209–1234.

Caplan, S., Kodner, J., and Yang, C. (2020). Miller’s monkey updated: Communicative efficiency and the statistics of words in natural language. Cognition, 205:1044–1066.

Chomsky, N. (1951). Morphophonemics of Modern Hebrew. Master’s thesis, University of Pennsylvania. Published by Garland, New York, 1979.

Chomsky, N. (1957). Syntactic structures. Mouton, The Hague.

Chomsky, N. (2021). Simplicity and the form of grammars. Journal of Language Modelling, 9:5–15.

Cui, A. (2020). The emergence of phonological categories. Penn dissertation.