LING5700 Discovery Procedure

Fall 2023 Wednesday 3:45pm

Zoom link:

The fundamental goal of linguistics is to provide a discovery procedure for language: “a practical and mechanical method for actually constructing the grammar, given a corpus of utterances” (Syntactic Structures; Chomsky 1957, 50–51). A discovery procedure is what children use to learn the properties of their native language.

Chomsky’s 1955 manuscript The Logical Structure of Linguistic Theory (LSLT) made the first, and perhaps the last, systematic attempt at a discovery procedure. By 1957, a discovery procedure was already deemed premature. Indeed, virtually nothing was known about child language at the time, the theory of computation (“a practical and mechanical method”) was still in its infancy, and even the landmark Brown Corpus was still a decade away. Instead, the weaker decision/evaluation procedure was adopted: to determine whether a grammar is the best one for a corpus, setting aside the question of where the grammar comes from in the first place. Much of the subsequent development of generative grammar has aimed to characterize the specific properties of language, the possible and impossible structures, from which the child only needs to pick and choose.

Almost seventy years later, we believe that a discovery procedure is within sight. This class aims to develop a discovery procedure in the spirit of LSLT: a constructional program that builds the representations and processes—e.g., phonemes, allophones, syntactic categories and their combinations, transformations—on a distributional basis from language-specific data. To do so, however, we must abandon the traditional conception of the grammar as the best description of the corpus, and analogously, the traditional conception of learning as providing the optimal account of the learning data (e.g., minimizing prediction error or description length, maximizing posterior probability). The grammar only needs to be adequate, in a sense that can be made precise, to ensure a largely uniform outcome across children with significant individual differences in linguistic and learning experiences.

The discovery procedure developed in this class is in line with the traditional hypothesis-testing approach in linguistics and psychology, which keeps learning simple, online, incremental, and, quite importantly these days, interpretable. The key differences are that (a) the procedure not only evaluates hypotheses but also generates and revises them, and (b) a hypothesis need not be rejected when it is contradicted by evidence, as long as it remains adequate. The goal is to derive all linguistic regularities on a distributional basis, with only Merge, the composition of hierarchical structures, as the ingredient of Universal Grammar.

The development of this class reflects not only significant changes in the convener’s own thinking about language and learning but also insights from developmental psychology, machine learning, and comparative linguistics, including recent work by colleagues, collaborators, and especially students.

There will be 3-4 problem sets, consisting of computational implementations of language learning algorithms and/or quantitative analyses of distributional data from realistic corpora. They are required for registered students, who will also write a term paper.

Provisional Schedule

Discovery procedure and its empirical constraints.
Discovery procedure in a historical context. Variability and uniformity in child language. Case study: null subjects in child language and machine learning.

Readings: Chomsky (1951, p1-6), Chomsky (1957, Chapter 6), Chomsky (2021), Labov (2012, Section 2 & 3), Lewontin (1983, p85-89), Yang (2002, Sec 1.2, Sec 4.3)
The learning of word meanings
Words before phonemes. Hypothesis testing and global statistical learning in word learning. Pursuit: exploration and exploitation. Memory constraints on word learning.

Readings: Gleitman & Trueswell (2020), Stevens et al. (2017), Koehne et al. (2013), Yurovsky & Yu (2008), Soh & Yang (2021)
PS1: Local and global word learning models (Stevens et al. 2017)
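The exploration/exploitation dynamic of Pursuit can be illustrated with a minimal sketch. This is not the implementation of Stevens et al. (2017), only a simplified single-hypothesis learner; the learning rate and the toy meaning labels are invented for illustration:

```python
import random
from collections import defaultdict

class PursuitLearner:
    """Single-hypothesis word learner in the spirit of Stevens et al. (2017).
    On each (word, scene) trial the current best meaning is rewarded if it
    appears in the scene; otherwise it is penalized and a new candidate is
    sampled from the scene (exploitation vs. exploration)."""

    def __init__(self, gamma=0.1):  # gamma: illustrative learning rate
        self.gamma = gamma
        self.assoc = defaultdict(dict)  # word -> {meaning: score}

    def observe(self, word, scene):
        scores = self.assoc[word]
        if not scores:  # first encounter: guess one meaning from the scene
            scores[random.choice(sorted(scene))] = self.gamma
            return
        best = max(scores, key=scores.get)
        if best in scene:   # hypothesis confirmed: strengthen it
            scores[best] += self.gamma * (1 - scores[best])
        else:               # disconfirmed: weaken it and pursue a new referent
            scores[best] *= 1 - self.gamma
            new = random.choice(sorted(scene))
            scores[new] = scores.get(new, 0) + self.gamma * (1 - scores.get(new, 0))

    def lexicon(self):
        """Current best meaning for each word."""
        return {w: max(s, key=s.get) for w, s in self.assoc.items() if s}
```

The learner commits to a single meaning per word and only samples an alternative when that hypothesis fails, in contrast with global cross-situational models that track all word–meaning co-occurrences in parallel.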
Functional load and phonemic categories
The persistent failure of information theory in language. Lessons from sound change in dialect contact. The limit of homophony and the motivation for categories.

Readings: Behrend & Bitterman (1961), Surendran & Niyogi (2006), Yang (2009)
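The notion of functional load can be made concrete with a sketch. In the entropy-based formulation (as in Surendran & Niyogi 2006), the load of a phonemic contrast is the relative loss of word-level entropy when the two phonemes are merged; the tiny lexicon and frequencies below are invented for illustration:

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def functional_load(lexicon, p, q):
    """Relative entropy loss at the word level when phonemes p and q merge.
    `lexicon` maps phoneme-tuple transcriptions to token frequencies.
    The merger is neutralizing: every q becomes p, collapsing minimal pairs."""
    h = entropy(Counter(lexicon))
    merged = Counter()
    for word, freq in lexicon.items():
        merged[tuple(p if seg == q else seg for seg in word)] += freq
    return (h - entropy(merged)) / h
```

A merger that collapses frequent minimal pairs carries a high load, while one that collapses none carries zero load, which is the sense in which functional load has been invoked to predict which mergers a language can afford.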
The great vowel conspiracy
Word segmentation. The developmental sequence of phonemic and word learning. Toward an unsupervised, online, incremental, and non-parametric model.

Readings: Swingley (2022), Cui (2020, Ch1-3, briefly Ch4), Labov (1987)
The emergence of phonological representations
Non-monotonic learning of phonology. The Alternation Condition and the abstractness debate: what you hear is not what you learn. The motivation for allophony and the construction of local and nonlocal representations and processes.

Readings: Richter (2021, Ch1-5), Belth (2023a), Belth (2023b), Hyman (2018), Liberman (2018)
PS2: Word segmentation scaling up (Frank et al. 2013)
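As a point of comparison for PS2, a transitional-probability baseline for word segmentation can be sketched in a few lines. This is a generic illustration (boundaries posited at low forward transitional probability), not the model of Frank et al. (2013) or Lignos (2011); the syllable strings and threshold are invented, and utterances are assumed non-empty:

```python
from collections import Counter

def segment_by_tp(utterances, threshold=0.5):
    """Posit a word boundary wherever the forward transitional probability
    P(next syllable | current syllable) falls below `threshold`.
    `utterances` is a list of non-empty syllable lists."""
    bigrams, unigrams = Counter(), Counter()
    for utt in utterances:  # first pass: collect syllable statistics
        unigrams.update(utt)
        bigrams.update(zip(utt, utt[1:]))
    segmented = []
    for utt in utterances:  # second pass: cut at low-probability transitions
        words, current = [], [utt[0]]
        for a, b in zip(utt, utt[1:]):
            if bigrams[(a, b)] / unigrams[a] < threshold:
                words.append(current)
                current = []
            current.append(b)
        words.append(current)
        segmented.append(words)
    return segmented
```

High-probability transitions are treated as word-internal and low-probability ones as boundaries, the intuition behind statistical segmentation; an online, incremental learner would of course update the statistics as it segments rather than in two passes.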
The formation of syntactic categories
Varieties of distributional learning. Psychological models of category formation. Formal regularities in child language. Grammaticalization in the first two years.

Memorization or generalization
A COVID test for grammars: How to draw conclusions about the underlying grammatical system from a corpus.

A calculus for rules
The history of words vs. rules. Psychological motivation for the Tolerance Principle and its formal consequences.
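The formal core of the Tolerance Principle is simple enough to state in code: a rule applying to N items remains productive as long as its exceptions do not exceed θ_N = N / ln N (Yang). The counts in the usage example below are illustrative only:

```python
import math

def tolerance_threshold(n):
    """theta_N = N / ln N: the most exceptions a productive rule over n items tolerates."""
    assert n > 1, "threshold is defined for vocabularies larger than one item"
    return n / math.log(n)

def is_productive(n_items, n_exceptions):
    """A rule is productive iff its exceptions stay at or below theta_N."""
    return n_exceptions <= tolerance_threshold(n_items)

# Illustrative counts only: a rule over 1000 items tolerates ~145 exceptions,
# so 120 exceptions leave it productive; 30 exceptions among 100 items do not.
print(round(tolerance_threshold(1000), 1))            # 144.8
print(is_productive(1000, 120), is_productive(100, 30))  # True False
```

Note the sublinear growth of the threshold: as the vocabulary grows, the tolerated proportion of exceptions shrinks, one of the formal consequences taken up in this unit.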

PS3: Combinatorics of syntactic categories
An adequate discovery procedure
Abductive discovery of productivity. Hypothesis formation and testing in a unified framework. The search for global and local rules.

The overestimation of linguistic regularity
The origin of the “linguistic wars” and lasting lessons from Remarks. Learning-theoretic alternatives to architectural design of morphology and syntax.

How I stopped worrying and learned to love the Subset Problem
Why overgeneralization should be embraced, not avoided. The perils of indirect negative evidence. Generalize and retreat.

PS4: Everyone learns Arabic plurals.
Argument structure and recursion
Syntax and semantics without linking rules. Recursive structures as a product of distributional learning.

No class; Thanksgiving break
Variation and probabilistic learning
Children vs. adults in learning, generalization, and computational resources. Why linguistic variation is above all categorical.

Learning impossible languages: The case of syntactic islands
The learning of strong and weak islands across languages. UG is dead, long live UG.



References

Beech, C. and Swingley, D. (2023). Consequences of phonological variation for algorithmic word segmentation. Cognition, 235:105401.

Behrend, E. R. and Bitterman, M. (1961). Probability-matching in the fish. The American Journal of Psychology, 74(4):542–551.

Belth, C. (2023a). A learning-based account of local phonological processes. Phonology (in press).

Belth, C. (2023b). Towards a learning-based account of underlying forms: A case study in Turkish.
Proceedings of the Society for Computation in Linguistics, 6(1):332–342.

Caplan, S., Kodner, J., and Yang, C. (2020). Miller’s monkey updated: Communicative efficiency and the statistics of words in natural language. Cognition, 205:104466.

Chomsky, N. (1951). Morphophonemics of Modern Hebrew. Master’s thesis, University of Pennsylvania. Published by Garland, New York, 1979.

Chomsky, N. (1957). Syntactic structures. Mouton, The Hague.

Chomsky, N. (2021). Simplicity and the form of grammars. Journal of Language Modelling, 9:5–15.

Cui, A. (2020). The emergence of phonological categories. Doctoral dissertation, University of Pennsylvania.

Frank, M. C., Tenenbaum, J. B., and Gibson, E. (2013). Learning and long-term retention of large-scale artificial languages. PloS one, 8(1):e52500.

Gleitman, L. R. and Trueswell, J. C. (2020). Easy words: Reference resolution in a malevolent referent world. Topics in cognitive science, 12(1):22–47.

Hyman, L. M. (2018). Why underlying representations? Journal of Linguistics, 54(3):591–610.

Koehne, J., Trueswell, J. C., and Gleitman, L. R. (2013). Multiple proposal memory in observational word learning. In Proceedings of the 35th Annual meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Labov, W. (1987). The overestimation of functionalism. In Dirven, R. and Fried, V., editors, Functionalism in Linguistics, pages 311–332. John Benjamins Publishing Company, Amsterdam.

Labov, W. (2012). What is to be learned. Review of Cognitive Linguistics, 10(2):265–293.

Lewontin, R. C. (1983). The organism as the subject and object of evolution. Scientia, 77(18).

Liberman, M. (2018). Toward progress in theories of language sound structure. Shaping phonology, pages 201–222.

Lignos, C. (2011). Modeling infant word segmentation. In Proceedings of the 15th Conference on Computational Language Learning, pages 28–38.

Richter, C. (2021). Alternation-sensitive phoneme learning: Implications for children’s development and language change. Doctoral dissertation, University of Pennsylvania.

Soh, C. and Yang, C. (2021). Memory constraints on cross situational word learning. In Proceedings of CogSci 2021, pages 307–313.

Stevens, J. S., Gleitman, L. R., Trueswell, J. C., and Yang, C. (2017). The pursuit of word meanings. Cognitive science, 41:638–676.

Surendran, D. and Niyogi, P. (2006). Quantifying the functional load of phonemic oppositions, distinctive features, and suprasegmentals. Amsterdam studies in the theory and history of linguistic science series 4, 279:43.

Swingley, D. (2022). Infants’ learning of speech sounds and word-forms. In Papafragou, A., Trueswell, J., and Gleitman, L., editors, The Oxford handbook of the mental lexicon. Oxford University Press.

Yang, C. (2002). Knowledge and learning in natural language. Oxford University Press, Oxford.

Yang, C. (2009). Population structure and language change. Ms., University of Pennsylvania.

Yurovsky, D. and Yu, C. (2008). Mutual exclusivity in cross-situational statistical learning. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 30.


Resources

The CHILDES database

SUBTLEX corpus

English Lexicon Project.

Unix for Poets.

Python bootcamp.

CMU Pronouncing Dictionary.