LING5700 Discovery Procedure

Fall 2023 Wednesday 3:45pm

Zoom link:
 https://upenn.zoom.us/j/98946041799?pwd=ZW5PdkNFdW5IWVFTUlQyR01SakJxUT09


The fundamental goal of linguistics is to provide a discovery procedure for language: “a practical and mechanical method for actually constructing the grammar, given a corpus of utterances” (Syntactic Structures, Chomsky 1957, pp. 50-51). A discovery procedure is what children use to learn the properties of their native language.

Chomsky’s 1955 manuscript The Logical Structure of Linguistic Theory (LSLT) made the first, and perhaps the last, systematic attempt at a discovery procedure. By 1957, a discovery procedure was already deemed premature. Indeed, virtually nothing was known about child language at the time, the theory of computation (“a practical and mechanical method”) was still in its infancy, and even the landmark Brown Corpus was still a decade away. Instead, the weaker decision/evaluation procedure was adopted: to determine whether a grammar is the best one for a corpus, putting aside the question of where the grammar comes from in the first place. Much of the subsequent development in generative grammar has sought to characterize the specific properties of language, that is, the possible and impossible structures among which the child only needs to pick and choose.

Almost seventy years later, we believe that a discovery procedure is within sight. This class aims to develop a discovery procedure in the spirit of LSLT: a constructional program that builds the representations and processes (e.g., phonemes, allophones, syntactic categories and their combinations, transformations) on a distributional basis from language-specific data. To do so, however, we must abandon the traditional conception of the grammar as the best description of the corpus, and analogously, the traditional conception of learning as providing the optimal account of the learning data (e.g., minimizing prediction error or description length, maximizing posterior probability). The grammar only needs to be adequate, in a sense that can be made precise, to ensure a largely uniform outcome across children with significant individual differences in linguistic and learning experiences.

The discovery procedure developed in this class is in line with the traditional hypothesis-testing approach in linguistics and psychology, which keeps learning simple, online, incremental, and, quite importantly these days, interpretable. The key differences are that (a) the procedure not only evaluates hypotheses but also generates and revises them, and (b) a hypothesis needn’t be rejected when it’s contradicted by evidence, as long as it remains adequate. The goal is to derive all linguistic regularities on a distributional basis, with only Merge, the composition of hierarchical structures, as the ingredient of Universal Grammar.

The development of this class reflects significant changes in the convener’s own thinking about language and learning, as well as insights from developmental psychology, machine learning, and comparative linguistics, including recent work by colleagues, collaborators, and especially students.

There will be 3-4 problem sets, consisting of computational implementations of language learning algorithms and/or quantitative analyses of distributional data from realistic corpora. They are required for registered students, who will also write a term paper.


Provisional Schedule

8/30
Discovery procedure and its empirical constraints.
Discovery procedure in a historical context. Variability and uniformity in child language. Case study: null subjects in child language and machine learning.

Readings: Chomsky (1951, p1-6), Chomsky (1957, Chapter 6), Chomsky (2021), Labov (2012, Section 2 & 3), Lewontin (1983, p85-89), Yang (2002, Sec 1.2, Sec 4.3)
9/6
The learning of word meanings
Words before phonemes. Hypothesis testing and global statistical learning in word learning. Pursuit: exploration and exploitation. Memory constraints on word learning.

Readings: Gleitman & Trueswell (2020), Stevens et al. (2017), Koehne et al. (2013), Yurovsky & Yu (2008), Soh & Yang (2021)
PS1: Local and global word learning models (Stevens et al. 2017)
9/13
Functional load and phonemic categories
The persistent failure of information theory in language. Lessons from sound change in dialect contact. The limit of homophony and the motivation for categories.

Readings:
Behrend & Bitterman (1961), Surendran & Niyogi (2006), Yang (2009)
9/20
The great vowel conspiracy
Word segmentation. The developmental sequence of phonemic and word learning. Toward an unsupervised, online, incremental, and non-parametric model.

Readings:
Swingley (2022), Cui (2020, Ch1-3, briefly Ch4), Labov (1987). Dresher (2020) and commentary on 9/20.
9/27
The emergence of phonological representations
Non-monotonic learning of phonology. The Alternation Condition and the abstractness debate: what you hear is not what you learn. The motivation for allophony and the construction of local and nonlocal representations and processes.

Readings: Richter (2021, Ch1-5), Belth (2023a), Belth (2023b), Hyman (2018), Liberman (2018)
PS2: Word segmentation scaling up (Frank et al. 2013)
10/4
The formation of syntactic categories
Varieties of distributional learning. Psychological models of category formation. Formal regularities in child language. Grammaticalization in the first two years. Note: Many of the papers from Rushen Shi's lab are relevant for this unit.

Readings:
Shi (2014), Reeder et al. (2013), Liang et al. (2022), Mintz (2003), Medin et al. (1987)
10/11
A calculus for rules
The history of words vs. rules. Psychological motivation for the Tolerance Principle and its formal consequences.

Readings: Yang (2016, Ch1-3), Schuler et al. (2016), Shi & Emond (2023)
10/18
An adequate discovery procedure
Abductive discovery of productivity. Hypothesis formation and testing in a unified framework. The search for global and local rules.

Readings: Belth et al. (2021), POP (Chapter 4), Bjornsdottir (2021), Gorman & Yang (2019), Kirov & Cotterell (2018), van Tuijl & Coopmans (2021), Yang (2023)
10/25
Memorization or generalization
A Covid test for grammars

Readings: Valian et al. (2009), Yang (2013), Pine et al. (2013), Goldin-Meadow & Yang (2017), Shi (2023)
11/1
The overestimation of linguistic regularity
The origin of the “linguistic wars” and lasting lessons from Remarks. Learning-theoretic alternatives to architectural design of morphology and syntax.

Readings:
11/8
How I stopped worrying and learned to love the Subset Problem
Why overgeneralization should be embraced, not avoided. The perils of indirect negative evidence. Generalize and retreat.

Readings:
PS4: Everyone learns Arabic plurals.
11/15
Argument structure and recursion
Syntax and semantics without linking rules. Recursive structures as distributional learning.

Readings:
11/22
No class; Thanksgiving break
11/29
Variation and probabilistic learning
Children vs. adults in learning and generalization and computational resources. Why linguistic variation is above all categorical.

Readings:
12/6
Learning impossible languages: The case of syntactic islands
The learning of strong and weak islands across languages. UG is dead, long live UG.

Readings:

Readings


Beech, C. and Swingley, D. (2023). Consequences of phonological variation for algorithmic word segmentation. Cognition, 235:105401.

Behrend, E. R. and Bitterman, M. (1961). Probability-matching in the fish. The American Journal of Psychology, 74(4):542–551.

Belth, C. (2023a). A learning-based account of local phonological processes. Phonology (in press).

Belth, C. (2023b). Towards a learning-based account of underlying forms: A case study in Turkish. Proceedings of the Society for Computation in Linguistics, 6(1):332–342.

Belth, C., Payne, S., Beser, D., Kodner, J., and Yang, C. (2021). The greedy and recursive search for morphological productivity. In Proceedings of CogSci 2021.

Bjornsdottir, S. M. (2021). Productivity and the acquisition of gender. Journal of Child Language, 48:1209–1234.

Caplan, S., Kodner, J., and Yang, C. (2020). Miller’s monkey updated: Communicative efficiency and the statistics of words in natural language. Cognition, 205:1044–1066.

Chomsky, N. (1951). Morphophonemics of Modern Hebrew. Master’s thesis, University of Pennsylvania. Published by Garland, New York, 1979.

Chomsky, N. (1957). Syntactic structures. Mouton, The Hague.

Chomsky, N. (2021). Simplicity and the form of grammars. Journal of Language Modelling, 9:5–15.

Cui, A. (2020). The emergence of phonological categories. Penn dissertation.

Dresher, B. E. (2020). Foundations of Contrastive Hierarchy Theory. Talk given at ABRALIN AO VIVO.

Frank, M. C., Tenenbaum, J. B., and Gibson, E. (2013). Learning and long-term retention of large-scale artificial languages. PloS one, 8(1):e52500.

Gleitman, L. R. and Trueswell, J. C. (2020). Easy words: Reference resolution in a malevolent referent world. Topics in cognitive science, 12(1):22–47.

Goldin-Meadow, S. and Yang, C. (2017). Statistical evidence that a child can create a combinatorial linguistic system without external linguistic input: Implications for language evolution. Neuroscience and Biobehavioral Reviews, 81(Part B):150–157.

Gorman, K. and Yang, C. (2019). When nobody wins. In Rainer, F., Gardani, F., Luschützky, H. C., and Dressler, W. U., editors, Competition in inflection and word formation, pages 169–193. Springer, Berlin.

Hyman, L. M. (2018). Why underlying representations? Journal of Linguistics, 54(3):591–610.

Kirov, C. and Cotterell, R. (2018). Recurrent neural networks in linguistic theory: Revisiting Pinker and Prince (1988) and the past tense debate. Transactions of the Association for Computational Linguistics, 6:651–665.

Koehne, J., Trueswell, J. C., and Gleitman, L. R. (2013). Multiple proposal memory in observational word learning. In Proceedings of the 35th Annual meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Labov, W. (1987). The overestimation of functionalism. In Dirven, R. and Fried, V., editors, Functionalism in Linguistics, pages 311–332. John Benjamins Publishing Company, Amsterdam.

Labov, W. (2012). What is to be learned. Review of Cognitive Linguistics, 10(2):265–293.

Lewontin, R. C. (1983). The organism as the subject and object of evolution. Scientia, 77(18).

Liang, K., Marsala, D., and Yang, C. (2022). Distributional learning of syntactic categories. In Gong, Y. and Kpogo, F., editors, Proceedings of the 46th Boston University Conference on Language Development, pages 442–455, Somerville, MA. Cascadilla Press.

Liberman, M. (2018). Toward progress in theories of language sound structure. Shaping phonology, pages 201–222.

Lignos, C. (2011). Modeling infant word segmentation. In Proceedings of the 15th Conference on Computational Language Learning, pages 28–38.

Medin, D. L., Wattenmaker, W. D., and Hampson, S. E. (1987). Family resemblance, conceptual cohesiveness, and category construction. Cognitive psychology, 19(2):242–279.

Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90(1):91–117.

Pine, J. M., Freudenthal, D., Krajewski, G., and Gobet, F. (2013). Do young children have adult-like syntactic categories? Zipf’s law and the case of the determiner. Cognition, 127(3):345–360.

Reeder, P. A., Newport, E. L., and Aslin, R. N. (2013). From shared contexts to syntactic categories: The role of distributional information in learning linguistic form-classes. Cognitive psychology, 66(1):30–54.

Richter, C. (2021). Alternation-sensitive phoneme learning: Implications for children’s development and language change. Penn dissertation.

Schuler, K., Yang, C., and Newport, E. (2016). Testing the Tolerance Principle: Children form productive rules when it is more computationally efficient to do so. In The 38th Cognitive Science Society Annual Meeting, Philadelphia, PA.

Shi, R. (2014). Functional morphemes and early language acquisition. Child Development Perspectives, 8(1):6–11.

Shi, R. (2023). Functional categories, determiners, prosody and early child grammar. In Armoskaite, D., Armoskaite, S., and Wiltschko, M., editors, Oxford handbook of determiners. Oxford University Press.

Shi, R. and Emond, E. (2023). The threshold of rule productivity in infants. Frontiers in Psychology (in press).

Soh, C. and Yang, C. (2021). Memory constraints on cross situational word learning. In Proceedings of CogSci 2021, pages 307–313.

Stevens, J. S., Gleitman, L. R., Trueswell, J. C., and Yang, C. (2017). The pursuit of word meanings. Cognitive science, 41:638–676.

Surendran, D. and Niyogi, P. (2006). Quantifying the functional load of phonemic oppositions, distinctive features, and suprasegmentals. Amsterdam studies in the theory and history of linguistic science series 4, 279:43.

Swingley, D. (2022). Infants’ learning of speech sounds and word-forms. In Papafragou, A., Trueswell, J., and Gleitman, L., editors, The Oxford handbook of the mental lexicon. Oxford University Press.

Valian, V., Solt, S., and Stewart, J. (2009). Abstract categories or limited-scope formulae? The case of children’s determiners. Journal of Child Language, 36(4):743–778.

van Tuijl, R. and Coopmans, P. (2021). The productivity of Dutch diminutives. Linguistics in the Netherlands, 38(1):128–143.

Yang, C. (2002). Knowledge and learning in natural language. Oxford University Press, Oxford.

Yang, C. (2009). Population structure and language change. Ms., University of Pennsylvania.

Yang, C. (2013). Ontogeny and phylogeny of language. Proceedings of the National Academy of Sciences, 110(16):6324–6327.

Yang, C. (2016). The price of linguistic productivity: How children learn to break rules of language. MIT Press, Cambridge, MA.

Yang, C. (2023). A user’s defense of the Tolerance Principle. Beiträge zur Geschichte der deutschen Sprache und Literatur.

Yurovsky, D. and Yu, C. (2008). Mutual exclusivity in cross-situational statistical learning. In Proceedings of the annual meeting of the cognitive science society, volume 30.


Resources

The CHILDES database

The SUBTLEX corpus

The English Lexicon Project.

Unix for Poets.

Python bootcamp.

The CMU Pronunciation Dictionary.