1965-1969 Harvard University Linguistics and Applied Mathematics
1972 M.I.T. Linguistics M.S.
1972-1975 M.I.T. Linguistics Ph.D.

Professional Experience:

University of Pennsylvania:
  (Primary Appointment): Trustee Professor of Phonetics, Department of Linguistics: 1990-2010
                                         Christopher H. Browne Distinguished Professor of Linguistics: 2010-present
  (Secondary Appointment): Professor, Department of Computer and Information Science: 1992-present
 Member, Psychology Graduate Group
 Distinguished Research Fellow, Annenberg Public Policy Center
 Faculty Director, Ware College House: 2001-present
 Faculty Director, College Houses and Academic Services: 2006-2013
 Director, Institute for Research in Cognitive Science: 2001-2006
 Director, Linguistic Data Consortium: 1992-present

Head, Linguistics Research Department, AT&T Bell Laboratories: 1987-1990
Visiting Assistant Professor, M.I.T.: 1978
Member of Technical Staff, AT&T Bell Laboratories: 1975-1987

Fellow, American Association for the Advancement of Science
Fellow, Linguistic Society of America

Other Experience: U.S. Army, 1969-1972

Research Interests:

Corpus-based phonetics; speech and language technology; clinical applications of linguistic analysis; the phonology and phonetics of lexical tone, and its relationship to intonation; gestural, prosodic, morphological and syntactic ways of marking focus, and their use in discourse; formal models for linguistic annotation; information retrieval and information extraction from text; the organization of spoken communication in the human brain, especially in relation to the evolutionary substrates for speech and language, and to analogous systems in other animals; agent-based models of language evolution and learning.

Teaching at Penn:

LING001: Introduction to Linguistics
LING005: First-year Seminar -- "The Landscape of Research and Innovation at Penn"
LING052: First-year Seminar -- Big Data in Linguistics
LING052: First-year Seminar -- "Digital Science and Scholarship: Exploring Speech and Language"
COGS 501-502: Mathematical Foundations of Language and Communication Sciences
LING525/CIS558: Computer Analysis and Modeling of Biological Signals and Systems
LING520-521: Phonetics I-II
LING620: Research Topics in Clinical Linguistics
LING502/202: Field Linguistics
COGS001: Introduction to Cognitive Science
COLL002: Biology, Language and Culture
HUM100: Human Nature

Selected Professional Activities (current and past):

Director, Linguistic Data Consortium (1992-present).
Co-Editor, Annual Review of Linguistics (2014-present).
Editorial Advisory Boards: Cognition; Computer Speech and Language; Speech Communication; International Journal of Corpus Linguistics; Linguistics and Language Technology.
Executive Committee, Linguistic Society of America.
Chair, International Coordinating Committee on Speech Databases and Assessment (COCOSDA).
DARPA TIDES Advisory Committee.
U.S. Dictionaries Advisory Board, Oxford University Press.

Weblog: Language Log



(1979) The Intonational System of English. Garland Publishing.
(1995) Invitation to Cognitive Science, ed. with Lila Gleitman. MIT Press.
(2006) Far from the Madding Gerund, with Geoffrey K. Pullum. William, James & Co.


(1973) "Alternatives", CLS 9.
(1974a) "On Conditioning the Rule of Subject-Auxiliary Inversion", NELS 5, pp. 77-91.
(1974b) "Prosodic Form and Discourse Function", with I. Sag, CLS 10, pp. 416-427.
(1975a) "Intonational Disambiguation of Indirect Speech Acts", with I. Sag, CLS 11, pp. 487-497.
(1975b) "The Intonational System of English" (MIT PhD dissertation)
(1977a) "The Geniohyoid and the Role of the Strap Muscles in Pitch Control", with D. Erickson and S. Niimi, Haskins Laboratories Quarterly Status Report 49, pp. 103-110.
(1977b) "Studies of Metrical Patterns", with J.P. Olive and P. Zukovsky, JASA 62 (S1).
(1977c) "Further Work on Duration Modeling in Reiterant Speech", JASA 62 (S1).
(1977d) "On Stress and Linguistic Rhythm", with A. Prince, Linguistic Inquiry 8 pp. 249-336.
(1978a) "Use of Nonsense-Syllable Mimicry in the Study of Prosodic Phenomena", with L.A. Streeter, JASA 63 pp. 231-233.
(1978b) "Modeling of Durational Patterns in Reiterant Speech", pp. 127-138 in D. Sankoff, Ed., Linguistic Variation: Models and Methods. Academic Press.
(1978c) "Phonetic Transcription, Stress and Segment Durations from Spelled Proper Names", JASA 64.
(1978d) "On the Nature of Normalization Functions for Durations in Speech", JASA 64.
(1979a) "Text-to-Speech Conversion by Rule and a Practical Application," with P.B. Denes and J.P. Olive, Proceedings of the 8th International Congress of Phonetic Sciences, Copenhagen. Vol. I, p. 350.
(1979b) "A Set of Concatenative Units for Speech Synthesis", with J.P. Olive, JASA 65.
(1979c) "A Metric for the Height of Certain Pitch Peaks in English", with J.B. Pierrehumbert, JASA 66.
(1979d) "The Intrinsic Pitch of Vowels in Sentence Context", with C.H. Shadle and J.B. Pierrehumbert, JASA 66.
(1980) "Intelligibility of Consonants Produced by Dyadic Rule Synthesis", with J.P. Olive and K. O'Connor-Dukes, JASA 68.
(1981a) "Speech Recognition by Computer", with S.E. Levinson, Scientific American, April 1981, V. 244, pp. 64-76.
(1981b) "Effects of Linguistic Boundary and Stress Placement on Speech Dynamics: A Preliminary Study Using a New Method to Compare Articulatory Movement Across Utterances", with O. Fujimura, JASA 69.
(1982a) review of J.C. Simon, Ed. Spoken Language Generation and Understanding, in JASA 72 p. 1657.
(1982b) "Test of an Automatic Syllable Peak Detector", with M. Fleck, JASA 72.
(1982c) "Modeling the Fundamental Frequency of the Voice", with J.B. Pierrehumbert, Contemporary Psychology 27 pp. 690-692.
(1983a) "In Favor of Some Uncommon Approaches to the Study of Speech", pp. 265-274 in P. MacNeilage, Ed. Speech Motor Control, Springer Verlag.
(1983b) "The Symmetric Time-warping Problem: from Continuous to Discrete", with J.B. Kruskal, pp. 125-163 in D. Sankoff and J.B. Kruskal, Eds., Time Warps, String Edits and Macromolecules, Addison-Wesley. Republished 1999 in the David Hume Series, CSLI Publications.
(1983c) "Neurobiology of Language Processes: a Linguistic Point of View", pp. 7-16 in M. Studdert-Kennedy, Ed., The Psychobiology of Language, MIT Press.
(1983d) "On finding the iguana", with J.B. Pierrehumbert, Contemporary Psychology 28.
(1984a) "Words and Sounds", pp. 157-173 in Kerr et al. Eds., Science, Computers and the Information Onslaught, Academic Press.
(1984b) "Intonational Invariance under Changes in Pitch Range and Length", with J.B. Pierrehumbert, pp. 157-234 in M. Aronoff and R. Oehrle, Eds., Language Sound Structure, MIT Press.
(1984c) "Synthesis by rule of English intonation patterns", with M. Anderson and J. Pierrehumbert, IEEE ICASSP 1984.
(1985) "Text to Speech Work at Bell Laboratories", with J.P. Olive, JASA 79.
(1986a) "Connectionist Models of Natural Language", Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics.
(1986b) "Synthesis of Falling Nuclear Pitch Accents", with S. Steele, JASA 80.
(1986c) "Stressing English Noun Compounds Correctly", with Richard Sproat, JASA 80.
(1987a) "The Shape and Alignment of Rising Intonation", with S. Steele, JASA 81.
(1987b) "Towards Treating English Nominals Correctly", with R. Sproat, pp. 140-146 in Proceedings of the 25th Annual Meeting of the ACL.
(1989a) "Speaker Independent Phonetic Transcription of Fluent Speech for Large Vocabulary Speech Recognition," with S.E. Levinson and A. Ljolje, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing.
(1989b) "The ACL Data Collection Initiative", pp. 173-188 in Proceedings of DARPA Workshop on Speech and Natural Language Processing, Morgan Kaufmann.
(1990a) "Morphology and Rhyming: Two Powerful Alternatives to Letter-to-Sound Rules for Speech Synthesis", with C.H. Coker and K.W. Church, pp. 83-87 in Proceedings of the ESCA Workshop on Speech Synthesis.
(1990b) "A Finite-state Morphological Processor for Spanish," with E. Tzoukermann, pp. 277-283 in Hans Karlgren, Ed., COLING90.
(1991a) "Cryptographic Protection of Databases and Software," with J. Feigenbaum and R. Wright, in Feigenbaum and Merritt, editors, Distributed Computing and Cryptography, AMS and ACM, v. 2, 161-172.
(1991b) "The Trend Towards Statistical Models in Natural Language Processing", in Natural Language and Speech, edited by E. Klein and F. Veltman, Springer Verlag, pp. 1-8.
(1991c) "A Procedure for quantitatively comparing the syntactic coverage of English grammars", with S. Abney, S. Flickenger, C. Gdaniec, R. Grishman, P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans, M. Marcus, S. Roukos, B. Santorini & T. Strzalkowski
(1992a) "The Phonetics of Igbo Tone", with J.M. Schultz, S. Hong and V. Okeke, Proceedings of IRCS Prosody Workshop, August 1992.
(1992b) "Text Analysis and Word Pronunciation in Text-to-Speech Synthesis," with K. Church, pp. 791-832 in Furui and Sondhi, Eds., Advances in Speech Technology, Marcel Dekker.
(1992c) "The Stress and Structure of Modified Noun Phrases in English," with R. Sproat, pp. 131-181 in Lexical Matters, Sag and Szabolcsi, Eds. University of Chicago Press.
(1992d) "The Structure and Intonation of Business Telephone Greetings", with C. McLemore, pp. 68-83 in Penn Review of Linguistics, 1992.
(1993) "The Phonetic Interpretation of Tone in Igbo" with J.M. Schultz et al., Phonetica 50(3) 147-160.
(1994a) "Computer Speech Synthesis: Its Status and Prospects," pp. 107-116 in D. B. Roe and J. G. Wilpon, Eds., Voice Communication between Humans and Machines, National Academy Press.
(1994b) "Phonological Optionality in Latin Clitics," Penn Review of Linguistics, 1994.
(1994c) "UNIPEN project of on-line data exchange and recognizer benchmarks", with I. Guyon et al., in Proceedings, International Conference on Pattern Recognition.
(1994d) "Commentary on Kaplan and Kay", Computational Linguistics 20(3).
(1995a) "On the Phonetic Interpretation of the Yoruba Tonal System" with Akin Akinlabi, in Proceedings, International Congress of Phonetic Sciences.
(1995b) "The Sound Structure of Mawu Words," pp. 55-86 in Invitation to Cognitive Science, L. Gleitman and M. Liberman, Eds., MIT Press.
(1995c) "The Cognitive Science of Language," with Lila Gleitman, in L. Gleitman and M. Liberman, Eds. Invitation to Cognitive Science, MIT Press, pp. xix-xxxvii.
(1996a) "Error analysis and disfluency modeling in the Switchboard domain", with R. Rosenfeld, R. Agaarwal, B. Byrne, R. Iyer, E. Shriberg, J. Unverfuehrt, D. Vergyri, and E. Vidal. ICSLP 1996.
(1996b) "Web presentation of very large text and speech corpora", with Zhibiao Wu.
(1997) "Le Consortium de Données Linguistiques", Journées Scientifiques et Techniques du Réseau Francophone d'Ingénierie de la Langue, l'AUPELF-UREF.
(1998a) "The Creation, Distribution and Use of Linguistic Data," with C. Cieri, Proceedings, First International Conference on Language Resources and Evaluation, Granada, 1998.
(1998b) "Transcriber: a Free Tool for Segmenting, Labeling and Transcribing Speech," with C. Barras, E. Geoffrois and Z. Wu, Proceedings of LREC-1998.
(1998c) "Towards a Formal Framework for Linguistic Annotation", with S. Bird, Proceedings, International Conference on Spoken Language Processing, Sydney, 1998.
(1998d) "Annotation Graphs as a Framework for Multidimensional Linguistic Data Analysis", with S. Bird, Proceedings, Workshop on Standards and Tools for Discourse Tagging, Association for Computational Linguistics.
(1998e) "The Creation, Distribution and Use of Linguistic Data", with C. Cieri, Proceedings of LREC-1998.
(1999a) "The TDT-2 Text and Speech Corpus", with C. Cieri, D. Graff, N. Martey and S. Strassel, Proceedings of DARPA Broadcast News Workshop, 1999.
(1999b) "Topic Detection and Tracking using IDF-Weighted Cosine Coefficient", with M. Schultz, Proceedings of DARPA Broadcast News Workshop, 1999
(1999c) "The TDT-2 Text and Speech Corpus", with C. Cieri, D. Graff, N. Martey, & S. Strassel. DARPA Broadcast News Workshop.
(1999d) "BITS: A Method for Bilingual Text Search over the Web", with X. Ma. Machine Translation Summit VII
(1999e) "A Formal Framework for Linguistic Annotation", with Steven Bird. Department of Computer and Information Science Technical Report, University of Pennsylvania, 1999.
(2000a) "Issues in Corpus Creation and Distribution", with C. Cieri, pp. 49-56, Proceedings of LREC-2000, Athens.
(2000b) "Large Multilingual Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking", with C. Cieri, D. Graff, M. Nii and S. Strassel, pp. 925-930, Proceedings of LREC-2000, Athens.
(2000c) "ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation", with S. Bird, D. Day, J. Garofolo, J. Henderson and C. Laprun, pp. 1699-1706, Proceedings of LREC-2000, Athens.
(2000d) "Issues in Corpus Creation and Distribution: the Evolution of the Linguistic Data Consortium", with C. Cieri, In Proceedings LREC-2000.
(2000e) "The Tonal Phonology of Yoruba Clitics", with A. Akinlabi, pp. 31-63 in B. Gerlach and J. Grijzenhout, Eds., Clitics in Phonology, Morphology and Syntax, John Benjamins
(2001a) "Transcriber: development and use of a tool for assisting in the creation of speech corpora", with C. Barras, E. Geoffrois and Z. Wu, Speech Communication 33.1-2 (pp. 5-22).
(2001b) "A Formal Framework for Linguistic Annotation", with S. Bird, Speech Communication 33.1-2 (pp. 23-60).
(2001c) "Tonal Complexes and Tonal Alignment", with A. Akinlabi, in NELS 31.
(2002a) Schultz, M. and M. Liberman, "Towards a 'Universal Dictionary' for Multi-Language Information Retrieval Applications", in J. Allan, J. Carbonell and J. Yamron, Eds., Topic Detection and Tracking: Event-based Information Organization, Kluwer International Series on Information Retrieval, Kluwer Academic Press.
(2002b) "Corpora for Topic Detection and Tracking", with C. Cieri, S. Strassel, D. Graff, N. Martey, K. Rennert and M. Liberman. In J. Allan, J. Carbonell and J. Yamron, Eds., Topic Detection and Tracking: Event-based Information Organization, Kluwer International Series on Information Retrieval, Kluwer Academic Press.
(2002c) "TIDES language resources: A resource map for translingual information access", with C. Cieri. LREC-2002.
(2003a) "Shallow Semantic Annotation of Biomedical Corpora for Information Extraction", with S. Kulick, M. Palmer, and A. Schein. BioLink2003.
(2003b) "Automated information extraction from biomedical text", with R.S. Winters, Y. Jin, R. McDonald, S. Kulick, A. Bies, M.A. Mandel, E. Pancoast, F.C.N. Pereira & P.S. White. TIBETS.
(2004) "Integrated Annotation for Biomedical Information Extraction", with S. Kulick, A. Bies, M. Mandel, R. McDonald, M. Palmer, A. Schein, and L. Ungar. HLT/NAACL.
(2005a) "Identifying and Extracting Malignancy Types in Cancer Literature", with Y. Jin, K. Lerman, M. Mandel, R. McDonald, F. Pereira, P. White, and S. Winters. BioLink2005.
(2005b) "The Place of Culture in a World Dictionary of the Yoruba Language", with Yiwola Awoyale, in Toyin Falola and Ann Genova, Eds., Yoruba Creativity, Africa World Press.
(2006a) "Linguistic Data Resources", with C. Cieri, V. Arranz and K. Choukri, Chapter 3 in Tanja Schultz and Katrin Kirchhoff (eds.) Multilingual Speech Processing, Elsevier, Academic Press.
(2006b) "The Mixer and Transcript Reading Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research", with C. Cieri, W. Andrews, J.P. Campbell, G. Doddington, J. Godfrey, S. Huang, A. Martin, H. Nakasone, M. Przybocki, & K. Walker. LREC 2006
(2006c) "Towards an Integrated Understanding of Speaking Rate in Conversation", with J. Yuan and C. Cieri. Interspeech 2006.
(2006d) "Automated recognition of malignancy mentions in biomedical literature", with Y. Jin, R. T. McDonald, K. Lerman, M. A. Mandel, S. Carroll, F. C. Pereira, R. S. Winters, & P. S. White. BMC Bioinformatics.
(2006e) "A Context Pattern Induction Method for Named Entity Extraction", with P. Talukdar, T. Brants, and F. Pereira. Computational Natural Language Learning (CoNLL-X).
(2006f) "Language and gender differences in speech overlaps in conversation", with J. Yuan and C. Cieri, J. Acoust. Soc. Am. 120, 3295
(2006g) "Frequency and amplitude derivatives as syllable-level F0 features", with J. Yuan, J. Acoust. Soc. Am. 120, 3090
(2007a) "Lightly supervised attribute extraction", with K. Bellare, P. Talukdar, F. Pereira & A. McCallum (NIPS 2007)
(2007b) "Towards an Integrated Understanding of Speech Overlaps in Conversation", with J. Yuan and C. Cieri (ICPhS2007)
(2007c) "Perception of Disfluency: Language Differences and Listener Bias", with C. Lai, K. Gorman & J. Yuan (InterSpeech 2007)
(2008a) "Vowel acoustic space in continuous speech: An example of using audio books for research", with Jiahong Yuan (CatCod 2008)
(2008b) "Speaker identification in the SCOTUS corpus", with Jiahong Yuan (Acoustics 2008)
(2008c) "Different Roles of Pitch and Duration in Distinguishing Word Stress in English", with Stephen Isard and Jiahong Yuan (InterSpeech 2008)
(2009a) "Automatic formant extraction for sociolinguistic analysis of large corpora", with Keelan Evanini and Stephen Isard (InterSpeech 2009)
(2009b) "Investigating /l/ Variation in English through Forced Alignment", with Jiahong Yuan (InterSpeech 2009)
(2009c) "The annotation conundrum" (EACL 2009)
(2010a) "Robust speaking rate estimation using broad phonetic class recognition", with Jiahong Yuan (IEEE ICASSP 2010)
(2010b) "F0 Declination in English and Mandarin Broadcast News Speech", with Jiahong Yuan (Interspeech 2010)
(2010c) "Fred Jelinek", Computational Linguistics 36(4): 595-599, Dec. 2010
(2010d) "A New Approach to Lexical Disambiguation of Arabic Text", with Rushin Shah and Lyle Ungar (EMNLP 2010)
(2011a) "Automatic measurement and comparison of vowel nasalization across languages," with Jiahong Yuan, ICPhS XVII 2244-2247
(2011b) "Automatic detection of 'g-dropping' in American English using forced alignment," with Jiahong Yuan, (IEEE ICASSP 2011)
(2011c) "Speech Processing Tools - An Introduction to Interoperability", with Christoph Draxler, Toomas Altosaar, Sadaoki Furui, and Peter Wittenburg (Interspeech 2011)
(2011d) "Mining a year of speech", with John Coleman, Greg Kochanski, Lou Burnard, and Jiahong Yuan. VLSP 2011: New tools and methods for very-large-scale phonetics research
(2012) "/l/ variation in American English: A corpus approach," with Jiahong Yuan (Journal of Speech Sciences, 1(2), pp. 35-46)
(2013a) "Articulatory trajectories for large-vocabulary speech recognition," with V. Mitra, W. Wang, A. Stolcke, H. Nam, C. Richey, and J. Yuan (IEEE ICASSP 2013)
(2013b) "Using Multiple Versions of Speech Input in Phone Recognition", with Jiahong Yuan, Andreas Stolcke, Wen Wang, and Vikramjit Mitra, (IEEE ICASSP 2013)
(2013c) "Scale Space Expansion of Acoustic Features Improves Speech Event Detection", with Neville Ryant and Jiahong Yuan (IEEE ICASSP 2013)
(2013d) "A Cross-language Study on Automatic Speech Disfluency Detection", with Wen Wang, Andreas Stolcke, and Jiahong Yuan (NAACL-HLT 2013)
(2013e) "Automating phonetic measurement: The case of voice onset time", with Neville Ryant and Jiahong Yuan (ICA 2013)
(2013f) "Automatic Phonetic Segmentation using Boundary Models", with Jiahong Yuan, Neville Ryant, Andreas Stolcke, Vikramjit Mitra, and Wen Wang (InterSpeech 2013)
(2013g) "Speech Activity Detection on YouTube Using Deep Neural Networks", with Neville Ryant and Jiahong Yuan (InterSpeech 2013)
(2014a) "Highly Accurate Phonetic Segmentation Using Boundary Correction Models and System Fusion", with Andreas Stolcke, Neville Ryant, Vikramjit Mitra, Jiahong Yuan, and Wen Wang (IEEE-ICASSP 2014)
(2014b) "Automatic Phonetic Segmentation in Mandarin Chinese: Boundary Models, Glottal Features and Tone", with Jiahong Yuan and Neville Ryant (IEEE-ICASSP 2014)
(2014c) "Mandarin Tone Classification Without Pitch Tracking", with Neville Ryant and Jiahong Yuan (IEEE-ICASSP 2014)
(2014d) "Highly Accurate Mandarin Tone Classification In The Absence of Pitch Information", with Neville Ryant, Malcolm Slaney, Elizabeth Shriberg, and Jiahong Yuan (Speech Prosody 2014)
(2014e) "F0 Declination in English and Mandarin Broadcast News Speech", with Jiahong Yuan, Speech Communication Nov-Dec. 2014
(2014f) "Parser Evaluation Using Derivation Trees", with Seth Kulick, Ann Bies, Justin Mott, Antony Kroch, and Beatrice Santorini, ACL 2014.
(2014g) "New directions for language resource development and distribution", with Christopher Cieri, Denise DiPersio, Andrea Mazzuchi, Stephanie Strassel, and Jonathan Wright (LREC 2014).
(2015a) "A Crosslinguistic Study of Prosodic Focus", with Yong-cheol Lee, Bei Wang, Sisi Chen, Martine Adda-Decker, Angélique Amelot, and Satoshi Nambu (IEEE ICASSP 2015).
(2015b) "The effect of spectral slope on pitch perception", with Jianjing Kuang (InterSpeech 2015)
(2015c) "Investigating Consonant Reduction in Mandarin Chinese with Improved Forced Alignment", with Jiahong Yuan, (InterSpeech 2015)
(2015d) "Sentence selection for automatic scoring of Mandarin proficiency", with Jiahong Yuan, Xiaoying Xu, Wei Lai, Weiping Ye, Xinru Zhao (SIGHAN 2015)
(2015e) "Development of pitch contrast in Korean prosody: A corpus study", with Sunghye Cho and Yong-cheol Lee (ICKL 2015)
(2015f) "Annual Review of Linguistics: Introduction to Volume 1", with Barbara Partee, Annual Review of Linguistics 2015
(2016a) "Variation and change in the use of hesitation markers in Germanic languages", with Martijn Wieling, Jack Grieve, Gosse Bouma, Joseph Fruehwald, John Coleman, Language Dynamics and Change, 2016
(2016b) "Pauses and Pause Fillers in Mandarin Monologue Speech: The Effects of Sex and Proficiency", with Jiahong Yuan, Xiaoying Xu, & Wei Lai (IEEE ICASSP2016)
(2016c) "Voice quality as a pitch-range indicator", with Jianjing Kuang and Yixuan Guo, (IEEE ICASSP2016)
(2016d) "The effect of vocal fry on pitch perception", with Jianjing Kuang (IEEE ICASSP 2016)
(2016e) "Morris Halle: An Appreciation", Annual Review of Linguistics 2016
(2016f) "Building Language Resources for Exploring Autism Spectrum Disorders", with Julia Parish-Morris, Christopher Cieri, Leila Bateman, Emily Ferguson and Robert Schultz (LREC 2016)
(2016g) "Exploring Autism Spectrum Disorders Using HLT", with Julia Parish-Morris, Neville Ryant, Christopher Cieri, Leila Bateman, Emily Ferguson and Robert Schultz, CLPsych Workshop (NAACL-HLT 2016)
(2016h) "Automatic Analysis of Phonetic Speech Style Dimensions", with Neville Ryant, (InterSpeech 2016)
(2016i) "The rhythmic constraint on prosodic boundaries in Mandarin Chinese based on corpora of silent reading and speech perception", with Wei Lai, Jiahong Yuan, Ya Li, & Xiaoying Xu (InterSpeech 2016)
(2016j) "Phoneme, Phone Boundary, and Tone in Automatic Scoring of Mandarin Proficiency", with Jiahong Yuan (InterSpeech 2016)
(2016k) "Pitch-range perception: the dynamic interaction between voice quality and fundamental frequency", with Jianjing Kuang (InterSpeech 2016)
(2016l) "Prosodic Strength Intrinsic to Lexical Items: A Corpus Study of Tone Reduction in Tone4+Tone4 Words in Mandarin Chinese", with Wei Lai & Jiahong Yuan (ISCSLP 2016)
(2016m) "Large-scale analysis of Spanish /s/-lenition using audiobooks", with Neville Ryant (International Congress on Acoustics 2016).
(2016n) "Production and Perception of Tone 3 Focus in Mandarin Chinese", with Yong-Cheol Lee and Ting Wang, Frontiers in Psychology 26 July 2016.
(2016o) "Weighted log-odds-ratio, informative Dirichlet prior method to compare peer review feedback for top and bottom quartile college students in a first-year writing program", with Valerie Ross, Lan Ngo, and Roger LeGrand, Educational Data Mining 2016.
(2017a) "Introducing a Novel Community-Based Assessment Tool: The Computerized Social Affective Language Task (C-SALT)", with N. Minyanou, L. Bateman, C. Cieri, N. Ryant, J. Brown, E. Kim, Z. Dravis, E. Ferguson, K. Bartley, A. Pomykacz, J. Pandey, A. de Marchena, R. Schultz, and J. Parish-Morris, IMFAR 2017 [poster]
(2017b) "Automatic Measurement of Prosody in Behavioral Variant FTD", with Naomi Nevler, Sharon Ash, Charles Jester, David Irwin, and Murray Grossman, Neurology.
(2017c) "Linguistic camouflage in girls with autism spectrum disorder", with Julia Parish-Morris, Christopher Cieri, John Herrington, Benjamin Yerys, Leila Bateman, Joseph Donaher, Emily Ferguson, Juhi Pandey, and Robert Schultz, Molecular Autism.
(2017d) "Chinese TIMIT: A TIMIT-like corpus of standard Chinese", with Jiahong Yuan, Hongwei Ding, Sishi Liao, and Yuqing Zhan, O-COCOSDA 2017.
(2018a) "Enhancement and analysis of conversational speech: JSALT 2017", with Elika Bergelson, Kenneth Church, Alejandrina Cristia, Jun Du, Sriram Ganapathy, Sanjeev Khudanpur, Diana Kowalski, Mahesh Krishnamoorthy, Rajat Kulshreshta, Yu-Ding Lu, Matthew Maciejewski, Florian Metze, Jan Profant, Neville Ryant, Lei Sun, Yu Tsao, and Zhou Yu, ICASSP 2018.
(2018b) "Using Forced Alignment for Phonetics Research", with Jiahong Yuan, Wei Lai, and Chris Cieri, in Huang, Jin, & Hsieh, Eds., Chinese Language Resources and Processing: Text, Speech and Language Technology, Springer.
(2018c) "From ‘Solved Problems’ to New Challenges: A Report on LDC Activities", with Christopher Cieri, Stephanie Strassel, Denise DiPersio, Jonathan Wright and Andrea Mazzucchi, LREC 2018.
(2018d) "Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data", with Christopher Cieri, James Fiumara, Chris Callison-Burch and Jonathan Wright, LREC 2018.
(2018e) "GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages", with Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Jonathan Wright, Jiahong Yuan, Juhong Zhan, Yuqing Zhan, Interspeech 2018.
(2018f) "Validated Automatic Speech Biomarkers in Primary Progressive Aphasia", with Naomi Nevler, Sharon Ash, David Irwin, Mark Liberman, Murray Grossman, Annals of Clinical and Translational Neurology
(2018g) "Towards Progress in Theories of Language Sound Structure", In Diane Brentari and Jackson Lee, Eds., Shaping Phonology, University of Chicago Press.
(2018h) "Integrating voice quality cues in the pitch perception of speech and non-speech utterances", with Jianjing Kuang, Frontiers in Psychology
(2019a) "Corpus Phonetics", Annual Review of Linguistics.
(2019b) "The impossibility of language acquisition (and how they do it)", with Lila Gleitman, Cynthia McLemore, and Barbara Partee, Annual Review of Linguistics.
(2019c) "Compensation for F0 variation with vocal effort and vowel height in Cantonese tone perception", with Wei Lai and Qianxin He, ICPhS 2019.
(2019d) "Automatic detection of ASD in children using acoustic and text features from brief natural conversations", with Sunghye Cho, Neville Ryant, Meredith Cola, Robert T. Schultz and Julia Parish-Morris, Interspeech 2019.
(2019e) "Automatic Detection of Prosodic Focus in American English", with Sunghye Cho and Yong-cheol Lee, Interspeech 2019.
(2019f) "The Second DIHARD Diarization Challenge: Dataset, task, and baselines", with Neville Ryant, Kenneth Church, Christopher Cieri, Alejandrina Cristia, Jun Du, and Sriram Ganapathy, Interspeech 2019.
(2019g) "Introduction to the special issue on annotated corpora", with Marie Candito, Traitement Automatique des Langues 2019.
(2020a) "Human Language Technology", with Charles Wayne. AI Magazine, Summer 2020.
(2020b) "Automatic classification of primary progressive aphasia patients using lexical and acoustic features", with Sunghye Cho, Naomi Nevler, Sanjana Shellikeri, Sharon Ash, and Murray Grossman. LREC 2020.
(2020c) "A new acoustic-based pronunciation distance measure", with Martijn Bartelds, Caitlin Richter, and Martijn Wieling, Frontiers in Artificial Intelligence 2020.
(2020d) "Automated Analysis of Natural Speech in Amyotrophic Lateral Sclerosis Spectrum Disorders", with Naomi Nevler, Sharon Ash, Corey T. McMillan, Lauren Elman, Leo McCluskey, David J Irwin, Sunghye Cho, and Murray Grossman. Neurology.
(2020e) "Measuring foreign accent strength using an acoustic distance measure", with Martijn Bartelds, Wietse de Vries, Caitlin Richter, and Martijn Wieling. In 12th International Seminar on Speech Production, pp. 17-20.
(2021a) "Lexical and acoustic characteristics of young and older healthy adults", with Sunghye Cho, Naomi Nevler, Natalia Parjane, David Irwin, Neville Ryant, Sharon Ash, Christopher Cieri, and Murray Grossman. Forthcoming in Journal of Speech, Language, and Hearing Research.
(2021b) "Automated analysis of lexical features in Frontotemporal Degeneration", with Sunghye Cho, Naomi Nevler, Sharon Ash, Sanjana Shellikeri, David Irwin, Lauren Massimo, Katya Rascovsky, Christopher Olm, and Murray Grossman. Cortex 2021.
(2021c) "Birdsong Learning and Culture: Analogies with Human Spoken Language", with Julia Hyland Bruno, Erich Jarvis, and Ofer Tchernichovski. Annual Review of Linguistics.
(2021d) "The Future of Computational Linguistics: On Beyond Alchemy", with Kenneth Church. Frontiers in Artificial Intelligence, 2021.
(2021e) "The Third DIHARD Diarization Challenge", with Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, and Sriram Ganapathy. Interspeech 2021.
(2021f) "Probing Acoustic Representations for Phonetic Properties", with Danni Ma and Neville Ryant. IEEE ICASSP 2021.
(2021g) "Digital Speech Analysis in Progressive Supranuclear Palsy and Corticobasal Syndromes", with Natalia Parjane, Sunghye Cho, Sharon Ash, Katheryn Cousins, Sanjana Shellikeri, Leslie Shaw, David Irwin, Murray Grossman, and Naomi Nevler. Forthcoming in Journal of Alzheimer's Disease.
(2021h) "Natural Language Processing Methods are Sensitive to Sub-Clinical Linguistic Differences in Schizophrenia Spectrum Disorders", with Sunny Tang, Reno Kriz, Sunghye Cho, Suh Jung Park, Jenna Harowitz, Raquel Gur, Mahendra Bhati, Daniel Wolf, and João Sedoc. NPJ Schizophrenia 2021
(2021j) "Automated analysis of digitized letter fluency data", with Sunghye Cho, Naomi Nevler, Natalia Parjane, Christopher Cieri, Murray Grossman, and Katheryn Cousins. Frontiers in Psychology, 2021
(2021k) "Automatic recognition of suprasegmentals in speech", with Jiahong Yuan, Neville Ryant, Xinghu Cai, and Kenneth Church.
(2022a) "Neural Representations for Modeling Variation in English Speech", with Martijn Bartelds, Wietse de Vries, Faraz Sanal, Caitlin Richter, and Martijn Wieling. Journal of Phonetics, 2022.
(2022b) "Lexical and acoustic speech features relating to Alzheimer’s disease pathology", with Sunghye Cho, Katheryn Cousins, Sanjana Shellikeri, Sharon Ash, David Irwin, Murray Grossman, and Naomi Nevler. Neurology.
(2022c) "Who does what to whom? graph representations of action-predication in speech relate to psychopathological dimensions of psychosis", with Amir Nikzad, Yan Cong, Sarah Berretta, Katrin Hänsel, Sunghye Cho, Sameer Pradhan, Leily Behbehani, Danielle DeSouza & Sunny X. Tang, NPJ Schizophrenia, 2022
(2022d) "Identifying stable speech-language markers of autism in children: Preliminary evidence from a longitudinal telephony-based study", with Sunghye Cho, Riccardo Fusaroli, Maggie Rose Pelella, Kimberly Tena, Azia Knox, Aili Hauptmann, Maxine Covello, Alison Russell, Judith Miller, Alison Hulink, Jennifer Uzokwe, Kevin Walker, James Fiumara, Juhi Pandey, Christopher Chatham, Christopher Cieri, Robert Schultz, & Julia Parish-Morris. CLPsych 2022.
(2022e) "Natural speech markers of Alzheimer’s disease co-pathology in Lewy Body dementias", with Sanjana Shellikeri, Sunghye Cho, Katheryn A.Q. Cousins, Erica Howard, Yvonne Balganorth, Daniel Weintraub, Meredith Spindler, Andres Deik, Edward B. Lee, John Q. Trojanowski, David Irwin, David Wolk, Murray Grossman, and Naomi Nevler. Parkinsonism and Related Disorders, 2022.
(2022f) "GRAIL—A Generalized Representation and Aggregation of Information Layers", with Sameer Pradhan, 16th Annual Linguistic Annotation Workshop, LREC 2022.
(2022g) "Reflections on 30 Years of Language Resource Development and Sharing", with Christopher Cieri, Sunghye Cho, Stephanie Strassel, James Fiumara, and Jonathan Wright, LREC 2022.
(2022h) "The mapping between syntactic and prosodic phrasing in English and Mandarin", with Jianjing Kuang, May Pik Yu Chan, Nari Rhee, and Hongwei Ding. Interspeech 2022.
(2022i) "Inferring pitch from coarse spectral features", with Danni Ma and Neville Ryant. POMA 2022.
(2023a) "Evolution of linguistic markers of agency, centrality and content during metacognitive therapy for psychosis", with Amir Nikzad, Paul Lysaker, Kyle Minor, Bethany Leonhardt, Jenifer Vohs, Courtney Wiesepape, and Sunny Tang.
(2023b) "Sex differences in the temporal dynamics of autistic children’s natural conversations", with Sunghye Cho, Meredith Cola, Azia Knox, Maggie Rose Pelella, Alison Russell, Aili Hauptmann, Maxine Covello, Christopher Cieri, Robert T Schultz & Julia Parish-Morris, Molecular Autism 2023.
(2023c) "Latent factors of language disturbance and relationships to quantitative speech features", with Sunny Tang, Katrin Hänsel, Yan Cong, Amir H Nikzad, Aarush Mehta, Sunghye Cho, Sarah Berretta, Leily Behbehani, Sameer Pradhan, & Majnu John. Schizophrenia Bulletin.
(2023d) "Characterizing and Detecting Delirium with Clinical and Computational Measures of Speech and Language Disturbance", with Sunny Tang, Yan Cong, Gwenyth Mercep, Mutahiri Bhatti, Grace Serpe, Valeria Gromova, Sarah Berretta, Majnu John, & Liron Sinvani. Journal of Psychiatry and Neuroscience, 7/4/2023.
(2023e) "Comparison of Category and Letter Fluency Tasks Through Automated Analysis", with Carmen Gonzalez-Recober, Naomi Nevler, Sanjana Shellikeri, Katheryn Cousins, Emma Rhodes, Murray Grossman, David Irwin & Sunghye Cho. Frontiers in Psychology.
(2023f) "Digital markers of motor speech impairments in spontaneous speech of patients with ALS-FTD spectrum disorders", with Sanjana Shellikeri, Sunghye Cho, Sharon Ash, Carmen Gonzalez-Recober, Corey T. McMillan, Lauren Elman, Colin Quinn, Defne A. Amado, Michael Baer, David J. Irwin, Lauren Massimo, & Christopher A. Olm. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration.
(2023g) "Latent Factors of Language Disturbance and Relationships to Quantitative Speech Features", with Sunny X. Tang, Katrin Hänsel, Yan Cong, Amir H. Nikzad, Aarush Mehta, Sunghye Cho, Sarah Berretta, Leily Behbehani, Sameer Pradhan, and Majnu John. Schizophrenia Bulletin.
(2023h) "Using Forced Alignment for Phonetics Research", with Jiahong Yuan, Wei Lai, & Christopher Cieri. Chinese Language Resources, Springer.
(2024a) "Automated Measures of Syntactic Complexity in Natural Speech Production: Older and Younger Adults as a Case Study", with Galit Agmon, Sameer Pradhan, Sharon Ash, Naomi Nevler, Murray Grossman, and Sunghye Cho. Journal of Speech, Language, and Hearing Research.
(2024b) "Changes in Digital Speech Measures in Asymptomatic Carriers of Pathogenic Variants Associated With Frontotemporal Degeneration", with Naomi Nevler, Sunghye Cho, Katheryn Cousins, Sharon Ash, Christopher Olm, Sanjana Shellikeri, Galit Agmon, Carmen Gonzalez-Recober, Sharon Xie, Megan Barker, Masood Manoochehri, Corey McMillan, David Irwin, Lauren Massimo, Laynie Dratch, Gayathri Cheran, Edward D Huey, Stephanie Cosentino, Vivianna Van Deerlin, & Murray Grossman. Neurology.
(2024c) "Automatic classification of AD pathology in FTD phenotypes using natural speech", with Sunghye Cho, Christopher Olm, Sharon Ash, Sanjana Shellikeri, Galit Agmon, Katheryn Cousins, David Irwin, Murray Grossman, and Naomi Nevler. Alzheimer's & Dementia.
(2024d) "Speech markers of depression dimensions across cognitive status", with Laili Soleimani, Yuxia Ouyang, Sunghye Cho, Arash Kia, Michal Schnaider Beeri, Hung‐Mo Lin, Ramit Ravona‐Springer, Nadia Ramsingh, Murray Grossman, and Naomi Nevler. Forthcoming in a special issue of Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring.
(2024e) "Some prosodic consequences of varied discourse functions in a Cantonese sentence-final particle", with Jonathan Him Nok Lee, Ka-Fai Yip, and Jianjing Kuang. Speech Prosody 2024.
(2024f) "The impact of prosodic boundary and information structure on tonal coarticulation in spontaneous Cantonese", with Xin Gao and Cesko Voeten. Speech Prosody 2024.
(2024g) "Do we EXPECT TO find phonetic traces for syntactic traces?", with Jonathan Him Nok Lee and Martin Salzmann. Interspeech 2024.

Manuscripts and submitted papers:

"Featurizing Text: Converting Text into Predictors for Regression Analysis", with Dean Foster and Robert Stine (draft 10/18/2013)
"Development of pitch contrast and Seoul Korean intonation: A corpus study", with Sunghye Cho and Yong-cheol Lee.
"Comparison of Category and Letter Fluency Tasks Through Automated Analysis", with Carmen Gonzalez-Recober, Naomi Nevler, Sanjana Shellikeri, Katheryn Cousins, Emma Rhodes, Murray Grossman, & Sunghye Cho.

Some Grants over the years:

Mining a Year of Speech, sponsors NSF and JISC (award amount $100,000 to Penn and £100,000 to Oxford), January 2010-March 2011, with Jiahong Yuan and Chris Cieri (Penn) and John Coleman (Oxford). See "Mining Years and Years of Speech: Final Report of the Digging into Data project 'Mining a Year of Speech'", 2011.

New tools and methods for very-large-scale phonetics research, sponsor NSF (award amount $450,000), June 2010-May 2013, with Jiahong Yuan (Penn), Susan Davidson (Penn) and Andreas Stolcke (SRI/Berkeley).

Language Preservation 2.0: Crowdsourcing Oral Language Documentation using Mobile Devices, sponsor NSF (award amount $101,501), June 2012-May 2014, with Steven Bird.

SIREN-IL: Specialized Intra/Interlingual Resources for Emergent News - Incident Languages (LORELEI), sponsor DARPA (award amount $12,234,739), 8/5/2015-12/31/2019.

Sonic Viper, sponsor Maryland Procurement Office, 9/30/2015-3/31/2020, (Award amount $7,154,487; "Create linguistic corpora to foster innovative ideas and techniques in the HLT research community at large").

NIEUW: Novel Incentives and Workflows in Linguistic Data Collection and Annotation, sponsor NSF (award amount $1,218,465), July 2017 to June 2023, with Chris Cieri and Chris Callison-Burch.

CAIRO-MS: Conflicting Account Information Resources in Omnivorous Media Streams, sponsor DARPA (award amount $4,159,847) 12/13/2017-6/13/2022.

Automated Speech Analysis in FTD Spectrum Disorders, sponsor DoD/USAMRDC (award amount $1,945,670), 06/01/2020-05/31/2023, with Penn Frontotemporal Degeneration Center.

Annotated Data for the Investigation of Facets of Personality (AnnoDIFP), (award amount $1,163,265.50), subcontract to Florida Institute of Technology, 8/6/18-8/15/2024.

Voice Biomarkers, Roche Pharmaceuticals, (award amount $226,387.00), 10/15/19-2/1/2023 (extended due to pandemic).

Digitizing Human Vocal Interaction to Understand and Diagnose Autism, NIH (award amount $682,099.00), 6/1/20-5/31/2023.


Linguistics, Language and the Public Award, by the Linguistic Society of America, "given for a body of work that has had a demonstrable impact on the public awareness of language and/or linguistics", to Language Log, 2009.

Antonio Zampolli Prize, by the European Language Resources Association, "awarded to individuals whose work lies within the areas of Language Resources and Language Technology Evaluation with acknowledged outstanding contributions", 2010.

IEEE James L. Flanagan Speech and Audio Processing Award, "For pioneering contributions and continued leadership in robust, replicable, and data-driven speech and language science and engineering", 2017.