Kyle Rawlins (Johns Hopkins University) will be the "Thursday Thoughts" speaker on February 4th.
Standard approaches to thematic roles in linguistic theory and in computational settings face two well-known problems (Dowty 1991, Levin and Rappaport Hovav 1995): (i) role characterization, the challenge of coming up with precise necessary and sufficient conditions for thematic roles while avoiding (ii) role fragmentation, the splintering of a small list of intuitive thematic roles into a very large set of roles where generalizations are harder to find. Dowty (1991) proposed that, for purposes of generalizations about mapping to syntactic roles, thematic roles should be construed as something more like prototype structures ("proto-roles"). However, with a few exceptions, Dowty's theory has proven hard to assess. In this talk I present work that evaluates Dowty's proto-role hypothesis on a large scale. First, building on work by Kako (2006), I show a method for collecting large data sets of proto-role labels via crowdsourcing from untrained annotators. Second, I show that the results from this dataset validate something like Dowty's proto-role hypothesis and present substantial challenges for a non-fragmented categorical thematic role approach. Finally, I briefly sketch a model for discovering "natural semantic roles" from proto-role annotations, addressing an important theoretical question that has been open since Dowty: how many proto-roles do we need? This model is also compared directly against models trained on other role annotations (PropBank and VerbNet) in a task centered around learning a solution to the linking problem (cf. Lang and Lapata 2010), demonstrating that a model trained on semantic proto-role labeling (SPRL) can match or outperform models that use categorical roles.
Joint work with: Frank Ferraro, Drew Reisinger, Rachel Rudinger, Ben Van Durme, Aaron Steven White