Adam Albright, MIT
Friday, December 12th at 4 pm, Stevenson Fireside Lounge
Testing phonological biases with Artificial Grammar learning experiments
As with most linguistic input, the data that children receive about phonological patterns is rife with ambiguity. For example, children hearing voicing alternations in German ([diːp] ~ [diːbə] ‘thief-sg./pl.’, [maʊs] ~ [mɔɪzɐ] ‘mouse-sg./pl.’) receive no evidence as to whether a single final devoicing process affects all word-final obstruents, or just the subset of obstruents that German happens to have, or whether separate processes affect different subsets of segments. Thus, the data radically underdetermines the analysis (poverty of the stimulus), and learners must employ prior biases in order to favor one analysis over another. By observing how speakers extend alternations to novel words, strings, and segments, it is possible to gain insight into these biases. For example, a preference for simpler rules may lead learners to generalize devoicing to as broad a class of segments as possible, while a preference for typologically common rules favors generalization of devoicing to other obstruents, but not to sonorants. In this talk, I present experimental evidence testing three types of bias: (1) a bias against alternations, favoring uniform paradigms (McCarthy 1998); (2) a bias in favor of alternations that target broader classes of segments (Peperkamp et al. 2006); (3) a substantive bias against perceptually salient alternations (Steriade 2001).
Learners’ biases were probed using Artificial Grammar experiments, in which adult English speakers were taught singular~plural pairs in a “Martian language”, and were then asked to produce or rate plural forms. In all of the languages reported here, obstruent-final stems exhibited voicing or continuancy alternations (dap~dabi, brup~brufi). A premise of work on learning biases is that the less data learners have received, the more their behavior will reflect prior biases. In the first set of experiments, we manipulated the amount of data that learners received by varying the frequency of alternations across different segments, in order to test how generalization changes with increasing amounts of data. For example, if learners are biased to expect non-alternation, we expect fewer alternating responses for languages with less data about obstruents, and for rarer segments within a language. If learners expect alternations to apply to broad classes of segments, we expect processes affecting attested segments to be generalized to unattested or rarer segments. Finally, if learners are biased to expect certain alternations (e.g., voicing) over others (e.g., continuancy), we expect participants to generalize preferred alternations at higher rates than dispreferred alternations. In a second set of experiments, we independently manipulated the evidence that learners received for alternation and for non-alternation, in order to test whether the preference for non-alternation is purely a prior bias (OO-Faith » Markedness), or whether it is learned on the basis of data from non-alternating paradigms. The results show that increasing the number of training items with alternating paradigms significantly increases the probability of choosing alternations in the test phase, while increasing the number of items with non-alternating paradigms does not increase the probability of selecting a uniform paradigm. Thus, the results generally support a prior bias for uniform paradigms.
These results can be modeled accurately using a maximum entropy (maxent) grammar of weighted constraints. Three properties of maxent models make them well suited to modeling the observed biases. First, the set of prior/innate constraints is a parameter of the model, and by including correspondence (faithfulness) constraints in the grammar, it is possible to model an expectation for non-alternation. Second, by specifying prior distributions over constraint weights, we can model an initial bias to obey certain constraints (such as faithfulness) at the expense of others. Finally, it is possible to specify different distributions for different constraints, reflecting the fact that learners demote some constraints more readily than others. This allows us to model the fact that participants favor alternations that target broad classes of segments, and favor certain alternations over others.
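As a rough illustration of the kind of model described above, the following Python sketch computes candidate probabilities in a toy maxent grammar and a Gaussian log-prior over constraint weights. It is not the model used in the talk: the two constraints (a faithfulness constraint and a markedness constraint), the candidate forms, the weights, and the prior parameters are all hypothetical values chosen for illustration, and the Gaussian form of the prior is an assumption.

```python
import math

def maxent_probs(weights, violations):
    """Return P(candidate), proportional to exp(-sum_i w_i * violations_i).

    weights:    list of non-negative constraint weights
    violations: dict mapping each candidate to its list of violation counts
    """
    harmonies = {
        cand: -sum(w * v for w, v in zip(weights, viols))
        for cand, viols in violations.items()
    }
    z = sum(math.exp(h) for h in harmonies.values())
    return {cand: math.exp(h) / z for cand, h in harmonies.items()}

def log_prior(weights, mus, sigmas):
    """Gaussian log-prior over weights (up to an additive constant).

    A constraint with a high mu and a small sigma resists demotion;
    a constraint with a larger sigma is moved more readily by data.
    """
    return -sum(
        (w - mu) ** 2 / (2 * sigma ** 2)
        for w, mu, sigma in zip(weights, mus, sigmas)
    )

# Hypothetical constraint order: [FAITH, MARKEDNESS]
# The uniform plural "dapi" violates MARKEDNESS; the alternating plural
# "dabi" violates FAITH.
candidates = {"dap~dapi": [0, 1], "dap~dabi": [1, 0]}

# Illustrative weights in which faithfulness outweighs markedness,
# i.e. a grammar that prefers uniform paradigms.
weights = [3.0, 1.0]
probs = maxent_probs(weights, candidates)

# Illustrative prior: FAITH starts high with a tight prior,
# MARKEDNESS starts at zero with the same spread.
prior = log_prior(weights, mus=[3.0, 0.0], sigmas=[1.0, 1.0])
```

With these illustrative weights the uniform paradigm receives most of the probability mass. Training on alternating items would push the faithfulness weight down against its prior, and giving each constraint its own `sigma` is one way to express that some constraints are demoted more readily than others.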
(Joint work with Youngah Do, Georgetown University)