Psych 215L: Language Acquisition
Lecture 16: Complex Systems

Computational Problem: Figure out the order of words (syntax)
"Jareth juggles crystals" = Subject Verb Object
Languages order these pieces differently: English uses Subject Verb Object, Kannada uses Subject Object Verb, and German allows more than one surface order.
Remember: Children only see the output of the system (the observable word order, e.g., Subject Verb Object) and have to reverse-engineer the generative process behind it.

Similarities & Differences: Parameters
Chomsky: Different combinations of different basic elements (parameters) would yield the observable languages, similar to the way different combinations of basic elements in chemistry yield many different-seeming substances.

Thinking About Syntactic Variation
Big Idea: A relatively small number of syntax parameters yields a large number of different languages' syntactic systems. For example, with 5 different parameters of variation and 2 different values per parameter, the total number of languages that can be represented = 2^5 = 32 (English, Japanese, Tagalog, Navajo, French, ...).

Learning Language Structure
Chomsky: Children are born knowing the parameters of variation (and also potentially what values they can have). This is part of Universal Grammar. Input from the native linguistic environment (English, Japanese, Navajo, ...) determines what values these parameters should have.
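To make the parameter combinatorics concrete, here is a minimal sketch in Python; the parameter names are hypothetical illustrations, not ones given on the slides. It simply enumerates every grammar expressible with 5 binary parameters:

```python
from itertools import product

# Five hypothetical binary parameters of variation (illustrative names only).
parameters = ["head-final", "subject-drop", "topic-drop", "verb-raising", "wh-movement"]

# A grammar is one combination of parameter values; enumerate all of them.
grammars = [dict(zip(parameters, values))
            for values in product([False, True], repeat=len(parameters))]

print(len(grammars))   # 2**5 = 32 representable grammars
print(grammars[0])     # one such setting; different settings would correspond to English, Japanese, etc.
```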
Yang (2004): Learning Complex Systems Like Language
But obviously language is learned, so children can't know everything beforehand. How does this fit with the idea of innate biases/knowledge? Only humans seem able to learn human languages, so something in our biology must allow us to do this. Observation: we see constrained variation across languages in their sounds, words, and structure. The knowledge of the ways in which languages vary is children's innate knowledge. This is what Universal Grammar is: innate biases for learning language that are available to humans because of our biological makeup (specifically, the biology of our brains). Chomsky: Children know the parameters of language variation, which they use to learn their native language (English, Navajo, ...).

The big point: even if children have innate knowledge of language structure, we still need to understand how they learn what the correct structural properties are for their particular language. One idea is to remember that children are good at tracking statistical information (like transitional probabilities) in the language data they hear.

The linguist-psychologist breakdown
Linguists (e.g., Noam Chomsky, David Lightfoot, Stephen Crain): characterize the "scope and limits of innate principles of Universal Grammar that govern the world's languages".
Psychologists (e.g., Michael Tomasello, Elizabeth Bates, Brian MacWhinney): emphasize the "role of experience and the child's domain-general learning ability".

Yang (2004): Learning Complex Systems
Statistics for word segmentation (remember Gambell & Yang (2006)): "Modeling shows that the statistical learning (Saffran et al. 1996) does not reliably segment words such as those in child-directed English. Specifically, precision is 41.6%, recall is 23.3%. In other words, about 60% of words postulated by the statistical learner are not English words, and almost 80% of actual English words are not extracted. This is so even under favorable learning conditions." Unconstrained (simple) statistics: not so good. If the statistical measure is constrained by language-specific knowledge (words have only one main stress), performance increases dramatically: 73.5% precision, 71.2% recall.

Combining statistics with Universal Grammar
A big deal: "Although infants seem to keep track of statistical information, any conclusion drawn from such findings must presuppose that children know what kind of statistical information to keep track of." Ex: Transitional probability P(pa | da)? ...of rhyming syllables? ...of syllables with nasal consonants? ...of syllables of the form CV (ba, ti)? Constrained statistics: much better!

Linguistic Knowledge for Learning Structure
Parameters = constraints on language variation. Only certain rules/patterns are possible. This is linguistic knowledge. A language's grammar = a combination of language rules = a combination of parameter values. Idea: use statistical learning to learn which value (for each parameter) the native language uses for its grammar. This is a combination of using linguistic knowledge and statistical learning.
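As a concrete reminder of what tracking transitional probabilities involves, here is a minimal sketch in Python of Saffran-style transitional-probability segmentation with a precision/recall check. The toy syllabified corpus and the boundary-at-local-TP-minimum heuristic are illustrative assumptions, not Gambell & Yang's actual model:

```python
from collections import Counter

# Toy syllabified child-directed "corpus" (an illustrative assumption).
utterances = [["ba", "by", "wants", "the", "bot", "tle"],
              ["the", "ba", "by", "sleeps"],
              ["wants", "the", "bot", "tle"]]
true_words = {("ba", "by"), ("wants",), ("the",), ("bot", "tle"), ("sleeps",)}

# Estimate TP(y | x) = count(x y) / count(x) from syllable bigrams.
unigrams, bigrams = Counter(), Counter()
for utt in utterances:
    unigrams.update(utt)
    bigrams.update(zip(utt, utt[1:]))

def tp(x, y):
    return bigrams[(x, y)] / unigrams[x]

def segment(utt):
    """Posit a word boundary wherever the TP between syllables hits a local minimum."""
    tps = [tp(x, y) for x, y in zip(utt, utt[1:])]
    words, current = [], [utt[0]]
    for i, syll in enumerate(utt[1:]):
        lower_than_left = i == 0 or tps[i - 1] > tps[i]
        lower_than_right = i == len(tps) - 1 or tps[i + 1] > tps[i]
        if lower_than_left and lower_than_right:   # local TP dip -> boundary before this syllable
            words.append(tuple(current))
            current = []
        current.append(syll)
    words.append(tuple(current))
    return words

# Precision: how many proposed words are real? Recall: how many real words were found?
proposed = [w for utt in utterances for w in segment(utt)]
precision = sum(w in true_words for w in proposed) / len(proposed)
recall = len(true_words & set(proposed)) / len(true_words)
print(segment(utterances[0]), precision, recall)
```

Even on this tiny corpus the pure-TP learner fuses frequent sequences like "wants the" into a single "word", the kind of error that drags precision down; the stress-based constraint mentioned above is one way to rein this in.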
Yang (2004): Variational Learning
Idea taken from evolutionary biology: In a population, individuals compete against each other. The fittest individuals survive while the others die out. How do we translate this to learning language structure? A child's mind consists of a population of grammars that are competing to analyze the data in the child's native language.
- Individual = grammar (a combination of parameter values that represents the structural properties of a language)
- Fitness = how well a grammar can analyze the data the child encounters
Intuition: The most successful (fittest) grammar will be the native language grammar, because it can analyze all the data the child encounters. This grammar will "win" once the child encounters enough native language data, because none of the other competing grammars can analyze all the data.

Variational Learning Details
At any point in time, each grammar in the population has a probability associated with it. This represents the child's belief that this grammar is the correct grammar for the native language. For a given native language data point (e.g., "It's raining"), some grammars can analyze it while others can't.

Before the child has encountered any native language data, all grammars are equally likely. So initially all grammars have the same probability: 1 divided by the number of grammars available. If there are 3 grammars, the initial probability of any given grammar = 1/3.

As the child encounters data from the native language, some of the grammars will be more fit because they are better able to account for the structural properties in the data. Other grammars will be less fit because they cannot account for some of the data encountered. Grammars that are more compatible with the native language data will have their probabilities increased, while grammars that are less compatible will have their probabilities decreased over time (e.g., from 1/3, 1/3, 1/3 to 1/20, 3/20, 4/5). After the child has encountered enough data from the native language, the native language grammar should have a probability near 1.0 while the other grammars have probabilities near 0.0.

How do we know if a grammar can successfully analyze a data point or not?
Example: Suppose the parameter is the subject-drop parameter. +subject-drop means the language may optionally choose to leave out the subject of the sentence, like in Spanish. -subject-drop means the language must always have a subject in a sentence, like English. Suppose one grammar in the population is +subject-drop while the other two are -subject-drop, each currently with probability 1/3.
Example data: "Vamos" = go-1st-pl = "We're going." The +subject-drop grammar is able to analyze this data point as the speaker optionally dropping the subject. The -subject-drop grammars cannot analyze this data point, since they require sentences to have a subject.
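A minimal sketch of this reward-and-punish dynamic over whole grammars, in Python. The linear reward-penalty update used below is one standard choice of this kind (and the one usually associated with Yang's model), but the toy data distribution and learning rate are assumptions for illustration:

```python
import random

# Three competing grammars, initially equally likely (probability 1/3 each).
grammars = {"+subject-drop": 1 / 3, "-subject-drop A": 1 / 3, "-subject-drop B": 1 / 3}
GAMMA = 0.05  # small learning rate, so learning is gradual

def analyzes(grammar, datum):
    """Toy compatibility check: only the +subject-drop grammar can handle
    subject-less sentences (like Spanish "Vamos"); everything else is ambiguous."""
    return grammar == "+subject-drop" if datum == "subject-dropped" else True

def learn_from(datum):
    # Pick one grammar in proportion to its current probability.
    chosen = random.choices(list(grammars), weights=list(grammars.values()))[0]
    success = analyzes(chosen, datum)
    n_others = len(grammars) - 1
    for g in grammars:
        if success:   # reward the chosen grammar, shrink the rest
            grammars[g] = grammars[g] + GAMMA * (1 - grammars[g]) if g == chosen \
                          else (1 - GAMMA) * grammars[g]
        else:         # punish the chosen grammar, redistribute to the rest
            grammars[g] = (1 - GAMMA) * grammars[g] if g == chosen \
                          else GAMMA / n_others + (1 - GAMMA) * grammars[g]

# Spanish-like input: assume about half of all sentences drop the subject.
for _ in range(2000):
    learn_from("subject-dropped" if random.random() < 0.5 else "ambiguous")
print(grammars)  # +subject-drop typically ends up near 1.0, the others near 0.0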
Variational Learning Details
Important idea: From the perspective of the subject-drop parameter, certain data will only be compatible with +subject-drop grammars. These data will always reward grammars with +subject-drop and always punish grammars with -subject-drop. The +subject-drop grammar would have its probability increased if it tried to analyze such a data point (e.g., 1/3 --> 1/2), while the -subject-drop grammars would have their probabilities decreased if either of them tried to analyze it (e.g., 1/3 --> 1/4 each). These are called unambiguous data for the +subject-drop parameter value, because they unambiguously indicate which parameter value is correct (here: +subject-drop) for the native language.

The Power of Unambiguous Data
Unambiguous data from the native language can only be analyzed by grammars that use the native language's parameter value. This makes unambiguous data very influential data for the child to encounter, since they are incompatible with the parameter value that is incorrect for the native language. Ex: the -subject-drop parameter value is not compatible with sentences that drop the subject, so those sentences are unambiguous data for the +subject-drop parameter value. Important to remember: To use the information in these data, the child must know the subject-drop parameter exists.

Yang (2004): Learning Complex Systems
Learning Parametric Systems: Variational Learning
Grammars compete against each other to see which can best analyze the available data. Added perk: learning is then gradual (probabilistic). Problem: Do unambiguous data exist for entire grammars? This would require data that are incompatible with every other possible grammar (every other combination of parameter values). However, this algorithm can take advantage of the fact that grammars are really sets of parameter values, and parameter values can be probabilistically accessed. Each binary parameter has a probability for each of its two values (e.g., 0.2 vs. 0.8, 0.3 vs. 0.7, 0.8 vs. 0.2, 0.7 vs. 0.3, 0.1 vs. 0.9 for five parameters), and the probability of accessing a particular grammar is the product of the probabilities of the parameter values it contains (e.g., .2 * .7 * .2 * .7 * .9, or .2 * .3 * .2 * .3 * .1, or .8 * .7 * .2 * .7 * .1).
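A minimal sketch of what probabilistically accessing parameter values means, in Python, using the example probabilities above (treating one row of values as "+" is an arbitrary labeling here):

```python
import random

# Probability of one value ("+") for each of the 5 binary parameters (from the example above);
# the other value ("-") of each parameter has the complementary probability.
p_plus = [0.2, 0.3, 0.8, 0.7, 0.1]

def access_grammar():
    """Sample a grammar by independently accessing a value for each parameter."""
    return [random.random() < p for p in p_plus]

def grammar_probability(values):
    """The probability of a particular grammar = product of its parameter values' probabilities."""
    prob = 1.0
    for p, plus in zip(p_plus, values):
        prob *= p if plus else 1 - p
    return prob

g = access_grammar()
print(g, grammar_probability(g))   # e.g. a grammar sampled with probability .2 * .7 * .2 * .7 * .9
```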
The Learning Algorithm
For each data point d encountered in the input:
- Choose a grammar probabilistically from the available grammars by probabilistically accessing the parameter values.
- If this grammar can analyze the data point (successful analysis), increase the probability of all participating parameter values slightly (reward).
- Else (unsuccessful analysis), decrease the probability of all participating parameter values slightly (punish).
Problem ameliorated: unambiguous data are much more likely to exist for individual parameter values than for entire grammars.

Yang (2004): Variational Learning: Sample Case
Null subjects: 2 binary parameters, 4 grammars

Parameter 1: Pro-drop -- drop the subject, relying on unambiguous subject-verb agreement.
Ex: Spanish, Italian (+pro-drop) vs. English (-pro-drop)
Spanish: √ "Yo puedo cantar." (I can-1st-sg sing-inf) 'I can sing'; √ "Puedo cantar." (can-1st-sg sing-inf) 'I can sing'; √ "Hay lluvia." (is-3rd-sg rain) 'There is rain'
English: √ "I can sing."; * "Can sing."; * "Is rain."; √ "There is rain."

Parameter 2: Topic-drop -- drop the subject/object if it is the discourse topic.
Ex: Chinese (+topic-drop) vs. English (-topic-drop)
English, with the discourse topic = Jareth: √ "I can sing."; * "Can sing."; * "Is rain."; √ "There is rain." (In a +topic-drop language, the dropped versions would be fine.)
Chinese: "Mingtian guiji hui xiayu." (tomorrow estimate will rain) 'It is tomorrow that [Jareth] believes it will rain' -- compare the impossible English * "It is tomorrow that believes will rain."

The four grammars:
+pro-drop, +topic-drop: Warlpiri, American Sign Language
+pro-drop, -topic-drop: Italian, Spanish
-pro-drop, +topic-drop: Chinese
-pro-drop, -topic-drop: English

What happens for an English-learning child? Pro-drop languages usually depend on rich subject-verb agreement morphology. English doesn't have that, which is something a child will easily notice, knocking out the +pro-drop grammars. But this still leaves the +topic-drop option. What data will rule that out? Answer: expletive subjects, which can't be topic-dropped ("There's a goblin in the castle.", "It's raining outside."). But these occur in only about 1.2% of the data (fairly rare).

Prediction if kids take a while to notice English is -topic-drop: English kids use a +topic-drop (Chinese-style) grammar until they encounter enough expletives to notice that English does not optionally drop topics. Property of a Chinese-style grammar: it can drop both subjects and objects. Prediction: While English children are using the +topic-drop grammar, they will drop subjects and objects at the same relative rate that +topic-drop (Chinese-learning) children do. Same rate: English children using a Chinese-style grammar?
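To see how slowly a parameter supported by only ~1.2% unambiguous data gets set, here is a toy simulation of the variational learner for the topic-drop parameter, in Python. The update rule, the convergence threshold, and the assumption that the pro-drop parameter has already been ruled out are all simplifications for illustration, not Yang's actual model:

```python
import random

GAMMA = 0.01            # small learning rate
EXPLETIVE_RATE = 0.012  # ~1.2% of English input has an expletive subject (per the slides)

def data_points_until_topic_drop_rejected(seed=0, threshold=0.05):
    """Track P(+topic-drop) for an English learner; return how many data points it takes
    for that probability to fall below `threshold` (i.e., for -topic-drop to win)."""
    random.seed(seed)
    p_plus = 0.5                                   # both values start out equally likely
    for n in range(1, 500_000):
        used_plus = random.random() < p_plus       # probabilistically access a value
        expletive = random.random() < EXPLETIVE_RATE   # e.g. "It's raining outside."
        # A +topic-drop analysis fails on expletive subjects (they can't be dropped topics);
        # every other (toy) English sentence is treated as compatible with both values.
        success = not (used_plus and expletive)
        if success:
            if used_plus:                          # reward the +topic-drop value that was used
                p_plus += GAMMA * (1 - p_plus)
            else:                                  # reward the -topic-drop value that was used
                p_plus *= 1 - GAMMA
        else:                                      # punish the +topic-drop value that was used
            p_plus *= 1 - GAMMA
        if p_plus < threshold:
            return n
    return None

print(data_points_until_topic_drop_rejected())
# Typically on the order of 10,000+ toy data points, because only ~1.2% of the input
# ever punishes +topic-drop.
```

The exact number depends on the arbitrary learning rate and threshold; the point is the direction of the effect: rare unambiguous evidence means slow parameter setting, which is what licenses the prediction about English children's early subject and object drop.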
Yang (2004): Learning Complex Systems
Variational Learning: General Predictions
The time course of when a parameter is set depends on how frequent the necessary (unambiguous) evidence is in child-directed speech:
- Parameters set early: more unambiguous data
- Parameters set late: less unambiguous data
- Parameters set at the same time: equal quantities of unambiguous data

Additional evidence for the importance of (un)ambiguity
Hadley, Rispoli, Fitzgerald, & Bahnsen (2010): input informativity (how much ambiguity is in the input) is the most consistent predictor of morphosyntactic growth.
Pelham (2011): input ambiguity affects how children acquire pronoun forms ("It appears children may be sensitive to levels of ambiguity such that low ambiguity may aid error-free acquisition, while high ambiguity may blind children to case distinctions, resulting in errors.")

Another case study for variational learning
Explain why children's early output consistently contains "optional infinitives" (OIs) that are ungrammatical in the adult language. They produce these incorrect forms at the same time that they produce correct "finite" forms. Note: this is not just a matter of shortening the word form -- sometimes the incorrect form is actually longer (French, Dutch), and sometimes the word order changes (Dutch). This seems likely to be the result of some process happening in the child's mind, rather than a simple production error.
English -- Input: "Mummy goes to work." Occasional output: "Mummy go to work."
French -- Input: "La poupée dort." (the doll sleep-3rd-sg) Occasional output: "La poupée dormir." (the doll sleep-inf)
Dutch -- Input: "Ik eet ijs." (I eat-1st-sg ice cream) Occasional output: "Ik ijs eten." (I ice cream eat-inf)

One explanation: Variational Learning Model (Legate & Yang 2007)
Grammar options: +Tense (English) vs. -Tense (Mandarin Chinese). OI errors result because the -Tense grammar is initially available and probable; this lessens over time as unambiguous +Tense data are observed. +Tense unambiguous data: morphological tense marking, e.g., "he goes home". Prediction: Morphologically rich languages like Spanish have a very short OI stage because a large proportion of the input rewards +Tense (and punishes -Tense). Morphologically poor languages like English have a longer OI stage because only a small proportion of the input rewards the +Tense grammar (and punishes -Tense).

Another explanation: MOSAIC model (Freudenthal et al. 2010)
Model of Syntax Acquisition in Children: "MOSAIC is a constructivist model of language learning, with no built-in knowledge of syntactic categories or rules, which is implemented as a working computational model." (Algorithmic level?) "MOSAIC takes as input corpora of child-directed speech and learns to produce as output 'child-like' utterances that become progressively longer as learning proceeds…input corpora are fed through the model multiple times." Input examples: "He will", "He wants", "Go home", "Go away".

One explanation: Variational Learning Model (Legate & Yang 2007)
Languages tested: English, French, Spanish. The observed behavior seems to match the distributions of unambiguous input: OI duration is English (high) > French (moderately high) >> Spanish (very low), while the proportion of +Tense unambiguous data runs the other way: English (low) < French << Spanish (high).
Possible critique (from Freudenthal et al. 2010): This comparison is too easy because the rates of OI are very different. What about Dutch and German, which have OI rates that are moderately high?
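Before returning to MOSAIC, here is a toy illustration, in Python, of the VLM's core quantitative claim: the more of the input that unambiguously rewards +Tense, the sooner -Tense loses and the shorter the predicted OI stage. The input percentages, learning rate, and threshold below are made-up placeholders, not Legate & Yang's corpus figures:

```python
import random

GAMMA = 0.01  # learning rate

def data_points_until_tense_set(plus_tense_rate, threshold=0.9, seed=0):
    """How many input sentences until P(+Tense) exceeds `threshold`?
    `plus_tense_rate` = proportion of input with overt tense/agreement marking
    (unambiguous for +Tense); the rest is treated as ambiguous."""
    random.seed(seed)
    p_tense = 0.5
    n = 0
    while p_tense < threshold:
        n += 1
        chose_plus = random.random() < p_tense
        unambiguous = random.random() < plus_tense_rate
        if unambiguous and not chose_plus:
            p_tense += GAMMA * (1 - p_tense)   # -Tense fails on tensed input: punish it
        elif chose_plus:
            p_tense += GAMMA * (1 - p_tense)   # +Tense succeeds: reward it
        else:
            p_tense *= 1 - GAMMA               # -Tense succeeds on ambiguous input: reward it
    return n

def average(rate, runs=20):
    return sum(data_points_until_tense_set(rate, seed=s) for s in range(runs)) / runs

# Made-up illustrative rates of unambiguous +Tense input.
for label, rate in [("Spanish-like (rich morphology)", 0.7),
                    ("French-like", 0.4),
                    ("English-like (poor morphology)", 0.1)]:
    print(label, average(rate))   # richer morphology -> +Tense wins sooner -> shorter OI stage
```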
Another explanation: MOSAIC model (Freudenthal et al. 2010)
- MOSAIC has a strong utterance-final bias in learning: "MOSAIC does not encode a word or phrase unless everything that follows that phrase has already been encoded in the network."
- It has a weak utterance-initial bias in learning: "The utterance-initial bias enables MOSAIC to associate utterance-initial words and short (frequent) phrases with (longer) utterance-final phrases."
- It represents declaratives and questions separately (so there is no underlying linkage between these forms): "Who could you see?" has no relation to "You could see him."

Where OI errors come from: Compound finites
English: "He can go home." The utterance-final bias yields "Go home"; the utterance-final bias plus the weak utterance-initial bias plus linking yields "He go home".
Dutch (+ changed word order): "Hij wil ijs eten." (he wants ice cream eat-inf) "He wants to eat ice cream." The utterance-final bias yields "Ijs eten"; the utterance-final bias plus the weak utterance-initial bias plus linking yields "Hij ijs eten".

Freudenthal et al. (2010): Concluding Thoughts
"…it is clear that both the VLM and MOSAIC do a relatively good job of predicting the cross-linguistic data…if we focus on the results of the second set of analyses, it is clear that there are important lexical effects on the distribution of OI errors in children's speech that are difficult for the VLM to explain…"
"…A more lexically oriented input-driven account could probably deal with this problem relatively easily by simply distinguishing between what the child is learning about copulas and auxiliaries and what the child is learning about lexical verbs, and predicting high levels of OI errors on lexical verbs and lower levels of OI errors on copulas and auxiliaries. Interestingly, this is exactly the pattern of results reported in two recent lexically oriented analyses of early child English (Wilson, 2003; Pine, Conti-Ramsden, Joseph, Lieven & Serratrice, 2008)."