Sunday, July 31, 2005

How Children Learn Language - Grammar 1

So I will be taking a seminar next semester which is intended to be a debate about how children learn language among different schools of thought. But I haven't studied language acquisition for a couple years now, so I am trying to read some items to get ready. Here are my notes on the first one. If you are interested in exactly how kids figure out the language they speak, then read on. If not, ummm, then that's not good. This is going to be much longer than notes to myself would be, but I am also using this as a sort of Ling 102 practice session.

The things that children learn to do are astounding. Anyone who has seen a baby, especially one a few weeks old, and then sees them again months or years later, can be captivated by the enormous changes that have occurred. Many of these changes are physical, but just as many are mental. One of the most amazing feats that children accomplish is that within a few short years they learn to speak. They learn to speak any language that they are born into. Consider how hard it was when you studied a language in class, high school or college, then think about how a child can pretty much speak like an adult at the age of 6. They might not be able to tie their shoes or stay at home after school safely, but they can string all these words together, using their tongues and throat, or hands and arms for some, and talk to us.

This fact has captivated linguists for years. A small child can speak any language on earth, but the most advanced computers can't really speak any yet. What exactly makes the child so special? Additionally, some linguists have spent years and years teaching language to animals of other species, the most famous and successful being Savage-Rumbaugh's work with bonobos and chimps. First, those animals do more than we ever thought possible - hundreds of words, putting some of the words together in novel combinations - but they are still surpassed by small children. (The bonobos are being taught sign language, as it fits their physiology better, but healthy children whose native language is signed are far more eloquent.) Noam Chomsky, far and away the most famous linguist of the 20th century (and also into politics if you read that sort of stuff) hypothesized that these special abilities of children are, in fact, special. He argued for a special language organ that is genetically specified and unique to humans. Our DNA literally instructs us on how language works, and that is why small children can outdo supercomputers and every other animal on earth. For a very popular account of language working in this Chomskian tradition, see Steven Pinker's The Language Instinct.

Actually, the reasons for Chomsky's assertion of this Language Organ or Language Device were quite different from the reasons I gave. He argued that from the data presented to children, it was impossible for them to learn language, and yet all healthy children across the world do. This argument has been called the "Poverty of the Stimulus" or the "Logical Problem of Language Acquisition." Since it is impossible to learn language from what children hear, then they must in a sense already know it. Philosophers among you should now be thinking of Plato's theories of knowledge. So, argues Chomsky, a child learning English is less like an adult learning basket-weaving, and more like an organ that grows in the brain, according to principles of biological development.

This blog's topic is actually my notes from an article by Brian MacWhinney, a psychologist at Carnegie Mellon, called "A multiple process solution to the logical problem of language acquisition" (MacWhinney 2004, Journal of Child Language, 31, 883-914). The purpose of the article is to show that there really is no "logical problem" that requires a language organ to get around. In other words, it is to show that the Chomsky stuff above is all wrong. Looking at the title of the article again, MacWhinney will demonstrate that the logical problem of language isn't all that big, and what is left of it can be handled with a host of cognitive processes that are not language-specific at all, but general to the way humans think. Before we jump into it, I should say that we are not going to talk about all the things children learn about language. After all, they memorize words, how the words sound, what the words mean, when to use the words, etc. This article is almost entirely concerned with how the words go together; i.e., the syntax or grammar of language. So, this deals with the question of how English-speaking kids learn that "John likes dogs" is OK and "John dogs likes" is not OK. (But note that this word order is perfectly fine in many other languages, such as Japanese and German, assuming the words are Japanese or German words of course.)

OK, so MacWhinney's first task is to define what the supposed logical problem is, in order to get rid of it. Chomsky and his tradition (the dominant school of linguistics) have several arguments for the logical problem.

1) Most of what children actually hear is not grammatical. There are mistakes, retracings, and missing words. They cannot learn correct grammar from incorrect input. MacWhinney briefly cites articles demonstrating that care-giver language is as grammatically correct as formal corpora such as the Wall Street Journal, so this goes away quickly.
2) Beyond the bad input children receive, they also rarely get corrective advice - negative evidence. No one says 'no' when the child makes a mistake. Even when someone does provide negative evidence, the children appear to ignore it. (The lit is full of amusing dialogs between parents and children where a parent attempts to correct a child's speech, and the child blithely ignores them.) Gold (1967) provides a proof that negative evidence is absolutely necessary for a child to learn grammar. He takes as a model that the child will be attempting to put words together to see what works. When the child does this, she is trying different possible grammars of English to see which is the correct one. Gold demonstrated that if you have at least one non-finite grammar as a possibility, then you can never rule it out with positive evidence alone. The non-finite grammar will generate all of the sentences which the child knows are valid, because she hears them, but that grammar will also allow sentences which are, in fact, ungrammatical in the language. If no one ever says 'no' then simply hearing more sentences will never get the child to the correct language. The key here is that hearing the language only - positive evidence - is insufficient. Negative evidence is required, but, say Chomsky et al., children don't get negative evidence.
3) Children are able to say some sentences that are good, valid sentences in their language, but they never hear anyone say similar sentences. In fact, Chomsky states, they use such sentences error-free. The argument, if all the premises are true, is strong. It says a) there is such and such type of grammatical structure, b) children use this grammatical structure without error, c) children never hear such structures, d) therefore there is some innate mechanism guiding them. MacWhinney runs through Chomsky's example of "structural dependence." This has to do with which auxiliary words can be moved to make a sentence a question. To take the most basic case, take 'is' from 'Paca is crazy' and move it to the front, making 'Is Paca crazy?' The child might think the rule is "take the first auxiliary and move it to the front." But English grammar is not so simple. Take the case where there is a relative clause inside the sentence:

(1) The man who is running is angry.

You can only move the second 'is' to the front to make a grammatical sentence.

(2) Is the man who is running angry?

You cannot say

(3) Is the man who running is angry?

Notice that in (3) we moved the first 'is' but it makes no sense. The ability to move words does not depend on what comes first but on the grammatical structure the words are parts of.
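Here is a toy sketch of the two candidate rules (my own illustration, not anything from MacWhinney's paper). The function names are made up, and I am assuming the sentence has already been split by hand into subject, auxiliary, and predicate for the structure-dependent rule.

```python
def front_first_aux(words, aux="is"):
    """Linear rule: move the first occurrence of the auxiliary to the front."""
    i = words.index(aux)
    return [aux] + words[:i] + words[i + 1:]


def front_main_aux(subject, aux, predicate):
    """Structure-dependent rule: move the main-clause auxiliary and leave
    everything inside the subject noun phrase untouched."""
    return [aux] + subject + predicate


simple = "Paca is crazy".split()
relative = "the man who is running is angry".split()

# Hand-made split of sentence (1): subject NP, main auxiliary, predicate.
subject = "the man who is running".split()
predicate = ["angry"]

print(" ".join(front_first_aux(simple)))                  # is Paca crazy
print(" ".join(front_first_aux(relative)))                # is the man who running is angry
print(" ".join(front_main_aux(subject, "is", predicate)))  # is the man who is running angry
```

The linear rule works on the simple sentence but produces (3) on the relative clause case, while the rule that looks at structure produces (2).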

Chomsky claims that 1) children use this structure error-free but 2) never hear such sentences.

MacWhinney, who is the maintainer of the largest child language corpus (which did not exist when Chomsky first came up with the argument), decided to do a search on child language data to see if children really never hear this kind of sentence. His search agreed with Chomsky that it was exceedingly rare, but he found examples of parallel types of sentences which children hear all the time. Questions with wh-words are quite common and have the same rule on auxiliaries. See:

(4) Where is the man who is running?

If the child can see that this is the same pattern, then the data about how to move auxiliaries is in fact abundant. In other words, Chomsky's claim that children have no evidence for how to do this just isn't true. MacWhinney then moves through similar arguments for other grammatical structures and shows that in truth there is positive evidence. He also discusses how pronouns relate to nouns (binding conditions for the UG folk) and shows that children do not produce these structures error-free either.

So at this point, MacWhinney is asserting that children don't learn structures error-free and that they do have good positive evidence for their language's grammar. However, note that this leaves Gold's objections largely intact, i.e., that children must have negative evidence ('no') to rule out certain grammars. MacWhinney's next task then is to demonstrate that a child can in fact learn language from positive evidence only. Now we get to the 'multiple process' part of the solution to the logical problem.
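To make Gold's worry concrete, here is a deliberately tiny sketch of my own. It is not Gold's proof, which concerns infinite languages; I am just using two hypothetical finite sets of sentences, one of which contains the other, to show why hearing more correct sentences never eliminates the too-generous grammar.

```python
# Hypothetical toy "languages": the overgeneral one contains everything in
# the target plus a sentence that is actually ungrammatical in English.
target = {"John likes dogs", "Mary likes cats"}
overgeneral = target | {"John dogs likes"}

hypotheses = {"target": target, "overgeneral": overgeneral}


def surviving(hypotheses, heard, rejected=()):
    """Return the hypotheses consistent with the sentences heard (positive
    evidence) and with any sentences explicitly rejected (negative evidence)."""
    keep = []
    for name, language in hypotheses.items():
        fits_positive = all(s in language for s in heard)
        fits_negative = all(s not in language for s in rejected)
        if fits_positive and fits_negative:
            keep.append(name)
    return keep


heard = ["John likes dogs", "Mary likes cats"]

print(surviving(hypotheses, heard))                                # ['target', 'overgeneral']
print(surviving(hypotheses, heard, rejected=["John dogs likes"]))  # ['target']
```

Every sentence the child hears fits both hypotheses, so only a 'no' on "John dogs likes" separates them in this setup.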

An undercurrent throughout this section of the paper, though MacWhinney never really makes it a bullet point, is that the Gold/Chomsky framing of what children do to learn grammar is wrong. The Chomsky 'generative' tradition always has the child trying out enormous sets of possible grammars. The child then requires enormous evidence to rein herself in - to show that large numbers of the grammars she is trying are incorrect. MacWhinney argues that children never really do this at all, due to a number of cognitive buffers. Here are his 7 solutions:

1) Gold argued that negative evidence was required to back off from a non-finite grammar. There has been work in the last 30 years, though, which could indicate that human language is in fact finite, not non-finite. He cites Hausser's work with left-associative grammars, and Kanazawa's work with categorial grammars called k-valued grammars. I have no idea what these grammars are, but the point is clear. If these finite grammars are adequate to describe human language, then Gold's worries about non-finite grammars are irrelevant - for children at least.

2) The Chomskian tradition has always believed grammar is the manipulation of discrete symbols. However, there is evidence that grammar is actually probabilistic - for example, a rule saying that it is 95% likely that a noun phrase will be followed by a verb phrase is probabilistic. Again, this makes a huge mathematical difference, as it has been shown that probabilistic grammars are learnable with positive evidence alone, again getting us around Gold. Amusingly, MacWhinney states that 'it is surprising that this solution has not received more attention,' and yet this is the shortest section of his paper as well. Fortunately for us, others have not ignored this. A book called Probabilistic Linguistics, edited by Bod, Hay, and Jannedy, sits upon my shelf. It makes the same point that a probabilistic grammar is learnable, getting around Chomsky.

Points 1 and 2 make the same point. If you conceive of what grammar is differently than Chomsky does, the logical problem of acquiring said grammar vanishes. Of course, MacWhinney neither establishes that these grammars are sufficient to describe human language nor shows how k-valued and probabilistic grammars go together. But then, he only has 20 pages or so here.
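For point 2, here is a minimal sketch of what "probabilistic" could mean in practice. This is my own toy, not a method from the paper: the category labels and the four-sentence corpus are invented, and real probabilistic grammars are far richer than counting which category follows which.

```python
from collections import Counter, defaultdict

# Hypothetical input: each heard sentence reduced to a sequence of phrase labels.
corpus = [
    ["NP", "VP"],
    ["NP", "VP"],
    ["NP", "VP", "PP"],
    ["NP", "PP", "VP"],
]


def transition_probs(corpus):
    """Estimate P(next category | current category) by relative frequency,
    using nothing but the sentences actually heard (positive evidence)."""
    counts = defaultdict(Counter)
    for seq in corpus:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


probs = transition_probs(corpus)
print(probs["NP"])  # {'VP': 0.75, 'PP': 0.25}
```

The learner here never needs a 'no': patterns that are rare or absent in the input simply end up with low or zero estimated probability.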

3) The next solution has to do with that idea of children generating large numbers of possible grammars, each of which, other than the correct one, must be ruled out. He argues for a principle of conservatism by which children only hypothesize the least-powerful grammar that their evidence allows. Such a conservative principle will not rule out every incorrect grammar by itself, but it certainly restricts the problem to an enormous degree. MacWhinney particularly mentions his item-based theory of acquisition. By this theory, when a child hears a sentence, such as "Bill gives a gift to John," they learn "giver+give+gift+to+recipient." Nothing more general. Notice that the very word 'give' is listed there, not any group of words such as 'verb', which would create tons of errors - *"Bill salutes a gift to John." So the item-based concept of learning is very conservative, and hence restricts the grammars significantly. Such an item-based grammar is extended to a large degree through analogy. More on that next.
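A rough sketch of the difference between item-based and category-based learning, as I understand it. The frame notation and the two learner functions are my own invention for illustration; MacWhinney's actual theory is not specified at this level in my notes.

```python
from collections import defaultdict

# Hypothetical input: (verb, frame) pairs the child has heard.
# "_" marks the verb slot; X, Y, Z are the participants.
heard = [
    ("give", "X _ Y to Z"),   # Bill gives a gift to John
    ("salute", "X _ Y"),      # Bill salutes the flag
]


def learn_item_based(heard):
    """Conservative learner: a frame is licensed only for the verb it was heard with."""
    frames = defaultdict(set)
    for verb, frame in heard:
        frames[verb].add(frame)
    return frames


def learn_category_based(heard):
    """Liberal learner: every frame heard is licensed for every known verb."""
    all_frames = {frame for _, frame in heard}
    return {verb: set(all_frames) for verb, _ in heard}


item = learn_item_based(heard)
category = learn_category_based(heard)

print("X _ Y to Z" in item["salute"])      # False: *"Bill salutes a gift to John" is never tried
print("X _ Y to Z" in category["salute"])  # True: the overgeneral learner allows it
```

The item-based learner has far fewer wrong hypotheses to retreat from, which is the whole point of conservatism.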

4) MacWhinney argues that 'all errors can be viewed as cases of overapplication of productive patterns.' So like the give/salute thing above. The child learned a correct sentence of English, but then over-generalized the pattern. A verb like 'salute' cannot go into such a pattern. If these are the sort of errors left to solve, then MacWhinney has mechanisms to solve them. His main one is the idea of Competition.

(A side note from me before proceeding. Isn't not extending a productive pattern equally a problem? So instead of producing too much and getting it wrong, the child never makes the connection and never extends his grammar. This would have to be tested. My thoughts on how to test for underproductivity: a) the child must possess the syntactic frame for another word (say, a double object pattern with 'give'); b) the child must know another word, and know its meaning, which could go in his syntactic frame; c) the child must be paying attention; and yet d) the child does not figure out that the other word can go in that sentence. Main point, however, is that underproduction might be an error as well, and figuring out how children escape that one could be as enlightening as learning how they escape overgeneralization.)

MacWhinney sees two competing forces in language acquisition. On the one side is the force of analogy, where a child thinks that this word or this syntactic structure is kinda like this one, maybe I can try it too? On the other side is simple rote memorization. Often these two forces compete. MacWhinney discusses the word *'goed.' Analogy with regular verbs of English will create pressure to try sticking an -ed on the end of 'go' to make it past tense. On the other hand, the child will be hearing 'went.' In time, the strength of 'went' will be greater than that of 'goed' and 'went' will win out. MacWhinney never really discusses it as a central topic, but he very likely has neural network modeling in mind in this discussion, especially networks that use competitive learning algorithms, such as the winner-take-all method, which have exhibited many of the features of human language learning. The main problem with this section is that it is only 3 pages or so, and we are left wondering how well competition really works as a model. It's a very solid idea, but.... evidence?
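To get a feel for the goed/went competition, here is a bare-bones sketch. It is only my cartoon of the idea, not MacWhinney's model and not a neural network: the fixed analogy strength, the gain per exposure, and the winner-take-all rule are all numbers and assumptions I made up.

```python
def compete(n_exposures, analogy_strength=3.0, gain_per_exposure=0.5):
    """Track which past-tense form of 'go' wins after each time the child hears 'went'.

    'goed' is supported by a constant pull from analogy with regular verbs;
    'went' gains a little rote-memory strength with every exposure.
    The stronger form wins outright (winner-take-all).
    """
    went_strength = 0.0
    winners = []
    for _ in range(n_exposures):
        went_strength += gain_per_exposure
        winners.append("went" if went_strength > analogy_strength else "goed")
    return winners


history = compete(10)
print(history)
# ['goed', 'goed', 'goed', 'goed', 'goed', 'goed', 'went', 'went', 'went', 'went']
```

With these made-up settings the child says 'goed' for a while and then switches to 'went' for good, without anyone ever saying 'no'.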

5) A fifth solution is that of cue construction. Honestly, this part is rather confusing. I think MacWhinney is saying that sometimes Competition is not enough, and the child will have to hypothesize additional cues which help them resolve problems. These cues appear to be nuanced semantic distinctions in the examples.

6) A sixth method that children could use to fix problems of over-generalization is self-monitoring. This monitoring actually helps to strengthen one option over another, as they block mistakes.

One very intriguing idea comes up in this section. Quoting (2004:907-8) "Berwick (1987) found that syntactic learning could arise from the attempt to extract meaning during comprehension." Now, this is a very interesting idea. If you go back to Gold, there seems to be little relationship between what a child is trying to understand and how they expand their grammar. The child simply knows some bits of their language and, constrained by universal grammar, just tries a whole bunch of stuff, but as long as it fits UG, there is no method to the experimentation. Berwick is bringing up the idea that grammars expand for very particular reasons - as a means to comprehend speech coming at them right then. This introduces the possibility of grammar expanding in a very particular way as a child explores ways to understand what he just heard. Moreover, most models of speech comprehension seem to have syntax as a sort of gateway to meaning, where only if the syntax is already known, do children understand the meaning of the words. But this is almost certainly not correct. Instead the child hears a grammatical structure they do not know, and in an attempt to ferret out a possible meaning, they rearrange their grammar until it fits. How do they make this re-arrangement?

7) Last one! Indirect negative evidence is also available. A child may never be told that 'goed' is wrong - direct negative evidence - but if she compares how often she would expect to hear this form with how often she actually hears it (hardly ever, if at all), she may surmise that 'goed' must be ungrammatical, since otherwise she would almost certainly have heard it by now.
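The arithmetic behind indirect negative evidence can be sketched in a couple of lines. This is my own back-of-the-envelope version, with invented numbers: I assume each past-tense use of 'go' is an independent chance to hear 'goed', and that if 'goed' were grammatical it would show up with some guessed probability each time.

```python
def prob_of_silence(p_form, n_opportunities):
    """Probability of never hearing a form in n independent opportunities,
    if it really occurred with probability p_form on each one."""
    return (1.0 - p_form) ** n_opportunities


def looks_ungrammatical(p_form, n_opportunities, threshold=0.01):
    """Treat the form as ungrammatical when its total absence would be
    too unlikely to be an accident (threshold is an arbitrary choice)."""
    return prob_of_silence(p_form, n_opportunities) < threshold


# Hypothetical numbers: the child guesses 'goed' should appear half the time
# that a past tense of 'go' is needed, and has heard 20 such uses, all 'went'.
print(prob_of_silence(0.5, 20) < 1e-5)   # True: roughly a one-in-a-million accident
print(looks_ungrammatical(0.5, 20))      # True
print(looks_ungrammatical(0.5, 3))       # False: three examples prove little
```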

Voila. Long enough for ya?


