Thursday, August 11, 2005

Language Without Grammar - Grammar 2

First, if you aren't into language, there are a couple of shorter posts below this about my life that you can skip to.

OK, here is my notes on William O'Grady's Emergentist Syntax, specifically as it relates to learning languages. (The "here is" above is a little joke: these papers mention the growing grammaticality of "here is" where it should be "here are.") The notes are largely based upon his 2003 talk "Language Without Grammar." They are supplemented by the 2002 manuscript "An Emergentist Approach to Syntax" as well as the language acquisition and concluding chapters of his 2005 book Syntactic Carpentry. The first two can be downloaded from his web site at the University of Hawai'i. The last can be purchased or checked out from your local academic library, though it is brand-new and may not be there yet. Dr. O'Grady is actually a professor of mine. I took all of last year's syntax courses from him, and he is one of my references when I apply for jobs. He is also a great teacher, and I am planning on modeling my teaching this semester on him to a large degree. That said, I will not always agree with him in my notes below. However, these are just informal notes to get me ready for the semester, so there will be more questions than criticism.

OK, so some quick review. In the last essay about language acquisition:

here

we introduced you to the amazing abilities children have in learning languages, far surpassing the greatest artificial intelligence, bonobos, etc. We also introduced the classic argument from Chomsky et al. known as "the logical problem of language acquisition" or the "poverty of the stimulus" argument. The argument is basically:

1) Children are able to use certain pieces of grammar without error.
2) They never ever hear these types of grammar used.
3) Since they never hear it, they cannot learn it from experience.
4) The ability to employ this grammar must be innate or inborn.

The last article was by Brian MacWhinney and was called "A multiple process solution to the problem of language acquisition." MacWhinney's solution had two main parts: 1) it turns out that these grammatical structures are in fact heard by children, and 2) there are several ways that children might learn to use them. Much of what we discuss with O'Grady will fit into this context.

A second background distinction should also be discussed before jumping in. It is the distinction between language performance and language competence. This distinction is at least as old as Saussure - 1920s? or was it turn of the century? - but yet again it was made a principal pillar of linguistic thought by, you guessed it, Chomsky. One actually gets tired of always having to position oneself in some sort of agreement or disagreement with Chomsky's views, but that is largely the state of the field. Language competence, according to Chomsky, is what people know, not necessarily consciously, about their language. It is what allows you English speakers reading this to know that "Mary hit the ball" is a good sentence, while "ball the Mary hit" is not. It is what lets you know that in "Sue says Mary loves herself," "herself" refers to Mary and not Sue. (By reference, I mean simply that the person Mary loves is Mary, and not Sue.) And tons of things like that. Performance is the part of language that just lets you get the words in and out of your head. So I might know the sentence should be "Mary hit the ball" and due to performance errors say "Mary hit ball" or "Mary hit the - uh - chair, I mean, ball."

Performance versus competence is a simplifying assumption. If you can remove all issues of performance, then you can explain what it means to know how to speak English natively. You can explain what knowledge it is that we all share which a French speaker doesn't have. Simply put, you can explain language competence. Distinguishing the two is a good and useful move. However, I have had the idea for a few months now that when you remove performance you remove too much. What if performance explains most of the things you want to know? It turns out Dr. O'Grady has apparently had similar ideas (he is a good prof in that his intro classes are not indoctrination classes into his theory; this stuff never came up in the whole year of classes with him; the most he ever let on during the intro Chomsky class was that he didn't think Chomsky was right), though of course his are far more developed.

Now it is time for O'Grady.

O'Grady asks the question: "Do we really need a grammar anyway?" Almost all theories of language assume we do. We all learn English grammar in school and teach it to people learning English. Linguistics libraries are full of books named for one grammar after another - Universal Grammar, Head-Driven Phrase Structure Grammar, k-valued grammars, functionalist grammars, embodied construction grammars, etc.

When O'Grady is speaking of grammar, he has something specific in mind. All of the above theories of language argue that there is some set of principles, blueprints, guidelines, rules, etc., that tell English speakers how to speak correctly. It lets them get the words out in the right order and makes the words agree with each other and the like. This grammar is something apart from the performance part of language - the little bits of neural machinery which get the words out of your mouth. It is an abstract set of principles that make you speak right. Chomsky himself is quite clear and adamant that his Grammar is something more than the artifacts of performance (see O'Grady 2005 for the Chomsky citation). If you are a Chomskian, you think these rules are literally specified in the genes - part of the genetic endowment of humans, like walking upright and having opposable thumbs.

But what if these processes which allow you to speak are all there is? What if all the traditional concerns of linguists regarding grammar can be explained by looking at the processor needed to construct sentences? In O'Grady's metaphor, what if there are no blueprints and no architects, just carpenters putting the sentence together as they go, and it is how they happen to work that gives the appearance of rules? The carpenters don't have a plan, they just do it, and the grammar falls out or emerges - hence, an "emergentist theory of syntax."

(Those of you scientifically inclined might notice that this sense of emergence is different from the strict one of properties emerging from complex networks, meaning complex in the technical sense. In the sense here, grammar emerges from a type of non-linguistic behavior, the need to efficiently manage working memory, but there is no argument that there is a self-organizing system a la complexity theory. Grammar, here, emerges from simple processing.)

O'Grady's concept is quite elegant, especially if you have any familiarity with traditional grammars. He posits a simple neural processor, which puts words together from left to right. The words it puts together have various requirements that are language-specific. In English, a verb must look to its left for its subject. Then it looks to the right for its direct object, if it has one. The processor reads these requirements for the verb, and when it follows the lexical instructions the word order of English shows up - subject, verb, object. A language with a different word order, say, Japanese, would use the same processor, but the words in the language would be set to look left for both subject and object, yielding the subject, object, verb word order. Next, we specify the Efficiency Requirement (O'Grady 2003, 2):

'Dependencies' (lexical requirements) must be resolved at the first opportunity.

This is a really super-big important rule in this theory. The processing motivation for it is to minimize the use of working memory. In order to keep down how much is loaded into working memory, you have to get things out of it as fast as possible. So the processor tries to follow the lexical requirements as soon as it can, and then dumps the resolved material from memory. Remember this for the language learning stuff later. It is worth repeating here that there is no grammatical or processing rule that says words should be in a certain order, so of course children never learn such a rule. Instead it is specified for each and every word what its requirements are. So, for instance, "kick" will look for two noun phrases, one left and one right. That is not a specification for verbs in general, nor for transitive verbs, etc. It is a specification of "kick" only. (This echoes MacWhinney and Tomasello's item-based acquisition concepts.) When a person learns the word "kick" they also learn these requirements. That is not to say, of course, that they may not realize that "kick" is like "hit", but it is still a lexical specification.
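To make "lexical specification" concrete, here is a toy lexicon in Python. This is strictly my own illustrative notation, not O'Grady's: each word lists its category plus the requirements it must resolve and the direction it looks in to resolve them.

    # A toy lexicon (invented notation, not O'Grady's). Requirements are
    # stated per word, not per category: "kick" itself says it wants a
    # noun on its left (the kicker) and a noun on its right (the kicked).
    LEXICON = {
        "Mary":   {"cat": "N", "needs": []},
        "ball":   {"cat": "N", "needs": []},
        "French": {"cat": "N", "needs": []},
        "kick":   {"cat": "V", "needs": [("N", "left"), ("N", "right")]},
        "speaks": {"cat": "V", "needs": [("N", "left"), ("N", "right")]},
    }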

So to compose a simple English sentence, you don't have sets of grammatical rules, you just:

1) Take a verb 'speak' which requires a noun phrase to the left and one to the right.
2) Find the first noun phrase to the left. This resolves the first requirement. This noun phrase is the subject.
3) Find the first noun phrase to the right. This resolves the second requirement. This noun phrase is the direct object.

This occurs in time, with the first noun being grabbed first, and the second a little later. The result behaves as if the subject is higher up in a structural tree, if you ever drew grammar trees in school, but there are no trees and there is no grammar. The subject just got grabbed first to satisfy the efficiency requirement.
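Here is a minimal sketch of such a processor, again purely my own toy illustration of the procedure just described, using the lexicon above. It scans the words once, left to right, and resolves every requirement at the first opportunity; the structure it returns is just a record of what combined with what, not a grammatical tree.

    def parse(words):
        """Combine words left to right, resolving each lexical requirement
        at the first opportunity (the Efficiency Requirement)."""
        stack = []  # completed or pending phrases, leftmost first
        for w in words:
            node = {"head": w, "cat": LEXICON[w]["cat"],
                    "needs": list(LEXICON[w]["needs"]), "args": []}
            # Leftward requirements: the first opportunity is right now,
            # against the phrase just completed on the left.
            while node["needs"] and node["needs"][0][1] == "left":
                cat, _ = node["needs"].pop(0)
                if not stack or stack[-1]["cat"] != cat or stack[-1]["needs"]:
                    raise ValueError(f"'{w}' finds no completed {cat} on its left")
                node["args"].append(stack.pop())
            stack.append(node)
            # Rightward requirements of earlier words: resolve as soon as
            # a completed phrase of the right category appears.
            while len(stack) >= 2:
                left, right = stack[-2], stack[-1]
                if (left["needs"] and left["needs"][0] == (right["cat"], "right")
                        and not right["needs"]):
                    left["needs"].pop(0)
                    left["args"].append(stack.pop())
                else:
                    break
        if len(stack) != 1 or stack[0]["needs"]:
            raise ValueError("unresolved requirements")
        return stack[0]

    parse("Mary speaks French".split())  # subject grabbed first, object a moment later

Note that nothing in the sketch says "subjects precede verbs"; the order falls out of where each word looks and when its requirements get resolved.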

O'Grady then goes on to show how many of the central topics of grammatical theory, especially those of concern to generative (Chomskian) grammarians, can be explained with just these simple rules. No other grammar is required at all. One very nice solution has to do with forming questions. In the Chomsky tradition, you have to move words around a lot. So the question "Is grass green?" starts off as "grass is green" and the "is" moves to the front following a set of principles. O'Grady drops the whole movement thing - hoorah! - and simply proposes that certain words like "is" can look to the left or right for their first argument. If you let "is" look to the right, then it can appear first: "Is grass green?" Notice that this is different from MacWhinney. MacWhinney assumes that there is movement and provides explanations for how children might learn it. O'Grady instead does away with movement and provides a different mechanism for allowing verbs to appear at the front in questions.
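In the toy notation above, this is nothing more than a flipped direction on the first requirement; nothing moves. (I give the question copula its own made-up entry, "is_q", just to sidestep lexical ambiguity in the sketch.)

    # The statement copula looks left for its first argument; the question
    # variant looks right. Same processor, same word otherwise, no movement.
    LEXICON.update({
        "grass": {"cat": "N", "needs": []},
        "green": {"cat": "A", "needs": []},
        "is":    {"cat": "V", "needs": [("N", "left"),  ("A", "right")]},
        "is_q":  {"cat": "V", "needs": [("N", "right"), ("A", "right")]},
    })

    parse("grass is green".split())    # "Grass is green."
    parse("is_q grass green".split())  # "Is grass green?"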

Now we get back to the "logical problem" argument. Chomsky's classic example deals with sentences such as:

"Are Americans who are rich happy?"

In a statement this is:

"Americans who are rich are happy."

Chomsky argues that children never hear such sentences, and yet they never ever mess this up. MacWhinney argued in his paper that children do hear this stuff, and there are ways to learn it. O'Grady disagrees with both. While it turns out Chomsky was wrong - children do hear this - they still don't hear it much, and there is no evidence they ever learn it. We don't hear children trying different things and finally getting it right with more examples. They NEVER say "Are Americans who rich are happy?" So, according to O'Grady, both Chomsky and MacWhinney are wrong. There must be something that prevents children from ever saying the ungrammatical sentence.

According to O'Grady, it is, of course, this efficiency-driven processor, not a rule. First, remember that in this system the words don't move. There is no matter of moving the right or wrong word to the front. Nothing moves. The question is: can the grammatical and ungrammatical sentences both be constructed with the processor that we humans have?

The grammatical one can be built easily. To build "Are Americans who are rich happy?" you take "Are" first. You immediately look right for the first argument, "Americans who are rich", and add it. Then you add the second and final argument, "happy." Voila, done.

What happens if you try to build "Are Americans who rich are happy?"

It is impossible. The first argument cannot be built because there is no verb in it. Even if you get around this, you try to add the second argument, but the only choice is "are happy", which is verbal, when you are looking for a noun. It simply cannot be done, and that is why children never create sentences like this. O'Grady also points to a study of Japanese speakers learning English. While they made all sorts of errors, none of them ever tried to move the wrong verb to the front. The indication then is that this is not learned, but comes simply from the way sentences are built.
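With a couple more toy entries, the little parser from above shows the asymmetry, at least for the subject phrase on its own. (My sketch is too greedy to run the full question - it would hand the bare noun "Americans" to "Are" before the relative clause arrives - so, purely to illustrate the point about the first argument, I test whether each version of the subject phrase can be built at all. The entries for "who" and "are_rel" are invented simplifications, not O'Grady's analysis.)

    # "who" takes the noun on its left and needs a verbal phrase on its
    # right; "are_rel" stands in for the copula inside the relative clause.
    LEXICON.update({
        "Americans": {"cat": "N", "needs": []},
        "rich":      {"cat": "A", "needs": []},
        "who":       {"cat": "N", "needs": [("N", "left"), ("V", "right")]},
        "are_rel":   {"cat": "V", "needs": [("A", "right")]},
    })

    parse("Americans who are_rel rich".split())  # builds: one completed N
    try:
        parse("Americans who rich".split())      # no verb for "who" to find
    except ValueError as err:
        print("cannot be built:", err)

The grammatical subject comes out as a single completed noun phrase, while the ungrammatical one strands "who" with an unresolved requirement - the toy analogue of the point that the first argument simply cannot be built.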

O'Grady goes on to build arguments for relative clauses and pronoun interpretation. He shows that whenever a word's requirements can be resolved quickly, there is a corresponding ease of learning for children and second-language learners. Whenever the requirement cannot be resolved quickly, the burden on working memory becomes large and errors increase. This is also the case with people, typically after strokes, who lose the ability to understand some grammatical structures of language. If they lose working-memory capacity, then they have a harder time understanding, for instance, pronouns whose antecedents are a long distance away.

So those are the basics. Here are my questions, some of which O'Grady might answer in his book, but I haven't read the whole thing yet.

1) I would like to develop to a greater extent exactly what constructing left to right means. Left to right, I would think, is only a metaphor based on how we happen to write. (Remember other language traditions write top to bottom, right to left, etc.) It would seem like we are basically talking about time, but it is not just time. The first argument gets dumped out of working memory into the phonetic implementation first in time. But "left" also seems to carry some positional idea, in that the first argument has to wait in place while a complex argument is being built, and then it is dumped. It's like the processor has to keep in mind that the argument needs to stay "over there". Moreover, there appears to be some concept of "slots". If the processor is forming a question, it appears that the 2nd argument is dumped to the right or "after" the 1st argument, since it shows up second. You can't just say that the 1st argument is dumped out for speaking first, because the whole thing is waiting on the verb, which is even earlier, and the verb cannot be dumped until the 2nd argument is found. In other words, the verb finds the 1st argument, then the 1st argument waits "in position" until the second argument is located, and then and only then is the whole sentence pushed out for phonetic implementation.

2) This all depends to a huge degree on the requirements in the lexicon. O'Grady points out in his book that this is nothing special about his theory, and he is right. When taking Chomsky grammar from him last semester, I thought the whole time about just how much is required of the lexicon by that theory. At the same time, some independent evidence that the lexicon really has all this information in it would be extremely helpful in accepting the theory. In particular, the arguments that words take seem to be rather flexible. Sometimes they need adjectives in the predicate, sometimes they need nouns, sometimes they look left only, sometimes they look left and right. Is it a separate word each time the requirements change? Or does this all come from some interface with the meaning the person is trying to express? Finally, is there any evidence that children get the dependencies for a word wrong and then correct them?

3) O'Grady positions himself apart from what he terms "eliminativist connectionism." I won't go today into what connectionism is. This is just too long already. And he makes his theory quite specifically "symbolist" in that the processor manipulates symbols. I was thinking when I started this note that the problem was that he never justified this position. However, I guess the justification is the theory itself. If his theory indeed explains a lot, then he has shown that manipulating symbols works. I, however, am very suspicious of symbol manipulation as the explanation of cognition. Connectionist neural networks always seem more biologically plausible. That said, I do have serious issues with the learning algorithms that connectionist models use, none of which seem to have actual instantiation in the brain.

4) I have been reading the stuff mentioned here, as well as books like Probabilistic Linguistics and Bybee's Phonology and Language Use. One trend of this research over the last 10 years or so seems to be the re-emergence of the lexicon. For a long time, issues of the lexicon have gotten short shrift in linguistic theory in favor of the rules of semantics, syntax, and phonology. But increasingly, the processing - the phonology, the syntax - is getting tied into the lexicon again, such that soon lexical specification will perhaps be the central topic of linguistics.

5) O'Grady's account ties in really well with a paper from Elizabeth Bates (and someone else) called "The Emergence of Grammar from the Lexicon" or some such. It provides wads of statistical data showing that the best indicator of a child's syntactic development is their lexical development, i.e., how many words they know. This is the case in normally developing children as well as children with less common development patterns, like those with Williams Syndrome. Bates's statistics make sense in O'Grady's theory.

Good news: I will be posting more language stuff, but it may become less technical, as it will be my notes for the Linguistics 102 class coming up. The first reading is "Mentalese" from Pinker's The Language Instinct, in which he argues - well, states - that manipulating symbols is the foundation of cognitive science, like cell theory for biology. Sigh....
