The full version of the paper is available online; otherwise, try the local version.
The human brain's design differs from the chimpanzee's by 5%. The design of a species can evolve at a rate of at most 0.1 bits per generation. The brain has a genetic design size of 100 kilobytes.
"If language depends on new cognitive faculties evolved during the time of homo sapiens (on the order of 250,000 years, or 12,000 generations) then by the speed limit, the maximum amount of new design information in the brain over that period is of the order of 150 bytes."
Both the social-situation representation and the language representation should have the following properties:
We do not know if chimps have a working theory of mind.
Several lines of evidence suggest that social intelligence is located in the ventral pre-frontal cortex (VPC). Some language processing happens there too: the most active part of Broca's area overlaps with it. The theory is consistent with the neuroanatomical data.
The computational model represents social situations with scripts in tree form (the program was written in Prolog). The following is a script for "First I bit Joe (a male) and it made him mad. Then he bit me while I was eating nuts." The --> arrow indicates a time relation.
                script
               /      \
          scene  -->  scene
          /   \       /   \
         I    Joe   Joe    me
         |    male  male   |
        bit   mad    |    eat
        Joe         bit  nuts
                     me

Vervets would only need to go about four nodes deep. Once you have accumulated a number of these scripts you can generalize, to represent that biting anyone will likely result in getting bitten back. This is done with script intersection, which finds the structure common to the scripts. It is efficient and robust; there is evidence that primates can learn new social regularities from just a few examples, and this mechanism can do that. A script can also be used to simulate future situations: this is script unification. The two operations together give a simple script algebra.
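Below is a minimal sketch of that script algebra in Python (not the paper's Prolog implementation). Scripts are stored as nested dicts, and individuals appear as role slots ("self"/"other") standing in for the paper's variables; the function names and data layout are illustrative.

    # A script is a nested dict: keys are node labels, values are sub-trees
    # ({} = leaf). "self"/"other" are role slots standing in for variables.

    def script_intersection(a, b):
        """Generalisation: keep only the structure the two scripts share."""
        return {k: script_intersection(a[k], b[k]) for k in a if k in b}

    def script_unification(a, b):
        """Simulation/prediction: merge a partial script with a learned one,
        filling in whatever the learned script adds."""
        out = dict(a)
        for k, sub in b.items():
            out[k] = script_unification(out.get(k, {}), sub)
        return out

    # "I bit Joe (a male) and it made him mad; then he bit me while I ate nuts."
    episode1 = {"scene1": {"self":  {"bit": {"other": {}}},
                           "other": {"male": {}, "mad": {}}},
                "scene2": {"other": {"bit": {"self": {}}},
                           "self":  {"eat": {"nuts": {}}}}}

    # A second episode: "I bit Ann (a female) and she bit me back."
    episode2 = {"scene1": {"self":  {"bit": {"other": {}}},
                           "other": {"female": {}, "mad": {}}},
                "scene2": {"other": {"bit": {"self": {}}},
                           "self":  {}}}

    # Intersection keeps the shared core: biting anyone gets you bitten back.
    rule = script_intersection(episode1, episode2)
    print(rule)

    # Unification simulates the future: given only "I bit someone", the
    # learned rule pulls in the retaliation scene.
    prediction = script_unification({"scene1": {"self": {"bit": {"other": {}}}}}, rule)
    print(prediction)

Intersection drops whatever the two episodes do not share (here, the victim's identity and sex), which is the generalization step; unification then lets a partial observation pull in the predicted retaliation scene.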
How can this be extended to a theory of mind? By making deeper trees. To know that Bob knows the above script, you could have:
                script
                  |
                scene
                  |
                 bob
                  |
                knows
                  |
                script
               /      \
          scene  -->  scene
          /   \       /   \
         I    Joe   Joe    me
         |    male  male   |
        bit   mad    |    eat
        Joe         bit  nuts
                     me

Now, how do you represent words? How about a complex word like gives:
                     script holder
                    /             \
              script               script
          /   |     |    \        /      \
      scene scene scene scene  scene  -->  scene
        |     |     |     |    /   \       /    \
      "bob" "give" "joe" "apple" bob  joe   joe  apple
        |           |          agent patient now   has
       bob         joe

The things in quotes are sounds (so "apple" means the sound of someone saying apple), and bob, joe, and apple are variables that can bind to whoever or whatever is heard about (the same script works for "mom gave phil a cup"). So the left script is hearing the sentence, and the right script is the act of giving that must be happening.
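A sketch of how such a word sense could be stored and used, under the same assumptions as before (the slot names and the understand helper are illustrative, not the paper's representation): the sound half lists the words heard, the meaning half is the giving event, and the shared variables X, Y, Z tie the two halves together.

    # Word sense for "gives": a sound script paired with a meaning script.
    GIVES = {
        "sound":   ["X", "gives", "Y", "Z"],                       # heard words
        "meaning": {"scene1": {"agent": "X", "patient": "Y", "act": "give"},
                    "scene2": {"haver": "Y", "had": "Z"}},          # Y now has Z
    }

    def understand(words, sense):
        """Bind the sound variables to the heard words, then apply the same
        bindings to the meaning script."""
        bindings = {}
        for slot, word in zip(sense["sound"], words):
            if slot.isupper():            # upper-case slots are variables
                bindings[slot] = word
            elif slot != word:
                return None               # this sense does not match the sentence
        return {scene: {role: bindings.get(v, v) for role, v in parts.items()}
                for scene, parts in sense["meaning"].items()}

    print(understand(["bob", "gives", "joe", "apple"], GIVES))
    # -> {'scene1': {'agent': 'bob', 'patient': 'joe', 'act': 'give'},
    #     'scene2': {'haver': 'joe', 'had': 'apple'}}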
The author's Prolog implementation handles a 400-word subset of English, including all parts of speech, complex verbs, tense, aspect, mood, passives, anaphora, gaps, ambiguity and so on. It can understand or, when run backward, generate.
To deal with ambiguity, when a situation is encountered to which more than one script applies, take the intersection (just as in learning) and go on processing with that new script. This works for both social situations and language. It only works for languages that obey the Greenberg-Hawkins universals (Greenberg 1966); for other languages the intersection would destroy too much meaning. On this account, language would have evolved to obey these universals.
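Reusing script_intersection from the earlier sketch, the disambiguation step could look like the following (the two readings are invented for illustration):

    from functools import reduce

    def disambiguate(candidate_scripts):
        """Collapse all the scripts that apply into their shared core."""
        return reduce(script_intersection, candidate_scripts)

    # Two readings of an ambiguous input share everything except the
    # ambiguous part, which the intersection throws away.
    reading1 = {"I": {"saw": {}}, "her": {"duck": {"a-bird": {}}}}
    reading2 = {"I": {"saw": {}}, "her": {"duck": {"an-action": {}}}}
    print(disambiguate([reading1, reading2]))
    # -> {'I': {'saw': {}}, 'her': {'duck': {}}}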
Language learning happens the same way that social-situation learning does. In testing, the program has learned 50 new words with no prior knowledge, and it appears that more is possible.
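Under the same assumptions, and reusing reduce and script_intersection from the sketches above, word learning can be the same intersection step applied to pairings of a heard word with the observed situation (the example data is invented):

    def learn_word(observations):
        """observations: (heard_word, situation_script) pairs for one new word.
        The learned sense is whatever structure all the situations share."""
        situations = [situation for _word, situation in observations]
        return reduce(script_intersection, situations)

    # Two situations in which the word "apple" was heard.
    heard = [("apple", {"object": {"round": {}, "red": {}, "edible": {}}}),
             ("apple", {"object": {"round": {}, "green": {}, "edible": {}}})]
    print(learn_word(heard))   # -> {'object': {'round': {}, 'edible': {}}}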