The design of the human brain differs from that of the chimp brain by 5%. The design of a species can evolve at a rate of at most 0.1 bits per generation. The brain has a genetic design size of 100 kilobytes.
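The arithmetic behind these figures, and behind the 150-byte bound in the quotation that follows, can be checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative, and applying the 5% difference directly to the 100-kilobyte figure is an assumption rather than something these notes state.

```python
# Figures taken from the notes; all names here are illustrative.
BITS_PER_GENERATION = 0.1      # stated evolutionary speed limit
GENERATIONS = 12_000           # roughly 250,000 years of Homo sapiens
BRAIN_DESIGN_KILOBYTES = 100   # stated genetic design size of the brain
HUMAN_CHIMP_DIFFERENCE = 0.05  # stated 5% difference in brain design


def max_new_design_bytes(generations, bits_per_generation=BITS_PER_GENERATION):
    """Upper bound on new design information, in bytes."""
    return generations * bits_per_generation / 8


def human_specific_kilobytes():
    """Assumption: the 5% difference applies to the whole 100 KB design."""
    return BRAIN_DESIGN_KILOBYTES * HUMAN_CHIMP_DIFFERENCE


print(round(max_new_design_bytes(GENERATIONS)))  # bound in bytes
print(round(human_specific_kilobytes()))         # difference in kilobytes
```

Under these assumptions the speed limit allows about 150 bytes of new design over 12,000 generations, a small fraction of the roughly 5 kilobytes that separate the two brain designs.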
"If language depends on new cognitive faculties evolved during the time of homo sapiens (on the order of 250,000 years, or 12,000 generations) then by the speed limit, the maximum amount of new design information in the brain over that period is of the order of 150 bytes."
Both social situation and language representation should have the following things:
We do not know if chimps have a working theory of mind.
Several lines of evidence suggest that social intelligence is located in the ventral pre-frontal cortex (VPC). Some language processing happens here too: the most active part of Broca's area overlaps with it. The theory is consistent with neuroanatomical data.
The computational model represents social situations with scripts in tree form (the program was written in Prolog). The following is a script for "First I bit Joe (a male) and it made him mad. Then he bit me while I was eating nuts." The --> arrow indicates a time relation.
              script
              /    \
          scene-->scene
         /   \     /  \
        I    Joe  Joe  me
        |    male male  |
        bit  mad  |    eat
        Joe       bit  nuts
                  me
Vervets would only need to go about 4 nodes deep.
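As an illustration, the script tree above could be held as nested tuples. This is one possible reading of the drawing, not the author's Prolog data structure: the `node` and `depth` helpers are hypothetical, attributes such as male and mad are treated as children of the individual they describe, and the order of the scenes stands in for the --> time relation.

```python
def node(label, *children):
    """A tree node: a label plus an ordered tuple of child nodes."""
    return (label, tuple(children))


def depth(tree):
    """Number of nodes on the longest path from the root to a leaf."""
    _, children = tree
    return 1 + max((depth(child) for child in children), default=0)


# "First I bit Joe (a male) and it made him mad.
#  Then he bit me while I was eating nuts."
bite_script = node(
    "script",
    node(
        "scene",
        node("I", node("bit", node("Joe"))),
        node("Joe", node("male"), node("mad")),
    ),
    node(
        "scene",
        node("Joe", node("male"), node("bit", node("me"))),
        node("me", node("eat", node("nuts"))),
    ),
)

print(depth(bite_script))  # 5 under this particular encoding
```

How deep a given script counts as depends on the encoding chosen, so the depth printed here is not directly comparable to the figure quoted for vervets.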
When you get a bunch of these you can generalize to represent that
biting anyone will likely result in getting bitten back. This is done
with script intersection, which finds the common structure. This is
efficient and robust. There is evidence that primates can learn new
social regularities from just a few examples, and this mechanism can
do that. You can use the script to simulate future situations. This is
script unification. The two together produce a simple script algebra.
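The two operations can be sketched in Python. This is not the author's Prolog code: `intersect` is a simple label-by-label generalization in the spirit of anti-unification, `match` and `substitute` together stand in for script unification, and the tuple encoding, the function names, and the `?x0` style of variable name are all assumptions made for illustration.

```python
def node(label, *children):
    """A tree node: a label plus an ordered tuple of child nodes."""
    return (label, tuple(children))


def intersect(a, b, pairs=None):
    """Keep the structure two trees share; differing labels become variables.

    The same pair of differing labels always maps to the same variable,
    so "Joe"/"Sam" is generalized consistently across scenes.
    """
    if pairs is None:
        pairs = {}
    (label_a, kids_a), (label_b, kids_b) = a, b
    if label_a == label_b:
        label = label_a
    else:
        label = pairs.setdefault((label_a, label_b), "?x%d" % len(pairs))
    return (label, tuple(intersect(x, y, pairs) for x, y in zip(kids_a, kids_b)))


def match(pattern, tree, bindings):
    """Bind the pattern's variables against a tree; return None on failure."""
    (p_label, p_kids), (t_label, t_kids) = pattern, tree
    if p_label.startswith("?"):
        if bindings.setdefault(p_label, t_label) != t_label:
            return None
    elif p_label != t_label:
        return None
    if len(p_kids) > len(t_kids):
        return None
    for p, t in zip(p_kids, t_kids):
        if match(p, t, bindings) is None:
            return None
    return bindings


def substitute(pattern, bindings):
    """Replace bound variables in a pattern with their values."""
    label, kids = pattern
    return (bindings.get(label, label), tuple(substitute(k, bindings) for k in kids))


# Two remembered episodes: biting someone, then being bitten back.
first = node("script",
             node("scene", node("I", node("bit", node("Joe")))),
             node("scene", node("Joe", node("bit", node("me")))))
second = node("script",
              node("scene", node("I", node("bit", node("Sam")))),
              node("scene", node("Sam", node("bit", node("me")))))

rule = intersect(first, second)  # Joe/Sam become the variable ?x0

# Simulate the future: I have just bitten Bob, so what comes next?
observed = node("scene", node("I", node("bit", node("Bob"))))
first_scene, next_scene = rule[1]
predicted = substitute(next_scene, match(first_scene, observed, {}))
print(predicted)
```

In this toy case two examples are enough to produce the general rule, and applying it to a new first scene predicts that Bob will bite back.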
How can this be extended to a theory of mind? By making deeper trees. To know that Bob knows the above script, you could have:
              script
                |
              scene
                |
               bob
              knows
              script
              /    \
          scene-->scene
         /   \     /  \
        I    Joe  Joe  me
        |    male male  |
        bit  mad  |    eat
        Joe       bit  nuts
                  me
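The nesting step can be sketched as a wrapper that embeds one script inside another. The `knows` helper below is hypothetical and reads the drawing as a chain (script, scene, bob, knows, embedded script); the summary does not say how the Prolog program encodes this.

```python
def node(label, *children):
    """A tree node: a label plus an ordered tuple of child nodes."""
    return (label, tuple(children))


def depth(tree):
    """Number of nodes on the longest path from the root to a leaf."""
    _, children = tree
    return 1 + max((depth(child) for child in children), default=0)


def knows(agent, script):
    """Embed a script as the content of what an agent knows."""
    return node("script", node("scene", node(agent, node("knows", script))))


i_bit_joe = node("script", node("scene", node("I", node("bit", node("Joe")))))
bob_knows = knows("bob", i_bit_joe)
joe_knows_bob_knows = knows("joe", bob_knows)

# Each level of "X knows ..." adds four nodes of depth in this encoding.
print(depth(i_bit_joe), depth(bob_knows), depth(joe_knows_bob_knows))  # 5 9 13
```

Because the embedded script is an ordinary subtree, the same tree operations apply to it unchanged; a deeper theory of mind only costs deeper trees.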
Now, how do you represent words?
How about a complex word like "gives":
                        script
                        holder
                        /    \
                       /      \
                      /        \
                     /          \
                   script       script
                   / | \ \           \
                  /  |  \ \           \
                 /   |   \ \           \
                /    |    \ \           \
               /     |     \ \           \
            scene scene scene scene     scene
             |      |      |    |       || |
             |      |      |    |       || |
             |      |      |    |       || |
            "bob" "give" "joe" "apple"  ||  \
                                        /|   \
                                       / |    \
                                      /  |     \
                                    bob joe     \
                                  agent patient  \
                                                  \
                                                   scene
                                                   now
                                                   has
                                                   / \
                                                  /   \
                                                joe   apple
Here the things in quotes are sounds (so "apple" means the sound of
someone saying apple), and bob, joe, and apple are variables that can
stand for any people or things heard (so the same script covers "mom
gave phil a cup"). The left script is hearing the sentence, and the
right script is the act of giving that must be happening.
The author's Prolog implementation handles a 400-word subset of English including all parts of speech, complex verbs, tense, aspect, mood, passives, anaphora, gaps, ambiguity and so on. It can understand sentences or, when run backward, generate them.
To deal with ambiguity, when more than one script applies to a situation, take the intersection of those scripts, as in learning, and go on processing with the resulting script. This works for both social situations and language, but only for languages that obey the Greenberg-Hawkins universals (Greenberg 1966); for other languages the intersection would destroy too much meaning. On this account, language would have evolved to obey these universals.
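A minimal sketch of this ambiguity step, under the same assumptions as before (nested tuples for trees, `?`-prefixed labels as variables, hypothetical function names, and not the author's Prolog code): every script whose first scene fits the situation is intersected, and processing continues with the result.

```python
from functools import reduce


def node(label, *children):
    """A tree node: a label plus an ordered tuple of child nodes."""
    return (label, tuple(children))


def intersect(a, b, pairs):
    """Keep shared structure; differing labels become variables (?v0, ?v1, ...)."""
    (label_a, kids_a), (label_b, kids_b) = a, b
    if label_a == label_b:
        label = label_a
    else:
        label = pairs.setdefault((label_a, label_b), "?v%d" % len(pairs))
    return (label, tuple(intersect(x, y, pairs) for x, y in zip(kids_a, kids_b)))


def match(pattern, tree, bindings):
    """Bind the pattern's variables against a tree; return None on failure."""
    (p_label, p_kids), (t_label, t_kids) = pattern, tree
    if p_label.startswith("?"):
        if bindings.setdefault(p_label, t_label) != t_label:
            return None
    elif p_label != t_label:
        return None
    if len(p_kids) > len(t_kids):
        return None
    for p, t in zip(p_kids, t_kids):
        if match(p, t, bindings) is None:
            return None
    return bindings


def substitute(pattern, bindings):
    """Replace bound variables in a pattern with their values."""
    label, kids = pattern
    return (bindings.get(label, label), tuple(substitute(k, bindings) for k in kids))


def predict(scripts, observed):
    """Intersect every script whose first scene fits, then apply the result."""
    applicable = [s for s in scripts if match(s[1][0], observed, {}) is not None]
    if not applicable:
        return None
    pairs = {}  # shared so that generated variable names stay unique
    merged = reduce(lambda a, b: intersect(a, b, pairs), applicable)
    bindings = match(merged[1][0], observed, {})
    return substitute(merged[1][1], bindings)


# Two scripts fit "I bit someone" but disagree about what happens next;
# the grooming script does not fit and is ignored.
bite_back = node("script",
                 node("scene", node("I", node("bit", node("?who")))),
                 node("scene", node("?who", node("bit", node("me")))))
chase_back = node("script",
                  node("scene", node("I", node("bit", node("?who")))),
                  node("scene", node("?who", node("chase", node("me")))))
groom_back = node("script",
                  node("scene", node("I", node("groom", node("?who")))),
                  node("scene", node("?who", node("groom", node("me")))))

observed = node("scene", node("I", node("bit", node("Bob"))))
print(predict([bite_back, chase_back, groom_back], observed))
```

Here the intersection keeps only what the competing scripts agree on: Bob will do something to me, with the action left as an unbound variable. If the candidate scripts shared too little structure, the intersection would carry almost no meaning, which is the failure case described for languages outside the universals.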
The language learning happens the same way that social situation learning does. In testing it has learned 50 new words with no prior knowledge, and it appears that more is possible.