Quillian, M. (1968). Semantic Memory, in
M. Minsky (ed.), Semantic
Information Processing, pp 227-270, MIT Press;
reprinted in Collins & Smith (eds.), Readings in
Cognitive Science, section 2.1
Author of the summary: Jim Davies, 1998, jim@jimdavies.org
Cite this paper for:
- First semantic memory model that covers both general knowledge and
word knowledge
- Type nodes carry a word's meaning; token nodes point to type nodes. Many
token nodes can point to the same type node.
- A memory model that aims to store no information that could be inferred.
- Six kinds of links between concepts in a semantic network (p87).
- Three types of parameter symbols, S, D, and M (slots to be filled when
parsing sentences) (p88).
- Spreading activation with labels indicating the source (p91).
p80 (pages are of the Readings in Cognitive Science reprint)
The question dealt with: "What constitutes a reasonable view of how
semantic information is organized within a person's memory?" The model
presented here is intended as a model of human cognition (p96).
Here is the first task we want the model to be able to do: Given two
English words, compare and contrast the meanings. The model presented
in this chapter does a reasonable job at this.
First, some caveats:
- The model deals only with the "objective" meanings, as opposed to
the emotive connotations
- It is not a model of learning
- The model focuses on recognition, not recall
p81 Some background work:
There used to be two competing theories of semantic memory: one treating it
as an aggregate of associated elements, the other as organized around plans.
This distinction is no longer important, because it has been shown that
attributes, concepts, and plans can all be represented as lists (as in the
IPL language of Newell, Shaw, and Simon). Learning theorists also accept
this. The lack of a distinction allows modelers to use all kinds of
cognitive elements (plans, attributes, associations) as building blocks
(BASEBALL, SAD-SAM, and STUDENT all did it this way).
Simmons's Synthex project used a memory of verbatim text with an
index. It failed when it had to make inferences.
Green et al. and Lindsay organized memory as a single predefined
hierarchy. This failed when inferences required jumping around the
hierarchy (e.g., when the subject changed), and it only becomes more rigid
as the information in the tree grows.
p82:
The models mentioned above do not deal much with permanent memory; they
are intended to model cognitive processes. Linguistic theories appear to
care about permanent memory even less.
Following the tradition of Chomsky, linguists attempt to understand the
nature of language apart from people's use of it. Actually, they are not
completely consistent about whether they believe their grammars are a model
of human language use. In any case, the ideas the tradition has spawned
have inspired a lot of psycholinguistic work.
Some assume that semantic memory for words is separate from semantic
memory for other things (like the memory of your dog's face). That will
not be assumed here. Instead, we will assume that a single semantic memory
accounts for words, facts, perceptions, etc.
p83
Our theory says that "language is remembered, dealt with in thought, and
united to nonlinguistic concepts in a form that looks like the result of
phrase structure rules. . ." The grammar of an uttered sentence is decided
after the meaning, not before.
The model
The model is a mass of nodes connected by links. The nodes roughly
correspond to words, and they get their meaning in two ways:
- Type node: the node links to other nodes that explicitly make up its
definition.
- Token node: the node links to its type node. There can be many tokens
for a given type ("water" and "agua" point to the same meaning), but each
token points to only one type.
p85:
A type node has many connections, which in turn have their own
connections. The full meaning of a concept is the entire set of nodes that
can be reached from its type node. Each link is directed and labeled.
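To make the type/token distinction and the labeled links concrete, here is
a minimal sketch in Python. The class names, fields, and example words are
illustrative assumptions (the paper gives no code); the sketch only shows
that each token points to exactly one type node, and that a type node's
meaning is whatever can be reached from it through directed, labeled links.

# A minimal sketch (not Quillian's own data structures) of type vs. token
# nodes connected by directed, labeled links.
from dataclasses import dataclass, field

@dataclass
class TypeNode:
    word: str                                   # the word this node defines
    links: list = field(default_factory=list)   # outgoing (label, target) pairs

@dataclass
class TokenNode:
    word: str
    type_node: TypeNode                         # each token points to exactly one type

# "water" and "agua" are separate tokens that share one type (one meaning).
water_type = TypeNode("water")
water_en = TokenNode("water", water_type)
water_es = TokenNode("agua", water_type)

# Links are directed and labeled; the full meaning of "water" is everything
# reachable from water_type by following such links.
liquid = TypeNode("liquid")
water_type.links.append(("superclass", liquid))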
p86:
The sheer quantity of information to be stored argues that there should be
no redundancy, nor any information that could be inferred from more
primitive information. (see the summary author's note at the end)
It may be that visual and spatial representations are stored in the
same semantic network, but that such representations can be retrieved and
experienced "directly" to do spatial reasoning.
There need to be many kinds of links.
The ontology of links:
- Subclass to superclass
- Modification (adjective or adverb)
- Disjunction (e.g. earth, air, fire, water)
- Conjunction (e.g. old and red need to be conjoined so they both
can modify house in the phrase "old red house")
- The final two kinds are open-ended: each links two "thing" concepts to a
relationship concept, forming a sort of custom link.
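For reference, here is a hypothetical enumeration of the six link kinds in
Python; the member names follow the summary's descriptions above rather
than Quillian's own labels.

from enum import Enum, auto

class LinkKind(Enum):
    SUBCLASS = auto()       # subclass to superclass
    MODIFICATION = auto()   # an adjective or adverb modifies a concept
    DISJUNCTION = auto()    # "earth, air, fire, OR water"
    CONJUNCTION = auto()    # "old AND red" jointly modify "house"
    OPEN_ENDED_A = auto()   # the two open-ended kinds link two "thing"
    OPEN_ENDED_B = auto()   # concepts to a relationship concept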
p87-8:
What the nodes really represent are properties, which are flexible and
primitive. When a property is connected to a concept, there is a numerical
tag (with a fineness of nine gradations) specifying the intensity of that
relationship. Words like "a," "six," "perhaps," "very," and "not" are not
nodes; instead they dictate that range-restricting tags be attached to the
token nodes of other words.
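A small sketch of what an intensity-tagged property link might look like;
the PropertyLink class and the example values are assumptions, with only
the 1-to-9 scale taken from the nine gradations mentioned above.

from dataclasses import dataclass

@dataclass
class PropertyLink:
    concept: str
    prop: str
    intensity: int   # 1 (incidental) .. 9 (criterial): the nine gradations

links = [
    PropertyLink("plant", "alive", intensity=9),  # criterial: plants must be alive
    PropertyLink("plant", "green", intensity=5),  # typical but not required
]

# On this reading, words like "very" or "not" would not get nodes of their
# own; they would adjust or constrain tags like these on other words' tokens.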
p88:
Pronouns in sentences are replaced with explicit references to nodes
according to what that pronoun represents.
Plane:
Each different meaning of a word has its own plane. For example, the
word "plant" means 1. a living thing, 2. a place of manufacture, 3. a
verb meaning to place or put somewhere. Each sense of the word
corresponds to its own plane. The meaning, given a plane, is a
function of connections within the plane and connections to things off
the plane. See the diagram on page 84 for the plant example.
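A toy sketch of planes for the "plant" example; the dictionary layout and
the particular in-plane words are assumptions, meant only to illustrate
that each sense owns its own defining subgraph, which then links out into
the rest of the network.

# Each sense of "plant" gets its own plane (its own defining subgraph).
planes = {
    "plant": [
        {"sense": "living structure", "in_plane": ["live", "structure", "food", "air"]},
        {"sense": "place of manufacture", "in_plane": ["apparatus", "industry"]},
        {"sense": "to put somewhere", "in_plane": ["put", "earth", "grow"]},
    ]
}

# The meaning of a token of "plant" is fixed by which plane it sits in, plus
# the links running from that plane out to the rest of the network.
for plane in planes["plant"]:
    print(plane["sense"], "->", plane["in_plane"])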
There are three kinds of parameter symbols (S, D, M):
- S
"the parameter symbol whose value is to be any word related to
the present word as its subject."
- D
"the parameter symbol whose value is to be any word related to the
present word as its direct object."
- M
"the parameter symbol whose value is to be any word that the
present word directly modifies."
p89:
These are necessary for specifying the relationships between words
on the same definition plane. So in the definition of "to comb," there
would be a parameter slot D which was expecting something to comb
through-- the object of the combing. This slot stays open, expecting
something in the text to come along and fill it. "D always refers to
some object of the word in whose defining plane it appears. There may be
clue words (like "hair") which tell what the slot is likely to be
filled with.
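One way to picture the S, D, and M parameters as open slots in a defining
plane, using "comb" as in the text; the DefiningPlane class, its field
names, the clue-word field, and the example sentence are illustrative
assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class DefiningPlane:
    word: str
    S: Optional[str] = None       # word related to this word as its subject
    D: Optional[str] = None       # word related to this word as its direct object
    M: Optional[str] = None       # word that this word directly modifies
    D_clue: Optional[str] = None  # clue word suggesting a likely filler for D

# The defining plane of "to comb": D stays open, waiting for text to fill
# it, and the clue word "hair" hints at what kind of filler to expect.
comb = DefiningPlane("comb", D_clue="hair")

# Parsing a sentence such as "She combed the wig" would bind the open slots:
comb.S, comb.D = "she", "wig"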
p91:
One thing they had the program do was take two words and say what their
relationship was. The output was compared to human data, and the program
altered, to try to get it about right.
This is how it finds the connection between the meanings of two words:
starting from each word, expand outward along associative links, raising
the activation of each node encountered and labeling that node with its
source patriarch (the word that side of the spread started from). An
intersection node is a node that has been activated by both patriarchs'
searches.
Each activated node is labeled with
- where the spread started (its patriarch)
- which node most recently activated it (its immediate predecessor)
Using the second label, you can follow a path from a node back to its
patriarch. An intersection node will have two such paths, one to each
patriarch. In a sense, this is the two patriarchs searching for each
other.
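Here is a minimal sketch of this intersection search over a toy graph. The
interleaved breadth-first spread, the tagging of each node with its
patriarch and its immediate predecessor, and the reconstruction of a path
back to each patriarch follow the description above, but the adjacency-list
graph, the function name, and the example words are assumptions; Quillian's
program worked over his much richer link structure.

from collections import deque

def intersection_search(graph, word_a, word_b):
    # graph: dict mapping each node to the nodes it links to
    # tags: node -> {patriarch: immediate predecessor on the path from it}
    tags = {word_a: {word_a: None}, word_b: {word_b: None}}
    frontier = deque([(word_a, word_a), (word_b, word_b)])  # interleaved spread

    def path_to(node, patriarch):
        # Follow the predecessor labels back to the patriarch.
        path = [node]
        while tags[node][patriarch] is not None:
            node = tags[node][patriarch]
            path.append(node)
        return list(reversed(path))

    while frontier:
        node, patriarch = frontier.popleft()
        for nxt in graph.get(node, []):
            seen = tags.setdefault(nxt, {})
            if patriarch in seen:
                continue                      # already activated from this side
            seen[patriarch] = node            # label: which node activated it
            if len(seen) == 2:                # activated by both patriarchs
                return nxt, path_to(nxt, word_a), path_to(nxt, word_b)
            frontier.append((nxt, patriarch))
    return None

# Toy graph: "plant" and "animal" intersect at "live".
graph = {
    "plant": ["live", "structure", "food"],
    "animal": ["live", "thing"],
    "food": ["thing"],
}
print(intersection_search(graph, "plant", "animal"))

The returned intersection node and the two paths back to the patriarchs are
the raw material that the third part of the program turns into the
sentence-like strings described next.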
The third part of the program outputs sentence-like strings that describe
the relationship between the two concepts, like, given "plant" and "live,"
it returns: "A plant is a live structure" and "Plant is structure which
gets food from air. This food is thing being has to take into itself to
keep live."
p95:
In an experiment, the model correctly disambiguated 12 out of 19
ambiguous words.
p96:
Improvements to be made to the model:
- The parameters D, S, and M are not sufficient. E.g., in the definition of
"swarm" there would be a connection to "bees" with an S link (subject). But
in the sentence "The garden swarmed with bees," the garden is not itself
clustering in some area, as we might think from the definition. This could
be fixed by distinguishing ergative and locative kinds of subjecthood.
p98:
It is widely held that the same grammar is used to generate and
understand speech. For example, the analysis by synthesis theory
(Miller and Chomsky) claims that language understanding happens as
a result of trying to recreate the conditions which would result
in uttering the sentence yourself.
There are contradictory facts, though: Children can understand
sentences more complex than they can generate, as can foreigners
learning the language.
This model can understand without recreating a generative hypothesis.
This may have broad implications, as the facts stated above are part of
the reason Chomsky claims that there is an innate grammar.
Summary author's notes:
- In the modern ACT-R cognitive architecture, inferred facts become
represented and can eventually be retrieved on their own. This corresponds
to the psychological difference between retrieving addition facts and
figuring them out. The model presented in this paper, by insisting that no
information be stored that could be inferred, would have to figure out
every addition problem by adding, with no facts to retrieve. I think the
reaction-time data for addition facts argues against this view.