Marr 1982: Vision

D. Marr, Vision. W. H. Freeman, 1982.

Author of the summary: J. William Murdock, 1997, murdock@cc.gatech.edu

Cite this paper for:

Vision can be understood as an information-processing task that converts a numerical image representation into a symbolic, shape-oriented representation.


Keywords: Vision

Systems: No specific system is discussed.

Summary: Chapter 1: Discusses the history of research in perception,
leading to the conclusion that the key problem is a rigorous study of the
internal mechanisms of vision rather than merely the behavioral
characteristics that emerge from these mechanisms.  Introduces the
notion of levels of analysis and asserts that understanding complex
systems depends heavily on this notion.  Presents a very general
discussion of representation and process.  Describes three levels of
analysis of an information-processing system: computational theory,
representation / algorithm, and implementation.  Argues that the top
two levels (theory and representation / algorithm) are both crucial
objects of study.  Introduces the problem of vision as a mapping from
inputs (arrays of photoreceptor values) to outputs (internal
representations whose form is less obvious).  Discusses the visual
system of the housefly.  Briefly discusses human vision, concluding
with a three-stage process for vision: from the image to the primal
sketch, to the 2.5-D sketch, and finally to the 3-D model
representation.
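The distinction between the top two levels of analysis can be illustrated with a toy example (ours, not taken from the summary): a single computational-level specification (add two numbers) realized by two different algorithms that operate on two different representations. The function names are hypothetical and chosen only for illustration.

```python
# Toy illustration of levels of analysis (not from the book);
# all function names are hypothetical.

# Level 1, computational theory: WHAT is computed. Here the goal is
# simply the sum of two non-negative integers.
def spec_sum(a: int, b: int) -> int:
    return a + b


# Level 2, representation and algorithm (version A): numbers are decimal
# digit strings, added right to left with an explicit carry.
def add_decimal_strings(x: str, y: str) -> str:
    digits = []
    carry = 0
    i, j = len(x) - 1, len(y) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(x[i])
            i -= 1
        if j >= 0:
            total += int(y[j])
            j -= 1
        digits.append(str(total % 10))
        carry = total // 10
    return "".join(reversed(digits)) or "0"


# Level 2 (version B): numbers are unary tally strings such as "|||";
# with this representation, addition is just concatenation.
def add_tallies(x: str, y: str) -> str:
    return x + y


# Level 3, implementation, is whatever physically runs this code (here the
# Python interpreter and the hardware beneath it), which neither function
# mentions.
for a, b in [(7, 5), (999, 1)]:
    decimal = int(add_decimal_strings(str(a), str(b)))
    tally = len(add_tallies("|" * a, "|" * b))
    assert decimal == tally == spec_sum(a, b)
print("both algorithms satisfy the same computational theory")
```

The two algorithms differ completely in how they represent numbers and in the steps they take, yet both are correct with respect to the same level-1 specification, which is the sense in which the levels can be studied separately.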

Chapter 2: Discusses early vision: the first two stages of the vision
process (converting an image to a primal sketch and converting that
primal sketch to a 2.5-D sketch).  Describes the scenes underlying
images as consisting of surfaces often composed of hierarchically
organized elements (e.g. the stripes on a cat, which are in turn
composed of individual hairs).  Discusses continuity, boundaries, and
motion.  Describes the concept of the primal sketch in detail.
Provides mathematical formulas for detecting zero-crossings (i.e.
points where the second derivative of the smoothed image intensity
changes sign, marking sharp changes in intensity).  Describes the
conversion from zero-crossings into a raw primal sketch of edges,
blobs, etc.  Discusses key issues in the representation of local
orientation and organization.  Discusses light-source and transparency
effects.  Presents the construction of the full primal sketch as the
recursive grouping of elements from the raw primal sketch into
larger, more general tokens.
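Zero-crossing detection can be sketched in one dimension, assuming the filter is the second derivative of a Gaussian (the 1-D analogue of the Laplacian-of-Gaussian operator). The kernel truncation at 4 sigma, the border handling, the noise threshold `min_slope`, and all function names are assumptions of this sketch, not the book's formulas. It also shows the numerical-to-symbolic step: a list of filter values goes in, and a short list of discrete edge tokens comes out.

```python
from __future__ import annotations

import math

# 1-D sketch only; kernel size, border handling and threshold are assumptions.


def gaussian_second_derivative_kernel(sigma: float) -> list[float]:
    """Sampled second derivative of a 1-D Gaussian, truncated at 4 sigma."""
    radius = int(math.ceil(4 * sigma))
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    kernel = []
    for x in range(-radius, radius + 1):
        g = norm * math.exp(-(x * x) / (2 * sigma * sigma))
        kernel.append((x * x / sigma**4 - 1 / sigma**2) * g)
    return kernel


def convolve_same(signal: list[float], kernel: list[float]) -> list[float]:
    """Convolve, keeping the signal's length; border samples are replicated."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + r - k, 0), n - 1)  # clamp the index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def zero_crossings(response: list[float], min_slope: float = 0.01) -> list[dict]:
    """Turn a numerical filter response into symbolic edge tokens.

    A token is emitted wherever the response changes sign between adjacent
    samples and the jump exceeds min_slope (a hypothetical noise threshold).
    """
    tokens = []
    for i in range(len(response) - 1):
        a, b = response[i], response[i + 1]
        if a * b < 0 and abs(b - a) > min_slope:
            position = i + a / (a - b)  # linear interpolation of the crossing
            tokens.append({"position": position, "slope": b - a})
    return tokens


# Usage: a step from dark (0) to bright (10) between samples 19 and 20.
signal = [0.0] * 20 + [10.0] * 20
edges = zero_crossings(convolve_same(signal, gaussian_second_derivative_kernel(2.0)))
print(edges)  # expected: one token near position 19.5 with a negative slope
```

The sign of `slope` records the direction of the intensity change (dark-to-bright gives a negative slope with this filter) and its magnitude grows with contrast, so each token carries a little more than a bare position.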

Chapter 5: Discusses the conversion from a homogeneous 2.5-D sketch
into a modular 3-D model (i.e. one with multiple levels of abstraction).
Focuses on the issues of representation and recognition of shapes and
coordinate axes.  Describes potential extensions to the theory such as
2-D vision, curved axes, and relationships between multiple objects.
Presents a series of issues in greater detail: building the 3-D model,
relating the object-centered coordinate system in the 3-D model to the
viewer-centered one in the earlier stages, cataloging (i.e. memory
storage, indexing, and retrieval) of the models, and recognition.
Provides some psychological evidence for the preceding discussion.
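The modular 3-D model and the relation between object-centered and viewer-centered coordinates can be sketched as a small data structure. This is a simplified illustration under loudly stated assumptions: it works in 2-D, reduces each component to a single axis segment (Marr's models use generalized cones), and describes each component only by where it attaches along its parent's axis and its angle relative to that axis. The class, field, and function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Model3D:
    """One module of a hypothetical, simplified object-centered description.

    Every quantity is relative to the parent's axis, never to the viewer.
    """
    name: str
    length: float                 # length of this module's principal axis
    attach_at: float = 0.0        # fraction along the parent axis (0..1)
    relative_angle: float = 0.0   # radians, measured from the parent axis
    parts: list[Model3D] = field(default_factory=list)


def axes_in_viewer_frame(model, frame_angle=0.0, start=(0.0, 0.0), prefix=""):
    """Return (path, start, end) for every axis, in viewer coordinates.

    For the root call, frame_angle is the object's orientation relative to
    the viewer; recursive calls pass the parent's absolute axis angle.
    """
    angle = frame_angle + model.relative_angle
    dx, dy = math.cos(angle), math.sin(angle)
    end = (start[0] + model.length * dx, start[1] + model.length * dy)
    path = f"{prefix}/{model.name}" if prefix else model.name
    result = [(path, start, end)]
    for part in model.parts:
        offset = part.attach_at * model.length
        part_start = (start[0] + offset * dx, start[1] + offset * dy)
        result.extend(axes_in_viewer_frame(part, angle, part_start, path))
    return result


# Usage: an "arm" module whose two components are an upper arm and a forearm
# bent at a right angle halfway along the arm's axis.
arm = Model3D("arm", 2.0, parts=[
    Model3D("upper arm", 1.0),
    Model3D("forearm", 1.0, attach_at=0.5, relative_angle=math.pi / 2),
])
for view in (0.0, math.pi / 2):  # the same stored model, two viewing angles
    for path, s, e in axes_in_viewer_frame(arm, view):
        print(f"{view:.2f} {path}: ({s[0]:.2f}, {s[1]:.2f}) -> ({e[0]:.2f}, {e[1]:.2f})")
```

The stored description never changes when the viewpoint does; only the single `frame_angle` argument differs, and the viewer-centered positions are recomputed from it. That separation is the point of keeping the catalogued model in object-centered coordinates.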

Chapter 6: Summarizes four major points of the preceding work: levels
of explanation, vision as an information-processing task,
process-oriented accounts of visual behavior, and the heterogeneity of
both subject (e.g. content, process, representation, etc.) and
methodology (e.g. mathematical analysis, microscopic neurological
observation, psychological experimentation, etc.).

Chapter 7: Introduces a question and answer format for addressing key
issues in this theory.  Defends the notion of levels of explanation
while admitting that the levels do have interconnections.  Describes
systems based on feature detection as inherently too limited to do
effective general visual information processing.  Distinguishes
between representation and implementation with regard to the issues of
procedural and declarative information.  Briefly characterizes the
transition between images and zero-crossings as involving a change
from a numerical domain to a symbolic one.  Argues that microworld
analyses such as the blocks world and Waltz's prism figures are
inherently non-scalable.  Argues that Minsky's frames are really
implementational mechanisms rather than representations.
Characterizes the majority of AI (including ELIZA, production systems,
etc.) as inherently mechanism-based and claims that "the goal of such
studies is mimicry rather than true understanding."  Further
discusses the numerical to symbolic transition.  Discusses computational
efficiency issues (within neurons) relating to eye movements.  Briefly
discusses natural language processing, planning, etc. within the
context of the modularized levels of explanation framework.

Summary author's notes:

This summary came from a file which had the following disclaimer: "The following summaries are the completely unedited and often hastily composed interpretations of a single individual without any sort of systematic or considered review. As such it is very likely that at least some of the following text is incomplete, inadequate, misleading, or simply wrong. One might view this as a very preliminary draft of a survey paper that will probably never be completed. The author disclaims all responsibility for the accuracy or use of this document; this is not an official publication of the Georgia Institute of Technology or the College of Computing thereof, and the opinions expressed here may not even fully match the fully considered opinions of the author much less the general opinions of the aforementioned organizations."

Cognitive Science Summaries Webmaster: Jim Davies (jim@jimdavies.org)

Last modified: Tue Mar 9 18:07:25 EST 1999