Gupta 1997: Visual Information Retrieval

http://www.jimdavies.org/summaries/

Gupta, A.[Amarnath], Jain, R.[Ramesh], Visual Information Retrieval, CACM(40), No. 5, May 1997, pp. 70-79.

@article{bb27183,
        AUTHOR = "Gupta, A. and Jain, R.",
        TITLE = "Visual Information Retrieval",
        JOURNAL = "CACM",
        VOLUME = "40",
        YEAR = "1997",
        NUMBER = "5",
        MONTH = "May",
        PAGES = "70-79"}

Author of the summary: Jim Davies, 2000, jim@jimdavies.org

Cite this paper for:

drawbacks of pixel information (noise, rotation, illumination)

quadtree of color histograms

languages for formulating queries for image retrieval

A good image query language should have an image processing tool, a feature space manipulation tool, an object specification tool, a measurement specification tool, a classification tool, a spatial arrangement tool, a temporal arrangement tool, and a data definition tool.

When people think of information retrieval, they usually think of text. But how do you retrieve images? You could request them with a textual query.

But images are first-class, information-bearing entities in their own right. There are two kinds of information associated with a visual object: metadata (alphanumeric information about the object) and visual features (information contained within the object). You get visual features through computational processing (computer vision, image processing, etc.). (p72)

The simplest visual features are pixel data. Such information can be used to find color-shifted images, images with some color in a given area, etc.

Drawbacks:

sensitive to noise
translation and rotation variant
sensitive to variations in illumination, etc.

At the other extreme are human-annotated images (e.g. there is a tank here, there is a building here). Most actual applications fall somewhere in between. (p73)

color

When color is attended to, you can answer queries like "Find all images in which more than 30% of the pixels are sky blue and more than 20% of the pixels are green" (an outdoor picture?). A color histogram shows the frequency distribution of colors in an image.
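As a minimal sketch of such a query, assume an image is a list of rows of (r, g, b) tuples and that colors are quantized into coarse bins. The bin count, the function names, and the reference colors below are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // levels
    return tuple(channel // step for channel in pixel)

def color_histogram(image, levels=4):
    """Return {bin: fraction of pixels} for an image given as rows of (r, g, b)."""
    counts = Counter(quantize(p, levels) for row in image for p in row)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def matches(image, requirements, levels=4):
    """True if every (reference color, minimum fraction) requirement is exceeded."""
    hist = color_histogram(image, levels)
    return all(hist.get(quantize(color, levels), 0.0) > min_frac
               for color, min_frac in requirements)

SKY_BLUE = (135, 206, 235)  # assumed reference colors
GREEN = (40, 160, 60)

# Toy image with 4 rows of 5 pixels: two rows of sky, one gray row, one row of grass.
image = [[SKY_BLUE] * 5, [SKY_BLUE] * 5, [(128, 128, 128)] * 5, [GREEN] * 5]
outdoor = matches(image, [(SKY_BLUE, 0.30), (GREEN, 0.20)])
```

Coarse bins make the match tolerant of small color differences, at the cost of lumping together colors that a finer histogram would separate.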

By making a quadtree of histograms (computing a color distribution for each quadrant, recursively, until the quadrants are 16x16 pixels or smaller) you can ask questions specific to areas of the image, e.g. "find all images with red in the center and blue all around."
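The recursive construction can be sketched as follows. This is only an illustration: it assumes pixels have already been reduced to color names, and the dictionary layout of a tree node and the "center" test (the middle half of the image in each direction) are choices made here, not details from the paper.

```python
from collections import Counter

def histogram(image, x0, y0, x1, y1):
    """Count pixel values inside the half-open box [x0, x1) x [y0, y1)."""
    return Counter(image[y][x] for y in range(y0, y1) for x in range(x0, x1))

def build_quadtree(image, x0, y0, x1, y1, min_size=16):
    """Store a histogram per quadrant, splitting until quads are min_size or smaller."""
    node = {"box": (x0, y0, x1, y1),
            "hist": histogram(image, x0, y0, x1, y1),
            "children": []}
    if (x1 - x0) > min_size or (y1 - y0) > min_size:
        mx, my = (x0 + x1) // 2, (y0 + y1) // 2
        for bx0, by0, bx1, by1 in [(x0, y0, mx, my), (mx, y0, x1, my),
                                   (x0, my, mx, y1), (mx, my, x1, y1)]:
            if bx1 > bx0 and by1 > by0:  # skip empty quadrants
                node["children"].append(
                    build_quadtree(image, bx0, by0, bx1, by1, min_size))
    return node

def leaves(node):
    """Yield the smallest quadrants of the tree."""
    if not node["children"]:
        yield node
    else:
        for child in node["children"]:
            yield from leaves(child)

def red_center_blue_around(tree):
    """Check each leaf's dominant color against its position in the image."""
    x0, y0, x1, y1 = tree["box"]
    w, h = x1 - x0, y1 - y0
    for leaf in leaves(tree):
        lx0, ly0, lx1, ly1 = leaf["box"]
        cx, cy = (lx0 + lx1) / 2, (ly0 + ly1) / 2
        central = (x0 + w / 4 <= cx < x0 + 3 * w / 4 and
                   y0 + h / 4 <= cy < y0 + 3 * h / 4)
        dominant = leaf["hist"].most_common(1)[0][0]
        if dominant != ("red" if central else "blue"):
            return False
    return True

# Toy 64x64 image: a red 32x32 square centered on a blue background.
image = [["red" if 16 <= x < 48 and 16 <= y < 48 else "blue" for x in range(64)]
         for y in range(64)]
tree = build_quadtree(image, 0, 0, 64, 64)
all_blue = build_quadtree([["blue"] * 64 for _ in range(64)], 0, 0, 64, 64)
```

Because every level keeps its own histogram, a query can stop at whichever level is fine enough for the region it asks about instead of always descending to the leaves.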

shape

Assume the images have pure colors and distinct shapes, like typical clip art. With images like this you can segment each image into a number of color regions, so that each region contains a connected set of points, all of the same color. For each segment you can then compute properties like color, area, elongation and centrality. Then you can answer queries like "find all images with two blue circles." (p74)
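A rough sketch of this segment-then-measure pipeline follows. It assumes the image is a grid of color names and uses 4-connected flood fill. The definitions of elongation (bounding-box aspect ratio) and centrality, and the heuristic for deciding that a region is a circle, are assumptions made for illustration; the paper's own definitions may differ.

```python
import math
from collections import deque

def describe(color, points, width, height):
    """Compute simple properties of one region from its pixel coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    box_w = max(xs) - min(xs) + 1
    box_h = max(ys) - min(ys) + 1
    area = len(points)
    # Centrality: 1.0 when the centroid sits at the image center, 0.0 at a corner.
    cx, cy = sum(xs) / area, sum(ys) / area
    mid_x, mid_y = (width - 1) / 2, (height - 1) / 2
    half_diag = math.hypot(mid_x, mid_y) or 1.0
    return {
        "color": color,
        "area": area,
        "elongation": max(box_w, box_h) / min(box_w, box_h),
        "fill": area / (box_w * box_h),  # about pi/4 for a disk, 1.0 for a rectangle
        "centrality": 1.0 - math.hypot(cx - mid_x, cy - mid_y) / half_diag,
    }

def regions(image):
    """Split a grid of color labels into 4-connected same-color regions."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            color, points, queue = image[sy][sx], [], deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                points.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and image[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            found.append(describe(color, points, w, h))
    return found

def is_circle(region):
    """Heuristic (assumed): roughly square bounding box, filled like a disk."""
    return region["elongation"] <= 1.2 and 0.7 <= region["fill"] <= 0.9

def make_image():
    """Toy clip art: two blue disks and a red square on a white background."""
    image = [["white"] * 24 for _ in range(9)]
    for cx in (4, 12):
        for y in range(9):
            for x in range(24):
                if (x - cx) ** 2 + (y - 4) ** 2 <= 3.5 ** 2:
                    image[y][x] = "blue"
    for y in range(2, 7):
        for x in range(17, 22):
            image[y][x] = "red"
    return image

found = regions(make_image())
blue_circles = [r for r in found if r["color"] == "blue" and is_circle(r)]
has_two_blue_circles = len(blue_circles) == 2
```

The query itself is then just a filter over the stored region properties, which is why this approach depends on the segmentation being clean in the first place.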

face retrieval

At the Media Lab they have an eigenface database. Each face is processed and described by 20 eigenfeatures, which together can represent any face. As transformations become more meaningful, they become more difficult to automate. Completely automated image analysis is feasible only in small, controlled domains.
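The idea of describing a face by a handful of coefficients can be sketched with principal component analysis. This is not the Media Lab system: it is a toy under stated assumptions, with "faces" that are 4-value vectors, 2 components instead of 20, and hypothetical names throughout. The principal axes are found by power iteration with deflation so that the example needs only the standard library.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(vectors):
    """Subtract the mean vector; return (mean, centered vectors)."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return mean, [[x - m for x, m in zip(v, mean)] for v in vectors]

def principal_axes(centered, k, iterations=100, seed=0):
    """Top-k principal axes by power iteration with deflation.

    Assumes k does not exceed the rank of the centered data.
    """
    rng = random.Random(seed)
    data = [row[:] for row in centered]
    axes = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in data[0]]
        for _ in range(iterations):
            # Multiply v by X^T X without building the covariance matrix.
            weights = [dot(row, v) for row in data]
            w = [dot(weights, col) for col in zip(*data)]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        axes.append(v)
        # Deflate: remove the found axis from every row.
        data = [[x - dot(row, v) * a for x, a in zip(row, v)] for row in data]
    return axes

def encode(face, mean, axes):
    """Describe a face by its coordinates along each principal axis."""
    shifted = [x - m for x, m in zip(face, mean)]
    return [dot(shifted, axis) for axis in axes]

def nearest(query_code, codes):
    """Name of the stored face whose code is closest to the query code."""
    return min(codes, key=lambda name: math.dist(query_code, codes[name]))

# Toy "faces" (hypothetical data): each is a 4-pixel intensity vector.
faces = {"ann": [10, 0, 10, 0], "bob": [0, 10, 0, 10],
         "cat": [10, 10, 0, 0], "dan": [0, 0, 10, 10]}
mean, centered = center(list(faces.values()))
axes = principal_axes(centered, k=2)
codes = {name: encode(face, mean, axes) for name, face in faces.items()}

# A slightly perturbed copy of "ann" should still retrieve "ann".
match = nearest(encode([9, 1, 10, 0], mean, axes), codes)
```

Retrieval compares short codes rather than raw pixels, so small pixel-level differences between the query and the stored face have less influence on the match.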

video

Most approaches treat video as a series of images, but this does not take advantage of the motion in the video. Video contains three kinds of motion information: one due to movement of the objects within a scene, one due to motion of the camera, and one due to special effects.

the query

PICQUERY is a language for formulating image queries. Another approach is query by example, which can be done with a kind of drawing system: the user supplies or sketches an image, and the image can then be changed to further adjust the query. A good query language should include the following:

an image processing tool
to facilitate change of texture, change foreground to background, remove an object, etc.
a feature space manipulation tool
to use text queries of features (but see annotation tool)
an object specification tool
object identification.
a measurement specification tool
where size is important.
a classification tool
for groupings and higher level identification (part of the class of tumors, e.g.)
a spatial arrangement tool
for location-sensitive queries.
a temporal arrangement tool
for time related queries.
an annotation tool
when user knows what she wants but maybe cannot draw it.
a data definition tool
for when the user has an a priori set of models to characterize properties of the image, e.g. "Find other mugshots with similar features."
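Query by example, mentioned above, can be sketched as ranking stored images by similarity to the example. The similarity measure used here, histogram intersection over color names, is a choice made for illustration and is not named in this summary; the image names and pixel data are likewise hypothetical.

```python
from collections import Counter

def normalized_histogram(pixels):
    """Return {color: fraction of pixels} for a flat list of color names."""
    counts = Counter(pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}

def intersection(h1, h2):
    """Histogram intersection: near 1.0 for matching distributions, 0.0 for disjoint."""
    return sum(min(frac, h2.get(color, 0.0)) for color, frac in h1.items())

def query_by_example(example, database, top=2):
    """Return the names of the `top` stored images most similar to the example."""
    target = normalized_histogram(example)
    scored = [(intersection(target, normalized_histogram(pixels)), name)
              for name, pixels in database.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top]]

# Hypothetical database of tiny images, each a flat list of color names.
database = {
    "beach": ["blue"] * 6 + ["yellow"] * 4,
    "forest": ["green"] * 8 + ["brown"] * 2,
    "lake": ["blue"] * 5 + ["green"] * 5,
}
example = ["blue"] * 7 + ["yellow"] * 3  # the user's sketch
results = query_by_example(example, database)
```

Adjusting the query then amounts to editing the example (say, adding more yellow) and ranking again.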

Summary author's notes:

none

Cognitive Science Summaries Webmaster:

JimDavies (jim@jimdavies.org)

Last modified: Thu Jan 27 09:22:13 EST 2000