@article{bb27183, AUTHOR = "Gupta, A. and Jain, R.", TITLE = "Visual Information Retrieval", JOURNAL = "CACM", VOLUME = "40", YEAR = "1997", NUMBER = "5", MONTH = "May", PAGES = "70-79"}
But images are first-class, information-bearing entities in their own right. There are two kinds of information associated with a visual object: metadata (alphanumeric information about the object) and visual features (information contained within the object). Visual features are obtained through computational processing (computer vision, image processing, etc.). (p72)
The simplest visual features are pixel data. Such information can be used to find color-shifted images, images with a given color in a given area, etc.
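As an illustration of querying on raw pixel data, here is a minimal sketch that checks whether one image is a uniformly color-shifted copy of another. It assumes images are lists of rows of (r, g, b) tuples with 0-255 channels; the function name `color_shift` is hypothetical, and the sketch ignores clipping at 0 and 255, which would break an exact shift in practice.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (r, g, b) tuples, channels 0-255 (assumed layout)

def color_shift(a: Image, b: Image) -> Optional[Tuple[int, int, int]]:
    """Return the constant (dr, dg, db) offset that maps every pixel of `a`
    onto `b`, or None if the sizes differ or the difference is not uniform."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    shift = None
    for row_a, row_b in zip(a, b):
        for (r1, g1, b1), (r2, g2, b2) in zip(row_a, row_b):
            d = (r2 - r1, g2 - g1, b2 - b1)
            if shift is None:
                shift = d          # first pixel fixes the candidate offset
            elif d != shift:
                return None        # any disagreement means it is not a pure shift
    return shift
```

For example, `color_shift([[(10, 20, 30)]], [[(15, 20, 25)]])` yields `(5, 0, -5)`.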
Drawbacks:
color
When color is attended to, you can answer questions like "Find all images in which more than 30% of the pixels are sky blue and more than 20% of the pixels are green" (an outdoor picture?). To do this you build a color histogram, which gives the frequency distribution of colors in the image.
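A minimal sketch of the histogram query above, assuming each pixel is quantized to the nearest entry of a small hypothetical palette (the `PALETTE` colors and the function names are illustrative, not from the paper; real systems quantize color space more finely):

```python
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (r, g, b) pixels, channels 0-255

# Hypothetical five-color palette used to quantize pixels.
PALETTE: Dict[str, Pixel] = {
    "sky_blue": (135, 206, 235),
    "green": (34, 139, 34),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (220, 20, 60),
}

def nearest(p: Pixel) -> str:
    """Quantize a pixel to the palette color with the smallest squared RGB distance."""
    return min(PALETTE, key=lambda name: sum((a - b) ** 2 for a, b in zip(p, PALETTE[name])))

def color_histogram(img: Image) -> Dict[str, float]:
    """Fraction of the image's pixels assigned to each palette color."""
    counts = Counter(nearest(p) for row in img for p in row)
    total = sum(counts.values()) or 1  # avoid dividing by zero on an empty image
    return {name: counts[name] / total for name in PALETTE}

def looks_outdoor(img: Image) -> bool:
    """The example query: more than 30% sky blue and more than 20% green."""
    h = color_histogram(img)
    return h["sky_blue"] > 0.30 and h["green"] > 0.20
```

Because the histogram is computed once per image, it can be stored alongside the image and many such percentage queries answered without touching the pixels again.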
By building a quadtree of histograms (computing a color distribution for each quadrant, recursively, until the quadrants are 16x16 pixels or smaller), you can ask questions about specific areas of the image, e.g., "find all images with red in the center and blue all around."
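The quadtree of histograms can be sketched as follows, assuming coarse RGB binning (4 levels per channel) as the color quantization. The "red in the center, blue all around" query is approximated here at depth 2: for each top-level quadrant, the sub-quadrant touching the image center must be mostly red and the others mostly blue. The names, the binning, and the 0.5 threshold are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Bin = Tuple[int, int, int]

def quantize(p: Pixel, levels: int = 4) -> Bin:
    """Map each 0-255 channel to one of `levels` coarse bins (4**3 = 64 colors)."""
    r, g, b = p
    return (r * levels // 256, g * levels // 256, b * levels // 256)

@dataclass
class QuadNode:
    hist: Dict[Bin, float]  # normalized color histogram of this region
    children: Dict[str, "QuadNode"] = field(default_factory=dict)  # "NW", "NE", "SW", "SE"

def build(img: Image, top: int, left: int, h: int, w: int, min_size: int = 16) -> QuadNode:
    """Histogram the region, then recurse into quadrants until it is min_size x min_size or smaller."""
    counts = Counter(quantize(img[y][x])
                     for y in range(top, top + h) for x in range(left, left + w))
    node = QuadNode({b: n / (h * w) for b, n in counts.items()})
    if h > min_size or w > min_size:
        h2, w2 = h // 2, w // 2
        quads = {"NW": (top, left, h2, w2), "NE": (top, left + w2, h2, w - w2),
                 "SW": (top + h2, left, h - h2, w2), "SE": (top + h2, left + w2, h - h2, w - w2)}
        for name, (t, l, hh, ww) in quads.items():
            if hh > 0 and ww > 0:  # skip empty quadrants of very thin regions
                node.children[name] = build(img, t, l, hh, ww, min_size)
    return node

def build_quadtree(img: Image, min_size: int = 16) -> QuadNode:
    return build(img, 0, 0, len(img), len(img[0]), min_size)

# For each top-level quadrant, the sub-quadrant that touches the image center.
INNER = {"NW": "SE", "NE": "SW", "SW": "NE", "SE": "NW"}

def red_center_blue_around(root: QuadNode, threshold: float = 0.5) -> bool:
    """Depth-2 query: the 4 center-touching sub-quadrants mostly red, the other 12 mostly blue."""
    red, blue = quantize((255, 0, 0)), quantize((0, 0, 255))
    if len(root.children) != 4:
        return False
    for q, child in root.children.items():
        if len(child.children) != 4:
            return False
        for sq, grand in child.children.items():
            target = red if sq == INNER[q] else blue
            if grand.hist.get(target, 0.0) < threshold:
                return False
    return True
```

For a 64x64 image the tree has three levels (64, 32 and 16 pixels on a side), so the query only reads 16 precomputed histograms instead of scanning pixels.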
shape
Assume the images have pure colors and distinct shapes, like typical clip art. Such images can be segmented into color regions, each containing a connected set of points that all share the same color. For each segment you can then compute properties such as color, area, elongation, and centrality, and answer queries like "find all images with two blue circles." (p74)
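A minimal sketch of this segmentation, assuming 4-connected flood fill for the connected regions. The property definitions are assumptions chosen for illustration: elongation as the bounding-box aspect ratio, centrality as the centroid's distance from the image center (0 at the center, 1 at a corner), and a "circle" test based on how much of its bounding box a region fills (about pi/4 for a disk).

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

@dataclass
class Region:
    color: Pixel
    area: int           # number of pixels
    elongation: float   # long side / short side of the bounding box (1.0 = not elongated)
    fill: float         # area / bounding-box area (about 0.785 for a large disk)
    centrality: float   # 0 at the image center, 1 at a corner

def segment(img: Image) -> List[Region]:
    """Split the image into 4-connected same-color regions and measure each one."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    cy0, cx0 = (h - 1) / 2, (w - 1) / 2
    half_diag = math.hypot(cy0, cx0) or 1.0
    regions = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            color = img[sy][sx]
            seen[sy][sx] = True
            queue, pts = deque([(sy, sx)]), []
            while queue:  # breadth-first flood fill over same-colored neighbors
                y, x = queue.popleft()
                pts.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys, xs = [p[0] for p in pts], [p[1] for p in pts]
            bh, bw = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
            my, mx = sum(ys) / len(pts), sum(xs) / len(pts)
            regions.append(Region(color, len(pts), max(bh, bw) / min(bh, bw),
                                  len(pts) / (bh * bw), math.hypot(my - cy0, mx - cx0) / half_diag))
    return regions

def count_round(regions: List[Region], color: Pixel,
                max_elong: float = 1.2, fill_range: Tuple[float, float] = (0.6, 0.85)) -> int:
    """Hypothetical 'circle' test: not elongated, and filling roughly pi/4 of its bounding box."""
    lo, hi = fill_range
    return sum(1 for r in regions
               if r.color == color and r.elongation <= max_elong and lo <= r.fill <= hi)
```

The query "two blue circles" then becomes `count_round(segment(img), (0, 0, 255)) == 2`; a blue square fills its whole bounding box (fill 1.0) and is not counted.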
face retrieval
The Media Lab has an eigenface database: each face is processed and described by 20 eigenfeatures, which together can represent any face. As transformations become more meaningful, they become more difficult to automate; completely automated image analysis is only feasible in small, controlled domains.
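The eigenface idea can be sketched with principal component analysis, assuming faces are already aligned, converted to grayscale and flattened into equal-length vectors. This is the standard SVD-based construction, not necessarily the Media Lab's exact pipeline; the function names are hypothetical, and `k = 20` mirrors the 20 eigenfeatures mentioned above.

```python
import numpy as np

def fit_eigenfaces(faces: np.ndarray, k: int = 20):
    """faces: (n_faces, n_pixels) array of flattened, aligned faces.
    Returns the mean face and the top-k eigenfaces as a (k, n_pixels) array."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are orthonormal principal directions of the centered data,
    # i.e. eigenvectors of its covariance matrix, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def describe(face: np.ndarray, mean: np.ndarray, eigenfaces: np.ndarray) -> np.ndarray:
    """Project a face onto the eigenfaces: its k-number description."""
    return eigenfaces @ (face - mean)

def nearest_face(query: np.ndarray, gallery_codes: np.ndarray,
                 mean: np.ndarray, eigenfaces: np.ndarray) -> int:
    """Index of the stored face whose description is closest to the query's."""
    q = describe(query, mean, eigenfaces)
    return int(np.argmin(np.linalg.norm(gallery_codes - q, axis=1)))
```

Retrieval compares 20-number descriptions rather than whole images; `k` cannot exceed the smaller of the number of faces and the number of pixels.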
video
Most systems treat video as a series of images, but this does not take advantage of the motion in the video. Video contains three kinds of motion information: one due to the movement of objects within a scene, one due to the motion of the camera, and one due to special effects.
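A minimal sketch of using motion rather than isolated frames, assuming grayscale frames stored as lists of rows of 0-255 ints. It computes, for each pair of consecutive frames, the fraction of pixels whose intensity changed by more than a hypothetical threshold. As a rough heuristic (an assumption, not a method from the paper), a small localized fraction suggests object motion, while a fraction near 1 suggests camera motion, a cut, or an effect; simple differencing cannot reliably tell the three kinds apart.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities

def changed_fraction(prev: Frame, curr: Frame, threshold: int = 25) -> float:
    """Fraction of pixels whose intensity changed by more than `threshold`."""
    total = changed = 0
    for row_p, row_c in zip(prev, curr):
        for a, b in zip(row_p, row_c):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0

def motion_profile(frames: List[Frame], threshold: int = 25) -> List[float]:
    """One changed-pixel fraction per pair of consecutive frames."""
    return [changed_fraction(a, b, threshold) for a, b in zip(frames, frames[1:])]
```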
the query
PICQUERY is a language for formulating image queries. Another approach is query by example, which can be done with a kind of drawing system: the user draws an example image, which can then be changed to further adjust the query. A good query language should include the following:
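Query by example can be sketched by ranking database images by how similar their color histograms are to the histogram of the user's example. This swaps in histogram intersection as the similarity measure, with coarse RGB binning; neither is taken from the paper, and the names are hypothetical. Refining the query means editing the example image and ranking again.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Bin = Tuple[int, int, int]

def histogram(img: Image, levels: int = 4) -> Dict[Bin, float]:
    """Normalized histogram over coarse RGB bins (`levels` per channel)."""
    counts = Counter(tuple(c * levels // 256 for c in p) for row in img for p in row)
    total = sum(counts.values()) or 1
    return {b: n / total for b, n in counts.items()}

def intersection(h1: Dict[Bin, float], h2: Dict[Bin, float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(v, h2.get(b, 0.0)) for b, v in h1.items())

def query_by_example(example: Image, database: Dict[str, Image],
                     top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, most similar first."""
    q = histogram(example)
    scored = [(name, intersection(q, histogram(img))) for name, img in database.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

Because only color proportions are compared, this ignores where colors appear; combining it with the quadtree histograms above would make the comparison spatial.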