Unrestricted recognition of 3D objects for robotics using multilevel triplet invariants.

Publication: AI Magazine

Publication Date: 22-JUN-04

Author: Granlund, Gosta H. ; Moe, Anders

COPYRIGHT 2004 American Association for Artificial Intelligence

A method for unrestricted recognition of three-dimensional objects was developed. By unrestricted, we mean that recognition is done independently of object position, scale, orientation, and pose, against a structured background. It does not assume any preceding segmentation and allows a reasonable degree of occlusion. The method uses a hierarchy of triplet feature invariants, which are at each level defined by a learning procedure. In the feedback learning procedure, percepts are mapped on system states corresponding to manipulation parameters of the object. The method uses a learning architecture with channel information representation. This article discusses how objects can be represented. We propose a structure to deal with object and contextual properties in a transparent manner.

**********

Recognition of objects is the most fundamental mechanism of vision. By object, we mean some entity that is discernible from other entities, which includes the everyday notion of an object as something you can hold in your hand; however, it goes far beyond that. A certain object can contain parts, which, for convenience's sake, we also want to denote as objects. Objects can consequently group to build up entities to which we assign certain properties, and for that reason, we want to give such a group the collective notion of an object.

An extension of the discussion along these lines has the consequence that nearly anything can be viewed as an object--a line, a wheel, a bicycle, a room, a house, a landscape, and so on. This level transparency is probably inevitable as well as practical. The segmentation of the external world varies depending on the scale at which it is observed and what aspects are at the focus of attention.

Still, we have the idea that it should be possible to sort objects into categories. The difficulty with the generation of a taxonomy is the ambition to obtain a conceptually manageable and simplified structure for something that is in reality a complex, multidimensional structure. In practice, for a given context, certain aspects are more crucial than others, and these determine the category to which we finally assign an object.

This irreducible multidimensional characteristic of objects determines how we have to approach the issue of what is one object and what is another: There are simply different ways to define an object, depending on the setting.

Object-Centered versus View-Centered Representation

Over the years, there has been increasing interest in research on invariants (Jacobsson and Wechsler 1982; Kanatani 1987; Koenderink and van Doorn 1975; Mundy and Zisserman 1992). Most of the methods proposed treat invariants as geometric properties, the rules for which should be input into the system. Theoretical investigation of invariance mechanisms is undoubtedly an important task because it gives clues to possibilities and limitations. It is not likely, however, that more complex invariants can be programmed into a system. The implementation of such invariance mechanisms in systems will have to be made through learning.

An important application of invariant representation is object description. To position ourselves for a thorough analysis, we look at two traditional major lines of approach that have been used for object description: (1) object-centered representation and (2) view-centered representation (Granlund 1999b; Riesenhuber and Poggio 2000a, 2000b) (figure 1).

[FIGURE 1 OMITTED]

From the real object, a number of measurements or projections are generated. They can be images of the object taken from different angles (figure 1a). From these measurements, we can proceed along one of two different tracks.

One of the tracks leads to the object-centered representation that combines these measurement views into some closed-form mathematical object (Grimson 1990) (figure 1b).

The image appearance of an instance of a particular orientation of the object is then obtained using separate projection mappings.

A view-centered representation, however, combines a set of appearances of an object, without trying to make any closed-form representation (Beymer and Poggio 1996; Poggio and Edelman 1990; Ullman and Basri 1991) (figure 1c).

Object-Centered Representation

The basic motive of the object-centered representation is to produce a representation that is as compact and as invariant as possible. It generally produces a closed-form representation, which can subsequently be subjected to interpretation. Thus, no unnecessary information is included about details on how the information was derived. A central idea is that matching to a reference should be easier because the object description has no viewpoint-dependent properties. A particular view or appearance of the object can be generated using appropriate projection methods.
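As a rough illustration of this idea, the sketch below keeps a single viewpoint-independent model and derives each appearance from it through a separate projection step. The cube model, the function names, and the choice of an orthographic projection are assumptions made for the example, not details taken from the article.

```python
import math

# Hypothetical object-centered model: one viewpoint-independent set of 3D points
# (the corners of a unit cube centered at the origin).
CUBE = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]


def rotate_y(point, angle):
    """Rotate a 3D point about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project_view(model, angle):
    """Generate one appearance: rotate the model, then drop depth (orthographic projection)."""
    return [rotate_y(p, angle)[:2] for p in model]


# Two appearances derived from the same stored model.
front = project_view(CUBE, 0.0)
oblique = project_view(CUBE, math.pi / 4)
```

The model itself carries no viewpoint information; every view is recomputed on demand, which is what makes the stored description compact.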

We can view the compact invariant representation of orientation as vectors and tensors (Granlund and Knutsson 1995) as a simple variety of object-centered representation. Over a window of a data set, a set of filters is applied, producing a component vector of a certain dimensionality. The components of the vector tend to be correlated for phenomena of interest, which means that they span a lower-dimensional subspace. The components can consequently be mapped into some mathematical object of a lower dimensionality to produce a more compact and invariant representation, that is, a vector or a tensor (Granlund and Knutsson 1995).
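A minimal sketch of such a mapping follows. It swaps in a simpler technique than the filter sets of the cited work: it sums outer products of 2D gradient vectors over a window, which collapses many measurements into three tensor components that do not change when a gradient flips sign. The function names and sample values are assumptions for illustration only.

```python
import math


def orientation_tensor(gradients):
    """Sum the outer products g g^T of 2D gradient vectors over a window.

    Returns the three independent components (txx, txy, tyy) of the
    symmetric 2x2 tensor.
    """
    txx = sum(gx * gx for gx, gy in gradients)
    txy = sum(gx * gy for gx, gy in gradients)
    tyy = sum(gy * gy for gx, gy in gradients)
    return txx, txy, tyy


def dominant_orientation(tensor):
    """Angle in radians (defined modulo pi) of the dominant gradient direction."""
    txx, txy, tyy = tensor
    return 0.5 * math.atan2(2.0 * txy, txx - tyy)


# Hypothetical window of gradients, all along the 45-degree diagonal.
window = [(1.0, 1.0), (-2.0, -2.0), (0.5, 0.5)]
# The same structure with every gradient reversed (for example, inverted contrast).
flipped = [(-gx, -gy) for gx, gy in window]

same = orientation_tensor(window) == orientation_tensor(flipped)
angle = dominant_orientation(orientation_tensor(window))
```

Here `same` is true because each outer product is unchanged under sign reversal, and `angle` comes out as a quarter of pi for this window.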

A drawback of the object-centered representation is that it requires a preconceived notion about the object to ultimately be found, its mathematical and representational structure, and the way in which the observed percepts should be integrated to support the hypothesis of the postulated object. It requires that the expected types of relations are predefined and already existing in the system and that an external system keeps track of the development of the system, such as the allocation of storage and the labeling of information. Such a preconceived structure is not well suited for self-organization and learning. It requires an external entity that can "observe labels and structure" and take action on this observation. It is a classical declarative representation rather than a procedural one.

View-Centered Representation

In a view-centered representation, no attempt is made to generalize the representation of the entire object into some closed form. The different parts are kept separate but linked together using the states or responses, which correspond to or generate the particular views. The result is a representation that is not nearly as compact or invariant. However, it tells what state of the system is associated with a particular percept state. A view-centered representation in addition has the advantage of being potentially self-organizing. This property will be shown to be crucial for the development of a learning percept-action structure. There are indications from perceptual experiments that the view-centered representation is the one used in biological visual systems (Riesenhuber and Poggio 2000a, 2000b).

An important reason for the view representation is that it allows an interpretation rather than a geometric description of an object that we want to deal with. By interpretation, we denote links to actions that are related to the object and information about how the object transforms under the actions.
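The link between percept states and system states can be sketched as a small lookup structure. The example below is only a stand-in: it uses a nearest-neighbor search over stored views, not the learning architecture of the article, and the feature vectors, pose angles, and function name are all hypothetical.

```python
import math

# Hypothetical stored views: each percept (a small feature vector) stays linked
# to the system state (here, a pose angle in degrees) that produced it.
views = [
    ((1.0, 0.0, 0.2), 0),
    ((0.7, 0.7, 0.2), 45),
    ((0.0, 1.0, 0.2), 90),
]


def associated_state(percept, stored_views):
    """Return the state linked to the stored view closest to `percept`."""
    _, state = min(stored_views, key=lambda view: math.dist(view[0], percept))
    return state


# A new percept is interpreted through the state of its nearest stored view.
state_a = associated_state((0.1, 0.9, 0.25), views)
state_b = associated_state((0.6, 0.8, 0.2), views)
```

No closed-form object model is built; adding a new view only means appending another percept-state pair, which is one way to read the claim that the representation is potentially self-organizing.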

Combination of Representation Properties

An object-centered representation is, by design, as invariant as possible with respect to contextual specificities. It has the stated advantage of being independent of the observation angle, distance, and so on. This independence has, however, the consequence of cutting off all links that it has to specific contexts or response procedures that are related to that context or view.

The generation of an invariant representation implies discarding information that is essential for the system to act using the information.

It is postulated that we can represent objects as invariant combinations of percepts and responses, suggesting that we will start out from the view-centered representation of objects.

The structure that results from the preceding model will be of type frames-within-frames, where individual transformations of separate objects are necessary within a larger scene frame. See figure 2 for an intuitive illustration of this frames-within-frames as a tree. The ability to handle local transformations is absolutely necessary and would not be possible with a truly iconic view representation. It is postulated that the frames-within-frames partitioning is isomorphic with the response map structure. In this way, the response...
