This report describes the main features of a view-based model of object recognition. The model does not attempt to account for specific cortical structures; it tries to capture general properties to be expected in a biological architecture for object recognition. The basic module is a regularization network (RBF-like; see Poggio and Girosi, 1989; Poggio, 1990) in which each of the hidden units is broadly tuned to a specific view of the object to be recognized. The network output, which may be largely view independent, is first described in terms of some simple simulations. The following refinements and details of the basic module are then discussed: (1) some of the units may represent only components of views of the object (the optimal stimulus for such a unit, its “center,” is effectively a complex feature); (2) the units' properties are consistent with the usual description of cortical neurons as tuned to multidimensional optimal stimuli and may be realized in terms of plausible biophysical mechanisms; (3) in learning to recognize new objects, preexisting centers may be used and modified, but new centers may also be created incrementally so as to provide maximal view invariance; (4) modules are part of a hierarchical structure: the output of one network may be used as one of the inputs to another, in this way synthesizing increasingly complex features and templates; (5) in several recognition tasks, in particular at the basic level, a single center using view-invariant features may be sufficient.
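The basic module described above can be illustrated with a minimal sketch. It is not the report's implementation. It assumes Gaussian basis functions, one hidden unit per stored view, a target output of 1 for every stored view, and weights obtained by exact interpolation. The toy object, the orthographic `project_view` helper, the `ViewModule` class, and the value of `sigma` are all hypothetical choices made for illustration.

```python
import math

def project_view(points3d, angle):
    """Rotate 3D points about the vertical axis and project orthographically.

    Returns a flat vector [x1, y1, x2, y2, ...] standing in for a 2D view.
    """
    c, s = math.cos(angle), math.sin(angle)
    view = []
    for x, y, z in points3d:
        view.extend([c * x + s * z, y])
    return view

def gaussian(x, center, sigma):
    """Broadly tuned unit: response decays with distance from its center."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

class ViewModule:
    """Hypothetical RBF-like module: one hidden unit per stored view."""

    def __init__(self, centers, sigma):
        self.centers = centers
        self.sigma = sigma
        # Exact interpolation (assumed): the output should be 1 at every stored view.
        G = [[gaussian(a, b, sigma) for b in centers] for a in centers]
        self.weights = solve(G, [1.0] * len(centers))

    def output(self, view):
        """Weighted sum of the hidden-unit responses to the input view."""
        return sum(w * gaussian(view, t, self.sigma)
                   for w, t in zip(self.weights, self.centers))

# Toy object (assumed): four labeled 3D feature points.
obj = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.5, 0.5)]
angles = [math.radians(a) for a in (-60, -30, 0, 30, 60)]
stored_views = [project_view(obj, a) for a in angles]
module = ViewModule(stored_views, sigma=1.0)

novel = project_view(obj, math.radians(15))  # a view between two stored ones
print(module.output(stored_views[2]), module.output(novel))
```

By construction the output equals 1 at each stored view and falls toward 0 for inputs far from all centers. How flat the response stays between stored views depends on `sigma` and on the spacing of the views, so the degree of view independence in this sketch is a property of those assumed settings rather than a general result.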
Modules of this type can deal with recognition of specific objects, for instance, a specific face under various transformations such as those due to viewpoint and illumination, provided that a sufficient number of example views of the specific object are available. An architecture for 3D object recognition, however, must cope, to some extent, even when only a single model view is given. The main contribution of this report is an outline of a recognition architecture that deals with objects of a nice class undergoing a broad spectrum of transformations (due to illumination, pose, expression, and so on) by exploiting prototypical examples. A nice class of objects is a set of objects with sufficiently similar transformation properties under specific transformations, such as viewpoint transformations. For nice object classes, we discuss two possibilities: (1) class-specific transformations are applied to a single model image to generate additional virtual example views, thus allowing some degree of generalization beyond what a single model view could otherwise provide; (2) class-specific, view-invariant features are learned from examples of the class and used with the novel model image, without an explicit generation of virtual examples.
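Possibility (1) can be sketched under one strong assumption that this summary does not itself state: the novel object is a linear combination of the prototypes, and the transformation (here a rotation followed by orthographic projection) is linear. In that case the coefficients estimated from the single model view carry over unchanged to the new pose. The helper names `least_squares_coeffs` and `virtual_view`, the toy prototypes, and the 30 degree rotation are hypothetical.

```python
import math

def project_view(points3d, angle):
    """Rotate about the vertical axis, then project orthographically."""
    c, s = math.cos(angle), math.sin(angle)
    view = []
    for x, y, z in points3d:
        view.extend([c * x + s * z, y])
    return view

def least_squares_coeffs(basis, target):
    """Coefficients a minimizing ||sum_j a[j] * basis[j] - target||^2."""
    n = len(basis)
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    # Augmented normal equations: Gram matrix of the basis, then basis . target.
    M = [[dot(basis[i], basis[j]) for j in range(n)] + [dot(basis[i], target)]
         for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (M[i][n] - sum(M[i][k] * a[k] for k in range(i + 1, n))) / M[i][i]
    return a

def virtual_view(model_view, proto_model_views, proto_new_views):
    """Generate a virtual view of a novel object from a single model view.

    Assumes linearity: coefficients found in the model pose are reused
    to combine the prototypes' views in the new pose.
    """
    a = least_squares_coeffs(proto_model_views, model_view)
    dim = len(proto_new_views[0])
    return [sum(a[j] * proto_new_views[j][d] for j in range(len(a)))
            for d in range(dim)]

# Two toy prototypes (assumed), each a set of three corresponding 3D points.
p1 = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5), (0.0, 0.0, 1.0)]
p2 = [(0.5, 0.2, 1.0), (-0.3, 1.0, 0.0), (0.2, -0.4, 0.8)]
# Novel object constructed to lie in the span of the prototypes.
novel = [tuple(0.3 * u + 0.7 * v for u, v in zip(q1, q2)) for q1, q2 in zip(p1, p2)]

model_pose, new_pose = 0.0, math.radians(30)
proto_model = [project_view(p, model_pose) for p in (p1, p2)]
proto_new = [project_view(p, new_pose) for p in (p1, p2)]

virtual = virtual_view(project_view(novel, model_pose), proto_model, proto_new)
actual = project_view(novel, new_pose)
error = max(abs(u - v) for u, v in zip(virtual, actual))
print(error)
```

In this constructed case the virtual view matches the true rotated view up to rounding, because the novel object was built inside the span of the prototypes. For real objects that only approximately satisfy the assumption, the virtual view would be an approximation whose quality depends on how well the prototypes represent the class.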