

Image Understanding and Robotics Research

at Columbia University

John R. Kender¹

Peter K. Allen

Terrance E. Boult

Department of Computer Science

Columbia University, New York, NY 10027

0 Introduction

The Vision and Robotics Laboratory at Columbia has attained some measure of maturity. Three professors of computer science (John Kender, Peter Allen, and Terry Boult) and one research computer scientist (Hussein Ibrahim) lead a staff of two research programmers (Earl Smith, lab manager, and Lisa Brown) and a body of approximately 15 graduate students (listed below). Recent equipment grants have augmented our VAX, Sun, Grinnell, and Puma facility with a Masscomp real-time processor, and will shortly add an IBM arm and a real-time pyramid-based Aspex image processor.

Our research investigations reflect the principal interests of the four faculty members. Several of these investigations are joint projects with non-vision/robotics faculty members of the Department of Computer Science; such faculty are listed below, together with the students and staff associated with the effort. Briefly, our interests range from vision through parallel algorithms to robotics, and are described in the sections that follow.


¹This work was supported in part by the Defense Advanced Research Projects Agency under contract N00039-84-C-0165.

1.1 Generalized Stereo: Spectral/Polarization

We have reexamined the physics and the assumptions underlying the phenomena of both binocular and photometric stereo, and have succeeded in generalizing them to a large class of methods for determining surface orientation, which we call "physical stereo" [21]. The name derives from the insight that virtually any measurable image quantity which has a determinable variation described by laws of physics can serve as the basis for the recovery of analytic and geometric surface properties, including depth, gradient, and curvature.

Differences in observed angle under unchanging illumination but changing viewpoint (binocular stereo) are related to surface depth, and differences in observed irradiance under changing illuminant position but unchanging viewpoint (photometric stereo) are related to surface orientation. However, both can be generalized to other observations: differences in irradiance under changing wavelength or polarization can be related to surface orientation, too, as can certain differences in the Fourier domain that arise under changing object position and rotation. Any observed changes due to variations in surface temperature or electrical current flow can be exploited also, given certain reflectance models. Even changes due to variations in the propagating medium ("mist stereo") are quantifiably useful.
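For concreteness, the following is a minimal Python sketch (not from the paper) of classical Lambertian photometric stereo, the baseline method that the physical-stereo framework generalizes. It assumes three or more images taken from a fixed viewpoint under known, distant point light sources and a Lambertian surface with unknown albedo.

import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover unit surface normals and albedo from k >= 3 images.

    images:     array of shape (k, h, w), gray-level images, fixed viewpoint
    light_dirs: array of shape (k, 3), unit vectors toward each light source
    returns:    normals (h, w, 3) and albedo (h, w)
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                  # (k, h*w) stacked intensities
    L = np.asarray(light_dirs, dtype=float)    # (k, 3) lighting matrix
    # Lambertian model: I = L @ (albedo * n); solve per pixel for
    # g = albedo * n in the least-squares sense.
    g, *_ = np.linalg.lstsq(L, I, rcond=None)  # (3, h*w)
    albedo = np.linalg.norm(g, axis=0)
    normals = np.where(albedo > 1e-8, g / np.maximum(albedo, 1e-8), 0.0)
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)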

We have first generalized and extended the method of photometric stereo to spectral stereo and polarization stereo, two related methods which can be used singly or in combination with each other [22]. Spectral and polarization stereo methods consider the intersection of equi-reflectance curves in gradient space, varying the wavelength and/or the polarization of the light emanating from a single light source while the imaging geometry is held fixed. In order to capture the observable effects of varying both of these physical properties, we replaced the ubiquitous Lambertian reflectance model with a more realistic one in which they appear as parameters: the Torrance-Sparrow model. The model is applicable to a wide variety of isotropically rough surfaces ranging from metals to paper.

An analytic study, backed by simulations of the error behavior, resulted in the following predictions on experimental accuracy, which are in the process of being tested. Polarization stereo should work best with dielectrics (for example, magnesium oxide), which are virtually insensitive to spectral variations; roughly speaking, the situation is reversed for conductors (for example, aluminum). For polarization stereo, best results should be obtained when two separate observations are taken with light polarized parallel to the plane of incidence and perpendicular to it, respectively. As in photometric stereo, precision is paramount, since the methods are sensitive to error. We have discussed two methods for solving for the surface gradient: either directly by intersecting equi-reflectance curves for two different light qualities, or indirectly by using the two observations to decompose the reflectance function into its specular and diffuse components, which are themselves then intersected.

Unlike photometric stereo, a third independent equi-reflectance curve is impossible in spectral or polarization stereo, due to the inherent symmetry of isotropic reflectance functions. Thus, the method results in an intrinsic ambiguity in the surface orientation. However, by using a variable, asymmetric aperture over the single light source, we calculated that it should be possible to simulate the effect of multiple light sources in the manner of pure photometric stereo.
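The gradient-space intersection step can be illustrated with a simple numerical sketch, which is not the paper's implementation: given two reflectance maps measured under different light qualities and the two observed intensities at a pixel, the surface gradient is taken to be the point in gradient space where both equi-reflectance curves agree. The reflectance-map functions below are illustrative placeholders; in the work described above they would be derived from the Torrance-Sparrow model.

import numpy as np

def intersect_equireflectance(R1, R2, I1, I2, p_range=(-3, 3), n=601):
    """Return the (p, q) minimizing the combined residual of both curves.

    R1, R2: callables R(p, q) giving predicted intensity for each light quality
    I1, I2: observed intensities at the pixel under each light quality
    """
    p = np.linspace(*p_range, n)
    P, Q = np.meshgrid(p, p)                       # candidate gradients
    residual = (R1(P, Q) - I1) ** 2 + (R2(P, Q) - I2) ** 2
    idx = np.unravel_index(np.argmin(residual), residual.shape)
    return P[idx], Q[idx]

# Illustrative (not physically derived) reflectance maps:
# R1 = lambda p, q: 1.0 / np.sqrt(1 + p**2 + q**2)                    # frontal light
# R2 = lambda p, q: (1 - p) / np.sqrt(2) / np.sqrt(1 + p**2 + q**2)   # oblique light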

1.2 Generalized Stereo: Surface Curvature

In related work, we demonstrated how photometric stereo can be used to directly compute principal curvatures at each point on an arbitrary smooth surface [23]. This obviates the existing technique of explicitly differentiating the normal map derived from standard photometric stereo. Further, the method does not require auxiliary assumptions about the underlying surface. What is exploited instead is the image gradient: additional image information heretofore neglected.

We demonstrated that the availability of surface gradient and image gradient information allows the calculation of the local image Hessian, which gives complete knowledge about the second-order rate of change of a smooth surface with respect to the image plane. We further derived that from the image Hessian it is straightforward to compute the curvature matrix for any visible point on the surface. Principal curvatures and their directions are given by the eigenvalues and eigenvectors of this matrix, and the Gaussian curvature is given by its determinant. If needed, lines of curvature are obtainable in the image by finding integral curves through the projected curvature vector field. An interesting observation is that the sign of the determinant of the image Hessian is equal to the sign of the Gaussian curvature; thus, a qualitative (but crude) segmentation of the surface into curvature classes is possible directly from image observables.
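The curvature computation can be made concrete with a short sketch based on standard Monge-patch differential geometry; this is an assumption-laden stand-in, not the authors' exact derivation. Given the surface gradient (p, q) and the Hessian H of the depth function at a point, the curvature matrix follows from the first and second fundamental forms, and the principal and Gaussian curvatures from its eigenvalues and determinant.

import numpy as np

def curvature_from_hessian(p, q, H):
    """Return (principal curvatures, Gaussian curvature) at one surface point.

    p, q: surface gradient (dz/dx, dz/dy), e.g. from photometric stereo
    H:    2x2 Hessian [[z_xx, z_xy], [z_xy, z_yy]] of the depth function
    """
    H = np.asarray(H, dtype=float)
    g = 1.0 + p * p + q * q
    first = np.array([[1 + p * p, p * q],
                      [p * q, 1 + q * q]])        # first fundamental form
    second = H / np.sqrt(g)                       # second fundamental form
    shape_op = np.linalg.solve(first, second)     # curvature (Weingarten) matrix
    k1, k2 = np.real(np.linalg.eigvals(shape_op)) # principal curvatures (real)
    K = np.linalg.det(H) / g ** 2                 # Gaussian curvature
    # Note: sign(K) == sign(det(H)), matching the observation above.
    return (k1, k2), K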

Our current research in this area attempts to further explore physical stereo and its relation to differential geometry. Perhaps it will culminate in a unified paradigm for low-level vision.

1.3 Approximation Methods in Stereo

We have succeeded, under limited assumptions, in quantifying the errors that occur in the three-space measurements derived under binocular imaging. The result was reassuring: in its most simplified form, it confirmed existing but previously unexamined practice.

When two corresponding pixels of a stereo pair of real images are back-projected into three-space, the resulting rays often do not intersect, due to alignment or calibration error. The situation is exaggerated if the positions of the pixels themselves are uncertain. Then, the uncertainty of a position in the image plane translates into a cone of uncertainty in three-space, and the intersection of the two cones is a badly-behaved volume of uncertainty in three-space. What point in this volume is the most likely candidate for the originating stimulus in the images?

Through standard convolution techniques it is theoretically possible to determine the a priori probabilities of each such point in three-space; however, in the general case, the actual convolution is intractable. Simplifying the problem somewhat, we assumed that the image observations suffer from position errors with a Gaussian distribution; the cones of uncertainty then have Gaussian cross sections. Although the convolution itself remains intractable, if one solves only for the space of most likely three-space locations, an exact closed-form solution is obtained. The locus is determined solely by that component of the image disparity vector that lies parallel to the camera baseline: it is as if the back-projected lines were truly coplanar to begin with. Thus the typical practice of ignoring off-baseline components of the disparity vector is justified, at least to the extent that the assumed Gaussian model is valid.

The method easily extends to multiple cameras or multiple views.
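A minimal sketch of the practice this result justifies, for a rectified camera pair (parameter names are illustrative, not taken from the paper): depth is computed from the baseline-parallel disparity alone, and the off-baseline (vertical) offset between the matched pixels is discarded.

import numpy as np

def triangulate_baseline_only(xl, yl, xr, yr, focal, baseline):
    """Back-project a matched pixel pair to a 3-D point in the camera frame.

    (xl, yl), (xr, yr): matched pixel coordinates in the left/right images,
                        expressed relative to the principal point
    focal:              focal length in pixels
    baseline:           camera separation along the image x-axis
    """
    d = xl - xr                       # baseline-parallel disparity only;
                                      # (yl - yr) is ignored as off-baseline error
    Z = focal * baseline / d
    X = xl * Z / focal
    Y = 0.5 * (yl + yr) * Z / focal   # average the uncertain vertical coordinates
    return np.array([X, Y, Z])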

1.4 Non-Rigid Body Tracking

In work originally stimulated by an interest in character recognition [18, 20], we are investigating the problem of inexpensive motion tracking of non-rigid bodies over long sequences of frames (100 or more). Our current technique assumes an initial segmentation of the image into regions that correspond to objects. To attack the fundamental problem of computational cost, we found it useful to discretize both the spatial and the intensity dimensions. The approach is then one of coarse-fine control along these two orthogonal dimensions.

Elements in the coarse spatial representations were first labeled with the label of the most prominent object covering them. In successive frames, the algorithm would look for the nearest neighboring label with similar intensity data. In cases of ambiguity, the algorithm had to search in neighboring regions for the proper correspondence; when found, the appropriate label would be propagated.
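A hedged sketch of the coarse labeling and propagation step follows; block size, bin count, and search radius are illustrative choices, not the paper's values. The image is reduced to per-block quantized intensities, and each block in a new frame inherits the label of the nearest block from the previous frame with a matching coarse intensity.

import numpy as np

def coarsify(frame, block=8, bins=8):
    """Return per-block mean intensity quantized into `bins` levels."""
    h, w = frame.shape
    blocks = frame[:h - h % block, :w - w % block].reshape(
        h // block, block, w // block, block).mean(axis=(1, 3))
    return np.digitize(blocks, np.linspace(blocks.min(), blocks.max(), bins))

def propagate_labels(prev_labels, prev_coarse, new_coarse, radius=2):
    """Carry region labels from the previous frame onto the new coarse grid."""
    H, W = new_coarse.shape
    new_labels = np.zeros_like(prev_labels)       # 0 marks unresolved blocks
    for i in range(H):
        for j in range(W):
            best, best_dist = 0, None
            # search a small spatial neighborhood for a similar-intensity block
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < H and 0 <= jj < W and \
                            prev_coarse[ii, jj] == new_coarse[i, j]:
                        dist = di * di + dj * dj
                        if best_dist is None or dist < best_dist:
                            best, best_dist = prev_labels[ii, jj], dist
            new_labels[i, j] = best
    return new_labels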

The method worked well even for objects that moved quickly and changed shape, as long as sufficiently high contrast was maintained: that is, as long as their intensities were sufficiently different from those of nearby objects that the coarse intensity data was a sufficient discriminator. In general, it also worked well for the interior of large regions. However, there was occasional deterioration at the common borders of regions with similar intensities. Without additional shape constraints or temporal processing, this appears unavoidable.

We are exploring additional, inexpensive representations that can be used to establish the best match in one frame from information derived from a short sequence of previous frames. This appears to involve a scale-space filtering of the objects that were initially segmented; further, the number of dimensions we coarsify will likely grow to three, to include the dimension of time.

Aside from obvious real-time applications, this work has direct impact on the automatic colorization of movies.

2 Middle-level Vision

We have found the middle levels of vision a rich ground. In addition to the mathematical analyses common to this level, we are able to report on the successful application of techniques of cognitive psychology to a difficult methodological problem, on the preliminary implementation of a middle-level system that fuses multiple conflicting knowledge sources, and on a new method for shape recovery that has industrial vision applications.

2.1 Surface Reconstruction: Optimality

We have explored the complexity of the task of reconstructing smooth surfaces from sparse depth data, approaching it in two novel ways. The first approach is computational, using the tools and techniques of information-based complexity [3, 6]. After generalizing the traditional formulation of the problem to the two related problems of surface interpolation and surface approximation, we stated the optimal abstract mathematical solutions and discussed four realizations of them. Two of these four approaches are non-traditional and have been previously overlooked, but they permit the use of functions with fractional numbers of derivatives.

We compared all four approaches, calculated their computational complexity, pointed out ways to optimize performance, and detailed the image-taking circumstances under which each approach was most appropriate [4]. The results were extensively discussed, and documented with tables of timings too voluminous to summarize here. Within each of the approaches, we demonstrated that the exact mathematical solution depends critically on tunable parameters that define the meaning of "smooth". This led us to the conclusion that mathematics alone cannot provide powerful enough criteria for evaluating the success of a surface approximation method.
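As a point of reference only, the following sketch reconstructs a surface from sparse depth samples with a tunable smoothness parameter, using SciPy's thin-plate-spline interpolator as a stand-in for the approaches compared in the paper; the smoothing value is illustrative, and the routine is not one of the four realizations studied (requires SciPy 1.7 or later).

import numpy as np
from scipy.interpolate import RBFInterpolator

def reconstruct_surface(xy, z, grid_size=64, smoothing=1e-3):
    """Fit sparse depth samples and evaluate the surface on a regular grid.

    xy: (n, 2) sample locations in the image plane
    z:  (n,)   depth values at those locations
    """
    spline = RBFInterpolator(xy, z, kernel='thin_plate_spline',
                             smoothing=smoothing)   # smoothing tunes "smooth"
    gx, gy = np.meshgrid(np.linspace(xy[:, 0].min(), xy[:, 0].max(), grid_size),
                         np.linspace(xy[:, 1].min(), xy[:, 1].max(), grid_size))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    return spline(grid).reshape(grid_size, grid_size)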

Our current research attempts to use the tools of information-based complexity again, this time to attack the problem of optimal surface segmentation.

2.2 Surface Reconstruction: Psychology

Our second novel approach to the complexity of surface reconstruction was psychological: we designed experiments to determine empirically what human beings mean by "smooth". We devised two experiments: one to measure the detectability of discontinuities in one-dimensional curves, and one to rank the goodness of surface reconstruction under the methods investigated mathematically above [5, 7]. In the first experiment we established that human beings are blind to discontinuities in the second derivative unless they are large. The second experiment strongly suggested that for surfaces with discontinuities, the non-traditional approaches are considerably more accurate reconstructors, even though they perform independently of the value of the smoothness parameter.

2.3 Surface Reconstruction: Super-Quadrics

In a related effort, we have begun to investigate the utility of super-quadrics as a model for surfaces in image understanding. Although super-quadrics appear to be powerful tools for computer graphics, allowing complex solids and surfaces to be constructed and altered easily from a few interactive parameters, it is not immediately clear whether these graphics features are advantageous for the purposes of surface reconstruction.

We have begun experiments to recover some or all of the eleven super-quadric parameters directly from depth data. Noise in the data is unavoidable, and certain super-quadric parameters appear to be rather unstable when "run backwards". Thus, we have overconstrained the problem by using a pyramid-based relaxation scheme to quickly propagate local parameter estimates to neighbors.
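A hedged sketch of the recovery problem follows, fitting only the axis-aligned case (eight of the eleven parameters; rotation and the pyramid-based relaxation described above are omitted) by nonlinear least squares on the super-quadric inside-outside function.

import numpy as np
from scipy.optimize import least_squares

def superquadric_residuals(params, pts):
    """Residual of the inside-outside function; zero when a point lies on the surface."""
    a1, a2, a3, e1, e2, tx, ty, tz = params
    x = np.abs((pts[:, 0] - tx) / a1)
    y = np.abs((pts[:, 1] - ty) / a2)
    z = np.abs((pts[:, 2] - tz) / a3)
    f = (x ** (2 / e2) + y ** (2 / e2)) ** (e2 / e1) + z ** (2 / e1)
    return f ** (e1 / 2) - 1.0

def fit_superquadric(pts):
    """pts: (n, 3) surface points, e.g. from a range image."""
    center = pts.mean(axis=0)
    extent = pts.max(axis=0) - pts.min(axis=0)
    x0 = np.concatenate([extent / 2, [1.0, 1.0], center])   # sizes, shapes, translation
    lo = [1e-3] * 3 + [0.1, 0.1] + [-np.inf] * 3             # keep exponents well-behaved
    hi = [np.inf] * 3 + [2.0, 2.0] + [np.inf] * 3
    return least_squares(superquadric_residuals, x0, args=(pts,),
                         bounds=(lo, hi)).x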
