
• Feature extraction & object recognition

• Applications

I would like to express my sincere thanks to

Prof. Timothy Poston
Department of Mathematics
Pohang Institute of Science and Technology
P.O. Box 125

Pohang, Kyung Buk 790-330, Korea
Tel: +82-562-79-2052
Fax: +82-562-79-2799
E-mail: tim@vision.postech.ac.kr

who contributed the following material. Readers should note that many important Japanese research topics on computer vision were not presented here.

It has been said that if all a rat knew of rat psychology was the information in psychology textbooks, he would fail every social interaction he attempted with another rat. Similarly, if the processing of his visual input rested on current algorithms for vision, he would be safest to rely on his excellent sense of smell. Broadly speaking, most computer vision applications depend on an extremely predictable environment: "is that a nut or a bolt?" algorithms that depend on consistent lighting and would often report "bolt" for a severed finger. The highly stereotyped behavior of an animal adapted to cage life (and no longer viable in the wild) is richness itself compared to any manufactured system. Since back-of-the-envelope calculations suggest that the processing capacity of the current generation of supercomputers is up there with the nervous system of a housefly, it is a remarkable fact that progress is, in fact, being made in solving visual tasks far more interesting to humans than anything a fly can do.

This meeting was reasonably representative of the state of the art. For example, one Korean paper (Ref 1) at this meeting reported on a system for extracting car types and license plate numbers from camera images that is in place and working well in its limited universe of car and plate types. The problem of workaday character recognition is a much larger one in East Asia than in pure-alphabet countries (though even there decorative scripts, from Shatter to ornamental Arabic, make a universe too wild and varied for existing computer methods). A Japanese high school graduate is supposed to recognize about 2,000 Chinese characters; a Korean who knows only the phonetic script is functional but cannot read (for instance) most newspaper headlines. Identifying characters from a universe of thousands, even in a fixed typeface, is a qualitatively different problem from working with Western character sets. Just as with English writing, handwritten text has far more variation and consequent difficulty. Thus to achieve over 98% cumulative accuracy on a realistically large character set is not a small achievement.

This was done by two Japanese papers in radically different ways. One (Ref 2) used Fourier transforms of rectangular windows within a character to estimate how like a diagonal/vertical/etc. stroke that part of the character seemed, tested on 881 character categories from a standard database of handwritten characters. The other (Ref 3) worked on the cheap printing in a translated Isaac Asimov novel (processing it in about the time Asimov seems to need to write one), which involved 1,164 distinct characters. This paper used a more directly geometrical approach, searching for pieces of approximate straight line within the image, calculating their lengths, and so on. Many other methods are under development (some of which look unlikely ever to scale to a large character set with good reliability); this contention of innumerable ideas reflects the direct importance of the problem, its attraction to vision researchers as a problem universe of large but controlled size, and the lack of conceptual convergence in this area. There are so many uses for automated literacy that effort and resources will continue to pour in this direction, but it would be unwise at this time to place any bets on what method--existing or still to be developed--will finally dominate the field of character recognition.
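Neither paper's algorithm is given in detail above, but the flavour of the FFT-based approach (Ref 2), in which the spectral energy of a rectangular window serves as a proxy for the local stroke direction, can be sketched roughly as follows. The function, the window grid, the angle bins, and the nearest-mean classification mentioned at the end are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the method of Ref 2): per-window stroke-direction
# features from the 2D FFT of a binary character image.
import numpy as np

def stroke_direction_features(char_img, grid=4, n_bins=4):
    """char_img: 2D binary (H x W) array. Returns a (grid*grid*n_bins,)
    feature vector of per-window spectral energy binned by stroke orientation."""
    h, w = char_img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            win = char_img[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid].astype(float)
            spec = np.abs(np.fft.fftshift(np.fft.fft2(win)))
            cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
            spec[cy, cx] = 0.0                      # drop the DC term
            ys, xs = np.indices(spec.shape)
            # A straight stroke concentrates spectral energy perpendicular to itself,
            # so rotate the frequency-domain angle by 90 degrees to get the stroke angle.
            ang = (np.arctan2(ys - cy, xs - cx) + np.pi / 2) % np.pi
            hist = np.zeros(n_bins)
            for b in range(n_bins):
                mask = (ang >= b * np.pi / n_bins) & (ang < (b + 1) * np.pi / n_bins)
                hist[b] = spec[mask].sum()
            total = hist.sum()
            feats.extend(hist / total if total > 0 else hist)
    return np.asarray(feats)
```

Classification could then be as simple as a nearest-mean comparison of these vectors against per-class templates; that is only a stand-in for whatever classifier the 881-category experiment in Ref 2 actually used.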

In any meeting about computer analyses and decision-making, nowadays, one expects neural networks. At this conference there were five: using networks for identifying objects in an image ("Choose one out of Sky/Grass/Road/Tree/Car") (Ref 4), segmenting simple images ("Separate this stool sketch from the background sketch of floor and folding screen") (Ref 5), stereo matching (Ref 6), an image thinning method (Ref 7), and a classifier for polyhedra with up to eight faces, at most four meeting at a point (Ref 8). As is common for neural net research, the problems handled were quite small, and while directions for development were pointed out, there was no analysis of the way the network's necessary size and learning time would scale with the complexity of the problem. In most network problems, unfortunately, these scaling properties are abominably bad, so that the network "solution" is no better than a classical algorithm that takes exponential time or worse, except for the "learning" that reduces the load on human intelligence in creating the solution. Some of the papers here may scale usefully--some neural networks are proving useful in practical applications--but none of them address the question.
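The formulation in Ref 4 is not reproduced above, but the general shape of an MRF-based labelling energy (a per-pixel cost for each label plus a smoothness penalty between disagreeing neighbours) is easy to illustrate. The sketch below minimises such an energy with plain iterated conditional modes rather than a Hopfield network; the unary costs, the weight beta, and the label set are assumptions for the example, not values from the paper.

```python
# Rough illustration only (not Ref 4's formulation): ICM minimisation of a
# Potts-style MRF labelling energy.
import numpy as np

def icm_label(unary, beta=1.0, n_iters=10):
    """unary: (H, W, K) array of per-pixel costs for K labels.
    Minimises  E = sum_p unary[p, l_p] + beta * sum_{p~q} [l_p != l_q]
    by iterated conditional modes; returns an (H, W) integer label map."""
    H, W, K = unary.shape
    labels = unary.argmin(axis=2)               # start from the per-pixel best label
    for _ in range(n_iters):
        for y in range(H):
            for x in range(W):
                costs = unary[y, x].astype(float)   # copy of the K unary costs
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Potts smoothness: penalise disagreeing with each 4-neighbour.
                        costs += beta * (np.arange(K) != labels[ny, nx])
                labels[y, x] = costs.argmin()
    return labels
```

A Hopfield-style network attacks the same kind of energy by relaxing continuous neuron states (roughly one neuron per pixel-label pair) rather than sweeping discrete labels, which is exactly where the scaling questions raised above, network size and convergence time versus problem size, enter.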

The enormous range of methods applied to scene analysis (optical flow, modelling of the object seen and comparison of the image with prediction, fitting a distorted quadric surface, analysis of a moving 3D outline, shape from shading...) generously represented at this meeting reflects not only the immaturity of the field (as with character recognition) but almost certainly the multifaceted nature of the problem. The human vision system can respond "couple dancing!" to a grey-toned image, a line sketch, a half-second movie showing only points of light attached to dancers in the dark ... and thus solves its problems in multiple ways. This multiplicity is presumably in some sense necessary, as the evolutionary cost of evolving it cannot have been low. Complicated systems have many potential defects, so that many mutations could cripple them, and very few--at a given moment--improve their present working. The papers here represent normal progress in existing approaches to subproblems in the Great Problem of "What am I seeing?"--a number of papers that specialists will need to read, but nothing that starts a whole new approach, or represents a step toward the problem created by the multiplicity itself. Given that a robot fit to explore a rich environment will almost certainly need (like the mammalian brain) to use many submethods in visual analysis, how should it integrate the results? How should the computer/how does the brain represent the objects about which so much information arrives in conflicting formats? As each submethod becomes more powerful, the problem of integration or "sensor fusion" becomes more urgent.

Since major progress here would be a large step toward understanding the dynamics of consciousness, it is not a trivial problem. Not surprisingly, at this meeting there was no session on integrating the output of the descriptors for rigid shapes, faces, etc. discussed in the many papers on how to use camera images, range data, and so forth.

As one might expect, given the respective research populations and funding of Japan and South Korea, there were 47 papers from Japan against 33 from the host country, of which a certain number were "trial flights" by graduate students giving their first conference papers. In some cases, this was painfully obvious in the quality of the work as well as in the confidence of the presentation. The experience of placing work in the setting of a larger and more developed research effort will certainly strengthen Korean work in computer vision.

REFERENCES

1. "Recognition of Car Type and Extraction of Car Number Plate by Image Processing," Dong-Uk Cho, YoungLae Bae, and Young-Kyu Kang, Systems Engineering Research Institute/ Korea Institute of Science and Technology (SERI/KIST), Korea.

2. "Recognition of Handprinted Characters by FFT," Tadayoshi Shioyama and Akira Okumura, Kyoto Institute of Technology, Japan.

3. "An Experiment on Printed Japanese Character Recognition using a PC for the Braille Translation of Novel Books," Yasuhiro Shimada and Mitsuru Shiono, Okayama University of Science, Japan.

4. "Hopfield Net-Based Image Labelling with MRF-Based Energy Function," Byoong K. Ko and Hyun S. Yang, KAIST.

5. "Image Segmentation Using Neural Networks," Ao Guo-Li, Cui Yu-Jun, Masao Izumi, and Kunio Fukunaga, College of Engineering, University of Osaka Prefecture, Japan.

6. "Stereo Matching Using Neural Network of an Optimized Energy Function," Jun Jae Lee, Seok Je Cho, and Yeong Ho Ha, Kyungbuk National University, Korea.

7. "Automatic Construction of Image Transformation Processes Using Feature Selection Network," Tomoharu Nagao, Takeshi Agui, and Hiroshi Nagahashi, Tokyo Institute of Technology, Japan.

8. "3-D Polyhedral Object Recognition using Fast Algorithm of ThreeDimensional Hough Transform and Partially Connected Recurrent Neural Network," Woo Hyung Lee, Sung Suk Kim, Kyung Sup Park, and Soo Dong Lee, Ulsan University, Korea.

Appendix

RESEARCH ACTIVITIES AT KOREAN UNIVERSITIES/
RESEARCH INSTITUTES IN COMPUTER VISION

[Data collected by Prof. Joon H. Han, POSTECH, and Prof. Hyun S. Yang, KAIST]

KYUNGBUK NATIONAL UNIVERSITY

A. Research Areas

1. Computer Vision (stereo vision, pattern recognition, range image analysis, motion estimation)

2. Image Analysis (restoration, enhancement, edge extraction and thinning, segmentation, data compression)

3. Neural Network (pattern recognition, stereo vision, image analysis)

B. Projects (partial list)

1. 3D object recognition from 2D images

2. Development of shape recognition and synthesis technology by using image processing techniques

3. Integrated circuit (IC) layout pattern recognition by using image processing techniques

C. Facilities

1. Color image processing system (IBM PC/AT with color image processor, color CCD camera (512 x 512 x 8 bits), color monitor)

2. Pseudo color/BW image processing system (IBM PC/386 with ITI ljl series image processor, IBM PC/AT with ITEX-PC-Plus, color CCD camera (512 x 512 x 8 bits), B/W monitor)

3. Stereo vision system (IBM PC/AT with FG-100-AT frame grabber, two CCD cameras, B/W monitor)

4. Laser range scanner system (Technical Arts) (100X scanner, solid state camera (ICIDTEC), RCA monitor, laser power supply, Visual 500 terminal)

5. SUN 4/260C workstation (color graphic system, color monitor (1280 x 1024), digitizer tablet, plotter)

SEOUL NATIONAL UNIVERSITY

A. Research Areas

1. Image Coding [2nd generation coding, region based coding, texture analysis/synthesis, motion compensated coding, motion detector, target tracker (real-time)]

2. Computer Vision (low-level segmentation (color, B/W), shape matching (relaxation), polygonal approximation)

B. Facilities (Gould 8400 IP + 19-inch RGB monitor, Micro-VAX II, Image Technology IP512 Image Processing System, SNU RGB Image Processing System, PDP 11/23, IBM PC 386, AT, XT)

C. Faculty - Prof. Sang Uk Lee

Automation & Systems Research Institute

A. Research Areas [Computer Vision (low and high level)]

B. Current Projects (On the development of a color vision system employing DSP, Real-time vision system)

C. Facilities (SUN 4 workstations, CCD camera, Adaptive robot, IP 150 Image Processing System, IBM PC/AT, 386 etc.)

D. Researchers - Prof. Sang Uk Lee (SNU), Prof. Jhong Soo Choi (Chung-Ang Univ.), Prof. Rae Hon Park (Sogang Univ.), 6 research assist.

YONSEI UNIVERSITY

A. Research Areas

1. Neural Network Modeling

2. Korean Character Recognition

3. Dynamic Character Recognition

4. Korean Character Processing Computer

B. Facilities (Micro-VAX II, VAX 11/750, Solbourne, IBM PC/AT, scanner, B/W camera, printers)

C. Researchers - Prof. Il Byung Lee, ~10 graduate students

CHUNG-ANG UNIVERSITY

Image & Information Engineering Lab

A. Research Areas

1. Medical Ultrasound Imaging

2. Computer Vision

3. Visual Communication

B. Current Projects (a study on image understanding system using active focusing and meta-knowledge)
C. Facilities (workstations, 32-bit PCs, image data acquisition system, plotter, logic analyzer, IBM 3090)
D. Researchers - Prof. Jong-Soo Choi, 1 assist. prof., 17 graduate students

[blocks in formation]

B. Facilities (SUN SPARC workstations, Macintosh workstations, VGA color PC 386, VGA color Notebook 386, WACOM SD-510C tablet digitizer, laser beam printers, IBM PC/ATs)

C. Researchers - Prof. Seong Whan Lee, 5 graduate students, 2 research scientists
