Scientific Information Bulletin

relating to color, range, separation of cameras for stereo images, and other parameters that are used at the input phase. Next, properties of the image are used to locate image features such as edges, lines, or regions; this phase is usually called feature extraction. Finally, at the highest level, there is some underlying object model (for example, the designer knows that the scene is supposed to contain an automobile), and matching is done to locate these objects in the scene. This involves solving problems such as direction, angle, and occlusion. The result is a scene description, or scene knowledge. Research in computer vision is often compartmentalized into subtopics that follow this modularization as well. For example, "image processing" usually refers mostly to the lowest levels, whereas pattern matching research almost always refers to the highest level.
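The three levels described above can be illustrated with a toy pipeline. This is a minimal sketch, not any system discussed at the meeting: the operator chosen for each stage (a fixed intensity threshold at the input level, horizontal intensity changes as edge features, and exact matching of a small binary object model at the top level) and all function names are illustrative assumptions.

```python
# Toy three-level vision pipeline on a small grayscale image (list of rows).
# The specific operators are illustrative stand-ins for each level.

Image = list[list[int]]


def preprocess(img: Image, threshold: int) -> Image:
    """Lowest level: binarize intensities (1 = object, 0 = background)."""
    return [[1 if p >= threshold else 0 for p in row] for row in img]


def extract_edges(binary: Image) -> list[tuple[int, int]]:
    """Feature extraction: (row, col) wherever a pixel differs from its right neighbour."""
    edges = []
    for r, row in enumerate(binary):
        for c in range(len(row) - 1):
            if row[c] != row[c + 1]:
                edges.append((r, c))
    return edges


def match_model(binary: Image, model: Image) -> list[tuple[int, int]]:
    """Highest level: top-left positions where the binary object model fits exactly."""
    mh, mw = len(model), len(model[0])
    hits = []
    for r in range(len(binary) - mh + 1):
        for c in range(len(binary[0]) - mw + 1):
            if all(binary[r + i][c + j] == model[i][j]
                   for i in range(mh) for j in range(mw)):
                hits.append((r, c))
    return hits


if __name__ == "__main__":
    img = [
        [10, 10, 10, 10, 10],
        [10, 200, 210, 10, 10],
        [10, 220, 205, 10, 10],
        [10, 10, 10, 10, 10],
    ]
    b = preprocess(img, 128)
    print(extract_edges(b))               # -> [(1, 0), (1, 2), (2, 0), (2, 2)]
    print(match_model(b, [[1, 1], [1, 1]]))  # -> [(1, 1)]
```

Each stage consumes only the output of the one below it, which is the modularization that, as noted above, research subtopics tend to follow.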

In computer vision research it is not too difficult to get to the leading edge of what has been accomplished, so almost any project quickly needs to address advanced problems. Simply put, however, because the Japanese have tried so many different approaches, their breadth of research experience is very much greater than that of the Koreans. They are also trying deeper and more sophisticated techniques, although the disparity might not be too great in a few specific promising areas such as scene identification.

JAPANESE AND KOREAN COMPUTER VISION SUMMARIZED

From the Japanese side

Prof. Yoshiaki Shirai

Dept of Mechanical Engineering for
Computer Controlled Machinery
Osaka University

Suita, Osaka 565, Japan

Tel: +81-6-877-5111 x4706

Fax: +81-6-876-4975

E-mail: shirai@ccm.osaka-u.ac.jp

presented a clear summary of past work in Japan. Shirai pointed out that Japan has a Computer Vision Group with about 500 Japanese members. They meet bimonthly and held their first symposium this summer (in addition to any international meetings that have been held). The group's chair is Prof. Yachida, mentioned above. There is a Special Interest Group (SIG) in Pattern Recognition and Understanding (until recently Pattern Recognition and Learning) sponsored by the Institute of Electronics, Information, and Communication Engineers (IEICE), which publishes about 125 papers yearly in 10 issues. This group also covers a small amount of speech recognition. There is a Special Interest Group in Computer Vision (SIG CV), sponsored by the Information Processing Society of Japan (IPSJ) and focusing on image processing, that publishes about 60 papers each year in a bimonthly journal. Finally, there is also a SIG in Pattern Measurement sponsored by the Society of Instrument and Control Engineers (SICE), which publishes about 20 papers yearly in four issues, but this one is heavily oriented toward very practical hardware problems.

Shirai pointed out that in a few areas, such as industrial applications, there is far more work than the number of published papers suggests.

A survey of the database of information processing literature in Japan (covering 1986-1988, the latest period for which data are available) characterizes computer vision related papers as follows (excluding coding of images).

Topic
Time sequence images
Feature extraction
3D input and modeling
Stereo
Medical
Matching
Neural network for matching
Shape from X
Face

The only more recent data are from the IPSJ's SIG CV for 1990-1991:

Topic
Stereo
Time sequence (or multidimensional) images
Image database
Image understanding

No. of papers: 46, 45

It is clear that the most important new area is the analysis of image sequences, a view also shared by the Korean attendees. Although there are only four papers concerning computer vision applied to human faces, this too is seen as a growing area, encompassing human-computer interfaces, remote teleconferencing, human emotional information processing, and image coding.

Shirai went on to describe several specific Japanese projects that involve computer vision. The most elaborate of these is the ¥20B ($140M), 1983-1991 "Robot for Extreme Work" project, whose ultimate application is an autonomous teleoperated robot for nuclear plants, pipe cleaning, underwater work, and emergency (such as fire) operations. This project involves much more than computer vision; research has been done on fundamental problems of locomotion, teleexistence, manipulation, and sensing, as well as on the development of a system integration language. The part of the project dealing with these fundamental issues actually received the bulk of the funding, and the more applied aspects, i.e., really developing such a robot, were not as well funded. In addition to Japanese universities, the Electrotechnical Laboratory (ETL), Fujitsu, Toshiba, and other companies participated, with Toshiba working on feature extraction and Fujitsu on projecting images onto a sphere (which Shirai claimed works well in clean environments). ETL has done a great deal of work on sensing, stereo, and robot vision language development and has summarized it in a special issue of the ETL Bulletin on Robotics. Shirai showed several photos of the prototypes that had been developed. One of these looked like a monster from "Star Wars II," and Shirai admitted that 8 years was a long time for this technology and that a newer project would have designed a less clumsy-looking robot.

Another interesting Japanese project is a vision-based vehicle system. It shares some of the goals of similar projects in the United States, such as the one at Carnegie Mellon University (CMU). The Japanese project [which is also supported by the Ministry of International Trade and Industry (MITI)] is organized in phases. The initial, or phase-0, part was mostly done by Fujitsu and Nissan around 1989 and involved a vehicle on a special test course, shadowless illumination, and only large obstacles. The vehicle (a motor home) has three cameras for lane detection, two more for obstacle avoidance, and a sonar system. Techniques used are line following for lane finding and sonar for obstacle detection and range finding. Phase-1, which runs from 1989 to 1995, involves learning how to drive the vehicle on a highway with a centerline by distinguishing lines, road boundaries, and road signs. Phase-2, from 1995 to 2000, will deal with multilane highways, tunnels, rain, windshield wipers, and the use of stereo for obstacle avoidance. Phase-3, from 2000 to 2030, will (hopefully) deal with normal roads, crossings, parking, backing up, and using a mirror, and will involve tools of scene understanding and map matching. The project also has an unusual perspective in wanting to use active sensing, for example, using sound to help scene understanding and using the input visual data to understand the sounds being received. Thus the project designers are thinking about sensor fusion and multisensor integration. These parts of the program will begin soon at Tokyo University. Shirai admitted that at the moment image segmentation is one of their most difficult problems, but he did show us some film of the motor home on its test road, and it seemed to be working, although rather slowly. This appeared to be much less advanced than the CMU project I saw more than a year ago.
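Line following for lane finding was not described in any detail. One common way to find straight lane markings in an edge image is the Hough transform (a topic of several contributed papers at this meeting); the sketch below shows that technique only, assuming edge points have already been extracted, and is not a description of what the Fujitsu/Nissan vehicle actually used. The function name and its parameters (`theta_steps`, `rho_res`, `top_k`) are hypothetical.

```python
import math
from collections import Counter


def hough_lines(points, theta_steps=180, rho_res=1.0, top_k=1):
    """Return the top_k (theta_degrees, rho, votes) lines supported by edge points.

    Each point (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    through it, with theta sampled at theta_steps values in [0, 180) degrees
    and rho quantized to multiples of rho_res.
    """
    votes = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_res))] += 1
    # Cells with the most votes correspond to lines through the most edge points.
    return [(180.0 * t / theta_steps, r * rho_res, n)
            for (t, r), n in votes.most_common(top_k)]


if __name__ == "__main__":
    # Hypothetical edge points along a vertical lane marking at x = 5.
    marking = [(5, y) for y in range(10)]
    print(hough_lines(marking, rho_res=0.1))  # -> [(0.0, 5.0, 10)]
```

Because every point votes independently, a marking that is partly worn away or interrupted still accumulates a clear peak, which is one reason the transform is popular for road scenes; the price is that the accumulator grows with the angular and distance resolution chosen.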

From the Korean side

Prof. Chan Mo Park
Professor and Chairman

Dept of Computer Science
& Engineering
Pohang Institute of Science
and Technology
P.O. Box 125
Pohang 790-600, Korea
Tel: +82-562-79-2251
Fax: +82-562-79-2299
E-mail: parkcm@vision.postech.ac.kr

gave a summary of computer vision activities in Korea. Until very recently there was not much to report, and even today, he emphasized, industrial applications are very limited. Most research is occurring at universities and government research institutes using facilities imported from other countries. Several Korean companies do market low-price machine vision systems developed in Korea, but to date their performance has not been impressive. Production line use of computer vision is infrequent and limited to simple inspection and very repetitive tasks. Park claimed that Korean companies would rather not purchase a general purpose vision system such as a Sammi-AB and prefer to obtain very task-specific systems. Industry does see a very strong need for efficient algorithms for segmentation and classification and, of course, for high reliability.

Before 1989 work was very scattered and mostly restricted to workshops and courses in medical imaging, computer-aided design/manufacturing (CAD/CAM), image processing, and computer graphics. Modern work really began only in 1989 with an Image Processing and Image Understanding Workshop (at POSTECH), at which time it was decided to hold annual workshops to share research activities. Subsequently, two workshops have been held, with a total of 42 papers presented. Two related meetings are worth mentioning: an International Workshop on Medical Imaging (March 1991 at the Korea Institute of Science and Technology) and a Chapter Meeting of the Korea Information Society (May 1991 at Chung-Joo University), which had as its theme "Current Status of Pattern Recognition Technology" and generated half a dozen overview papers. There are now three SIGs interested in vision: SIG AI (artificial intelligence) (Korea Information Science Society), SIG IP-TV (information processing) (Korean Institute of Telematics and Electronics), and SIG Images (Korean Institute of Communication Science).

Park also gave a list of research activities at various Korean research centers (see the Appendix) but did not go into detail about the projects. This list gives a realistic sense of the work going on in Korea. Because the data were collected by asking scientists, the amount of thought and detail provided varies greatly (how many PCs does a Cray-2 equate to?). But scanning it makes very clear that only a very few places have substantial equipment resources for vision. I will try to obtain more details about the actual progress of the research at those institutes. Park did show AVIS (a project at POSTECH), an automated inspection system for use in the Pohang steelmaking factory that uses the PIPE computer (purchased from Aspex). It is also installed at the Korea Advanced Institute of Science and Technology (KAIST).

For the future Park felt that vision work should concentrate on factory automation, that biomedical applications were still a promising field that could have broader applications, and that handwritten character recognition was the key to office automation applications. In the area of more fundamental research, he felt that Korean scientists should work on moving target detection, remote sensing, mobile robots, and other motion-related problems and that the Korean Government needed to take a more active role with additional funding, manpower development, and mechanisms to encourage cooperation between industry and university, as well as international cooperation.

PANEL DISCUSSION: APPLICATION OF COMPUTER VISION FOR AUTOMATION

This was the most fascinating part of the meeting, as it brought together six experts and gave each an opportunity to describe work they had seen and work they hoped would be done in the future. Panelists were

Dr. Sung Kwun Kim
Executive Director

Robotics & Automation R&D Division

Samsung Electronics
259 Gong Dan-Dong
Gumi, Kyung Buk, Korea
Tel: +82-546-460-2015
Fax: +82-546-461-8038

Prof. Jong Soo Choi
Dept of Electronic Engineering
Chung-ang University
221 HukSeok Dong
DongJak Gu, Seoul, Korea
Tel: +82-2-815-9231-7 x2235
Fax: +82-2-815-9938

Prof. Kwang Ik Kim
Dept of Mathematics
Pohang Institute of Science
and Technology

P.O. Box 125

Pohang, Kyung Buk 790-330, Korea
Tel: +82-562-79-2044
Fax: +82-562-79-2799
E-mail: kimki@vision.postech.ac.kr

Dr. Takeshi Shakunaga

NTT Human Interface Laboratories

3-9-11, Midori-cho

Musashino-shi, Tokyo 180, Japan

Tel: +81-422-59-3336
Fax: +81-422-59-2245
E-mail: shaku@nttarm.ntt.jp

Dr. Johji Tajima
Research Manager

Pattern Recognition Research
Laboratory

NEC C&C Information Technology
Research Laboratories
1-1 Miyazaki 4-chome
Miyamae-ku, Kanagawa 216, Japan

Unfortunately, none of the panelists provided handouts, so my summary below is based on notes that may not be completely accurate.

Ejiri (Hitachi) made only a few remarks but pointed out that vision systems were realized in Japan 20 years ago. (See my earlier comments about the depth and breadth of research in Japan vis-a-vis Korea.) He also pointed out that there was very tough competition between Japanese companies but very friendly discussion between researchers. (Isn't this the Japanese way; maybe this

work have a common thread that they begin with a good (high contrast) image input. He mentioned three specific examples of vision systems now in use within Toshiba but did not go into any real detail about any of the specific hardware or software techniques that were used.

• Soldering inspection system for the mounting of integrated circuits (ICs) onto printed circuit boards (PCBs). In some sense this is a very simple problem, as there is a clean model of what the image is supposed to look like. The hard part of this problem is to get good input images. Toshiba's inspection station uses 168 light-emitting diodes (LEDs) to illuminate different parts of the target.

• Agricultural automation. This involves using a robot to cut young plants at the proper stem length.

• Gap-width adjusting system for digital audio tape (DAT) and VCR magnetic heads, using computer processing of images of moiré patterns.
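Why a clean model makes soldering inspection "simple," and why good input images are the hard part, can be seen in a toy reference comparison: each inspected image is binarized and compared pixel by pixel with a known-good reference, so the verdict is only as reliable as the illumination that makes the threshold meaningful. This is a hypothetical sketch, not Toshiba's method (which was not described); the function name, threshold, and tolerance are arbitrary assumptions.

```python
def inspect_against_reference(image, reference, threshold=128, max_mismatch=0.05):
    """Compare an inspected image with a known-good reference image.

    Both images are equal-sized grayscale arrays (lists of rows). Pixels are
    binarized at `threshold`; the part is flagged defective when the fraction
    of disagreeing pixels exceeds `max_mismatch`. Returns (defective, fraction).
    """
    if not reference or not reference[0]:
        raise ValueError("empty reference image")
    if len(image) != len(reference) or any(
            len(a) != len(b) for a, b in zip(image, reference)):
        raise ValueError("image and reference must have the same size")
    total = mismatched = 0
    for row_img, row_ref in zip(image, reference):
        for p, q in zip(row_img, row_ref):
            total += 1
            # A pixel disagrees if it falls on opposite sides of the threshold.
            if (p >= threshold) != (q >= threshold):
                mismatched += 1
    fraction = mismatched / total
    return fraction > max_mismatch, fraction


if __name__ == "__main__":
    reference = [[0, 255, 255, 0], [0, 255, 255, 0]]  # hypothetical good joint
    missing_solder = [[0, 255, 20, 0], [0, 255, 255, 0]]
    print(inspect_against_reference(missing_solder, reference))  # -> (True, 0.125)
```

If the lighting drifts so that good solder no longer exceeds the threshold, every board fails; this is the sense in which the image input, rather than the comparison, is the difficult part.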

Kuno commented succinctly about the state of the art, that "we are using '70s algorithms on '80s hardware." As for the future he felt that there would be no general purpose vision system in the near future because of cost issues. In his view there are three basic ways to use computer vision systems.

• Use simple (e.g., low-cost) vision systems cleverly for factory automation, human-computer interfaces, etc.

• Apply heuristic tuning to fields with strong needs; character scanning/recognition is a perfect example.

• Do basic research on sophisticated vision systems for future applications, such as robots, nuclear power plants, space work, etc.

Presumably Toshiba's research will follow these paths. As far as new directions, he mentioned the importance of sensor fusion to enhance the reliability of existing techniques.

Tajima (NEC) felt that for image processing (as opposed to image understanding) there were already very cheap general purpose systems with many operators built into hardware for preprocessing (such as thresholding). He then gave a rapid description of a collection of vision applications within NEC, again with few details.

• Multilayer substrate inspection station to detect short circuits, pinholes, etc., for the boards NEC uses in its supercomputers (SX series). This system can inspect a 225-mm² board area in 25 minutes.

• Soldering inspection station, looking a great deal like Toshiba's, with five cameras and lights for three-dimensional (3D) views.

• Deformation measurement of circuit boards by laser range finding.

• Inspection system for determining whether foreign objects are inside empty (Coke) bottles, and another system for determining the amount of liquid in a bottle.

• A 3D human body measurement system. This was the most intriguing of the lot. The application here is to determine the tailoring of apparel by measuring cross sections of humans. The subject is in a box and is illuminated by six lasers. The software uses a body model that runs on a workstation and a database that runs on a minicomputer.

As far as industry was concerned, Tajima felt that the important work needs to be done in 3D recognition as well as motion detection, and that recognition of features needs to be above 99% to be industrially useful. He felt that a standardized database of images would be very helpful for studying algorithms.

Shakunaga (NTT) claimed that NTT was trying to combine visual information processing with other technologies to develop applications in the area of advanced visual telecom services and network support, both of obvious importance to NTT. He gave two examples.

• Maintenance. A manhole facility inspection system using a truck-mounted, underground-looking radar that eliminates the need for digging to locate pipes. This uses pattern recognition and frequency domain analysis. It is said to work to a depth of 1.5 meters, which covers 75% of the company's pipes. (If you have ever lived in "dig we must" New York, you will know how welcome such a system would be.) A second system uses a TV camera on a stalk that looks inside manholes, generates a stereo view of the facility's layout inside (using vertical camera movement to obtain the second image), and then generates a drawing of the manhole contents. This uses edge detection, which is said to be accurate to 0.05 pixel.

• Human-computer interface. The idea is to transmit less data, namely extracted features, for teleconferencing. NTT has been experimenting with human head, lip, and body image readers. The aim is to interpret head motion and generate understanding based on head movement. This uses edge detection of the head silhouette and analysis of the facial area.
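The manhole system was described only at a high level. As an illustration of its two ingredients, here is a sketch of sub-pixel edge localization (fitting a parabola to the gradient peak, one standard way to get below one-pixel accuracy) and of depth recovery from two views separated by a known vertical camera shift, using the parallel-camera relation Z = fB/d. NTT's actual algorithms were not presented; the function names, the parabola-fit method, and the parallel-camera geometry are all assumptions.

```python
def subpixel_peak(values, i):
    """Refine integer peak index i with a parabola through values[i-1], values[i], values[i+1]."""
    a, b, c = values[i - 1], values[i], values[i + 1]
    denom = a - 2 * b + c
    if denom == 0:
        return float(i)  # flat neighbourhood: no refinement possible
    # Vertex of the parabola through (-1, a), (0, b), (1, c), offset from i.
    return i + 0.5 * (a - c) / denom


def edge_position(profile):
    """Locate a single step edge in a 1D intensity profile to sub-pixel precision."""
    if len(profile) < 4:
        raise ValueError("profile too short")
    grad = [abs(profile[k + 1] - profile[k]) for k in range(len(profile) - 1)]
    # Strongest gradient, excluding the ends so both neighbours exist.
    i = max(range(1, len(grad) - 1), key=lambda k: grad[k])
    # grad[k] sits between pixels k and k + 1, hence the half-pixel shift.
    return subpixel_peak(grad, i) + 0.5


def depth_from_vertical_shift(y_first, y_second, focal_px, shift_m):
    """Depth Z = f * B / d for two parallel views separated by a vertical camera shift B.

    y_first, y_second: image row (pixels) of the same feature in each view.
    focal_px: focal length in pixels; shift_m: camera displacement in metres.
    """
    d = abs(y_first - y_second)
    if d == 0:
        raise ValueError("zero disparity: feature at infinity or mismatched")
    return focal_px * shift_m / d


if __name__ == "__main__":
    print(edge_position([10, 10, 10, 30, 90, 110, 110, 110]))  # -> 3.5
    print(depth_from_vertical_shift(240.0, 220.0, 800.0, 0.1))  # -> 4.0
```

Since depth is inversely proportional to disparity, an error of a fraction of a pixel in locating an edge translates directly into a depth error, which is why sub-pixel edge accuracy matters for drawings made from stereo measurements.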

Shakunaga divided future research themes into three directions.

• Early vision. Because human systems are very adaptable, we should study adaptive tuning of input data and attention-getting mechanisms. We should also study human implementation of early vision algorithms for edge, region, texture, shape, distance, motion, etc.

• Middle-level vision. This requires research into model-based matching, from specific (recognition) to generic (cognition).

• High level vision. Study 3D world description and manipulation. Consider integration of vision and semantic databases.

Kim (Samsung) felt their problems were similar to NTT's and to Toshiba's. He also felt that the cost of vision systems will be coming down quickly, although this is now still a bottleneck. He gave a short list of computer vision applications but with even fewer details than the other industrially based speakers.

• System for mounting a screw in a microwave oven cavity.

• Simple assembly.

Choi (Chung-ang University) described work in 3D vision. Of course, the major problem is to extract information about a 3D world from two-dimensional (2D) images. This can begin with range finding, since knowing the distance to objects then allows one to determine which one is in front, etc.; or it can begin with finding segments, features, objects ... to which stereo, etc. can be applied. (An occlusion boundary, for instance, allows triangulation on the occluding edge; it is not a feature of the occluded object.) The two approaches are:

Passive

• Monocular vision requires a priori information (in some problems this is available).

• Photometric stereo, e.g., using different light sources.

• Shape from shading, although recovering surface orientation from a single image is obviously ill-posed, as are many of the monocular techniques.

• Range data from two different images, or a sequence.
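Photometric stereo is the easiest of these passive techniques to show concretely. Under the textbook assumptions of a Lambertian surface, three known non-coplanar light directions, and no shadows, each pixel gives three linear equations I_k = ρ (L_k · n) in the unknown vector ρn, so the albedo ρ and unit normal n follow from a 3×3 solve. The sketch below uses Cramer's rule; it was not part of Choi's talk, and the function names are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def photometric_stereo(lights, intensities):
    """Recover (albedo, unit normal) at one pixel from three images.

    lights: three unit light-direction vectors (rows), known and non-coplanar.
    intensities: the pixel's brightness under each light.
    Assumes a Lambertian surface, I_k = albedo * dot(L_k, n), with no shadows.
    """
    d = det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    g = []  # g = albedo * n, solved component by component with Cramer's rule
    for col in range(3):
        m = [list(row) for row in lights]
        for r in range(3):
            m[r][col] = intensities[r]
        g.append(det3(m) / d)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0:
        raise ValueError("zero albedo: normal is undefined")
    return albedo, tuple(v / albedo for v in g)


if __name__ == "__main__":
    lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
    # A surface facing the camera, n = (0, 0, 1), with albedo 0.9.
    print(photometric_stereo(lights, [0.9, 0.72, 0.72]))
```

Using different light sources rather than different viewpoints avoids the correspondence problem entirely, since the camera never moves; the cost is the reliance on a known reflectance model and on every light actually reaching the surface point.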

not in a factory situation. There are no camera standardizations. Missing scene parts and shadowing are a problem, as obviously it isn't possible to deduce 3D data for missing parts of a scene.

COMMENTS ABOUT SPECIFIC CONTRIBUTED PAPERS

A complete list of titles/authors of the presented papers is being prepared and will be distributed (electronically) as soon as it is ready. However, topics discussed included

• Character recognition & document understanding

• Image processing & coding

• Hough transform

• Scene understanding & object recognition

• Neural nets

• Stereo & shape recognition

• Motion & sequential image processing

• Sensing & recognition

• Mobile robots

• Vision paradigm

• Computer vision hardware & algorithms

• Motion & shape reconstruction

• Intermediate & high level vision

• Thinning, quadtree, & component labeling

• 3D modeling & recognition
