ELECTRONIC DICTIONARY RESEARCH INSTITUTE (EDR)

The Electronic Dictionary Research Institute is described and a potential new project on knowledge archives is discussed.

by David K. Kahaner

INTRODUCTION

Machine translation (MT) is a major activity in Japan. It is considered an investment in the future to develop and enhance Japan's national information capacity. Most Japanese electronics companies are involved in the development of MT systems; some are already in use and many others are almost ready. Many users of MT systems consider them to be extremely valuable in limited fields such as technical manual translation. The Japanese also believe that MT will promote standardization of technical writing and glossary development and increase the use of electronic media for document transmission. There is no question that MT is being more actively pursued in Japan than in any other country.

From the U.S. side, there have been numerous studies on Japanese MT. The most definitive was conducted by the Japanese Technology Evaluation Center (JTEC), which held a workshop on MT in March 1991 and issued a comprehensive report shortly afterward. See

Machine Translation in Japan (January 1992)
Jaime Carbonell, Chair
Japanese Technology Evaluation Center
Loyola College in Maryland
4501 North Charles Street
Baltimore, MD 21210-2699

The basic conclusions of this study, in addition to the comments above, centered on the fact that fairly conventional approaches are being employed, a great deal of pre-editing is still in use, and more native English speakers are needed during the development stages of the projects.

The purpose of this report is not to focus on MT directly but to describe one important research activity, the Japan Electronic Dictionary Research Institute (EDR). EDR's functions are as follows:

(1) To produce computer software (programs and a database that can be used as a dictionary) and to perform research into systems utilizing such a dictionary.

(2) To license industrial ownership of the products of (1) and to license "know-how," including the copyright for the computer programs.

Thus EDR's work can be viewed as a major underpinning to MT systems.

EDR is a private company that is supported by the Japan Key Technology Center (JKTC), as well as a collection of Japanese companies: Fujitsu, NEC, Hitachi, Sharp, Toshiba, Oki, Mitsubishi, and Matsushita. JKTC is run jointly by the Ministry of International Trade and Industry (MITI) and the Ministry of Posts and Telecommunications (MPT).

Funding arrangements are complex [see, for example, my article "Advanced Telecommunication Research Institute (ATR)," Scientific Information Bulletin 17(2), 19-23 (1992)], but in the case of EDR, its funding by JKTC is through MITI. EDR was set up to run as a 9-year project, ending in 1994. EDR's total budget from JKTC is ¥14B, plus about 30% from the participating companies. There is no project of comparable size in the United States.

EDR has a laboratory in central Tokyo [adjacent to the Institute for New Generation Computer Technology (ICOT)] with 50-70 people, including three or four computer scientists. In addition, there are distributed laboratories at each of the industrial firms associated with the project. The total work force has been as high as 300, although it is normally about 100. Workers are often employed from commercial dictionary companies, and the central laboratory has many "company" employees, too. On my visit, I met

Toshio Yokoi, General Manager
Japanese Electronic Dictionary
Research Institute, Ltd.
Mita-Kokusai Building Annex
4-28 Mita 1-chome
Minato-ku, Tokyo 108, Japan
Tel: +81-3-3798-5521
Fax: +81-3-3798-5335

A schedule for access has also been published.

01/91  Dictionary interface published (1st ed.)

12/91  External evaluation group established; Word, Concept, and Co-occurrence dictionaries given to six universities for evaluation

03/92  Word dictionary (1st ed., Japanese and English) complete for commercial use; Dictionary interface published (2nd ed.)

03/93  Complete for commercial use: Word dictionary (2nd ed.), Concept dictionary (1st ed.), Bilingual dictionaries (1st ed.), Co-occurrence dictionaries (1st ed.), Dictionary interface (3rd ed.)

The EDR project has only modest interaction with ICOT, although it is just next door. ICOT's work is seen as fairly theoretical when viewed from EDR's perspective. It is hoped that in time bridges will be built between MT work at Japanese companies through EDR to ICOT. Some early EDR work was done with Prolog, although now most of the programming is in C on standard Unix workstations. (Generally, in Japan, there is much reduced interest in LISP-based artificial intelligence (AI) related work; most successful projects are now being written in C.)

EDR researchers see themselves as providing the key technologies for MT. They freely admit that their data structures are relatively flat, but believe that successful MT over general fields cannot occur without a very large dictionary and associated software tools to access it. They believe that research on small-scale dictionaries can help to improve electronic dictionary technology, but that the minimal requirement of an electronic dictionary is that it should be of large scale. Consequently, language accumulation cannot be done manually, even with the best efforts. A large dictionary requires the development of computer, natural language, and knowledge-processing technology. For example, different language dictionaries interact through so-called head concepts, i.e., word dictionaries are connected through concepts, in a way that EDR believes will make it relatively easy to add new languages. In addition, EDR chose the information processing field as a prototype technical application field. They estimate that 30-50 million words exist in the fields of mechanics, chemistry, biology, medicine, economics, law, etc. EDR thinks of itself as providing a knowledge base, rather than a database, in the sense that each dictionary item in the EDR dictionaries is created and described by linguistic specialists. These data are verified, evaluated, and corrected in connection with text databases with computer tools. (This is not to say that EDR content is perfect; the JTEC panel found opportunities to criticize some individual entries.)
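The idea of connecting word dictionaries through concepts can be illustrated with a minimal sketch. This is not EDR's actual data format or software. The class name, the concept identifiers (C001 and so on), and the sample entries are all hypothetical, chosen only to show why routing every word dictionary through shared concepts might make it easy to add a language without touching the existing ones.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptLinkedDictionary:
    """Per-language word dictionaries joined through shared concept IDs.

    Hypothetical illustration only; not the EDR dictionary format.
    """
    # language -> word -> set of concept IDs
    word_to_concepts: dict = field(default_factory=dict)
    # concept ID -> language -> set of words
    concept_to_words: dict = field(default_factory=dict)

    def add_entry(self, language, word, concept_id):
        """Attach a word in one language to a language-neutral concept."""
        by_word = self.word_to_concepts.setdefault(language, {})
        by_word.setdefault(word, set()).add(concept_id)
        by_lang = self.concept_to_words.setdefault(concept_id, {})
        by_lang.setdefault(language, set()).add(word)

    def translate(self, word, source, target):
        """Return target-language words that share a concept with `word`."""
        results = set()
        concepts = self.word_to_concepts.get(source, {}).get(word, set())
        for concept_id in concepts:
            results |= self.concept_to_words.get(concept_id, {}).get(target, set())
        return sorted(results)


def build_example():
    """Build a tiny English/Japanese dictionary (romanized, made-up entries)."""
    d = ConceptLinkedDictionary()
    d.add_entry("en", "dog", "C001")
    d.add_entry("ja", "inu", "C001")
    d.add_entry("en", "bank", "C002")    # financial institution
    d.add_entry("en", "bank", "C003")    # edge of a river
    d.add_entry("ja", "ginkou", "C002")
    d.add_entry("ja", "dote", "C003")
    return d


if __name__ == "__main__":
    d = build_example()
    print(d.translate("bank", "en", "ja"))
    # Adding a third language requires only links to existing concepts;
    # no English or Japanese entry is modified.
    d.add_entry("fr", "chien", "C001")
    print(d.translate("inu", "ja", "fr"))
```

In this sketch an ambiguous word such as "bank" simply maps to more than one concept, and a new language is added by linking its words to existing concept IDs rather than by building a separate bilingual table for every language pair.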

USE OF EDR DICTIONARIES AND A POTENTIAL NEW PROJECT

EDR staff members want to encourage international cooperation in the use and further development of the dictionaries. I was told that all the results of the EDR project will be sold at reasonable prices. The same conditions regarding the use of the EDR electronic dictionaries are to be applied to all users, whether they are domestic (Japanese) or overseas. EDR plans to set its prices much lower than those of machine-readable dictionaries currently on sale. Further, for academic users, such as universities and public research institutions, special measures are being planned, including very low prices. Details are being formulated now and should be available some time this spring.

EDR's view of the role of the electronic dictionary as a primary tool for knowledge acquisition leads naturally to an extension in order to study the accumulation of other knowledge. The EDR project formally ends in 1994, although sales or licenses of the dictionaries may provide funding for maintenance and other research. Hence project planners are investigating what to do next. We were given a careful description of one potential project, proposed by EDR's General Manager Toshio Yokoi, under the general heading of Knowledge Archives. He explained to us that at the moment this is simply an idea, and there are no firm commitments from the Japanese Government. The project may not be implemented, or it might be rearranged in a significant way. Nevertheless, Yokoi's idea is to push forward in the area of very large-scale knowledge bases and to develop "knowledge archives." He wants to perform research and development of various technologies in the following areas.

• The technology to acquire and collect in an automated way vast amounts of knowledge.

• The technology in which knowledge bases are self-organized so that substantial amounts of knowledge can be systematically stored.

• The technology that supports the creation of new knowledge by using vast amounts of existing knowledge.

• The development of appropriate and applicable knowledge bases that fulfill the need for various knowledge usage.

• The technology that translates and transmits knowledge to promote the interchange and common use of knowledge.

• The development of a basic knowledge base that can be shared by all applications.

A fuller description of Yokoi's proposal is given by him in the paper "Knowledge Archives--Very Large-Scale Knowledge Bases Forming the Basis of Knowledge Processing Technology," available from him at the address above.

NEURAL NETWORK RESEARCH AND DEVELOPMENT IN ASIA

The 1991 International Joint Conference on Neural Networks, held on 18-21 November 1991 in Singapore, is summarized and assessed.

by Clifford Lau

INTRODUCTION

In the past decade, there have been significant increases in research and development (R&D) in the area of neural networks in the United States, Europe, and Asia. In the United States, much of the research activity is supported by the Office of Naval Research (ONR) and the Defense Advanced Research Projects Agency (DARPA). The impetus is provided by the work of P. Werbos on the backpropagation algorithm, the work of J. Hopfield on neural modeling, and the work of S. Grossberg on the adaptive resonance theory. In Europe, funding increases have also been seen in the European Community (EC). There the impetus is provided by the work of T. Kohonen in Finland on self-organizing maps and the work of R. Eckmiller in Germany on neural control. In Japan, neural network research is seen as the natural follow-on to the fifth generation computer program. The New Information Processing Technology (NIPT) program, also called the Real World Computing program, is a multimillion dollar, long-term program that is still in the planning stage [see the article by D. Kahaner, "First New Information Processing Technology Workshop '91," Scientific Information Bulletin 17(1), 51-60 (1992)]. The impetus in Japan is provided by the work of S. Amari on the mathematics of neural computing and by the work of K. Fukushima on the neocognitron. Together with the increases in R&D activities, there have been many conferences on the subject of neural networks.

The 1991 International Joint Conference on Neural Networks was held on 18-21 November 1991 at the Westin Stamford and Westin Plaza in Singapore. At the conference, which was attended by about 530 people from all over the world, 440 papers were presented. A breakdown of the authors and attendees, as well as their countries, is given in Table 1. To no one's surprise, many papers were from the United States (slightly over 100) and from Japan (slightly under 100). However, there were a significant number of papers from Australia, China, Korea, Taiwan, and Singapore.

This report summarizes the research and development work in neural networks in these and other Asian countries. The research in the United States is not included in this report because the state of the art in neural network research is probably familiar to those who follow this field in the United States. The research in Europe is also not included here because the papers, even though many were presented, are not representative of the large amount of effort in Europe.

AUSTRALIA

In Australia, neural network research is spread out in many universities and industrial research laboratories such as the University of Western Australia, the University of Melbourne, the University of New South Wales, Royal Melbourne Institute of Technology, Queensland University of Technology, Monash University, and Telecom Australia Research Laboratory. Much of the work is in applying neural network technology to various problems. Table 2 lists the research topics and locations.

As can be seen from Table 2, the research interest in neural network technology in Australia is very broad. Of particular interest to the Navy is the work of Mathew J. Boek at the Royal Melbourne Institute of Technology on the application of neural networks to rotating machine fault diagnosis. A backpropagation network is used to classify the condition of an operating desk fan based on its vibration signature. Data from a set of experiments are used to train the network. The trained network is then used to detect and classify faults commonly occurring in industrial fans, such as impeller unbalance and cracked impeller blades. The results of these experiments show that the network is quite successful at distinguishing between the two types of faults
