Scientific Information Bulletin

Observations on Computational Mathematics in Japan
David K. Kahaner and C.T. Kelley

The International Symposium on Computational Mathematics is summarized,
with special emphasis on the contributions from Japanese and Chinese
researchers. Kelley's observations on optimization research, based on visits to three
research facilities, are presented.

The German National Research Center for Computer Science in Tokyo
David K. Kahaner

The Tokyo liaison office of the German National Research Center for Computer
Science is described.

Fourth Institute for Supercomputing Research Supercomputing Workshop
David K. Kahaner and Tony F. Chan

Some observations on the trends and characteristics of parallel supercomputing
research in Japan are presented.

Japanese Database Activities

David K. Kahaner

The activities of the Japan Database Promotion Center are described and the
current state of affairs is summarized with respect to databases in Japan, especially
those of interest to scientists.

Hitachi's Various Research Laboratories

David K. Kahaner

Hitachi's research plan for the 1990s and computer-related research in the
company's research laboratories are summarized.

Observations on Neural Network Research and Development in Japan
Clifford Lau

Japan, with its very advanced VLSI technology capabilities, is in a very good
position to capitalize on the advances in neural network research and
neurocomputer architectures. This article describes the latest research and
development activities in Japan based on a recent trip to Japan and other
readings of publicly available materials.

Electronics

X-Ray Lithography in Japan
Kenneth L. Davis

A study panel was formed by the Office of Naval Research and DARPA to evaluate
the current status of x-ray lithography technology in Japan. This article summarizes
the panel's findings.

General

A Changing Paradigm for Industry-University Cooperation in Japan
T.W. Eagar

Toyohashi University of Technology and its sister university in Nagaoka,
established to provide new paths for advancement for students from technical high
schools and colleges of technology, encourage research of direct interest to industry,
creating a new paradigm for industry-university cooperation in Japan.

Life Science

Marine Biology and Biotechnology in Japan

Aharon Gibor

Japan's economy is a prime example of the dependence on marine biomass
resources for the national food basket. This article describes some of the research
institutions conducting research programs in marine biology and the focal areas of
interest.

The First International Conference on Brain Electromagnetic Topography
J.W. Wolfe and S. Matsuoka

This conference focused on two major areas: the recording and displaying of
magnetic brain potentials (magnetoencephalography) and electrical brain
potentials (electroencephalography).

Materials Science

Trends in Materials Processing in Japan

T.W. Eagar

This paper briefly describes the latest trends in materials processing in Japan as
reported at the Fifth International Symposium on Advanced Technology in
Welding, Materials Processing, and Evaluation.

Ocean Science

The Republic of Korea Navy Ocean and Underwater Medical Research and Training Center

Neal A. Naito

This article describes the Korean Ocean and Underwater Medical Research and
Training Center, the leading military institution in the country conducting
submarine, diving, and hyperbaric medical research.

Seminar on Autonomous Underwater Vehicles
Gregory Stone

Autonomous underwater vehicles (AUVs) are a new and rapidly developing tool
for undersea science. This unique seminar brought together representatives from
five countries who are currently developing AUVs.

TECHNO-OCEAN '90
Gregory Stone

TECHNO-OCEAN has emerged as the premier ocean technology conference for
the Western Pacific region. Many of the papers and discussions addressed global
climate change issues and the application of ocean technology to those issues.

Satellite Remote Sensing in Japanese Oceanography
Kathryn A. Kelly

Because Japanese scientists are just starting to focus on global problems in earth
science, there is as yet no demand for global data. This article describes Japan's
efforts to expand the use of satellite data in oceanography.

Cover: Drawing of the three-compartment saturation dive chamber complex at the Korean Navy Ocean and Underwater Medical Research and Training Center (OUMRTC). The complex consists of a main chamber that houses the living quarters for mission personnel, a subchamber that contains shower and toilet facilities, and a wet diving pot. Courtesy of Neal A. Naito. See his article on OUMRTC on page 115.

SIBRIEF

Scientific Information Briefs

/40 is multiprocessor. Their nomenclature is mildly confusing, as the designation /x0 corresponds to the number of scalar rather than vector units, even though the latter determine peak performance.

Fujitsu is deeply interested in multiprocessing; one indication has been their joint research with NEC and Hitachi, sponsored by the Ministry of International Trade and Industry (MITI) and called informally the HPP project, involving four VP2600s each operating as a uniprocessor attached to a very large shared buffer memory. Fujitsu claims that such a large multiprocessor was developed mainly to demonstrate their success with room temperature HEMT devices (see below) as the communications drivers between the computers and memory. Nevertheless, using this, an NEC researcher was able to solve a very large system of 32K linear equations in less than 11 hours.

Fujitsu is probably experimenting on a /40 multiprocessor for the VP2600, but has not released any public information about this. Without a /40 for the VP2600, Fujitsu's VP2000 series peak performance (however unrelated to actual performance) will fall short of current competition from NEC as well as new machines from Cray, and perhaps others. In the meantime though, the VP2000 series comes in a variety of colors, including Elegance Red, Future White, and Florence Green.

Peak performance values of the /10 and /20 models in any line are the same, as this is determined entirely by vector processing. Peak performance can easily be computed once the machine cycle time and the maximum possible number of simultaneous floating point operations are known. For example, the VP2400/40 and VP2600 each have cycle times of 3.2 ns. Achieving the advertised 5.0 GFLOPS peak therefore implies 16 simultaneous floating point operations (16 operations per 3.2 ns cycle). For the VP2400/40 this requires 8 per vector unit, while for the VP2600/20 16 per vector unit are required. Each of Fujitsu's vector units is described as having two arithmetic pipes, but in reality they are more complicated. Each pipe is capable of simultaneously performing both an addition and a multiplication. In addition, the pipes effectively deliver twice (VP2400/40) or four times (VP2600/20) as much data. Thus each pipe on the VP2600/20 can produce four floating point additions and four floating point multiplications per cycle. This is similar to the "superword" concept on the ill-fated Cyber 205. Of course, if a calculation is dyadic, that is, does not involve both a multiplication and an addition, then the peak performance will be reduced by 50%.
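The peak figures quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the function name `peak_gflops` and the variable names are illustrative, and the count of two vector units for the VP2400/40 is inferred from the statement that 16 operations correspond to 8 per vector unit.

```python
def peak_gflops(cycle_time_ns: float, ops_per_cycle: int) -> float:
    """Peak rate in GFLOPS: operations per cycle divided by cycle time in ns."""
    return ops_per_cycle / cycle_time_ns


CYCLE_NS = 3.2  # cycle time quoted for both the VP2400/40 and the VP2600

# VP2600/20: two pipes, each delivering 4 additions + 4 multiplications per cycle.
vp2600_ops = 2 * (4 + 4)

# VP2400/40: assumed two vector units (inferred from "8 per vector unit"),
# two pipes per unit, each delivering 2 additions + 2 multiplications per cycle.
vp2400_ops = 2 * 2 * (2 + 2)

vp2600_peak = peak_gflops(CYCLE_NS, vp2600_ops)
vp2400_peak = peak_gflops(CYCLE_NS, vp2400_ops)

# A dyadic calculation (only additions or only multiplications) can use
# just half of the available operations per cycle.
dyadic_peak = peak_gflops(CYCLE_NS, vp2600_ops // 2)

print(f"VP2600/20 peak:   {vp2600_peak:.2f} GFLOPS")
print(f"VP2400/40 peak:   {vp2400_peak:.2f} GFLOPS")
print(f"Dyadic-only peak: {dyadic_peak:.2f} GFLOPS")
```

Both configurations reach the same 16 operations per cycle by different routes, which is why the advertised peaks coincide.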

By studying the performance of VP2000 machines on typical job streams, it has been observed that when the scalar unit is 100% in use, the vector unit is about 50% to 75% busy. Thus, the addition of a second scalar unit can significantly increase throughput, and was presumably Fujitsu's reason for adding it. However, for any single user problem it might not be possible to keep the vector unit constantly busy. Thus, the most practical environment for such a setup would be a computing center or other multiuser job shop, where several user jobs can be run simultaneously. Kyoto University, a typical busy university computing center, will be getting a VP2600/10 soon.

We asked why they were getting only one scalar processor. Although the university made a very strong case for two scalar processors, the Ministry of Education decided (based on budgetary, or other, grounds) to support only the one scalar processor system. However, it is an easy field upgrade to add the second scalar unit. The choice of a VP2600/10 rather than a VP2400/40 was a matter of policy: Kyoto has always tried to purchase the fastest machine available. It is also possible that they would like to upgrade eventually to a multiprocessor 2600 when this is available.

As is the case with most of today's vector supercomputers, data to and from the vector arithmetic units must pass through vector registers. In the VP2600 these registers have a capacity of 128 KB (64 elements times 256 registers times 8-byte data) but can be concatenated in various ways, for example, as 2048 times 8 times 8 instead. Thus, the organization of the registers is very flexible. To move data between memory and the vector registers, Fujitsu provides only two load/store pipelines. This could be a bottleneck, although the register flexibility may alleviate it to a certain extent. Memory to register bandwidth has been criticized in the VP2000 series, but at least one new benchmark, given below, suggests that Fujitsu has been making efforts to deal with this. The computation of interest is that of multiplying large matrices A=B*C, each of which is 4096 by 4096, with real 64-bit floating point components. The source program is written in 100% standard Fortran but is organized to take advantage of the two-pipe structure of the VP2000 architecture in a very clear way. The essential segment of the source program consists of first zeroing the target array.

      DO 4000 J=1,4096
      DO 4000 I=1,2048
        A(I,J)=0.0
        A(I+2048,J)=0.0
 4000 CONTINUE

Then the actual multiplication is as follows.

      DO 5000 L=0,1
      DO 5000 J=1,4096
      DO 5000 K=1,4096,4
      DO 5000 II=1,2048
        I=II+(2048*L)
        A(I,J)=A(I,J)+B(I,K)*C(K,J)+B(I,K+1)*C(K+1,J)
     *               +B(I,K+2)*C(K+2,J)+B(I,K+3)*C(K+3,J)
 5000 CONTINUE

In this case the matrices are large enough that there is significant memory to register to memory traffic. Nevertheless, Fujitsu's FORT77/VP compiler is able to vectorize this effectively and generate 4.8 GFLOPS, 96% of peak performance.
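The register and benchmark figures in this passage can be cross-checked numerically. The sketch below is not part of the original benchmark. It assumes the conventional count of 2n^3 floating point operations for an n by n matrix product (one multiplication and one addition per term), and the run time it derives is an estimate, not a reported measurement.

```python
KB = 1024  # bytes per kilobyte

# Two register organizations mentioned in the text; both should total 128 KB.
default_layout_bytes = 64 * 256 * 8   # 64 elements x 256 registers x 8 bytes
alternate_layout_bytes = 2048 * 8 * 8  # 2048 elements x 8 registers x 8 bytes

# Benchmark: A = B*C with 4096 x 4096 matrices.
n = 4096
flops = 2 * n**3  # assumed operation count: one multiply + one add per term

peak_rate = 5.0e9       # advertised VP2600 peak, operations per second
sustained_rate = 4.8e9  # rate reported for the benchmark

fraction_of_peak = sustained_rate / peak_rate
estimated_seconds = flops / sustained_rate  # derived estimate only

print(f"Register capacity: {default_layout_bytes // KB} KB "
      f"(alternate layout: {alternate_layout_bytes // KB} KB)")
print(f"Operation count:   {flops:.3e}")
print(f"Fraction of peak:  {fraction_of_peak:.0%}")
print(f"Estimated time:    {estimated_seconds:.1f} s")
```

Under these assumptions the full multiplication would take somewhat under half a minute at the reported rate.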

I visited this factory in March 1990 and reported on the SX-3 in a previous issue of the Scientific Information Bulletin ["NEC's new supercomputer, the SX-3," 15(3), 4-6 (1990)]. Then the only running system had one processor. Now, several one-processor machines are being tested prior to shipment and a two-processor system has been set up and is being debugged. Chief designer Watanabe stated that a one-processor system, depending upon peripheral options, would cost in the neighborhood of $10M. He claimed that the four-processor system will be up in a few months, and we have heard estimates that it will cost roughly $25M.

Peak performance of a uniprocessor system is 5.5 GFLOPS, based on a cycle time of 2.9 ns and 16 simultaneous operations (16/2.9 ≈ 5.5). The vector unit in such a system consists of one, two, or four sets of vector pipelines. Each vector pipeline set consists of two add/shift and two multiply/logical functional pipelines. Each of the functional pipelines can be operated simultaneously; thus, the arithmetic processor in a uniprocessor system with four vector pipeline sets can execute up to 16 floating point operations per machine cycle. To get near peak performance, all 16 pipes must be kept busy. Data enter and leave the arithmetic pipes through vector registers, which have a maximum capacity of 144 KB. It is unlikely that an SX-3 system would be purchased without all four pipeline sets in each processor.
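How the SX-3 peak scales with the number of vector pipeline sets and processors can be tabulated as follows. This is a sketch only: the function and variable names are illustrative, and it assumes that every functional pipeline completes one floating point operation per 2.9 ns cycle.

```python
CYCLE_NS = 2.9           # SX-3 machine cycle time in nanoseconds
PIPELINES_PER_SET = 4    # two add/shift + two multiply/logical pipelines


def sx3_peak_gflops(pipeline_sets: int, processors: int = 1) -> float:
    """Peak GFLOPS, assuming one operation per functional pipeline per cycle."""
    ops_per_cycle = PIPELINES_PER_SET * pipeline_sets * processors
    return ops_per_cycle / CYCLE_NS


# Uniprocessor configurations with one, two, or four vector pipeline sets.
uniprocessor_peaks = {sets: sx3_peak_gflops(sets) for sets in (1, 2, 4)}

# Four processors, each with four pipeline sets: 64 simultaneous operations.
four_processor_peak = sx3_peak_gflops(4, processors=4)

for sets, peak in uniprocessor_peaks.items():
    print(f"1 processor, {sets} pipeline set(s): {peak:.2f} GFLOPS")
print(f"4 processors, 4 pipeline sets each: {four_processor_peak:.2f} GFLOPS")
```

The fully configured uniprocessor comes out at roughly 5.5 GFLOPS and the four-processor system at roughly 22 GFLOPS, matching the quoted peaks.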

The four-processor system is thus capable of 22 GFLOPS peak, although this assumes that all the data can be kept in the vector registers. To the extent that data must be brought from main memory to the registers, performance may degrade. The bandwidth between memory and the registers depends on the memory hardware technology, and on how the data are arranged in the memory banks, but serious applications must keep data in registers to get good performance. Further, 22 GFLOPS requires 64 simultaneous operations, and this will mean that different operations have to occur simultaneously. Also, unless the user program can be divided up into simultaneous, independent tasks that use the same data in the vector registers, arrays will have to be quite long to absorb the startup penalty of being parcelled out to several processors. The most effective environment for such multiprocessors is a busy multiuser computer center, similar to that for other large multiprocessors. Most computer
