Since only the listed symptoms are detected, unanticipated errors will not be detected and could cause system failure. The second problem is that there is no recovery from recovery failures. In other words, if a fault occurs during the process of recovering from a fault, the system could fail. This problem was recognized by Fujitsu and led to the definition of a critical route, i.e., any procedure whose failure would lead to a recovery failure. The design team took special care to inspect these critical routes, reduce their number, and simplify their design. It is safe to say that the measures described in the paper will serve to greatly increase the reliability of the SXO operating system. There is still a good chance, however, that unanticipated bugs will cause system failure or seriously impaired performance in the early stages of system life. (One attendee from Tandem pointed out to me that one of the important features of Fujitsu's system was the incorporation of an artificial fault generation mechanism built into the operating system. Each SXO component can have 10 or more interrupt points, and several thousand have been built in already. The final product can be shipped with the traps embedded but not in operation. In this form there is negligible overhead, but it is very easy to bring them into effect. Traps are written as C macros, and hence are easy to program. DKK)

At Fujitsu, we were shown the new Fujitsu parallel computer, called the AP1000. Fujitsu has produced a small number of machines of this type ranging from 64 processors to 512 processors. These machines are used inside Fujitsu and at various universities in Japan and other countries for parallel processing research. At this time, Fujitsu has no plans to try to market this computer. Highlights of the AP1000 architecture include three communication networks, nodes composed of SPARC processors with Weitek floating point co-processors, and a very fast interprocessor communication mechanism. The three networks consist of one for broadcasting to the nodes from either the host or another node, one to handle barrier synchronization, and a third for interprocessor point-to-point communication. The point-to-point communication network has a torus configuration and the routing algorithm combines wormhole routing with a structured buffer pool approach to avoid deadlock. The Fujitsu representative reported that they had achieved a processing rate in the range of 2-3 GFLOPS using their 512-processor machine on the LINPACK benchmark. This places the AP1000 below both the Intel Delta machine and the Thinking Machines Connection Machine. The lower performance as compared to the Delta machine probably results mainly from the performance difference between the SPARC processor with a Weitek floating point co-processor used by the AP1000 and the Intel i860 used by the Delta machine. I should mention that the AP1000 possesses no fault tolerance features. Fujitsu has also recently developed a fault tolerant computer called the SURE SYSTEM 2000, but we were not shown anything related to this machine during the tour.

The second stop on the tour was at a Hitachi research laboratory specializing in expert systems. Here, we were shown a videotape detailing the development and use of a Hitachi expert system for planning construction of large tunnels. We were also shown statistics concerning the numbers of expert systems in place at 261 Japanese companies. The data showed the percentages of these systems that were diagnostic type (about 40%), planning type (about 27%), design type (about 16%), control type (about 6%), and other (about 11%). Again, we were not shown anything related to fault tolerant computing.
PARALLEL PROCESSOR FOR
Project GRAPE, a parallel computer for many-body calculations, which
version operates at 10 GFLOPS.
by David K. Kahaner
Classical many-body systems play a key role in astrophysics: star clusters, galaxies, and clusters of galaxies can be described as systems of particles interacting through gravity. At the other end of the mass spectrum, plasmas and systems of molecules are also treated as many-body systems that interact through Coulomb and van der Waals forces. In astrophysics, the applications are primarily to basic science, but at the molecular level applications are to drugs, smart material design, and many other practical problems.

The basic idea is very simple: a large collection of particles starts at an initial configuration and the particles interact with each other according to Newton's law, the acceleration of each particle being determined by the forces acting on it, those generated by the other particles, and any other external forces. The detailed form of the force between two particles is determined by the system but depends nonlinearly on the distance between them. (The major assumption here is that the particles are point masses.) This leads to a system of N second order ordinary differential equations (if there are N particles), all of them looking very much alike.

In the simulation of these systems one thus needs to solve the equations, and their solution will describe the subsequent motion of the particles as time progresses. In principle this is very simple; the differential equations are solved numerically by taking finite, but small, time steps. At each time step the forces are computed and the particles moved an appropriate distance. If there are N particles, there are N^2 force interactions that need to be computed, usually by pairwise summation. For large systems this portion dominates all other aspects of the computation, as the time for other parts is usually linearly dependent on N.

N-body computations are one of the major absorbers of supercomputer cycles. For example, at the Australian National University the two scientific fields using the most supercomputer time during 1990 were astrophysics and molecular dynamics [see D.K. Kahaner, "Computing and Related Science in Australia," 16(4), 47-68 (1991)]. Most of this was for N-body calculations. For example, one 2,000-body simulation on a Fujitsu VP-400 took 40 hours, although very clever algorithms can reduce this. Molecular dynamics simulations are extensively used to study three-dimensional structures of protein molecules and the phase transition of materials.

The basic calculation is inherently parallel. Many clever techniques have been developed to speed up these calculations, including extremely sophisticated data structures for linking information about the particles and the use of fast algorithms. For example, in some situations forces die off rapidly with distance, and so it is only necessary to compute the effect of nearby particles. Thus particles could be grouped into near and distant, leading to efficient algorithms. For short-range force problems of this type, linked-list algorithms can reduce the force calculation to something proportional to N. Tree algorithms can be used in those cases when the required accuracy need not be very high. These latter algorithms can be quite complicated and might be difficult to program on fine grained supercomputers, although the amount of computational work can be reduced from about N^2 to about N*ln(N) for the force calculation. For example, a force calculation for the i-th particle could be written as (a cell is a collection of nearby particles)

    subroutine treeforce(i,cell)
      if( cell and particle i are well separated)
        force = force from the center of mass of cell
      else
        force = sum of forces from the children of cell
      endif
    end

Of course, this has to be implemented either in a language allowing recursion or in Fortran by an appropriate data list.

For systems on a uniform spatial mesh, fast Fourier transform (FFT) methods can also be applied. FFT algorithms also reduce computation from N^2 to N*ln(N) and can be very effective, but they only work with a uniform mesh. There is still a limit of a few tens of thousands on the number of particles that can be easily dealt with. The number of atoms in a large protein molecule exceeds 10,000, and when immersed in water the system can easily exceed 100,000 particles. Direct pairwise summation of the forces can lead to excessive computation but is very general and applicable to virtually any physical structure.

GRAPE

GRAPE (GRAvity PipE) is the name for a series of special purpose parallel computers designed at the University of Tokyo by

    Prof. Daiichiro Sugimoto
    Dept of Earth Science and Astronomy
    3-8-1 Komaba
    Meguro-ku, Tokyo 153, Japan
    Tel: +81-3-3467-1171 x452
    Fax: +81-3-3465-3925
    E-mail: email@example.com tokyo.ac.jp

and

    Dr. Junichiro Makino
    Dept of Information Science and Graphics
    College of Arts and Sciences
    University of Tokyo
    3-8-1 Komaba
    Meguro-ku, Tokyo 153, Japan
    Tel: +81-3-3467-1171 x240
    Fax: +81-3-3465-2896
    E-mail: firstname.lastname@example.org-tokyo.ac.jp

The GRAPE project (GRAPE-1, GRAPE-1A, GRAPE-2, GRAPE-2A, GRAPE-3, and GRAPE-4) has been evolving only over the past 3 years. This is a university project that has had collaborative assistance and hardware support from Fuji-Xerox (just in case you thought that they only built copiers). GRAPE-3 is complete and Makino states that its effective speed is 10 GFLOPS. A series of papers, in English, has been submitted (mostly to physics and astronomy journals) and some papers have already appeared. A bibliography is included below. (Most of these papers have multiple authors, including at least Sugimoto or Makino; thus, authors' names are omitted.)

GRAPE is known in the astronomy community. In fact, Makino has just returned from a substantial visit to the Cambridge Institute of Astronomy. But it is not well known otherwise, perhaps because, until recently, the major applications of the project have been to astronomical rather than molecular systems. Because so many of the project's papers are in English, and also since the principals are easily available via E-mail, the purpose of my report is to ...

The fundamental idea behind GRAPE is that direct summation of forces is simple and parallel. Excellent work in this area has been demonstrated by users of the Connection Machine, but the GRAPE view is that with the advent of VLSI it ought to be possible to build a special purpose chip of modest cost which is still flexible enough. This is not the only project with this goal. The Delft Molecular Dynamics Processor (DMDP) is specifically designed for these problems, too. The major difference is that Delft's machine does all the computations, while GRAPE systems only perform the force computation. The latter allows much simpler hardware and software. (The fact that the first design for GRAPE-1 began in April 1989, and there is already a machine running at 10 GFLOPS, is a pretty good endorsement of this approach.) The hardware only performs one function; the software is simple to interface with existing Fortran or C. GRAPE is attached to a Unix host by a VME bus (the earliest version used an IEEE-488 interface). The host sends the particles' positions to GRAPE, which calculates the force on any requested particle and sends this back to the host. The total user documentation is available as a few "man" pages. (I have read several examples of GRAPE programs; most users will be able to write their own applications with very little "spin-up" time.) Of course, this means that to some extent performance of a GRAPE system is limited by the host and the communications, while this is not the case for DMDP. For molecular forces, the force computation is less expensive (because only nearby particles need be considered), and thus GRAPE will be less appropriate than DMDP, but for long range forces the opposite will be true. Nevertheless, when costs are considered, GRAPE might be a good choice for either.

... shows clearly the evolution of the scientists' thinking about applications, as well as the availability of better hardware. GRAPE-1, -1A, and -3 are "low accuracy." Subtraction between position coordinates is in fixed point using 16, 16, and 20 bits, respectively. Accumulation of the force is 48, 48, and 56 bits, respectively. In GRAPE-2 and -2A, subtraction between coordinates and force accumulation is 64-bit floating point (double precision), while the remainder of the calculation is single precision. The GRAPE-3 system uses a custom LSI chip; up to 48 chips (strictly speaking, 46 chips for the force calculation) can be installed, allowing force calculation on a similar number of particles, in parallel. At a 10-MHz clock, GRAPE-3 can perform about 300 MFLOPS for each chip, thus about
13.8 GFLOPS when fully configured. The full system is in operation now and the measured speed is at present 9.9 GFLOPS. Newer machines have clever pipelining, calculate not only the force but also the potential, use interpolation tables for forces, etc. They also can implement some tree algorithms. A great deal of detail concerning the hardware of GRAPE systems is given in the papers listed below, so we omit this here. Figures 1 through 6 illustrate different GRAPE systems.

Current plans are to develop GRAPE-4, which is targeted at 1 TFLOPS. This will be done by using 1,024 1-GFLOPS chips in a three-level tree. The host is connected to four controller boards and each board is connected to 16 processor boards, each with 16 pipeline chips on it. The processor board is similar to GRAPE-3's, but the wordlength will be increased to allow higher accuracy calculations. The designers hope that the entire package will consume about 5 to 10 kW and thus will be air cooled. They estimate it will cost about $2M to build.

COMMENT

GRAPE systems were designed mostly by astronomers who have real problems to solve. In some ways their approach is ad hoc, but they have taken this to the stage where new science is being done daily. It is a very impressive job and it will be interesting to see how well the systems work on molecular dynamics problems.

BIBLIOGRAPHY

"A special-purpose computer for gravitational many-body problems," Nature 345(6270), 33-35 (3 May 1990).

"GRAPE: Special purpose computer for simulation of many-body problems," International Symposium on Supercomputing, Fukuoka, Japan, 6-8 November 1991.

"Treecode with a special-purpose processor," to appear in Publications of the Astronomical Society of Japan (PASJ) (1991).

"Project GRAPE: Special purpose computers for many-body problems," to appear in High Performance Computing: Research and Practice in Japan, edited by R. Mendez (John Wiley and Sons, 1992).

"A special-purpose computer for gravitational many-body systems GRAPE-2," PASJ 43, 547-555 (1991).

"GRAPE-2A: A special purpose computer for simulation of many-body systems with arbitrary central force," to appear in HICSS-25, 25th Hawaii International Conference on System Sciences, Koloa, Hawaii, 7-11 January 1992.

"A modified Aarseth code for GRAPE and vector processors," in press, PASJ (1991).

"A special-purpose N-body machine GRAPE-1," Computer Physics Comm. 60, 187-194 (1990).

"GRAPE-1A: Special-purpose computer for N-body simulation with tree code," to appear in PASJ (1991).

"GRAPE-3: Highly parallelized special-purpose computer for gravitational many-body simulations," to appear in HICSS-25, 25th Hawaii International Conference on System Sciences, Koloa, Hawaii, 7-11 January 1992.
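
ILLUSTRATIVE SKETCH

The direct pairwise summation that GRAPE performs in hardware can be sketched in software. The following is a minimal Python illustration only; it is not GRAPE's interface or the authors' code. The function names direct_forces and step, the choice G = 1, and the simple Euler-type time step are all assumptions made for the example. The split mirrors the division of labor described in this report: one routine computes forces from positions (GRAPE's role) and another advances the particles (the host's role).

```python
import math

def direct_forces(positions, masses, G=1.0):
    """Acceleration on each particle from all others, by direct O(N^2) summation.

    Assumes point masses at distinct positions; G=1.0 is an arbitrary
    choice of units for this sketch, and no softening is applied.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            # Vector from particle i to particle j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc

def step(positions, velocities, masses, dt):
    """Advance all particles by one small time step (the host-side work).

    Hypothetical integrator for illustration: velocities are updated first,
    then positions are moved using the new velocities.
    """
    acc = direct_forces(positions, masses)
    new_v = [[v[k] + a[k] * dt for k in range(3)]
             for v, a in zip(velocities, acc)]
    new_x = [[x[k] + v[k] * dt for k in range(3)]
             for x, v in zip(positions, new_v)]
    return new_x, new_v
```

The double loop makes the N^2 cost of the force phase explicit, while step does only work proportional to N, which is why offloading the force phase alone captures most of the computation.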