
Some communications take priority over others in a prescribed way. Each battle station has a preassigned task, say, to attack and kill a predetermined set of targets. That task must be accomplished no matter what workload is presented to the battle station.

So overload control is an important attribute. We know how to do that. And the total network is designed to accomplish its task no matter what happens. When there is a failure in some part of the network, some other part of the network takes over that task. So it should be in an SDI system.

To back up my analogy of the network and SDI further, my written statement provides this committee with detail from lessons we have learned from helping to develop that network. This includes a discussion of the basic capabilities of high reliability and fault tolerance as well as high availability, maintainability, and adaptability.

I conclude this short summary by pointing out that these lessons learned are directly applicable to the major SDI challenges of battle management and associated command, control, and communications.

Mr. Chairman, I thank you for the opportunity to present my views on this very important topic.

[The prepared statement of Mr. Buchsbaum follows:]

PREPARED STATEMENT OF SOLOMON J. BUCHSBAUM, AT&T BELL LABORATORIES

My name is Sol Buchsbaum. I am employed by AT&T Bell Laboratories (Bell Labs) as Executive Vice President, Customer Systems. AT&T, our parent company, does a lot of business with the Federal Government. The bulk of the business involves communications systems and services provided to both civilian and military agencies of the Federal Government. Also included are specially designed systems for the Department of Defense, particularly the U.S. Navy. Much of that business is supported by research and development done by Bell Labs, and the people doing that work report to me.

During the 1970's Bell Labs was the prime contractor for the SAFEGUARD project. And perhaps it's worth mentioning that a decade earlier Bell Labs people, operating through Bellcomm, a company founded by AT&T to assist NASA, were responsible for the systems engineering of Project Apollo that put our men on the moon. Bell Labs has no contract with the Strategic Defense Initiative Organization (SDIO), but we do support one of the SDIO's systems architecture contractors, as well as ballistic missile defense work for the Secretary of the Army.

However, my familiarity with the Strategic Defense Initiative (SDI) program comes not so much through my work at Bell Labs as through three other activities of mine. I serve on General Abrahamson's Strategic Defense Initiative Advisory Committee, which has been briefed in some detail on SDI projects and their management. I chair the White House Science Council, which has also been interested in and exposed to SDI and its programs. And lastly, I am now a senior consultant to the Defense Science Board, having once been its chairman.

During my years with the Defense Science Board, I was particularly interested in Command and Control. For example, a few years ago I chaired a Defense Science Board task force on Command and Control Systems Management.

In appearing before your subcommittee today, however, I do not represent or speak on behalf of any of these entities or bodies. I am testifying as an individual. The views I will present are those of Sol Buchsbaum, a citizen, hopefully an informed citizen.

Let me say at the outset that I recognize that the Strategic Defense Initiative faces enormous challenges and problems which I do not minimize. However, a vision of the world in which the two superpowers have agreed to constrain their respective offensive nuclear forces to levels much lower than today's and, at the same time, have also agreed to protect themselves and their allies against nuclear attack with defensive systems—a protective shield—of reasonable effectiveness is an attractive one. Surely, that would be a more stable world than the one in which we live today.

Therefore, I believe it behooves the United States to do the research to determine whether or not we can develop the technology that such a protective shield would require.

No one can deny that this is a complex and difficult task. But I for one am heartened by the progress that General Abrahamson and his people are making. The President made his speech raising the possibility of such a defensive system in March 1983, less than three years ago. The SDIO has been in business for only about two years. During that time, however, progress in its program has been made in a number of areas.

Advances have been made and are continuing in such key areas as sensors, defensive weapons, and space transport. And these advances are beginning to provide positive answers to questions that have been raised concerning the cost-effectiveness, survivability and robust performance of a multilayered defense system.

For example, in sensors, advances have been made in the field of interactive discrimination, using directed energy (neutral particle beams or high-power lasers) to allow the identification of nuclear weapons. These should help give us the capability to discriminate real targets accurately from a variety of decoys. Large optical sensors have been developed with "on chip" data processing capability, reducing computational loads on other system components. These electronic parts are radiation resistant. Also, cryocoolers have been developed that should increase sensor performance against targets and should help reduce system costs.

In the area of interception by defensive weapons we've seen some of the following accomplishments: Work has been done in atmospheric compensation that could allow ground-based high-power beams to transit the atmosphere effectively. Lethality testing has shown that the Soviets' current boosters may be very soft, and that hardening those boosters would impose a high economic cost on the Soviets. Also, free electron laser and high-power optics advances are showing the way to build effective laser weapons. And finally, the development of kinetic kill concepts for midcourse allows for more cost-effective discrimination and interception.

I have just touched on the challenges of developing SDI elements such as sensors, target discriminators, and defensive weapons. Then there is the crowning challenge of integrating these elements into a complete system whose parts work coherently to accomplish SDI's mission. A key function that must be considered from the very beginning in coming up with this integrated system is battle management and its associated command, control, and communication, BM/C3 for short.

I want to address BM/C3 in some detail, because recently this has been raised as the area in which we face the greatest, some say impossible, challenges. In fact, some critics have tried to use this to build a case against the reliability, if not the viability, of an SDI system. I want to provide some opinions and evidence that I hope demonstrate this should not be the case.

The BM/C3 function is basically a problem of information movement and management, involving all SDI elements as well as key human decision makers. The problem is difficult because of the vast quantities of information that must be obtained, processed, and synthesized throughout a large, geographically distributed system, and done so quickly. Such a system, it has been estimated, would require tens of millions of lines of software. And the issue of software complexity is compounded by the associated needs to test, simulate, modify, and evolve the BM/C3 system.

Some critics have specifically questioned if it is possible to generate great quantities of error-free software for the system, and to ensure that it is, indeed, error-free software.

This is the wrong question. Designers of large real-time systems, systems that depend on complex software and hardware, know that it is impossible to generate great quantities of error-free software. They also know that major problems in field use of software need not be associated with program bugs. Software is always part of a larger system that includes hardware, communications, data, procedures, and people.

The right question, as well as the key issue, is the broader one of whether the total BM/C3 system can be designed to be robust and resilient in a changing and error-prone environment. The key, then, is not whether the software contains errors, but how the whole system compensates for such errors as well as for possible subsystem failures. The BM/C3 system must operate continuously and reliably despite any errors or failures. Parts of the system must cover for each other to limit the effect of such problems, and they must do so quickly, typically in milliseconds or seconds.

Can such a large, robust, and resilient system be designed, and not only designed, but built, tested, deployed, operated, and further evolved and improved? I believe the answer is yes. I am confident of this answer because most if not all of the essential attributes of the BM/C3 system have, I believe, been demonstrated in comparable terrestrial systems.

The system most applicable to the issue at hand is the U.S. Public Telecommunications Network. The network has attributes that are intrinsic to the BM/C3 system as well. These include the capabilities for continuous, reliable operation; for fault tolerance and overload control; for evolution in response to changing demands, functions, and advancing technology; for human control if and as appropriate; for compatibility and interoperability among all systems; and for continuous testing, diagnosis, and maintenance, much of it by remote means.

In other words, the telecommunications network has high reliability, availability, maintainability, and adaptability. And the network achieves these capabilities largely because it is a distributed system that uses redundancies and that uses well-specified, well-controlled interfaces in the coupling together of all component systems.

Now, let's imagine for a moment that the switches in the network are SDI battle stations, and that everything connected to the network, such as phones, data terminals, and private branch exchanges (PBXs), corresponds to SDI sensors that feed information to the "battle stations" and cause them to take action.

Let's also imagine that the network signaling system that interconnects all switches is the nerve system, the backbone of the SDI communications system.

Some network "battle stations", each with its own "brain", are connected to others directly and some indirectly, but all are interconnected in a hierarchy. All these "battle stations" can communicate with one another, but not on an equal basis. Some communications take priority over others in a prescribed way. Each "battle station" has a preassigned task-say, to attack and kill a predetermined set of targets. That task must be accomplished no matter what the work load presented to the battle station is. So overload control is an important attribute. We know how to do that. And the total network is designed to accomplish its task no matter what happens. When there is a failure in some part of the network, some other part of the network takes over that task. So it should be in an SDI system.

In the telecommunications network, there are many "battle stations" or switches: 20,000 local and nearly 1,000 toll switches. And roughly 150 million "sensors" or telephones are connected to the network by 85 million access lines. Some network elements are enormously complex, highly reliable computers. For example, the 4 ESS switching system, today's largest, can process and route over 700,000 calls per hour, and uses a program of about 1.3 million lines of code.

With this comparison between the telecommunications network and SDI in mind, let's examine the lessons we learned from helping to develop, test, deploy, and operate the network. They suggest that there are three keys to achieving high reliability, availability, maintainability, and adaptability.

The first is the use of distributed architectures both for the entire network and for major systems within the network. This approach compartmentalizes crucial functions in modules throughout the country as well as within individual systems. And the approach also helps minimize software complexity, a constant concern in large-system design. Even though the total network requires over 40 million lines of software, for example, each component system requires no more than two million lines.

The second key is the use of redundancy, again both in the entire network and in the component systems. And the third key is the coupling together, the integration, of all the component systems by means of well-specified, well-controlled interfaces. Let's examine in a bit more detail how the telecommunications network achieves its basic capabilities, with special focus on its approach to high reliability through tolerance to errors or failures.

For high reliability, the telecommunications network is built from highly reliable component systems. For example, the mean time between failures for a circuit pack is twenty years. Yet the network as a whole is much more reliable than its individual components. That's because the network is designed to be fault tolerant. It continuously and automatically checks its own condition. When a problem is detected, it isolates the faulty component, so that the network can continue to function using a substitute or redundant component. This strategy of isolating faults so that they do not propagate through the rest of a system is crucial to designing highly reliable systems such as the network. In contrast, the famous New York City blackout of 1965 illustrates how a single fault propagated through almost the entire Northeast power grid.
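As an illustrative sketch only, in present-day Python with hypothetical names rather than the network's actual design, the detect-isolate-substitute pattern described above might look like this:

    class Component:
        """A component that continuously reports its own health."""
        def __init__(self, name):
            self.name = name
            self.healthy = True

        def self_check(self):
            return self.healthy

        def handle(self, call):
            return f"{self.name} carried {call}"

    class FaultTolerantUnit:
        """Detect a fault, isolate the faulty component so it cannot
        propagate trouble, and continue service on a redundant spare."""
        def __init__(self, primary, spare):
            self.active, self.spare = primary, spare
            self.isolated = []

        def handle(self, call):
            if not self.active.self_check():       # continuous self-checking
                self.isolated.append(self.active)  # isolate the faulty part
                self.active, self.spare = self.spare, None
            return self.active.handle(call)

    unit = FaultTolerantUnit(Component("pack-A"), Component("pack-B"))
    unit.active.healthy = False                    # simulate a fault in pack-A
    print(unit.handle("call 42"))                  # service continues on pack-B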

In the telecommunications network there is a layered approach to fault tolerance. At the highest layer, fault tolerance is designed in through network management. Consider what happens during a peak holiday calling period or a major network disruption such as a hurricane. Network managers can use available information and controls to make decisions about rerouting traffic in order to maximize network call-handling capacity despite abnormal conditions. In addition, vital classes of calls, such as police or hospital, can also be prioritized to get through.

The next layer of fault tolerance consists of automatic alternate routing of calls, done by a switching system if it detects some type of "traffic jam" along one possible route to a call's destination. For example, there are four different routes available today between Washington, D.C. and Los Angeles, and this will increase to fourteen by 1990.
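A minimal sketch of alternate routing, with route names made up for the example and again in present-day Python, is simply to try the available routes in preference order and skip any that are congested:

    # Routes are listed in preference order; the names are hypothetical.
    ROUTES = {
        ("DC", "LA"): ["via-chicago", "via-denver", "via-dallas", "via-atlanta"],
    }

    def pick_route(origin, destination, congested):
        """Return the first route in preference order that is not congested."""
        for route in ROUTES[(origin, destination)]:
            if route not in congested:
                return route
        return None   # every route is blocked; the call is refused, not misrouted

    # A "traffic jam" on the preferred route triggers automatic alternate routing.
    print(pick_route("DC", "LA", congested={"via-chicago"}))   # -> via-denver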

At the next layer, fault tolerance is built in through the use of redundancy-such as dual processors in each switching system. This dual-processor architecture is designed not for load-sharing between processors, but for achieving high reliability. If a fault is detected in the first processor, control of the switch is automatically shifted to the second processor, to sustain continuous operation. The switch also automatically alerts a craft technician with information on the cause and location of the problem as well as its solution.
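The dual-processor arrangement can be sketched the same way; the code below is a hypothetical illustration, not AT&T's actual switch software, showing the cutover to the standby processor and the trouble report left for the craft technician:

    class Processor:
        def __init__(self, name):
            self.name, self.faulted = name, False

        def run(self, job):
            if self.faulted:
                raise RuntimeError(f"{self.name} fault")
            return f"{self.name} processed {job}"

    class DualProcessorSwitch:
        """Active/standby pair: on a detected fault, control shifts to the
        mate processor and a report is issued for the craft technician."""
        def __init__(self):
            self.active, self.standby = Processor("proc-0"), Processor("proc-1")
            self.trouble_reports = []

        def process(self, job):
            try:
                return self.active.run(job)
            except RuntimeError as err:
                self.trouble_reports.append(f"ALARM: {err}; control passed to {self.standby.name}")
                self.active, self.standby = self.standby, self.active
                return self.active.run(job)

    switch = DualProcessorSwitch()
    switch.active.faulted = True          # simulate a processor fault
    print(switch.process("call setup"))   # service continues on proc-1
    print(switch.trouble_reports[0])      # information for the technician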

At an even lower level, fault tolerance is often built into a circuit pack itself, particularly if it includes a microprocessor. If a fault is detected, the microprocessor's software invokes fault recovery procedures to correct the problem or at least to prevent it from affecting a larger portion of the system.
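At that lowest layer the idea is containment: handle the trouble inside the pack itself. A purely hypothetical recovery routine, not actual circuit-pack firmware, might retry the failing operation and then reset the local module before the fault can spread:

    def with_local_recovery(operation, reset_module, retries=2):
        """Retry a failing operation, then reset the local module and try once
        more; the fault is handled inside the pack rather than propagating."""
        for _ in range(retries):
            try:
                return operation()
            except Exception:
                continue
        reset_module()        # last resort: re-initialize the local module
        return operation()

    attempts = iter([Exception("parity error"), Exception("parity error"), "ok"])
    def read_register():
        result = next(attempts)
        if isinstance(result, Exception):
            raise result
        return result

    print(with_local_recovery(read_register, reset_module=lambda: None))   # -> ok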

For high availability, the public telecommunications network is designed to work at its specified level of performance even when some of its component elements are unavailable. As with fault tolerance, a layered approach is taken to achieving high availability. Redundancy is built in at many levels, and systems are designed to remain on-line even while undergoing maintenance and administrative operations. Individual network elements such as switching systems are also designed for high availability. The design goal for a central office switch is continuous, around-the-clock operation for 20 years or more—with no more than three minutes per year of downtime. This goal is usually met because of the reliable components and redundant architecture noted earlier.
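The three-minutes-per-year goal corresponds to roughly "five nines" of availability; a quick back-of-the-envelope check, in present-day Python purely for illustration:

    minutes_per_year = 365 * 24 * 60            # 525,600 minutes
    downtime = 3.0                              # design goal: at most 3 minutes per year
    availability = 1 - downtime / minutes_per_year
    print(f"{availability:.6%}")                # about 99.999429%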

The telecommunications network is also highly maintainable in order to meet the goals of high reliability and availability. Equipment is not only well-maintained, but also repaired or replaced quickly if it fails.

High maintainability is achieved in several interrelated ways. First, both systems and their component subsystems are designed to be highly modular so that faults can be isolated to a small portion of a system or a subsystem. Once isolated, faults can easily be fixed by replacing or repairing the small part, for instance, an insertable circuit pack. Second, systems are designed to include extensive maintenance diagnostics, through which a system continuously and automatically monitors its own overall condition as well as the condition of its component subsystems. And third, the collection and analysis of maintenance information are centralized even though network systems are distributed over a wide geographic area, many of them in unstaffed locations. Over 100 different large computer-based operations systems are vital to supporting such network needs as centrally administering and controlling maintenance for various types of systems. Moreover, network elements such as newer switching systems, to cite one example, are becoming increasingly sophisticated. So are their problems, which also tend to occur less often. Thus, technicians are less experienced and therefore less proficient in diagnosing and fixing them. These trends make it more and more important for maintenance functions to be automated. In fact, this is one of the areas in which expert software systems are being introduced most rapidly in the network.
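A sketch of the centralized collection of maintenance information, again hypothetical and in present-day Python, with distributed and often unstaffed sites reporting to one analysis point:

    class MaintenanceCenter:
        """Collects diagnostics from remote sites and reduces them to the
        items a technician actually needs to act on."""
        def __init__(self):
            self.reports = []

        def report(self, site, subsystem, status):
            self.reports.append((site, subsystem, status))

        def faults(self):
            return [(site, sub) for site, sub, status in self.reports if status != "ok"]

    center = MaintenanceCenter()
    center.report("office-17", "trunk-group-3", "ok")
    center.report("office-42", "line-card-88", "parity errors")   # isolated fault
    print(center.faults())   # -> [('office-42', 'line-card-88')]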

Finally, the telecommunications network is highly adaptable on a variety of levels. The network adapts to the minute-by-minute changes in traffic conditions as well as to changing customer needs and to the evolution of technology, which may occur over months, years, and decades. Moreover, the network gracefully evolves each and every day, with new equipment being installed and with older equipment being reconfigured or removed. Through all this change, there is always interoperability among all systems and equipment, no matter what the vintage. The key to this is that adaptability is a network design requirement.

The distributed and highly modular architecture of the network increases its adaptability. Changes made to one network module have only a limited effect on the working of other modules. Different portions of the network must necessarily affect each other, but great efforts are taken to control these interdependencies. Accordingly, the interfaces among the component systems of the network are carefully specified and controlled. While individual systems may evolve quite rapidly, the interfaces among systems are permitted to change much less quickly.
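One generic way to realize such well-specified, well-controlled interfaces, sketched here in present-day Python rather than the network's actual protocols, is to pin down a small, versioned message format that modules agree on, so either side can evolve internally without breaking the other:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class RouteRequest:
        """A stable interface between modules: these fields change far more
        slowly than the implementations on either side of the interface."""
        version: int
        origin: str
        destination: str
        priority: int        # e.g., 0 = emergency traffic

    def handle(request: RouteRequest):
        if request.version != 1:
            raise ValueError("unsupported interface version")
        return f"routing {request.origin} -> {request.destination} at priority {request.priority}"

    print(handle(RouteRequest(version=1, origin="202-555", destination="213-555", priority=0)))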

Another means for achieving high network adaptability is through the careful division of responsibility for decision making between mechanized systems and people.

Far too many choices are made too fast for network decision making to be left to people, but the possibility for human intervention is left open. Although mechanized systems make most of the decisions, these same systems also provide the information necessary for trained personnel to intervene. Because of the vast amounts of raw data available in a system as large and complex as the network, it is vital that information be reduced and focused to enable people to make reasoned choices.
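As a final toy illustration, hypothetical and in present-day Python, of reducing raw data to focused information for a human network manager:

    from collections import Counter

    # Event names are made up for the example.
    raw_events = ["trunk busy"] * 998 + ["processor switchover", "alarm: office-42"]

    def focus_for_operator(events, routine=("trunk busy",)):
        """Summarize routine events as counts; surface everything else verbatim."""
        counts = Counter(events)
        summary = {event: n for event, n in counts.items() if event in routine}
        exceptions = [event for event in events if event not in routine]
        return summary, exceptions

    print(focus_for_operator(raw_events))
    # -> ({'trunk busy': 998}, ['processor switchover', 'alarm: office-42'])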

In sum, we achieve high reliability, availability, maintainability, and adaptability in the telecommunications network through three key approaches: the use of distributed architectures, of redundancy, and of well-specified, well-controlled interfaces among all component systems. This approach not only reduces software complexity. It also permits the fullest use of software as a strength, enhancing network flexibility and resiliency.

The network lessons I cited are reinforced by the experience of AT&T and other large corporations in additional complex software and hardware systems. I am most familiar with AT&T examples, of course. And these range from SAFEGUARD, with its total software of over two million instructions, to 25 years and 25,000 miles of installed underwater sound detection systems which have never experienced a failure of any of the electronic components.

I conclude by pointing out that the lessons we learned from the development, systems integration, and evolution of the telecommunications network as well as from other complex projects should also apply to the major SDI challenges of battle management and command, control, and communications.

Thank you for the opportunity to present my views on this important subject.

Senator WARNER. Dr. Buchsbaum, that is a very encouraging point. We thank you.

Now we will have Dr. Cohen.

STATEMENT OF DANNY COHEN, DIRECTOR, SYSTEMS DIVISION, INFORMATION SCIENCES INSTITUTE, UNIVERSITY OF SOUTHERN CALIFORNIA

Mr. COHEN. Thank you, Mr. Chairman and members of the committee, for the opportunity to appear before you to discuss the software issues of SDI.

I am Danny Cohen from the Information Sciences Institute [ISI] of the University of Southern California. I received my B.S. in mathematics from the Technion, the Israel Institute of Technology, and my Ph.D. in applied mathematics (computer science) from Harvard University.

I have been on the computer science faculties of Harvard and Caltech. I have been active in the computing field since 1962. My main research accomplishments are in computer communication and real-time systems.

I am also the chairman of the SDIO Panel on computing in support of Battle Management/C3.

I would like to add that the views I express here are my own and those of most of the SDIO panel, and not necessarily those of the institute for which I work.

All the members of the panel were selected on the basis of their computing research or Battle Management/C3 background. The panel focused its work on these technical issues. In addition to the discussions that the panel had, and to the presentations that were given to the panel, the panel also consulted several other computing experts.

It is my own judgment, and also that of the panel, that the computing requirements for the battle management of the strategic defense system can be met. This judgment is based upon the following reasons:
