Manufacturer | ID | location | Date | from |
Cray Research | Cray-1 A | ** | 1976 | Lawrence Livermore Laboratory |
Cray Research | Cray 1M/4400 | ** | 1978 | Cray Research |
Cray Research | Cray-2 | ** | 1985 | Lawrence Berkeley Lab |
A new (2021) Cray History website :-)
Architecture
Special features - Cray 1
Cycle times from http://netlib2.cs.utk.edu/utk/lsi/pcwLSI/text/node9.html#SECTION00410000000000000000
Year of Introduction | Model Name | Cycle Time in Nanoseconds |
1976 | CRAY 1 | 12.5 |
1982 | CRAY X-MP | 9.5 |
1985 | CRAY 2 | 4.1* |
1988 | CRAY Y-MP | 6.5 |
1992 | CRAY Y-MP C-90 | 4.0 |
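The cycle times in the table convert directly to clock rates, since frequency is the reciprocal of the cycle time. A minimal sketch of that arithmetic follows. The dictionary and function names are illustrative only, and the CRAY 2 value is used exactly as listed (the meaning of the table's asterisk is not given here).

```python
# Cycle times in nanoseconds, copied from the table above.
CYCLE_TIME_NS = {
    "CRAY 1": 12.5,
    "CRAY X-MP": 9.5,
    "CRAY 2": 4.1,
    "CRAY Y-MP": 6.5,
    "CRAY Y-MP C-90": 4.0,
}


def clock_mhz(cycle_time_ns: float) -> float:
    """Return the clock rate in MHz for a cycle time given in nanoseconds."""
    # 1 / (t * 1e-9 s) Hz is the same as 1000 / t MHz.
    return 1000.0 / cycle_time_ns


if __name__ == "__main__":
    for model, ns in CYCLE_TIME_NS.items():
        print(f"{model:15s} {ns:5.1f} ns  {clock_mhz(ns):6.1f} MHz")
```

For the CRAY 1, a 12.5 ns cycle corresponds to an 80 MHz clock.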
From Tera, which acquired the Cray Research assets from SGI in April 2000 (SGI had acquired Cray Research in August 1996):
The first Cray-1® system was installed at Los Alamos
National Laboratory in 1976 for $8.8 million. It boasted a
world-record speed of 160 million floating-point operations
per second (160 megaflops) and an 8 megabyte (1 million
word) main memory. The Cray-1's architecture reflected its
designer's penchant for bridging technical hurdles with
revolutionary ideas. In order to increase the speed of this
system, the Cray-1 had a unique "C" shape which enabled
integrated circuits to be closer together. No wire in the
system was more than four feet long. To handle the intense
heat generated by the computer, Cray developed an
innovative refrigeration system using Freon.
In order to concentrate his efforts on design, Cray left the CEO position in 1980 and became an independent contractor. As he worked on the follow-on to the Cray-1, another group within the company developed the first multiprocessor supercomputer, the Cray X-MP™, which was introduced in 1982. The Cray-2™ system appeared in 1985, providing a tenfold increase in performance over the Cray-1. In 1988, Cray Research introduced the Cray Y-MP®, the world's first supercomputer to sustain over 1 gigaflop on many applications. Multiple 333 MFLOPS processors powered the system to a record sustained speed of 2.3 gigaflops.
Always a visionary, Seymour Cray had been exploring the use of gallium arsenide in creating a semiconductor faster than silicon. However, the costs and complexities of this material made it difficult for the company to support both the Cray 3 and the Cray C90™ development efforts. In 1989, Cray Research spun off the Cray 3 project into a separate company, Cray Computer Corporation, headed by Seymour Cray and based in Colorado Springs, Colorado. (Tragically, Seymour Cray died of injuries suffered in an auto accident in September, 1996 at the age of 71.)
The 1990s brought a number of transforming events to Cray Research. The company continued its leadership in providing the most powerful supercomputers for production applications. The Cray C90™ featured a new central processor with industry-leading sustained performance of 1 gigaflop. Using 16 of these powerful processors and 256 million words of central memory, the system boasted unrivaled total performance. The company also produced its first "minisupercomputer," the Cray XMS system, followed by the Cray Y-MP EL series and the subsequent Cray J90™. In 1993, Cray Research offered its first massively parallel processing (MPP) system, the Cray T3D™ supercomputer, and quickly captured MPP market leadership from early MPP companies such as Thinking Machines and MasPar.
The Cray T3D proved to be exceptionally robust, reliable, sharable and easy to administer, compared with competing MPP systems. Since its debut in 1995, the successor Cray T3E™ supercomputer has been the world's best selling MPP system. The Cray T3E-1200E system has the distinction of being the only supercomputer to ever sustain one teraflop (1 trillion calculations per second) on a real-world application. In November 1998, a joint scientific team from Oak Ridge National Laboratory, the National Energy Research Scientific Computing Center (NERSC), Pittsburgh Supercomputing Center and the University of Bristol (UK) ran a magnetism application at a sustained speed of 1.02 teraflops. In another technological landmark, the Cray T90™ became the world's first wireless supercomputer when it was unveiled in 1994. Also introduced that year, the Cray J90 series has since become the world's most popular supercomputer, with over 400 systems sold.
Cray Research merged with SGI (Silicon Graphics, Inc.) in
February 1996. In August 1999, SGI created a separate Cray
Research business unit to focus exclusively on the unique
requirements of high-end supercomputing customers. Assets
of this business unit were sold to Tera Computer Company in
March 2000.
From Yahoo, news wire info {SGI paid $760 million for Cray Research in 1996} {In April 2000, Cray (formerly Tera) paid SGI $58 million for the remnants of Cray Research. SGI lost over 92 percent on its "investment" in 3.5 years. The SGI Cray T3E is based on the DEC Alpha chip. The Cray C90 and T90 were the last of the Cray-style vector processors from Cray Research/SGI.}
from http://www.cs.uiuc.edu/whatsnew/newsletter/fall98/chen.html
After earning his MS in 1972, Chen came to Illinois to work with Professor Dave Kuck and
graduate student Duncan Lawrie, who were championing the new concept of parallelism in
the ILLIAC IV project. After a year at Floating Point Systems, Chen joined Cray Research as its chief designer, where he led the development of the world’s most commercially successful parallel vector supercomputers, the Cray X-MP, and its successor the Cray Y-MP. Chen began by making some architectural changes to the Cray-1, which was introduced in 1976. In the Cray X-MP (Chen said that the "X" stood for "extraordinary"), Chen introduced shared-memory multiprocessing to vector supercomputing. The machine contained two pipelined processors compatible with the Cray-1 and shared memory. The X-MP series was expanded to include 1- and 4-processor machines. The X-MP4 was the first supercomputer installed at the National Center for Supercomputing Applications (NCSA) at Illinois (summer 1985). The first of the Y-MP series, Cray’s new multiprocessor vector supercomputer introduced in 1988, contained 1 processor, followed by 8, and then 16. All these machines shared essentially the same architecture, and the majority were designed by Chen and his team. Cray Research enjoyed tremendous growth from 1982–86 as its customer base expanded beyond government laboratories to commercial applications. This was the "heroic age" of the supercomputing industry.
http://wotug.ukc.ac.uk/parallel/documents/misc/timeline/timeline.txt
========1972========
Seymour Cray leaves Control Data Corporation, founds Cray Research Inc. (GVW: CDC, CRI)
Details From http://www.cs.umass.edu/~weems/CmpSci635/635lecture16.html
A Case Study: The Cray 1 and Family
The Cray 1 was first delivered in 1976. This was around the same time that 8-bit microprocessors were beginning to gain popularity; typical memory components were 1 Kbit SRAM and 4 Kbit DRAM. Most machines operated at about a 1 MHz clock rate and had 32-bit words, and large mainframes had 1 MB to 8 MB of RAM. The Cray 1 had (Baron and Higbie CS manual)
The Cray 1 has 3 basic data types: addresses (24-bit integer), integers (64-bit), floating point (64-bit, 48-bit mantissa). The 12 functional units are divided into four groups.
Group 1 -- Vector units
Group 2 -- Vector and scalar units
Group 3 -- Scalar units
Group 4 -- Address units
The machine itself is divided into six major subsystems
Cray 1 instructions are 32 or 16 bits, so from 2 to 4 instructions can be packed into a word. Instructions are thus addressed on 16-bit boundaries while data is addressed on 64-bit boundaries. The instruction unit has four 16-word instruction buffers, three instruction registers, and one instruction counter. Each 16-bit field in a word is called an instruction parcel. The three instruction registers are the NIP (next instruction parcel), the CIP (current instruction parcel), and the LIP (lower instruction parcel).
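The parcel layout described above (four 16-bit parcels per 64-bit word, instructions addressed by parcel, buffers of 16 words) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the ordering of parcels within a word (leftmost parcel first) is an assumption made for the example.

```python
PARCEL_BITS = 16
PARCELS_PER_WORD = 4   # 64-bit word / 16-bit parcel
BUFFER_WORDS = 16      # one instruction buffer holds 16 words


def split_parcels(word: int) -> list[int]:
    """Split a 64-bit word into four 16-bit parcels.

    Assumption for this sketch: parcel 0 is the leftmost
    (most significant) 16 bits of the word.
    """
    if not 0 <= word < 1 << 64:
        raise ValueError("word must fit in 64 bits")
    mask = (1 << PARCEL_BITS) - 1
    return [(word >> shift) & mask for shift in (48, 32, 16, 0)]


def locate_parcel(parcel_address: int) -> tuple[int, int]:
    """Map a parcel address to (word address, parcel index within the word)."""
    return divmod(parcel_address, PARCELS_PER_WORD)


def parcels_per_buffer() -> int:
    """Number of 16-bit parcels held by one 16-word instruction buffer."""
    return BUFFER_WORDS * PARCELS_PER_WORD
```

A word holds four 16-bit instructions, two 32-bit instructions, or a mix, which is why instruction addresses carry two more low-order bits than data addresses in this model.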
For a 32-bit instruction, the low-order portion is fetched to the NIP and then moved to the LIP. There is no mechanism for discarding instructions in the pipe -- once in the CIP/LIP, they will be issued. At most they will be delayed for some time.
The instruction buffers are tied to the memory via the 16-way interleaving, so it is possible to fill a buffer in 4 clock cycles (recall that the clock is 12.5 ns and memory is 50 ns). Buffers are filled on a demand basis in a round-robin pattern. They thus act as an instruction cache of 256 instructions, organized into four lines of 64 instructions. Each buffer has its own address comparator, so we would call this a fully associative cache (easy to implement when there are only 4 lines). The buffers cannot be written to -- a write bypasses the instruction cache and only goes to main memory.
Scalar instruction issue requires that all of the instruction's required resources be free -- otherwise the instruction waits. Vector instruction issue in the Cray involves reserving functional units, including memory, operand registers and result registers, and then releasing an instruction once all of its resources are available. In addition, some data paths are shared between the vector and scalar components, and these must be available. The control unit is able to detect when a result register for one vector operation is an operand for another vector operation and, if the two vector instructions do not conflict in any other resource requirements, it sets up a vector chaining operation between the two instructions.
Address Component
There are 8 24-bit address registers, 64 24-bit spill registers, an adder, and a multiplier in this component. Its purpose is to perform index arithmetic and send the results to the scalar and vector components so that they can fetch the appropriate operands. Arithmetic is performed on the address registers directly.
The spill registers are used to hold address values that do not fit into the address registers. A set of 8 addresses can be transferred between the address registers and their spill registers in a single cycle. Thus, they bear a certain similarity to the register windows of the SPARC (or vice versa). The spill registers can be thought of as an explicitly managed data cache with 8 lines. Their value is that they reduce the traffic to main memory, freeing that resource for vector operations.
Scalar Component
Similar to the address component, the scalar component has 8 64-bit registers and 64 64-bit spill registers. It has sole access to four functional units: Integer Add, Logical, Shift, and Population Count. The Scalar Component also has access to three functional units that are shared with the Vector Component: Floating Add, Multiply, and Reciprocal Approximation. Because the scalar component has its own integer units, it can always execute integer operations in parallel with a vector operation. However, for floating point, the vector unit takes priority.
Vector Component
There are 8 64-word vector registers in the vector component. It takes four memory loads to fill a vector register. Normally, this would require 16 instruction cycles. However, careful pipelining in the memory unit reduces the time to just 11 cycles. A vector mask register contains a bit-map of the elements in a register operand that will participate in an instruction. A vector length register determines whether fewer than 64 operands are contained in a set of vector operands. Manipulating these values is the primary reason for the population and leading zeros counter. Vector loads and stores specify the first location, the length, and the stride.
I/O Component
The I/O component has 24 programmable I/O channel units. I/O has the lowest priority for memory access.
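The vector load parameters (first location, length, stride) and the roles of the vector length and vector mask registers described above can be illustrated with a short sketch. This is a simplified software model, not the machine's actual instruction semantics: the function names are hypothetical, memory is modeled as a Python list of words, and mask bit i (counting from the least significant bit) is assumed to correspond to element i.

```python
VECTOR_REGISTER_LENGTH = 64   # elements per vector register, per the text


def vector_load(memory, first, length, stride):
    """Gather `length` words starting at `first`, spaced `stride` words apart."""
    if not 0 <= length <= VECTOR_REGISTER_LENGTH:
        raise ValueError("vector length must be between 0 and 64")
    return [memory[first + i * stride] for i in range(length)]


def masked_add(dest, a, b, length, mask):
    """Compute a[i] + b[i] for each i < length whose mask bit is set.

    Elements whose mask bit is clear keep their old value from `dest`
    (an assumption of this sketch). Bit i of `mask` selects element i.
    """
    result = list(dest)
    for i in range(length):
        if (mask >> i) & 1:
            result[i] = a[i] + b[i]
    return result


if __name__ == "__main__":
    memory = list(range(100))
    column = vector_load(memory, first=3, length=4, stride=10)
    print(column)
    print(masked_add([0, 0, 0, 0], column, column, length=4, mask=0b0101))
```

A stride larger than 1 is what lets a single vector load walk down a column of a matrix stored row by row, while the length parameter handles vectors shorter than a full 64-element register.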
Cray X-MP
Cray Y-MP
Cray 2
Practical Considerations in Supercomputer Design
To achieve such high speeds, high-power (i.e. hot) drivers are employed, signals are detected with specialized analog circuits, conductors are all shielded and precisely tuned in both impedance and length, and data is encoded with error-correcting codes so that losses can be recovered. In addition, the circuits are usually designed to operate in balanced mode so that there is no change in power drawn as drivers switch. As one driver switches from low to high, another switches from high to low, so that the power supply sees a DC load and there is no coupling of switching noise back into the logic via the power supply. In addition, using balanced signal lines can increase the signal-to-noise ratio by 6 dB, although these are not often used. In a design such as the Cray-1, roughly 40% of the transistors supposedly do nothing but balance the power loading.
Even so, these machines dissipate large amounts of heat. The IBM 3090 uses special thermal conduction modules in which a multichip substrate is mounted in a carrier with built-in plumbing for a chilled water jacket. CDC used a similar system in its designs, and in one instance a maintenance crew pumped live steam through the building air conditioning system, which crossed over to the processor, with predictable results. This raises the issue that these machines usually need thermal shut-down systems, and possibly even fire suppression gear.
The Cray-1 series uses piped Freon, and each board has a copper sheet to conduct heat to the edges of the cage, where Freon lines draw it away. The first Cray-1 was in fact delayed six months due to problems in the cooling system: lubricant that is normally mixed with the Freon to keep the compressor running would leak through the seals as a mist and eventually coat the boards with oil until they shorted out.
The Cray-2 is unique in that it uses a liquid bath to cool the processor boards. A special nonconductive liquid (Fluorinert) is pumped through the system and the chips are immersed in this. Special fountains aerate the liquid, and reservoirs are provided for storing the liquid when it is pumped out for service. This is somewhat reminiscent of the oil cooling bath that was sometimes used in magnetic core memory units. The ETA-10 was originally going to use a liquid nitrogen bath, but I believe this turned out to be too difficult to implement (on a side note, I have known scientific labs where the researchers deal with cooling problems in air-cooled machines by opening a tank of liquid nitrogen at the inlet, but that's not quite the same).
As a final note, Lawrence Livermore National Labs has announced that it will henceforth buy no more vector supercomputers. The handwriting is clearly on the wall for this breed of system, and all of the major manufacturers are moving, finally, to parallel processing.
Number manufactured
User Experience
A History of Supercomputing at Florida State University by Jeff Bauer
If you have comments or suggestions, send e-mail to Ed Thelen
Updated April 12, 2000