Quadrics without communication boards


Without communication boards, the Quadrics consists of two basic components: one ZCPU, which controls the program flow and performs all integer and address calculations, and a mesh of floating point processors, the MADs.

At present there are five different configurations of this mesh, named Q1, Q2, Q4, Q8 and Q16. They are all based on the same topological principle: a three-dimensional lattice with periodic boundary conditions and volume 2x2x2n, where n is the number in the name Qn.

The reason for this becomes clearer at the level of the processing boards. Each of the above machines consists of a one-dimensional line of processing boards, each of which houses a 2x2x2 cube of processors; a Qn is made of n such boards.
Therefore, in every configuration with more than one board, each MAD has exactly one nearest neighbour on a different board. No extra hardware (the communication boards) is needed to make sure that off-board communication goes the correct way, because there is only one way.
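The one-neighbour argument can be checked with a small Python sketch that builds the periodic lattice of a Qn (n boards of 2x2x2 MADs in a line) and counts, for each MAD, the nearest neighbours that sit on another board. The assignment of board b to the z-planes 2b and 2b+1, and all function names, are illustrative assumptions rather than part of any Quadrics software.

```python
from itertools import product


def lattice_shape(n_boards):
    """Shape (x, y, z) of a Qn mesh: n boards of 2x2x2 MADs in a line along z."""
    return (2, 2, 2 * n_boards)


def neighbours(site, shape):
    """Nearest neighbours on a periodic 3D lattice (one step along +/- each axis)."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(site)
            coord[axis] = (coord[axis] + step) % shape[axis]  # periodic boundary
            result.append(tuple(coord))
    return result


def board_of(site):
    """Assumed layout (hypothetical): board b holds the z-planes 2b and 2b+1."""
    return site[2] // 2


def offboard_neighbours(n_boards):
    """Map every MAD of a Qn to the set of its nearest neighbours on another board."""
    shape = lattice_shape(n_boards)
    result = {}
    for site in product(*(range(s) for s in shape)):
        result[site] = {nb for nb in neighbours(site, shape)
                        if board_of(nb) != board_of(site)}
    return result


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        counts = {len(v) for v in offboard_neighbours(n).values()}
        print(f"Q{n}: off-board neighbours per MAD = {counts}")
```

For n >= 2 every MAD ends up with exactly one off-board neighbour; for a Q1 the periodic wrap-around stays on the single board, so there are none.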

The peak performance of all these configurations, and also of those with a different topology (i.e. Quadrics including communication boards), is determined by the performance of the single processors. Each MAD has a peak performance of 50 MFlops (a clock frequency of 25 MHz and two floating point operations per cycle). Since every processing board holds 8 MADs, the total peak performance of a Qn is n times 400 MFlops.
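The peak figures follow from simple multiplication. The sketch below spells it out; the constant and function names are illustrative only.

```python
CLOCK_MHZ = 25        # MAD clock frequency
OPS_PER_CYCLE = 2     # floating point operations per cycle
MADS_PER_BOARD = 8    # one 2x2x2 cube per processing board


def mad_peak_mflops():
    """Peak of a single MAD: clock frequency times operations per cycle."""
    return CLOCK_MHZ * OPS_PER_CYCLE


def qn_peak_mflops(n_boards):
    """Peak of a Qn: n boards with 8 MADs each."""
    return n_boards * MADS_PER_BOARD * mad_peak_mflops()


for n in (1, 2, 4, 8, 16):
    print(f"Q{n}: {qn_peak_mflops(n)} MFlops")
```

A Q16, for example, comes out at 6400 MFlops.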

The full configuration consists of 4 towers (4xQH4), each tower of 4 crates and each crate of 16 processing boards, i.e. 2048 MAD processors in total. Its peak performance is therefore 2048 x 50 MFlops = 102.4 GFlops, or about 100 GFlops (hence the name APE-100).
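The same arithmetic, scaled up to the full machine, is sketched below using only the counts given in the text; the names are illustrative.

```python
TOWERS = 4
CRATES_PER_TOWER = 4
BOARDS_PER_CRATE = 16
MADS_PER_BOARD = 8
MAD_PEAK_MFLOPS = 50


def full_configuration():
    """Return (boards, MADs, peak in GFlops) of the 4-tower machine."""
    boards = TOWERS * CRATES_PER_TOWER * BOARDS_PER_CRATE
    mads = boards * MADS_PER_BOARD
    peak_gflops = mads * MAD_PEAK_MFLOPS / 1000
    return boards, mads, peak_gflops


print(full_configuration())
```

This yields 256 boards, 2048 MADs and 102.4 GFlops, which is rounded to the 100 GFlops of the name.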

The DFG system, consisting of one QH2 and one Q1, has a combined peak performance of 13.2 GFlops.
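The 13.2 GFlops can be reproduced under one assumption that the text does not state explicitly: that a QHm consists of m crates of 16 boards (consistent with a QH4 being a tower of 4 crates). The sketch below is a hypothetical check on that basis; it sums in integer MFlops so that the final conversion gives exactly 13.2.

```python
MAD_PEAK_MFLOPS = 50
MADS_PER_BOARD = 8
BOARDS_PER_CRATE = 16


def qn_boards(n):
    """A Qn is a line of n processing boards."""
    return n


def qh_boards(m):
    """Assumption (hypothetical): a QHm consists of m crates of 16 boards each."""
    return m * BOARDS_PER_CRATE


def peak_mflops(boards):
    """Peak in MFlops for a given number of processing boards."""
    return boards * MADS_PER_BOARD * MAD_PEAK_MFLOPS


# Sum in integer MFlops, convert to GFlops only at the end.
dfg_mflops = peak_mflops(qh_boards(2)) + peak_mflops(qn_boards(1))
print(dfg_mflops / 1000, "GFlops")
```

Under this assumption the QH2 contributes 12.8 GFlops and the Q1 0.4 GFlops.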

In practice, what matters is not the peak performance but the effective performance. Deviations from peak arise from memory accesses and integer calculations (on the Quadrics, those of the ZCPU) that cannot be hidden behind the floating point calculations. On many machines it is therefore not unusual to obtain only 20 or even 10 percent of the peak performance. The Quadrics does much better: programs that match the architecture well, such as those in lattice field theory, can reach up to 75% of peak performance.
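To put the percentages into absolute numbers, the sketch below applies the efficiencies quoted above to the peak of a Q16 (16 boards at 400 MFlops each). It only converts a given efficiency into MFlops; it does not predict which efficiency a particular program will reach.

```python
def effective_mflops(peak_mflops, efficiency_percent):
    """Effective performance for a given percentage of peak."""
    return peak_mflops * efficiency_percent / 100


Q16_PEAK_MFLOPS = 16 * 400  # 16 boards at 400 MFlops each

for percent in (10, 20, 75):
    value = effective_mflops(Q16_PEAK_MFLOPS, percent)
    print(f"{percent:>3}% of Q16 peak: {value:.0f} MFlops")
```

At 75% a Q16 delivers 4800 MFlops, compared with 1280 or 640 MFlops at 20 or 10 percent.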