SPARC Pipelining

Introduction

The SPARC architecture supports pipelining. It deals with pipeline hazards in two ways: load delay slots (stalls) and branch delay slots. Unlike MIPS, it does not forward address calculations or arithmetic results between stages; it apparently does not need to, because its pipeline has one fewer stage. A SPARC CPU uses a four-stage cycle: Fetch instruction (F), Execute instruction (E), Memory access (M), and Write back (W). While one instruction is in Execute, the next is in Fetch, so by the time the second instruction needs a result calculated by the first, the calculation is done.

Load Delay Slots
Load delay slots, also known as pipeline stalls, are inserted when a load instruction provides data for the instruction immediately following it. Suppose a load instruction precedes an addition instruction and loads one of the values to be added:

CPU cycle:
F - E - M - W       ld  [%r10+offset],%r11  [load to r11]
    F - E - M - W   add %r11,%r12,%r13      [add r11+r12]
|   |   |   |   |
0   1   2   3   4  
Time in machine cycles

If the load immediately precedes the add and provides a value the add needs, that value is not yet loaded when the add needs it. The load is still doing its memory access (M) in cycle 2, the same cycle in which the add is executing (E) with its operands. To solve this, a SPARC machine inserts a "load delay slot," otherwise known as a processor stall, to delay execution of the add instruction:

F - E - M - W           ld  [%r10+offset],%r11  [load to r11]
    D - D - D - D       [delay]
        F - E - M - W   add %r11,%r12,%r13      [add r11+r12]
|   |   |   |   |   |
0   1   2   3   4   5
Time in machine cycles

Now the loaded value is available by the time the add instruction executes, so the add gets what the programmer expects to be there. The MIPS processor architecture once did this, but now uses data forwarding to solve this hazard instead.
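
The timing in the two diagrams above can be checked with a small model. The following Python sketch is only an illustration: the function names, the tuple format for instructions, and the rule that a loaded value becomes usable in the cycle after the load's M stage are assumptions chosen to match the diagrams, not a description of any real SPARC implementation.

```python
STAGES = ["F", "E", "M", "W"]  # the four-stage cycle described in the Introduction


def stage_cycle(fetch_cycle, stage):
    """Cycle in which an instruction fetched at fetch_cycle occupies the given stage."""
    return fetch_cycle + STAGES.index(stage)


def schedule(instructions):
    """Return the fetch cycle of each instruction, stalling after a load when needed.

    Each instruction is a hypothetical tuple (op, dest, sources).
    Assumption: a loaded value can be used no earlier than the cycle
    after the load's M stage.
    """
    fetch = []
    next_free = 0
    for i, (op, dest, sources) in enumerate(instructions):
        cycle = next_free
        if i > 0:
            prev_op, prev_dest, _ = instructions[i - 1]
            if prev_op == "ld" and prev_dest in sources:
                ready = stage_cycle(fetch[i - 1], "M") + 1
                # Delay the fetch so that this instruction's E stage is not before 'ready'.
                cycle = max(cycle, ready - STAGES.index("E"))
        fetch.append(cycle)
        next_free = cycle + 1
    return fetch


dependent = [("ld", "%r11", ("%r10",)), ("add", "%r13", ("%r11", "%r12"))]
independent = [("ld", "%r11", ("%r10",)), ("add", "%r13", ("%r14", "%r12"))]
print(schedule(dependent))    # [0, 2]  one delay cycle, as in the second diagram
print(schedule(independent))  # [0, 1]  no delay needed
```

In this model the add that depends on the load is fetched in cycle 2 rather than cycle 1, which is the one-cycle delay shown in the second diagram.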

Branch Delay Slots
Branch instructions transfer control to another part of the program. On SPARC machines, a branch reads a condition code set by an earlier compare and branches accordingly. (In MIPS, by contrast, the branch itself compares a register set up by an slt instruction against $zero, the register that always holds zero, decides whether to branch, and then calculates the address.) If the branch address is not calculated and loaded into the program counter until after the branch instruction has executed, instructions past the branch may be executed.

F - E - M - W       [branch]
    F - E - M - W   [some next instruction]
|   |   |   |   |
0   1   2   3   4  
Time in machine cycles

The address to which the branch might send the machine is not calculated and written to the program counter until the branch instruction finishes executing. By then the machine is already fetching the next sequential instruction, not the one at the branch address. An instruction that the programmer never meant to run after a taken branch could therefore be executed.

The SPARC architecture asks the programmer to share responsibility for this problem with the machine. After the branch, the programmer may place either a useful instruction that is safe to execute or a nop (no operation) instruction, which effectively delays the transfer of control while the branch address is calculated. It is the inconvenient responsibility of the programmer (or a compiler) to decide whether to insert an innocuous instruction or a nop. This is unlike MIPS, which inserts delays for the programmer.
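
The choice between an innocuous instruction and a nop can be sketched as a toy scheduling rule. The Python below is a hypothetical illustration, not how any real assembler or compiler works: it assumes the block ends in a conditional branch that reads only the condition codes, that the instruction just before the branch is not itself a branch, and that an instruction sets the condition codes only if its mnemonic is cmp or ends in cc (as in subcc or addcc). The function names are made up for this example.

```python
def sets_condition_codes(instr):
    """Assumption: only 'cmp' and mnemonics ending in 'cc' change the condition codes."""
    mnemonic = instr.split()[0]
    return mnemonic == "cmp" or mnemonic.endswith("cc")


def fill_delay_slot(block):
    """Place an instruction in the slot after the branch that ends the block.

    block is a list of instruction strings whose last entry is a conditional
    branch. If the instruction just before the branch does not touch the
    condition codes, it is moved into the delay slot; otherwise a nop is used.
    """
    *body, branch = block
    if body and not sets_condition_codes(body[-1]):
        # The branch outcome does not depend on this instruction, so it can run after it.
        return body[:-1] + [branch, body[-1]]
    return body + [branch, "nop"]


print(fill_delay_slot(["cmp %r1,%r2", "add %r3,%r4,%r5", "bne target"]))
# ['cmp %r1,%r2', 'bne target', 'add %r3,%r4,%r5']
print(fill_delay_slot(["add %r3,%r4,%r5", "cmp %r1,%r2", "bne target"]))
# ['add %r3,%r4,%r5', 'cmp %r1,%r2', 'bne target', 'nop']
```

In the first case the add does useful work in the slot; in the second the compare must stay ahead of the branch, so the slot is wasted on a nop.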

For its part, the machine keeps two program counters; the additional one is called %npc, the next program counter. The instruction being executed in the F-E-M-W SPARC cycle is the one pointed to by %pc; the one being fetched is the one pointed to by %npc. After each instruction, %pc is loaded with the value of %npc, so the instruction following a branch is always executed. For a branch instruction that is to be taken, %npc is updated with the branch address after the execute stage of the branch. As a result, exactly one instruction immediately following a taken branch is executed before control reaches the branch target.
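
The %pc and %npc behavior described above can be traced with a short simulation. This Python sketch is a simplified model under stated assumptions: instructions are 4 bytes apart, the program is a dictionary from address to a hypothetical tuple, and a branch is written as ("branch", target, taken) with the outcome given directly rather than read from condition codes.

```python
def run(program, max_steps=20):
    """Return the addresses executed by a toy program with delayed branches.

    program maps addresses to tuples: ("op",) for an ordinary instruction,
    or ("branch", target, taken) for a branch whose outcome is already known.
    """
    pc, npc = 0, 4
    trace = []
    for _ in range(max_steps):
        if pc not in program:
            break  # ran off the end of the toy program
        instr = program[pc]
        trace.append(pc)
        if instr[0] == "branch" and instr[2]:
            new_npc = instr[1]   # a taken branch redirects %npc, not %pc
        else:
            new_npc = npc + 4    # otherwise %npc simply advances
        pc, npc = npc, new_npc   # %pc always receives the old %npc
    return trace


taken = {
    0: ("op",),
    4: ("branch", 16, True),
    8: ("op",),    # delay slot: executed even though the branch is taken
    12: ("op",),   # skipped
    16: ("op",),   # branch target
}
print(run(taken))  # [0, 4, 8, 16]
```

The instruction at address 8 runs even though the branch at address 4 is taken, while the instruction at address 12 is skipped, which is the "only one instruction" behavior described above.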

Version 9 of the SPARC architecture improves pipelining by providing some new instructions, such as move on condition, that eliminate the need for some branches, and by allowing some branch prediction to be set in software.

Paul Austin

Charles Gates

Last modified: Tue Apr 23 12:24:53 EDT 2002