
B-40 March, 2003 Developer’s Manual
Intel
®
80200 Processor based on Intel
®
XScale
™
Microarchitecture
Optimization Guide
B.5.3 Scheduling Multiply Instructions
Multiply instructions can cause pipeline stalls due to either resource conflicts or result latencies.
The following code segment would incur a stall of 0-3 cycles depending on the values in registers
r1, r2, r4 and r5 due to resource conflicts.
mul r0, r1, r2
mul r3, r4, r5
The following code segment would incur a stall of 1-3 cycles depending on the values in registers
r1 and r2 due to result latency.
mul r0, r1, r2
mov r4, r0
Note that a multiply instruction that sets the condition codes blocks the whole pipeline. A 4 cycle
multiply operation that sets the condition codes behaves the same as a 4 cycle issue operation.
Consider the following code segment:
muls r0, r1, r2
add r3, r3, #1
sub r4, r4, #1
sub r5, r5, #1
The add operation above would stall for 3 cycles if the multiply takes 4 cycles to complete. It is
better to replace the code segment above with the following sequence:
mul r0, r1, r2
add r3, r3, #1
sub r4, r4, #1
sub r5, r5, #1
cmp r0, #0
Please refer to Section 14.4, “Instruction Latencies” to get the instruction latencies for various
multiply instructions. The multiply instructions should be scheduled taking into consideration these
instruction latencies.
Komentarze do niniejszej Instrukcji