chapter3-m5-ziavras

Ziavras extracting loop level parallelism 4 new code


    S1': A[i+1] = A[i+1] + B[i+1]
    S2': B[i+2] = C[i+1] + D[i+1]

S1 depends on S2 (S1 reads the B value that S2 wrote in the previous iteration), but S2 doesn't depend on S1, so the dependence is not circular.

S. Ziavras

Extracting Loop-Level Parallelism (4)

New code does not contain a loop-carried dependence:

    A[1] = A[1] + B[1];
    for (i=1; i<=99; i=i+1) {
        B[i+1] = C[i] + D[i];
        A[i+1] = A[i+1] + B[i+1];
    }
    B[101] = C[100] + D[100];

Iterations may be overlapped, provided the statements in each iteration are kept in order.

Extracting Loop-Level Parallelism (5)

    for (i=1; i<=100; i=i+1) {
        A[i] = B[i] + C[i];    /* S1 */
        D[i] = A[i] * E[i];    /* S2 */
    }

The dependence S1 → S2 (S2 reads the A[i] written by S1 in the same iteration) is not a problem if S2 gets the value of A[i] from the reg. that receives the result of S1, as long as that reg. is not overwritten by any other instr. in between, which is not the case here.

Extracting Loop-Level Parallelism (6)

Recurrence: a variable is defined based on the value of that variable in an earlier iteration.

• Detecting a recurrence is important for 2 reasons:
  1. Some architectures (e.g., vector computers) have special support for executing recurrences
  2. Some recurrences can be the source of a reasonable amount of parallelism

Dependence distance of 5:

    for (i=6; i<=100; i=i+1) {
        Y[i] = Y[i-5] + Y[i];
    }

The larger the dependence distance, the more potential parallelism can be obtained by loop unrolling. Here iteration i depends only on iteration i-5, so any 5 consecutive iterations are independent of one another.

Improving Instruction Delivery Rate

• Current multiple-issue processors deliver 4-8 instrs. every clock cycle
• Techniques supporting multiple issues:
  1. Branch-target buffers
  2. Integrated instr. fetch units
  3. Predicting addresses for indirect branches

Branch-Target Buffers

A branch-prediction cache that stores the predicted address for the next instr. after a branch is called a branch-target buffer.

Steps in Handling an Instruction

Assume: only taken branches are stored in the buffer.

Penalties for Branch-Target Buffers

(T = taken, NT = not taken)

    Instr. in buffer   Prediction   Actual branch   Penalty cycles
    Yes                T            T               0
    Yes                T            NT              2 (1 cycle...
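
The branch-target buffer described above, under the slides' assumption that only taken branches are stored, could be sketched roughly as follows. This is only an illustration and is not taken from the slides: the direct-mapped organization, the 16-entry size, the 4-byte instruction alignment, and all names (Btb, BtbEntry, btb_init, btb_lookup, btb_update) are assumptions chosen for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BTB_ENTRIES 16u  /* hypothetical size, chosen only for illustration */

/* One buffer entry: the address of a taken branch and its predicted target. */
typedef struct {
    bool     valid;
    uint32_t branch_pc;
    uint32_t target_pc;
} BtbEntry;

typedef struct {
    BtbEntry entries[BTB_ENTRIES];
} Btb;

/* Assumed mapping: 4-byte instructions, direct-mapped on the low PC bits. */
static unsigned btb_index(uint32_t pc) {
    return (unsigned)((pc >> 2) % BTB_ENTRIES);
}

/* Empty the buffer. */
void btb_init(Btb *btb) {
    for (size_t i = 0; i < BTB_ENTRIES; i++) {
        btb->entries[i].valid = false;
    }
}

/* Fetch stage: a hit means "predict taken" and yields the predicted next PC.
 * A miss means fetch simply continues with the fall-through PC. */
bool btb_lookup(const Btb *btb, uint32_t pc, uint32_t *predicted_pc) {
    const BtbEntry *e = &btb->entries[btb_index(pc)];
    if (e->valid && e->branch_pc == pc) {
        *predicted_pc = e->target_pc;
        return true;
    }
    return false;
}

/* After the branch resolves: store taken branches, and drop an entry whose
 * branch turned out not taken (only taken branches live in the buffer). */
void btb_update(Btb *btb, uint32_t pc, bool taken, uint32_t target_pc) {
    BtbEntry *e = &btb->entries[btb_index(pc)];
    if (taken) {
        e->valid = true;
        e->branch_pc = pc;
        e->target_pc = target_pc;
    } else if (e->valid && e->branch_pc == pc) {
        e->valid = false;
    }
}
```

In such a sketch, the fetch stage would call btb_lookup with the current PC, and the stage that resolves the branch would call btb_update with the actual outcome. A hit followed by a not-taken outcome corresponds to the mispredicted row of the penalties table.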

This document was uploaded on 02/09/2014.
