It would increase the usefulness of x to include a


Table 1.2 on page 12 to support indexed addressing. If you have access to a hardware simulator, test your design. (This is non-trivial!)

Example 1.3
Estimate the performance benefit of a single-cycle delayed branch.

A delayed branch allows the instruction following the branch to be executed whether or not the branch is taken. The instruction after the branch is in the 'delay slot'. Assume the dynamic instruction frequencies shown in Table 1.3 on page 21 and the pipeline structure shown in Figure 1.13 on page 22; ignore register hazards; assume all delay slots can be filled (most single delay slots can be filled).

If there is a dedicated branch target adder in the decode stage, a branch has a 1-cycle delayed effect, so a single delay slot removes all wasted cycles. One instruction in four is a branch, so four instructions take five clock cycles without the delay slot and four with it, giving 25% higher performance.

If there is no dedicated branch target adder and the main ALU stage is used to compute the target, a branch will incur three wasted cycles. Therefore four instructions on average include one branch and take seven clock cycles, or six with a single delay slot. The delay slot therefore gives 17% higher performance (but the dedicated branch adder does better even without the delay slot).

Exercise 1.3.1
Estimate the performance benefit of a 2-cycle delayed branch assuming that all the first delay slots can be filled, but only 50% of the second delay slots can be filled. Why is the 2-cycle delayed branch only relevant if there is no dedicated branch target adder?

Exercise 1.3.2
What is the effect on code size of the 1- and 2-cycle delayed branches suggested above? (All unfilled branch delay slots must be filled with no-ops.)

The ARM Architecture

Summary of chapter contents

The ARM processor is a Reduced Instruction Set Computer (RISC). The RISC concept, as we saw in the previous chapter, originated in processor research programmes at Stanford and Berkeley universities around 1980. In this chapter we see how the RISC ideas helped shape the AR...
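
The arithmetic in Example 1.3 can be checked with a short Python sketch. It assumes, as the example does, that one instruction in four is a branch and that every delay slot holds a useful instruction. The function names and parameters below are illustrative only and do not come from the text.

```python
from fractions import Fraction


def cycles_per_group(branch_penalty, filled_slots, group_size=4, branches=1):
    """Clock cycles needed for `group_size` instructions containing `branches` branches.

    branch_penalty: wasted cycles per branch when there are no delay slots.
    filled_slots:   delay slots per branch that hold useful instructions
                    (assumption: each filled slot recovers one wasted cycle).
    """
    wasted = max(branch_penalty - filled_slots, 0)
    return group_size + branches * wasted


def speedup(branch_penalty, filled_slots):
    """Ratio of cycles without delay slots to cycles with them."""
    base = cycles_per_group(branch_penalty, 0)
    delayed = cycles_per_group(branch_penalty, filled_slots)
    return Fraction(base, delayed)


# Dedicated branch target adder: 1 wasted cycle, removed by a single delay slot.
with_adder = speedup(branch_penalty=1, filled_slots=1)      # 5/4
# No dedicated adder: 3 wasted cycles, one of them recovered by a single delay slot.
without_adder = speedup(branch_penalty=3, filled_slots=1)   # 7/6

print(f"{float(with_adder - 1):.0%}")     # 25%
print(f"{float(without_adder - 1):.0%}")  # 17%
```

Using exact fractions keeps the ratios 5/4 and 7/6 visible before they are rounded to the percentages quoted in the example.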

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
