A2 - Performance (solutions)
ECE 429
Question 1
Problem
Question 1.13 From Hennessy and Patterson Ed. 5
Your company is trying to choose between purchasing the Opteron or Itanium 2. You have analyzed your
company's applications, and 60% of the time it will

Assignment #1 MIPS (Solutions)
Q1) : abs $t2, $t3
q2:
sub $t2, $zero, $t3    # sign-flipped value into $t2
bgez $t2, done         # $t2 >= 0, so it already holds the absolute value
sub $t2, $zero, $t2    # $t2 < 0, so flip sign again
done:
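The three-instruction sequence above can be checked with a small Python model. This is only a sketch: the helper names `to_signed32` and `abs_sequence` are hypothetical, and the model wraps on 32-bit overflow, whereas a real MIPS `sub` would raise an overflow exception for the most negative input.

```python
def to_signed32(value):
    """Wrap an integer to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def abs_sequence(t3):
    """Model the sub / bgez / sub sequence on 32-bit registers."""
    t2 = to_signed32(0 - t3)      # sub  $t2, $zero, $t3
    if t2 >= 0:                   # bgez $t2, done
        return t2
    t2 = to_signed32(0 - t2)      # sub  $t2, $zero, $t2
    return t2                     # done:


if __name__ == "__main__":
    for sample in (5, -7, 0):
        print(sample, "->", abs_sequence(sample))
```

For a positive input the first `sub` produces a negative value and the second `sub` restores it; for a negative input the branch is taken after the first `sub`.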
Q2) : f = g - A[B[4]];
Assume:
- register $s5 holds a pointer to the start of array B (

Q1)
a) 2-way set associative = 2 blocks/set, so 16384/2 = 8192 sets
byte offset selects 1 byte from the 32 bytes in the block: 32 = 2^5, so 5 bits to address each byte
index selects 1 set from 8192 sets: 8192 = 2^13, so 13 bits to address each set
tag is the rema

1) Assume you are given a 2-way set associative cache with a block size of 32 bytes. The cache
can accommodate a total of 16384 blocks from main memory. The size of addresses is 32 bits.
a) How many bits are there in the tag, index, and byte offset fields
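The field widths for this cache can be derived mechanically from the stated parameters. The sketch below is illustrative only; the function name `cache_field_bits` is hypothetical, and it assumes every size is a power of two.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, rejecting anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_field_bits(total_blocks, ways, block_bytes, address_bits=32):
    """Return (tag, index, offset) widths for a set-associative cache."""
    sets = total_blocks // ways              # blocks are grouped into sets
    offset_bits = log2_exact(block_bytes)    # selects a byte within a block
    index_bits = log2_exact(sets)            # selects a set
    tag_bits = address_bits - index_bits - offset_bits  # whatever remains
    return tag_bits, index_bits, offset_bits


if __name__ == "__main__":
    # 16384 blocks, 2-way, 32-byte blocks, 32-bit addresses
    print(cache_field_bits(16384, 2, 32))  # (14, 13, 5)
```

With 8192 sets and 32-byte blocks, 13 + 5 = 18 bits are used for index and offset, leaving 14 bits of tag.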

Q1)
a) A processor has a clock cycle time of 1 ns. It has a CPI of 3 cycles for ALU operations, 4 cycles for branches, and 5 cycles
for memory operations. A certain application is 50% ALU operations, 10% branches and 40% memory operations. What
is the maximu
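The question text is cut off here, so the sketch below does not attempt to answer it. It only shows the weighted-average CPI implied by the stated instruction mix, which is the usual first step for this kind of problem. The names `average_cpi` and `mix` are hypothetical.

```python
def average_cpi(mix):
    """Weighted-average CPI from {class: (fraction, cycles)} entries."""
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())


if __name__ == "__main__":
    mix = {
        "alu":    (0.50, 3),   # 50% ALU operations at 3 cycles
        "branch": (0.10, 4),   # 10% branches at 4 cycles
        "memory": (0.40, 5),   # 40% memory operations at 5 cycles
    }
    cpi = average_cpi(mix)
    cycle_time_ns = 1.0
    print(f"average CPI = {cpi:.2f}")
    print(f"average time per instruction = {cpi * cycle_time_ns:.2f} ns")
```

The arithmetic is 0.5 × 3 + 0.1 × 4 + 0.4 × 5 = 3.9 cycles per instruction, or 3.9 ns per instruction at a 1 ns clock.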

Q1)
Each block in memory is 8 bytes apart (8 byte blocks)
Each cache has 4 blocks, direct mapped
BOFS = log2(block size) = log2(8) = 3
Index = log2(#sets) = log2(4) = 2
Tag = rest of address
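The field split above can be sketched in a few lines of Python. The addresses used here are hypothetical examples, not the ones from the question, and `split_address` is an illustrative name.

```python
BLOCK_OFFSET_BITS = 3   # log2(8-byte blocks)
INDEX_BITS = 2          # log2(4 blocks, direct mapped)


def split_address(address):
    """Split an address into (tag, index, block offset)."""
    offset = address & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (address >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    # Hypothetical addresses chosen only to exercise each field.
    for address in (0x00, 0x28, 0x3C):
        tag, index, offset = split_address(address)
        print(f"0x{address:02X}: tag={tag} index={index} offset={offset}")
```

Two addresses map to the same cache block whenever their index fields match, and they conflict when their tags differ.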
Block indices for the addresses in the question (index highlight

a) R2 contains the number of non-zero entries in the first n elements of array p.
b) There are 7 mispredicts (shown in italics).
System State: PC, R3/R4 | Branch Predictor: b1 bits, b2 bits | Branch Behavior: Predicted

Q1: Branch Prediction
This problem will investigate the effects of adding global history bits to a standard branch
prediction mechanism. In this problem assume that the MIPS ISA has no delay slots.
Throughout this

1) For each part of this exercise, assume the initial cache and memory state as illustrated in
Figure 1. Each part of this exercise specifies a sequence of one or more CPU operations of the
form:
P#: <op> <address> [ value ]
where P# designates the CPU (ex

Problem P1: Sequential Consistency
For this problem we will be using the following sequences of instructions. These are small
programs, each executed on a different processor, each with its own cache and register set. In the
following R is a register and

ECE 621: Computer Organization
Fall 2014
Course description
This is a graduate course on computer architecture focusing on quantitative methods for cost and performance design tradeoffs. This course covers the fundamentals of classical and modern general pr

Assignment 1: MIPS ISA
Instructions: Ensure that you state your register allocations when not specified.
1) Find the shortest sequence of MIPS instructions that performs the operation below:
abs $t2, $t3 #R[rd] = |R[rt]|
2) For the C code, what is the corr

A2 - Performance
ECE 429
Question 1
Question 1.13 From Hennessy and Patterson Ed. 5
Your company is trying to choose between purchasing the Opteron or Itanium
2. You have analyzed your company's applications, and 60% of the time it will
be running applicat

Computer Architecture
Unit 1: Performance
Hiren Patel
Special thanks to Prof. Milo Martin, Prof. Amir Roth, Prof. Mark Hill, Prof.
Guri Sohi, Prof. Jim Smith, Prof. Krste Asanovic, Prof. Daniel Sorin, and
Prof. David Wood. T

Computer Architecture
Unit 5: Datapath and Control
Slides originally developed by Amir Roth and Daniel Sorin with
contributions by Milo Martin at University of Pennsylvania with sources
that included University of Wisconsin slides by Mark Hill, Guri Sohi,

Computer Architecture
Unit 2: Instruction-set Architecture
Special thanks to Prof. Milo Martin, Prof. Amir Roth, Prof. Mark Hill, Prof.
Guri Sohi, Prof. Jim Smith, Prof. Daniel Sorin, Prof. Krste Asanovic, and
Prof. David Wood. These slides heavily borrow

Computer Architecture
Hiren Patel
Superscalar
Special thanks to Prof. Daniel Sorin, Prof. Milo Martin, Prof. Amir Roth,
Prof. Mark Hill, Prof. Guri Sohi, Prof. Jim Smith and Prof. David Wood.
These slides heavily borrow theirs.
Where are we now
Instru

ECE 621
Computer Architecture
Unit 0: Introduction
Hiren Patel
Special thanks to Prof. Milo Martin, Prof. Amir Roth, Prof. Mark Hill, Prof.
Guri Sohi, Prof. Jim Smith, Prof. Krste Asanovic, Prof. Daniel Sorin, and
Prof.

Computer Architecture
Unit 6: Pipelining
Slides originally developed by Amir Roth with contributions by Milo Martin
at University of Pennsylvania with sources that included University of
Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, Daniel Sorin an

Computer Architecture
Hiren Patel
Dynamic Scheduling
Special thanks to Prof. Daniel Sorin, Prof. Milo Martin, Prof. Amir
Roth, Prof. Mark Hill, and Prof. David Wood. These slides heavily
borrow theirs.
Dynamic Scheduling
Basic pipeline started with sin

Abstract
Parallel systems that support the shared memory abstraction are becoming
widely accepted in many areas of computing. Writing correct and efficient
programs for such systems requires a formal specification of memory semantics,
called a memory cons

Computer Architecture
Hiren Patel
Dynamic Scheduling II
Special thanks to Prof. Daniel Sorin, Prof. Milo Martin, Prof. Amir
Roth, Prof. Mark Hill, and Prof. David Wood. These slides heavily
borrow theirs.
Dynamic Scheduling II
So far: dynamic schedulin

Computer Architecture
Hiren D. Patel
Multicore
(Shared Memory Multiprocessors)
Special thanks to Prof. Milo Martin, Prof. Amir Roth, Prof. Mark Hill,
Prof. Guri Sohi, Prof. Jim Smith and Prof. David Wood. These slides
heavily borrow theirs.

Computer Architecture
Hiren D. Patel
Hardware Multithreading
Special thanks to Prof. Milo Martin, Prof. Amir Roth, Prof. Mark Hill.
These slides heavily borrow theirs.
Multithreading (MT)
Application
Three implem

Computer Architecture
Hiren Patel
Vector and array processors (SIMD)
Special thanks to Prof. Onur Mutlu. These slides heavily borrow theirs.
Data Parallelism
Concurrency arises from performing the same operations
on different pieces of data
Single ins

Computer Architecture
Hiren Patel
Multicore
(Shared Memory Multiprocessors)
Special thanks to Prof. Milo Martin, Prof. Amir Roth, Prof. Mark Hill,
Prof. Guri Sohi, Prof. Jim Smith and Prof. David Wood. These slides
heavily borrow theirs.
Uniprocess