a)
lw lw bne lw lw bne sw
IF
ID IF
EX ID IF
M ID IF
WB ID IF
EX ID IF
M ID IF
WB ID IF
EX ID
M EX
WB IF
ID IF
EX ID IF
22 cycles sw, succeed because flush (branch is taken) 4 to fill 7 7 instructions
CS M151B:
Computer Systems Architecture
Week 8
Agenda
Homework 5 due Saturday, February 27th, 12:00 AM
i.e., in 8 hours. I swear it was different last week.
Advanced Pipelining:
Deeper Pipelines/Super P
CS M151B:
Computer Systems Architecture
Week 1
Disc 1E
TA: Uen-Tao Wang
Email: [email protected]
Office Hours: T 1-3PM
The other TAs are scheduled for
M 3-5PM
W 1-3PM
TA Room: BH 2432
Discussio
CS M151B:
Computer Systems Architecture
Week 9
Memory
Last week, we covered caches and why they are
necessary.
Recap: We need DRAM/Main Memory since we
need some large memory in order to
ad
Fall 2004
NAME _
1. Execution Time (15 points): You are asked to choose between two approaches to reducing the impact of loads on processor performance on the multicycle datapath. For this problem, as
Assume for the rest of this problem that all logic gates have the following delays:
Fan In       Delay
1            5T
2            8T
3            12T
4            15T
5            20T
6            24T
7 or more    4T x fan-in
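The delay table above can be encoded as a small lookup, which makes it easier to total delays along a path. This is only a sketch: the name `gate_delay` and the choice to return the delay as a multiple of T are illustrative assumptions, not part of the exam.

```python
# Delays from the table above, expressed as multiples of T.
FAN_IN_DELAY = {1: 5, 2: 8, 3: 12, 4: 15, 5: 20, 6: 24}


def gate_delay(fan_in: int) -> int:
    """Return the delay of one gate, in units of T, for the given fan-in."""
    if fan_in < 1:
        raise ValueError("fan-in must be at least 1")
    if fan_in >= 7:
        # "7 or more": 4T x fan-in
        return 4 * fan_in
    return FAN_IN_DELAY[fan_in]


if __name__ == "__main__":
    for n in (1, 2, 6, 7, 9):
        print(f"fan-in {n}: {gate_delay(n)}T")
```

The delay of a path through several gates is then the sum of `gate_delay` over the gates on that path.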
So a 2-input AND gate would have delay 8T and a
CS M151B:
Computer Systems Architecture
Week 5
Single Cycle Datapath. again.
Ugh.
Ex: Extending Datapath for bne
bne (branch if not equal to)
if (R[rs] != R[rt])
PC = PC + 4 + SE(I)
else
PC = PC + 4
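A minimal sketch of the next-PC computation for bne. The slide writes the offset as SE(I); in the standard MIPS encoding the 16-bit immediate is a word offset, so this sketch assumes it is sign-extended and then shifted left by 2 before being added. The function names are illustrative.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def bne_next_pc(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Next PC after a bne at address pc (assumes 32-bit addresses)."""
    fall_through = (pc + 4) & 0xFFFFFFFF
    if rs_val != rt_val:
        # Assumption: offset counts words, so multiply by 4 (shift left 2).
        return (fall_through + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    return fall_through


if __name__ == "__main__":
    print(hex(bne_next_pc(0x1000, 1, 2, 3)))  # registers differ: branch taken
    print(hex(bne_next_pc(0x1000, 1, 1, 3)))  # registers equal: fall through
```

In the datapath, the comparison corresponds to the ALU's Zero output being 0, which selects the branch target on the PC mux.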
D
1)
problem: Too many capacity misses in the data cache
solution: Increase size of cache
drawback: larger latency, power
problem: Too many control hazards
solution: dynamic branch prediction
drawback:
Fall 2004
NAME _
1. Execution Time (15 points): You are asked to choose between two approaches to reducing the impact of loads on processor performance on the multicycle datapath. For this problem, as
2. Caching In on the TLB (20 points): Consider the data cache for a processor that uses byte addressed memory. The cache is a 4KB 2-way set associative cache with an 8 byte block size that uses LRU re
Problem 1. Using the diagrams on slides 20 and 23 from lecture 4, what is the maximal latency to drive out the final Carry- Out and all Sum bits for the specific 16bit hierarchical carry lookahead add
UNIVERSITY OF CALIFORNIA, LOS ANGELES
BERKELEY • DAVIS • IRVINE • LOS ANGELES • RIVERSIDE • SAN DIEGO • SAN FRANCISCO • SANTA BARBARA • SANTA CRUZ
UCLA
CS M151B / EE M116C
Final Exam
Before you start, make sure yo
5 The Processor: Datapath and Control
In a major matter, no details are small.
French Proverb
5.1 Introduction
5.2 Logic Design Conventions
5.3 Building a Datapath
5.4 A Simple Implementat
CS M151B Discussion
Week 10
Cache Coherence
Cache Coherence
Assume:
Multiple cores, each with private L1$
Shared L2$
What if multiple L1$ have same blocks?
Reads only: No problem.
Writes: How to notif
a.
[Pipeline timing diagram, cycles 1 to 22, for the instruction sequence lw, lw, bne, sw, lw, lw, bne, sw; stages IF, ID, EX, M, WB, with repeated ID entries.]
CS M151B Winter 2015 Discussion 1A/B Week 10
1. Review topics: (the corresponding lecture slides are on CCLE under week x)
1. Prior to first midterm:
1. Week 1: Performance equation, Amdahl's law
2. W
Solutions of the practice final
1. solution -> drawback
(a) increase cache size drawback -> increase cache access time
(b) Loop unrolling with compiler -> increase code size
(c) Hierarchical CLA -> Mo
CS M151B:
Computer Systems Architecture
Week 10
Cache Coherence
Consider a two processor shared memory
multi-processor each running the following
snippets.
P0 runs:
addi $t0, $zero, 10
sw $t0, A
P1 ru
CS M151B Discussion
Week 3
Arithmetic
2's Complement integer representation
high bit determines sign (1=neg,0=pos)
-x = ~x+1
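The identity -x = ~x+1 can be checked on fixed-width values. This is a small sketch; the helper names and the default 32-bit width are assumptions chosen for illustration.

```python
def negate(x: int, bits: int = 32) -> int:
    """Two's complement negation of a bits-wide pattern: invert, then add 1."""
    mask = (1 << bits) - 1
    return (~x + 1) & mask


def to_signed(x: int, bits: int = 32) -> int:
    """Read a bits-wide pattern as a signed integer (high bit set = negative)."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if (x >> (bits - 1)) & 1 else x


if __name__ == "__main__":
    pattern = negate(5, bits=8)
    print(format(pattern, "08b"), to_signed(pattern, bits=8))
```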
Integer Addition
Conceptually: add multi-bit numbers by adding 1 bit at a t
Chapter 3
Arithmetic for Computers
Can We Make a Faster Adder?
Worst case delay for N-bit Ripple Carry Adder:
2N gate delays
2 gates per CarryOut
N CarryOuts
[Figure: adder diagram with labels CarryIn, a, b, CarryOut]
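The 2N figure comes from the carry passing through two gate levels in each of the N bit positions. The sketch below models a ripple carry adder bit by bit; the function names are illustrative, and `worst_case_gate_delays` simply restates the slide's count rather than measuring anything.

```python
def full_adder(a: int, b: int, carry_in: int):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    # Carry-out as two gate levels: AND terms feeding an OR.
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out


def ripple_carry_add(a: int, b: int, n: int, carry_in: int = 0):
    """Add two n-bit values, letting the carry ripple from bit 0 to bit n-1."""
    carry = carry_in
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def worst_case_gate_delays(n: int) -> int:
    """2 gates per CarryOut times N CarryOuts, as on the slide."""
    return 2 * n


if __name__ == "__main__":
    print(ripple_carry_add(0b1111, 0b0001, 4))
    print(worst_case_gate_delays(16))
```

The worst case is an input such as 1111 + 0001, where the carry generated at bit 0 has to propagate through every position.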
We will explore the Ca
Chapter 2
Instructions: Language
of the Computer
Immediate Operands
Constant data specified in an instruction
addi $s3, $s3, 4
No subtract immediate instruction
Just use a negative constant
addi $s2,
Assume for the rest of this problem that all logic gates have the following delays:
Fan In       Delay
1            T
2            2T
3            3T
4            5T
5            7T
6            10T
7 or more    2T x fan-in
So a 2-input AND gate would have delay 2T and a 4-i
1
Prolog
right to match goals
goals are evaluated left to right
depth-first search
sort(L,S) :- perm(L,S), sorted(S).
sorted([]) :- true.
sorted([_]) :- true.
sorted([X,Y|L]) :- X =< Y, sorted([Y|L]).
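The clauses above define sorting by generate and test: produce permutations of L until one satisfies sorted. A rough Python analogue of that idea follows. It is a sketch only, with no backtracking semantics, and the names `is_sorted` and `perm_sort` are illustrative.

```python
from itertools import permutations


def is_sorted(seq) -> bool:
    """True when each element is <= its successor (empty and singleton pass)."""
    return all(x <= y for x, y in zip(seq, seq[1:]))


def perm_sort(items):
    """Return the first permutation of items that is sorted."""
    for candidate in permutations(items):
        if is_sorted(candidate):
            return list(candidate)


if __name__ == "__main__":
    print(perm_sort([3, 1, 2]))
```

As with the Prolog version, this can try up to n! permutations, so it illustrates the search strategy rather than a practical sort.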
EE116C/CS151B Homework 8
Problem 1
In this exercise, we will look at the different ways capacity affects overall performance. In general,
cache access time is proportional to capacity. Assume that mai
EE116C/CS151B Homework 6
Problem 1
The performance advantage of both the multi cycle and the pipelined designs is limited
by the longer time required to access memory versus use of the ALU. Suppose th
EE116C/CS151B Homework 7
Problem 1
For a direct-mapped cache design with a 32-bit address, the following bits of the address are
used to access the cache.
1 What is the cache block size (in words)?
2