1. A program has an instruction count of 1 million. The instructions can
be divided into 4 types depending on their CPI. Type A has a CPI of 1,
Type B has a CPI of 2, Type C has a CPI of 3 and Type D has a CPI of 3.
The occurrence of the instructions in t
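The average CPI is the sum of each type's CPI weighted by how often that type occurs. The occurrence percentages are cut off in this excerpt, so the sketch below uses a hypothetical uniform mix (25% of each type) only for illustration. Substitute the real percentages from the problem. The helper name `average_cpi` is likewise hypothetical.

```python
def average_cpi(mix: dict[str, float], cpi: dict[str, float]) -> float:
    """Weighted CPI: sum over types of (fraction of instructions) * (CPI of that type)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(mix[t] * cpi[t] for t in mix)


CPI_BY_TYPE = {"A": 1, "B": 2, "C": 3, "D": 3}  # CPIs as stated in the problem
INSTRUCTION_COUNT = 1_000_000

# Hypothetical mix: the real occurrence percentages are cut off in the source.
example_mix = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
cpi = average_cpi(example_mix, CPI_BY_TYPE)
total_cycles = cpi * INSTRUCTION_COUNT  # clock cycles for the whole program
```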
Cyclic Redundancy Check
Cyclic redundancy check (CRC) is an error-detecting code that is commonly
used in digital networks and storage devices to detect accidental changes to raw data [1]. Because the
transfer of data may meet some
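This excerpt does not say which CRC variant it means. As an illustration, here is a bit-at-a-time sketch of the widely used CRC-32 (reflected polynomial 0xEDB88320, as used by Ethernet and zlib), cross-checked against Python's `zlib.crc32`. The sender transmits the checksum along with the data; the receiver recomputes it, and a mismatch signals that the data changed in transit.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Reflected CRC-32 (polynomial 0xEDB88320), processed one bit at a time."""
    crc = 0xFFFFFFFF                      # initial register value
    for byte in data:
        crc ^= byte                       # feed the next byte into the low bits
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final XOR


message = b"123456789"
checksum = crc32_bitwise(message)
assert checksum == zlib.crc32(message)    # agrees with the standard library

# Receiver side: flip one bit and recompute; the CRC no longer matches.
corrupted = bytes([message[0] ^ 0x01]) + message[1:]
detected = crc32_bitwise(corrupted) != checksum
```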
1. Return to Zero (RZ)
In RZ, the signal drops to zero between pulses. Because it returns to zero in every bit period,
the signal is self-clocking.
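The source does not say which RZ variant it has in mind. The minimal sketch below assumes bipolar RZ: each bit period is split into two half-periods, a 1 is a positive pulse, a 0 is a negative pulse, and the line returns to 0 for the second half. Every bit therefore contains a transition, which is what lets the receiver recover the clock. The function names are hypothetical.

```python
def rz_encode(bits: list[int], high: int = 1) -> list[int]:
    """Bipolar RZ: two samples per bit; 1 -> (+high, 0), 0 -> (-high, 0)."""
    samples: list[int] = []
    for b in bits:
        level = high if b else -high
        samples.extend([level, 0])        # pulse, then return to zero
    return samples


def rz_decode(samples: list[int]) -> list[int]:
    """Only the first half of each bit period carries data."""
    return [1 if samples[i] > 0 else 0 for i in range(0, len(samples), 2)]


signal = rz_encode([1, 0, 1, 1])
```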
2. Non-Return to Zero (NRZ)
In NRZ, 1 is represented by one significant condition (usually a positive
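For contrast with RZ, here is a sketch of NRZ-L. It assumes, because the source sentence is cut off, that a 1 maps to a positive level and a 0 to a negative level, with one sample per bit. Since the level changes only when the data changes, a long run of identical bits produces no transitions, so NRZ by itself is not self-clocking. The function names are hypothetical.

```python
def nrz_encode(bits: list[int], high: int = 1) -> list[int]:
    """NRZ-L: one sample per bit; 1 -> +high, 0 -> -high, no return to zero."""
    return [high if b else -high for b in bits]


def transitions(samples: list[int]) -> int:
    """Count level changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


signal = nrz_encode([1, 1, 1, 1, 0])
```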
a. 1. With 5 nodes
Because the total data size is 300 GB, each node needs to process 300/5 = 60 GB.
Because 30% of the data must be read from remote nodes, each node reads 0.30 * 60 GB = 18 GB remotely.
According to Figure 6.6, the bandwidth between racks is 100 MB/s, so for the remote data,
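The per-node arithmetic above can be checked with a few lines. The remote-read time at the end is not in the excerpt, which is cut off. It is shown only under two loudly stated assumptions: 1 GB = 1000 MB, and remote reads are limited only by the 100 MB/s inter-rack bandwidth.

```python
TOTAL_DATA_GB = 300
NODES = 5
REMOTE_FRACTION = 0.30
RACK_BANDWIDTH_MB_S = 100  # inter-rack bandwidth from Figure 6.6

per_node_gb = TOTAL_DATA_GB / NODES           # data each node processes
remote_gb = REMOTE_FRACTION * per_node_gb     # portion each node reads remotely

# Assumptions: decimal units (1 GB = 1000 MB) and rack bandwidth is the only bottleneck.
remote_read_seconds = remote_gb * 1000 / RACK_BANDWIDTH_MB_S
```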
a. To achieve 95% availability, the outage hours should be
From Figure 6.1, for 2400 servers,
the outage hours are 4 + (250 + 250 + 250) + 250 * (1/12) ≈ 774.8 hours.
However, we want the outage hours reduced to 438 hours.
So if the H
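The 438-hour target is consistent with a one-year window of 8760 hours, since (1 - 0.95) * 8760 = 438. That window is an assumption, because the excerpt does not state it. The sketch below recomputes the current outage estimate from the Figure 6.1 terms used in the text and the gap that must be closed. How the truncated solution closes that gap is not reproduced here.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored (assumption)


def allowed_outage_hours(availability: float) -> float:
    """Annual outage budget for a given availability target."""
    return (1 - availability) * HOURS_PER_YEAR


# Annual outage hours for 2400 servers, using the terms taken from Figure 6.1 in the text.
current_outage = 4 + (250 + 250 + 250) + 250 * (1 / 12)
target_outage = allowed_outage_hours(0.95)
reduction_needed = current_outage - target_outage
```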
b. Let f be the fraction of the computation that is vectorized. Because a computation runs 10 times faster
in vector mode, we can treat vector mode as 10 processors instead of 1.
Then, by Amdahl's law, speedup = 1 / (f/10 + 1 - f).
Because the desired speedup is
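The speedup formula can be inverted to find the vectorized fraction needed for a target speedup S. From 1/S = (1 - f) + f/10, it follows that f = (1 - 1/S) / (1 - 1/10). The desired speedup is cut off in this excerpt, so the usage line plugs in a hypothetical target of 2. The function names are also hypothetical.

```python
VECTOR_SPEEDUP = 10  # vector mode is 10 times faster (from the problem)


def speedup(f: float, s: float = VECTOR_SPEEDUP) -> float:
    """Amdahl's law: a fraction f of the work runs s times faster."""
    return 1 / ((1 - f) + f / s)


def required_fraction(target: float, s: float = VECTOR_SPEEDUP) -> float:
    """Vectorized fraction f needed to reach `target`, by inverting Amdahl's law."""
    if not 1 <= target <= s:
        raise ValueError("target speedup must lie between 1 and s")
    return (1 - 1 / target) / (1 - 1 / s)


# Hypothetical target: the real desired speedup is cut off in the source.
f_needed = required_fraction(2)
```

The speedup can never exceed 10 even at f = 1, which is why the helper rejects larger targets.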
The throughput for this kernel on this GPU = 1.5*0.80*0.85*0.70*10*32/4 = 57.12 GFLOP/sec
(1). If the number of single-precision lanes is increased to 16, the throughput =
1.5*0.80*0.85*0.70*10*32/2 = 114.24 GFLOP/sec, so the speedup = 114.24/57.1
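The two throughput figures differ only in the final divisor. The sketch below reproduces both. It assumes, since this excerpt does not say, that 32 is the number of threads per warp and that the divisor is the number of cycles needed to issue one warp instruction: 32/8 = 4 cycles on the original 8 lanes and 32/16 = 2 cycles on 16 lanes. The other factors are copied from the formula as-is.

```python
from math import prod

# Factors copied verbatim from the throughput formula in the text.
# Assumption: 1.5 is the clock rate in GHz; the other factors are not named in this excerpt.
FACTORS = (1.5, 0.80, 0.85, 0.70, 10)
THREADS_PER_WARP = 32  # assumed meaning of the "32" in the formula


def kernel_throughput_gflops(lanes: int) -> float:
    """Throughput following the text's formula: prod(FACTORS) * 32 / cycles_per_warp."""
    # Assumption: one 32-thread warp instruction takes 32 / lanes cycles to issue.
    cycles_per_warp = THREADS_PER_WARP / lanes
    return prod(FACTORS) * THREADS_PER_WARP / cycles_per_warp


baseline = kernel_throughput_gflops(8)   # divisor 32/8 = 4
wider = kernel_throughput_gflops(16)     # divisor 32/16 = 2
lane_speedup = wider / baseline
```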
C.7.a, b (for part b, also include data hazards)
Because the 5-stage machine has one stall every 5 instructions, CPI(5-stage) = (1+5)/5 = 1.2.
The 12-stage machine has 3 stalls every 8 instructions,
so CPI(12-stage) = (3+8)/8 = 1.375.
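Both CPI values follow one rule. With a base CPI of 1, k stall cycles every n instructions give CPI = (n + k)/n. A small sketch (the helper name is hypothetical):

```python
def cpi_with_stalls(instructions: int, stalls: int, base_cpi: float = 1.0) -> float:
    """Average CPI when `stalls` bubble cycles occur every `instructions` instructions."""
    return (base_cpi * instructions + stalls) / instructions


cpi_5 = cpi_with_stalls(5, 1)    # one stall per 5 instructions
cpi_12 = cpi_with_stalls(8, 3)   # three stalls per 8 instructions
```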
The clock cycle time equals the latency of the longest stage plus the pipeline-register delay,
so the clock cycle time of the 5-stage pipeline = 0.8 + 0.1 = 0.9 ns.
Because the 10-stage pipeline splits every stage in half, the time of the longest stage of 10-
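The following sketch applies the clock-period rule to the 5-stage pipeline and combines the resulting period with the CPI of 1.2 from the stall analysis to get the average time per instruction. The 10-stage case is cut off in this excerpt and is not computed here. The helper name is hypothetical.

```python
def clock_cycle_ns(longest_stage_ns: float, register_overhead_ns: float) -> float:
    """Pipeline clock period: the slowest stage plus the pipeline-register delay."""
    return longest_stage_ns + register_overhead_ns


cycle_5 = clock_cycle_ns(0.8, 0.1)   # 5-stage pipeline, values from the text
CPI_5 = 1.2                          # from the stall analysis above

# Average time per instruction = CPI * clock period.
time_per_instr_5 = CPI_5 * cycle_5
```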
Improvability Performance of Multistage
I. Introduction of Multistage Blocking
In a crossbar interconnection network, everything works well except for the high cost. The
switch cost of a crossbar is N^2, which is acceptable when N (the number of points) is small. However,
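To make the N^2 growth concrete, the sketch below compares crosspoint counts for a crossbar against an Omega network. The Omega network is one common multistage (blocking) design, chosen here as an assumption because this excerpt does not name a specific topology. An Omega network with N inputs uses log2 N stages of N/2 switches, each 2x2 with 4 crosspoints, for a total of 2N log2 N crosspoints.

```python
def crossbar_crosspoints(n: int) -> int:
    """An N x N crossbar needs one crosspoint per input/output pair: N^2."""
    return n * n


def omega_crosspoints(n: int) -> int:
    """Omega network (assumed example): log2(N) stages of N/2 2x2 switches, 4 crosspoints each."""
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of two >= 2")
    stages = n.bit_length() - 1  # exact log2 for powers of two
    return stages * (n // 2) * 4


comparison = {n: (crossbar_crosspoints(n), omega_crosspoints(n)) for n in (8, 64, 1024)}
```

At N = 1024 the crossbar needs about a million crosspoints against roughly twenty thousand for the multistage network. The price is that some input/output permutations block.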