hw5

Course Number: EE 108, Fall 2009

College/University: Stanford

EE108B Winter 2008-2009
Prof. Kozyrakis

Homework #5
Due Tuesday, Mar 10th by 5pm in Gates 310

Work in groups of at most 2 students, but turn in only one HW per group.

Problem 1 [Total 12 points]

a. [2 points] Why is the cost of a page fault in virtual memory considered very high?

b. [2 points] What are the typical advantages of having a larger page size?

c. [2 points] Would you employ a write-through or a write-back policy in virtual memory? Why?

d. [4 points] You are designing a system that uses virtual memory. Assume that you have a 36-bit, byte-addressable virtual address space and a 32-bit, byte-addressable physical address space. The page size is 4 KB. Page table entries (PTEs) contain the following metadata: a valid bit, a dirty bit, support for 2 protection modes, and a use bit that is periodically cleared. What is the minimum size of a PTE? What is the size (in both bits/bytes and pages) of a process's page table, assuming a single-level, non-inverted page table?

e. [2 points] Give one advantage and one disadvantage of a virtually tagged cache.

Problem 2 [Total 10 points]

An important advantage of interrupts over polling is the ability of the processor to perform other tasks while waiting for communication from an I/O device. Suppose that a 2 GHz processor needs to read 1000 bytes of data from a particular I/O device. The I/O device supplies 1 byte of data every 0.01 ms. The code to process the data and store it in a buffer takes 2000 cycles.

a. [5 points] If the processor detects that a byte of data is ready through polling, and a polling iteration takes 60 cycles, how many cycles does the entire operation take?

b. [5 points] If, instead, the processor is interrupted when a byte is ready, and the processor spends the time between interrupts on another task, how many cycles of this other task can the processor complete while the I/O communication is taking place? The overhead for handling an interrupt is 200 cycles.
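The arithmetic behind Problem 2 can be set up as a short Python sketch. This is one possible reading of the problem, not an official solution: the function and constant names are illustrative, the first byte is assumed to arrive one full interval after the start, and the delay between a byte becoming ready and a poll noticing it is ignored.

```python
# Sketch of the Problem 2 arithmetic. Names are illustrative; the
# modelling assumptions are stated in the comments below.

CLOCK_HZ = 2_000_000_000      # 2 GHz processor
NUM_BYTES = 1000              # bytes to read from the device
BYTE_INTERVAL_S = 0.01e-3     # device supplies one byte every 0.01 ms
PROCESS_CYCLES = 2000         # cycles to process and buffer one byte
POLL_CYCLES = 60              # cycles per polling iteration
INTERRUPT_OVERHEAD = 200      # cycles of overhead per interrupt


def cycles_per_byte_interval():
    """Processor cycles that elapse between two bytes from the device."""
    return round(CLOCK_HZ * BYTE_INTERVAL_S)


def polls_between_bytes():
    """Polling iterations that fit between finishing one byte and the next."""
    return (cycles_per_byte_interval() - PROCESS_CYCLES) // POLL_CYCLES


def polling_total_cycles():
    # Assumption: the operation is device-limited, so the last byte arrives
    # after NUM_BYTES intervals and is then processed. Poll detection
    # latency is ignored.
    return NUM_BYTES * cycles_per_byte_interval() + PROCESS_CYCLES


def interrupt_free_cycles():
    # Assumption: each byte costs interrupt overhead plus processing, and
    # the remainder of every interval goes to the other task.
    busy = INTERRUPT_OVERHEAD + PROCESS_CYCLES
    return NUM_BYTES * (cycles_per_byte_interval() - busy)


if __name__ == "__main__":
    print("cycles per byte interval:", cycles_per_byte_interval())
    print("polls between bytes:", polls_between_bytes())
    print("polling total cycles:", polling_total_cycles())
    print("cycles free under interrupts:", interrupt_free_cycles())
```

Under these assumptions the processor is tied up for the whole transfer when polling, whereas with interrupts most of each byte interval remains available for other work. A different assumption about when the first byte arrives, or about poll granularity, shifts the polling total slightly.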

Problem 3 [Total 14 points]

Assume you have a 1 GHz processor with 2 levels of cache, DRAM main memory, and a hard disk for virtual memory. The first-level cache is split for instructions and data. The L1 cache is virtually indexed and physically tagged. The memory system has the following parameters:

              Hit Time                       Miss Rate                           Block Size
    L1 cache  1 cycle                        4% for data, 1% for instructions    64 bytes
    L2 cache  12 cycles + 1 cycle/64 bits    2%                                  128 bytes
    DRAM      80 ns + 10 ns/8 bytes          0.001%                              16 KB pages
    Disk      20 ms + 20 ns/byte             ---                                 ---

The system includes a TLB with a miss rate of 0.1% for data. It never misses for instructions. The TLB miss penalty is 40 cycles. TLB hits take place in parallel with level-1 cache access.

a. [3 points] What is the AMAT in number of cycles for instruction accesses?

b. [3 points] What is the AMAT in number of cycles for data accesses?

c. [8 points] You have now decided to change your L1 cache from virtually indexed, physically tagged to virtually indexed, virtually tagged. What does this change architecturally? What are the new AMAT values for instructions and data?

Problem 4 [Total 35 points]

Working for a hard drive manufacturer, you recently received a rather cryptic note from a fellow engineer. She provided a number of technical specifications about the hard drive, but neglected to include some basic facts, figuring that you could easily do the calculations yourself. At the end of the note she also provided a few suggestions to improve performance and asked for your thoughts on the proposals.

The information contained in the message tells you that the drive spins at 8,000 RPM and has 526 sectors per track, 512 bytes per sector, 10 platters, and 12,100 cylinders. It also notes that the average seek time is down to 5.6 ms for the Ultra160 SCSI interface, which has a DMA controller overhead of approximately 0.5 ms.

a. [5 points] What is the unformatted capacity of the drive in GB, where 1 GB = 2^30 = 1,073,741,824 bytes?

b. [5 points] Excluding overhead, what is the disk bandwidth (bytes/sec)?

c. [5 points] What is the average time to access a single disk sector?

d. [5 points] What is the effective bandwidth of one of these drives when transferring 64 KB blocks? Use the variable b to represent the disk bandwidth and t to represent the average access time.

Now that you know a little more about the drive, you decide to try it out yourself. The DMA controller on your system can support up to 32 disks and is using an I/O bus capable of a sustained 160 x 10^6 bytes/sec. The system has a 64-bit-wide memory-processor bus running at 150 MHz that can transfer four 64-bit words every 8 bus cycles.

e. [5 points] How many disk drives are required to saturate the I/O bus? Is it even possible? You must calculate the number of drives needed to saturate the bus for 2 cases: (1) continuous read and (2) reading blocks of 64 KB.

f. [5 points] How many I/O buses are required to saturate the memory-processor bus?

g. [5 points] A common problem designers of I/O systems face with DMA transfers is maintaining cache coherency. One possible solution to this problem is to send all I/O operations through the cache. This strategy is unfortunately less than ideal. What problem is introduced into the system when dealing with large block transfers into memory?

Problem 5 [Total 15 points]

Assume that we have the following two magnetic disk configurations: a single disk and an array of four disks. Each disk has:
    256 sectors per track, where each sector holds 1 KB and the disk revolves at 10,000 RPM
    a seek time of 6 ms
    a transfer bandwidth of 4 MB/s
    a disk controller delay of 0.1 ms per transaction, either for a single disk or for the array
    consecutive sectors on the single-disk system are spread one sector per disk in the array

Assume:
    requests are random reads, half of which are 16 KB and half of which are 32 KB of data from sequential sectors
    sectors may be read in any order
    rotational latency is one-half the revolution time for both the single-disk read and the disk-array read

Determine the performance in bytes per second for each system (i.e., the transfer rate of 1 disk and the transfer rate of 4 disks).

Problem 6 [Total 14 points]

a. [5 points] Here are a variety of building blocks used in an I/O system that has a synchronous processor-memory bus running at 200 MHz and one or more I/O adapters that interface I/O buses to the processor-memory bus.

Memory system: The memory system has a 32-bit interface and handles four-word transfers. The memory system has separate address and data lines and, for writes to memory, accepts a word every clock cycle for 4 clock cycles and then takes an additional 4 clock cycles before the words have been stored and it can accept another transaction.

DMA interface: The I/O adapters use DMA to transfer the data between the I/O buses and the processor-memory bus. The DMA unit arbitrates for the processor-memory bus and sends/receives four-word blocks from/to the memory system. The DMA controller can accommodate up to eight disks. Initiating a new I/O operation (including the seek and access) takes 1.5 ms, during which another I/O cannot be initiated by this controller (but outstanding operations can be handled).

I/O bus: The I/O bus is a synchronous bus with a sustainable bandwidth of 6 MB/sec; each transfer is one word long.

Disks: The disks have a measured average seek plus rotational latency of 20 ms. The disks have a read/write bandwidth of 4 MB/sec when they are transferring.

Find the time required to read a 32 KB sector from a disk to memory, assuming that this is the only activity on the bus.

Hint: first identify the slowest component of the system and then perform your calculation based on that.

b. [9 points] Consider the case of the system in Part (a) with a total of 5 I/O buses, 5 DMA controllers, and 50 disks.
With this organization, clearly it is possible to saturate the I/O buses because you have 5 of them at 6 MB/sec and 50 disks at 4 MB/sec. Compute the minimum block size (which should be a power of two) that will saturate the I/O buses. For this block size, how many I/O operations per second can the system perform and what is the I/O bandwidth?
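
The first step of the hint in Problem 6, finding the slowest component, can be sketched in Python as a comparison of peak bandwidths. This is a rough sketch rather than a full solution: the names are illustrative, and it assumes (the problem does not say) that 1 MB = 10^6 bytes and that a word is 4 bytes, matching the 32-bit memory interface. Latencies such as the 20 ms seek plus rotational delay and the 1.5 ms initiation time are deliberately left out.

```python
# Peak-bandwidth comparison for the Problem 6 building blocks.
# Assumptions (not stated in the problem): 1 MB = 10**6 bytes, and a
# word is 4 bytes, matching the 32-bit memory interface.

BUS_CLOCK_HZ = 200_000_000    # processor-memory bus clock (200 MHz)
WORD_BYTES = 4                # assumed word size
MB = 10**6                    # assumed decimal megabyte
IO_BUS_BW = 6 * MB            # sustainable bandwidth of one I/O bus
DISK_BW = 4 * MB              # transfer bandwidth of one disk


def memory_write_bandwidth():
    # Four words are accepted in 4 cycles, then 4 more cycles pass before
    # the next transaction: 16 bytes every 8 bus cycles.
    bytes_per_transaction = 4 * WORD_BYTES
    cycles_per_transaction = 4 + 4
    return bytes_per_transaction * BUS_CLOCK_HZ // cycles_per_transaction


def peak_bandwidths(num_io_buses=1, num_disks=1):
    """Aggregate peak bandwidth in bytes/sec of each kind of component."""
    return {
        "memory": memory_write_bandwidth(),
        "io_buses": num_io_buses * IO_BUS_BW,
        "disks": num_disks * DISK_BW,
    }


def slowest(bandwidths):
    """Name of the component with the lowest aggregate bandwidth."""
    return min(bandwidths, key=bandwidths.get)


if __name__ == "__main__":
    part_a = peak_bandwidths()                            # one bus, one disk
    part_b = peak_bandwidths(num_io_buses=5, num_disks=50)
    print("part (a):", part_a, "slowest:", slowest(part_a))
    print("part (b):", part_b, "slowest:", slowest(part_b))
```

With one bus and one disk the disk is the limiting component, while with 5 buses and 50 disks the aggregate I/O bus bandwidth falls below the aggregate disk bandwidth, which is consistent with the premise of part (b) that the I/O buses can be saturated.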