The Case for a Single-Chip Multiprocessor

Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang
Computer Systems Laboratory
Stanford University
Stanford, CA 94305-4070
http://www-hydra.stanford.edu

Appears in Proceedings Seventh International Symp. Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), Cambridge, MA, October 1996

Abstract

Advances in IC processing allow for more microprocessor design options. The increasing gate density and cost of wires in advanced integrated circuit technologies require that we look for new ways to use their capabilities effectively. This paper shows that in advanced technologies it is possible to implement a single-chip multiprocessor in the same area as a wide issue superscalar processor. We find that for applications with little parallelism the performance of the two microarchitectures is comparable. For applications with large amounts of parallelism at both the fine and coarse grained levels, the multiprocessor microarchitecture outperforms the superscalar architecture by a significant margin. Single-chip multiprocessor architectures have the advantage in that they offer localized implementation of a high-clock rate processor for inherently sequential applications and low latency interprocessor communication for parallel applications.

1 Introduction

Advances in integrated circuit technology have fueled microprocessor performance growth for the last fifteen years. Each increase in integration density allows for higher clock rates and offers new opportunities for microarchitectural innovation. Both of these are required to maintain microprocessor performance growth. Microarchitectural innovations employed by recent microprocessors include multiple instruction issue, dynamic scheduling, speculative execution and non-blocking caches. In the future, the trend seems to be towards CPUs with wider instruction issue and support for larger amounts of speculative execution. In this paper, we argue against this trend. We show that, due to fundamental circuit limitations and limited amounts of instruction level parallelism, the superscalar execution model will provide diminishing returns in performance for increasing issue width. Faced with this situation, building a complex wide issue superscalar CPU is not the most efficient use of silicon resources. We present the case that a better use of silicon area is a multiprocessor microarchitecture constructed from simpler processors.

To understand the performance trade-offs between wide-issue processors and multiprocessors in a more quantitative way, we compare the performance of a six-issue dynamically scheduled superscalar processor with a 4 × two-issue multiprocessor. Our comparison has a number of unique features. First, we accurately account for and justify the latencies, especially the cache hit time, associated with the two microarchitectures. Second, we develop floor-plans and carefully allocate resources to the two microarchitectures so that they require an equal amount of die area. Third, we evaluate these architectures with a variety of integer, floating point and multiprogramming applications running in a realistic operating system environment.

The results show that on applications that cannot be parallelized, the superscalar microarchitecture performs 30% better than one processor of the multiprocessor architecture. On applications with fine grained thread-level parallelism the multiprocessor microarchitecture can exploit this parallelism so that the superscalar microarchitecture is at most 10% better. On applications with large grained thread-level parallelism and multiprogramming workloads the multiprocessor microarchitecture performs 50-100% better than the wide superscalar microarchitecture.

The remainder of this paper is organized as follows. In Section 2, we discuss the performance limits of superscalar design from a technology and implementation perspective. In Section 3, we make the case for a single chip multiprocessor from an applications perspective. In Section 4, we develop floor plans for a six-issue superscalar microarchitecture and a 4 × two-issue multiprocessor and examine their area requirements. We describe the simulation methodology used to compare these two microarchitectures in Section 5, and in Section 6 we present the results of our performance comparison. Finally, we conclude in Section 7.

2 The Limits of the Superscalar Approach

A recent trend in the microprocessor industry has been the design of CPUs with multiple instruction issue and the ability to execute instructions out of program order. This ability, called dynamic scheduling, first appeared in the CDC 6600 [21]. Dynamic scheduling uses hardware to track register dependencies between instructions; an instruction is executed, possibly out of program order, as soon as all of its dependencies are satisfied. In the CDC 6600 the register dependency checking was done with a hardware structure called the scoreboard. The IBM 360/91 used register renaming to improve the efficiency of dynamic scheduling using hardware structures called reservation stations [3]. It is possible to design a dynamically scheduled superscalar microprocessor using reservation stations; Johnson gives a thorough description of this approach [13]. However, the most recent implementations of dynamic superscalar processors have used a structure similar to the one shown in Figure 1. Here register renaming between architectural and physical registers is done explicitly, and instruction scheduling and register dependency tracking between instructions are performed in an instruction issue queue. Examples of microprocessors designed in this manner are the MIPS Technologies R10000 [24] and the HP PA-8000 [14]. In these processors the instruction queue is actually implemented as multiple instruction queues for different classes of instructions (e.g. integer, floating point, load/store).

[Figure 1. A dynamic superscalar CPU. Block labels: Instruction Fetch; Issue and Retirement; Execution; Instruction Cache; Instruction Fetch & Decode; Reorder Buffer and Instruction Issue Queues; Integer Units; LD/ST Units; FP Units; Data Cache.]

The three major phases of instruction execution in a dynamic superscalar machine are also shown in Figure 1. They are fetch, issue and execute. In the rest of this section we describe these phases and the limitations that will arise in the design of a very wide instruction issue CPU.

The goal of the fetch phase is to present the rest of the CPU with a large and accurate window of decoded instructions. Three factors constrain instruction fetch: mispredicted branches, instruction misalignment, and cache misses. The ability to predict branches correctly is crucial to establishing a large, accurate window of instructions. Fortunately, by using a moderate amount of memory (64 Kbit), branch predictors such as the selective branch predictor proposed by McFarling are able to reduce misprediction rates to under 5% for most programs [15]. However, good branch prediction is not enough. As Conte pointed out, it is also necessary to align a packet of instructions for the decoder [7]. When the issue width is wider than four instructions there is a high probability that it will be necessary to fetch across a branch for a single packet of instructions since, in integer programs, one in every five instructions is a branch [12]. This will require fetching from two cache lines at once and merging the cache lines together to form a single packet of instructions. Conte describes a number of methods for achieving this. A technique that divides the instruction cache into banks and fetches from multiple banks at once is not too expensive to implement and provides performance that is within 3% of a perfect scheme on an 8-wide issue machine. Even with good branch prediction and alignment a significant cache miss rate will limit the ability of the fetcher to maintain an adequate window of instructions. There are still some applications such as large logic simulations, transaction processing and the OS kernel that have significant instruction cache miss rates even with fairly large 64 KB two-way set-associative caches [19]. Fortunately, it is possible to hide some of the instruction cache miss latency in a dynamically scheduled processor by executing instructions that are already in the instruction window. Rosenblum et al. have shown that over 60% of the instruction cache miss latency can be hidden on a database benchmark with a 64 KB two-way set-associative instruction cache [19]. Given good branch prediction and instruction alignment it is likely that the fetch phase of a wide-issue dynamic superscalar processor will not limit performance.

In the issue phase, a packet of renamed instructions is inserted into the instruction issue queue. An instruction is issued for execution once all of its operands are ready. There are two ways to implement renaming. One could use an explicit table for mapping architectural registers to physical registers (the scheme used in the R10000 [24]), or one could use a combination reorder buffer/instruction queue as in the PA-8000 [14]. The advantage of the mapping table is that no comparisons are required for register renaming. The disadvantage of the mapping table is that the number of access ports required by the mapping table structure is O × W, where O is the number of operands per instruction and W is the issue width of the machine. An eight-wide issue machine with three operands per instruction requires a 24-port mapping table. Implementing renaming with a reorder buffer has its own set of drawbacks. It requires n × Q × O × W 1-bit comparators to determine which physical registers should supply operands for a new packet of instructions, where n is the number of bits required to encode a register identifier and Q is the size of the instruction issue queue. Clearly, the number of comparators grows with the size of the instruction queue and issue width. Once an instruction is in the instruction queue, all instructions that issue must update their dependencies. This requires another set of n × Q × O × W comparators. For example, a machine with eight-wide issue, three-operand instructions, a 64-entry instruction queue, and 6-bit comparisons requires 9,216 1-bit comparators. The net effect of all the comparison logic and encoding associated with the instruction issue queue is that it takes a large amount of area to implement. On the PA-8000, which is a four-issue machine with 56 instruction issue queue entries, the instruction issue queue takes up 20% of the die area. In addition, as issue widths increase, larger windows of instructions are required to find independent instructions that can issue in parallel and maintain the full issue bandwidth. The result is a quadratic increase in the size of the instruction issue queue. Moving to the circuit level, the instruction issue queue uses a broadcast mechanism to communicate the tags of the instructions that are issued, which requires wires that span the length of the structure. In future advanced integrated circuit technologies these wires will have increasingly long delays relative to the gates that drive them [9]. Given this situation, ultimately, the instruction issue queue will limit the cycle time of the processor. For these reasons we believe that the instruction issue queue will fundamentally limit the performance of wide issue superscalar machines.
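The port and comparator counts quoted for the issue phase follow directly from the two products O × W and n × Q × O × W. The short sketch below reproduces the paper's two worked figures; the function names are illustrative only and are not taken from the paper.

```python
def mapping_table_ports(operands_per_instr: int, issue_width: int) -> int:
    """Access ports needed by an explicit rename mapping table: O x W."""
    return operands_per_instr * issue_width


def rename_comparators(tag_bits: int, queue_entries: int,
                       operands_per_instr: int, issue_width: int) -> int:
    """1-bit comparators for one set of tag comparisons: n x Q x O x W."""
    return tag_bits * queue_entries * operands_per_instr * issue_width


# Eight-wide issue, three operands per instruction.
print(mapping_table_ports(3, 8))

# 6-bit register identifiers, 64-entry issue queue, three operands, eight-wide.
print(rename_comparators(6, 64, 3, 8))
```

Because the queue size Q itself has to grow with the issue width W to keep the issue bandwidth full, the product grows faster than linearly in W, which is the quadratic growth the text describes.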
In the execution phase, operand values are fetched from the register file or bypassed from earlier instructions to execute on the functional units. The wide superscalar execution model will encounter performance limits in the register file, in the bypass logic and in the functional units. Wider instruction issue requires a larger window of instructions, which implies more register renaming. Not only must the register file be larger to accommodate more renamed registers, but the number of ports required to satisfy the full instruction issue bandwidth also grows with issue width. Again, this causes a quadratic increase in the complexity of the register file with increases in issue width. Farkas et al. have investigated the effect of register file complexity on performance [10]. They find that an eight-issue machine only performs 20% better than a four-issue machine when the effect of cycle-time is included in the performance estimates. The complexity of the bypass logic also grows quadratically with the number of execution units; however, a more limiting factor is the delay of the wires that interconnect the execution units. As far as the execution units themselves are concerned, the arithmetic functional units can be duplicated to support the issue width, but more ports must be added to the primary data cache to provide the necessary load/store bandwidth. The cheapest way to add ports to the data cache is by building a banked cache [20], but the added multiplexing and control required to implement a banked cache increases the access time of the cache. We investigate this issue in more detail in Section 4.2.

3 The Case for a Single-Chip Multiprocessor

The motivation for building a single chip multiprocessor comes from two sources: a technology push and an application pull. We have already argued that technology issues, especially the delay of the complex issue queue and multi-port register files, will limit the performance returns from a wide superscalar execution model. This motivates the need for a decentralized microarchitecture to maintain the performance growth of microprocessors.

From the applications perspective, the microarchitecture that works best depends on the amount and characteristics of the parallelism in the applications. Wall has performed one of the most comprehensive studies of application parallelism [22]. The results of his study indicate that applications fall in two classes. The first class consists of applications with low to moderate amounts of parallelism: under ten instructions per cycle with aggressive branch prediction and large, but not infinite, window sizes. Most of these applications are integer applications. The second class consists of applications with large amounts of parallelism, greater than forty instructions per cycle with aggressive branch prediction and large window sizes. The majority of these applications are floating point applications and most of the parallelism is in the form of loop-level parallelism.

The application pull towards a single-chip multiprocessor arises because these two classes of applications require different execution models. Applications in the first class work best on processors that are moderately superscalar (2 issue) with very high clock rates because there is little parallelism to exploit. To make this more concrete we note that a 200 MHz MIPS R5000, which is a single issue machine when running integer programs, achieves a SPEC95 integer rating which is 70% of the rating of a 200 MHz MIPS R10000, which is a four-issue machine [6]. Both machines have the same size data and instruction caches, but the R5000 has a blocking data cache, while the R10000 has a non-blocking data cache. Applications in the second class have large amounts of parallelism and see performance benefits from a variety of methods designed to exploit parallelism such as superscalar, VLIW or vector processing. However, the recent advances in parallel compilers make a multiprocessor an efficient and flexible way to exploit the parallelism in these programs [1]. Single-chip multiprocessors, designed so that the individual processors are simple and achieve very high clock rates, will work well on integer programs in the first class. The addition of low latency communication between processors on the same chip also allows the multiprocessor to exploit the parallelism of the floating point programs in the second class. In Section 6 we evaluate the performance of a single-chip multiprocessor for these two application classes.

There are a number of ways to use a multiprocessor. Today, the most common use is to execute multiple processes in parallel to increase throughput in a multiprogramming environment under the control of a multiprocessor aware operating system. We note that there are a number of commercially available operating systems that have this capability (e.g. Silicon Graphics IRIX, Sun Solaris, Microsoft Windows NT). Furthermore, the increasingly widespread use of visualization and multimedia applications tends to increase the number of active processes or independent threads on a desktop machine or server at a particular point in time.

Another way to use a multiprocessor is to execute multiple threads in parallel that come from a single application. Two examples are transaction processing and hand parallelized floating point scientific applications [23]. In this case the threads communicate using shared memory, and these applications are designed to run on parallel machines with communication latencies in the hundreds of CPU clock cycles; therefore, the threads do not communicate in a very fine grained manner. Another example of manually parallelized applications is the class of fine-grained thread-level integer applications. Using the results from Wall's study, these applications exhibit moderate amounts of parallelism when the instruction window size is very large and the branch prediction is perfect because the parallelism that exists is widely distributed. Due to the large window size and the perfect branch prediction required, it will be very difficult to extract this parallelism with a superscalar execution model. However, it is possible for a programmer who understands the nature of the parallelism in the application to parallelize the application into multiple threads. The parallelism exposed in this manner is fine-grained and cannot be exploited by a conventional multiprocessor architecture. The only way to exploit this type of parallelism is with a single-chip multiprocessor architecture.

A third way to use a multiprocessor is to accelerate the execution of sequential applications without manual intervention; this requires automatic parallelization technology. Recently, this automatic parallelization technology was shown to be effective on scientific applications [2], but it is not yet ready for general purpose integer applications. Like the manually parallelized integer applications, these applications could derive significant performance benefits from the low-latency interprocessor communication provided by a single-chip multiprocessor.

                                      6-way SS               4x2-way MP
# of CPUs                             1                      4
Degree superscalar                    6                      4 x 2
# of architectural registers          32int / 32fp           4 x 32int / 32fp
# of physical registers               160int / 160fp         4 x 40int / 40fp
# of integer functional units         3                      4 x 1
# of floating pt. functional units    3                      4 x 1
# of load/store ports                 8 (one per bank)       4 x 1
BTB size                              2048 entries           4 x 512 entries
Return stack size                     32 entries             4 x 8 entries
Instruction issue queue size          128 entries            4 x 8 entries
I cache                               32 KB, 2-way S. A.     4 x 8 KB, 2-way S. A.
D cache                               32 KB, 2-way S. A.     4 x 8 KB, 2-way S. A.
L1 hit time                           2 cycles (4 ns)        1 cycle (2 ns)
L1 cache interleaving                 8 banks                N/A
Unified L2 cache                      256 KB, 2-way S. A.    256 KB, 2-way S. A.
L2 hit time / L1 penalty              4 cycles (8 ns)        5 cycles (10 ns)
Memory latency / L2 penalty           50 cycles (100 ns)     50 cycles (100 ns)

Table 1. Key characteristics of the two microarchitectures

4 Two Microarchitectures

To compare the wide superscalar and multiprocessor design approaches, we have developed the microarchitectures for two machines that will represent the state of the art in processor design a few years from now. The superscalar microarchitecture (SS) is a logical extension of the current R10000 superscalar design, widened from the current four-way issue to a six-way issue implementation. The multiprocessor microarchitecture (MP) is a four-way single-chip multiprocessor composed of four identical 2-way superscalar processors. In order to fit four identical processors on a die of the same size, each individual processor is comparable to the Alpha 21064, which became available in 1992 [8].

These two extremely different microarchitectures have nearly identical die sizes when built in identical process technologies. The processor size we select is based upon the kinds of processor chips that advances in silicon processing technology will allow in the next few years. When manufactured in a 0.25 µm process, which should be possible by the end of 1997, each of the chips will have an area of 430 mm² -- about 30% larger than leading-edge microprocessors being shipped today. This represents typical die size growth over the course of a few years among the largest, fastest microprocessors [11].

We have argued that the simpler two-issue CPU used in the multiprocessor microarchitecture will have a higher clock rate than the six-issue CPU; however, for the purposes of this comparison we have assumed that the two processors have the same clock rate.
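The area budgets used for the two designs (Tables 2 and 3) start from component sizes measured on a 0.35 µm R10000, shrink them to 0.25 µm, and then apply a percentage growth for new functionality. The sketch below shows one way to reproduce that arithmetic. It assumes that area scales with the square of the feature size, which is an assumption made here (the text does not state the formula), although it agrees with the tabulated values after rounding.

```python
OLD_FEATURE_UM = 0.35  # process of the measured R10000 die
NEW_FEATURE_UM = 0.25  # target process for both designs

# Assumption: area shrinks with the square of the linear feature size.
AREA_SCALE = (NEW_FEATURE_UM / OLD_FEATURE_UM) ** 2  # roughly 0.51


def extrapolate_area(original_mm2: float, growth_percent: float = 0.0) -> float:
    """Shrink a 0.35 um area to 0.25 um, then apply the functional growth."""
    shrunk = original_mm2 * AREA_SCALE
    return shrunk * (1.0 + growth_percent / 100.0)


# Rows taken from Tables 2 and 3: (component, original mm2, % growth).
rows = [
    ("256K on-chip L2 cache", 219, 0),          # both designs
    ("Instruction queues, 6-way SS", 28, 250),  # Table 2
    ("D cache, one MP processor", 26, -75),     # Table 3
]
for name, original, growth in rows:
    print(f"{name}: {extrapolate_area(original, growth):.1f} mm2")
```

Because the tables round at each step, sums of the rounded entries can differ from the stated totals by a square millimeter or two.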
To achieve the same clock rate the wide superscalar architecture would require deeper pipelining due to the large amount of instruction issue logic in the critical path. For simplicity, we ignore latency variations between the architectures due to the degree of pipelining. We assume the clock frequency of both machines is 500 MHz. At 500 MHz the main memory latencies experienced by the processor are large. We have modeled the main memory as a 50-cycle, 100 ns delay for both architectures, typical values in a workstation today with 60 ns DRAMs and 40 ns of delays due to buffering in the DRAM controller chips [25]. Table 1 shows the key characteristics of the two architectures. We explain and justify these characteristics in the following sections. The integer and floating point functional unit result and repeat latencies are the same as the R10000 [24].

4.1 6-Way Superscalar Architecture

The 6-way superscalar architecture is a logical extension of the current R10000 design. As the floorplan in Figure 2 and the area breakdown in Table 2 indicate, the logic necessary for out-of-order instruction issue and scheduling dominates the area of the chip, due to the quadratic area impact of supporting 6-way instruction issue. First, we increased the number of ports in the instruction buffers by 50% to support 6-way issue instead of 4-way, increasing the area of each buffer by about 30-40%. Second, we increased the number of instruction buffers from 48 to 128 entries so that the processor examines a larger window of instructions for ILP to keep the execution units busy. This large instruction window also compensates for the fact that the simulations do not execute code that is optimized for a 6-way superscalar machine. The larger instruction window size and wider issue width cause a quadratic area increase of the instruction sequencing logic to 3-4 times its original size. Altogether, the logic necessary to handle out-of-order instruction issue occupies about 120 mm² -- about 30% of the die. In comparison, the actual execution units only occupy about 70 mm² -- just 18% of the die is required to build triple R10000 execution units in a 0.25 µm process. Due to the increased rate at which instructions are issued, we also enhanced the fetch logic by increasing the size of the branch target buffer to 2048 entries and the call-return stack to 32 entries. This increases the branch prediction accuracy of the processor and prevents the instruction fetch mechanism from becoming a bottleneck since the 6-way execution engine requires a much higher instruction fetch bandwidth than the 2-way processors used in the MP architecture.

[Figure 2. Floorplan for the six-issue dynamic superscalar microprocessor. Die: 21 mm x 21 mm. Block labels: External Interface; Instruction Fetch; Instruction Cache (32 KB); Data Cache (32 KB); TLB; Inst. Decode & Rename; Reorder Buffer, Instruction Queues, and Out-of-Order Logic; Integer Unit; Floating Point Unit; On-Chip L2 Cache (256KB); Clocking & Pads.]

The on-chip memory hierarchy is similar to the Alpha 21164 -- a small, fast level one (L1) cache backed up by a large on-chip level two (L2) cache. The wide issue width requires the L1 cache to support wide instruction fetches from the instruction cache and multiple loads from the data cache during each cycle. The two-way set associative 32 KB L1 data cache is banked eight ways into eight small, single-ported, independent 4 KB cache banks, each of which handles one access every 2 ns processor cycle. However, the additional overhead of the bank control logic and crossbar required to arbitrate between the multiple requests sharing the 8 data cache banks adds another cycle to the latency of the L1 cache, and increases the area by 25%. Therefore, our modeled L1 cache has a hit time of 2 cycles. Backing up the 32 KB L1 caches is a large, unified, 256 KB L2 cache that takes 4 cycles to access. These latencies are simple extensions of the times obtained for the L1 caches of current Alpha microprocessors [4], using a 0.25 µm process technology.

CPU Component                     Original Size,    Size Extrapolated    % Growth Due to      New Size    % Area
                                  0.35 µm R10K      to 0.25 µm (mm²)     New Functionality    (mm²)
                                  (mm²)
256K On-Chip L2 Cache (a)         219               112                  0%                   112         26%
8-bank D Cache (32 KB)            26                13                   25%                  17          4%
8-bank I Cache (32 KB)            28                14                   25%                  18          4%
TLB Mechanism                     10                5                    200%                 15          3%
External Interface Unit           27                14                   0%                   14          3%
Instruction Fetch Unit and BTB    18                9                    200%                 28          6%
Instruction Decode Section        21                11                   250%                 38          9%
Instruction Queues                28                14                   250%                 50          12%
Reorder Buffer                    17                9                    300%                 34          9%
Integer Functional Units          20                10                   200%                 31          7%
FP Functional Units               24                12                   200%                 37          9%
Clocking & Overhead               73                37                   0%                   37          9%
Total Size                        --                --                   --                   430         100%

Table 2. Size extrapolations for the 6-way superscalar from the MIPS R10000 processor
a. estimated from current L1 caches

4.2 4 x 2-way Superscalar Multiprocessor Architecture

The MP architecture is made up of four 2-way superscalar processors interconnected by a crossbar that allows the processors to share the L2 cache. On the die, the four processors are arranged in a grid with the L2 cache at one end, as shown in Figure 3. Internally, each of the processors has a register renaming buffer that is much more limited than the one in the 6-way architecture, since each CPU only has an 8-entry instruction buffer. We also quartered the size of the branch prediction mechanisms in the fetch units, to 512 BTB entries and 8 call-return stack entries. After the area adjustments caused by these factors are accounted for, each of the four processors is less than one-fourth the size of the 6-way SS processor, as shown in Table 3. The number of execution units actually increases in the MP because the 6-way processor had three units of each type, while the 4-way MP must have four -- one for each CPU. On the other hand, the issue logic becomes dramatically smaller, due to the decrease in instruction buffer ports and the smaller number of entries in each instruction buffer. The scaling factors of these two units balance each other out, leaving the entire processor very close to one-fourth of the size of the 6-way processor.

[Figure 3. Floorplan for the four-way single-chip multiprocessor. Die: 21 mm x 21 mm. Block labels: Processor #1 to Processor #4; I-Cache #1 to I-Cache #4 (8K each); D-Cache #1 to D-Cache #4 (8K each); L2 Communication Crossbar; On-Chip L2 Cache (256KB); External Interface; Clocking & Pads.]

The on-chip cache hierarchy of the multiprocessor is significantly different from the cache hierarchy of the 6-way superscalar processor. Each of the 4 processors has its own single-banked and single-ported 8 KB instruction and data caches that can both be accessed in a single 2 ns cycle. Since each cache can only be accessed by a single processor with a single load/store unit, no additional overhead is incurred to handle arbitration among independent memory-access units. However, since the four processors now share a single L2 cache, that cache requires an extra cycle of latency during every access to allow time for interprocessor arbitration and crossbar delay. We model this additional L2 delay by penalizing the MP an additional cycle on every L2 cache access, resulting in a 5 cycle L2 hit time.

CPU Component                     Original Size,    Size Extrapolated    % Growth Due to      New Size    % Area (of CPU /
                                  0.35 µm R10K      to 0.25 µm (mm²)     New Functionality    (mm²)       of entire chip)
                                  (mm²)
D Cache (8 KB)                    26                13                   -75%                 3           6% / 3%
I Cache (8 KB)                    28                14                   -75%                 4           7% / 3%
TLB Mechanism                     10                5                    0%                   5           9% / 5%
Instruction Fetch Unit and BTB    18                9                    -25%                 7           13% / 7%
Instruction Decode Section        21                11                   -50%                 5           10% / 5%
Instruction Queues                28                14                   -70%                 4           8% / 4%
Reorder Buffer                    17                9                    -80%                 2           3% / 2%
Integer Functional Units          20                10                   0%                   10          20% / 10%
FP Functional Units               24                12                   0%                   12          23% / 12%
Per-CPU Subtotal                  --                --                   --                   53          100% / 50%
256K On-Chip L2 Cache (a)         219               112                  0%                   112         26%
External Interface Unit           27                14                   0%                   14          3%
Crossbar Between CPUs             --                --                   --                   50          12%
Clocking & Overhead               73                37                   0%                   37          9%
Total Size                        --                --                   --                   424         100%

Table 3. Size extrapolations in the 4 x 2-way MP from the MIPS R10000 processor.
a. estimated from current L1 caches

5 Simulation Methodology

Accurately evaluating the performance of the two microarchitectures requires a way of simulating the environment in which we would expect these architectures to be used in real systems. In this section we describe the simulation environment and the applications used in this study.

5.1 Simulation Environment

We execute the applications in the SimOS simulation environment [18]. SimOS models the CPUs, memory hierarchy and I/O devices of uniprocessor and multiprocessor systems in sufficient detail to boot and run a commercial operating system. SimOS uses the MIPS-2 instruction set and runs the Silicon Graphics IRIX 5.3 operating system, which has been tuned for multiprocessor performance. SimOS actually simulates the operating system; therefore, all the memory references made by the operating system and the applications are generated. This feature is particularly important for the study of multiprogramming workloads where the time spent executing kernel code makes up a significant fraction of the non-idle execution time. A unique feature of SimOS that makes studies such as this feasible is that SimOS supports multiple CPU simulators that use a common instruction set architecture. This allows trade-offs to be made between the simulation speed and accuracy.
The fastest CPU simu- lator, called Embra, uses binary-to-binary translation techniques and is used for booting the operating system and positioning the workload so that we can focus on interesting regions of execution. The medium performance CPU simulator, called Mipsy, is two orders of magnitude slower than Embra. Mipsy is an instruction set simulator that models all instructions with a one cycle result latency and a one cycle repeat rate. Mipsy interprets all user and privileged instructions and feeds memory references to a memory system simulator. The slowest, most detailed CPU simulator is MXS, which supports dynamic scheduling, speculative execution and non-blocking memory references. MXS is over four orders of magnitude slower than Embra. The cache and memory system component of our simulator is completely event-driven and interfaces to the SimOS processor model Integer applications compress eqntott m88ksim MPsim applu apsi swim tomcatv pmake compresses and uncompresses file in memory translates logic equations into truth tables Motorola 88000 CPU simulator VCS compiled Verilog simulation of a multiprocessor Floating point applications solver for parabolic/elliptic partial differential equations solves problems of temperature, wind, velocity, and distribution of pollutants shallow water model with 1K 1K grid mesh-generation with Thompson solver Multiprogramming application parallel make of gnuchess using C compiler Table 4. The applications. which drives it. Processor memory references cause threads to be generated which keep track of the state of each memory reference and the resource usage in the memory system. A call-back mechanism is used to inform the processor of the status of all outstanding references, and to inform the processor when a reference completes. These mechanisms allow for very detailed cache and memory system models, which include cycle accurate measures of contention and resource usage throughout the system. 
5.2 Applications

The performance of nine realistic applications is used to evaluate the two microarchitectures. Table 4 shows that the nine applications are made up of two SPEC95 integer benchmarks (compress, m88ksim), one SPEC92 integer benchmark (eqntott), one other integer application (MPsim), four SPEC95 floating point benchmarks (applu, apsi, swim, tomcatv), and a multiprogramming application (pmake).

The applications are parallelized in different ways to run on the MP microarchitecture. Compress is run unmodified on both the SS and MP microarchitectures, using only one processor of the MP architecture. Eqntott is parallelized manually by modifying a single bit vector comparison routine that is responsible for 90% of the execution time of the application [16]. The CPU simulator m88ksim is also parallelized manually into three threads using the SUIF compiler runtime system. Each of the three threads is allowed to be in a different phase of simulating a different instruction at the same time. This style of parallelization is very similar to the overlap of instruction execution that occurs in hardware pipelining. The MPsim application is a Verilog model of a bus-based multiprocessor running under a multi-threaded compiled code simulator (Chronologic VCS-MT). The multiple threads are specified manually by assigning parts of the model hierarchy to different threads. The MPsim application uses four closely coupled threads, one for each of the processors in the model. The parallel versions of the SPEC95 floating point benchmarks are automatically generated by the SUIF compiler system [2]. The pmake application is a program development workload that consists of the compile phase of the Modified Andrew Benchmark [17]. The same pmake application is executed on both microarchitectures; however, the OS takes advantage of the extra processors in the MP microarchitecture to run multiple compilations in parallel.

A difficult problem that arises when comparing the performance of different processors is ensuring that they do the same amount of work. The solution is not as easy as comparing the execution times of each application on each machine. Due to the slow simulation speed of the detailed CPU simulator (MXS) used to collect these results, it would take far too long to run the applications to completion. Our solution is to compare the two microarchitectures over a portion of the application using a technique called representative execution windows [5]. In most compute-intensive applications there is a steady state execution region that consists of a single outer loop or a set of loops that makes up the bulk of the execution time. It is sufficient to sample a small number of iterations of these loops as a representative execution window if the execution time behavior of the window is indeed representative of the entire program. Simulation results show that for most applications the cache miss rates and the number of instructions executed in the window deviate by less than 1% from the results for the entire program.

The simulation procedure begins with a checkpoint taken with the Embra simulator. Simulation from the checkpoint starts with the instruction level simulator Mipsy and the full memory system. After the caches are warmed by running the Mipsy simulator through the representative execution window at least once, the simulator is switched to the detailed simulator, MXS, to collect the performance results presented in this paper. We use the technique of representative execution windows for all the applications except pmake. Pmake does not have a well-defined execution region that is representative of the application as a whole. Therefore, the results for pmake are collected by running the entire application with MXS.

6 Performance Comparison

We begin by examining the performance of the superscalar microarchitecture and one processor of the multiprocessor microarchitecture. Table 5 shows the IPC, branch prediction rates and cache miss rates for one processor of the MP; Table 6 shows the IPC, branch prediction rates, and cache miss rates for the SS microarchitecture. The cache miss rates are presented in the tables in terms of misses per completed instruction (MPCI), including instructions that complete in kernel and user mode. When the issue width is increased from two to six we see that the actual IPC increases by less than a factor of 1.6 for all of the integer and multiprogramming applications. For the floating point applications the performance improvement varies from a factor of 1.6 for tomcatv to 2.4 for swim.

Program | IPC | BP Rate % | I cache %MPCI | D cache %MPCI | L2 cache %MPCI
compress | 0.9 | 85.9 | 0.0 | 3.5 | 1.0
eqntott | 1.3 | 79.8 | 0.0 | 0.8 | 0.7
m88ksim | 1.4 | 91.7 | 2.2 | 0.4 | 0.0
MPsim | 0.8 | 78.7 | 5.1 | 2.3 | 2.3
applu | 0.9 | 79.2 | 0.0 | 2.0 | 1.7
apsi | 0.6 | 95.1 | 1.0 | 4.1 | 2.1
swim | 0.9 | 99.7 | 0.0 | 1.2 | 1.2
tomcatv | 0.8 | 99.6 | 0.0 | 7.7 | 2.2
pmake | 1.0 | 86.2 | 2.3 | 2.1 | 0.4

Table 5. Performance of a single 2-issue superscalar processor.

Figure 4. IPC Breakdown for a single 2-issue processor. (Per-application bars; y-axis: IPC; components: Actual IPC, Pipeline Stall, I Cache Stall, D Cache Stall.)

Program | IPC | BP Rate % | I cache %MPCI | D cache %MPCI | L2 cache %MPCI
compress | 1.2 | 86.4 | 0.0 | 3.9 | 1.1
eqntott | 1.8 | 80.0 | 0.0 | 1.1 | 1.1
m88ksim | 2.3 | 92.6 | 0.1 | 0.0 | 0.0
MPsim | 1.2 | 81.6 | 3.4 | 1.7 | 2.3
applu | 1.7 | 79.7 | 0.0 | 2.8 | 2.8
apsi | 1.2 | 95.6 | 0.2 | 3.1 | 2.6
swim | 2.2 | 99.8 | 0.0 | 2.3 | 2.5
tomcatv | 1.3 | 99.7 | 0.0 | 4.2 | 4.3
pmake | 1.4 | 82.7 | 0.7 | 1.0 | 0.6

Table 6. Performance of the 6-issue superscalar processor.

[Figure: per-application IPC breakdown chart; y-axis: IPC; components: Actual IPC, Pipeline Stall, I Cache Stall, D Cache Stall.]

One of the major causes of processor stalls in a superscalar processor is cache misses. However, cache misses in a dynamically scheduled superscalar processor with speculative execution and nonblocking caches are not straightforward to characterize. The cache misses that occur in a single-issue in-order processor are not necessarily the same as the misses that will occur in the speculative out-of-order processor. In speculative processors there are misses that are caused by speculative instructions that never complete. With non-blocking caches, misses may also occur to lines which already have outstanding misses. Both types of misses tend to inflate the cache miss rate of a speculative out-of-order processor. The second type of miss is mainly responsible for the higher L2 cache miss rates of the 6-issue processor compared to the 2-issue processor, even though the cache sizes are equal.

Figure 4 shows the IPC breakdown for one processor of the MP microarchitecture with an ideal IPC of two. In addition to the actual IPC achieved, we show the loss in IPC due to data and instruction cac...

... have significant instruction cache stall time, which is due to the large instruction working set size of these applications. Pmake also has multiple processes and significant kernel execution time, which further increases the instruction cache miss rate.
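The issue-width comparison drawn from Tables 5 and 6 can be reproduced with a short calculation. The sketch below divides the 6-issue IPC by the 2-issue IPC for each application. The dictionary and function names are illustrative, and the IPC values are the rounded ones printed in the tables, so the ratios are approximate.

```python
# IPC values copied from Table 5 (single 2-issue CPU) and Table 6 (6-issue SS).
IPC_2_ISSUE = {"compress": 0.9, "eqntott": 1.3, "m88ksim": 1.4, "MPsim": 0.8,
               "applu": 0.9, "apsi": 0.6, "swim": 0.9, "tomcatv": 0.8,
               "pmake": 1.0}
IPC_6_ISSUE = {"compress": 1.2, "eqntott": 1.8, "m88ksim": 2.3, "MPsim": 1.2,
               "applu": 1.7, "apsi": 1.2, "swim": 2.2, "tomcatv": 1.3,
               "pmake": 1.4}


def ipc_gain(program):
    """Ratio of 6-issue IPC to 2-issue IPC for one application."""
    return IPC_6_ISSUE[program] / IPC_2_ISSUE[program]


for program in IPC_2_ISSUE:
    print(f"{program:8s} {ipc_gain(program):.2f}x")
```

For tomcatv and swim the ratios come out near 1.6 and 2.4, in line with the range quoted for the floating point applications. Because the tabulated IPCs carry only one decimal place, small differences from the factors quoted in the text are to be expected.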
