Lecture 1 - CIS 450 Computer Architecture and Organization

CIS 450 Computer Architecture and Organization
Lecture 1: Introduction

Mitch Neilsen
[email protected]
219D Nichols Hall

Topics
- Syllabus
- Outline
- Course theme
- Programmer perspective vs. builder perspective
- Five great realities of computer systems
  - Some data types are approximations of the real thing
  - Assembly language is important to learn
  - Memory matters
  - Performance is more than asymptotic time complexity
  - Computers do more than just execute programs
- Fundamental concepts

Course Theme
- Abstraction is good, but don't forget reality!
- Abstractions have limitations
  - Especially in the presence of bugs
  - Need to understand underlying implementations
- Useful outcomes
  - Become more effective programmers
    - Able to find and eliminate bugs efficiently
    - Able to tune program performance
  - Prepare for later "systems" classes in CIS & EECE
    - Compilers, Operating Systems, Networks, Microcontrollers, Embedded Systems, Real-time Systems

Great Reality #1: Ints are not Integers, Floats are not Reals
- Example: Is x^2 >= 0?
  - Floats: Yes!
  - Ints:
    - 40000 * 40000 --> 1600000000
    - 50000 * 50000 --> ??
- Example: Is (x + y) + z == x + (y + z)?
  - Unsigned & signed ints: Yes!
  - Floats:
    - (1e20 + -1e20) + 3.14 --> 3.14
    - 1e20 + (-1e20 + 3.14) --> ??
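These overflow and rounding effects are easy to reproduce. The following is a minimal C sketch (the helper names mul32, sum_left, and sum_right are invented for illustration) that captures both examples. The multiplication goes through unsigned arithmetic and is cast back, because signed overflow is undefined behavior in C; the cast makes the 32-bit wraparound well defined.

```c
#include <stdint.h>

/* Multiply as 32-bit ints, wrapping on overflow.
   The detour through uint32_t avoids signed-overflow UB. */
int32_t mul32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* Sum the same three doubles in two different orders. */
double sum_left(double x, double y, double z)  { return (x + y) + z; }
double sum_right(double x, double y, double z) { return x + (y + z); }
```

With these helpers, mul32(40000, 40000) gives the expected 1600000000, but mul32(50000, 50000) wraps to a negative value; sum_left(1e20, -1e20, 3.14) is 3.14 while sum_right(1e20, -1e20, 3.14) is 0.0, because 3.14 is absorbed when added to -1e20.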
Computer Arithmetic
- Does not generate random values
  - Arithmetic operations have important mathematical properties
- Cannot assume "usual" properties
  - Due to finiteness of representations
  - Integer operations satisfy "ring" properties: commutativity, associativity, distributivity
  - Floating-point operations satisfy "ordering" properties: monotonicity, values of signs
- Observation
  - Need to understand which abstractions apply in which contexts
  - Important issues for compiler writers and serious application programmers

Great Reality #2: You've got to know assembly language
- Chances are, you'll never program in assembly
  - Compilers are much better & more patient than humans
- BUT, understanding assembly language is key to understanding the machine-level execution model
  - Behavior of programs in the presence of bugs: the high-level language model breaks down
  - Tuning program performance: understanding sources of program inefficiency
  - Implementing system software: the compiler has machine code as its target; operating systems must manage process state

Assembly Code Example: Time Stamp Counter
- Special 64-bit register in Intel-compatible machines
  - Incremented every clock cycle
  - Read with the rdtsc instruction
- Application: measure the time required by a procedure, in units of clock cycles

    double t;
    start_counter();
    P();
    t = get_counter();
    printf("P required %f clock cycles\n", t);

Code to Read Counter
- Write a small amount of assembly code using GCC's asm facility
- Inserts assembly code into the machine code generated by the compiler

    static unsigned cyc_hi = 0;
    static unsigned cyc_lo = 0;

    /* Set *hi and *lo to the high and low order bits
       of the cycle counter. */
    void access_counter(unsigned *hi, unsigned *lo)
    {
        asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
            : "=r" (*hi), "=r" (*lo)
            :
            : "%edx", "%eax");
    }

    /* Record the current value of the cycle counter. */
    void start_counter()
    {
        access_counter(&cyc_hi, &cyc_lo);
    }

    /* Number of cycles since the last call to start_counter. */
    double get_counter()
    {
        unsigned ncyc_hi, ncyc_lo;
        unsigned hi, lo, borrow;

        /* Get cycle counter */
        access_counter(&ncyc_hi, &ncyc_lo);

        /* Do double precision subtraction */
        lo = ncyc_lo - cyc_lo;
        borrow = lo > ncyc_lo;
        hi = ncyc_hi - cyc_hi - borrow;
        return (double) hi * (1 << 30) * 4 + lo;
    }

Measuring Time
- Trickier than it might look
- Many sources of variation
- Example: sum the integers from 1 to n

    n              Cycles          Cycles/n
    100            961             9.61
    1,000          8,407           8.41
    1,000          8,426           8.43
    10,000         82,861          8.29
    10,000         82,876          8.29
    1,000,000      8,419,907      8.42
    1,000,000      8,425,181      8.43
    1,000,000,000  8,371,230,591  8.37

Great Reality #3: Memory Matters
- Memory is not unbounded
  - It must be allocated and managed
  - Many applications are memory dominated
- Memory referencing bugs are especially pernicious
  - Effects are distant in both time and space
- Memory performance is not uniform
  - Cache and virtual memory effects can greatly affect program performance
  - Adapting a program to the characteristics of the memory system can lead to major speed improvements

Memory Referencing Bug Example

    int main()
    {
        long int a[2] = {0xAAAA, 0xBBBB};
        float d = 3.14;
        printf("d = %.15g\n", d);
        a[2] = 48879;  /* Out of bounds reference */
        printf("d = %.15g\n", d);
        return 0;
    }

    Output:
    d = 3.14000010490417      // correct
    d = 6.84940676377327e-41  // incorrect

Memory Referencing Errors
- C and C++ do not provide any memory protection
  - Out-of-bounds array references
  - Invalid pointer values
  - Abuses of malloc/free
- Can lead to nasty bugs
  - Whether or not a bug has any effect depends on the system and compiler
  - Action at a distance
    - The corrupted object is logically unrelated to the one being accessed
    - The effect of the bug may first be observed long after it is generated
- How can we deal with this?
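The last line of get_counter() reconstructs a 64-bit count from two 32-bit halves: (double) hi * (1 << 30) * 4 is just hi * 2^32, written so the shift stays within a 32-bit int. A small sketch of the same combination using exact 64-bit integer arithmetic (the helper name combine is invented, not from the slides):

```c
#include <stdint.h>

/* Combine the high and low 32-bit halves of a 64-bit counter.
   Equivalent to hi * 2^32 + lo. */
uint64_t combine(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

For example, combine(1, 2) is 2^32 + 2 = 4294967298. get_counter() computes the same quantity in double precision, which stays exact for counts below 2^53 cycles.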
- Program in Java, Lisp, or ML
- Understand what possible interactions may occur
- Use or develop tools to detect referencing errors

Memory Performance Example: Matrix Multiplication
- Multiple ways to nest the loops

    /* ijk */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

    /* jik */
    for (j = 0; j < n; j++) {
        for (i = 0; i < n; i++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

[Figure: Matmult performance (Alpha 21164). Cycles per inner-loop iteration vs. matrix size n for loop orders ijk, ikj, jik, jki, kij, kji; performance degrades once the matrices are too big for the L1 cache, and again when too big for the L2 cache.]

[Figure: Blocked matmult performance (Alpha 21164). Blocked versions bijk and bikj vs. unblocked ijk and ikj, for matrix sizes n from 50 to 500.]

Great Reality #4: There's more to performance than asymptotic complexity
- Constant factors matter too!
  - It is easy to see a 10:1 performance range depending on how the code is written
  - Must optimize at multiple levels: algorithm, data representations, procedures, and loops
- Must understand the system to optimize performance
  - How programs are compiled and executed
  - How to measure program performance and identify bottlenecks
  - How to improve performance without destroying code modularity and generality

Great Reality #5: Computers do more than execute programs
- They need to get data in and out
  - The I/O system is critical to program reliability and performance
- They communicate with each other over networks
  - Many system-level issues arise in the presence of a network
    - Concurrent operations by autonomous processes
    - Coping with unreliable media
    - Cross-platform compatibility
    - Complex performance issues

Course Perspective
- Most systems courses are builder-centric
  - Hardware architecture: design a pipelined processor in Verilog
  - Operating systems: implement large portions of an operating system
  - Compilers: write a compiler for a simple language
  - Networking: implement and simulate network protocols

Course Perspective (cont.)
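The blocked ("bijk", "bikj") curves above come from tiling the loops so that small tiles of each matrix stay cache-resident while they are reused. A minimal sketch of a bijk-style blocked multiply, for illustration only (the fixed size N, block size B, and a naive reference version are assumptions, not from the slides; B would be tuned to the cache in practice):

```c
#define N 8   /* matrix size, kept small for illustration */
#define B 4   /* block size; tuned to cache size in practice */

/* c += a * b, blocked so each tile of b is reused while cached.
   c must be zero-initialized by the caller. */
void matmul_blocked(const double a[N][N], const double b[N][N],
                    double c[N][N])
{
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + B; j++) {
                    double sum = c[i][j];
                    for (int k = kk; k < kk + B; k++)
                        sum += a[i][k] * b[k][j];
                    c[i][j] = sum;
                }
}

/* Reference ijk version for checking the blocked result. */
void matmul_naive(const double a[N][N], const double b[N][N],
                  double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}
```

Both versions perform the same multiply-adds; only the traversal order changes, which is exactly why the blocked curves in the plot are flatter as n grows past the cache sizes.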
- This course is programmer-centric
  - The purpose is to show how, by knowing more about the underlying system, one can be more effective as a programmer
  - Enables you to
    - Write programs that are more reliable and efficient
    - Incorporate features that require hooks into the OS (e.g., concurrency, signal handlers)
  - Not just a course for dedicated hackers
    - We bring out the hidden hacker in everyone
  - We cover material in this course that you won't see elsewhere

Fundamental Concepts
- All computer architectures have at their core the concept of the machine.
- Modern machines (based on the machines formulated by von Neumann 50 years ago) are programmable.
- Modern machines combine a logic unit with registers and memory to read and execute these programs.

Program Logic
- The logic unit (usually called the Central Processing Unit, or CPU) includes a control unit and an arithmetic logic unit, and the CPU executes very simple program logic instructions.
- These instructions are stored in the form of binary data called object code.
- Humans very rarely read object code.

[Figure: The original von Neumann machine, showing the CPU.]

Von Neumann Machine
- The "von Neumann" machine can refer to the real machine that he built, but we don't care about that...
- According to von Neumann, a computer must have the following:
  1. Addressable memory
  2. Arithmetic Logic Unit (ALU)
  3. Program Counter (PC)
- Thus, a von Neumann computer is programmable.
- Nearly all modern personal computers, microcomputers, and microcontrollers are based on the von Neumann machine model.

Von Neumann Machine: Fetch-Execute Cycle

    PC = 0;
    do {
        instruction = memory[PC++];
        decode( instruction );
        fetch( operands );
        execute;
        store( results );
    } while( instruction != halt );

Von Neumann Machines
- There are three popular sub-architectures for the von Neumann machine:
  1. Stack machine
  2. Accumulator machine
  3. Load/Store machine
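The fetch-decode-execute loop above can be made concrete as a tiny software machine. The sketch below (the three-opcode instruction set and all names are invented for illustration, not from the lecture) interprets an accumulator-style program with exactly that loop: fetch at PC, decode, execute, repeat until halt.

```c
#include <stdint.h>

enum { HALT = 0, LOAD = 1, ADD = 2 };   /* tiny invented opcode set */

/* Each instruction is an opcode plus an immediate operand. */
struct insn { int op; int32_t arg; };

/* The von Neumann loop: fetch at PC, decode, execute, repeat. */
int32_t run(const struct insn memory[])
{
    int32_t acc = 0;   /* the single accumulator */
    int pc = 0;
    for (;;) {
        struct insn instruction = memory[pc++];  /* fetch */
        switch (instruction.op) {                /* decode + execute */
        case LOAD: acc = instruction.arg; break;
        case ADD:  acc += instruction.arg; break;
        case HALT: return acc;                   /* result */
        }
    }
}
```

Running it on { LOAD 2, ADD 3, HALT } returns 5; a single accumulator holding all intermediate results is exactly the accumulator sub-architecture listed above.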
Stack Machine
- A stack machine uses few, or no, registers
- All operations occur in main memory using the stack
- Instructions are often very compact
- Popular for software machines
- Example machines: Java Virtual Machine, HP RPN calculators, Microsoft .NET VES

Accumulator Machine
- Operations are performed on a single accumulator or a very small set of accumulators
- Accumulators are general-purpose registers
- A "pure" accumulator machine may have only a single accumulator
  - Pure accumulator machines aren't very practical
- Popular for simplicity of design
- Example machine: Intel Pentium

[Figure: The Pentium 4 chip. The photograph is copyrighted by the Intel Corporation, 2003 and is used by permission.]

Load/Store Machine
- Sometimes called General Purpose Register (GPR) machines
- Use many general-purpose registers, called a register file
- (Relatively) large amounts of data can be loaded directly into the CPU
- Example machines: SPARC, PPC, Parrot

Object Code

    $ hexdump /bin/ls | head
    0000000 457f 464c 0101 0001 0000 0000 0000 0000
    ...
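The first bytes of the dump above are the ELF magic number: 0x7f followed by the letters "ELF" (hexdump on a little-endian machine shows them as the 16-bit words 457f 464c). A small sketch that checks for those bytes in a buffer (the helper name is_elf is invented for illustration):

```c
/* Return 1 if the buffer starts with the ELF magic \x7fELF. */
int is_elf(const unsigned char *buf, unsigned long len)
{
    return len >= 4 &&
           buf[0] == 0x7f && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}
```

This is the kind of recognition a loader does before trusting a file as object code; a plain text file fails the check immediately.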
Assembly Code
- Instead of reading object code, programmers have created a textual representation of object code called assembly code.
- In "true" assembly code, each instruction corresponds exactly to one short segment of object code.
- The translation in "true" assembly should be a one-to-one mapping in both directions.
- Assembly tends to take the following form:

    LABEL: instruction REGISTER, ARG1, ARG2

- The actual format varies by platform and notation.
  - Intel notation puts the destination register first.
  - Unix notation puts the source register first.
  - Java assembly usually has only a single argument (ARG#) per instruction.

Parrot Example (add.pasm)
- The first three lines are similar for both hardware CPUs and software virtual machines (VMs)

    set I0, 2
    set I1, 3
    add I2, I0, I1
    print "2 + 3 = "
    print I2
    print "\n"
    end

Macro-Assembly Language
- Even plain assembly code tends to be a bit unwieldy.
- Programmers have added features to assembly to make it easier to read and write, called macro-assembly.
- The program which translates assembly to object code is called an assembler.
- A program which translates a high-level language (HLL) to assembly/macro-assembly is called a compiler.

Contemporary Multilevel Machines

[Figure: A typical six-level computer architecture. The support method for each level is indicated below it.]

Getting Started: File Editors
- Nano
- Emacs
- Vim (coolest)
- Something else

Basic Unix Commands

    cp x y
    mv x y
    rm x
    mkdir x
    rmdir x
    ln -s x y
    chown anne x
    chgrp annes_friends x
    chmod 755 x

Search Commands

    $ find projects -name "*.pasm"
    projects/add.pasm
    projects/hello.pasm

    $ grep set *
    add.pasm:set I0, 2
    add.pasm:set I1, 3

    $ grep -r foo .

Compilation Commands

    gcc foo.c
    gcc foo.s
    g++ foo.cpp
    f77 oldfoo.f
    java barfoo.java
    perl perlfoo.pl
    python almost-cool.py
    ruby current-hype.rb
    parrot hwkfoo.pasm

Summary
- Outline: overview of computer architecture and organization.
- To do: read Chapter 1, obtain a CIS account, and check out the resources available online via K-State Online at http://online.ksu.edu.