Slides - CSE 360: Introduction to Computer Systems Course...


CSE 360: Introduction to Computer Systems
Course Notes, Wi'10
Bettina Bair (bbair@cse.osu.edu)
http://carmen.osu.edu
Copyright 1998-2009 by Bettina Bair, Jim Dinan, Wayne Heym, Rick Parent, Todd Whittaker, Pete Ware
CSE360 1

Section Details
  Class meets:  MTWF 9:30, DL266
  Instructor:   Bettina Bair (bbair@cse.osu.edu)
  Homepage:     www.cse.ohio-state.edu/~bbair
  Office:       Dreese Labs 493
  Hours:        MW 11:30, TF 1:30
  Phone:        292-2565

Topics of Discussion
  Course description
  Required texts
  Policies
  Syllabus
  Expectations

Description:
  Introduction to computer architecture
  Design of digital computer systems
  Hardware control / microprogramming
  Hardware-software interface
  Low-level programming using assembly language
Prerequisites: CSE 214 or 222 or H222

Text:
  1. Computer Systems: Architecture, Organization, and Programming, Arthur B. Maccabe, Irwin, 1993.
  2. SPARC Architecture, Assembly Language Programming & C, Richard Paul, Prentice Hall (a good reference, if you are interested).
  3. Class handouts.
  4. Material online at http://carmen.osu.edu

Grading Policy:
  An assigned grader will grade all homeworks and labs; your lecturer will grade all exams.
  Missed assignments or tests without prior approval will receive a grade of zero.
  Reasonable excuses must be given in writing to me one week prior to the due date or test date, at which time the circumstances will be evaluated and approval granted or rejected.
  No late homeworks or labs will be accepted.
  Exams are closed book, closed notes, and cover all of the material up to that point.

Grading Weights:
  Homeworks (6)   4% each   as assigned
  Labs (3)        4% each   as assigned
  Midterm         30%       around the 6th week
  Final           34%       as indicated in master schedule
  Grading scale: to be determined
  You must pass the final to pass the course.

Can I work on assignments from home?
  Submission via Carmen "dropbox"
    HW: MS Word, PDF, or text format
    Labs: submitted as a text-formatted source (*.s) file
  Labs require access to the ISEM application
    Available through your CSE account: stdsun.cse.ohio-state.edu
    SSH, telnet, and file transfer (ftp) protocols are useful
    Read more about remote access on Carmen
  ISEM may also be available online. Where? How? I don't know.

Students with Disabilities
  If you need an accommodation based on the impact of a disability, please contact me to arrange an appointment as soon as possible.
  The Office for Disability Services verifies the need for accommodations and helps develop accommodation strategies.
  If you have not previously contacted the Office for Disability Services, I encourage you to do so.

Academic Misconduct
  Academic misconduct is defined as any activity which tends to compromise the academic integrity of the institution, or subvert the educational process.
  University policy requires that all cases of suspected academic misconduct be submitted to the Committee for Academic Misconduct for a hearing and evaluation.
  Any academic misconduct will be dealt with via the appropriate University authorities.

Academic Misconduct
  Homework and lab assignments may be completed with a partner.
    Put both names on submitted assignments.
  Exams are to be your own work.

Expectations
  Read your e-mail.
  Read and reply to the class discussion group on Carmen.
  Attend class (it's correlated to results!).
    A 4-credit class costs $887.
    $887 / 40 classes = $22.18 per class.
  Complete homeworks and labs on time.
  Read the assigned pages from the text.

Can I change my section?
  Not until Brutus updates at the end of the first week, and only if there are seats available.
  Priority will be given to:
    CSE majors who are graduating seniors
    CSE majors
    People who attend class the first week

Who do I approach if I have a problem with grading?
  For labs and homework, contact your grader first.
    See me if not resolved.
  For exams, contact me.

The Carmen Discussion Group
  carmen.osu.edu
  It's a place for students to discuss issues related to course work.
  Post any questions you might have.
  Use discretion when making a posting.
  Look out for important announcements.
  Instructors/graders answer questions whenever they can.

Course Objectives
  Principles of computer organization and architecture
  Basic machine representation of signed integers, character strings, arrays, stacks, records, linked lists
  Fundamentals of computer instruction set architectures
  Low-level algorithms for data manipulation, conversion, and parameter passing
  Assembly language programming

150+ Years of Amazing Computers
  Sherman, set the WABAC Machine to the year 1822...

Babbage's Difference Engine, 1822
  Babbage's Difference Engine No. 2, finally built in 1991
  Could hold 7 numbers of 31 decimal digits
  Could tabulate 7th-degree polynomials

Ada Lovelace (1815-1852), the first programmer
  Mathematician, patron
  Wrote a program for Babbage's (theoretical) Analytical Engine to calculate the Bernoulli sequence, in 1843
  In 1979, a contemporary programming language was named Ada in her honour.

1890: Hollerith Tabulating System
  Census counter, operated on punch cards
  Was a system of machines:
    Card puncher
    Tabulator
    Sorting box
  Hollerith's business joined a firm that later became IBM.

1804: Jacquard Loom
  Inventor: Joseph Marie Jacquard
  Programmable textile machine
  Pioneered the use of paper punch cards

1943-45: ENIAC
  Electronic Numerical Integrator And Computer
  Built to compute ballistics tables for U.S. Army artillery during World War II
  1,000 times faster than any existing device
  External plug wires used to program the machine
  Principal designers: J. Presper Eckert and John Mauchly
  Cost: about $400,000

Vacuum Tubes
  ENIAC used some 18,000 vacuum tubes.
  30 feet by 50 feet
  Weighed 30 tons
  The ENIAC was a decimal machine!

Programming the ENIAC

Original ENIAC Programmers

The Bug
  Mark II computer at Harvard
  In 1947, engineers found a moth stuck in a relay.
    Relay: electromechanical switch
  Taped it in their logbook
  Labeled it "first actual case of bug being found."

Grace Hopper (1906-1992)
  1953: Created the first compiler
    A-0 programming language
    Translates English-language instructions into the language of the target computer
  "Lazy," and hoped that "the programmer may return to being a mathematician."
  Led to the development of the business language COBOL.
  Retired from the U.S. Navy as a Rear Admiral.

IAS (1946-1952)
  Institute for Advanced Study at Princeton University
  Designed and directed by John von Neumann
  Binary, 40-bit word
  Externally stored programs
  Programs and data stored in the same memory: stored program or von Neumann architecture
  Cost: several hundred thousand dollars

1949: Core Memory
  A small ring, or core, of ferrite (a ferromagnetic ceramic) can be magnetized in either of two opposite directions.
  A core can be used for storing one bit of information.
  For almost 15 years, "core" was the most important memory device.
  The invention of core memory was a leap forward in cost-effectiveness and reliability.

1950s Assembly Programming Class
  Requirements first, then code...

1965: PDP-8
  Programmed Data Processor
  50,000+ sold
  Cost: $18,000
  Speed: 1.5 microsecond cycle time
  Primary memory: 4K 12-bit word core memory
  Power: 780 watts
  What does cycle time mean?

1960s/70s: IBM S/360

1977: TRS-80
  Radio Shack "Trash-80," 4K of memory
  Could not handle lowercase letters
  Only three error messages:
    "HOW?" whenever the user tried to perform an illegal function
    "What" when a syntax error occurred
    "Sorry" when the available memory ran out
  Cost only $400!
  Some 55,000 machines sold in the first year

1979: VIC-20
  Processor speed: 1.0227 MHz
  ROM: 16 KB
  RAM: 5 KB (3.5 KB user memory), expandable to 32 KB
  Screen: 22 columns by 23 rows
  Character dot matrix: 8 by 8 or 8 by 16 (user programmable)
  Screen dot matrix: 176 by 184 with up to 16 colors
  Sound: 3 voices plus white noise
  Media: tape drive
  Bettina's first PC!

1984: Macintosh
  Revolutionary graphical user interface (GUI)
  A device called a mouse
  Pictorial symbols (icons) on the screen
  Select commands, call up files, start programs, etc.
  Original selling price: $2,495

What if you had to build your own computer from scratch?
  What would it need to do?
  How would you store information?

Course Objectives
  Understanding computer architecture
    Structure of computer systems
    How are instructions executed?
  Information representation
    Modern machines are built with digital circuits.
    Everything is represented as bits: 1's and 0's.
    Electric current flowing or not flowing, magnetized in one direction or the other
    How can we represent data using only bits?

Course Objectives
  Low-level
    Binary encoding of computer programs
    Binary encoding of data
  "High"-level
    Assembly language programming
    Expressing programs in assembly
    SPARC, Motorola HC11, etc.

Homework #0-0
  Log into Carmen. See if you can find the following:
    Contact information for your instructor
    Course policy on late assignments
    Course notes (slides)
    Reading assignment for the second class meeting
    Dropbox and deadline for first homework
    Story of Mel, A Real Programmer, in the discussion group

Homework #0-1
  Purchase the textbook written by Maccabe.
  Read the assigned material for the week.
  Pledge to do the reading assignment before each class meeting.

Homework #0-10
  Log in to your CS Unix account on stdsun.cse.ohio-state.edu.
  Your default password is the last four digits of your social security number followed by your first and last initials.
  For example, Luke Skywalker, whose social security number is 123-45-6789, has a password of 6789ls.
  In a CSE laboratory room, you will have to log in to the Windows PC first. Your initial password there is the same as for UNIX except that it has an additional exclamation mark ('!') at the end.
  Luke Skywalker's initial Windows password is 6789ls!

Make a Table on an Index Card
  Show different representations of numeric values.
  Column headings should be: Decimal, Octal, Hexadecimal, Binary

  One row for each numeric value.
  Show, in increasing order, representations for
    0, 1, 2, 3, 4, ... 20
    Then $2^5, 2^6, \ldots, 2^{16}$
    Finally $2^{20}, 2^{30}, 2^{31}, 2^{32}$

  For example:
    Decimal  Octal  Hex  Binary   Note   Roman   Nat'l Lang
    0        0      0    0                       zero
    1        1      1    1        2^0    I       one
    2        2      2    10       2^1    II      two
    And so on.
    20       24     14   10100           XX      twenty
    32       40     20   100000   2^5    XXXII   ..
    And so on.
    ..                            2^16
    ..                            2^20
    ..                            2^30
    ..                            2^31
    ..                            2^32

Information Representation: Numeral Systems
  http://cowbirdsinlove.com/43

Information Representation 1
  Positional number systems: the position of a character in the string indicates a power of the base (radix).
  Common bases: 2, 8, 10, 16. (What base are we using to express the names of these bases?)
  Base ten (decimal): digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 form the alphabet of the decimal system.
    E.g., $316_{10} = 3 \cdot 10^2 + 1 \cdot 10^1 + 6 \cdot 10^0$
  Base eight (octal): digits 0, 1, 2, 3, 4, 5, 6, 7 form the alphabet.
    E.g., $474_8 = 4 \cdot 8^2 + 7 \cdot 8^1 + 4 \cdot 8^0$

Information Representation 2
  Base 16 (hexadecimal): digits 0-9 and A-F.
    E.g., $13C_{16} = 1 \cdot 16^2 + 3 \cdot 16^1 + 12 \cdot 16^0$
  Base 2 (binary): digits (called "bits") 0, 1 form the alphabet.
    E.g., $100111100_2 = 1 \cdot 2^8 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2$
  In general, radix r representations use the first r chars in {0...9, A...Z} and have the form $d_{n-1} d_{n-2} \ldots d_1 d_0$.
  Summing $d_{n-1} r^{n-1} + d_{n-2} r^{n-2} + \cdots + d_0 r^0$ will convert to base 10. Why to base 10?
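
The positional sum described above can be tried out in a few lines of Python. This is only an illustrative sketch and not part of the course notes: the helper name `to_decimal` and the `DIGITS` alphabet string are assumptions introduced here.

```python
# Hypothetical helper (not from the course notes): evaluate a radix-r numeral
# by summing d_i * r^i over its digits, as in the positional formula above.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # the first r chars form the alphabet


def to_decimal(numeral: str, radix: int) -> int:
    """Return the integer value of `numeral` read in base `radix`."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 36")
    values = []
    for ch in numeral.upper():
        d = DIGITS.find(ch)
        if d < 0 or d >= radix:
            raise ValueError(f"{ch!r} is not a digit in base {radix}")
        values.append(d)
    n = len(values)
    # The digit at index i (counting from the left) has weight radix^(n-1-i).
    return sum(d * radix ** (n - 1 - i) for i, d in enumerate(values))


if __name__ == "__main__":
    # The same value written in the four common bases.
    for text, base in [("316", 10), ("474", 8), ("13C", 16), ("100111100", 2)]:
        print(text, "in base", base, "=", to_decimal(text, base))
```

The result comes out "in base 10" only because Python prints integers in decimal by default; the sum itself is independent of how the answer is displayed.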

Information Representation 3
  Base conversions
  Convert to base 10 by multiplication of powers.
    E.g., $10012_5 = (\quad)_{10}$
  Convert from base 10 by repeated division.
    E.g., $632_{10} = (\quad)_8$
  Converting base x to base y: convert base x to base 10, then convert base 10 to base y.

Information Representation 4
  Special case: converting among binary, octal, and hexadecimal is easier.
  Go through the binary representation, grouping in sets of 3 or 4.
    E.g., $11011001_2 = 11\ 011\ 001_2 = 331_8$
          $11011001_2 = 1101\ 1001_2 = D9_{16}$
    E.g., $C3B_{16} = (\quad)_8$

Information Representation 5
  What is special about binary?
    The basic component of a computer system is a transistor (transfer resistor): a two-state device which switches between logical "1" and "0" (actually represented as voltages on the range 5V to 0V).
    Octal and hexadecimal are bases in powers of 2, and are used as a shorthand way of writing binary. A hexadecimal digit represents 4 bits, half of a byte. 1 byte = 8 bits. A bit is a binary digit.
  Get comfortable converting among decimal, binary, octal, hexadecimal.
  Converting from decimal to hexadecimal (or binary) is easier going through octal.

Information Representation 6
  Binary  Hex  Decimal      Binary  Hex  Decimal
  0000    0    0            1000    8    8
  0001    1    1            1001    9    9
  0010    2    2            1010    A    10
  0011    3    3            1011    B    11
  0100    4    4            1100    C    12
  0101    5    5            1101    D    13
  0110    6    6            1110    E    14
  0111    7    7            1111    F    15

Information Representation 7
  Ranges of values
  Q: Given k positions in base n, how many values can you represent?
  A: $n^k$ values over the range $(0 \ldots n^k - 1)_{10}$
    n=10, k=3: $10^3 = 1000$; range is $(0 \ldots 999)_{10}$
    n=2, k=8: $2^8 = 256$; range is $(0 \ldots 255)_{10}$
    n=16, k=4: $16^4 = 65536$; range is $(0 \ldots 65535)_{10}$
  Q: How are negative numbers represented?

Information Representation 8
  Integer representation:
  Value and representation are distinct. E.g., 12 may be represented as XII, $C_{16}$, $12_{10}$, and $1100_2$.
  Note: -12 may be represented as $-C_{16}$, $-12_{10}$, and $-1100_2$.
  Simple and efficient use of hardware implies using a specific number of bits, e.g., a 32-bit string, in a binary encoding. Such an encoding is "fixed width."

Information Representation 8
  Integer representation: four (fixed-width) methods
    simple binary
    signed magnitude
    binary coded decimal
    2's complement
  Simple binary: as seen before, all numbers are assumed to be positive, e.g., 8-bit representation of
    $66_{10} = 0100\ 0010_2$ and $194_{10} = 1100\ 0010_2$

Information Representation 9
  Signed magnitude: simple binary with leading sign bit. 0 = positive, 1 = negative.
  E.g., 8-bit signed magnitude:
    $+66_{10} = 0100\ 0010_2$
    $-66_{10} = 1100\ 0010_2$
  What ranges of numbers may be expressed in 8 bits?
    Largest:
    Smallest:

Information Representation 10
  Problems:
    (1) Compare the signed magnitude numbers 1000 0000 and 0000 0000.
    (2) Must have "subtraction" hardware in addition to "addition" hardware.
  Extend 1100 0010 to 12 bits.

Information Representation 10
  Binary Coded Decimal (BCD): use a 4-bit pattern to express each digit of a base 10 number.
    0000 = 0    0101 = 5    1100 = +
    0001 = 1    0110 = 6    1101 = -
    0010 = 2    0111 = 7
    0011 = 3    1000 = 8
    0100 = 4    1001 = 9
  E.g.,
     123 : 0000 0001 0010 0011
    +123 : 1100 0001 0010 0011
    -123 : 1101 0001 0010 0011

Information Representation 11
  BCD disadvantages:
    Takes more memory. 32-bit simple binary can represent more than 4 billion discrete values. 32-bit BCD can hold a sign and 7 digits (or 8 digits for unsigned values) for a maximum of 110 million values, a 97% reduction.
    More difficult to do arithmetic. Essentially, we must force the base 2 computer to do base 10 arithmetic.
  BCD advantages:
    Used in business machines and languages, e.g., in COBOL for precise decimal math.
    Can have arrays of BCD numbers for essentially arbitrary-precision arithmetic.

Information Representation 12
  Two's Complement
  Used by most machines and languages to represent integers.
  Fixes the -0 in signed magnitude, and simplifies machine hardware arithmetic.
  Divides bit patterns into a positive half and a negative half (with zero considered positive); n bits create a range of $[-2^{n-1} \ldots 2^{n-1} - 1]$.

  Code   Simple   Signed   2's comp
  0000   0        +0       0
  0001   1        1        1
  0010   2        2        2
  0011   3        3        3
  0100   4        4        4
  0101   5        5        5
  0110   6        6        6
  0111   7        7        7
  1000   8        -0       -8
  1001   9        -1       -7
  1010   10       -2       -6
  1011   11       -3       -5
  1100   12       -4       -4
  1101   13       -5       -3
  1110   14       -6       -2
  1111   15       -7       -1

Information Representation 13
  Representation in 2's complement; i.e., represent i in n-bit 2's complement, where $-2^{n-1} \le i \le +2^{n-1} - 1$
    Positive numbers: same as simple binary
    Negative numbers:
      Obtain the n-bit simple binary equivalent of |i|.
      Obtain its negation as follows:
        Invert the bits of that representation.
        Add 1 to the result.
  Ex.: convert $-320_{10}$ to 16-bit 2's complement.
  Ex.: extend the 12-bit 2's complement number 1101 0111 1000 to 16 bits.

Information Representation 14
  Binary arithmetic
    Addition and subtraction only for now.
    Rules: similar to standard addition and subtraction, but only working with 0 and 1.
      0 + 0 = 0      0 - 0 = 0
      1 + 0 = 1      1 - 0 = 1
      0 + 1 = 1      1 - 1 = 0
      1 + 1 = 10     10 - 1 = 1
    Must be aware of possible overflow.
  Ex.: 8-bit signed magnitude 00 11 11 0 0 + 0 1 10 00 1 = 1
  Ex.: 8-bit signed magnitude 00 11 11 0 0 0 1 10 00 1 = 1

Information Representation 15
  2's complement binary arithmetic
    Addition and subtraction are the same operation.
    Still must be aware of overflow.
      Opposite signs on operands can't overflow.
      If the operand signs are the same but the result's sign is different, there must have been overflow.

Do CSE 360 students dream of electric sheep?
  http://xkcd.com/571/

Information Representation 17
  Characters and strings
  EBCDIC, Extended Binary Coded Decimal Interchange Code
    Used by IBM in mainframes (360 architecture and descendants).
    Earliest system
  ASCII, American Standard Code for Information Interchange.
    Most common system, fixed 7-bit representation
  Unicode, http://www.unicode.org
    New international standard
    Variable-length encoding scheme with either 8- or 16-bit minimum
    I♥NY is encoded in UTF-8 like this: 49 E2 99 A5 4E 59
      E2: lead unit
      99 A5: trail units
    "a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."

Information Representation 18
  ASCII table: see table 1.7 on pg. 18. In Unix, run "man ascii".
  7-bit code:
    Printable characters, for human interaction
    Control characters, for non-human communication (computer-computer, computer-peripheral, etc.)
  8-bit codes: the most significant bit may be set
    Contain the standard ASCII codes
    IBM Extended ASCII: includes graphical symbols and lines
    ISO/IEC 8859: several versions for Latin, Cyrillic, Arabic, and Greek alphabets

ASCII
  Upper- and lower-case characters are 0x20 ($32_{10}$) apart.
  The ASCII representation of '3' is not the same as the binary representation of 3.
  To convert ASCII to binary (an integer): '3' - '0' = 3
  Line feed (LF) character: $000\ 1010_2$ = 0x0a = $10_{10}$; '\n' = 0xa

  Character   ASCII Binary   ASCII Hex
  ' '         010 0000       0x20
  'A'         100 0001       0x41
  'a'         110 0001       0x61
  'R'         101 0010       0x52
  'r'         111 0010       0x72
  '0'         011 0000       0x30
  '3'         011 0011       0x33

Information Representation 19
  Decode: 1000001, 1010011, 1000011, 1001001, 1001001, 0100000, 1101001, 1110011, 0100000, 1100101, 1100001, 1110011, 1111001, 0000000
  Or (in hex): 41 53 43 49 49 20 69 73 20 65 61 73 79 00
  How many bytes is this?
  What's the use of the '00'?

ASCII
  Easy to decode
  But takes up a fixed amount of space even if we don't need all the characters
  String definition is programming-language dependent.
  C, C++: strings are arrays of characters terminated by a null byte.
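
The decode exercise and the null-terminator convention above can be checked with a short Python sketch. It is illustrative only: the function names `decode_c_string` and `digit_value` are assumptions introduced here, not anything defined in the course notes.

```python
# Illustrative sketch; the function names are hypothetical, not from the notes.

def decode_c_string(hex_bytes: str) -> str:
    """Decode space-separated hex bytes as a null-terminated ASCII string."""
    data = bytes.fromhex(hex_bytes)
    end = data.find(0)  # index of the first null byte, or -1 if there is none
    if end == -1:
        end = len(data)
    return data[:end].decode("ascii")


def digit_value(ch: str) -> int:
    """Convert an ASCII digit character to the integer it names ('3' - '0' = 3)."""
    return ord(ch) - ord("0")


if __name__ == "__main__":
    print(decode_c_string("41 53 43 49 49 20 69 73 20 65 61 73 79 00"))
    print(digit_value("3"))
    print(hex(ord("a") - ord("A")))  # distance between lower and upper case
```

The null byte is not part of the text: it only marks where the string ends, which is why the decoder stops at the first `00` it finds.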

Binary Heart
  http://www.xkcd.com/99

Information Representation 20
  Simple data compression
    ASCII codes are fixed length.
    Huffman codes are variable length and based on statistics of the data to be transmitted.
      Assign the shortest encoding to the most common character. In English, the letter 'e' is the most common.
      Either establish a Huffman code for an entire class of messages,
      or create a new Huffman code for each message, sending/storing both the coding scheme and the message.
  "a widely used and very effective technique for compressing data; savings of 20% to 90% are typical, depending on the characteristics of the file being compressed." (Cormen, p. 337)

Letter Frequencies (courtesy Wikipedia)
  [Figure: letter-frequency charts for English and Spanish]

ECL: Expected Code Length
  Char   Fixed len encoding   Freq   Var len encoding   # bits   Expected # bits
         00                   .5     1                  1        .5
         01                   .25    01                 2        .5
         10                   .15    001                3        .45
         11                   .10    000                3        .3
  Avg len  2                                                     1.75

Information Representation 21
  Huffman tree for "a man a plan a canal panama"
  Determine frequencies of letters (example ignores spaces):
    Letter   Count   Frequency
    'a'      10      0.476190
    'c'      1       0.047619
    'l'      2       0.095238
    'm'      2       0.095238
    'n'      4       0.190476
    'p'      2       0.095238
  Create a forest of single-node trees.
  Choose the two trees having the smallest total frequencies (the two "smallest" trees).
  Merge them together (lesser frequency as the left subtree).
  Continue merging until only one tree remains.

Information Representation 22
  Reading a '1' calls for following the left branch. Reading a '0' calls for following the right branch.
  Decoding using the tree: to decode '0001', start at the root and follow r_child, r_child, r_child, l_child, revealing encoded 'm'.
  Huffman tree for "a man a plan a canal panama" (left child listed first):
    1.0
      'a' .4762
      .5238
        'n' .1905
        .3333
          .1428
            'c' .0476
            'l' .0952
          .1905
            'm' .0952
            'p' .0952

Information Representation 23
  Comparison of Huffman and 3-bit code example
  3-bit: 000 011000100 000 101010000100 000 001000100000010 101000100000011000 = 63 bits
  Huffman: 1 0001101 1 00000010101 1 001110110010 0000101100011 = 46 bits
  Savings of 17 bits, or 27% of the original message

  Letter   3-bit code   Huffman code   Count   H length   3 length
  'a'      000          1              10      10         30
  'c'      001          0011           1       4          3
  'l'      010          0010           2       8          6
  'm'      011          0001           2       8          6
  'n'      100          01             4       8          12
  'p'      101          0000           2       8          6
  Totals                                       46         63

Tree for: ABE DEFACED A FADED BED
  Letter   A      B      C      D      E      F
  freq     4/19   2/19   1/19   5/19   5/19   2/19
  Tree (left child listed first):
    19/19
      9/19
        A
        5/19
          F
          3/19
            C
            B
      10/19
        D
        E

ECL for: ABE DEFACED A FADED BED
  Letter   freq   code   ecl
  A        4/19   11     8/19
  B        2/19   1000   8/19
  C        1/19   1001   4/19
  D        5/19   01     10/19
  E        5/19   00     10/19
  F        2/19   101    6/19
  ecl = 2.42
  Use the same encodings to decode 11 10000011010001 11100100 1001111000

CSE 360 Sudoku
  http://xkcd.com/74/

Parity: Simple error detection
  Data transmission, aging media, static interference, dust on media, etc. demand the ability to detect errors.
  Ex.: send ASCII 'S': send 101 0011, but receive 101 0010 ('R')?
  Single-bit errors are detected by using parity checking.
  Parity, here, is "the state of being odd or even."

Information Representation 24
  How to detect a 1-bit error: add a 1-bit parity to make an odd or even number of 1 bits per byte.
    Char   ASCII      Even parity   Odd parity
    'S'    101 0011   0101 0011     1101 0011
    'E'    100 0101   1100 0101     0100 0101
  The parity bit is stripped by hardware after checking.
  Sender and receiver both agree to odd or even parity.
  2 flipped bits in the same encoding are not detected.
  What if the parity bit is flipped?

Information Representation 25
  Two meanings for Hamming distance.
  1. Specific. A count of the number of bits different in two encodings.
     E.g., dist(1100, 1001) =        dist(0101, 1101) =
  2. General. The minimum over all distinct pairs in an entire code.
     The ASCII encoding scheme has a Hamming distance of 1.
     A simple parity encoding scheme has a Hamming distance of 2.
  Hamming distance serves as a measure of the robustness of error checking (as a measure of the redundancy of the encoding).

Basic Components 1
  Terminology from Ch. 2:
    Flip flop: basic storage device that holds 1 bit.
    D flip flop: special flip flop that outputs the last value that was input to it (a data signal).
    Clock: two different meanings: (1) a control signal that oscillates (low to high voltage) every x nanoseconds; (2) the "write select" line for a flip flop.
  [Figure: D flip flop with Data In, Clock, and Data Out; clock waveform with one cycle marked]

Basic Components 2
  Register: collection of flip flops with parallel load. Clock (or "write select") signal controlled. Stores instructions, addresses, operands, etc.
  Bus: collection of related data lines (wires).
  [Figure: 8-bit register with an 8-line Input Bus (d7 d6 d5 d4 d3 d2 d1 d0), Clock, and an 8-line Output Bus]

Basic Components 3
  Combinational circuits: implement Boolean functions. No feedback in the circuit; output is strictly a function of input.
  Gates: and, or, not, xor
  [Figure: gate symbols for AND, OR, NOT, XOR; e.g., a circuit with inputs x, y, z and output f = xy + z]

Basic Components 4
  Gates can be used in combination to implement a simple (half) adder.
    Addition creates a value, plus a carry-out.
    Z = X xor Y
    CO = X and Y

    X   Y   Z   CO
    0   0   0   0
    0   1   1   0
    1   0   1   0
    1   1   0   1
  [Figure: half-adder circuit with inputs X, Y and outputs Z, CO]

Basic Components 5
  Sequential circuits: introduce feedback into the circuit. Outputs are functions of input and current state.
  [Figure: D flip flop with inputs D, C and output Q]
  Multiplexers: combinational circuits that use n bits to select an output from $2^n$ input lines.
  [Figure: 4-to-1 MUX with inputs i0, i1, i2, i3, select lines s0, s1, and output f]

Basic Components 6
  Von Neumann architecture
    Can access either instructions or data from memory in each cycle.
    One path to memory (von Neumann bottleneck).
    Stored program system. No distinction between programs and data.
  [Figure: von Neumann machine with Main Memory System, Data and Instruction Pathway, Address Pathway, Operational Registers, Arithmetic and Logic Unit, Program Counter, Control Unit, and Input/Output System; callouts: "Flip flops!" (registers), "Combinational Circuits!", "Sequential Circuits!", "Bus!" (pathways)]

Basic Components 7
  Examples of von Neumann architecture to be explored in this course:
    SAM: tiny, good for learning architecture
    MIPS: text's example assembly language
    SPARC: labs
    M68HC11: used in ECE 567 (taken by CSE majors)
  Roughly, the order of presentation in this course is as follows:
    A couple of days on the Main Memory System
    Weeks on the Central Processing Unit (CPU)
    Finish the course with the I/O System

What's a kilobyte?
  http://xkcd.com/394/

Memory Subsystem: the busses
  [Figure: memory drawn as a column of elements at addresses 000, 001, 010, 011, 100, 101, ...; a k-bit Address Bus selects an element and an n-bit Data Bus carries its contents; memory is n-bit addressable]
  The number of elements depends on the size of the address bus.
    If k=3, how many addresses?
    If k=4, how many addresses?
  # Addresses = $2^k$

Memory Subsystem: the busses
  Capacity depends on how many bits are in each element, or the size of the data bus.
    If n=1 and k=3, how many bits? If n=2?
    If n=8 and k=3, how many bytes?
  Bit capacity = $2^k \times n$

Memory Element & Address Sizes
  If 3 bits are used to represent memory addresses, then the memory can have at most $2^3 = 8$ distinct addresses.
  If a machine's memory is 5-bit addressable, then, at each distinct address, 5 bits are stored. The contents at each address are represented by 5 bits.

    Address (decimal)   Address (binary)   Contents
    0                   000                00011
    1                   001                01111
    2                   010                01110
    3                   011                10100
    4                   100                00101
    5                   101                01110
    6                   110                10100
    7                   111                10011

  Such a memory can store at most 8 x 5 = 40 bits of data.
  If the data bus is 10 bits wide, then up to 10 bits at a time can be transferred between memory and processor. This is a 10-bit word.
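
The two capacity formulas above (# addresses = 2^k and bit capacity = 2^k x n) can be checked numerically. The sketch below is illustrative only; the function names `address_count` and `bit_capacity` are assumptions introduced here, not terms from the slides.

```python
# Illustrative sketch; function names are hypothetical, not from the notes.

def address_count(k: int) -> int:
    """Number of distinct addresses reachable with a k-bit address bus."""
    if k < 0:
        raise ValueError("address width must be non-negative")
    return 2 ** k


def bit_capacity(k: int, n: int) -> int:
    """Total bits stored when each of the 2^k addresses holds an n-bit element."""
    if n < 0:
        raise ValueError("element width must be non-negative")
    return address_count(k) * n


if __name__ == "__main__":
    print(address_count(3), address_count(4))  # k = 3 and k = 4
    print(bit_capacity(3, 5))                  # the 5-bit addressable example
    print(bit_capacity(3, 8) // 8, "bytes")    # n = 8, k = 3
```

Note that the data-bus width does not appear in either formula: it limits how much can be transferred at once, not how much the memory holds.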

Memory Subsystem: Addressability
  [Figure: memory elements at addresses 000 ... 101 with a k-bit Address Bus and an n-bit Data Bus; memory is n-bit addressable]
  Addressability is the size of the memory element.
  The size of the element may be smaller than the size of the data bus.
    If n=8, only 1-byte addressable
    If n=16, 1- or 2-byte addressable
  How does addressability affect capacity?

Memory Subsystem: Addressing
  Memory may be organized into banks, with bit labels.
  The GLOBAL address of each addressable element would be: [relative address] & [bank address]

    Relative address   Bank 0   Bank 1
    000                000 0    000 1
    001                001 0    001 1
    010                010 0    010 1
    011                011 0    011 1
    100                100 0    100 1
    101                101 0    101 1

  See the pattern that forms?

Memory Subsystem: Alignment
  The data bus is 4x the size of the addressable element (32-bit data bus, 8-bit elements).
  So, you may read (or write) one or more bytes at a time...
  But only from/to the same row of memory!

    Relative address   Bank 00   Bank 01   Bank 10   Bank 11
    000                000 00    000 01    000 10    000 11
    001                001 00    001 01    001 10    001 11
    010                010 00    010 01    010 10    010 11
    011                011 00    011 01    011 10    011 11
    100                100 00    100 01    100 10    100 11
    101                101 00    101 01    101 10    101 11

  Okay to read/write 2 bytes from 10010? 2B from 01011? 4B from 01100? 4B from 00101?

Memory Subsystem: Alignment
  Where are operands of various sizes positioned? (Same four-bank layout as above: 32-bit data bus, 8-bit elements.)
    1-byte aligned: on any address
    2-byte aligned: on a "halfword" boundary (addresses divisible by 2; end in hex 0, 2, 4, 6, 8, A, C, E)
    4-byte aligned: on a "word" boundary (addresses divisible by 4; end in hex 0, 4, 8, C)

Instructional SPARC Emulator: ISEM
  Editing, assembling, linking, and loading
  There are three components to the Instructional SPARC Emulator (ISEM) package that we use for this class: the assembler, the linker, and the emulator/debugger.

Instructional SPARC Emulator: ISEM
  Editing
  There are a number of programs that you can use to create your source files. Emacs is probably the most popular; vi is also available, but its command syntax is difficult to learn and use; using the pine program, you can use the pico editor, which combines many features of Emacs into a simple menu-driven facility.
  Start Emacs with "xemacs sourcefile.s &", which creates the file called sourcefile.s. Use the tutorial, accessed by typing "Ctrl-H Ctrl-H t". For other editors, you are on your own.

Example SPARC Assembly Language Instructions
  % type xmp0.s
          .data                   ! Assembler directive: data starts here.
  A_m:    .word '?'               ! A_m, B_m, and C_m are symbolic constants. Furthermore, each
  B_m:    .word 0x30              ! is an address of a certain-sized chunk of memory. Here,
  C_m:    .word 0                 ! each chunk is four bytes (one word) long. When the
                                  ! program gets loaded, each of these chunks stores a
                                  ! number in 2's complement encoding, as follows: At
                                  ! address C_m, zero; at B_m, 48; at A_m, 0x3F = 077 = 63.
          .text                   ! Assembler directive, instructions start here
  start:                          ! Label (symbolic constant) for this address
          set     A_m, %r2        ! Put address A_m into register 2
          ld      [%r2], %r2      ! Use r2 as an indirect address for a load (read)
          set     B_m, %r3        ! Put address B_m into register 3
          ld      [%r3], %r3      ! Read from B_m and replace r3 w/ value at addr B_m
          sub     %r2, %r3, %r2   ! Subtract r3 from r2, save in r2
          set     C_m, %r4        ! Put address C_m into register 4
          st      %r2, [%r4]      ! Store (write) r2 to memory at address C_m
  terminate:                      ! Label for address where 'ta 0' instruction stored
          ta      0               ! Stop the program
  beyond_end:                     ! Label for address beyond the end of this program

Instructional SPARC Emulator: ISEM
  Assembling
  The assembler is called "isem-as", and is the GNU Assembler (GAS), configured to cross-assemble to a SPARC object format. It takes your source code and produces object code that may be linked and run on the ISEM emulator.
  The syntax for invoking the assembler is:
    isem-as [-a[ls]] sourcefile.s -o objectfile.o
  The input is read from sourcefile.s, and the output is written to objectfile.o. The option "-a" tells the assembler to produce a listing file. The sub-options "l" and "s" tell the assembler to include the assembly source in the listing file and produce a symbol table, respectively.

Instructional SPARC Emulator: ISEM
  The listing file
  The listing will identify all the syntactic errors in your program, and it will warn you if it identifies "suspicious" behavior in your source file.
    Column 1 identifies a line number in your source file.
    Column 2 is an offset for where this instruction or data resides in memory.
    Column 3 is the image of what is put in memory, either the machine instructions or the representation of the data.
    The final column is the source code that produced the line.
  At the bottom of the file you will find the symbol table. Again, the symbols are represented as offsets that are relocated when the program is loaded into memory.

  isem-as -als labn.s -o labn.o >! labn.lst

  Columns: line in source file (.s) | offset to address in memory | contents at address in memory | source

     1                         .data
     2 0000 0000003F   A_m:    .word '?'
     3 0004 00000030   B_m:    .word 0x30
     4 0008 00000000   C_m:    .word 0
       000c 00000000
     5                         .text
     6                 start:
     7 0000 05000000           set     A_m, %r2
     7      8410A000
     8 0008 C4008000           ld      [%r2], %r2
     9 000c 07000000           set     B_m, %r3
     9      8610E000
    10 0014 C600C000           ld      [%r3], %r3
    11 0018 84208003           sub     %r2, %r3, %r2
    12 001c 09000000           set     C_m, %r4
    12      88112000
    13 0024 C4210000           st      %r2, [%r4]
    14                 terminate:
    15 0028 91D02000           ta      0
    16                 beyond_end:
       002c 01000000

  DEFINED SYMBOLS (labels are symbolic offsets)
    xmp0.s:2     .data:00000000 A_m
    xmp0.s:3     .data:00000004 B_m
    xmp0.s:4     .data:00000008 C_m
    xmp0.s:6     .text:00000000 start
    xmp0.s:14    .text:00000028 terminate
    xmp0.s:16    .text:0000002c beyond_end
  NO UNDEFINED SYMBOLS

Instructional SPARC Emulator: ISEM
  Linking
  Linking turns a set of raw object file(s) into an executable program.
From the manual page, "ld combines a number of object and archive files, relocates their data and ties up symbol references. Often the last step in building a new compiled program to run is a call to ld."

Several object files are combined into one executable using ld; the separate files could reference symbols from one another. The output of the linker is an executable program.

The syntax for the linker is as follows:

    isem-ld objectfile.o [-o execfile]

Examples:

    % isem-ld foo.o -o foo      Links foo.o into the executable foo.
    % isem-ld foo.o             Links foo.o into the executable a.out.

Instructional Sparc Emulator (ISEM): Loading/Running

Execute the program and test it in the emulation environment. The program "isem" is used to do this, and the majority of its features are covered in your lab manual.

Invoke isem as follows:

    isem [execfile]

Examples:

    % isem foo      Invokes the emulator and loads the program foo.
    % isem          Invokes the emulator; no program is loaded.

Once you are in the emulator, you can run your program by typing "run" at the prompt.

ISEM Debugging Tools 1

    % isem xmp0
    Instructional SPARC Emulator
    Copyright 1993 - Computer Science Department University of New Mexico
    ISEM comes with ABSOLUTELY NO WARRANTY
    ISEM Ver 1.00d : Mon Jul 27 16:29:45 EDT 1998
    Loading File: xmp0
    2000 bytes loaded into Text region at address 8:2000
    2000 bytes loaded into Data region at address a:4000

    PC: 08:00002020   nPC: 00002024   PSR: 0000003e N:0 Z:0 V:0 C:0
    start : sethi 0x10, %g2
    ISEM> run
    Program exited normally.

Assembly language programs are not notoriously chatty.
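Because the run prints nothing, it can help to work out by hand what xmp0 should leave behind before inspecting the machine. The following Python sketch is not part of ISEM; it models only the three data words and the registers that xmp0.s uses (the dictionary `memory` and the variable names are assumptions made for illustration) and mirrors the program one instruction pair at a time.

```python
# Model of xmp0.s: three one-word memory cells and the registers it touches.
memory = {"A_m": ord("?"), "B_m": 0x30, "C_m": 0}   # '?' is 0x3F = 63

MASK32 = 0xFFFFFFFF          # registers are 32 bits wide

r2 = memory["A_m"]           # set A_m, %r2 ; ld [%r2], %r2
r3 = memory["B_m"]           # set B_m, %r3 ; ld [%r3], %r3
r2 = (r2 - r3) & MASK32      # sub %r2, %r3, %r2
memory["C_m"] = r2           # set C_m, %r4 ; st %r2, [%r4]

print(f"%r2 = {r2:08x}, C_m = {memory['C_m']:08x}")
```

The value computed here, 63 - 48 = 15, is what to look for in register 2 and at address C_m once the program has run.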
ISEM Debugging Tools 2

reg
    Gives values of all 32 general registers. Also PC.

    ISEM> reg
       ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
    G 00000000 00000000 0000000f 00000030 00004008 00000000 00000000 00000000
    O 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    L 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    I 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

    PC: 08:0000204c   nPC: 00002050   PSR: 0000003e N:0 Z:0 V:0 C:0
    beyond_end : sethi 0x0, %g0

symb
    Shows the resolved values of all symbolic constants.

    ISEM> symb
    Symbol List
    A_m : 00004000
    B_m : 00004004
    . . .
    terminate : 00004028

dump [addr]
    Either symbol or hex address. Gives the values stored in memory.

    ISEM> dump A_m
    0a:00004000  00 00 00 3f 00 00 00 30 00 00 00 0f 00 00 00 00  ...?...0........
    0a:00004010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    0a:00004020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

ISEM Debugging Tools

break [addr]
    Set breakpoints in execution.
    Once execution is stopped, you can look at the contents of registers and memory.

trace
    Causes one (or more) instruction(s) to be executed.
    Registers are displayed.
    Handy for sneaking up on an error when you're not sure where it is.

ISEM Debugging Tools

For the all-time "most wanted" list of errors (and their fixes)

ISEM Debugging

If you still need help:
    Print a fresh copy of your source
    Make good notes describing the error
    Visit your lecturer or grader
    Post a question to the discussion board

Basic Components 11

Byte ordering: how numeric data is stored in memory.

Ex.: 247896511 (base 10) = 0EC699BF (base 16), stored at address 0.

    Byte address    Big Endian    Little Endian
    0               0E            BF
    1               C6            99
    2               99            C6
    3               BF            0E

    Big Endian: high order (big end) is at byte 0.
    Little Endian: low order (little end) is at byte 0.

Contrast with bit ordering:

    Bit number    7 6 5 4 3 2 1 0
    Bit value     1 0 1 1 1 1 1 1

Basic Components 12

Read/Write operations: must know the address to read or write. (read = fetch = load, write = store)
    CPU puts address on address bus
    CPU sends read signal (R/W=1, CS=1) (Read/don't Write, Chip Select)
    Wait
    Memory puts data on data bus
    reset (CS=0)

[Figure: memory chip with address lines A0, A1, ..., A(m-1), data lines D0, D1, ..., D(n-1), and control lines CS and R/W]

Basic Components 14

CPU: executes instructions -- primitive operations that the computer can perform. E.g.,

    arithmetic       A+B
    data movement    A := B
    control          if expr goto label
    logical          AND, OR, XOR...

Instructions specify both the operation and the operands. An encoded operand is often a location in memory where the value of interest may be found (address of value of interest).

Basic Components 15

Instruction set: all instructions for a machine. Instruction format specifies number and type of operands.

Ex.: Could have an instruction like

    ADD A, B, R

where A, B, and R are the addresses of operands in memory. The result is R := A+B.

[Figure: memory at word addresses 0, 4, 8, C, holding A = 8, B = 9, and R = 17]

Basic Components 16

Actually, the "instruction" might be represented in a source file as:

    0x41444420412C20422C20520A
    (the character codes for "ADD A, B, R" followed by a newline)

As such, it is an assembly language instruction. An assembler might translate it to, say, 0x504C, the machine's representation of the instruction. As such, it is a machine language instruction.
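Both the byte-ordering slide and the source-file encoding above come down to which bytes sit at which addresses. A small Python sketch, using only the standard `int.to_bytes` and `bytes.fromhex` calls, reproduces the two byte layouts of 0x0EC699BF and decodes the source-file bytes; the variable names are illustrative and not from the course material.

```python
value = 0x0EC699BF                      # 247896511 in decimal

big = value.to_bytes(4, "big")          # high-order byte stored at address 0
little = value.to_bytes(4, "little")    # low-order byte stored at address 0
for addr in range(4):
    print(addr, f"big: {big[addr]:02X}", f"little: {little[addr]:02X}")

# The assembly-language form of the instruction is just text in a file.
source_bytes = bytes.fromhex("41444420412C20422C20520A")
source_text = source_bytes.decode("ascii")
print(repr(source_text), len(source_bytes), "bytes")
```

The twelve bytes of source text stand in contrast to the two-byte machine word 0x504C that the slide uses as a made-up translation.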
A Simple Instruction Set 1

Simple instruction set: the Accumulator machine.
    Simplify instruction set by only allowing one operand. Accumulator implied to be the second operand.
    Accumulator is a special register. Similar to a simple calculator.

    ADD   addr      ACC <- ACC + M[addr]
    SUB   addr      ACC <- ACC - M[addr]
    MPY   addr      ACC <- ACC * M[addr]
    DIV   addr      ACC <- ACC / M[addr]
    LOAD  addr      ACC <- M[addr]
    STORE addr      M[addr] <- ACC

A Simple Instruction Set 2

Ex.: C = A*B + C*D

    LOAD  20        ! Acc <- M[20]
    MPY   21        ! Acc <- Acc*M[21]
    STORE 30        ! M[30] <- Acc
    LOAD  22        ! Acc <- M[22]
    MPY   23        ! Acc <- Acc*M[23]
    ADD   30        ! Acc <- Acc+M[30]
    STORE 22        ! M[22] <- Acc

    Address    Symbolic    Contents
    20         A           0001
    21         B           0010
    22         C           0011, then 1110 after the final STORE
    23         D           0100
    ...
    30         temp        0010

    Accumulator: 0000 initially, then 1) 0001  2) 0010  3) 0011  4) 1100  5) 1110

Try C=2A+B
Try C=A+2

An Instruction (Encoding) Format

Machine language: Converting from assembly language to machine language is called assembling.

Assume 8-bit architecture. Each instruction may be 8 bits. 3 bits hold the op-code and 5 bits hold the operand.

    bits 7-5: op-code    bits 4-0: operand

How much memory can we address? How many op-codes can we have?

    Operation    Code
    ADD          000
    SUB          001
    MPY          010
    DIV          011
    LOAD         100
    STORE        101

A Simple Instruction Set 4

Convert the mnemonic op-codes into binary codes. Hand assemble our program. Instructions are stored in consecutive memory:

    Addr    Memory       Mnemonic
    0       100 10100    LOAD A
    1       010 10101    MPY B
    2       101 11110    STORE temp
    3       100 10110    LOAD C
    4       010 10111    MPY D
    5       000 11110    ADD temp
    6       101 10110    STORE C
    ...
    20      4            A
    21      5            B
    22      6            C
    23      7            D
    ...
    30      20           temp

Simple Accumulator Machine

[Figure: Simple Accumulator Machine datapath: PC, IR, MAR, MDR, and ACC on a shared bus, with INC, Decode (Op, Addr), two 2-to-1 MUXes, ALU, Memory, and Timing and Control; control signals are numbered 0 to 14]

Simple Accumulator Machine (SAM): Registers

    ACC - Accumulator, stores program values
    IR  - Instruction Register, holds the instruction during interpretation
    MAR - Memory Address Register, stores the address to read from or write to
    MDR - Memory Data Register, stores the data being written to or read from memory
    PC  - Program Counter, stores the address of the next instruction

Simple Accumulator Machine (SAM): Combinational Circuits

    ALU    - Arithmetic and logic unit, implements the operations (e.g., +, -, *, /)
    Decode - Instruction decoder, splits off the opcode and operands
    INC    - Incrementer, increments the PC
    MUX    - Multiplexer, controls inputs to PC and ACC

Simple Accumulator Machine (SAM): Sequential Circuit

    Timing and control - asserts control signals, clock
    Memory - stores instructions and data
        Combination of flip-flops, circuits and capacitors

A Simple Instruction Set 6

Control signals: control functional units to determine order of operations, access to bus, loading of registers, etc.

    Number    Operation        Number    Operation
    0         ACC -> bus       8         ALU -> ACC
    1         load ACC         9         INC -> PC
    2         PC -> bus        10        ALU operation
    3         load PC          11        ALU operation
    4         load IR          12        Addr -> bus
    5         load MAR         13        CS
    6         MDR -> bus       14        R/W
    7         load MDR

A Simple Instruction Set 7

    Fetch
        State 0: PC to bus, load MAR, INC to PC, load PC
        State 1: CS, R/W
        State 2: MDR to bus, load IR
    Execute
        State 3: Addr to bus, load MAR
        OP = store?
            Yes: State 4: ACC to bus, load MDR
                 State 5: CS
            No:  State 6: CS, R/W
                 OP = load?
                     Yes: State 7: MDR to bus, load ACC
                     No:  State 8: MDR to bus, ALU to ACC, ALU op, load ACC

State 0: Control Signals 2, 5, 9, 3

Put the address of the next instruction in the Memory Address Register and increment the PC.
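The encoding format, the hand-assembled program, and the state diagram above can be tied together in a short simulation. The sketch below is one possible Python model, not the course's reference implementation: the function names `assemble`, `alu`, and `run` are invented here, results are assumed to wrap to 8 bits, and because the machine has no halt instruction the loop simply executes a fixed number of instructions.

```python
OPCODES = {"ADD": 0b000, "SUB": 0b001, "MPY": 0b010,
           "DIV": 0b011, "LOAD": 0b100, "STORE": 0b101}

def assemble(op, addr):
    """Pack a 3-bit op-code (bits 7-5) and a 5-bit address (bits 4-0)."""
    return (OPCODES[op] << 5) | (addr & 0b11111)

def alu(op, acc, operand):
    """ADD, SUB, MPY; any other code is treated as DIV. Wraps to 8 bits (assumption)."""
    if op == OPCODES["ADD"]:
        result = acc + operand
    elif op == OPCODES["SUB"]:
        result = acc - operand
    elif op == OPCODES["MPY"]:
        result = acc * operand
    else:
        result = acc // operand
    return result & 0xFF

def run(memory, count):
    """Execute `count` instructions starting at address 0 and return ACC."""
    pc = acc = 0
    for _ in range(count):
        mar, pc = pc, pc + 1                 # state 0: PC -> MAR, increment PC
        mdr = memory[mar]                    # state 1: read memory into MDR
        ir = mdr                             # state 2: MDR -> IR
        op, mar = ir >> 5, ir & 0b11111      # state 3: decoded Addr -> MAR
        if op == OPCODES["STORE"]:
            mdr = acc                        # state 4: ACC -> MDR
            memory[mar] = mdr                # state 5: write memory
        else:
            mdr = memory[mar]                # state 6: read operand into MDR
            if op == OPCODES["LOAD"]:
                acc = mdr                    # state 7: MDR -> ACC
            else:
                acc = alu(op, acc, mdr)      # state 8: ALU result -> ACC
    return acc

program = [("LOAD", 20), ("MPY", 21), ("STORE", 30), ("LOAD", 22),
           ("MPY", 23), ("ADD", 30), ("STORE", 22)]
memory = [0] * 32                            # 5 address bits give 32 words
for address, (op, operand) in enumerate(program):
    memory[address] = assemble(op, operand)
memory[20:24] = [4, 5, 6, 7]                 # A, B, C, D from the table

first_word = memory[0]
acc = run(memory, len(program))
print(f"{first_word:08b}", memory[30], memory[22])
```

With A, B, C, D set to 4, 5, 6, 7 as in the hand-assembly table, temp ends up holding A*B and C holds A*B + C*D.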
[Figure: SAM datapath and state diagram for State 0]

State 1: Control Signals 13, 14

Fetch the word of memory at the address in the Memory Address Register, and load it into the Memory Data Register.

[Figure: SAM datapath and state diagram for State 1]

State 2: Control Signals 6, 4

Send the word from the Memory Data Register to the Instruction Register.

[Figure: SAM datapath and state diagram for State 2]

State 3: Control Signals 12, 5

Put the address from the instruction in the Memory Address Register.
[Figure: SAM datapath and state diagram for State 3]

After State 3, what values are now stored in each register?

    PC    MAR    MDR    IR    ACC

State 4: Control Signals 0, 7

Take the value from the ACCumulator and store it in the Memory Data Register.

[Figure: SAM datapath and state diagram for State 4]

State 5: Control Signal 13

Write the data from the Memory Data Register to the address stored in the MAR.

[Figure: SAM datapath and state diagram for State 5]

State 6: Control Signals 13, 14

Load the word at the address held in the Memory Address Register into the Memory Data Register.
[Figure: SAM datapath and state diagram for State 6]

After State 6, what values are now stored in each register?

    PC    MAR    MDR    IR    ACC

State 7: Control Signals 6, 1

Load the word from the Memory Data Register into the ACCumulator.

[Figure: SAM datapath and state diagram for State 7]

State 8: Control Signals 6, 8, 10/11, 1

Use the word from the Memory Data Register for the arithmetic operation and put the result in ACC.

[Figure: SAM datapath and state diagram for State 8]

New Instruction: GOTO

What is necessary to implement a new instruction?
    New states?
    New control signals?
    New fetch/execute cycle?

Try GOTO:

    GOTO addr       ! PC <- addr

    ! fetch
    0: PC -> bus, Load MAR, INC -> PC, Load PC
    1: CS, R/W
    2: MDR -> bus, Load IR
    ! execute
    n: Addr -> bus, Load PC

Another New Instruction: SWAP

Exchange value in Accumulator with value at Address.

    SWAP addr       ! Acc <- #M[addr], M[addr] <- #Acc

New Instruction

What changes to fetch/execute cycle?
    The fetch part of the cycle usually remains the same.
    Recall the values stored in registers after each state.
    E.g., After State 6, what values are in each register?

        PC    MAR    MDR    IR    ACC

    Handy to have #M[addr] in MDR.
    Start after state 6 then...

[Figure: state diagram for the original fetch/execute cycle, States 0 to 8]

New State 9: Control Signals 6, 4

Save the data value from the MDR in the Instruction Register.

    MDR -> bus, load IR

[Figure: SAM datapath for New State 9]

New State 10: Control Signals 0, 7

Send the ACCumulator value to the Memory Data Register.

    ACC -> bus, load MDR

[Figure: SAM datapath for New State 10]

New State 11: Control Signals 15?, 1

Put the saved value from the IR into the ACCumulator.

    IR -> bus, load ACC

Note: there is no control signal in the current architecture opposite of 4 (Load IR), so we would have to create a new control signal (IR to bus) in addition to creating these new states.

[Figure: SAM datapath for New State 11]

New State 12 (Old 5): Control Signal 13

Write the data from the Memory Data Register to the address stored in the MAR.
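The four new SWAP states are easiest to check as plain register transfers. The Python sketch below is illustrative only: the dictionary `regs` and the function `swap_execute` are invented names, and it assumes, as the State 9 and State 11 transfer lines show, that the IR is used as the temporary and that a new IR -> bus signal exists.

```python
def swap_execute(regs, memory):
    """States 9-12 for SWAP; assumes state 6 left addr in MAR and M[addr] in MDR."""
    regs["IR"] = regs["MDR"]            # state 9:  MDR -> bus, load IR
    regs["MDR"] = regs["ACC"]           # state 10: ACC -> bus, load MDR
    regs["ACC"] = regs["IR"]            # state 11: IR -> bus (new signal), load ACC
    memory[regs["MAR"]] = regs["MDR"]   # state 12: CS (write MDR to M[MAR])

memory = {22: 6}                        # hypothetical operand at address 22
regs = {"ACC": 7, "MAR": 22, "MDR": memory[22], "IR": 0}
swap_execute(regs, memory)
print(regs["ACC"], memory[22])
```

Overwriting the IR is safe at this point only because the address has already been copied into the MAR in State 3.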
[Figure: SAM datapath for New State 12]

New Instruction Solution

    Changes to States: added 9 thru 12
    Changes to Signals: added 15: IR -> bus
    Changes to Fetch/Execute: new register transfer language (RTL)

        PC -> bus, load MAR, INC -> PC, load PC
        CS, R/W
        MDR -> bus, load IR
        Addr -> bus, load MAR
        CS, R/W
        MDR -> bus, load IR
        ACC -> bus, load MDR
        IR -> bus, load ACC
        CS

What if we had added MAR -> bus instead of IR -> bus?

Instruction Set Architectures 1

RISC vs. CISC

Complex Instruction Set Computer (CISC):
    Many, powerful instructions.
    High code density to address the Von Neumann Bottleneck.
    Instructions have varying lengths, number of operands, formats, and clock cycles in execution.

Reduced Instruction Set Computer (RISC):
    Fewer, less powerful, optimized instructions.
    Requires simpler, faster hardware.
    Instructions have fixed length, number of operands, formats, and similar number of clock cycles in execution.

Instruction Set Architectures 2

Motivation: memory is comparatively slow.
    10x to 20x slower than processor.
    Need to minimize number of trips to memory.

Provide faster storage in the processor -- registers.
    Registers (16, 32, 64 bits wide) are used for intermediate storage for calculations, or repeated operands.

Accumulator machine
    One data register -- ACC.
    2 memory accesses per instruction -- one for the instruction and one for the operand.

Add more registers (R0, R1, R2, ..., Rn)

Instruction Set Architectures 3

How many addresses to specify?
    With binary operations, need to know two source operands, a destination, and the operation.
    E.g., op (dest_operand) (src_op1) (src_op2)

Based on number of operands, could have:
    3 addr. machine: both sources and dest are named.
    2 addr. machine: both sources named, dest is a source.
    1 addr. machine: one source named, other source and dest. is the accumulator.
    0 addr. machine: all operands implicit and available on the stack.

Instruction Set Architectures 4

1-address architecture: a := a*b + c*d*e

    Memory only                  Using registers
    Code         # mem refs      Code         # mem refs
    LOAD 100     2               LOAD 100     2
    MPY 104      2               MPY 104      2
    STORE 100    2               STORE R2     1
    LOAD 108     2               LOAD 108     2
    MPY 112      2               MPY 112      2
    MPY 116      2               MPY 116      2
    ADD 100      2               ADD R2       1
    STORE 100    2               STORE 100    2

1 1/2-address architecture: at least one operand must always be a register. (1/2 address is the register, 1 address is the memory operand: LOAD 100, R1.) Like an accumulator machine, but with many accumulators.

Instruction Set Architectures 5

3-address architecture: a := a*b + c*d*e

    Using memory only:
    Code                              # mem refs
    MPY 100, 100, 104   ; a := a*b
    MPY 200, 108, 112   ; t := c*d
    MPY 200, 116, 200   ; t := e*t
    ADD 100, 200, 100   ; a := t+a

    Using registers:
    Code                              # mem refs
    MPY R2, 100, 104    ; t1 := a*b
    MPY R3, 108, 112    ; t2 := c*d
    MPY R3, 116, R3     ; t2 := e*t2
    ADD 100, R3, R2     ; a := t1+t2

    Memory: 100 (a), 104 (b), 108 (c), 112 (d), 116 (e), ..., 200 (t)

What about instruction size?

Instruction Set Architecture

How does instruction size affect addressing?

16-bit instruction, 3 address, 6 instructions
    Opcode = 3 bits (2^3 = 8)
    Operand = (size - opcode) / #addr
            = (16 - 3) / 3
            = 13 / 3
            = 4 bits (rounded down)

How many addresses will be supported?
What if the instruction were 32 bit?

Instruction Set Architectures 6

2-address architecture: a := a*b + c*d*e

    Using memory only:
    Code                          # mem refs
    MPY 100, 104    ; a := a*b    4
    MOVE 200, 108   ; t := c      3
    MPY 200, 112    ; t := t*d    4
    MPY 200, 116    ; t := t*e    4
    ADD 100, 200    ; a := t+a    4

    Using registers:
    Code                            # mem refs
    MPY 100, 104    ; a := a*b      4
    MOVE R2, 108    ; R2 := c       2
    MPY R2, 112     ; R2 := R2*d    2
    MPY R2, 116     ; R2 := R2*e    2
    ADD 100, R2     ; a := t+a      3

    Memory: 100 (a), 104 (b), 108 (c), 112 (d), 116 (e), ..., 200 (t)

Most CISC architectures work this way, making 1 operand implicit.

Instruction Set Architectures 7

0-address architecture: a := a*b + c*d*e

Stack machine: All operands are implicit. Only push and pop touch memory.
All other operands are pulled from the top of stack, and the result is pushed on top. E.g., HP calculators.

    Code      # mem refs
    PUSH A    2
    PUSH B    2
    MPY       1
    PUSH C    2
    PUSH D    2
    PUSH E    2
    MPY       1
    MPY       1
    ADD       1
    POP A     2

[Figure: stack contents as the code runs, top of stack first: E, D, C, A*B; then D*E, C, A*B; then C*D*E, A*B; then A*B + C*D*E]

Instruction Set Architectures 8

Load/Store Architectures -- RISC

Use of registers is simple and efficient. Therefore, the only instructions that can access memory are load and store. All others reference registers.

    Code                                         # mem refs
    LOAD R2, 100      ; R2 <- a                  2
    LOAD R3, 104      ; R3 <- b                  2
    LOAD R4, 108      ; R4 <- c                  2
    LOAD R5, 112      ; R5 <- d                  2
    LOAD R6, 116      ; R6 <- e                  2
    MPY R2, R2, R3    ; R2 <- a*b                1
    MPY R3, R4, R5    ; R3 <- c*d                1
    MPY R3, R3, R6    ; R3 <- (c*d)*e            1
    ADD R2, R2, R3    ; R2 <- a*b + (c*d)*e      1
    STORE 100, R2     ; a <- a*b + (c*d)*e       2

Instruction Set Architectures 9

Why load/store architectures?
    Number of instructions (hence, memory references to fetch them) is high, but can work without waiting on memory.
    CISC machines tend to need to have their more complex instructions interpreted in microcode.
    More room in CPU for registers and memory cache.
    Easier to overlap instruction execution through pipelining.

[Figure: four instructions, each with a Fetch ... Execute sequence, staggered so that they overlap in time]

Pipelining 1

Pipelining 2

Instruction Set Architectures 9

Side effects

Register interlock: delaying execution until memory read completes.
    Machine waits when necessary, to avoid erroneous results.

        ld  [%r1], %r2
        add %r2, 100, %r3

Branch delays: instruction after branch is always executed.

Instruction scheduling: rearranging instructions to maximize efficiency of pipelining.
    To prevent register interlock (loads on SPARC)
    To use branch delay slots (branches on SPARC)

SPARC Assembly Language 1

SPARC (Scalable Processor ARChitecture)
    Used in Sun workstations, descended from RISC-II developed at UC Berkeley

General Characteristics:
    32-bit word size (integer, address, register size, etc.)
    Byte-addressable memory
    RISC load/store architecture, 32-bit instruction, few addressing modes
    Many registers (32 general purpose, 32 floating point, various special purpose registers)

ISEM: Instructional SPARC Emulator - nicer than a real machine for learning to write assembly language programs.

SPARC Assembly Language 2

Structure

Line oriented: 4 types of lines
    Blank - Ignored
    Labeled
        Any line may be labeled. Creates a symbol in listing.
        Labels must begin with a letter (other than 'L'), then any alphanumeric characters.
        Label must end with a colon ":". Label just assigns a name to an address.
    Assembler Directives - E.g., .data .word .text, etc.
    Instructions

Comments start after "!" character and go to the end of the line.

            .data
    x_m:    .word 0x42
    y_m:    .word 0x20
    z_m:    .word 0
            .text
    start:  set x_m, %r2        ! Load x into reg 2
            ld [%r2], %r2
            set y_m, %r3        ! Load y into reg 3
            ld [%r3], %r3

SPARC Assembly Language 3

Directives: Instructions to the assembler
    Not executed by the machine

.data -- following section contains declarations
    Each declaration reserves and initializes a certain number of bits of storage for each of zero or more operands in the declaration.
        .word -- 32 bits
        .half -- 16 bits
        .byte -- 8 bits
    E.g.,
            .data
        w:  .half 27000
        x:  .byte 8
        y:  .byte 'm', 0x6e, 0x0, 0, 0
        z:  .word 0x3C5F

.text -- following section contains executable instructions

SPARC Assembly Language 11

More assembler directives (.asciz and .ascii):

The following two declarations are equivalent:

    msg01:  .asciz "a phrase"

    msg01:  .byte 'a', ' ', 'p', 'h', 'r'
            .byte 'a', 's', 'e', 0

Note that .asciz generates one byte for each character between the quote (") marks in the operand, plus a null byte at the end. The .ascii directive does not generate that extra byte.
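The difference between the two directives is a single trailing byte, which a few lines of Python can make concrete. This is only a model of the bytes the assembler would lay down, not a call into the assembler; the helper names `ascii_bytes` and `asciz_bytes` are invented for the illustration.

```python
def ascii_bytes(text):
    """Bytes that .ascii would lay down: one per character, no terminator."""
    return text.encode("ascii")

def asciz_bytes(text):
    """Bytes that .asciz would lay down: the characters plus a null byte."""
    return text.encode("ascii") + b"\x00"

# The hand-written .byte version of msg01 from the slide.
by_hand = bytes([ord(c) for c in "a phrase"] + [0])

print(asciz_bytes("a phrase") == by_hand)
print(len(ascii_bytes("a phrase")), len(asciz_bytes("a phrase")))
```

A routine that scans for the null byte to find the end of a string needs the .asciz form; with .ascii the length has to be known some other way.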
The following three declarations are equivalent:

    digits: .ascii "0123456789"

    digits: .byte '0', '1', '2', '3', '4', '5'
            .byte '6', '7', '8', '9'

    digits: .byte 0x30, 0x31, 0x32, 0x33, 0x34
            .byte 0x35, 0x36, 0x37, 0x38, 0x39

SPARC Assembly Language

Memory alignment: .align 4
    Used when mixing allocations of bytes, words, halfwords, etc. and word boundary alignment is needed

Reserve bytes of space: .skip 20
    Useful for allocating large amounts of space (e.g., arrays)

Create a symbolic constant: .set mask, 0x0f
    Can now use the word "mask" anywhere we could use the constant 0x0f previously

SPARC Assembly Language 4

Registers -- 32 bits wide

32 general purpose integer registers, known by several names to the assembler
    %r0-%r7     also known as %g0-%g7    global registers -- Note, %r0 always contains value 0.
    %r8-%r15    also known as %o0-%o7    output registers
    %r16-%r23   also known as %l0-%l7    local registers
    %r24-%r31   also known as %i0-%i7    input registers
    Use the %r0-%r31 names for now. Other names are used in procedure calls.

32 floating point registers %f0-%f31
    Each reg. is single precision. Double prec. uses reg. pairs.

SPARC Assembly Language 5

Assembly language

3-address operations - format different from book

    op src1, src2, dest         ! opposite of text

E.g.,

    add %r1, %r2, %r3           ! %r3 <- %r1 + %r2
    or  %r2, 0x0004, %r2        ! %r2 <- %r2 bitwise-or 0x0004

Contrast SPARC with MIPS (used in the book)
    indirect address notation: @addr vs [addr]
    operand order, especially the destination register
    register notation: R2 vs. %r2
    branches

SPARC Assembly Language 6

2-address operations: load and store

    ld [addr], %r2              ! %r2 <- M[addr]
    st %r2, [addr]              ! M[addr] <- %r2

Use set to put an address (a label, a symbolic constant) into a register, followed by ld to load the data itself.
    set x_m, %r1                ! put addr x_m into %r1
    ld [%r1], %r2               ! use addr in %r1 to load %r2

SPARC Assembly Language 7

Immediate values: operand is not an address, but a value

E.g.,

    add %rs, siconst13, %rd     ! %rd <- %rs + const

Immediate value coded as 13 bit 2's complement. Range is, then, -2^12 ... 2^12 - 1, or -4096 to 4095.

Immediate values can be specified in decimal, hexadecimal, octal, or binary. E.g.,

    add %r2, 0x1A, %r2

Constant is coded into the instruction itself, therefore available after fetching the instruction (no extra trip to memory for an operand).

On SPARC, no special notation for differentiating constants from addresses because there is no ambiguity in a load/store architecture.

SPARC Assembly Language 8

Synthetic Instructions: assembler translates one "instruction" into one or more machine instructions.

set: used to load a 32-bit signed integer constant into a register. Has 2 operands - 32 bit value and register number. How does that fit into a 32 bit instruction? E.g.,

    set iconst32, %rd
    set 10, %r3
    set x_m, %r4
    set '=', %r8

clr %rd: used to set all bits in a register to 0. How?
mov %rs, %rd: copies a register.
neg %rs, %rd: copies the negation of a register.

SPARC Assembly Language 9

Operand sizes

double word = 8 bytes, word = 4 bytes, half word = 2 bytes, byte = 8 bits. Recall memory alignment issues.

    set x_m, %r2                ! Put addr x_m in %r2
    ld [%r2], %r1               ! load word
    ldsb [%r2], %r1             ! load byte, sign extended
    ldub [%r2], %r1             ! load byte, extend with 0's
    st %r1, [%r2]               ! store word, addr is mult of 4
    stb %r1, [%r2]              ! store byte, any address
    sth %r1, [%r2]              ! store half word, address is even

Characters use 8 bits
    ldub to load a character
    stb to store a character

SPARC Assembly Language 10

Traps: provide initial help with I/O, also used in operating systems programming.
    ta 0 : terminate program
    ta 1 : output ASCII character from %r8
    ta 2 : input ASCII character into %r8
    ta 4 : output integer from %r8 in unsigned hexadecimal
    ta 5 : input integer into %r8, can be decimal, octal, or hex

E.g.,

    set '=', %r8                ! put '=' in %r8
    ta 1                        ! output the '='
    ta 5                        ! read in value into %r8
    mov %r8, %r1                ! copy %r8 into %r1
    set 0x0a, %r8               ! load a newline into %r8
    ta 1                        ! output the newline

SPARC Assembly Language 12

Quick review of instructions so far:

    ld [addr], %rd              ! %rd <- M[addr]
    st %rd, [addr]              ! M[addr] <- %rd
    op %rs1, %rs2, %rd          ! op is ALU op
    op %rs, siconst13, %rd      ! %rd <- %rs op const
    set siconst32, %rd          ! %rd <- const
    ta #                        ! trap signal

Have actually seen many more variants, e.g., ldub, ldsb, sth, clr, mov, neg, add, sub, smul, sdiv, umul, udiv, etc.

Can evaluate just about any simple arithmetic expression.

Review: Sparc Loads, Stores

            .data
    x_m:    .word 0xa1b2c3d4
            .skip 12
            .text
            set x_m, %r2
            ld [%r2], %r3
            ldsb [%r2], %r4
            ldub [%r2], %r5
            st %r3, [%r2+4]
            sth %r3, [%r2+8]
            stb %r3, [%r2+12]
            ta 0

After this runs, what values are in %r2-5, and in the memory locations starting at byte address x_m?

Sparc Loads & Stores

BEFORE:

    ISEM> dump x_m
    0a:00004000  a1 b2 c3 d4 00 00 00 00 00 00 00 00 00 00 00 00
    ISEM> reg
       ----0--- ----1--- ----2--- ----3--- ----4--- ----5---
    G 00000000 00000000 00000000 00000000 00000000 00000000

AFTER:

    ISEM> dump x_m
    0a:00004000  a1 b2 c3 d4 a1 b2 c3 d4 c3 d4 00 00 d4 00 00 00
    ISEM> reg
       ----0--- ----1--- ----2--- ----3--- ----4--- ----5---
    G 00000000 00000000 00004000 a1b2c3d4 ffffffa1 000000a1

Hint: It's a cash register!

Flow of Control 1

In addition to sequential execution, need ability to repeatedly and conditionally execute program fragments.
    High level language has: while, for, do, repeat, case, if-then-else, etc.
    Assembler has if, goto.

Compare: high level vs. pseudo-assembler, implementation of f=n!

Pseudo-assembler:

            f = 1
            i = 2
    loop:   if (i > n) goto done
            f = f * i
            i = i + 1
            goto loop
    done:   ...
High level:

    f = 1;
    i = 2;
    while (i <= n) {
        f = f * i;
        i = i + 1;
    }

Flow of Control 2

Branch -- put a new address in the program counter. Next instruction comes from the new address, effectively, a "goto".

Unconditional branch

    (book)   BRANCH addr        ! PC <- addr
    (SPARC)  ba addr            ! PC <- addr

Conditional branch (book)

    BRcc R1, R2, target

"if R1 cc R2 then PC <- target", where cc is a comparison operation (e.g., LT is <, GE is >=, etc.)

Flow of Control 3

Evaluating conditional branches
    Evaluate condition
    If condition is true, then PC <- target, else PC <- PC+1

Consider changes to the fetch-execute cycle given earlier for the accumulator machine.
    Do data paths need to change?
    New control paths?
    New opcodes?
    New instruction formats?

[Figure: fetch-execute flowchart. Fetch (PC to bus, etc.); if OP = BRANCH, then Addr to bus, load PC; otherwise, if OP = BRcc and Cond = T, then Addr to bus, load PC; otherwise Execute]

Flow of Control 4

Other conditions (from text, very similar to MIPS)

    BRLT Rn, Rm, target         ; if Rn < Rm then PC <- target
    BRLE Rn, Rm, target         ; if Rn <= Rm then PC <- target
    BREQ Rn, Rm, target         ; if Rn = Rm then PC <- target
    BRNE Rn, Rm, target         ; if Rn != Rm then PC <- target
    BRGE Rn, Rm, target         ; if Rn >= Rm then PC <- target
    BRGT Rn, Rm, target         ; if Rn > Rm then PC <- target

Can implement high level control structures now. Factorial example, using the book's assembly language:

            LOAD   R1, #1           ; R1 = f = 1
            LOAD   R2, #2           ; R2 = i = 2
            LOAD   R3, n            ; R3 = n
    loop:   BRGT   R2, R3, done     ; branch if i > n
            MPY    R1, R1, R2       ; f = f * i
            ADD    R2, R2, #1       ; i = i + 1
            BRANCH loop             ; goto loop
    done:   STORE  f, R1            ; f = n!

Flow of Control 5

Condition Codes

Book's assembly language has 3-address branches. SPARC uses 1-address branches. Must use condition codes.

Non-MIPS machines use condition codes to evaluate branches. Condition Code Register (CCR) holds these bits. SPARC has a 4-bit CCR:

    N Z V C

N: Negative, Z: Zero, V: Overflow, C: Carry. All are shown in a trace, or in the reg command under ISEM.

Condition codes are not changed by normal ALU instructions.
Must use special instructions ending with cc, e.g., addcc.

Flow of Control 6

            .text
    start:  set 1, %r2
            set 0xFFFFFFFE, %r1         ! -2 in 32-bit 2's comp
    cc_set: subcc %r1, %r2, %r3         ! r3 <= -2-1
    end:    ta 0

    ISEM> reg
       ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
    G 00000000 fffffffe 00000001 00000000 00000000 00000000 00000000 00000000
    O 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    L 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    I 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

    PC: 08:00002028   nPC: 0000202c   PSR: 0000003e N:0 Z:0 V:0 C:0
    cc_set : subcc %g1, %g2, %g3

    ISEM> trace
       ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
    G 00000000 fffffffe 00000001 fffffffd 00000000 00000000 00000000 00000000
    O 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    L 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    I 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

    PC: 08:0000202c   nPC: 00002030   PSR: 00b0003e N:1 Z:0 V:0 C:0

ALU Hardware 1

Recall the half-adder.

Full-adder adds three single digit binary numbers. Results in a sum, and a carry out.

    Cin  X  Y  |  Sum  Cout
    0    0  0  |  0    0
    0    0  1  |  1    0
    0    1  0  |  1    0
    0    1  1  |  0    1
    1    0  0  |  1    0
    1    0  1  |  0    1
    1    1  0  |  0    1
    1    1  1  |  1    1

[Figure: full-adder circuit and its block symbol FA, with inputs x, y, cin and outputs Sum, cout]

ALU Hardware 2

Now cascade the full adder hardware.

[Figure: ripple-carry adder. The bits of register x and register y feed a chain of FA blocks whose first carry-in is 0; the sum bits go to register z and the last stage produces cout]

How are CCR bits set? (Above is a ripple-carry adder.)

    N-bit = rz[n-1]
    Z-bit = NOT (rz[n-1] OR rz[n-2] OR rz[n-3] OR ... OR rz[0])
    V-bit = Cout XOR C[n-1]
    C-bit = Cout

V=1 signals a wrong result for signed arithmetic; C=1 signals a wrong result for unsigned arithmetic.

Flow of Control 7

Setting the condition codes

Regular ALU operations don't set condition codes. Use addcc, subcc, smulcc, sdivcc, etc., to set condition codes.
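The subcc trace above can be checked against a small model of the condition-code logic. The Python sketch below is an illustration, not ISEM's implementation: the function name `subcc` and the dictionary of flags are invented here, and for subtraction it takes C to be the borrow out of an unsigned subtract and V to be signed overflow.

```python
MASK32 = 0xFFFFFFFF

def subcc(a, b):
    """Return (result, flags) for a 32-bit a - b, modelling N, Z, V, C."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                          # sign bit of the result
    z = int(result == 0)                      # all result bits are zero
    # Signed overflow: operands differ in sign and the result's sign
    # differs from the sign of the first operand.
    v = ((a ^ b) & (a ^ result)) >> 31
    c = int(a < b)                            # borrow in unsigned arithmetic
    return result, {"N": n, "Z": z, "V": v, "C": c}

result, flags = subcc(0xFFFFFFFE, 1)          # the cc_set instruction: -2 - 1
print(f"{result:08x}", flags)
```

For the operands in the trace this gives fffffffd with only N set, which agrees with the register dump and the N:1 Z:0 V:0 C:0 line.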
  Consider subcc %r1, %r2, %r0

    %r1  %r2 | N  Z  V  C
     1    0  | 0  0  0  0
     0    1  | 1  0  0  1
     1    1  | 0  1  0  0

  Do the values in the CCR tell us anything about the relationship between %r1 and %r2?

Flow of Control 8
Branches use logic to evaluate CCR (SPARC)

    Operation                        Assembler Syntax   Branch Condition
    Branch always                    ba   target        1 (always)
    Branch never                     bn   target        0 (never)
    Branch not equal                 bne  target        not Z
    Branch equal                     be   target        Z
    Branch greater                   bg   target        not (Z or (N xor V))
    Branch less or equal             ble  target        Z or (N xor V)
    Branch greater or equal          bge  target        not (N xor V)
    Branch less                      bl   target        N xor V
    Branch greater, unsigned         bgu  target        not (C or Z)
    Branch less or equal, unsigned   bleu target        C or Z
    Branch carry clear               bcc  target        not C
    Branch carry set                 bcs  target        C
    Branch positive                  bpos target        not N
    Branch negative                  bneg target        N
    Branch overflow clear            bvc  target        not V
    Branch overflow set              bvs  target        V

Flow of Control 9
Setting Condition Codes (continued)
  Synthetic instruction cmp %rs1, %rs2
    Sets CCR, but doesn't modify any registers.
    Implemented as subcc %rs1, %rs2, %g0
Back to the factorial example (SPARC)
            set   1, %r1          ! %r1 = f = 1
            set   2, %r2          ! %r2 = i = 2
            set   n, %r3          ! Get loc of n
            ld    [%r3], %r3      ! Put n in %r3
    loop:   cmp   %r2, %r3        ! Set CCR (i ? n)
            bg    done            ! i > n -> done
            nop                   ! Branch delay
            umul  %r1, %r2, %r1   ! f = f * i
            add   %r2, 1, %r2     ! i = i + 1
            ba    loop            ! Goto loop
            nop                   ! Branch delay
    done:   set   f, %r3          ! Get loc of f
            st    %r1, [%r3]      ! f = n!

Flow of Control 10
Branch delay slots: found in several RISC architectures, including SPARC
  Non-technical explanation: the processor is running so fast, it can't make a quick turn. The instruction following a branch is always executed.
  Technical explanation: the efficiency advantage of pipelining is greater if the following instruction, which has almost completed execution, is allowed to complete.
  Compilers take advantage of branch delay slots by putting a useful instruction there if possible.
  For our purposes, use the nop (no operation) instruction to fill branch delay slots.
  Beware! Forgetting the nop will be a large source of errors in your programs!

High Level Control Structures 1
Converting high level control structures
  You get to be the "compiler".
  Some compilers convert the source language (C, Pascal, Modula 2, etc.) into assembly language and then assemble the result to an object file. GNU C and C++ do this to GAS (GNU Assembler).
  if-then-else, while-do, and repeat-until are all possible to create in a structured way in assembly language.

High Level Control Structures 2
General guidelines
  Break down into independent (or nested) logical units.
  Convert to if/goto pseudo-code.

    f = 1;                                f = 1
    for (i = 2; i <= n; i++)              i = 2
        f = f * i;                  loop: if (i > n) goto done
                                          f = f*i
                                          i = i+1
                                          goto loop
                                    done: ...

High Level Control Structures 3
if-then-else

    if (a < b)
        c = d + 1;
    else
        c = 7;

    init:   set a, %r2          ! get &a into r2
            ld  [%r2], %r2      ! get a into r2
            set b, %r3          ! get &b into r3
            ld  [%r3], %r3      ! get b into r3
    if:     cmp %r2, %r3        ! a ?? b (want >=)
            bge else            ! a >= b, skip the then part
            nop
            set d, %r5          ! get &d into r5
            ld  [%r5], %r5      ! get d into r5
            add %r5, 1, %r4     ! r4 <- d+1
            ba  end
            nop
    else:   set 7, %r4          ! get 7 into r4
    end:    set c, %r5          ! get &c into r5
            st  %r4, [%r5]      ! c <- r4

  if/goto:
            if (a >= b) goto else
            c = d + 1
            goto end
    else:   c = 7
    end:

High Level Control Structures 4
while loops:

    while (a < b)
        a = a + 1;
    c = d;

    init:   set a, %r4          ! get &a into r4
            ld  [%r4], %r2      ! get a into r2
            set b, %r3          ! get &b into r3
            ld  [%r3], %r3      ! get b into r3
    whle:   cmp %r2, %r3        ! a ?? b (want >=)
            bge done            ! a >= b, skip body
            nop
    body:   add %r2, 1, %r2     ! r2 = a + 1
            st  %r2, [%r4]      ! a = a + 1
            ba  whle            ! repeat loop body
            nop
    done:   set c, %r5          ! get &c into r5
            ...

  if/goto:
    whle:   if (a >= b) goto done
    body:   a = a+1
            goto whle
    done:   c = d

High Level Control Structures 5
repeat-until loops:

    repeat
        ...
    until (a > b)

    rpt:    ...
            ...
            set a, %r2          ! get &a into r2
            ld  [%r2], %r2      ! get a into r2
            set b, %r3          ! get &b into r3
            ld  [%r3], %r3      ! get b into r3
            cmp %r2, %r3        ! a <= b?
            ble rpt             ! do body again
            nop

  if/goto:
    repeat: ...
            if (a <= b) goto repeat

High Level Control Structures 6
Complex conditions

    if ((a < b) and (b >= c)) ...
  Primitive language:
            if (a >= b) then goto skip
            if (b < c) then goto skip
    body:   ...
            ...
    skip:   ...

    if ((a < b) or (b >= c)) ...
  Primitive language:
            if (a < b) then goto body
            if (b < c) then goto skip
    body:   ...
            ...
    skip:   ...

These can be combined and used in if/else or while loops.

Flow of Control 11
Optimizing code: change the order of instructions, combine instructions, take advantage of branch delay slots.
Factorial example again. (for i := n downto 1 do ...)
            set   1, %r1          ! %r1 = f = 1
            set   n, %r2          ! Get loc of n
            ld    [%r2], %r2      ! Put n in %r2
    loop:   umul  %r1, %r2, %r1   ! f = f * n
            subcc %r2, 1, %r2     ! Decrement n
            bg    loop            ! Repeat
            nop                   ! Branch delay
            set   f, %r3          ! Get loc of f
            st    %r1, [%r3]      ! f = n!
Reduced 7 instructions in loop to just 4.
(You gain no advantage if you optimize code in your labs.)

Synthetic Instructions
Remember lab0?
            .data
    x_m:    .word 0x42
    y_m:    .word 0x20
    z_m:    .word 0
            .text
    start:  set x_m, %r2
            ld  [%r2], %r2
            set y_m, %r3
            ld  [%r3], %r3
            and so on...
Suppose you gave this command to ISEM (after loading):
    ISEM> dump start
    start   05 00 00 10 84 10 a0 00 c4 00 80 00 07 00 00 10
Could you find the set instruction?

Instruction Encodings 1
First, instruction encoding is how instructions are assembled.
All instructions must fit into 32 bits.
Register-register: op=10, i=0
    bits:    31-30   29-25   24-19   18-14   13   12-5   4-0
    field:   op      rd      op3     rs1     i    asi    rs2
Register-immediate: op=10, i=1
    bits:    31-30   29-25   24-19   18-14   13   12-0
    field:   op      rd      op3     rs1     i    simm13
Floating point: op=10
    bits:    31-30   29-25   24-19   18-14   13-5   4-0
    field:   op      rd      op3     rs1     opf    rs2

Instruction Encodings 2
Call instructions: op=01
    bits:    31-30   29-0
    field:   op      disp30
Branch instructions: op=00, op2=010
    bits:    31-30   29   28-25   24-22   21-0
    field:   op      a    cond    op2     disp22
SETHI instructions: op=00, op2=100
    bits:    31-30   29-25   24-22   21-0
    field:   op      rd      op2     imm22
Ex.: add %r2, %r3, %r4
    bits:    31-30   29-25   24-19    18-14   13   12-5       4-0
    value:   10      00100   000000   00010   0    00000000   00011
    in hexadecimal: 88008003

Decoding an Instruction
    05 00 00 10 (hex) = 0000 0101 0000 0000 0000 0000 0001 0000 (binary)
    Instruction Group (bits 30:31)      = 00
    Destination Register (bits 25:29)   = 00010
    Op Code (bits 22:24)                = 100
    Constant (bits 0:21)                = 0000000000000000010000
    Meaning: sethi 0x10, %r2
    %r2 <-- 00000000000000000100000000000000 (0x4000)

Understanding SET Synthetic
Usually used to put the address of a memory location into a register. For example,
    set 0x4004, %r3
Can do neither `add %r0, 0x4004, %r3' nor `or %r0, 0x4004, %r3'. Why not?
SET is a synthetic instruction which may be implemented in two steps.

#1  sethi 0x10, %r3     ! Puts 0x10 in the most significant 22 bits
    %r3 before:          0x12481248   (old contents, overwritten)
    imm22:               0x10
    %r3 after sethi:     0x00004000   (bits 31..10 = 0x10, bits 9..0 = 0)

#2  or %r3, 0x0004, %r3 ! Puts 0x0004 in the least significant bits
    %r3:                 0x00004000
    immediate:           0x00000004
    %r3 after or:        0x00004004

Machine language encoding for 'set 0x4004, %r3'
    sethi 0x10, %r3     0000 0111 0000 0000 0000 0000 0001 0000   0x 07 00 00 10
    or %r3, 4, %r3      1000 0110 0001 0000 1110 0000 0000 0100   0x 86 10 E0 04

SET Synthetic Instruction
    set iconst, rd
  is assembled as
    sethi %hi(iconst), rd
    or    rd, %lo(iconst), rd
  --or--
    sethi %hi(iconst), rd
  --or--
    or    %g0, iconst, rd

Bitwise Operations 1
Bit Manipulation Instructions
Bitwise logical operations (each acts on all 32 bits of the operands)

    and %rs1, %rs2, %rd      or %rs1, %rs2, %rd       xor %rs1, %rs2, %rd
    x  y | x and y           x  y | x or y            x  y | x xor y
    0  0 |    0              0  0 |   0               0  0 |    0
    0  1 |    0              0  1 |   1               0  1 |    1
    1  0 |    0              1  0 |   1               1  0 |    1
    1  1 |    1              1  1 |   1               1  1 |    0

SPARC Assembly Language
  Memory alignment: .align 4
    Used when mixing allocations of bytes, words, halfwords, etc. and word boundary alignment is needed.
  Reserve bytes of space: .skip 20
    Useful for allocating large amounts of space (e.g., arrays).
  Create a symbolic constant: .set mask, 0x0f
    Can now use the word "mask" anywhere we could use the constant 0x0f previously.

Bitwise Operations 2

    andn %rs1, %rs2, %rd     orn %rs1, %rs2, %rd      not %rs, %rd
    x  y | x and (not y)     x  y | x or (not y)      x | not x
    0  0 |    0              0  0 |   1               0 |   1
    0  1 |    0              0  1 |   0               1 |   0
    1  0 |    1              1  0 |   1
    1  1 |    0              1  1 |   1

Recall the cc operations, so andcc, orcc, etc. are available.

Bitwise Operations 3
For what kinds of things are these bit level operations used?
Recall the synthetic operation clr, and mov.
    clr %r2          is    or %r0, %r0, %r2
    mov %r2, %r3     is    or %r0, %r2, %r3
Masking operations: want to select a bit or group of bits from a set of 32.
  E.g., convert lower (or upper) to upper case:
    'a' in binary is 01100001
    'A' in binary is 01000001
  All we need to do is "turn off" the bit in position 5.
    and %r1, 0b11011111, %r1    will turn off that bit!
  What if we subtract 32 (0b100000) from %r1?
  What about converting upper to lower case?

Bitwise Operations 4
Bitwise shifting operations
  Shift logical left: sll %rs1, %rs2, %rd
    %rs1: data to be shifted
    %rs2: shift count
    %rd: destination register
  E.g.,
    set 0xABCD1234, %r2
    sll %r2, 3, %r3
    %r2: 1010 1011 1100 1101 0001 0010 0011 0100
    %r3: 0101 1110 0110 1000 1001 0001 1010 0000
  sll is equivalent to multiplying by a power of 2 (barring overflow). (In the decimal system, what's a shortcut for multiplying by a power of 10?)

Bitwise Operations 5
  Shift Logical Right: srl %rs1, %rs2, %rd
    Shifts right instead of left, inserting zeros.
  Arithmetic shifts: propagate the sign bit when shifting right, e.g., sra. (Left shift doesn't change.) Almost equivalent to dividing by a power of 2.
  Rotating shifts: bits that would have gone into the bit bucket are shifted in at the other end instead. (E.g., rr, rl)
    [Figure: Rotate Right and Rotate Left.]
    Rotate is not implemented in SPARC.

Addressing Modes 1
Addressing Modes
  How do we specify operand values?
    In a register: the location is encoded in the instruction.
    As a constant: the immediate value is in the instruction.
    In memory: the operand is somewhere in memory; the location may only be known at runtime.
  Memory operands:
    Effective address: the actual location of the operand in memory. This may be calculated implicitly (e.g., by a displacement in the instruction) or may be calculated by the programmer in code.
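As a recap of the masking and shifting slides (Bitwise Operations 3 to 5), here is a small Python sketch of the same operations on 32-bit values. The helper names (to_upper, to_lower, sll, srl, sra) are hypothetical and chosen to mirror the mnemonics; the shifts assume, as on SPARC, that only the low 5 bits of the shift count are used.

```python
MASK32 = 0xFFFFFFFF          # keep results within 32 bits
CASE_BIT = 0b00100000        # bit 5 distinguishes 'a' (0x61) from 'A' (0x41)


def to_upper(code):
    """Turn off bit 5, like: and %r1, 0b11011111, %r1 (valid for letters only)."""
    return code & ~CASE_BIT & 0xFF


def to_lower(code):
    """Turn on bit 5 with an OR mask (valid for letters only)."""
    return code | CASE_BIT


def sll(value, count):
    """Shift logical left; bits shifted past bit 31 are lost."""
    return (value << (count & 31)) & MASK32


def srl(value, count):
    """Shift logical right, inserting zeros at the top."""
    return (value & MASK32) >> (count & 31)


def sra(value, count):
    """Shift arithmetic right, propagating the sign bit."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32     # reinterpret the bit pattern as a negative number
    return (value >> (count & 31)) & MASK32
```

With these helpers, sll(0xABCD1234, 3) gives 0x5E6891A0, the %r3 value shown on the Bitwise Operations 4 slide. The "almost" in "almost equivalent to dividing by a power of 2" shows up with negative values: sra(0xFFFFFFFF, 1) leaves -1 unchanged, whereas dividing -1 by 2 with truncation would give 0.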
Addressing Modes 2
Summary of addressing modes:

    Mode                Example                    Loc. of Operand              Suitable for              SPARC?
    Immediate           add %r1, 100, %r1          instruction                  Constants                 Yes
    Register Direct     add %r1, %r2, %r1          %r2                          Integers, constants       Yes
    Memory Direct       add %r1, [2000], %r2       mem[2000]                    Integers, constants       No
    Memory Indirect     add %r1, [[2000]], %r2     mem[mem[2000]]               Pointers                  No
    Register Indirect   ld [%r1], %r2              mem[%r1]                     Pointers                  Yes
    Register Indexed    st %r1, [%r2+%r3]          mem[%r2+%r3]                 Arrays                    Yes
    Register Displaced  st %r1, [%r2+x]            mem[%r2+x]                   Records                   Yes
    Post Increment      ld [%r1]+, %r2             mem[%r1], increment %r1      Arrays, strings, stacks   No
    Pre Decrement       ld -[%r1], %r2             decrement %r1, mem[%r1]      Arrays, strings, stacks   No

Addressing Modes 3
Memory Direct addressing
  The entire address is in the instruction (not in SPARC).
  E.g., accumulator machine: each instruction had an opcode and a hard address in memory.
  Can't be done on SPARC because an address is 32 bits, which is the length of an instruction. No room for opcodes, etc.
  Can be done in CISC because multi-word instructions are permitted.
Memory Indirect addressing
  A pointer to the operand is in memory. The instruction specifies the location of the pointer.
  Requires three memory fetches (one each for instruction, pointer, and data).
  Not in RISC machines because the instruction is too slow; such an instruction would cause its own register interlock!

Addressing Modes 4
Register Indirect addressing
  A register has the address of the operand (a pointer).
  The instruction specifies the register number; the effective address is the contents of the register.
Simulating Register Indirect addressing on SPARC
  SPARC doesn't truly have register indirect addressing.
  The assembler converts `st %r2, [%r1]' into `st %r2, [%r1+%r0]'.

Register Direct/Indirect Addressing
Pointers
  C-style example of pointer data type
    char x;          // object of type character
    char * ptr;      // pointer to character type
    ptr = &x;        // ptr has address of x (points to x)
    *ptr = 'a';      // store 'a' at address in ptr
  Assembly language equivalent
            .data
    x_m:    .byte 0             ! reserve character space; x_m = &x; [x_m] = x
            .align 4            ! align to word boundary
    ptr_m:  .word x_m           ! pointer variable; [ptr_m] = ptr
            .text
            mov 'a', %r1        ! Put ascii 'a' into temp
            set ptr_m, %r2      ! get address ptr_m into %r2
            ld  [%r2], %r3      ! get address [ptr_m], i.e. x_m, into %r3
            stb %r1, [%r3]      ! store 'a' at address [ptr_m], i.e., ptr
  [Figure: after execution r1 = 'a', r2 = ptr_m, r3 = x_m; memory at x_m holds 'a'; memory at ptr_m holds x_m, i.e., the address of x.]
  What's the difference between MOV and SET?

Addressing Modes 5
Ex.: sum up array of integers:
            .data
    n_m:    .word 5             ! Size of array
    a_m:    .word 4,2,5,8,3     ! 5 word array
    sum_m:  .word 0             ! Sum of elements
    b_m:    .skip 5*4           ! another 5 word array
            .text
            clr   %r2           ! r2 will hold sum
            set   n_m, %r3      ! r3 points to n
            ld    [%r3], %r3    ! r3 gets array size
            set   a_m, %r4      ! r4 points to array a
    loop:   ld    [%r4], %r5    ! Load element of a into r5
            add   %r5, %r2, %r2 ! sum = sum + element
            add   %r4, 4, %r4   ! Incr ptr by word size
            subcc %r3, 1, %r3   ! Decrement counter
            bg    loop          ! Loop until count = 0
            nop                 ! Branch delay slot
            set   sum_m, %r1    ! r1 points to sum
            st    %r2, [%r1]    ! Store sum
            ta    0             ! done

  Memory layout:
    n_m: 5    a_m: 4    a_m+4: 2    a_m+8: 5    a_m+12: 8    a_m+16: 3    sum_m
  Trace (r2 starts at 0; the r2 and r5 columns are left to fill in):
    pass      r3    r4
    loop      5     a_m
    loop+1    4     a_m+4
    loop+2    3     a_m+8
    loop+3    2     a_m+12
    loop+4    1     a_m+16

Register Indexed & Displaced
Recall these Assembler directives
  Reserve bytes of space: .skip 20
  Create a symbolic constant: .set offset, 0x16
Register Indexed and Displaced addressing modes help us work with pointers, arrays, and records in assembly language.

Addressing Modes 7
Register Indexed addressing
  Suitable for accessing successive elements of the same type in a data structure.
  Ex.: Swap elements A[i] and A[k] in array A
            .data
    A:      .skip 24*4          ! reserve array[0..23] of int
            ! assume i is in %r2 and k is in %r3
            .text
            set A, %r4          ! beginning of array ptr.
            sll %r2, 2, %r2     ! "multiply" i by 4
            sll %r3, 2, %r3     ! "multiply" k by 4
            ld  [%r2+%r4], %r7  ! r7 <- a[i]
            ld  [%r3+%r4], %r8  ! r8 <- a[k]
            st  %r8, [%r2+%r4]  ! a[i] <- r8
            st  %r7, [%r3+%r4]  ! a[k] <- r7
  [Figure: array A in memory at A, A+4, A+8, A+12, with r4 pointing to A; r2 = 001 and r3 = 0010 become 100 and 1000 after the sll; r7 and r8 hold the two loaded elements.]
  Effective address calculations!

Addressing Modes 8
Array mapping functions: used by compilers to determine addresses of array elements.
  Must know upper bound, lower bound, and size of elements of the array.
  Total storage = (upper - lower + 1) * element_size
  Address offset for element at index k = (k - lower) * element_size
    Address (byte) offset for A[3] = (3-0)*4 = 12
  This is for 1 dimensional arrays only!

Addressing Modes 9
1D array mapping:
  Array of n elements, each element 4 bytes large, array starts at address arr.
  Total storage is 4n bytes.
  First element is at arr + 0 (zero-based indexing).
  Last element is at arr + 4(n-1).
  Element k is at addr_k = arr + 4*k, so arr[k] = M[addr_k].
  [Figure: array of 6 elements, 4 bytes each: arr+0 (k=0), arr+4 (k=1), arr+8 (k=2), arr+12 (k=3), arr+16 (k=4), arr+20 (k=5).]

Addressing Modes 10
Memory is a 1D array: M[addr]
Must linearize 2D arrays to map the 2D structure into 1D memory.

    3 Rows (0...2), 5 Columns (0...4)
             col 0  col 1  col 2  col 3  col 4
    row 0     0,0    0,1    0,2    0,3    0,4
    row 1     1,0    1,1    1,2    1,3    1,4
    row 2     2,0    2,1    2,2    2,3    2,4

  Convert into a 1D array in memory:
    0,0  0,1  0,2  0,3  0,4  1,0  1,1  .....  2,3  2,4

Addressing Modes 11
2 ways to convert to 1D
  Row major order (C/C++, Pascal, Modula-2) stores as a sequence of rows. E.g.,
    0,0  0,1  0,2  0,3  0,4  1,0  1,1  .....  2,3  2,4
  Column major order (FORTRAN) stores as a sequence of columns. E.g.,
    0,0  1,0  2,0  0,1  1,1  2,1  0,2  .....  1,4  2,4

Addressing Modes
Row major 2D array mapping function:
  Given an array starting at address arr, that is x columns by y rows, with elements of m bytes in size and indices starting at zero, element A(j, k) (row j, column k) may be found at location:
    arr + (x*j + k) * m
  Example with 3 rows (j = 0..2) and 5 columns (k = 0..4):
    Offset to A(0,2) = (5 * 0 + 2) * element size
  Element offsets (in units of the element size):
     +0 0,0    +1 0,1    +2 0,2    +3 0,3    +4 0,4
     +5 1,0    +6 1,1    +7 1,2    +8 1,3    +9 1,4
    +10 2,0   +11 2,1   +12 2,2   +13 2,3   +14 2,4

Addressing Modes 12
3D array mapping function: natural extension of the 2D function. Store a series of 2D row-major arrays.
  Base address arr, x columns, y rows, z depth, element size m.
    &A(i, j, k) = arr + (x*y*i + x*j + k) * m
  Element offsets for z = 2, y = 3, x = 5 (in units of the element size):
     +0 0,0,0    +1 0,0,1    +2 0,0,2    +3 0,0,3    +4 0,0,4
     +5 0,1,0    +6 0,1,1    +7 0,1,2    +8 0,1,3    +9 0,1,4
    +10 0,2,0   +11 0,2,1   +12 0,2,2   +13 0,2,3   +14 0,2,4
    +15 1,0,0   +16 1,0,1   +17 1,0,2   +18 1,0,3   +19 1,0,4
    +20 1,1,0   +21 1,1,1   +22 1,1,2   +23 1,1,3   +24 1,1,4
    +25 1,2,0   +26 1,2,1   +27 1,2,2   +28 1,2,3   +29 1,2,4

Array Addressing Summary

    Dims   Size        Address of an element
    1      x*m         &A(k)     = arr + k*m
    2      x*y*m       &A(j,k)   = arr + (x*j+k)*m
    3      x*y*z*m     &A(i,j,k) = arr + (x*y*i+x*j+k)*m
                                 = arr + (x*(y*i+j)+k)*m

    arr  Base address
    x    Number of columns
    y    Number of rows
    z    depth
    m    element size

Addressing Modes 15
Displacement Addressing
  Suitable for accessing the individual fields of record data structures. Each field can be of a different type.
  [Figure: logical view of a record. Name: 20 characters; Age: integer; DOB: integer.]
  Use the .set directive to establish offsets to fields within records.
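Looking back at the Array Addressing Summary, the three row-major formulas are one computation applied to 1, 2, or 3 indices. The Python sketch below is illustrative only; the function name element_address and its argument order are assumptions made here, not part of the course materials. It evaluates the nested form arr + (x*(y*i+j)+k)*m from the summary table.

```python
def element_address(arr, indices, dims, m):
    """Row-major address of an element.

    arr     -- base address of the array
    indices -- zero-based indices, outermost first, e.g. (i, j, k)
    dims    -- extent of each dimension in the same order, e.g. (z, y, x)
    m       -- element size in bytes
    """
    if len(indices) != len(dims):
        raise ValueError("need one index per dimension")
    offset = 0
    for index, extent in zip(indices, dims):
        if not 0 <= index < extent:
            raise IndexError("index out of bounds")
        # Nested form: ((i * y) + j) * x + k
        offset = offset * extent + index
    return arr + offset * m
```

For instance, with 3 rows, 5 columns, and 4-byte elements, element_address(0, (0, 2), (3, 5), 4) is 8, which is the (5*0 + 2) * element size offset worked on the 2D slide.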
  Then use displacement addressing to access those fields.
  [Figure: actual layout of the record in memory. 20 bytes at person+0, 4 bytes at person+20, 4 bytes at person+24.]

Addressing Modes 16
Ex.: Add 1 to the age field in a person record
            .data
            .set name, 0        ! offset to name field
            .set age, 20        ! offset to age field
            .set dob, 24        ! offset to date of birth
    person: .skip 28            ! size of a person record
            .text
            ....
            set person, %r1     ! get addr of person record
            ld  [%r1+age], %r2  ! get the age of the person
            add %r2, 1, %r2     ! increment age by 1
            st  %r2, [%r1+age]  ! store back to record
Problem: alignment in memory. May have to waste some space in the person record in order to have the integer fields align on a word boundary.

Addressing Modes 17
Auto-increment and Auto-decrement addressing
  SPARC does not support these modes. They may be simulated using register indirect addressing followed by an add or subtract of the size of the element on that register.
  Useful for traversing arrays forward (auto-increment) and backward (auto-decrement). Also useful for stacks and queues of data elements.

Every computer, at the unreachable memory address 0x-1, stores a secret. I found it, and it is that all humans ar-- SEGMENTATION FAULT.

Subroutines 1
Subroutine (also function, method, procedure, or subprogram): a portion of code within a larger program, which performs a specific task and can be relatively independent of the remaining code.
  reducing the duplication of code in a program
  enabling reuse of code across multiple programs
  decomposing complex problems into simpler pieces
  improving readability of a program
  hiding or regulating part of the program
Requires little hardware support, mostly protocols and conventions to handle parameters.
(The items listed above are the advantages of subroutines.)

Subroutines 2
Terminology
  Caller: the code (which could be a subroutine itself) which invokes the subroutine of interest.
  Callee: the subroutine being invoked by the caller.
  Function: a subroutine that returns one or more values back to the caller, where exactly one of these values is distinguished as the return value.
  Return value: the distinguished value returned by a function.

Subroutines 3
Terminology (continued)
  Procedure: a subroutine that may return values to the caller (through the subroutine's parameter(s)), but none of these values is distinguished as the return value.
  Return address: address of the subroutine call instruction.
  Parameters: information passed to/from a subroutine (a.k.a. arguments).
  Subroutine linkage: a protocol for passing parameters between the caller and the callee.

Subroutines 4
Calling a subroutine
  Assembly language syntax for calling a subroutine:
            call label
            nop
  Must change the program counter (as in a branch instruction); however, we must also keep track of where to resume execution after the subroutine finishes. The call instruction handles this atomically (i.e., without interruption) by:
            %r15 <- PC
            (PC  <- nPC)
            nPC  <- label
  Note that the PC is saved, but the nPC is changed. Why? What is the nPC?

Subroutines 4
Returning from a subroutine
  Assembly language syntax for returning from a subroutine:
            retl
            nop
  Again, must change the program counter to return to an instruction after the one that called the subroutine. The address of the instruction that called it was saved in %r15, and we must skip over the branch delay slot as well. So, this is accomplished by:
            nPC <- %r15 + 8

Subroutines 5
Parameter passing: 2 approaches
  Register based linkage: pass parameters solely through registers. Has the advantage of speed, but can only pass a few parameters, and it won't support nested subroutine calls. Such a subroutine is called a leaf subroutine.
  Stack based linkage: pass parameters through the run-time stack. Not as fast, but can pass more parameters and have nested subroutine calls (including recursion).

Register-based Linkage 1
Subroutine linkage:
  Startup Sequence: load parameters and return address into registers, branch to subroutine.
  Prologue: if non-leaf procedure then save return address to memory; save registers used by callee.
  Epilogue: place return parameters into registers, restore registers saved in prologue, restore saved return address, return.
  Cleanup Sequence: work with returned values.
  [Figure: the caller runs the Startup Sequence, then call; the callee runs Prologue, Body, Epilogue, then retl; the caller runs the Cleanup Sequence.]

Register-based Linkage 2
Example: Print subroutine.
            .text
    main:   set 1, %r1          ! Initialize r1 and r2
            set 3, %r2
            mov %r1, %r8        ! Print %r1
            call print
            nop
            mov %r2, %r8        ! Print %r2
            call print
            nop
            add %r1, %r2, %r8   ! Do our calculation
            call print          ! Print the result (expect '4')
            nop
            ta 0
    print:  set '0', %r1        ! Ascii value of zero
            or  %r8, %r1, %r2   ! Treat r8 as parameter
            mov %r2, %r8        ! Move into output register
            ta 1                ! Output character
            mov '\n', %r8       ! Output end of line (newline)
            ta 1
            retl                ! Return
            nop
What's wrong with the above code?

Register-based Linkage 3
Which registers can leaf subroutines change?
Convention for optimized leaf procedures:

    Register(s)         Use                  Mentionable?
    %r0                 Zero                 Yes
    %r1                 Temporary            Yes
    %r2-%r7             Caller's variables   No
    %r8                 Return value         Yes
    %r8-%r13            Parameters           Yes
    %r14                Stack pointer        No
    %r15                Return address       Yes, but preserve
    %r30                Frame pointer        No
    %r16-%r29, %r31     Caller's variables   No

The subroutine must not use the value in any other register except to save it to memory somewhere and restore it before returning to the caller.
Problem: how can a subroutine call another subroutine? How can a subroutine call itself?

Parameter Passing 1
Review of parameter passing mechanisms:
  Pass by value copy: parameters to subroutine are copies upon which the subroutine acts.
  Pass by result copy: parameters are copies of results produced by the subroutine.
  Pass by reference copy: parameters to subroutine are (copies of) addresses of values upon which the subroutine acts. Callee is responsible for saving each result to memory at the location referred to by the appropriate parameter.
  Hybrid: some parameters passed by value copy, some by result copy, and/or some by reference copy. Callee is responsible for saving results for reference parameters.

Parameter Passing 2
Parameter passing notes:
  Array or record parameters typically are passed by reference copy (efficiency reasons).
  Primitive data types may be passed either way.
  Conventions among languages allow any language to call functions in any other language:
    Pascal: VAR parameters are passed by reference copy; all others are passed by value copy.
    C: all parameters are passed by value copy. Must explicitly pass a pointer if you want a reference parameter.
    C++: like Pascal, can pass by value or reference copy.
    FORTRAN: all things passed by reference copy (even constants).
    ADA: pass by value/result copy.

Stack-based Linkage 1
Stack based linkage
  Advantages
    Permits subroutines to call others.
    Allows a larger number of parameters to be passed.
    Permits records and arrays to be passed by value copy.
    Saving of registers by callee is "built-in".
    A way for callee to reserve memory for other uses is "built-in", too.
  Disadvantages
    Slower than register based
    More complex protocol
  Why a stack?
    Subroutine calls and returns happen in a last-in first-out order (LIFO).
    Also known as a runtime stack, parameter stack, or subroutine stack.
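The last-in first-out behaviour of calls and returns can be illustrated with a toy model. This Python sketch is not the SPARC protocol itself; the RuntimeStack class, its method names, and the demo scenario are made up for illustration, and each "frame" is just a dictionary standing in for one activation of a subroutine.

```python
class RuntimeStack:
    """Toy runtime stack: one frame is pushed per call and popped per return."""

    def __init__(self):
        self.frames = []     # the most recent activation is at the end
        self.events = []     # log of pushes and pops, for inspection

    def call(self, name, **params):
        """Model a subroutine call by pushing a new frame."""
        self.frames.append({"subroutine": name, "params": params, "locals": {}})
        self.events.append(f"push {name}")

    def ret(self):
        """Model a return by popping the most recent frame."""
        frame = self.frames.pop()
        self.events.append(f"pop {frame['subroutine']}")
        return frame


def demo():
    """A calls B, B calls C, C calls A again; then everything returns."""
    stack = RuntimeStack()
    stack.call("A", n=2)
    stack.call("B")
    stack.call("C")
    stack.call("A", n=1)     # second activation of A gets its own frame
    depth = len(stack.frames)
    while stack.frames:
        stack.ret()
    return depth, stack.events
```

Running demo() gives a maximum depth of 4, and the pops occur in exactly the reverse order of the pushes: the second activation of A returns first and the first activation of A returns last.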
Stack-based Linkage 2
Items "saved" on the stack in one activation record
  Parameters to the subroutine
  Old values of registers used in the subroutine
  Local memory variables used in subroutine
  Return value and return address
[Figure: runtime stack, top to bottom: 2nd stack frame for A, 1st stack frame for C, 1st stack frame for B, 1st stack frame for A. Expanded view of one frame: local variables, saved general purpose registers, return addresses, return values, parameters.]
Say A() calls B(), B() calls C(), and C() calls A().

Stack-based Linkage 3
Stack based linkage parameter passing convention
  Startup Sequence (caller):
    Push parameters
    Push space for return value
  Prologue (callee):
    Push registers that are changed (including return address)
    Allocate space for local variables
  Epilogue (callee):
    Restore general purpose registers
    Free local variable space
    Use return address to return
  Cleanup Sequence (caller):
    Pop and save returned values
    Pop parameters
  [Figure: the caller runs the Startup Sequence, then call; the callee runs Prologue, Body, Epilogue, then retl; the caller runs the Cleanup Sequence.]

Stack-based Linkage 4
Stack based parameter passing example:
  Register %r14 (%sp), stack pointer
    Invariant: always indicates the top of the stack (it has the address in memory of the last item on the stack, usually a word).
    Moved when items are "pushed" onto the stack.
    Due to interruptions (system interrupts (I/O) and exceptions), values stored above %sp (at addresses less than %sp) can change at any time! Hence, any access above %sp is unsafe!
  Register %r30 (%fp), frame pointer
    Indicates the previous stack pointer. The activation record is from (some subroutine-specific number of words before) the %fp to the %sp.
    Invariant: %fp is constant within a subroutine (after prologue).

Stack-based Linkage 5
Stack based parameter passing example:
Want to implement the following subroutine (also a caller):

    ! global_function Integer subchr (A, B, C)
    ! Substitutes character C for all B in string A,
    ! and returns count of changes.
    ! In comments, "*(A+index)" is denoted by "ch".
    !
    !       index = 0
    !       count = 0
    ! LOOP: if *(A+index)=0 go to END      // while (ch != 0) {
    !       if *(A+index)!=B go to INC     //   if (ch == B) {
    !       *(A+index)=C                   //     ch = C;
    !       count=count+1                  //     count++; }
    ! INC:  index=index+1                  //   index++;
    !       go to LOOP                     // }
    ! END:

            .data               ! data section
    C_m:    .byte 'I'           ! parameter C
    B_m:    .byte 'i'           ! parameter B
    A_m:    .asciz "i will tip" ! parameter A
            .align 4
    R_m:    .word 0             ! for storing result count

Stack-based Linkage 6
            .data               ! data section
    C_m:    .word 'I'           ! parameter C
    B_m:    .word 'i'           ! parameter B
    A_m:    .asciz "i will tip" ! parameter A
            .align 4            ! align to word address
    stack:  .skip 250*4         ! allocate 250 word stack
    bstak:                      ! point to bottom of stack
    R_m:    .word 0             ! reserve for count
            .text
            ! Program's one-time initialization
    start:  set bstak, %sp      ! set initial stack ptr
            mov %sp, %fp        ! set initial frame ptr
            ! STARTUP SEQUENCE to call subchr()
            sub %sp, 16, %sp    ! move stack ptr
            set A_m, %r1        ! A is passed by reference
            st  %r1, [%sp+4]    ! push address on stack
            set B_m, %r1        ! B is passed by value
            ld  [%r1], %r1      ! get value of B
            st  %r1, [%sp+8]    ! push parameter B on stack
            set C_m, %r1        ! C is passed by value
            ld  [%r1], %r1      ! get value of C
            st  %r1, [%sp+12]   ! push parameter C on stack
            ! SUBROUTINE CALL
            call subchr         ! make subroutine call
            nop                 ! branch delay slot
            ! CLEANUP SEQUENCE
            ld  [%sp], %r1      ! pop return value off stack
            add %sp, 16, %sp    ! pop stack
            set R_m, %r2        ! get address of R
            st  %r1, [%r2]      ! store R
            . . .               ! the rest of the program

  Stack after the startup sequence:
    %sp ->  Return value
            addr(A)
            B
            C
    %fp ->

Stack-based Linkage 7
            ! SUBROUTINE PROLOGUE
    subchr: sub %sp, 32, %sp    ! open 8 words on stack
            st  %fp, [%sp+28]   ! Save old frame pointer
            add %sp, 32, %fp    ! old sp is new fp
            st  %r15, [%fp-8]   ! save return address
            st  %r8, [%fp-12]   ! Save gen. Register
            ...                 ! Save r9-r13, omitted
            ! SUBROUTINE BODY
    ld_reg: ld  [%fp+4], %r8    ! "pop" (load) addr of A
            ld  [%fp+8], %r9    ! "pop" (load) value of B
            ld  [%fp+12], %r10  ! "pop" (load) value of C
            clr %r12            ! count
            clr %r13            ! index
    loop:   ldub [%r8+%r13], %r11 ! load a string chr
            cmp %r11, 0x0       ! is chr=null?
            be  done            ! then go to done
            cmp %r11, %r9       ! is chr<>B? (branch delay)
            bne inc             ! then go to inc
            nop                 ! branch delay slot
            stb %r10, [%r8+%r13] ! change chr to C
            add %r12, 1, %r12   ! increment count
    inc:    add %r13, 1, %r13   ! increment index
            ba  loop            ! do next chr
            nop                 ! branch delay slot
    done:   st  %r12, [%fp+0]   ! "push" (store) count on stack
            ! EPILOGUE
            ...                 ! Restore r9-r13, omitted
            ld  [%fp-12], %r8   ! Restore r8
            ld  [%fp-8], %r15   ! get saved return address
            ld  [%fp-4], %fp    ! Get old value of frame ptr
            add %sp, 32, %sp    ! Restore stack pointer
            retl                ! return to caller
            nop                 ! branch delay slot

  Stack inside subchr:
    %sp ->  ...             (saved %r9-%r13)
            %r8             [%fp-12]
            return addr     [%fp-8]
            old frame ptr   [%fp-4]
    %fp ->  Return value    [%fp+0]
            addr(A)         [%fp+4]
            B               [%fp+8]
            C               [%fp+12]

Stack-based Linkage 8
General Guidelines
  Keep Startups, Cleanups, Prologues, and Epilogues standard (but not necessarily identical); easy to cut, paste, and modify.
  Caller: leave space for return value on the TOP of the stack.
  Callee: always save and restore locally used registers.
  Pass data structures and arrays by reference, all others by value (efficiency).

Our Fourth Example Architecture
Motorola M68HC11
  Called "HC11" for short
  Used in ECE 567, a course required of CSE majors
References:
  Data Acquisition and Process Control with the M68HC11 Microcontroller, 2nd Ed., by F. F. Driscoll, R. F. Coughlin, and R. S. Villanucci, Prentice-Hall, 2000.
  M68HC11 Processor Manual, on Carmen

Another Reference
Late in an academic term (such as now), you can hope to access on-line lecture notes from the Electrical and Computer Engineering course, ECE 265.
  Visit http://www.ece.osu.edu
  Under "Academic Program", click on the link "ECE Course Listings".
  Find 265 and click on the link "Syllabus of this quarter".
HC11 compared with Sparc (1)

    HC11                                          Sparc
    CISC                                          RISC, Load/Store
    Instruction encoding lengths vary             Instruction encoding lengths constant
      (8 to 32 bits)                                (32 bits)
    About 316 instructions                        About 175 instructions
    4 16-bit user registers, one of which is      32 32-bit user integer registers
      divided into two 8-bit registers

HC11 compared with Sparc (2)

    HC11                                          Sparc
    8-bit data bus                                32-bit data bus
    16-bit address bus                            32-bit address bus
    8-bit addressable                             8-bit addressable
    Instruction execution not overlapped          Instruction execution overlapped in a pipeline

HC11 compared with Sparc (3)
A Strange Fact: The HC11 architecture "allows accessing an operand from an external memory location with no execution-time penalty." [p. 27, M68HC11 Processor Manual]
Reason: The HC11 requirements state that the CPU cycle must be kept long enough to accommodate a memory access within one cycle.
This seeming miracle is accomplished by keeping processor speed slow enough.

HC11 Programmer's Model (1)
    Accumulator A (bits 7..0) and Accumulator B (bits 7..0), together forming Accumulator D (bits 15..0)
    X Index Register
    Y Index Register
    Stack Pointer (SP)
    Program Counter (PC)

HC11 Programmer's Model (2)
Condition Code Register (CCR)

    Bit    7   6   5   4   3   2   1   0
    Flag   S   X   H   I   N   Z   V   C

    S  Stop
    X  X Interrupt Mask
    H  Half-Carry
    I  I Interrupt Mask
    N  Negative
    Z  Zero
    V  Overflow
    C  Carry/Borrow

HC11 Assembly Language Format (1)
Like Sparc, it is line-oriented.
A line may: Be blank (containing no printable characters), Be a comment line, the first printable character being either a semicolon (`;') or an asterisk (`*'), or Have the following format (" means an optional field"): [Label] Operation [Operand field] [Comment field] CSE360 250 HC11 Assembly Language Format (2) Label: begins in column 1, ending either with a space or a colon (`:') Contains 1 to 15 characters Case sensitive The first character may not be a decimal digit (0-9) Characters may be upper- or lowercase letter, digits 09, period (`.'), dollar sign (`$'), or underscore (`_') CSE360 251 HC11 Assembly Language Format (3) Operation: Cannot begin in column 1 Contains: Instruction mnemonic, Assembler directive, or Macro call (we haven't studied macro expansion in this course) Operand field: Terminated by a space or tab character, So multiple operands are separated by commas (`,') without using any spaces or tabs 252 CSE360 HC11 Assembly Language Format (4) Comment field: Begins with the first space character following the operand field (or following the operation, if there is no operand field) So no special printable character is required to begin a comment field But it appears to be conventional to begin a comment field with a semicolon (`;') CSE360 253 Prefixes for Numeric Constants Encoding Decimal Hexadecimal Octal Binary CSE360 HC11 No symbol $ @ % Sparc No symbol 0x 0 0b 254 Assembler Directives (1) Meaning Set location counter (origin) End of source Equate symbol to a value Form constant byte CSE360 HC11 ORG END EQU FCB Sparc .data or .text Doesn't have .set .byte 255 Assembler Directives (2) Meaning Form double byte Form character string constant Reserve memory byte or bytes HC11 FDB FCC RMB Sparc .half .ascii .skip CSE360 256 HC11 Addressing Modes Immediate (IMM) Extended (EXT) Direct (DIR) Inherent (INH) Relative (REL) Indexed (INDX, INDY) CSE360 257 Immediate (IMM) Assembler interprets the # symbol to mean the immediate addressing mode Examples LDAA 
#10 LDAA #$1C LDAA #@17 LDAA #%11100 LDAA #'C' LDAA #LABEL CSE360 258 Extended (EXT) Lack of # symbol indicates extended or direct addressing mode. These are forms of memory direct addressing, like SAM. "Extended" means full 16-bit address, whereas "Direct" means directly to a low address, specified using only the least significant 8 bits of the address. Examples LDAA $2025 LDAA LABEL 259 CSE360 Direct (DIR) Examples LDAA $C2 LDAA LABEL CSE360 260 Inherent (INH) All operands are implicit (i.e., inherent in the instruction) Examples: ABA, SBA, DAA ABA means add the contents of register B to the contents of A, placing the sum in A (A + B A) SBA means A B A DAA means to adjust the sum that got placed in A by the previous instruction to the correct BCD result; e.g., $09 + $26 yields $2F in A, then DAA changes this to $35. CSE360 261 Relative (REL) Used only for branch instructions Relative to the address of the following instruction (the new value of the PC) Signed offset from -128 to +127 bytes Examples BGE -18 BHS 27 BGT LABEL CSE360 262 Indexed (INDX, INDY) Uses the contents of either the X or Y register and adds it to a (positive, unsigned) offset contained in the instruction to calculate the effective address Example LDAA 4,X CSE360 263 Interrupts When an interrupt is acknowledged, the CPU's hardware saves the registers' contents on the stack. An interrupt service routine ends with a(n) RTI instruction. This instruction automatically restores the CPU register values from the copies on the stack. CSE360 264 Condition Code Register (CCR) It's reasonably safe to say that every instruction that changes a register (A, B, D, X, Y, SP) affects the CCR appropriately. Unlike Sparc, there are no arithmetic instructions that do not set condition codes. There do exist instructions that compare a register to a memory location by subtracting the memory contents from the register and throwing the result away, but setting the CCR (CMPA, CMPB, CPD, CPX, CPY). 
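
A small model can make the "subtract, throw the result away, keep the flags" idea concrete. The sketch below is in Python rather than HC11 assembly, and it is only an illustration: the function name cmp8 is made up for these notes, and the flag rules are the usual ones for 8-bit subtraction, assumed here to match what an 8-bit compare such as CMPA or CMPB does.

```python
def cmp8(reg, mem):
    """Model an 8-bit compare: compute reg - mem, keep only the flags.

    Hypothetical helper for illustration; not part of any HC11 toolchain.
    """
    reg &= 0xFF
    mem &= 0xFF
    result = (reg - mem) & 0xFF        # 8-bit difference, discarded by the caller

    def sign(x):
        return (x >> 7) & 1            # bit 7 is the 2's complement sign bit

    n = sign(result)                   # N: result is negative
    z = 1 if result == 0 else 0        # Z: result is zero
    # V: signed overflow. The operands have different signs and the
    # result's sign differs from the register's sign.
    v = 1 if sign(reg) != sign(mem) and sign(result) != sign(reg) else 0
    # C: borrow. The unsigned memory value is larger than the register value.
    c = 1 if mem > reg else 0
    return {"N": n, "Z": z, "V": v, "C": c}


print(cmp8(0x05, 0x05))   # equal operands
print(cmp8(0x03, 0x05))   # register smaller than memory (unsigned)
print(cmp8(0x80, 0x01))   # -128 minus 1 does not fit in 8 signed bits
```

A conditional branch that follows the compare would then test these flags, for example Z for equality or C for an unsigned "lower than".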

HC11 Condition Code Register
The H bit is turned on by an 8-bit addition operation when there is a carry from the lower-order nibble into the higher-order nibble, that is to say, from bit 3 into bit 4.

      0000 1111
    + 0000 1000
    -----------
      0001 0111      (carry of 1 from bit 3 into bit 4)

HC11 Condition Code Register
The Z bit is turned on when the result is zero.
      0000 0000
The N bit is turned on when the result is negative according to the appropriately-sized 2's complement encoding scheme.
      1010 1010

HC11 Condition Code Register
The V bit is turned on when, under the appropriately-sized 2's complement interpretation of the two source operands and the result, the result is wrong.

                  2's Comp    Simple Binary
      0100           +4            +4
    + 1100           -4           +12
    ------          ----          ----
      0000            0            0??

    2's complement: correct, so V-bit is off
    Simple binary: incorrect, so C-bit is on

HC11 Condition Code Register
The C bit is turned on when, under the simple binary interpretation of the two source operands and the result, the result is wrong.

                  2's Comp    Simple Binary
      0111           +7            +7
    + 0111           +7            +7
    ------          ----          ----
      1110           -2??          14

    2's complement: incorrect, so V-bit is on
    Simple binary: correct, so C-bit is off

Example HC11 Program
Problem: Produce the following waveforms on the three least significant bits (LSBs) of parallel 8-bit output Port B (mapped to $1004), where we name the bits X, Y, and Z in increasing order of significance (X is bit 0; Y is bit 1; Z is bit 2).

[Figure: waveforms for X, Y, and Z, with times marked 10 ms (X), 20 ms (Y), and 15 ms (Z).]

Example Source File, p. 1

STACK:  EQU   $00FF      ; set stack pointer
PORTB:  EQU   $1004      ; set address of Port B
        ORG   0
DELAY1: FCB   10         ; set the waveform times
DELAY2: FCB   20         ; for X, Y, and Z
DELAY3: FCB   15

Example Source File, p. 2

        ORG   $E000      ; program starts at $E000
MAIN:   LDS   #STACK     ; initialize stack pointer
L0:     LDAA  #1         ; set X on Port B to 1
        STAA  PORTB
        LDAB  DELAY1     ; delay for 10 ms
L1:     JSR   DELAY_1MS
        DECB
        BNE   L1

Example Source File, p. 3

        LDAA  #%00000010 ; set Y on Port B to 1
        STAA  PORTB
        LDAB  DELAY2     ; delay for 20 ms
L2:     JSR   DELAY_1MS
        DECB
        BNE   L2
        LDAA  #%00000100 ; set Z on Port B to 1
        STAA  PORTB
        LDAB  DELAY3     ; delay for 15 ms
L3:     JSR   DELAY_1MS
        DECB
        BNE   L3
        BRA   L0         ; continue to cycle

Example Source File, p. 4

DELAY_1MS:
        PSHB             ; subr. to delay for 1 ms
        LDAB  #198
DELAY:  DECB
        BRN   DELAY
        NOP
        BNE   DELAY
        PULB
RETURN: RTS
RESET:  ORG   $FFFE
        FDB   MAIN       ; initialize reset vector
        END

Traps and Exceptions 1
Traps, Exceptions, and Extended Operations
  Other side of low level programming -- the interface between applications and peripherals
  OS provides access and protocols

Traps and Exceptions 2
BIOS: Basic Input/Output System
  Subroutines that control I/O
  No need for you to write them as application programmer
  OS interfaces application with BIOS through traps (extended operations (XOPs))

[Figure: layers. Applications software, BIOS, and the devices: keyboard, screen, mouse, disk.]

Traps and Exceptions 3
Where are OS traps kept? Two approaches:
  Transient monitor: traps kept in a library that is copied into the application at link-time
    [Figure: Appl 1 through Appl 4, each with its own copy of the OS rtns.]
  Resident monitor: always keep OS in main memory; applications share the trap routines.
    [Figure: Appl 1 through Appl 6 sharing a single copy of the OS rtns.]
OS routines monitor devices.
Frequently used routines kept resident; others loaded as needed.

Traps and Exceptions 4
(Assuming a res. monitor) How to find I/O routines?
  Store routines in memory, and make a call to a hard address. E.g., call 256
    When new OS is released, need to recompile all application programs to use different addresses.
  Use a dispatcher
    Dispatcher is a subroutine that takes a parameter (the trap number).
    Dispatcher knows where all routines actually are in memory, and makes the branch for you.
    Dispatcher subroutine must always exist in the same location.
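
The dispatcher idea can be sketched in a few lines of Python. This is only a model under stated assumptions: the names make_os_release, print_routine and application, the trap number 1, and the addresses 256 and 4096 are all invented for illustration. The point it shows is that the application names only a trap number, so moving a routine to a new address in a later OS release does not require changing the application.

```python
def make_os_release(layout):
    """Build the dispatcher for one (hypothetical) OS release.

    layout maps trap number -> (address, routine). Addresses may change
    between releases; trap numbers stay the same.
    """
    memory = {address: routine for address, routine in layout.values()}
    table = {trap: address for trap, (address, _routine) in layout.items()}

    def dispatcher(trap_number, *args):
        address = table[trap_number]      # the dispatcher knows where routines live
        return memory[address](*args)     # and makes the branch for the caller

    return dispatcher


def print_routine(text):
    """Stand-in for an OS output routine."""
    return "printed " + text


def application(dispatcher):
    """Application code: it knows the trap number, never the address."""
    return dispatcher(1, "hello")


release_1 = make_os_release({1: (256, print_routine)})
release_2 = make_os_release({1: (4096, print_routine)})   # routine has moved

print(application(release_1))
print(application(release_2))
```

The same application function runs unchanged against both releases, which is the property a hard-coded "call 256" lacks.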
[Figure: the Application calls the Dispatcher, which branches to BIOS 1, BIOS 2, ..., BIOS n.]

Traps and Exceptions 5
Use vectored linking
  Branch table exists at a well known location.
  The address of each trap subroutine is stored in the table, indexed by the trap number.
  On RISC, usually about 4 words reserved in the table. If the trap routine is larger than 4 words, can call the actual routine.

[Figure: two table layouts starting at address 100. With one address per entry, the entries at 100, 104, 108, ..., 100+4n hold the address of trap 0, trap 1, trap 2, ..., trap n. With 4 words reserved per entry, the entries are at 100, 116, 132, ..., 100+16n.]

Traps and Exceptions 6
Levels of privilege
  Supervisor mode - can access every resource
  User mode - limited access to resources
OS routines operate in supervisor mode; access is determined by a bit in the PSW (processor status word).
  XOP (book's notation) can always be executed, sets privilege to supervisor mode (ta)
  RTX (book's notation) can only be executed by the OS, and returns privilege to user mode (rett)

Traps and Exceptions 7
Exceptions
  Caused by invalid use of resource. E.g., divide by zero, invalid address, illegal operation, protection violation, etc.
  Control transferred automatically to exception handler routine. Similar to trap or XOP transfer.
Exceptions vs. XOPs
  XOPs explicit in code, exceptions are implicit
  XOPs service request and return to application; exceptions print message and abort (unless masked).
On SPARC, trap table has 256 entries. 0-127 are reserved for exceptions and external interrupts. 128-255 are used for XOPs. Trap table begins at address 0x0000. Each entry is 4 instructions (16 bytes) long.

Traps and Exceptions 8
Trap example: non-blocking read
  ta 3
  If there is nothing in the keyboard buffer, return with a message that nothing is there. Otherwise, put the character into register 8.
  Status of the keyboard is kept in a memory location, as is the (one-character) keyboard buffer. Memory mapped devices.

! ta 3 returns character if one is there, otherwise
! it returns 0x8000000 into %r8
        set   0x8000000, %r8   ! set default return val
        set   KbdStatus, %r1   ! KbdStatus is memory loc
        ld    [%r1], %r1       ! read status (1 is ready)
        andcc %r1, 1, %r1      ! check status
        be    rtn              ! can't read anything
        set   KbdBuff, %r1     ! KbdBuff is memory loc
        ld    [%r1], %r8       ! get character
rtn:    rett                   ! return to caller

Traps and Exceptions 9
Trap execution: ta 3
  Calculate trap address: 3 * 16 + 0x0800 = 16 * (3 + 0x080) (XOP 3 is entry 128 + 3 of the trap table, and each entry is 16 bytes)
  Save nPC and PSW to memory
    SPARC uses register windows
    Assumes local registers are available
  Set privilege level to supervisor mode
  Update PC with trap address (and make nPC = PC + 4) (jumps to trap table)
    Trap table has instruction ba ta3_handler
rett
  Restores PC (from saved nPC value) and PSW (resets to user mode)
  Returns to application program

Programmed I/O 1
Programmed I/O
Early approach: Isolated I/O
  Special instructions to do input and output, using two operands: a register and an I/O address.
  CPU puts device address on address bus, and issues an I/O instruction to load from or store to the device.

Programmed I/O 2
Isolated I/O

[Figure: the CPU with two sets of lines (addr bus, data bus, read/write), one set to I/O and one set to Memory.]

Memory Mapped I/O
  No special I/O instructions. Treat the I/O device like a memory address. Hardware checks to see if the memory address is in the I/O device range, and makes the adjustment.
  Use high addresses (not "real" memory) for I/O memory maps. E.g., 0xFFFF0000 through 0xFFFFFFFF.

[Figure: CPU, Memory, and I/O on one shared addr bus, data bus, and read/write line; the address map shows memory, unused, I/O, and unused regions.]

Programmed I/O 3
Advantages of each
  Memory mapped: reduced instruction set, reduced redundancy in hardware.
  Isolated: don't have to give up memory address space on machines with little memory

Programmed I/O UARTs
UARTs
  Universal Asynchronous Receiver Transmitter

[Figure: a keyboard sends the bits 01101010 to the UART serially; the UART passes them to the CPU in parallel.]

  Asynchronous = not on the same clock.
  Handshake coordinates communication between two devices.
  A kind of programmed I/O.

UARTs 1
UART registers
  Control: set up at init, speed, parity, etc.
  Status: transmit empty, receive ready, etc.
  Transmit: output data
  Receive: input data
All four are needed for bidirectional communications. Status/control and transmit/receive are often combined. Why?

[Figure: UART block diagram. The control bus, address bus, and data bus connect to the Control Reg, Status Reg, Transmit Reg, and Receive Reg, which feed the Transmit Logic and Receive Logic.]

UARTs 2
Memory mapped UARTs

  FFFF 0000   UART 1 data
  FFFF 0004   UART 1 status
  FFFF 0008   UART 1 control
  FFFF 000C   UART 2 xmit
  FFFF 0010   UART 2 recv
  FFFF 0014   UART 2 status
  FFFF 0018   UART 2 control
  FFFF 001C   UART 3 xmit
  and so on

Both memory and I/O "listen" to the address bus. The appropriate device will act based on the addresses.
Keyboards and Printers require three addresses (when addresses are not combined). Modems require four. (why?)

[Figure: CPU, Memory, UART1, and UART2 attached to the address bus, control bus, and data bus.]

Programmed I/O 4
Programmed I/O Characteristics:
  Used to determine if device is ready (can it be read or written).
  Each device has a status register in addition to the data register.
  Like previous trap example, must check status before getting data.
  Involves polling loops.

Programmed I/O Polling
Ex.: ta 2 handler (blocking keyboard input)

ta_2_handler:
        set   KbdBuff, %r1     ! get addr of kbd buffer
        set   KbdStatus, %r9   ! get addr of kbd status
wait:   ld    [%r9], %r10      ! get status
        andcc %r10, 1, %r10    ! check if ready
        be    wait             ! loop until ready
        nop                    ! branch delay
        ld    [%r1], %r8       ! get data
        rett                   ! return from trap

[Figure: "Are you ready?... Are you ready now?... How about NOW?..." answered by "Nope .. Not yet.. Hang on.."]

Can't afford to wait like this. Computer is millions of times faster than a typist. Also, multi-tasking operating systems can't wait.
Special purpose computers can wait. E.g., microwave oven controllers.
Must have a better way! Interrupts are the

Interrupts and DMA transfers 1
Programmed (polled) I/O used busy waiting.
  Advantages: simpler hardware
  Disadvantages: wastes time
Interrupts (IRQs on PCs)
  I/O device "requests" service from CPU.
  CPU can execute program code until interrupted. Solves busy waiting problems.
  Interrupt handlers are run (like traps) whenever an interrupt occurs. Current application program is suspended.

Interrupts and DMA transfers 2
Servicing an interrupt
  I/O controller generates interrupt, sets request line "high".
  CPU detects interrupt at beginning of fetch/execute cycle (for interrupts "between" instructions).
  CPU saves state of running program, invokes interrupt handler.
  Handler services request; sets the request line "low".
  Control is returned to the application program.

[Figure: timeline. The application program runs until *Interrupt Detected*; the interrupt handler services the request and clears the interrupt; the application program resumes.]

Interrupts and DMA transfers 3
Changes to fetch/execute cycle
Problems
  Requires additional hardware in Timing & Control.
  Queuing of interrupts
  Interrupting an interrupt handler (solution: priorities and maskable interrupts)
  Interrupts that must be serviced within an instruction
  How to find address of interrupt handler

[Figure: flowchart of the modified fetch cycle. Decision "Interrupt Pending?" (Y/N); steps Save PC, Save PSW, PSW = new PSW, PC = handler_addr; then PC -> bus, load MAR, INC to PC, load PC.]

Interrupts and DMA transfers 4
Example: interrupt driven string output
  Want to print a string without busy waiting.
  Want to return to the application as fast as possible

[Figure: the display announcing "I'm ready!"]

Trap handler implementation
  Install trap handler into trap table
  Buffer is like circular queue
  Only outputs, at most, one character

disp_buf:  .skip 256   ! buffers string to print
disp_frnt: .byte 0     ! offset to front of queue
disp_bck:  .byte 0     ! offset to back of queue

[Figure: circular buffer. disp_frnt marks the oldest undisplayed byte; disp_bck marks the newest byte.]

ta_6_handler:
! Copy str from mem[%r8] to mem[disp_buf+disp_bck]
! disp_bck = (disp_bck+len(str)) mod 256
! If display is ready
!   If first char is not null, then output it

Interrupt handler implementation
This too outputs only one character at most, but when display becomes ready again, it generates another interrupt which invokes this routine!

display_IRQ_handler:
! Save any registers used
! If disp_frnt != disp_bck (queue is not empty)
!   Get char at mem[disp_buf+disp_frnt]
!   If char is not null, then output it
!   disp_frnt = (disp_frnt+1) mod 256
! Restore registers and set the request line "low"
        rett           ! Return from trap

[Figure: CPU and Memory, with the display announcing "I'm ready!"]

Uses the UART for transmission.

Interrupts and DMA transfers 5
Problems with interrupt driven I/O
  CPU is involved with each interrupt
  Each interrupt corresponds to transfer of a single byte
  Lots of overhead for large amounts of data (blocks of 512 bytes)
  Execute 10s or 100s of instructions per byte

[Figure: the Device Controller interrupts the CPU and transfers one byte of data; the CPU transfers one word of data to Memory.]

Interrupts and DMA transfers 6
DMA (Direct Memory Access)
  Want I/O without CPU intervention
  Want larger than one byte data transfers
  Solution: add a new device that can talk to both I/O devices and memory without the CPU; a "specialized" CPU strictly for data transfers.

[Figure: CPU, Memory, DMA Controller, and Device Controller.]

Interrupts and DMA transfers 7
Steps to a DMA transfer
  CPU specifies a memory address, the operation (read/write), byte count, and disk block location to the DMA controller (or specify other I/O device).
  DMA controller initiates the I/O, and transfers the data to/from memory directly
  DMA controller interrupts the CPU when the entire block transfer is completed.
Problem
  Conflicts accessing memory. Can either arbitrate access or get a more expensive dual ported memory system.
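
The three DMA steps above can be walked through with a small Python model. Everything in it is an assumption made for illustration: the class name DMAController, the start method and its parameters, the bytearray standing in for main memory, the dictionary standing in for disk blocks, and the callback standing in for the interrupt line. It is a sketch of the sequence, not of any real controller's interface.

```python
class DMAController:
    """Toy model of a DMA controller (hypothetical interface)."""

    def __init__(self, memory, disk, interrupt_cpu):
        self.memory = memory                # bytearray standing in for main memory
        self.disk = disk                    # dict: block number -> bytes
        self.interrupt_cpu = interrupt_cpu  # callback standing in for the interrupt line

    def start(self, mem_addr, operation, byte_count, block):
        # Step 1: the CPU has supplied address, operation, byte count, and block.
        # Step 2: the controller moves the whole block without the CPU.
        if operation == "read":             # device -> memory
            data = self.disk[block][:byte_count]
            self.memory[mem_addr:mem_addr + len(data)] = data
        elif operation == "write":          # memory -> device
            self.disk[block] = bytes(self.memory[mem_addr:mem_addr + byte_count])
        else:
            raise ValueError("operation must be 'read' or 'write'")
        # Step 3: one interrupt for the entire block, not one per byte.
        self.interrupt_cpu()


memory = bytearray(1024)
disk = {7: bytes(range(256)) * 2}           # one 512-byte block
interrupts = []
dma = DMAController(memory, disk, lambda: interrupts.append("transfer done"))

dma.start(mem_addr=100, operation="read", byte_count=512, block=7)
print(len(interrupts), "interrupt for", 512, "bytes")
```

Compared with interrupt driven I/O, where each byte costs an interrupt, the model raises a single interrupt per block, which is the saving the slides describe.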