EE357-Nazarian-Fall09-Lab3-fp


EE 357 Lab 3 – FP Emulation
Fall 2009

1 Introduction

You will work in teams of one or two to write an assembly routine that adds 32-bit IEEE 754 single-precision numbers. The assembly routine will be embedded in a C program.

2 What you will learn

This lab is intended to teach you how to embed assembly code inside a C file. It will also demonstrate how assembly provides access to features of a particular ISA that cannot be reached from C. Finally, floating point representation and arithmetic will be reinforced.

3 Background Information and Notes

IEEE 754 Floating Point

The 32-bit IEEE single-precision floating point representation is shown below:

    Bits:    1      8                        23
    Field:   Sign   Exponent (Excess-127)    Fraction

    Figure 1 – IEEE 754 Single-Precision Format

Many processors (especially low-end embedded cores) do not include floating point HW, instead emulating the operations in SW if needed. You will write a program to take in two single-precision floating point numbers and add them using only integer operations/instructions. To reduce complexity, we will assume that no special values of FP numbers (i.e. 0, inf., NaN, etc.) will be input or be the result of an operation.

Emulating FP addition of these numbers requires isolating the different fields into separate variables and manipulating them. Each field can be isolated with appropriate AND and shift instructions. For example, the sign bit can be isolated and moved to the LSB (yielding a variable "signa" = 0 for pos. and 1 for neg.) as follows:

    move.l  a,d0
    andi.l  #0x80000000,d0   ; Set all bits but MSB to 0
    move.l  #31,d1
    lsr.l   d1,d0            ; Shift sign bit into LSB position
    move.l  d0,signa

To implement FP addition, you should review the algorithm given in class and attempt to implement it in assembly. The code skeleton has allocated several temporary variables that we recommend you produce along the way.
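The sign-bit isolation described in this section extends to the other two fields with different masks and shift counts. The following is a minimal C sketch of that idea, not the lab's required assembly. The type `fp_fields` and the function `fp_unpack` are hypothetical names introduced here for illustration; they are not part of the lab skeleton. The masks follow the 1/8/23 field widths of Figure 1.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type for illustration; not part of the lab skeleton. */
typedef struct {
    uint32_t sign;  /* 0 for positive, 1 for negative, held in the LSB   */
    uint32_t exp;   /* 8-bit Excess-127 exponent in the LS 8 bits        */
    uint32_t frac;  /* raw 23-bit fraction field, implied 1 not yet added */
} fp_fields;

/* Hypothetical helper: split a float into its three IEEE 754 fields
 * using only integer AND and shift operations. */
static fp_fields fp_unpack(float f)
{
    uint32_t bits;
    fp_fields out;

    memcpy(&bits, &f, sizeof bits);          /* copy the raw bit pattern   */
    out.sign = (bits & 0x80000000u) >> 31;   /* keep MSB, move it to LSB   */
    out.exp  = (bits & 0x7F800000u) >> 23;   /* keep 8 exponent bits       */
    out.frac =  bits & 0x007FFFFFu;          /* keep 23 fraction bits      */
    return out;
}
```

For example, -6.5 is -1.625 x 2^2, so `fp_unpack(-6.5f)` yields sign 1, exponent 129 (2 + 127), and fraction 0x500000.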
An explanation of these variables is provided below. You can add more variable declarations for other temporary values if you so desire.

    Temp. Variable Name       Description
    signa, signb              Sign of input a and b in LSB
    expa, expb                8-bit, Excess-127 exponents in LS 8 bits
    fraca, fracb              Fraction fields with implied '1' added. SEE EXPLANATION BELOW FOR FORMAT
    ageb                      1 if entire value (exp + frac) of A >= B, 0 otherwise
    signg, expg, fracg        Sign, exponent, and fraction of larger/greater number
    fracl                     Fraction of smaller/lesser number
    exp_abs_diff              Absolute value of difference between exponents (used for shift count of lesser fraction)
    opsub                     1 for subtraction of fractions, 0 for addition
    fracsum                   Fraction result of addition/subtraction
    frac_norm, exp_norm       Normalized fraction and exponent
    frac_round, exp_round     Rounded fraction and exponent
    result                    Final output; reassembled from all fields

    Table 1 - Temporary Values and Descriptions

Some more explanation of the representation of the fractions is required. After isolating the fraction field and adding in the leading '1' (which can be accomplished using an "ORI" instruction), we will use the LS bits as additional guard bits (as well as the Round and Sticky bits) and a single bit for a carry on the MS side. Thus, we recommend using the following alignment of the fraction within a 32-bit value. Note that you should maintain the sticky bit as the OR of any bits shifted out.

    Bit(s):   31                30          29-7                     6-2          1   0
    Field:    Space for carry   Leading 1   23-bit Fraction field    Guard bits   R   S

    Figure 2 - Fraction bit layout/alignment

Inline Assembly in C code

Most compilers include some way to write assembly code within a C file and provide certain features to simplify integration of assembly code. This process is usually called assembly inlining. It can be used when the C language does not provide access to processor-specific instructions/capabilities, or simply to insert optimized, hand-coded assembly rather than using the compiler's translation of a kernel of code.
CodeWarrior provides two methods for this. The first allows one to define an entire function using assembly. The second allows for a C definition of the function and then a mixture of C and assembly within the function. The figure below shows examples and syntax. Defining an entire function using assembly requires adding the "asm" keyword in front of the function prototype. Then all code inside the function must be assembly. To mix assembly and C (which is what we will do for this lab), we can simply put blocks of assembly in a normal C function using the "asm { … }" construct.

Entire assembly function:

    asm int add4(int a,int b,int c)
    {
        link    a6,#0
        move.l  8(a6),d0     // Get a
        add.l   12(a6),d0    // Add b
        add.l   16(a6),d0    // Add c
        unlk    a6
        rts
    }

    void main()
    {
        add4(4,6,7);
    }

Mixed assembly/C code:

    int add4(int a,int b,int c)
    {
        int temp;
        temp = a+b;
        asm {
            move.l  temp,d0  // d0=temp
            add.l   c,d0     // Add c
            move.l  d0,temp  // temp=d0
        }
        return temp;
    }

    void main()
    {
        add4(4,6,7);
    }

    Figure 3 – Inline Assembly Syntax for CodeWarrior

A few important notes for writing mixed assembly/C code are below:

1. Within an "asm { }" block, each line can contain only a single instruction or label.
2. Global and local variables can be accessed using their name (as shown above with the 'temp' variable). The compiler will determine the address or stack location where the variable resides and replace the name with an appropriate addressing mode when assembling the code.
3. Any registers you will overwrite should technically be saved and restored via the stack before being overwritten.
4. Return values must be placed in D0.

4 Prelab

Before starting this lab, consider the task of finding how many leading 0's appear in a number BEFORE the first 1. This task is necessary when normalizing the result of your FP addition/subtraction.
By determining the number of leading 0's, the fraction can be normalized by shifting an appropriate number of places and the exponent can be adjusted by a corresponding amount. In C, this task could be accomplished as shown below, but it requires bitwise ANDs and shifts. In assembly we can simplify this slightly (remove the AND'ing) by using the condition codes and branch instructions in ways that cannot be expressed in C. Complete the short assembly language loop that returns the number of leading 0's in a value stored in D0 by using shift instructions and the BCS (Branch if Carry Set) instruction. [Note: The code below assumes there is a 1 somewhere, as it will loop infinitely if 'num' is 0. For our lab we will assume the result 0 never occurs.]

C Version:

    int count_leading_zeros(unsigned int num){
        int i=0;
        while ( (num & 0x80000000) == 0 ){
            num = num << 1;
            i++;
        }
        return i;
    }

Assembly Version:

            move.l  num,d0
            clr.l   d1
    L0:     lsl.l   #1,d0
            b____   ___
            addi.l  ___,____
            b____   ___
    L1:     ; count of leading 0's for 'num' now sits in D1

5 Procedure

1. Download the project zip file "ee357_lab3_fpadd.zip" from Blackboard (Assignments..Labs..Lab 3) and unzip it to a folder on your PC.
2. Open the project by double-clicking the .mcp file in the project folder. This project provides a skeleton file, 'main.c'. It is set up for the Instruction Set Simulator (ISS) and thus does not require your HW board.
3. Open 'main.c' and examine the code. The 'fpadd' routine is shown with initial global declarations and begins the 'asm' block. You will need to complete that block. You may also add other local variable declarations as needed. The 'main' routine at the bottom declares two floats, x and y, and assigns them. It then declares two other floats: 'calculated' (to store the return value of your addition routine) and 'correct', which uses the compiler's included emulation to arrive at the correct FP output.
If the two match, you know that your routine performed correctly for the given inputs.
4. Complete the code to add the two floating point numbers. You can freely overwrite any data registers but you should not use any address registers. When you do not have enough register storage you can always store values out to their local variable memory locations (that's why we declared them for you). Be sure to maintain the variables that we declared for you (keep them updated), as we will assign partial credit by examining those variables at the end of the routine. Your basic FP addition algorithm should:
   a. Isolate the individual fields of each number (for 'fraca' and 'fracb', add the implicit '1' and set up the bits according to the format in Figure 2), setting the corresponding local variables appropriately.
   b. Determine which number is larger and set the 'ageb' variable appropriately. Then set the 'signg', 'expg', 'fracg' and 'fracl' variables appropriately.
   c. Determine the difference in exponents ('exp_abs_diff') and the operation to perform ('opsub'), then perform the operation, storing the result to 'fracsum'.
   d. Normalize the fraction and adjust the exponent by counting leading 0's. Note that depending on the result you'll need to shift right or left. Place the results in 'frac_norm' and 'exp_norm'.
   e. Perform rounding using the Round-To-Zero/Chopping method and store the results in 'frac_round' and 'exp_round'.
   f. Reassemble the result fields into a single FP value and store it in the 'result' variable as well as D0.
   g. At this point simply end the asm {…} block. Because the asm {…} block is inside the function declaration, the compiler will add in the 'unlk' and 'rts' instructions for you.
5. In the 'main' routine, assign different values to x and y that will exercise the many cases that your code needs to handle. Compile/make the project and then debug the project, stepping through it as you go.
Remember you can view registers by clicking "View.. Registers". The value of local variables is shown automatically by the debugger in the locals window. You can right-click on any variable and change how it is displayed (i.e. as decimal, hex, or even FP). Pick test cases that will exercise the following cases at the VERY least (you should likely pick some others to test other cases). To verify the correct operation of your code, you can examine the local variables "calculated" and "correct" (change their view to "Floating Point" to see their FP equivalent) by setting breakpoints after each assignment. If they match, then your code worked for that test case.
   a. Define a test case where x is pos. and y is pos.
   b. Define a test case where x is neg. and y is pos.
   c. Define a test case where |x| > |y|
   d. Define a test case where |x| < |y|
   e. Define a test case that requires renormalization by shifting in one direction
   f. Define a test case that requires renormalization by shifting in the other direction
6. Comment your code with enough information to convey your approach and intentions. Try to organize your code in a coherent fashion to avoid "spaghetti code" (i.e. branches all over the place jumping back and forth). Also, format the code to make it readable, with good alignment of instructions, etc.
7. Submit your source file via Blackboard (Assignments..Labs..Lab 3), attaching only the 'main.c' file in the 'sources' folder of your project. Make sure you click "Submit" on Blackboard and not just "Save". Also, turn in a hardcopy of your main.c file to your TA.

6 Review

None

7 Lab Report

Name: ___________________________________    Due: Friday Oct. 23rd    Score: ________

(Detach and turn in this sheet along with any other requested work or printouts)

1. Turn in a hardcopy of main.c with the name of your partner to your TA in discussion.
2.
List the test cases that you used to test your program.

    A                         B


8 EE 357 Lab 3 Grading Rubric

Name: ___________________________________    Score: ______ / 61

Sec. 5 Tests (Mult 6)
    4 (Excellent):  All Tests Work
    3 (Good):       1 Test Fails
    2 (Poor):       2-3 Tests Fail
    1 (Deficient):  4-5 Tests Fail
    0 (Failure):    All 6 Tests Fail

Req. 4a, Req. 4b, Req. 4c, Req. 4d, Req. 4e, Req. 4f (Mult 1 each)
    Works / Does Not Work

Req. 8 (Mult 2)
    4 (Excellent):  Well-organized and readable with good labels and formatting
    3 (Good):       Well-organized with acceptable labels and formatting
    2 (Poor):       Hard to read or poorly organized but formatted well
    1 (Deficient):  Hard to read or poorly organized and with poor formatting
    0 (Failure):    Extremely poor organization and formatting

Hard Copy:  5 pts.
Late:       -10 per day
TOTAL:
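The algorithm in Procedure step 4 (a through f) can also be sketched in portable C, which may help when working out by hand what each temporary variable should hold for a given test case. This is only an illustrative reference model under the lab's stated assumptions (no 0, inf., NaN or denormal inputs), not the required assembly solution and not the skeleton's code. The function name `fpadd_model` is hypothetical, and the early return for a zero sum is an addition of this sketch to avoid an endless normalization loop. Variable names follow Table 1 and the bit layout follows Figure 2.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference model; not part of the lab skeleton. */
static float fpadd_model(float x, float y)
{
    uint32_t a, b, result;
    float out;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);

    /* (a) Isolate fields; fractions get the leading 1 at bit 30 (Figure 2). */
    uint32_t signa = a >> 31, signb = b >> 31;
    uint32_t expa = (a >> 23) & 0xFFu, expb = (b >> 23) & 0xFFu;
    uint32_t fraca = ((a & 0x007FFFFFu) << 7) | 0x40000000u;
    uint32_t fracb = ((b & 0x007FFFFFu) << 7) | 0x40000000u;

    /* (b) Compare magnitudes (exp + frac) and pick the greater operand. */
    uint32_t ageb  = (a & 0x7FFFFFFFu) >= (b & 0x7FFFFFFFu);
    uint32_t signg = ageb ? signa : signb;
    uint32_t expg  = ageb ? expa  : expb;
    uint32_t fracg = ageb ? fraca : fracb;
    uint32_t fracl = ageb ? fracb : fraca;

    /* (c) Align the lesser fraction, ORing shifted-out bits into sticky. */
    uint32_t exp_abs_diff = ageb ? expa - expb : expb - expa;
    uint32_t opsub = signa ^ signb;
    if (exp_abs_diff >= 32) {
        fracl = 1u;                              /* only the sticky bit survives */
    } else if (exp_abs_diff > 0) {
        uint32_t lost = fracl & ((1u << exp_abs_diff) - 1u);
        fracl = (fracl >> exp_abs_diff) | (lost != 0);
    }
    uint32_t fracsum = opsub ? fracg - fracl : fracg + fracl;
    if (fracsum == 0) return 0.0f;               /* outside the lab's assumptions */

    /* (d) Normalize so the leading 1 sits at bit 30. */
    uint32_t frac_norm = fracsum, exp_norm = expg;
    if (frac_norm & 0x80000000u) {               /* carry out: shift right once */
        frac_norm = (frac_norm >> 1) | (frac_norm & 1u);
        exp_norm++;
    } else {
        while (!(frac_norm & 0x40000000u)) {     /* leading 0's: shift left */
            frac_norm <<= 1;
            exp_norm--;
        }
    }

    /* (e) Round-To-Zero: drop the guard, R and S bits. */
    uint32_t frac_round = frac_norm & ~0x7Fu;
    uint32_t exp_round  = exp_norm;

    /* (f) Reassemble sign, exponent and 23-bit fraction. */
    result = (signg << 31) | (exp_round << 23) | ((frac_round >> 7) & 0x007FFFFFu);
    memcpy(&out, &result, sizeof out);
    return out;
}
```

For instance, `fpadd_model(1.0f, 1.0f)` takes the shift-right path in step (d), while `fpadd_model(1.0f, -0.75f)` needs two left shifts, which corresponds to test cases e and f of Procedure step 5.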

This note was uploaded on 09/14/2010 for the course EE 357 at USC.