hw 2 cs 153a


CS153A Practice Questions
Pipelining and Memory Mapping
------------------------------------------------------------------

1) What are the pipeline stages in the TMS320C31 and what do they do?

They are:
Fetch - gets the next instruction to be executed from memory
Decode - decodes the instruction and generates the operand address if there is one
Read - loads operands from memory if the instruction needs any
Execute - reads from the registers, performs the operation specified by the instruction, and writes the result into a register or stores it at a location in memory

2) What are the 3 things that get in the way of perfect pipeline speedup and why?

They are:
Data Dependencies - when a later instruction uses data produced by an earlier instruction, it has to wait for that result before it can run with the correct data
Structural Hazards - when the hardware does not have enough resources to meet the needs of all the instructions in the pipeline at once
Control Dependencies - there are branches in the program, so until a branch is resolved the processor doesn't know which instruction it will have to execute next

3) What is Amdahl's law?

The performance improvement to be gained from using a faster mode of execution is limited by the fraction of the time the faster mode can be used. If a fraction f of the execution time can use a mode that is s times faster, the overall speedup is 1 / ((1 - f) + f / s).

4) Why do we need pipelining?

Because it improves the throughput of a system: several instructions are in different stages of execution at the same time.

5) Describe the 1-bit dynamic hardware branch predictor. (Give state diagram)

6) Show with an example and explanation why a 1-bit predictor doesn't work well for most loops. State the potential prediction accuracy and the accuracy that a 1-bit predictor achieves.

A 1-bit predictor doesn't work well with most loops because it mispredicts twice on every run of the loop. E.g. for the loop

    for (i = 0; i < 9; i++) { do something }

the first and last times through the loop cause mispredictions.
The first time, it predicts that the loop will not be entered, because the bit still records the not-taken outcome of the previous exit (or its initial state). The last time, it predicts staying in the loop, because that is what happened on the previous test. The loop branch is tested 10 times (9 times continuing, once exiting), so 2 mispredictions give only 80% accuracy, even though the branch goes the same way 90% of the time and 90% is therefore achievable.

7) Describe the 2-bit dynamic hardware branch predictor. (Give state diagram) State why this addresses the problem with the 1-bit predictor above.

The 2-bit dynamic hardware branch predictor must mispredict twice in a row (starting from a strong state) before its prediction changes. Therefore a loop branch is mispredicted only once per execution of the loop, at the exit, as opposed to a 1-bit predictor, which mispredicts twice each time the loop is run. The 2-bit counter is updated after each branch is resolved. The values it can take are 00, 01, 10, 11: two are for predicting taken, two for predicting not taken. The state diagram is at http://www.cs.ucr.edu/~junyang/teach/F03_203A/slides/l10.pdf

8) What are two ways that we can avoid some of the problems with data dependencies in pipelines?

One way is code reordering (generally done by the compiler). If there are instructions unrelated to the ones that have data dependencies, the code is rearranged so that those instructions execute sooner, so the CPU is at least doing something useful in that pipeline slot rather than just waiting.

Another way is much simpler: bubbling the pipeline. That is, if the control logic determines that a data hazard will occur, it inserts NOPs into the pipeline to stall it until it is safe to continue without the risk of hazards.

9) What is a branch misprediction penalty?

It is the number of cycles lost due to a mispredicted branch. I.e. instead of predicting and executing the right outcome ahead of time, you predict the wrong one, so several cycles of processing time are wasted on work that has to be thrown away.

10) What is static branch prediction -- how does it work?
It is where the same outcome is predicted for a branch every time it executes, for the whole run of the program. The prediction is hardcoded in hardware rather than learned at run time.

11) What is a branch delay slot? How many are there for the TMS320CX?

It is an instruction slot right after a branch instruction that is still executed before the branch takes effect; if nothing useful can be put there, it holds a stall (NOP) until the branch target address has been computed. Branch delay slots occur in pipelined architectures because of branch hazards, meaning a branch is not resolved until it has advanced through the pipeline. The TMS320CX has 3 branch delay slots.

12) How does the compiler find instructions for branch delay slots? What happens if it cannot find any? What happens if it can find 1 or 2 but not the maximum amount?

It looks for instructions that are not related to the branch and are safe to execute whether or not the branch is taken, and schedules those to fill the slots. If it cannot find any, it replaces the delayed branch with a non-delayed branch instruction. If it does find 1 or 2 but not the maximum amount, it places them before the branch instruction in order to minimise the number of NOPs.

---------------------------------------------------------

Assume that the MP has 16 address pins, the ROM holds 4K of space, and the RAM holds 32K of space.

13) What is the range of addresses the MP supports?

14) How many pins are there on the ROM and RAM?

15) Show three different logic layouts that set the chip selects (CE) on the ROM and RAM correctly using memory mapping. Ensure, in each case, that ONLY 4K of addresses go to ROM and only 32K of addresses go to RAM. Remember that the MP asserts the CE low on each device when it selects it.
---------------------------------------------------------

Name/Perm: Smaranda Velichi / 382736

          #include <stdio.h>
          #include <stdlib.h>   /* for malloc */

          struct EleStruct {
              int id;
              char *name;
          };

          static int counter = 0;
          static int SIZE = 10;

          struct EleStruct* create(int tag, char *name) {
              static struct EleStruct* current;
0x8001        int hash = tag % SIZE;
0x8002        current = (struct EleStruct *)malloc(sizeof(struct EleStruct));
0x8003        current->id = hash;
0x8004        current->name = name;
0x8005        return current;
          }

          int main(int argc, char **argv) {
              struct EleStruct* headNode;
              int i = argc;
0x9002        headNode = create(counter, "Chandra");
0x9003        counter += i;
0x9004        return counter;
          }

1) Draw the runtime stack at the point marked with the square (include both the caller and callee frames). For each entry, give the address, the value, and what the location is used for. Use the info below to get started. For example, for argc, you specify:

    0x10002    2    argc

The hex values to the left of the code above are the instruction (code) addresses. At this point, the following is true:

The value in the SP (stack pointer) register is 0x1000B
The value in argc's location is 2 and its address is 0x10002
The value returned by malloc in instruction 0x8002 is 0x20024
The address of the string constant "Chandra" is 0x20000
The value at address 0x10000 in the stack is 0x7002

2) Write the epilogue in TMS320Cx assembly for main. Assume the value of counter is in R6 after instruction 0x9003.

Things to remember:
Put the return value into the return register
Put the return address into a register so that you can jump to it
Reset the frame pointer (AR3)
Reset the stack pointer (SP)
Jump unconditionally to the return address

3) Write the TMS320 assembly for the following C code:

    int *ptr = (int*) malloc(sizeof(int));
    *ptr = 2;

load instruction format:  ldi SRC DST   //SRC here is memory location
store instruction format: sti SRC DST   //DST here is memory location

Do you know what addressing modes your assembly instructions use?
------------------------------------------------------------------

CS/ECE 153A Practice Problems

1) What is the front-end and the back-end of the compiler?

The front end is where the syntax and semantics are processed and an intermediate representation of the code is produced. The back end takes the intermediate code, may analyse, optimise or transform it, and then converts it into machine code for a particular target architecture.

2) How does the compiler and runtime system ensure that the scope of a local variable is only the function in which it is defined?

3) What are the two ways a compiler can preserve registers? Which is better and why?

4) What are 3 register conventions that the TMS320 Compiler uses? (Hint: 2 you should get from the class notes, 1 is the type of variable the compiler puts in ARx registers vs Rx registers).

5) What are 4 things that a method prologue must do?

6) What are the things that execution of the call itself does?

7) What are 5 things that a method epilogue must do?

8) What is the stack layout of the following method on the TMS320Cx, immediately after the line i=2;?

    int main(int argc, char **argv) {
        int i,j,k;
        int *foo;
        int foo[3];
        char *name;
        i = 2;
        foo[0]=7;
        ...
    }

9) View the assembly in Code Composer for a method that calls another method. Allocate and assign local variables, parameters, and return values. Lay out the stack with all of its entries when the caller makes a call to the callee, and another stack when the callee returns. Be sure to put in all data, including actual stack addresses (use the memory viewer and the stack pointer) and values, and identify each location with a variable name or a description of what's stored at that location.

10) Print out the assembly for your program and identify the instructions that make up the prologue and the epilogue. State exactly what each instruction is doing and why....

This note was uploaded on 03/01/2009 for the course CS 153A taught by Professor Krintz during the Fall '09 term at UCSB.
