Unformatted text preview: CS106X Winter 2008 Handout 14 January 18, 2008 Decomposition
Due to Nick Parlante. Decomposition is the process of breaking a large problem into more manageable subproblems. The motivating principle is that large problems are disproportionately harder to solve than small problems. It's much easier to write two 500-line programs than one 1000-line program. Difficulty Problem Size
In C the unit of decomposition is the function and the ADT. In C++ and Java, the unit of decomposition is the class. For decomposition to work, the subparts of the whole problem should be as independent from each other as possible. That does not mean that they subparts will not depend on each other at all. They should just not depend on details of each other unnecessarily. Independence is especially important in group projects where different sub-problems are attacked by different people. The programmer needs to be able to focus on each problem without worrying about the rest of the program. The need for good decomposition is magnified as the number of lines and the number of programmers grows. It may seem like Stanford standards are unreasonably high for our 1000 line assignments. We insist on excellence in all aspects of programming, including decomposition, because we are trying to teach skills on 1000-line programs that will get you through 100,000 line programs later. The Black Box A well-decomposed function or a well-designed class is sometimes likened to a "black box". The point of the black box is that it is opaque-- its inner workings are not apparent. Instead the box is defined by what it accomplishes. The box has well-defined behavior in terms of its input. The black box presents the simplest possible abstraction to describe what the output will be and hides the implementation. A black box is a unit of delegation-- you get to define it in terms of what you want accomplished such as "I want this data structure filled from this file", or " I want a vector of arbitrary size". All the implementation details of how the desired result is achieved are safely isolated inside. 2 Functional Object Oriented
Message Reception Port Acme Object co /* Abstraction = What are the properties and uses of this object? What messages does it respond to ? */ Input Acme Black Box co Output /* Abstracion = what does this accomplish with its input? */ The goal of decomposition is to divide the problem into independent sub-problems. Black boxes are the natural extension of this goal. They can mostly be written independently. To the extent they need to interact, it is through well-defined and relatively simple mechanisms. A function is a possibly nasty computation wrapped up in a neat package, ready to easily fit into any code. Similarly, a C++ or Java class represents an object with a tidy abstraction and a complete suite of methods. Abstraction Decomposition is a divide-and-conquer strategy. The benefit comes from being able to deal with subparts independently. From the outside, black boxes are as simple as possible so they are easy to use and fit together. This process only works if the abstraction presented by the black box basically makes sense. There should be a clear relationship between the input and output. Typically there is a lot of complexity and detail related to the implementation which should not be part of the abstraction. As input, it should require only what is strictly needed to compute the output. The acid test is: is it easy to describe what this component accomplishes? For functions, it's a good sign if you can form a description where the name of the routine is the verb and the parameters are the nouns. For a C++ method, the operation should have a clear definition relative to the receiver. Function Rules In Practice What should the parameters be? When is a subtask sufficiently complex or independent to merit being put in its own function? A function should solve one problem. Its parameters should include only what is necessary. The abstraction of what the function accomplishes with the parameters should make sense. One Problem A function should solve one problem. That does not mean that the function needs to be one line long-- the lines in the function step through the subparts of the problem. At some point, a subpart becomes sufficiently independent that it should be factored 3 out into its own function. It should be easy to describe what the function accomplishes. Length Length is a simplistic measure that cannot replace real analysis based on the structure of the problem. A function may be too long because it solves more than one problem or because some subparts need to be decomposed out. Some functions are forced to be long because their problem demands a long sequence of related sub-parts where no subpart is independent or complex enough to merit its own function. A long function automatically creates doubts about the quality of its decomposition-- so be doubly sure that everything in the function needs to be there. Ideally a function should be 5-15 lines long--that's a ballpark figure, of course; but a good rule of thumb. Short Routines Very short routines (just one to two lines of code) may be questionable. The added decomposition is often not worth the additional conceptual overhead of another function. Something that can be accomplished in two lines is probably not complex enough to merit its own routine. The decomposition can be helpful if the lines are very complex, are distracting in the calling function, require their own local variables, or are called a large number of times. A short routine can also be worthwhile just for readability. Replacing the expression:
sqrt((a.x - b.x)^2 + (a.y - b.y)^2) with the expression Distance(a,b) makes the code much more readable, even though the distance function is only one line long. Do not worry about the run-time overhead of an additional function call. A modern compiler can inline the code where appropriate to avoid the call overhead. Parameters A function should have the smallest number of parameters possible to solve its problem. The function should not add constraints on the input beyond those demanded by the problem. This makes it easier to reuse the function later in the largest number of contexts. Sometimes there is no way around having a large number of parameters. As with function length above, a large number of parameters is a warning sign that the decomposition may not be ideal. If a set of parameters always seem to travel together-- should they be packaged together in a struct? Complexity The lines in a function should narrate the steps of the function's algorithm. If the complexity or details of one "step" distract from the problem at hand, then the step should be decomposed out to isolate its problem. It's suspicious if a step requires 4 several local variables that are not used in the other steps. It's good if the steps in a function all operate at the same level of detail. Repetition Avoid repeating more than a couple lines of code. A repeated sequence should be put in its own function which is then called where the repeated code used to be. A reasonable rule of thumb is: decompose out a repeated sequence of code if it will make the program shorter including the new function declaration and calls. The common sequences do not need to be identical. Slight differences may be factored out in the parameters of the function. It's a sure warning sign that you've got some duplication if you find yourself copying and pasting code. Generality A sub-problem that is an extremely familiar or common idiom should be decomposed out. Recurring, general problems such as Searching, Sorting, Distance, Set intersection, etc... make excellent functions because their abstractions are immediately understood since they have been seen so many times. Such functions have the best chance of being reused in other parts of the program or in other programs. Also, a modern language will include pre-canned solutions to all the common general problems-- so having identified the decomposition, you don't have to write anything, you just call the library routine. Non-Local Access Access to a variable outside the function should always be through the parameters. Non-local access to a global variable violates the black-box paradigm. The routine is no longer portable and its relationship with all the rest of the code in the program becomes complex, especially when stepping through when debugging. The main advantage of a black box is that it interacts with the outside world through a small, well-defined parameter mechanism to keep the complexity in and the abstraction simple. This helps decomposition and readability and makes the routine reusable in other routines. Non-local access is only ok for constants. ...
View Full Document
This note was uploaded on 04/18/2008 for the course CS 106X taught by Professor Cain,g during the Winter '08 term at Stanford.
- Winter '08