C++ Data Structures

Contents

Introduction
Chapter 00: Built-in Data Types
Chapter 01: Vector
Chapter 02: Stack
Chapter 03: Queue
Chapter 04: Deque
Chapter 05: Set
Chapter 06: Linked List
Chapter 07: List
Chapter 08: Binary Tree
Chapter 09: Binary Search Tree
Chapter 10: Map
Chapter 11: Sorting
Chapter 12: Hash
Chapter 13: Graph
A. ADT Properties
B. ADT Performance
C. Review: Pseudocode
D. Review: UML Class Diagram
E. Review: Operator Overloading
F. Review: Template
G. Review: Iterator
H. Review: Nested Class
I. Review: Namespace
J. Review: Separate Compilation
K. Review: Recursion
L. Debugging
M. Glossary
N. Index
Introduction

What is a data type and what is a data structure? In the simplest terms, a data type is a way to represent data digitally. This enables us to store numbers, text, colors, locations, and just about anything else you can think of on a computer. Data structures are data types that represent collections of elements. Both data types and data structures are tools that allow us to represent ideas or physical things from the world around us in a computer program.

The first generations of digital computers were not as data-centric as modern computers. In fact, they were little more than calculators, designed to accomplish one specific task. These tasks were purely mathematical, such as predicting tides and computing artillery tables for the military. Claude Shannon changed all this in the 1930s when he explored methods for representing logical equations on a computer. This was followed in the 1950s with digital representations of letters and then text. As computers started filling more business and personal roles, the need to represent diverse elements digitally increased greatly. In 1986, less than 1% of the world's data was stored digitally. The 50% mark was crossed in 2002. Today, more than 99% of the world's data is stored digitally, and the pace of digital content creation is increasing exponentially.

A key to this remarkable transformation to digital data is the extreme flexibility of computers. Not only can computers be programmed to accomplish a wide variety of tasks, but they can also be programmed to store a wide variety of data. Digital representation of data is a fundamental decision point of any programming problem, and digital data is an important part of our everyday life. With a more complete understanding of data types and data structures, we can make more informed decisions and make our programs run more efficiently. This is the subject of this textbook.

Three Considerations

When working with a data type or data structure, there are three important considerations: correctness, size, and performance.

Correctness

Correctness is a function of the data type or data structure behaving the way the client expects. We call this the "contract." For example, the contract for a number data type would consist of how it performs arithmetic operations. If the client cannot rely on 2 + 2 = 4, then what value is the number data type? All data types and data structures must have the correctness property, both in typical, common uses and in atypical, marginal uses.

Size

Size is a function of the amount of space required to represent the data. For example, one might consider storing a pixel as three integers. If an integer takes 8 bytes (64 bits), that is a total of 24 bytes for a single RGB (red, green, blue) pixel. By itself that does not seem like much, but when combined into a 20-megapixel image, it makes for a large file. If the same pixel were stored as three chars (8 bits apiece, 24 bits total), the 20-megapixel image would take 1/8th the size.
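To make the size comparison concrete, here is a minimal sketch that prints the storage required by each representation. The struct names PixelInt and PixelChar are illustrative, not from the text, and exact sizes vary by platform: on many systems an int occupies 4 bytes rather than 8, so the printed numbers may differ from the 24-byte figure above, but the ratio between the two representations is the point.

   #include <iostream>
   using namespace std;

   // Illustrative only: one pixel stored as three integers ...
   struct PixelInt
   {
      int red;
      int green;
      int blue;
   };

   // ... versus one pixel stored as three chars.
   struct PixelChar
   {
      unsigned char red;
      unsigned char green;
      unsigned char blue;
   };

   int main()
   {
      // sizeof() reports the storage in bytes on the current platform.
      cout << "Pixel as three ints:  " << sizeof(PixelInt)  << " bytes\n";
      cout << "Pixel as three chars: " << sizeof(PixelChar) << " bytes\n";
      return 0;
   }

Multiplying the per-pixel size by 20 million pixels shows how quickly the choice of representation changes the size of an image file.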
Performance

The final consideration is performance: how long does it take to perform the operations associated with the data type or data structure? For the most part, we care about how the performance scales as the number of elements grows. Consider a list data structure. If one needs a loop to search through a list to find an element, it will take on average 5.5 units of time (say, 5.5 milliseconds) to find the element in a list of 10 elements. Double the size of the list and it takes roughly twice as long (about 11 milliseconds). It will take on average half a million units of time to find an element when there are a million elements. Since the cost is directly proportional to the size, we call this relationship linear. Thus, if the size of the list is n, then the cost of finding an element is O(n), which means roughly "n times some constant."

The most common performance characteristics of a function are the following:

   Notation    Name         Description
   O(1)        Constant     Performance is unrelated to the size of the data structure
   O(log n)    Logarithmic  Performance grows with the logarithm of the size of the data structure
   O(n)        Linear       Performance is directly proportional to the size
   O(n log n)  N-Log-N      Performance is linear times logarithmic
   O(n²)       N-Squared    Performance is the square of the size of the data structure
   O(2ⁿ)       2-to-the-N   Performance grows exponentially; impractical for all but tiny inputs

To give you some idea of how these performance characteristics relate to each other, consider a data set containing 1,000 elements. If it takes a blink of an eye to perform one operation (0.1 second if you are very quick), how long would a function take to complete?

   Performance   Time                                            Implication
   O(1)          0.1 second                                      Faster than you will notice
   O(log n)      1 second                                        A noticeable lag
   O(n)          about 1½ minutes                                Takes considerable patience
   O(n log n)    about 17 minutes                                The user will leave the computer to take a walk
   O(n²)         about 28 hours                                  The user has lost all hope
   O(2ⁿ)         more than 10²⁸⁰ times the age of the universe

Consider what would happen if each operation took twice as long (0.2 seconds instead of 0.1) over the same 1,000-element data set. Executing the operation once would show a noticeable change; everything would take twice as long. However, choosing an algorithm that is O(n²) instead of O(n log n) has far more dramatic implications (about 56 hours for O(n²) as opposed to about 33 minutes for O(n log n)). Thus, the growth rate of the performance characteristic (O(1) vs. O(n) vs. O(n²)) matters much more than the duration of a single operation (0.2 seconds vs. 0.1 seconds).

This semester, we will consider algorithms with all of the above performance characteristics except O(2ⁿ). The next obvious question is: how can we identify the performance characteristic of a given algorithm? This can be answered by looking at each of the performance levels in turn.
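Before doing so, here is a minimal sketch of the list search described at the start of this section, the one that takes about n/2 comparisons on average. The function name findIndex and its signature are illustrative, not from the text.

   #include <iostream>
   using namespace std;

   // Linear search: examine elements one at a time until the value is found.
   // On average this looks at about half of the list, so the cost is O(n).
   int findIndex(const double data[], int n, double value)
   {
      for (int i = 0; i < n; i++)
         if (data[i] == value)
            return i;        // found it after i + 1 comparisons
      return -1;             // not found: all n elements were examined
   }

   int main()
   {
      double list[] = { 4.0, 8.0, 15.0, 16.0, 23.0, 42.0 };
      cout << findIndex(list, 6, 23.0) << endl;   // prints 4
      return 0;
   }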
O(1)

Constant performance yields comparable execution time regardless of the size of the data set. This means the function cannot contain a loop whose length is related to the size of the data set. Consider the following function:

   template <class T>
   Stack<T>::Stack(int size)
   {
      if (size > 0)                                   // line 1
         data = new T[size];                          // line 2
      else                                            // line 3
         throw "ERROR: Unable to allocate buffer";    // line 4
   }

It takes the same amount of time to allocate a memory block of 4 bytes as it does 4 million bytes. Thus, regardless of the value of size, it takes the same amount of time to execute this function.

   Line   Discussion                                                   Cost
   1      Executed only once, when the function is called              O(1)
   2      Executed at most once                                        O(1)
   3      Not technically a line of code; part of the IF statement     -
   4      Executed at most once, when the passed size is invalid       O(1)

If we were to graph this with the vertical axis being execution time and the horizontal axis being the size of the data set, the graph would be flat:

   [Figure: O(1) execution time, time (ms) vs. n; the line is flat]

This means that no matter the number of elements in the data set (n), the execution time is always the same. Constant-time algorithms are very efficient. They are the gold standard we strive to achieve in our designs, but they are not always possible.

O(log n)

Logarithmic performance (and indeed everything but constant) does relate execution time to data-set size. However, the rate of increase slows down as the data set gets larger. An example is a binary search:

   bool binarySearch(double data[], int n, double find)
   {
      int iFirst = 0;
      int iLast = n - 1;                           // line 1
      while (iFirst <= iLast)                      // line 2
      {
         int iMiddle = (iFirst + iLast) / 2;       // line 3
         if (data[iMiddle] == find)
            return true;                           // line 4
         if (data[iMiddle] > find)                 // line 5
            iLast = iMiddle - 1;                   // line 6
         else
            iFirst = iMiddle + 1;                  // line 7
      }
      return false;                                // line 8
   }

Notice in line 3 how the distance between iFirst and iLast gets cut in half every iteration. Thus, it takes only one additional iteration of the main loop to search through a data set that is twice the size. This is the distinguishing characteristic of logarithmic algorithms.

   Line      Discussion                                                          Cost
   1         Executed only once, when the function is called                     O(1)
   2, 3, 5   Executed once for each loop; there are about log n loops            O(log n)
   4, 8      The return executes at most once because it exits the function      O(1)
   6, 7      Each executed about half the times the loop runs; still O(log n)    O(log n)

The graph of execution time against n increases, but the slope decreases as n grows:

   [Figure: O(log n) execution time, time (ms) vs. n; rising, with a flattening slope]

Many people find it difficult to internalize logarithmic growth. Consider this equation:

   2ˣ = n   →   log₂ n = x

Thus, exponents and logarithms are inverses of each other. Since 2³² is about 4 billion, log₂ of 4 billion is about 32. This means it will take only about 32 iterations of our binary search to find an element in a list of 4 billion. Logarithmic algorithms are very efficient, even for large data sets.
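The claim that a binary search needs only about 32 iterations for a list of 4 billion elements is easy to check. Here is a minimal, self-contained sketch; countIterations is an illustrative helper, not from the text, that mirrors the loop structure of binarySearch() and counts its worst-case iterations without needing any data.

   #include <iostream>
   using namespace std;

   // Counts how many times the binary-search loop runs for a data set of size n.
   // Each iteration cuts the remaining range in half, so the count is about log2(n).
   long long countIterations(long long n)
   {
      long long iFirst = 0;
      long long iLast = n - 1;
      long long count = 0;
      while (iFirst <= iLast)
      {
         long long iMiddle = (iFirst + iLast) / 2;
         count++;
         iFirst = iMiddle + 1;   // worst case: keep searching the upper half
      }
      return count;
   }

   int main()
   {
      cout << countIterations(1000LL)       << endl;   // about 10
      cout << countIterations(1000000LL)    << endl;   // about 20
      cout << countIterations(4000000000LL) << endl;   // about 32
      return 0;
   }

Doubling n adds only one more iteration, which is exactly the behavior described above.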
O(n)

In linear algorithms, twice as much data takes twice as long to process. Consider the following algorithm to find the largest element in an unsorted array. We use a single loop to iterate through all members of the data set:

   template <class T>
   T & findLargest(T data[], int n)
   {
      T * pT = data;                         // line 1
      for (int i = 1; i < n; i++)            // line 2
         if (*pT < data[i]) pT = data + i;   // line 3
      return *pT;                            // line 4
   }

We can estimate the performance cost of this algorithm by counting how many times each line of code gets executed.

   Line   Discussion                                         Cost
   1      Executed only once, when the function is called    O(1)
   2      Executed once per iteration of the loop            O(n)
   3      Executed once per iteration of the loop            O(n)
   4      Executed once, at the end of the function          O(1)

Thus, the overall cost of the function is O(n). The graph of performance against data-set size increases steadily with n; regardless of the value of n, the slope remains constant:

   [Figure: O(n) execution time, time (ms) vs. n; a straight line with constant slope]

O(n log n)

Some algorithms grow faster than linear, but not much faster. For example, consider an algorithm that needs to perform an O(log n) search many times. If it performs this search n times, the algorithm is O(n log n). Consider the following function:

   bool sorted(double data[], int n)
   {
      // look for each item in the list
      for (int i = 0; i < n; i++)                // line 1
         if (!binarySearch(data, n, data[i]))    // line 2
            return false;                        // line 3
      return true;                               // line 4
   }

We can estimate the performance cost of this algorithm by counting how many times each line of code gets executed.

   Line   Discussion                                                             Cost
   1      Executed once for each of the n elements in the array                  O(n)
   2      binarySearch() is O(log n) and it is executed n times: n × O(log n)    O(n log n)
   3      Executed at most once                                                  O(1)
   4      Executed at most once                                                  O(1)

Thus, the overall cost of the function is O(n log n), the cost of the most expensive line of code. The graph of performance against data-set size shows the slope gradually increasing with n. However, as n gets very large, the slope grows so slowly that the curve appears almost linear:

   [Figure: O(n log n) execution time, time (ms) vs. n; rising slightly faster than a straight line]

O(n²)

There are times when the cost of performing an action grows much worse as n increases. There are examples in nature: the drag on a car increases with the square of its speed, so going twice as fast means the air pushes back roughly four times as hard. As a rule, programmers try to avoid algorithms that exhibit this behavior. Consider the following code:

   bool duplicatesExist(double array[], int n)
   {
      for (int i = 0; i < n; i++)                   // line 1
         for (int j = 0; j < n; j++)                // line 2
            if (i != j && array[i] == array[j])     // line 3
               return true;                         // line 4
      return false;                                 // line 5
   }

We can estimate the performance cost of this algorithm by counting how many times each line of code gets executed.

   Line   Discussion                                                        Cost
   1      Executed once for each of the n elements in the array             O(n)
   2      The inner loop takes O(n) to complete, and it is started n times  O(n²)
   3      Takes O(1) to execute, but it is executed n² times                O(n²)
   4      Executed at most once                                             O(1)
   5      Executed at most once                                             O(1)

Thus, the overall cost of the function is O(n²), the cost of the most expensive line of code. The graph of performance against data-set size shows the slope increasing drastically as n grows:

   [Figure: O(n²) execution time, time (ms) vs. n; a steeply rising curve]

Comparing algorithm complexity

This brief introduction to algorithmic complexity, or "Big-O," does not cover all the complexities involved. The main point is to learn to recognize O(1) vs. O(n) vs. O(n log n) and so on. This is mostly accomplished by looking at the loops and seeing how n affects the number of times the bodies of the loops execute.

   [Figure: execution time vs. n for O(1), O(log n), O(n), O(n log n), and O(n²) on one chart]

The second point is that O(1) and O(log n) behave very similarly, as do O(n) and O(n log n). That being said, there is a huge difference between O(1) and O(n), and another huge difference between O(n) and O(n²). As programmers, we do all we can to avoid O(n²) when an O(n) or O(n log n) algorithm is possible. Similarly, we do all we can to avoid O(n) when an O(log n) or O(1) algorithm is possible.
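As one illustration of trading an O(n²) algorithm for an O(n log n) one, here is a minimal sketch of a duplicate check that sorts a copy of the array and then scans adjacent elements. The helper duplicatesExistFast is not from the text, and it uses std::vector and std::sort, which are covered later; std::sort is O(n log n) and the scan is O(n).

   #include <algorithm>
   #include <iostream>
   #include <vector>
   using namespace std;

   // O(n log n) duplicate detection: sort a copy, then compare neighbors.
   // An alternative to the O(n²) nested-loop duplicatesExist() above.
   bool duplicatesExistFast(const double array[], int n)
   {
      vector<double> copy(array, array + n);
      sort(copy.begin(), copy.end());         // O(n log n)
      for (int i = 1; i < n; i++)             // O(n) scan of adjacent pairs
         if (copy[i - 1] == copy[i])
            return true;
      return false;
   }

   int main()
   {
      double a[] = { 3.0, 1.0, 4.0, 1.0, 5.0 };
      double b[] = { 3.0, 1.0, 4.0, 2.0, 5.0 };
      cout << boolalpha
           << duplicatesExistFast(a, 5) << endl    // true: 1.0 appears twice
           << duplicatesExistFast(b, 5) << endl;   // false
      return 0;
   }

With a million elements, the nested-loop version performs on the order of a trillion comparisons, while the sort-then-scan version performs on the order of twenty million operations.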
Chapter 00: Built-in Data Types

A data structure is a format for organizing data. Generally, the term is associated with computers, but it applies to other realms as well. For example, "11/12/1888" is a data structure associated with a date, where the first field is the month (November), the second is the day of the month (the 12th), and the final field is the year. Note that "12.11.1888" is the most common date data structure in Germany, because the day of the month is written on the left. Data structures on computing systems are often far more complex than their physical-world counterparts. They represent such things as songs, inventory parts, and bank transactions, as well as collections of songs, collections of parts, and collections of transactions. The topic of data structures is important to computer science because a great deal of programming has to do with representing real-world constructs on a digital computer.

A data type is a specific format for representing data on a computer system. Since all data on a computer system is stored digitally as collections of 1's and 0's, there needs to be some way to translate this digital data into a format that is meaningful in a computing context. The data type specifies this translation. A data type specifies how to convert the concept of a number, a user's name, or a movie into 1's and 0's stored in memory or on a long-term storage device. It also specifies how these 1's and 0's can be translated back to the concept they are meant to represent.

The concepts of a data structure and a data type are closely connected: the data structure describes how data is organized, and the data type describes the translation process. There are three classifications of data types in a computing context:

• Built-in data types. These data types are so generic and so universally useful that they are built into most computing systems and programming languages. Built-in data types are also called "primitive data types," or simply "primitives," because they are the building blocks of other data types. Examples include integers, characters, and floating-point numbers.

• Custom data types. These data types are designed for specific applications. If I were to build an application to play a card game, I might choose to create a data type to represent a single playing card. Custom data types are built in C++ using classes, structures, type definitions, and enumerations. They are a large focus of obj...