Introduction .................................................................................................................. 2
Chapter 00 : Built-in Data Types ................................................................................. 10
Chapter 01 : Vector ..................................................................................................... 24
Chapter 02 : Stack ....................................................................................................... 45
Chapter 03 : Queue ..................................................................................................... 75
Chapter 04 : Deque ..................................................................................................... 97
Chapter 05 : Set......................................................................................................... 117
Chapter 06 : Linked List ............................................................................................ 144
Chapter 07 : List ....................................................................................................... 169
Chapter 08 : Binary Tree ........................................................................................... 194
Chapter 09 : Binary Search Tree ................................................................................ 220
Chapter 10 : Map ...................................................................................................... 256
Chapter 11 : Sorting .................................................................................................. 277
Chapter 12 : Hash ..................................................................................................... 315
Chapter 13 : Graph .................................................................................................... 337
A. ADT Properties .................................................................................................... 366
B. ADT Performance ................................................................................................ 367
C. Review: Pseudocode ............................................................................................ 369
D. Review: UML Class Diagram ............................................................................... 371
E. Review: Operator Overloading ............................................................................ 378
F. Review: Template ................................................................................................ 382
G. Review: Iterator ................................................................................................... 386
H. Review: Nested Class ........................................................................................... 389
I. Review: Namespace ............................................................................................. 393
J. Review: Separate Compilation ............................................................................. 394
K. Review: Recursion ............................................................................................... 396
L. Debugging ........................................................................................................... 401
M. Glossary ............................................................................................................... 408
N. Index ................................................................................................................... 413

C++ Data Structures | Introduction | Page 1

Introduction
What is a datatype and what is a data structure? In the simplest terms, a datatype is a way to represent data
digitally. This enables us to store numbers, text, colors, locations, and just about anything else you can think
of on a computer. Data structures are datatypes that represent collections of elements. Both datatypes
and data structures are tools to allow us to represent ideas or physical things in the world around us in a
computer program.
The first generations of digital computers were not as data-centric as modern computers. In fact, they were
little more than calculators, designed to accomplish only one specific task. These tasks were purely
mathematical, such as predicting tides and computing artillery tables for the military. Claude Shannon
changed all this in the 1930s when he explored methods for representing logical equations on the computer.
This was followed in the 1950s with digital representations of letters and then text. As computers started
filling more business and personal roles, the need for representing diverse elements digitally increased greatly.
In 1986, less than 1% of the world’s data was stored digitally. The 50% mark was crossed in 2002. Today,
more than 99% of the world’s data is stored digitally and the pace of digital content creation is increasing
exponentially.
A key to this remarkable transformation to digital data is the extreme flexibility of computers. Not only can
computers be programmed to accomplish a wide variety of tasks, but they can also be programmed to store
a wide variety of data. Digital representation of data is a fundamental decision point of any programming
problem and digital data is an important part of our everyday life. With a more complete understanding of
data types and data structures, we can make more informed decisions and make our programs run more
efficiently. This is the subject of this textbook.

Three Considerations

When working with a datatype or data structure, there are three important considerations: correctness, size,
and performance.

Correctness
Correctness is a function of the datatype or data structure behaving the way the client expects. We call this
the “contract.” For example, the contract for a number datatype would consist of how it performs arithmetic
operations. If the client cannot rely on 2+2=4, then what value is the number datatype?
All datatypes and data structures must have the correctness property both in the typical and common uses as
well as in the atypical and marginal uses.

Size
Size is a function of the amount of space required to represent the data. For example, one might consider
storing a pixel as three integers. If an integer takes 64 bits or 8 bytes, that is a total of 24 bytes for a single
RGB (red, green, blue) pixel. By itself that does not seem like much, but when combined into a 20-megapixel
image, it makes for a large file. If the same pixel were stored as three chars (8 bits apiece for 24 bits total),
then the 20-megapixel image would take 1/8th the size!

Performance
The final consideration is performance. How long does it take to perform the operations associated with the
datatype or data structure? For the most part, we care about how the performance scales as we use a large
number of elements. Consider a list data structure. If one needs a loop to search through the list to find
an element, then it takes on average 5.5 units of time (such as 5.5 milliseconds) to find the element in
a list of 10 elements. If you double the size of the list, it will take roughly twice as long (11 milliseconds). It will take
on average half a million units of time to find an element when there are a million elements. Since the cost is
directly proportional to the size, we call this relationship linear. Thus, if the size of the list is n, then the cost
of finding an element is O(n), which means roughly "n times some constant." The most common performance
characteristics of a function are the following:

   Notation     Name          Description
   O(1)         Constant      Performance is unrelated to the size of the data structure
   O(log n)     Logarithmic   Performance is a log of the size of the data structure
   O(n)         Linear        Performance is directly related to size
   O(n log n)   N-Log-N       Performance is linear times logarithmic
   O(n^2)       N-Squared     Performance is the square of the size of the data structure
   O(2^n)       2-to-the-N    Performance is really bad

To give you some idea of how these performance characteristics relate to each other, consider a dataset
containing 1,000 elements. If it takes a blink of an eye to perform one operation (0.1 second if you are very
quick), how long would a function take to complete?

   Performance   Time                                       Performance Implication
   O(1)          0.1 second                                 Faster than you will notice
   O(log n)      1 second                                   A noticeable lag
   O(n)          1½ minutes                                 Takes considerable patience
   O(n log n)    14 minutes                                 The user will leave the computer to take a walk
   O(n^2)        28 hours                                   The user lost all hope
   O(2^n)        2 trillion times the age of the universe

Consider what would happen if our one operation took us twice as long to perform (0.2 seconds as opposed
to 0.1) over the same 1,000-element dataset. Executing this operation once would show a noticeable change;
everything would take twice as long. However, if we used an algorithm that is O(n^2) as opposed to O(n log
n), then the performance implications would be dramatic (56 hours for O(n^2) as opposed to 28 minutes
for O(n log n)). Thus, the exponent of the performance characteristic (meaning O(1) vs. O(n) vs. O(n^2)) is
much more important than the duration of a single event (0.2 seconds per operation vs. 0.1 seconds).
This semester, we will consider algorithms that have all the above performance characteristics except O(2^n).
Now, the next obvious question is: how can we identify the performance characteristic of a given algorithm?
This can be answered by looking at each of the various performance levels in turn.

O(1)
Constant performance yields comparable execution time regardless of the size of the dataset. This means the
function cannot have a loop that is related to the size of the dataset. Consider the following function:
template <class T>
Stack<T>::Stack(int size)
{
   if (size > 0)                                  // line 1
      data = new T[size];                         // line 2
   else                                           // line 3
      throw "ERROR: Unable to allocate buffer";   // line 4
}

It takes the same amount of time to allocate a memory block of 4 bytes as it does 4 million bytes. Thus,
regardless of the value of size, it takes the same amount of time to execute this function.

   Line   Discussion                                                 Cost
   1      Only executed once, when the function is called            O(1)
   2      Executed at most once                                      O(1)
   3      Not technically a line of code; part of the IF statement
   4      Executed at most once, when the passed size is invalid     O(1)

If we were to graph this with the vertical axis being execution time and the horizontal axis being the size of
the dataset, the graph would be flat:

[Figure: O(1) Execution Time — a flat horizontal line; time stays constant as n grows]

This means that no matter the number of elements in the dataset (n), the execution time is always the same.
Thus, we see that constant time algorithms are very efficient. They are the gold standard that we strive to
achieve in our designs, but they are also not always possible.

O(log n)
Logarithmic performance (and indeed everything but constant) does relate execution time to data-set size.
However, the rate of increase slows down as the data set gets larger. An example is a binary search:
bool binarySearch(const double data[], int n, double find)
{
   int iFirst = 0;
   int iLast  = n - 1;                       // line 1

   while (iFirst <= iLast)                   // line 2
   {
      int iMiddle = (iFirst + iLast) / 2;    // line 3
      if (data[iMiddle] == find)
         return true;                        // line 4
      if (data[iMiddle] > find)              // line 5
         iLast = iMiddle - 1;                // line 6
      else
         iFirst = iMiddle + 1;               // line 7
   }
   return false;                             // line 8
}

Notice in line 3 how the distance between iFirst and iLast gets cut in half every iteration. Thus, it will take
a single additional iteration of the main loop to search through a dataset that is twice the size. This is the
distinguishing characteristic of logarithmic algorithms.

   Line    Discussion                                                                  Cost
   1       Only executed once, when the function is called                             O(1)
   2,3,5   Executed once for each loop; there are log n loops                          O(log n)
   4,8     Can only be executed once because we exit the function after it is called   O(1)
   6,7     Executed about half the times the loop is run; this is still O(log n)       O(log n)

The graph of execution time against n increases, but the slope decreases:

[Figure: O(log n) Execution Time — an increasing curve whose slope flattens as n grows]

Many people find it difficult to internalize logarithmic growth. Consider this equation:

   2^x = n   →   log2(n) = x

Thus, exponents and logarithms are inverses of each other. Since 2^32 ≈ 4 billion, log2(4 billion) ≈ 32.
This means it will only take 32 iterations through our binary search to find an element in a list of 4 billion.
Logarithmic algorithms are very efficient, even for large datasets.
O(n)
In linear algorithms, twice as much data takes twice as long to execute. Consider the following algorithm to
find the largest element in an unsorted array. We will use a single loop to iterate through all members of the
dataset:
template <class T>
T & findLargest(T data[], int n)
{
   T * pT = data;                  // line 1
   for (int i = 1; i < n; i++)     // line 2
      if (*pT < data[i])           // line 3
         pT = data + i;
   return *pT;                     // line 4
}

We can estimate the performance cost of this algorithm by counting how many times each line of code gets
executed.

   Line   Discussion                                        Cost
   1      Only executed once, when the function is called   O(1)
   2      Executed once per iteration of the loop           O(n)
   3      Executed once per iteration of the loop           O(n)
   4      Executed once at the end of the function          O(1)

Thus, we can see that the overall cost of the function is O(n).

The graph of performance to data-set size increases steadily with n; regardless of the value of n, the slope
remains constant:

[Figure: O(n) Execution Time — a straight line of constant slope]

O(n log n)
Some algorithms grow faster than linear but not much faster. For example, consider an algorithm that needs
to perform an O(log n) search many times. If it needs to perform this search n times, then the algorithm is
called O(n log n). Consider the following function:
bool sorted(const double array[], int n)
{
   // look for each item in the list
   for (int i = 0; i < n; i++)                  // line 1
      if (!binarySearch(array, n, array[i]))    // line 2
         return false;                          // line 3
   return true;                                 // line 4
}

We can estimate the performance cost of this algorithm by counting how many times each line of code gets
executed.

   Line   Discussion                                                                 Cost
   1      Executed once for every n in the array                                     O(n)
   2      Since binarySearch() is O(log n) and it is executed n times, n × O(log n)  O(n log n)
   3      At most this can be executed one time                                      O(1)
   4      At most this can be executed one time                                      O(1)

Thus, we can see that the overall cost of the function is O(n log n), which is the cost of the most expensive
line of code. The graph of performance to data-set size shows that as n increases, the slope gradually increases.
However, as n gets very large, the rate of increase becomes almost unnoticeable and appears linear:

[Figure: O(n log n) Execution Time — a nearly straight line that bends slightly upward]

O(n^2)
There are times when the cost of performing an action becomes much worse as n increases. There are examples
in nature: the wind resistance on a car increases as the square of the speed, so if you go twice as fast,
the force of the air pushing back is four times as strong. As a rule, programmers try to avoid algorithms
that exhibit such a behavior. Consider the following code:
bool duplicatesExist(const double array[], int n)
{
   for (int i = 0; i < n; i++)                  // line 1
      for (int j = 0; j < n; j++)               // line 2
         if (i != j && array[i] == array[j])    // line 3
            return true;                        // line 4
   return false;                                // line 5
}
We can estimate the performance cost of duplicatesExist by counting how many times each line of code gets
executed.

   Line   Discussion                                                                     Cost
   1      Executed once for every n in the array                                         O(n)
   2      It will take O(n) to complete this loop; however, it will be executed n times  O(n^2)
   3      It takes O(1) to execute this line of code, but it will be executed n^2 times  O(n^2)
   4      At most this can be executed one time                                          O(1)
   5      At most this can be executed one time                                          O(1)

Thus, we can see that the overall cost of the function is O(n^2), which is the cost of the most expensive line of
code. The graph of performance to data-set size shows that as n increases, the slope drastically increases:

[Figure: O(n^2) Execution Time — a steeply rising curve]

Comparing algorithm complexity
This brief introduction to algorithmic complexity or “Big-O” does not cover all the complexities involved.
The main point is to understand how to recognize O(1) vs. O(n) vs. O(n log n), etc. This is mostly
accomplished by looking at the loops and seeing how n affects the number of times the bodies of the loops
execute.

[Figure: Execution Time compared for O(1), O(log n), O(n), O(n log n), and O(n^2)]

The second point is that O(1) and O(log n) are very similar. The same is true with O(n) and O(n log n).
That being said, there is a huge difference between O(1) and O(n), and there is also a huge difference between
O(n) and O(n^2). As programmers, we do all we can to avoid O(n^2) when an O(n) or O(n log n) algorithm
is possible. Similarly, we do all we can to avoid O(n) when an O(log n) or O(1) algorithm is possible.

Chapter 00 : Built-in Data Types
A data structure is a format for organizing data. Generally, the term is associated with computers, but it
applies to other realms as well. For example, “11/12/1888” is a data structure associated with a date where
the first field is the month (November), the second is the day of the month (the 12th), and the final is the
year. Note that “12.11.1888” is the most common date data structure in Germany because they put the day
of the month on the left. Data structures on computing systems are often far more complex than their physical
world counterparts. They represent such things as songs, inventory parts, and bank transactions, as well as
collections of songs, collections of parts, and collections of transactions. The data structures topic is important
to computer science because a great deal of programming has to do with representing real-world constructs
on a digital computer.
A data type is a specific format for representing data on a computer system. Since all data on a computer
system is stored digitally as collections of 1’s and 0’s, there needs to be some way to translate this digital data
into a format that is meaningful in a computing context. The data type specifies this translation. A data type
specifies how to convert the concept of a number, a user’s name, or a movie into 1’s and 0’s stored in memory
or on a long-term storage device. It also specifies how these 1’s and 0’s can be translated back to the concept
they are meant to represent. The concept of a data structure and a data type are closely connected. The data
structure describes how data is organized and the data type describes the translation process.
There are three classifications of data types in a computing context:

•  Built-in data types. These data types are so generic and so universally useful that they are built
   into most computing systems and programming languages. Built-in data types are also called
   "primitive data types" or simply "primitives" because they are the building blocks of other data
   types. Examples of these include integers, characters, and floating-point numbers.

•  Custom data types. These data types are designed for specific applications. If I were to build an
   application to play a card game, I might choose to create a data type to represent a single playing
   card. Custom data types are built in C++ using classes, structures, type definitions, and
   enumerations. They are a large focus of obj...
