Beginning Database Design
Beginning Database Design
Clare Churcher
Beginning Database Design: From Novice to Professional
Copyright © 2012 by Clare Churcher
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or
part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, transmission or information storage and retrieval, electronic adaptation, adaptation to computer software,
or by similar or dissimilar methodology now known or hereafter developed. Exempted from this
legal reservation are brief excerpts in connection with reviews or scholarly analysis, or material
supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be
obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
ISBN-13 (pbk): 978-1-4302-4209-3
ISBN-13 (electronic): 978-1-4302-4210-9
Trademarked names, logos, and images may appear in this book. Rather than use a trademark
symbol with every occurrence of a trademarked name, logo, or image we use the names, logos,
and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
President and Publisher: Paul Manning
Lead Editor: Jonathan Gennick
Technical Reviewer: Stéphane Faroult
Editorial Board: Steve Anglin, Ewan Buckingham, Gary Cornell, Louise Corrigan, Morgan
Ertel, Jonathan Gennick, Jonathan Hassell, Robert Hutchinson, Michelle Lowman, James
Markham, Matthew Moodie, Jeff Olson, Jeffrey Pepper, Douglas Pundick, Ben Renow-Clarke, Dominic Shakeshaft, Gwenan Spearing, Matt Wade, Tom Welsh
Coordinating Editor: Anita Castro
Copy Editor: Chandra Clarke
Compositor: SPi Global
Indexer: SPi Global
Artist: SPi Global
Cover Designer: Anna Ishchenko
Distributed to the book trade worldwide by Springer Science+Business Media New York, 233
Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505,
e-mail [email protected], or visit .
For information on translations, please e-mail [email protected], or visit .
Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information,
reference our Special Bulk Sales–eBook Licensing web page at .
Any source code or other supplementary materials referenced by the author in this text are available to readers at . For detailed information about how to locate your book’s
source code, go to .
To Neville
Contents at a Glance
About the Author
About the Technical Reviewer
Chapter 1: What Can Go Wrong
Chapter 2: Guided Tour of the Development Process
Chapter 3: Initial Requirements and Use Cases
Chapter 4: Learning from the Data Model
Chapter 5: Developing a Data Model
Chapter 6: Generalization and Specialization
Chapter 7: From Data Model to Relational Database Design
Chapter 8: Normalization
Chapter 9: More on Keys and Constraints
Chapter 10: Query Basics
Chapter 11: User Interface
Chapter 12: Other Implementations
Appendix
Index
Contents
About the Author
About the Technical Reviewer
Chapter 1: What Can Go Wrong
Mishandling Keywords and Categories
Designing for a Single Report
Summary
Chapter 2: Guided Tour of the Development Process
Initial Problem Statement
Analysis and Simple Data Model
Classes and Objects
Relationships
Further Analysis: Revisiting the Use Cases
Interfaces for Input Use Cases
Reports for Output Use Cases
Summary
Chapter 3: Initial Requirements and Use Cases
Real and Abstract Views of a Problem
Data Minding
Task Automation
What Does the User Do?
What Data Are Involved?
What Is the Objective of the System?
What Data Are Required to Satisfy the Objective?
What Are the Input Use Cases?
What Is the First Data Model?
What Are the Output Use Cases?
More About Use Cases
Exceptions and Extensions
Use Cases for Maintaining Data
Use Cases for Reporting Information
Finding Out More About the Problem
What Have We Postponed?
Meals That Are Discontinued
Quantities of Particular Meals
Summary
Chapter 4: Learning from the Data Model
Review of Data Models
Optionality: Should It Be 0 or 1?
Student Course Example
Customer Order Example
Insect Example
A Cardinality of 1: Might It Occasionally Be Two?
Sports Club Example
A Cardinality of 1: What About Historical Data?
Sports Club Example
Insect Example
A Many–Many: Are We Missing Anything?
Sports Club Example
Student Course Example
Meal Delivery Example
When a Many–Many Doesn’t Need an Intermediate Class
Summary
Chapter 5: Developing a Data Model
Attribute, Class, or Relationship?
Two or More Relationships Between Classes
Different Routes Between Classes
Routes Providing Different Information
False Information from a Route (Fan Trap)
Gaps in a Route Between Classes (Chasm Trap)
Relationships Between Objects of the Same Class
Relationships Involving More Than Two Classes
Summary
Chapter 6: Generalization and Specialization
Classes or Objects with Much in Common
Inheritance in Summary
When Inheritance Is Not a Good Idea
Confusing Objects with Subclasses
Confusing an Association with a Subclass
When Is Inheritance Worth Considering?
Should the Superclass Have Objects?
Objects That Belong to More Than One Subclass
Composites and Aggregates
It Isn’t Easy
Summary
Chapter 7: From Data Model to Relational Database Design
Representing the Model
Representing Classes and Attributes
Creating a Table
Choosing Data Types
Domains and Constraints
Checking Character Fields
Primary Key
Determining a Primary Key
Concatenated Keys
Representing Relationships
Representing 1–Many Relationships
Representing Many–Many Relationships
Representing 1–1 Relationships
Representing Inheritance
Summary
Chapter 8: Normalization
Dealing With Update Anomalies
Functional Dependencies
Definition of a Functional Dependency
Functional Dependencies and Primary Keys
Normal Forms
First Normal Form
Second Normal Form
Third Normal Form
Boyce–Codd Normal Form
Data Models or Functional Dependencies?
Additional Considerations
Summary
Chapter 9: More on Keys and Constraints
Choosing a Primary Key
More About ID Numbers
Candidate Keys
An ID Number or a Concatenated Key?
Unique Constraints
Using Constraints Instead of Category Classes
Deleting Referenced Records
Summary
Chapter 10: Query Basics
Simple Queries on One Table
The Project Operation
The Select Operation
Ordering
Queries with Two or More Tables
The Join Operation
Set Operations
How Indexes Can Help
Indexes and Simple Queries
Disadvantages of Indexes
Types of Indexes
Views
Uses for Views
Summary
Chapter 11: User Interface
Data Entry Forms Based on a Single Table
Data Entry Forms Based on Several Tables
Constraints on a Form
Restricting Access to a Form
Reports
Basing Reports on Views
Main Parts of a Report
Grouping and Summarizing
Summary
Chapter 12: Other Implementations
Classes and Objects
Complex Types and Methods
Collections of Objects
OO Environments
Implementing a Data Model in a Spreadsheet
Many–Many Relationships
Implementing in XML
Defining XML Types
Querying XML
When I wrote the foreword to the first edition of Beginning Database Design, I expressed my hopes to see this book become a popular classic. I felt that it deserved to be
so. As the technical reviewer, I had thoroughly enjoyed Clare’s skill in turning a subject
that is often presented dryly into a vivid and interesting book, and her skill in dissecting
the thought process that lets you go from functional requirements to the design of a database that will be able to keep data consistent, grow, and bear the load. Beginning Database Design doesn’t enunciate, like so many books, quasi-divine rules with pretentious
jargon. It explains the goals, the common mistakes, why they are mistakes, and what
you should do instead. It brings to light the logic behind the rules, all in a short and very
There is much satisfaction in seeing five years later that my hopes have been fulfilled,
and that Beginning Database Design has become one of the leading titles on this important topic—databases are everywhere and database design belongs to the core body of
knowledge of any serious software developer. This edition has retained all the qualities
that made the first one successful, including Clare’s lucid writing and humor, and if the
page count has increased it has mostly been to include exercises allowing readers to test
their understanding and compare their solutions to the answers that are provided. As the
technical reviewer once again, I was in a privileged position to witness the small improvements—there wasn’t that much to improve—that Clare has brought to her book,
clarifying a sentence here, improving an example there. There is a great quote by Saint-Exupéry, the author of The Little Prince, that says that perfection is achieved not when
there is nothing left to add, but when there is nothing left to remove. I am sure that Clare
will agree with me that this remark, written with aircraft engineering in mind, applies to
database design as well. I also feel that there is nothing to remove from this book.
Database, SQL, and Performance Consultant
RoughSea Limited
About the Author
Clare Churcher (B.Sc. [Honors], Ph.D.) has designed and
implemented databases for a variety of clients and research
projects. She is currently the Head of the Applied Computing Department at Lincoln University in Lincoln, Canterbury, New Zealand. Clare has designed and delivered a
range of courses including analysis and design of information systems, databases, and programming. She has received a university teaching award in recognition of her
expertise in communicating her knowledge. Clare has
road-tested her design principles by supervising over 70
undergraduate group database design projects. Examples
from these real-life situations are used to illustrate the
ideas in this book.
About the Technical Reviewer
Stéphane Faroult first discovered relational databases and the SQL language back in
1983. He joined Oracle France in its early days (after a brief spell with IBM and a bout
of teaching at the University of Ottawa) and soon developed an interest in performance
and tuning topics. After leaving Oracle in 1988, he briefly tried to reform and did a bit
of operational research; but after one year, he succumbed again to relational databases.
He has been continuously performing database consultancy since then, and founded
RoughSea Limited in 1998. He is the author of The Art of SQL (O’Reilly, 2006) and of
Refactoring SQL Applications (O’Reilly, 2008).
Acknowledgments
Thanks to my family, friends, and colleagues who helped with the two editions of this
book. First of all, I want to say thanks very much to my husband, Neville, for introducing me to this subject a long time ago and for always being prepared to offer advice and
support. Thanks also to all my friends and colleagues at Lincoln University for their interest and input. Most of the examples in these books are based on scenarios that have
cropped up during my teaching at Lincoln. So, a big thank you to my students for all the
quirky insights, understandings, and misunderstandings they have introduced me to over
the last 19 years.
Thanks again to my editor Jonathan Gennick for suggesting I write a second edition
and providing helpful suggestions, and also to Stéphane Faroult for his good-humored
expertise as technical reviewer.
Introduction
Everyone keeps data. Big organizations spend millions to look after their payroll, customer, and transaction data. The penalties for getting it wrong are severe: businesses
may collapse, shareholders and customers lose money, and for many organizations (airlines, health boards, energy companies), it is not exaggerating to say that even personal
safety may be put at risk. And then there are the lawsuits. The problems in successfully
designing, installing, and maintaining such large databases are the subject of numerous
books on data management and software engineering. However, many small databases
are used within large organizations and also for small businesses, clubs, and private concerns. When these go wrong, it doesn’t make the front page of the papers; but the costs,
often hidden, can be just as serious.
Where do we find these smaller electronic databases? Sports clubs will have membership information and match results; small businesses might maintain their own customer
data. Within large organizations, there will also be a number of small projects to maintain information that isn’t easily or conveniently managed by the large system-wide
databases. Researchers may keep their own experiment and survey results; groups will
want to manage their own rosters or keep track of equipment; departments may keep
their own detailed accounts and submit just a summary to the organization’s financial
Most of these small databases are set up by end users. These are people whose main
job is something other than that of a computer professional. They will typically be scientists, administrators, technicians, accountants, or teachers, and many will have only
modest skills when it comes to spreadsheet or database software.
The resulting databases often do not live up to expectations. Time and energy are expended setting up a few tables in a database product such as Microsoft Access, or setting up a spreadsheet in a product such as Excel. Even more time is spent collecting and
keying in data. But invariably (often within a short time frame) there is a problem producing what seems to be a quite simple report or query. Often this is because the way the
tables have been set up makes the required result very awkward, if not impossible, to
achieve.
Getting It Wrong
A database that does not fulfill expectations becomes a costly exercise in more ways
than one. We clearly have the cost of the time and effort expended on setting up an unsatisfactory application. However, a much more serious problem is the inability to
make the best use of valuable data. This is especially so for research data. Scientific
and social researchers may spend considerable money and many years designing experiments, hiring assistants, and collecting and analyzing data, but often very little thought
goes into storing it in an appropriately designed database. Unfortunately, some quite
simple mistakes in design can mean that much of the potential information is lost. The
immediate objective may be satisfied, but unforeseen uses of the data may be seriously
compromised. Next year’s grant opportunities are lost.
Another hidden cost comes from inaccuracies in the data. Poor database design allows what should be avoidable inconsistencies to be present in the data. Poor handling
of categories can cause summaries and reports to be misleading or, to be blunt, wrong.
In large organizations, the accumulated effects of each department’s inaccurate summary information may go unnoticed.
Problems with a database are not necessarily caused by a lack of knowledge about
the database product itself (though this will eventually become a constraint) but are often the result of having chosen the wrong attributes to group together in a particular
table. This comes about for two main reasons:
The creator does not have a clear idea of what information the database is meant to
be delivering in the short and medium term
The creator does not have a clear model of the different classes of data and their relationships to each other
This book describes techniques for gaining a precise understanding of what a problem is about, how to develop a conceptual model of the data involved, and how to
translate that model into a database design. You’ll learn to design better databases.
You’ll avoid the cost of “getting it wrong.”
Create a Data Model
The chasm between having a basic idea of what your database needs to be able to do
and designing the appropriate tables is bridged by having a clear data model. Data
modeling involves thinking very carefully about the different sets or classes of data
needed for a particular problem.
Here is a very simple textbook example: a small business might have customers,
products, and orders. We need to record a customer’s name. That clearly belongs with
our set of customer data. What about address? Now, does that mean the customer’s
contact address (in which case it belongs to the customer data) or where we are shipping the order (in which case it belongs with information about the order)? What about discount rate? Does that belong with the customer (some are gold card customers), or
the product (dinner sets are on special at the moment), or the order (20% off orders
over $400.00), or none of the above, or all of the above, or does it depend on the boss’s
Getting the correct answers to these questions is obviously vital if you are going to
provide a useful database for yourself or your client. It is no good heading up a column
in your spreadsheet “Discount” before you have a very precise understanding of exactly what a discount means in the context of the current problem. Data modeling diagrams provide very precise and easy-to-interpret documentation for answers to questions such as those just posed. Even more importantly, the process of constructing a
data model leads you to ask the questions in the first place. It is this, more than anything else, that makes data modeling such a useful tool.
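To make the address and discount questions concrete, here is a minimal Python sketch. It assumes one possible set of answers. The customer's contact address lives with the customer. The shipping address lives with each order. The only discount actually applied is the order-level rule mentioned above (20% off orders over $400.00). The class and field names (`Customer`, `Product`, `Order`, `OrderLine`, `gold_card`, `on_special`) are hypothetical illustrations, not a design taken from this book.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Customer:
    name: str
    contact_address: str      # where we write to the customer
    gold_card: bool = False   # hypothetical customer-level discount flag (recorded, not applied)


@dataclass
class Product:
    description: str
    unit_price: Decimal
    on_special: bool = False  # hypothetical product-level discount flag (recorded, not applied)


@dataclass
class OrderLine:
    product: Product
    quantity: int


@dataclass
class Order:
    customer: Customer
    shipping_address: str     # where this particular order is sent
    lines: list[OrderLine] = field(default_factory=list)

    def subtotal(self) -> Decimal:
        # Sum of unit price times quantity over all order lines.
        return sum(
            (line.product.unit_price * line.quantity for line in self.lines),
            Decimal("0"),
        )

    def total(self) -> Decimal:
        # Assumed order-level rule: 20% off orders strictly over $400.00.
        amount = self.subtotal()
        if amount > Decimal("400.00"):
            amount *= Decimal("0.80")
        return amount.quantize(Decimal("0.01"))
```

`gold_card` and `on_special` are stored but deliberately ignored by `total()`. Whether those discounts apply, and how they combine with the order-level rule, is exactly the kind of question the data model should prompt you to ask before any column is labeled “Discount.”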
The data models we will be looking at in this book are small. They may represent
small problems in their entirety, but more likely they will be small parts of larger problems. The emphasis will be on looking very carefully at the relationships between a
few classes of data and getting the detail right. This means using the first attempts at
the model to form questions for the user, to find the exceptions (before they find you),
and then to make some pragmatic decisions about how much of the detail is necessary
to make a useful database. Without a good data model, any database is pretty much
doomed before it is started.
Data models are often represented visually using some sort of diagram. Diagrams allow you to take in a large amount of information at a glance, giving you the ability to
quickly get the gist of a database design without having to read a lot of text. We will be
using the class diagram notation from UML to represent our data models, but many
other notations are equally useful.
Database Impleme...