(Expert's voice in databases) Churcher, Clare - Beginning database design _ from novice to professio


Beginning Database Design
Clare Churcher

Beginning Database Design: From Novice to Professional
Copyright © 2012 by Clare Churcher

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, transmission or information storage and retrieval, electronic adaptation, adaptation to computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis, or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

ISBN-13 (pbk): 978-1-4302-4209-3
ISBN-13 (electronic): 978-1-4302-4210-9

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

President and Publisher: Paul Manning
Lead Editor: Jonathan Gennick
Technical Reviewer: Stéphane Faroult
Editorial Board: Steve Anglin, Ewan Buckingham, Gary Cornell, Louise Corrigan, Morgan Ertel, Jonathan Gennick, Jonathan Hassell, Robert Hutchinson, Michelle Lowman, James Markham, Matthew Moodie, Jeff Olson, Jeffrey Pepper, Douglas Pundick, Ben Renow-Clarke, Dominic Shakeshaft, Gwenan Spearing, Matt Wade, Tom Welsh
Coordinating Editor: Anita Castro
Copy Editor: Chandra Clarke
Compositor: SPi Global
Indexer: SPi Global
Artist: SPi Global
Cover Designer: Anna Ishchenko

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit . For information on translations, please e-mail [email protected], or visit .

Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales–eBook Licensing web page at .

Any source code or other supplementary materials referenced by the author in this text are available to readers at . For detailed information about how to locate your book’s source code, go to .
To Neville Contents at a Glance Foreword About the Author About the Technical Reviewer Acknowledgments Introduction Chapter 1: What Can Go Wrong Chapter 2: Guided Tour of the Development Process Chapter 3: Initial Requirements and Use Cases Chapter 4: Learning from the Data Model Chapter 5: Developing a Data Model Chapter 6: Generalization and Specialization Chapter 7: From Data Model to Relational Database Design Chapter 8: Normalization Chapter 9: More on Keys and Constraints Chapter 10: Query Basics Chapter 11: User Interface Chapter 12: Other Implementations Appendix Index Contents Foreword About the Author About the Technical Reviewer Acknowledgments Introduction Chapter 1: What Can Go Wrong Mishandling Keywords and Categories Repeated Information Designing for a Single Report Summary Chapter 2: Guided Tour of the Development Process Initial Problem Statement Analysis and Simple Data Model Classes and Objects Relationships Further Analysis: Revisiting the Use Cases Design Implementation Interfaces for Input Use Cases Reports for Output Use Cases Summary Chapter 3: Initial Requirements and Use Cases Real and Abstract Views of a Problem Data Minding Task Automation What Does the User Do? What Data Are Involved? What Is the Objective of the System? What Data are Required to Satisfy the Objective? What are the Input Use Cases? What Is the First Data Model? What Are the Output Use Cases? More About Use Cases Actors Exceptions and Extensions Use Cases for Maintaining Data Use Cases for Reporting Information Finding Out More About the Problem What Have We Postponed? Changing Prices Meals That Are Discontinued Quantities of Particular Meals Summary Chapter 4: Learning from the Data Model Review of Data Models Optionality: Should It Be 0 or 1? Student Course Example Customer Order Example Insect Example A Cardinality of 1: Might It Occasionally Be Two? Insect Example Sports Club Example A Cardinality of 1: What About Historical Data? 
Sports Club Example Departments Example Insect Example A Many–Many: Are We Missing Anything? Sports Club Example Student Course Example Meal Delivery Example When a Many–Many Doesn’t Need an Intermediate Class Summary Chapter 5: Developing a Data Model Attribute, Class, or Relationship? Two or More Relationships Between Classes Different Routes Between Classes Redundant Information Routes Providing Different Information False Information from a Route (Fan Trap) Gaps in a Route Between Classes (Chasm Trap) Relationships Between Objects of the Same Class Relationships Involving More Than Two Classes Summary Chapter 6: Generalization and Specialization Classes or Objects with Much in Common Specialization Generalization Inheritance in Summary When Inheritance Is Not a Good Idea Confusing Objects with Subclasses Confusing an Association with a Subclass When Is Inheritance Worth Considering? Should the Superclass Have Objects? Objects That Belong to More Than One Subclass Composites and Aggregates It Isn’t Easy Summary Chapter 7: From Data Model to Relational Database Design Representing the Model Representing Classes and Attributes Creating a Table Choosing Data Types Domains and Constraints Checking Character Fields Primary Key Determining a Primary Key Concatenated Keys Representing Relationships Foreign Keys Referential Integrity Representing 1–Many Relationships Representing Many–Many Relationships Representing 1–1 Relationships Representing Inheritance Summary Chapter 8: Normalization Update Anomalies Insertion Problems Deletion Problems Dealing With Update Anomalies Functional Dependencies Definition of a Functional Dependency Functional Dependencies and Primary Keys Normal Forms First Normal Form Second Normal Form Third Normal Form Boyce–Codd Normal Form Data Models or Functional Dependencies? Additional Considerations Summary Chapter 9: More on Keys and Constraints Choosing a Primary Key More About ID Numbers Candidate Keys An ID Number or a Concatenated Key? 
Unique Constraints Using Constraints Instead of Category Classes Deleting Referenced Records Summary Chapter 10: Query Basics Simple Queries on One Table The Project Operation The Select Operation Aggregates Ordering Queries with Two or More Tables The Join Operation Set Operations How Indexes Can Help Indexes and Simple Queries Disadvantages of Indexes Types of Indexes Views Creating Views Uses for Views Summary Chapter 11: User Interface Input Forms Data Entry Forms Based on a Single Table Data Entry Forms Based on Several Tables Constraints on a Form Restricting Access to a Form Reports Basing Reports on Views Main Parts of a Report Grouping and Summarizing Summary Chapter 12: Other Implementations Object–Oriented Implementation Classes and Objects Complex Types and Methods Collections of Objects Representing Relationships OO Environments Implementing a Data Model in a Spreadsheet 1–Many Relationships Many–Many Relationships Implementing in XML Representing Relationships Defining XML types Querying XML NoSQL Summary Object–Oriented Databases Spreadsheets XML Appendix Index

Foreword

When I wrote the foreword to the first edition of Beginning Database Design, I expressed my hopes to see this book become a popular classic. I felt that it deserved to be so. As the technical reviewer, I had thoroughly enjoyed Clare’s skill in turning a subject that is often presented dryly into a vivid and interesting book, and her skill in dissecting the thought process that lets you go from functional requirements to the design of a database that will be able to keep data consistent, grow, and bear the load.

Beginning Database Design doesn’t enunciate, like so many books, quasi–divine rules with pretentious jargon. It explains the goals, the common mistakes, why they are mistakes, and what you should do instead. It brings to light the logic behind the rules, all in a short and very readable book.
There is much satisfaction in seeing five years later that my hopes have been fulfilled, and that Beginning Database Design has become one of the leading titles on this important topic—databases are everywhere and database design belongs to the core body of knowledge of any serious software developer. This edition has retained all the qualities that made the first one successful, including Clare’s lucid writing and humor, and if the page count has increased it has mostly been to include exercises allowing readers to test their understanding and compare their solutions to the answers that are provided.

As the technical reviewer once again, I was in a privileged position to witness the small improvements—there wasn’t that much to improve—that Clare has brought to her book, clarifying a sentence here, improving an example there. There is a great quote by Saint-Exupéry, the author of The Little Prince, that says that perfection is achieved not when there is nothing left to add, but when there is nothing left to remove. I am sure that Clare will agree with me that this remark, written with aircraft engineering in mind, applies to database design as well. I also feel that there is nothing to remove from this book.

Stéphane Faroult
Database, SQL, and Performance Consultant
RoughSea Limited

About the Author

Clare Churcher (B.Sc. [Honors], Ph.D.) has designed and implemented databases for a variety of clients and research projects. She is currently the Head of the Applied Computing Department at Lincoln University in Lincoln, Canterbury, New Zealand. Clare has designed and delivered a range of courses including analysis and design of information systems, databases, and programming. She has received a university teaching award in recognition of her expertise in communicating her knowledge. Clare has road–tested her design principles by supervising over 70 undergraduate group database design projects.
Examples from these real–life situations are used to illustrate the ideas in this book.

About the Technical Reviewer

Stéphane Faroult first discovered relational databases and the SQL language back in 1983. He joined Oracle France in its early days (after a brief spell with IBM and a bout of teaching at the University of Ottawa) and soon developed an interest in performance and tuning topics. After leaving Oracle in 1988, he briefly tried to reform and did a bit of operational research; but after one year, he succumbed again to relational databases. He has been continuously performing database consultancy since then, and founded RoughSea Limited in 1998. He is the author of The Art of SQL (O’Reilly, 2006) and of Refactoring SQL Applications (O’Reilly, 2008).

Acknowledgments

Thanks to my family, friends, and colleagues who helped with the two editions of this book. First of all, I want to say thanks very much to my husband, Neville, for introducing me to this subject a long time ago and for always being prepared to offer advice and support. Thanks also to all my friends and colleagues at Lincoln University for their interest and input. Most of the examples in these books are based on scenarios that have cropped up during my teaching at Lincoln. So, a big thank you to my students for all the quirky insights, understandings, and misunderstandings they have introduced me to over the last 19 years. Thanks again to my editor Jonathan Gennick for suggesting I write a second edition and providing helpful suggestions, and also to Stéphane Faroult for his good–humored expertise as technical reviewer.

Introduction

Everyone keeps data. Big organizations spend millions to look after their payroll, customer, and transaction data.
The penalties for getting it wrong are severe: businesses may collapse, shareholders and customers lose money, and for many organizations (airlines, health boards, energy companies), it is not exaggerating to say that even personal safety may be put at risk. And then there are the lawsuits. The problems in successfully designing, installing, and maintaining such large databases are the subject of numerous books on data management and software engineering. However, many small databases are used within large organizations and also for small businesses, clubs, and private concerns. When these go wrong, it doesn’t make the front page of the papers; but the costs, often hidden, can be just as serious.

Where do we find these smaller electronic databases? Sports clubs will have membership information and match results; small businesses might maintain their own customer data. Within large organizations, there will also be a number of small projects to maintain data information that isn’t easily or conveniently managed by the large system–wide databases. Researchers may keep their own experiment and survey results; groups will want to manage their own rosters or keep track of equipment; departments may keep their own detailed accounts and submit just a summary to the organization’s financial software.

Most of these small databases are set up by end users. These are people whose main job is something other than that of a computer professional. They will typically be scientists, administrators, technicians, accountants, or teachers, and many will have only modest skills when it comes to spreadsheet or database software.

The resulting databases often do not live up to expectations. Time and energy are expended to set up a few tables in a database product such as Microsoft Access, or in setting up a spreadsheet in a product such as Excel. Even more time is spent collecting and keying in data.
But invariably (often within a short time frame) there is a problem producing what seems to be a quite simple report or query. Often this is because the way the tables have been set up makes the required result very awkward, if not impossible, to achieve.

Getting It Wrong

A database that does not fulfill expectations becomes a costly exercise in more ways than one. We clearly have the cost of the time and effort expended on setting up an unsatisfactory application. However, a much more serious problem is the inability to make the best use of valuable data. This is especially so for research data. Scientific and social researchers may spend considerable money and many years designing experiments, hiring assistants, and collecting and analyzing data, but often very little thought goes into storing it in an appropriately designed database. Unfortunately, some quite simple mistakes in design can mean that much of the potential information is lost. The immediate objective may be satisfied, but unforeseen uses of the data may be seriously compromised. Next year’s grant opportunities are lost.

Another hidden cost comes from inaccuracies in the data. Poor database design allows what should be avoidable inconsistencies to be present in the data. Poor handling of categories can cause summaries and reports to be misleading or, to be blunt, wrong. In large organizations, the accumulated effects of each department’s inaccurate summary information may go unnoticed.

Problems with a database are not necessarily caused by a lack of knowledge about the database product itself (though this will eventually become a constraint) but are often the result of having chosen the wrong attributes to group together in a particular table.
This comes about for two main reasons:

  • The creator does not have a clear idea of what information the database is meant to be delivering in the short and medium term
  • The creator does not have a clear model of the different classes of data and their relationships to each other

This book describes techniques for gaining a precise understanding of what a problem is about, how to develop a conceptual model of the data involved, and how to translate that model into a database design. You’ll learn to design better databases. You’ll avoid the cost of “getting it wrong.”

Create a Data Model

The chasm between having a basic idea of what your database needs to be able to do and designing the appropriate tables is bridged by having a clear data model. Data modeling involves thinking very carefully about the different sets or classes of data needed for a particular problem. Here is a very simple textbook example: a small business might have customers, products, and orders. We need to record a customer’s name. That clearly belongs with our set of customer data. What about address? Now, does that mean the customer’s contact address (in which case it belongs to the customer data) or where we are shipping the order (in which case it belongs with information about the order)? What about discount rate? Does that belong with the customer (some are gold card customers), or the product (dinner sets are on special at the moment), or the order (20% off orders over $400.00), or none of the above, or all of the above, or does it depend on the boss’s mood?

Getting the correct answers to these questions is obviously vital if you are going to provide a useful database for yourself or your client. It is no good heading up a column in your spreadsheet “Discount” before you have a very precise understanding of exactly what a discount means in the context of the current problem.
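
To make the address and discount questions a little more concrete, here is a minimal Python sketch of one possible set of answers. It is an illustration only and does not come from the book: the class names, the field names, and the decision to apply the order discount after any product special are all assumptions. A customer-level discount (for gold card customers, say) would be yet another attribute, placed on the Customer class.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Customer:
    name: str
    contact_address: str  # assumed: the contact address belongs to the customer


@dataclass
class Product:
    name: str
    price: Decimal
    # Assumed: a "special" is a fraction taken off this product's price.
    special_discount: Decimal = Decimal("0")


@dataclass
class OrderLine:
    product: Product
    quantity: int

    def total(self) -> Decimal:
        # Price times quantity, reduced by the product-level special.
        return self.product.price * self.quantity * (1 - self.product.special_discount)


@dataclass
class Order:
    customer: Customer
    shipping_address: str  # assumed: the shipping address belongs to the order
    lines: list[OrderLine] = field(default_factory=list)

    def total(self) -> Decimal:
        subtotal = sum((line.total() for line in self.lines), Decimal("0"))
        # Hypothetical rule taken from the example: 20% off orders over $400.00,
        # applied here after the product specials.
        if subtotal > Decimal("400.00"):
            subtotal *= Decimal("0.80")
        return subtotal
```

Even this small sketch forces decisions that a single “Discount” column hides: which class owns each discount, and in what order the discounts combine. A different answer to either question gives a different total for the same order.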
Data modeling diagrams provide very precise and easy–to–interpret documentation for answers to questions such as those just posed. Even more importantly, the process of constructing a data model leads you to ask the questions in the first place. It is this, more than anything else, that makes data modeling such a useful tool.

The data models we will be looking at in this book are small. They may represent small problems in their entirety, but more likely they will be small parts of larger problems. The emphasis will be on looking very carefully at the relationships between a few classes of data and getting the detail right. This means using the first attempts at the model to form questions for the user, to find the exceptions (before they find you), and then to make some pragmatic decisions about how much of the detail is necessary to make a useful database. Without a good data model, any database is pretty much doomed before it is started.

Data models are often represented visually using some sort of diagram. Diagrams allow you to take in a large amount of information at a glance, giving you the ability to quickly get the gist of a database design without having to read a lot of text. We will be using the class diagram notation from UML to represent our data models, but many other notations are equally useful.

Database Impleme...