Database Design
1 / 122
Term:
Definition:
Show example sentence
Show hint
Keyboard Shortcuts
  • Previous
  • Next
  • F Flip card

Complete list of Terms and Definitions for Database Design

Terms Definitions
What are the names of the active suppliers of nuts (DRC)? { <k> | (∃jl)(<jklm> ∈ S Λ l='Active' Λ     (∃st)(<stu> ∈ SP Λ j=s Λ        (∃no)(<nopqr> ∈ P Λ n=t Λ o='Nut'))) }
Covers (of sets of FD's) Let G and H be sets of FD's. G covers H if every FD in H is also in G+.
Transaction States Active: Xact is being executed Partially Committed: Xact's actions are complete but changes may not yet be on disk Committed: Xact's changes are on disk Failed: The Xact cannot finish, but partial changes still linger Aborted: All changes have been cleared away
Closure (of a set of attributes) Let A be a set of attributes. The closure of A, denoted A+, is the set of attributes that A functionally determines with respect to a set of FD's.
Logical Data Independence (LDI) Conceptual schema is a description of data and relationships HARD to achieve
Boyce-Codd Normal Form (BCNF) A relation R is in BCNF if, for every non-trivial FD X→A in R, X is a superkey of R.
Sort Key The field on which the records are sorted
Fourth Normal Form (4NF) A relational schema is in 4NF iff whenever there exist attribute subsets A and B of R such that A multidetermines B, then all attributes of R are also functionally dependent on A.
RAID (Level 1) Mirrored no striping opportunity for parallel reads 2x the number disks ($$$)
Physical Data Independence (PDI) Physical storage can be changed without bothering users EASIER to achieve than LDI
Null A marker that indicates that a field does not have a value
ANSI/SPARC (External Level) Describes end-user perspectives on the data
What are the average quantities in which suppliers are supplying parts (SQL)? SELECT sno, AVG(qty) FROM spj GROUP BY sno;
Data Storage Pyramid /    CPU Registers     \ /      Cache Memory      \ /    Main Memory (RAM)    \ / Solid-State Devices (SSDs) \ /  Hard (Magnetic Disk) Drives \ / Tapes/Optical Disks (Libraries) \    
Database An collection of related records or files consolidated into a common pool that provides data for one or more multiple uses.
What are the names and salaries of the people in department #5 (TRC)? { e.surname, e.givenname, e.salary | Employee(e) Λ e.deptid = 5 }
Entity An instance of a general classification
Disk Striping Advantages performance (parallelism) Disadvantages no data replication increased prob. of disk failure
Equivalence (of sets of FD's) Let G and H be sets of FD's. G and H are equivalent if G+ = H+.
Itemset The items contained within a grouping of selected items, ignoring quantities
DB Design Process Requirements Analysis - talk to the client Conceptual Design - create conceptual schema Logical Design - convert to DBMS' implementation data model (normalization) Physical Design - physical storage requirements
SQL Injection A DBMS attack in which a user tries to add (inject) SQL into an incomplete query to get the DBMS to reveal additional information
DBMS A software system that enable users to define, create, maintain and control access to the database.
What are the names of the parts that can be supplied by individual suppliers in quantity > 200 (TRC)? { p.Pname | P(p) Λ ∃q (SP(q) Λ p.P# = q.P# Λ q.Qty > 200) }
Hierarchy of Security Classes (MAC) TS - Top Secret S - Secret C - Classified U - Unclassified   TS > S > C > U
Foreign Key A field in one file whose values are drawn from the primary key field of another file
Data Mining The discovery of patterns in, or rules about, the content of a data warehouse
Support (of an itemset) The percentage of all transactions that contain the itemset
Minimal Cover (of a set of FD's) A minimal cover of a set of FD's G is a minimal set of FD's that is equivalent to G.
REVOKE (SQL) REVOKE <privilege> [ON <object>] FROM <user>;   Ex:   REVOKE ALL PRIVILEGES   ON supplier   FROM PUBLIC;
Object Data Model Could map object data into a tuple (awkward) Object persistence
Armstrong's Axioms Reflexivity: If B ⊆ A, then A→B Augmentation: If C→D, then:C→CD CE→D CE→DE Transitivity: If F→G and G→H, then F→H
Primary Key A chosen candidate key
Secondary Index The indexed field may be any field The index records are sorted The DB records are not sorted Can have as many secondary indices as needed
Trivial Functional Dependency An FD X→A is a trivial FD when A ⊆ X
Sparse Index Indexes only a subset of the field values Smaller, therefore faster searching
CREATE USER (SQL) CREATE USER <username> [<options>]; PASSWORD or IDENTIFIED BY <password> quotas limits on access (e.g. time) capabilities (to create DB's, to add users, etc.)
Integrity Be Able to recover DB's after accidents/disasters: Maintain off-site backups Log/journal DBMS actions Use a non-0-level of RAID
ANSI/SPARC (Internal Level) Defines record sizes, field representations, types of indices, etc.
Conceptual-Internal mapping Converts logical structure to physical representations, and provides PDI
GRANT (SQL) GRANT <privilege> [ON <object>] TO <user> [WITH GRANT OPTION];   Ex:   GRANT CREATE VIEW to smithj;   GRANT SELECT ON supplier TO PUBLIC;
Transaction A group of one or more operations treated as a single, indivisible unit of work
Applications of Data Mining Marketing - customer behavior, targetted emailing Finance - credit ratings, fraud detection Manufacturing - optimizing processes & resources Health care - drug side-effects, treatment effectiveness
If we have one of each part in a box, how much does the content weight (SQL)? SELECT SUM(weight) from p;
RAID (Level 0) Non-Redundant striping only good write performance no data redundancy
Multidependency Let R be a relational schema, and let A,B & C be subsets of R's attributes. B is multidependent on A (or A multidetermines B) iff the set of B values matching a given (A,C) pair of values depends only on the A set and is independent of the C set.
Bell-LaPadula Model Subjects (users, groups, etc.) have clearances; Objects (tables, tuples, etc.) have classifications Simple Security Property: S can read O only when class(S) ≥ class(O) Star Property: S can write O only when class(S) ≤ class(O)
Normalization The process of decomposing relations into relations of lower degree that no longer posses certain undesirable properties.
Prime Attribute An attribute that is a member of any CK of the relation
Data Design for Data Warehouses The basic OLAP schema is the Star Schema Fact table is at the center A variation: The Snowflake Schema Becomes more generalized away from the center
Primary Index Indexed field is a candidate key Indexed records are sorted on the CK The DB file records are sorted on the CK Can have only one per DB file
ANSI/SPARC (Conceptual Level) Defines field groupings, relationships between data, etc.
Disk Mirroring Advantages can operate with a failed drive Disadvantages cost
Create a Relation (SQL) CREATE TABLE <name> (      <attr. name> <data type> [NOT NULL],      ...      [PRIMARY KEY (<attr>)] );  
Advantages of a DBMS Data sharing Centralized control Redundancy control (normalization) Data integrity Data security Views Data independence
Disadvantages of Triggers Hard to write appropriate actions Specified separately from relations Can reduce concurrency of the DBMS Trigger interaction Rule Termination
Sources of Read/Write Delay Seek time Rotational delay Transfer time
Relationship An association between a set of entities
Data Warehouse A DBMS of data from many sources used to support an organization's decision-making
Triggers (SQL) Follow the ECA model: Event - causes the activation Condition - if true, action is performed Action - SQL statement(s)
Index A file consisting of structured references to records of another file
Querying a Data Warehouse Slicing: An equality selection in 1 or more dimensionsDicing: A range selection in 1 or more dimensionsPivoting: Changing the orientation (viewpoint) by rotating the n-dimensional array to 'hide' a dimension. Roll-up: Moves from the current level in a dimension to the next more general level (ex. Dept→College) Drill-down: Moves from the current level to the next more specific level (ex. Vegetable→Carrot)
Confidence (of Association Rules) The confidence of a rule is the fraction of itemsets containing L that also contain R.   conf(L=>R) = supp(L∪R) / supp(L)    
Blocking Factor The number of whole records that can be stored within a single block
Relational Data Model Theoretical foundation: set theory Uses no pointers No physical/logical schema distinction
Data Mart A small data warehouse supporting a portion of an organization
What are the names of the active suppliers of nuts (TRC)? { s.sname | S(s) Λ s.status = 'Active' Λ  (∃q)(SP(q) Λ s.S# = q.S# Λ (∃p)(P(p) Λ q.P# = p.P# Λ p.pname = 'Nut')) }
Order of Operations (Query Trees) Join(s) Select(s) Project(s)
Support (of Association Rules) The support of a rule is the fraction of transactions that satisfy the rule.   support(L=>R) = support(L∪R)
External-Conceptual mapping Allows field renamings and provides LDI
First Normal Form (1NF) A relation is in 1NF if its attributes are not set-valued.
Third Normal Form (3NF) A relation R is in 3NF if, for every non-trivial FD X→A that holds in R, either: X is a superkey of R, or A is a prime attribute of R.
Hierarchical Data Model Schema is tree-structured Only 1:M relationships (1 parent & M children)
Candidate Key A key that can uniquely identify a record
Influencing the Query Optimizer Change operators (ex. Do joins w/ nested queries) Insist on a join ordering (ex. Replace "from s,sp" with "s JOIN sp")
Algorithms for Project and Select Project: Sequential Scan of relation   Select: Sequential Scan (full table scan, linear search) Binary Search Index Scan
Architecture (DBMS) A description of DBMS components and their interconnections
DBMS Security Issues Availability - to authorized users only Confidentiality - preserve secrecy Integrity - prevent data corruption and/or loss
Query Optimization The process of generating and selecting different potential query operation orderings
Secondary Key Any other field besides the primary key field
What are the names of the parts that can be supplied by individual suppliers in qty > 200 (Relational Alg.)? ∏pname (σqty>200 (σSP.P#=P.P# (SP × P))) or   ∏pname (σqty>200 (SP ⋈SP.P#=P.P# P)
Theta Join r ⋈θ s where θ is any PK-FK comparison
Natural Join An equijoin in which columns with matching names in both tables are compared, and only one of the matching columns will appear in the resulting table
B-Tree of Order M Each node contains at most 2M keys Each node (excepting the root) has at least M keys A non-leaf node has at least 2 children All leaf nodes are at the same level A non-leaf node of n keys has n+1 children
Clustered Index The indexed field is not a CK The indexed records are sorted on the key The DB records are sorted on the key
Functional Determination The set of attributes X functionally determines the set of attributes Y iff whenever any two tuples of the relation agree on their X values, they must also agree on their Y values.
Second Normal Form (2NF) A relation R is in 2NF if every non-prime attribute of R is fully functionally dependent upon every CK of R.
Outer Joins Left Outer Join: retains unmatched tuples from left relation Right Outer Join: retains unmatched tuples from right relation Full Outer Join: retains all unmatched tuples
Confidentiality Provide security from a more 'external' perspective, via: Quality passwords Data encryption (public-key encryption, digital signatures, etc.)
Deleting Tuples (SQL) DELETE FROM <relation name> WHERE <condition>;
Data Flow Cleaning - attempts to detect/correct errors and omissions in the data Reformatting - structures the data to match the schema of the DW Metadata ("data about data") - shows the origins of the DW's content Backflushing - returning cleaned data to the source DBMSes for re-integration
Find the names & salaries of the employees in department #5 (Relational Alg.) ∏givenname,surname,salary (σdept#=5(Employee))
Dense Index Holds one record for each in DB file Can do 'existence check' on just the index
Disadvantages of a DBMS DB design is complex Cost (hardware/software) Availability (single point of failure) Speed vs. Flexibility
RAID (Level 3) Bit-Interleaved Parity striping unit = 1 bit can recover from disk failure
Disadvantages of a B+-Tree 'Wastes' storage capacity of internal nodes (relatively minimal) Insertions and deletions are a bit more complex
Closure (of a set of FD's) Let G be a set of FD's. The closure of G, denoted G+, is the set of FD's that are implied by G, including those of G itself.
Updating Content of Tuples (SQL) UPDATE <relation name> SET <attr. name> = <expression> [,...] [FROM <relation list>] [WHERE <condition>];   Ex:   UPDATE m SET name = 'Ray' where id = 1;
Characteristics of a Data Warehouse Usually orders of magnitude larger than an OLTP system Used to generate data summaries and to identify trends Often known as On-line Analytical Processing (OLAP) Serves as a key input source for data mining algorithms
Join Algorithms Nested-Loops Join (NLJ) Sort-Merge Join (SMJ) Hash Join
RAID (Level 5) Block-Interleaved Distributed Parity striping unit = 1 block parity blocks are distributed across all disks fairly complex controller design
Schema The overall description of a database
Preventing SQL Injection Sanity-check all user input (i.e. range-checking) Trim excess characters from input strings Accept raw input only when necessary (i.e. use option lists)
Basic Trigger Syntax (Oracle) CREATE TRIGGER <name> {BEFORE/AFTER} {INSERT/DELETE/UPDATE} ON <relation> [ [FOR EACH ROW] WHEN (<condition>) ] <PL/SQL block>;
Full Functional Dependency An FD X→Y is a full functional dependency if removing any attribute from X destroys the dependency.
Equijoin A theta join where θ is an equality comparison between PK & FK
What are the names of the active suppliers of nuts (Relational Alg.)? ∏sname (σstatus>0 ∧ pname='Nut' (σS.S#=SP.S# (σP.P#=SP.P# (S × (SP × P)))))
Data Languages Data Description Language - turns conceptual schema into physical DB description Data Manipulation Language - insert/delete data Query Language - ask questions Data Control Language - security
Goals of Data Mining Predictions of future events (DANGEROUS) Identification of events (ex. system intrusion) Classification (ex. Which discounts appeal to which customers?)
Creating a View (SQL) CREATE VIEW <name> [(<attr. list>)] AS <select stmt>; CREATE VIEW supplierpart("Supplier Name","Part #") AS SELECT DISTINCT sname, pno      FROM s, spj      WHERE s.sno = spj.sno;  
What are the names of the parts that can be supplied by individual suppliers in quantity > 200 (DRC)? { <o> | (∃n)(<nopqr> ∈ P Λ (∃tu)(<stu> ∈ SP Λ n=t Λ u > 200)) }
Network Data Model Graph-based A std. theory of DB Systems
What is the content of the Employee Relation (DRC)? {<efghi> | <efghi> ∈ Employee }
B+-Tree Each key is stored in a leaf node Copies of some keys occupy internal nodes Leaf nodes are linked sequentially left to right to create a sequence set Linear searching No extra storage required
Superkey A superkey is a set of attributes that includes a candidate key.
Finding a Minimal Cover of a FD Set Put FD's in Standard Form Minimize the LHS of the FD's Remove redundant FD's
ACID Properties of Transactions Atomicity: Xacts are all or nothing Consistency: An Xact's actions retain the consistency of the DB Isolation: Each Xact thinks it's the only Xact in the system Durability: A completed Xact's changes are permanent - won't be lost  
What are the names and salaries of the people in department #5 (DRC)? { <fei> | (∃h)(<efghi> ∈ Employee Λ h=5) }
Minimal Sets (of FD's) A set of FD's is minimal if all of the following hold: All FD's have only 1 attribute on their RHS No attribute can be removed from the LHS of any FD w/o changing the closure All of the FD's in the set are needed to retain the closure
Advantages of a B+-Tree Has built-in disk pointers for keys in leaves Supports exact-match and range queries  
DBMS Components Database Database Administrator Application Programs Hardware
Deleting Relations (SQL) DROP [TABLE|INDEX|VIEW|DATABASE] <name>;