| Terms |
Definitions |
|
What are the names of the active suppliers of nuts (DRC)?
|
{ <k> | (∃jl)(<jklm> ∈ S Λ l='Active' Λ
(∃st)(<stu> ∈ SP Λ j=s Λ
(∃no)(<nopqr> ∈ P Λ n=t Λ
o='Nut'))) }
|
|
Covers (of sets of FD's)
|
Let G and H be sets of FD's. G covers H if every FD in H is also in G+.
|
|
Transaction States
|
Active: Xact is being executed
Partially Committed: Xact's actions are complete but changes may not yet be on disk
Committed: Xact's changes are on disk
Failed: The Xact cannot finish, but partial changes still linger
Aborted: All changes have been cleared away
|
|
Closure (of a set of attributes)
|
Let A be a set of attributes. The closure of A, denoted A+, is the set of attributes that A functionally determines with respect to a set of FD's.
|
|
Logical Data Independence (LDI)
|
Conceptual schema is a description of data and relationships
HARD to achieve
|
|
Boyce-Codd Normal Form (BCNF)
|
A relation R is in BCNF if, for every non-trivial FD X→A in R, X is a superkey of R.
|
|
Sort Key
|
The field on which the records are sorted
|
|
Fourth Normal Form (4NF)
|
A relational schema is in 4NF iff whenever there exist attribute subsets A and B of R such that A multidetermines B, then all attributes of R are also functionally dependent on A.
|
|
RAID (Level 1)
|
Mirrored
no striping
opportunity for parallel reads
2x the number disks ($$$)
|
|
Physical Data Independence (PDI)
|
Physical storage can be changed without bothering users
EASIER to achieve than LDI
|
|
Null
|
A marker that indicates that a field does not have a value
|
|
ANSI/SPARC (External Level)
|
Describes end-user perspectives on the data
|
|
What are the average quantities in which suppliers are supplying parts (SQL)?
|
SELECT sno, AVG(qty)
FROM spj
GROUP BY sno;
|
|
Data Storage Pyramid
|
/ CPU Registers \
/ Cache Memory \
/ Main Memory (RAM) \
/ Solid-State Devices (SSDs) \
/ Hard (Magnetic Disk) Drives \
/ Tapes/Optical Disks (Libraries) \
|
|
Database
|
An collection of related records or files consolidated into a common pool that provides data for one or more multiple uses.
|
|
What are the names and salaries of the people in department #5 (TRC)?
|
{ e.surname, e.givenname, e.salary | Employee(e) Λ e.deptid = 5 }
|
|
Entity
|
An instance of a general classification
|
|
Disk Striping
|
Advantages
performance (parallelism)
Disadvantages
no data replication
increased prob. of disk failure
|
|
Equivalence (of sets of FD's)
|
Let G and H be sets of FD's. G and H are equivalent if G+ = H+.
|
|
Itemset
|
The items contained within a grouping of selected items, ignoring quantities
|
|
DB Design Process
|
Requirements Analysis - talk to the client
Conceptual Design - create conceptual schema
Logical Design - convert to DBMS' implementation data model (normalization)
Physical Design - physical storage requirements
|
|
SQL Injection
|
A DBMS attack in which a user tries to add (inject) SQL into an incomplete query to get the DBMS to reveal additional information
|
|
DBMS
|
A software system that enable users to define, create, maintain and control access to the database.
|
|
What are the names of the parts that can be supplied by individual suppliers in quantity > 200 (TRC)?
|
{ p.Pname | P(p) Λ ∃q (SP(q) Λ p.P# = q.P#
Λ q.Qty > 200) }
|
|
Hierarchy of Security Classes (MAC)
|
TS - Top Secret
S - Secret
C - Classified
U - Unclassified
TS > S > C > U
|
|
Foreign Key
|
A field in one file whose values are drawn from the primary key field of another file
|
|
Data Mining
|
The discovery of patterns in, or rules about, the content of a data warehouse
|
|
Support (of an itemset)
|
The percentage of all transactions that contain the itemset
|
|
Minimal Cover (of a set of FD's)
|
A minimal cover of a set of FD's G is a minimal set of FD's that is equivalent to G.
|
|
REVOKE (SQL)
|
REVOKE <privilege>
[ON <object>]
FROM <user>;
Ex:
REVOKE ALL PRIVILEGES
ON supplier
FROM PUBLIC;
|
|
Object Data Model
|
Could map object data into a tuple (awkward)
Object persistence
|
|
Armstrong's Axioms
|
Reflexivity: If B ⊆ A, then A→B
Augmentation: If C→D, then:C→CD
CE→D
CE→DE
Transitivity: If F→G and G→H, then F→H
|
|
Primary Key
|
A chosen candidate key
|
|
Secondary Index
|
The indexed field may be any field
The index records are sorted
The DB records are not sorted
Can have as many secondary indices as needed
|
|
Trivial Functional Dependency
|
An FD X→A is a trivial FD when A ⊆ X
|
|
Sparse Index
|
Indexes only a subset of the field values
Smaller, therefore faster searching
|
|
CREATE USER (SQL)
|
CREATE USER <username> [<options>];
PASSWORD or IDENTIFIED BY <password>
quotas
limits on access (e.g. time)
capabilities (to create DB's, to add users, etc.)
|
|
Integrity
|
Be Able to recover DB's after accidents/disasters:
Maintain off-site backups
Log/journal DBMS actions
Use a non-0-level of RAID
|
|
ANSI/SPARC (Internal Level)
|
Defines record sizes, field representations, types of indices, etc.
|
|
Conceptual-Internal mapping
|
Converts logical structure to physical representations, and provides PDI
|
|
GRANT (SQL)
|
GRANT <privilege>
[ON <object>]
TO <user>
[WITH GRANT OPTION];
Ex:
GRANT CREATE VIEW to smithj;
GRANT SELECT ON supplier TO PUBLIC;
|
|
Transaction
|
A group of one or more operations treated as a single, indivisible unit of work
|
|
Applications of Data Mining
|
Marketing - customer behavior, targetted emailing
Finance - credit ratings, fraud detection
Manufacturing - optimizing processes & resources
Health care - drug side-effects, treatment effectiveness
|
|
If we have one of each part in a box, how much does the content weight (SQL)?
|
SELECT SUM(weight) from p;
|
|
RAID (Level 0)
|
Non-Redundant
striping only
good write performance
no data redundancy
|
|
Multidependency
|
Let R be a relational schema, and let A,B & C be subsets of R's attributes. B is multidependent on A (or A multidetermines B) iff the set of B values matching a given (A,C) pair of values depends only on the A set and is independent of the C set.
|
|
Bell-LaPadula Model
|
Subjects (users, groups, etc.) have clearances; Objects (tables, tuples, etc.) have classifications
Simple Security Property: S can read O only when class(S) ≥ class(O)
Star Property: S can write O only when class(S) ≤ class(O)
|
|
Normalization
|
The process of decomposing relations into relations of lower degree that no longer posses certain undesirable properties.
|
|
Prime Attribute
|
An attribute that is a member of any CK of the relation
|
|
Data Design for Data Warehouses
|
The basic OLAP schema is the Star Schema
Fact table is at the center
A variation: The Snowflake Schema
Becomes more generalized away from the center
|
|
Primary Index
|
Indexed field is a candidate key
Indexed records are sorted on the CK
The DB file records are sorted on the CK
Can have only one per DB file
|
|
ANSI/SPARC (Conceptual Level)
|
Defines field groupings, relationships between data, etc.
|
|
Disk Mirroring
|
Advantages
can operate with a failed drive
Disadvantages
cost
|
|
Create a Relation (SQL)
|
CREATE TABLE <name> (
<attr. name> <data type> [NOT NULL],
...
[PRIMARY KEY (<attr>)]
);
|
|
Advantages of a DBMS
|
Data sharing
Centralized control
Redundancy control (normalization)
Data integrity
Data security
Views
Data independence
|
|
Disadvantages of Triggers
|
Hard to write appropriate actions
Specified separately from relations
Can reduce concurrency of the DBMS
Trigger interaction
Rule Termination
|
|
Sources of Read/Write Delay
|
Seek time
Rotational delay
Transfer time
|
|
Relationship
|
An association between a set of entities
|
|
Data Warehouse
|
A DBMS of data from many sources used to support an organization's decision-making
|
|
Triggers (SQL)
|
Follow the ECA model:
Event - causes the activation
Condition - if true, action is performed
Action - SQL statement(s)
|
|
Index
|
A file consisting of structured references to records of another file
|
|
Querying a Data Warehouse
|
Slicing: An equality selection in 1 or more dimensionsDicing: A range selection in 1 or more dimensionsPivoting: Changing the orientation (viewpoint) by rotating the n-dimensional array to 'hide' a dimension.
Roll-up: Moves from the current level in a dimension to the next more general level (ex. Dept→College)
Drill-down: Moves from the current level to the next more specific level (ex. Vegetable→Carrot)
|
|
Confidence (of Association Rules)
|
The confidence of a rule is the fraction of itemsets containing L that also contain R.
conf(L=>R) = supp(L∪R) / supp(L)
|
|
Blocking Factor
|
The number of whole records that can be stored within a single block
|
|
Relational Data Model
|
Theoretical foundation: set theory
Uses no pointers
No physical/logical schema distinction
|
|
Data Mart
|
A small data warehouse supporting a portion of an organization
|
|
What are the names of the active suppliers of nuts (TRC)?
|
{ s.sname | S(s) Λ s.status = 'Active' Λ
(∃q)(SP(q) Λ s.S# = q.S# Λ
(∃p)(P(p) Λ q.P# = p.P# Λ
p.pname = 'Nut')) }
|
|
Order of Operations (Query Trees)
|
Join(s)
Select(s)
Project(s)
|
|
Support (of Association Rules)
|
The support of a rule is the fraction of transactions that satisfy the rule.
support(L=>R) = support(L∪R)
|
|
External-Conceptual mapping
|
Allows field renamings and provides LDI
|
|
First Normal Form (1NF)
|
A relation is in 1NF if its attributes are not set-valued.
|
|
Third Normal Form (3NF)
|
A relation R is in 3NF if, for every non-trivial FD X→A that holds in R, either:
X is a superkey of R, or
A is a prime attribute of R.
|
|
Hierarchical Data Model
|
Schema is tree-structured
Only 1:M relationships (1 parent & M children)
|
|
Candidate Key
|
A key that can uniquely identify a record
|
|
Influencing the Query Optimizer
|
Change operators (ex. Do joins w/ nested queries)
Insist on a join ordering (ex. Replace "from s,sp" with "s JOIN sp")
|
|
Algorithms for Project and Select
|
Project: Sequential Scan of relation
Select:
Sequential Scan (full table scan, linear search)
Binary Search
Index Scan
|
|
Architecture (DBMS)
|
A description of DBMS components and their interconnections
|
|
DBMS Security Issues
|
Availability - to authorized users only
Confidentiality - preserve secrecy
Integrity - prevent data corruption and/or loss
|
|
Query Optimization
|
The process of generating and selecting different potential query operation orderings
|
|
Secondary Key
|
Any other field besides the primary key field
|
|
What are the names of the parts that can be supplied by individual suppliers in qty > 200 (Relational Alg.)?
|
∏pname (σqty>200 (σSP.P#=P.P# (SP × P)))
or
∏pname (σqty>200 (SP ⋈SP.P#=P.P# P)
|
|
Theta Join
|
r ⋈θ s where θ is any PK-FK comparison
|
|
Natural Join
|
An equijoin in which columns with matching names in both tables are compared, and only one of the matching columns will appear in the resulting table
|
|
B-Tree of Order M
|
Each node contains at most 2M keys
Each node (excepting the root) has at least M keys
A non-leaf node has at least 2 children
All leaf nodes are at the same level
A non-leaf node of n keys has n+1 children
|
|
Clustered Index
|
The indexed field is not a CK
The indexed records are sorted on the key
The DB records are sorted on the key
|
|
Functional Determination
|
The set of attributes X functionally determines the set of attributes Y iff whenever any two tuples of the relation agree on their X values, they must also agree on their Y values.
|
|
Second Normal Form (2NF)
|
A relation R is in 2NF if every non-prime attribute of R is fully functionally dependent upon every CK of R.
|
|
Outer Joins
|
Left Outer Join: retains unmatched tuples from left relation
Right Outer Join: retains unmatched tuples from right relation
Full Outer Join: retains all unmatched tuples
|
|
Confidentiality
|
Provide security from a more 'external' perspective, via:
Quality passwords
Data encryption (public-key encryption, digital signatures, etc.)
|
|
Deleting Tuples (SQL)
|
DELETE FROM <relation name>
WHERE <condition>;
|
|
Data Flow
|
Cleaning - attempts to detect/correct errors and omissions in the data
Reformatting - structures the data to match the schema of the DW
Metadata ("data about data") - shows the origins of the DW's content
Backflushing - returning cleaned data to the source DBMSes for re-integration
|
|
Find the names & salaries of the employees in department #5 (Relational Alg.)
|
∏givenname,surname,salary (σdept#=5(Employee))
|
|
Dense Index
|
Holds one record for each in DB file
Can do 'existence check' on just the index
|
|
Disadvantages of a DBMS
|
DB design is complex
Cost (hardware/software)
Availability (single point of failure)
Speed vs. Flexibility
|
|
RAID (Level 3)
|
Bit-Interleaved Parity
striping unit = 1 bit
can recover from disk failure
|
|
Disadvantages of a B+-Tree
|
'Wastes' storage capacity of internal nodes (relatively minimal)
Insertions and deletions are a bit more complex
|
|
Closure (of a set of FD's)
|
Let G be a set of FD's. The closure of G, denoted G+, is the set of FD's that are implied by G, including those of G itself.
|
|
Updating Content of Tuples (SQL)
|
UPDATE <relation name>
SET <attr. name> = <expression> [,...]
[FROM <relation list>]
[WHERE <condition>];
Ex:
UPDATE m SET name = 'Ray' where id = 1;
|
|
Characteristics of a Data Warehouse
|
Usually orders of magnitude larger than an OLTP system
Used to generate data summaries and to identify trends
Often known as On-line Analytical Processing (OLAP)
Serves as a key input source for data mining algorithms
|
|
Join Algorithms
|
Nested-Loops Join (NLJ)
Sort-Merge Join (SMJ)
Hash Join
|
|
RAID (Level 5)
|
Block-Interleaved Distributed Parity
striping unit = 1 block
parity blocks are distributed across all disks
fairly complex controller design
|
|
Schema
|
The overall description of a database
|
|
Preventing SQL Injection
|
Sanity-check all user input (i.e. range-checking)
Trim excess characters from input strings
Accept raw input only when necessary (i.e. use option lists)
|
|
Basic Trigger Syntax (Oracle)
|
CREATE TRIGGER <name>
{BEFORE/AFTER} {INSERT/DELETE/UPDATE}
ON <relation>
[ [FOR EACH ROW] WHEN (<condition>) ]
<PL/SQL block>;
|
|
Full Functional Dependency
|
An FD X→Y is a full functional dependency if removing any attribute from X destroys the dependency.
|
|
Equijoin
|
A theta join where θ is an equality comparison between PK & FK
|
|
What are the names of the active suppliers of nuts (Relational Alg.)?
|
∏sname (σstatus>0 ∧ pname='Nut'
(σS.S#=SP.S# (σP.P#=SP.P#
(S × (SP × P)))))
|
|
Data Languages
|
Data Description Language - turns conceptual schema into physical DB description
Data Manipulation Language - insert/delete data
Query Language - ask questions
Data Control Language - security
|
|
Goals of Data Mining
|
Predictions of future events (DANGEROUS)
Identification of events (ex. system intrusion)
Classification (ex. Which discounts appeal to which customers?)
|
|
Creating a View (SQL)
|
CREATE VIEW <name>
[(<attr. list>)]
AS <select stmt>;
CREATE VIEW supplierpart("Supplier Name","Part #")
AS SELECT DISTINCT sname, pno
FROM s, spj
WHERE s.sno = spj.sno;
|
|
What are the names of the parts that can be supplied by individual suppliers in quantity > 200 (DRC)?
|
{ <o> | (∃n)(<nopqr> ∈ P Λ
(∃tu)(<stu> ∈ SP Λ n=t Λ
u > 200)) }
|
|
Network Data Model
|
Graph-based
A std. theory of DB Systems
|
|
What is the content of the Employee Relation (DRC)?
|
{<efghi> | <efghi> ∈ Employee }
|
|
B+-Tree
|
Each key is stored in a leaf node
Copies of some keys occupy internal nodes
Leaf nodes are linked sequentially left to right to create a sequence set
Linear searching
No extra storage required
|
|
Superkey
|
A superkey is a set of attributes that includes a candidate key.
|
|
Finding a Minimal Cover of a FD Set
|
Put FD's in Standard Form
Minimize the LHS of the FD's
Remove redundant FD's
|
|
ACID Properties of Transactions
|
Atomicity: Xacts are all or nothing
Consistency: An Xact's actions retain the consistency of the DB
Isolation: Each Xact thinks it's the only Xact in the system
Durability: A completed Xact's changes are permanent - won't be lost
|
|
What are the names and salaries of the people in department #5 (DRC)?
|
{ <fei> | (∃h)(<efghi> ∈ Employee Λ h=5) }
|
|
Minimal Sets (of FD's)
|
A set of FD's is minimal if all of the following hold:
All FD's have only 1 attribute on their RHS
No attribute can be removed from the LHS of any FD w/o changing the closure
All of the FD's in the set are needed to retain the closure
|
|
Advantages of a B+-Tree
|
Has built-in disk pointers for keys in leaves
Supports exact-match and range queries
|
|
DBMS Components
|
Database
Database Administrator
Application Programs
Hardware
|
|
Deleting Relations (SQL)
|
DROP [TABLE|INDEX|VIEW|DATABASE] <name>;
|