For example, a lab test. Each row (tuple) is a particular...


... the server.

• An important concept in the design of a database is normalization. The idea is to remove as much redundancy as possible when creating the tables. This is done by breaking the full dataset into separate tables.
• The "relational" in RDBMS comes from the fact that we then need to link the tables together.
• For now let's talk about a single table...
• A table is a rectangular arrangement of values, where a row represents a case, and a column represents a variable (just like a data frame in R).
• Another term for a table is a relation. The rows are referred to as tuples and the columns as attributes.

[Figure label: Missing value]

• An entity is the general object of interest. For example, a lab test. Each row (tuple) is a particular occurrence of the entity. This means that rows in the table are unique.
• To identify each row, we use a key. A key is just an attribute or a combination of attributes that uniquely identifies the cases.
• In the lab test example, we need a composite key of both patient ID and date, since neither is necessarily unique.
• In R, the row names of a data frame play a similar role.
• SQL allows us to interactively query the database to reduce the data by subsetting, grouping, or aggregation.
• Each database program tends to have its own version of SQL, but they all support the same basic SQL statements. (We say statements rather than commands because SQL is referred to as a declarative rather than an imperative language.)
• The SQL statement for retrieving data is the SELECT statement. This operates on one or more tables. The result will always be another table.

We have a table called chips, with data about the CPU development of PCs over time. The simplest possible query gives back everything: SELECT...
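The ideas of a composite key and of a SELECT that returns a table can be tried out in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database. Because the preview does not show the columns of the chips table, the sketch uses the lab test example instead. The table name lab_tests, its column names, and the sample values are all hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical lab-test table (names are assumptions, not from the notes).
# Neither patient_id nor test_date is unique alone, so the key is the pair.
conn.execute("""
    CREATE TABLE lab_tests (
        patient_id INTEGER NOT NULL,
        test_date  TEXT    NOT NULL,
        result     REAL,
        PRIMARY KEY (patient_id, test_date)
    )
""")

rows = [
    (1, "2014-01-10", 5.2),
    (1, "2014-02-03", 4.9),   # same patient, different date: allowed
    (2, "2014-01-10", None),  # same date, different patient; None is stored as a missing value (NULL)
]
conn.executemany("INSERT INTO lab_tests VALUES (?, ?, ?)", rows)

# A second row with the same (patient_id, test_date) pair violates the key.
try:
    conn.execute("INSERT INTO lab_tests VALUES (1, '2014-01-10', 6.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# SELECT * asks for every column of every row; the result is itself a table.
everything = conn.execute(
    "SELECT * FROM lab_tests ORDER BY patient_id, test_date"
).fetchall()

# Grouping and aggregation reduce the data: here, one row per patient.
counts = conn.execute(
    "SELECT patient_id, COUNT(*) FROM lab_tests "
    "GROUP BY patient_id ORDER BY patient_id"
).fetchall()

print(duplicate_rejected)
print(everything)
print(counts)
```

The statements say what result is wanted (all rows, or a count per patient) rather than how to loop over the data, which is the sense in which SQL is declarative. Other database programs may differ in details such as column types, so treat the CREATE TABLE statement as SQLite-specific.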

This document was uploaded on 02/16/2014 for the course STATISTICS 3026 at Columbia.
