structures, this mapping should be collected and maintained as metadata. With the advent of modern data warehouse development tools, the mapping process becomes virtually indistinguishable from the process of transformation.

Extraction History

Whenever historical information is analyzed, meticulous update records must be kept. Often a decision maker begins the process of constructing a time-based report by reviewing the extraction history because any changes to the business rules must be ascertained in order to apply the right rules to the right data. If sales regions were reallocated in 2004, then the decision maker will know that results prior to that date may not be directly comparable with more recent results.

Algorithms for Summarization

A typical data warehouse contains a wide variety of lightly and heavily summarized data, as well as fully detailed records. The summarization algorithms applied to the detail data are important to any decision maker analyzing or interpreting the meaning of the summaries. These metadata can also save time by making it easier to decide which level of summarization is most appropriate for a given analysis context.

Data Ownership

Operational data stores are often "owned" by particular business units or divisions within an organization. In a DW environment, however, all data are stored in a common format and are normally accessible to all, which makes it necessary to identify the originator of each set of data, so that inquiries and corrections can be made to the proper group. It is useful to distinguish between "ownership" of data in the operational environment and "stewardship" in the DW. The administrators of the DW are responsible for the collection, summarization, and dissemination of warehouse data, and in this regard are the caretakers or stewards of the data. The administrators of the source data, however, are responsible for the accuracy of the transaction-level data and are the actual owners of the data.

Patterns of Warehouse Access

It is often desirable to record patterns of access to the warehouse for the purpose of optimizing and tuning DW performance. Understanding what tables are being accessed, how often, and by whom can alert the DW administrators to ways of improving or simplifying the queries being performed by the end users. Less frequently used data can be migrated to cheaper storage media, and various methods can be employed to accelerate access to the data that are most in demand. Further, the identification and recording of queries can be a valuable resource to the organization because it can facilitate the reuse of queries. Instead of spending the time to figure out how to construct a new query, a decision maker can simply access a repository of past queries and choose the one that most closely resembles the immediate need.
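
To make the idea of access-pattern metadata concrete, the following Python sketch shows one way such a record might be kept and used. It is an illustration under stated assumptions, not a description of any particular warehouse product: the AccessRecord layout, the sample log, the table names, and the three helper functions are all hypothetical, and plain text similarity (from Python's standard difflib module) stands in for whatever matching a real query repository would provide.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class AccessRecord:
    """One logged warehouse query (hypothetical log layout)."""
    user: str
    table: str
    query: str


# Hypothetical sample log; a real DW would read this from its metadata store.
ACCESS_LOG = [
    AccessRecord("ana", "sales_summary",
                 "SELECT region, SUM(amount) FROM sales_summary GROUP BY region"),
    AccessRecord("ben", "sales_summary",
                 "SELECT region, SUM(amount) FROM sales_summary GROUP BY region"),
    AccessRecord("ana", "sales_detail",
                 "SELECT * FROM sales_detail WHERE sale_date >= '2004-01-01'"),
    AccessRecord("cai", "sales_summary",
                 "SELECT product, AVG(amount) FROM sales_summary GROUP BY product"),
    AccessRecord("ben", "returns_archive",
                 "SELECT COUNT(*) FROM returns_archive"),
]


def table_access_counts(log):
    """Count how often each table appears in the access log."""
    return Counter(record.table for record in log)


def users_by_table(log):
    """Map each table to the sorted list of users who queried it."""
    users = {}
    for record in log:
        users.setdefault(record.table, set()).add(record.user)
    return {table: sorted(names) for table, names in users.items()}


def rarely_used_tables(log, threshold):
    """Tables accessed fewer than `threshold` times (cheaper-storage candidates)."""
    counts = table_access_counts(log)
    return sorted(table for table, n in counts.items() if n < threshold)


def closest_past_query(log, draft):
    """Return the logged query whose text is most similar to `draft`.

    Similarity is a case-insensitive character-level ratio, an assumption
    made for this sketch. Returns None when the log is empty.
    """
    queries = sorted({record.query for record in log})
    if not queries:
        return None
    return max(
        queries,
        key=lambda q: SequenceMatcher(None, q.lower(), draft.lower()).ratio(),
    )


if __name__ == "__main__":
    print(table_access_counts(ACCESS_LOG))
    print(users_by_table(ACCESS_LOG))
    print(rarely_used_tables(ACCESS_LOG, threshold=2))
    print(closest_past_query(ACCESS_LOG, "select region, sum(amount) from sales_summary group by region"))
```

The counts answer "which tables, how often," the user map answers "by whom," the threshold filter suggests candidates for migration to cheaper media, and the similarity lookup mimics a decision maker choosing the stored query closest to the immediate need.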