Handling Big Dimensions in Distributed Data Warehouses using the DWS Technique

M. Naresh Kumar, B. Kishore
madarapu.naresh@gmail.com, raju09987@gmail.com
(IV th CSE)

ABSTRACT

The DWS (Data Warehouse Striping) technique allows large data warehouses to be distributed across a cluster of computers. The data partitioning approach partitions the fact tables across all nodes and replicates the dimension tables. Replicating the dimension tables limits the applicability of the DWS technique to data warehouses with big dimensions. This paper proposes a strategy to handle large dimensions in a distributed DWS system and evaluates the proposed strategy experimentally. With the proposed strategy, the speedup and scale-up obtained with the DWS technique are not affected by the presence of big dimensions. Furthermore, the strategy extends the scope of the technique to queries that browse big dimensions, which can also benefit from the performance increase of the DWS technique.

Keywords: Data warehousing, distributed query execution.

1. INTRODUCTION

Data warehousing applications typically involve massive amounts of data that push database management technology to the limit. A scalable architecture is crucial, not only to handle very large amounts of data but also to assure interactive response times for OLAP (On-Line Analytical Processing) users. In fact, the decision-making process using OLAP is often based on a sequence of interactive queries: the answer to one query immediately prompts a second query, the answer to that query raises another, and so on in an ad hoc manner. To assure response times that allow this interactive OLAP querying style, even when the data warehouse becomes extremely large, data warehouse implementations normally use very expensive platforms, typically based on high-end servers or high-performance clusters.
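The partitioning scheme summarized in the abstract (fact tables partitioned across all nodes, dimension tables replicated on each node) can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the round-robin row placement, the function names (`stripe_facts`, `local_aggregate`, `merge_partials`, `run_query`), and the toy sales data are all assumptions made for illustration, and the "nodes" are simulated as Python lists rather than separate machines.

```python
from collections import defaultdict


def stripe_facts(fact_rows, n_nodes):
    """Spread fact rows over n_nodes partitions (round-robin is an assumption)."""
    partitions = [[] for _ in range(n_nodes)]
    for i, row in enumerate(fact_rows):
        partitions[i % n_nodes].append(row)
    return partitions


def local_aggregate(fact_partition, dimension, group_attr):
    """Work done on one node: join its fact rows with its copy of the
    replicated dimension and compute a partial SUM per group."""
    partial = defaultdict(float)
    for product_id, amount in fact_partition:
        partial[dimension[product_id][group_attr]] += amount
    return dict(partial)


def merge_partials(partials):
    """Combine the per-node partial sums into the final answer."""
    total = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def run_query(fact_rows, dimension, group_attr, n_nodes):
    """Simulate one aggregation query over a striped fact table."""
    partitions = stripe_facts(fact_rows, n_nodes)
    # Every node receives the full dimension table (replication).
    partials = [local_aggregate(p, dimension, group_attr) for p in partitions]
    return merge_partials(partials)


# Hypothetical toy data: a product dimension and (product_id, amount) sales facts.
product_dim = {
    1: {"category": "books"},
    2: {"category": "music"},
    3: {"category": "books"},
}
sales = [(1, 10.0), (2, 5.0), (3, 2.5), (1, 4.0), (2, 1.0)]

result = run_query(sales, product_dim, "category", n_nodes=3)
print(result)
```

In this sketch each node needs the whole of `product_dim` to resolve its local joins, so the storage cost of a dimension grows with the number of nodes. That is the limitation for big dimensions that the abstract refers to.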
The use of classical parallel processing techniques proposed for relational database systems is also common in big data warehouses. Two types of parallelism can be explored at the query level: inter-query parallelism, wherein multiple transactions are executed in parallel in a multiprocessor environment, and intra-query parallelism, where several processors cooperate to concurrently execute a single SQL statement. The latter is particularly interesting for the complex queries executed in data warehousing, as the parallelism is used to improve performance through parallel implementation of the various operations of the query execution plan. However, the use of parallelism in complex data warehouse queries is clearly more difficult and less effective than the parallel execution of the multiple small transactions that characterize typical database applications in on-line transaction processing (OLTP) environments. Another possibility for high volumes of data is to distribute the data across multiple data warehouses in such a

This note was uploaded on 03/26/2011 for the course IT 101 taught by Professor Dontknow during the Spring '07 term at Northern Virginia.
