A Taxonomy of Data Grids for Distributed Data Sharing, Management,
and Processing
SRIKUMAR VENUGOPAL, RAJKUMAR BUYYA, AND
KOTAGIRI RAMAMOHANARAO
University of Melbourne, Australia
Data Grids have been adopted as the next generation platform by many scientific communities that need
to share, access, transport, process, and manage large data collections distributed worldwide. They combine
high-end computing technologies with high-performance networking and wide-area storage management
techniques. In this article, we discuss the key concepts behind Data Grids and compare them with other
data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks, and
distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture,
data transportation, data replication and resource allocation, and scheduling. Finally, we map the proposed
taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future
exploration.
Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software—Distributed systems; C.2.4 [Computer-Communication Networks]: Distributed Systems—Client/server; distributed applications; J.2 [Physical Sciences and Engineering]; J.3 [Life and Medical Sciences]
General Terms: Design, Management
Additional Key Words and Phrases: Grid computing, data-intensive applications, virtual organizations,
replica management
1. INTRODUCTION
The next generation of scientific applications in domains as diverse as high energy
physics, molecular modeling, and earth sciences involves the production of large datasets
from simulations or from large-scale experiments. Analysis of these datasets and their
dissemination among researchers located over a wide geographic area require high-capacity
resources such as supercomputers, high-bandwidth networks, and mass storage
systems. Collectively, these large-scale applications have come to be known as part of
This work is partially supported through the Australian Research Council (ARC) Discovery Project grant
and Storage Technology Corporation sponsorship of Grid Fellowship.
Authors’ address: R. Buyya, Grid Computing and Distributed Systems Laboratory, Department of Computer
Science and Software Engineering, University of Melbourne, VIC 3010, Australia; email: [email protected]
edu.au.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or direct commercial advantage and
that copies show this notice on the first page or initial screen of a display along with the full citation.
Copyrights for components of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any
component of this work in other works requires prior specific permission and/or a fee. Permissions may be
requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax:
- Computer Science, Grid Computing, ACM Computing Surveys, Data Grids