Konrad-Zuse-Zentrum für Informationstechnik Berlin
Takustraße 7, D-14195 Berlin-Dahlem, Germany

Axel Keller and Alexander Reinefeld

Anatomy of a Resource Management System for HPC Clusters

ZIB-Report 00-38 (November 2000)

Anatomy of a Resource Management System for HPC Clusters

Axel Keller¹ and Alexander Reinefeld²

¹ Paderborn Center for Parallel Computing (PC²)
² Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)
[email protected], [email protected]

Abstract. Workstation clusters are often used not only for high-throughput computing in time-sharing mode but also for running complex parallel jobs in space-sharing mode. This poses several difficulties for the resource management system, which must be able to reserve computing resources for exclusive use and also to determine an optimal process mapping for a given system topology.

On the basis of our CCS software, we describe the anatomy of a modern resource management system. Like Codine, Condor, and LSF, CCS provides mechanisms for user-friendly system access and the management of clusters. Unlike them, however, CCS is targeted at the effective support of space-sharing parallel computers and even metacomputers. Among other features, CCS provides a versatile resource description facility, topology-based process mapping, pluggable schedulers, and hooks for metacomputer management.

To appear in: Annual Review of Scalable Computing, Vol. 3, 2001.
Author for correspondence: Alexander Reinefeld, [email protected]
Contents

1 Introduction
  1.1 CCS Overview
  1.2 CCS Target Platforms
  1.3 Scope and Organisation of this Paper
2 CCS Architecture
  2.1 User Interface
  2.2 User Access Manager
    2.2.1 Authentication
    2.2.2 Authorization
    2.2.3 Accounting
  2.3 Scheduling and Partitioning
    2.3.1 Scheduling
    2.3.2 Partitioning
  2.4 Access and Job Control
  2.5 Performance Issues
  2.6 Fault Tolerance
    2.6.1 Virtual Terminal Concept
    2.6.2 Alive Checks
    2.6.3 Node Checking
  2.7 Modularity
    2.7.1 Operator Shell
    2.7.2 Worker Concept
    2.7.3 Adapting to the Local Environment
    2.7.4 Monitoring Tools
3 Resource and Service Description
  3.1 Graphical Representation
  3.2 Textual Representation
    3.2.1 Grammar
  3.3 Dynamic Attributes
    3.3.1 Example
  3.4 Internal Data Representation
  3.5 RSD Tools in CCS
4 Site Management
  4.1 Center Resource Manager (CRM)
  4.2 Center Information Server (CIS)
5 Related Work
6 Summary
1 Introduction

A resource management system is a portal to the underlying computing resources. It allows users and administrators to access and manage various computing resources such as processors, memory, network, and permanent storage. With the current trend towards heterogeneous grid computing [17], it is important to separate the resource management software from the concrete underlying hardware by introducing an abstraction layer between the hardware and the system management. This facilitates the management of distributed resources in grid computing environments as well as in local clusters with heterogeneous components.
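The abstraction-layer idea can be illustrated with a minimal sketch (not CCS code; all class and method names here are hypothetical): management code talks only to an abstract hardware interface, so a local cluster backend and a grid backend are interchangeable. The sketch also shows exclusive reservation, as required for space-sharing mode.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Resource:
    """A generic computing resource (processor, memory, network, storage)."""
    name: str
    kind: str       # e.g. "cpu", "memory", "network", "storage"
    capacity: int


class HardwareLayer(ABC):
    """Abstraction layer: the management software calls this interface
    and never touches the concrete hardware directly."""

    @abstractmethod
    def discover(self) -> list[Resource]:
        """List the resources this backend manages."""

    @abstractmethod
    def reserve(self, name: str) -> bool:
        """Reserve a resource for exclusive use; False if unavailable."""


class ClusterBackend(HardwareLayer):
    """One concrete backend for a local cluster; a grid environment
    could plug in a different implementation behind the same interface."""

    def __init__(self, inventory: list[Resource]):
        self._inventory = {r.name: r for r in inventory}
        self._reserved: set[str] = set()

    def discover(self) -> list[Resource]:
        return list(self._inventory.values())

    def reserve(self, name: str) -> bool:
        if name in self._inventory and name not in self._reserved:
            self._reserved.add(name)  # exclusive use, as in space-sharing
            return True
        return False


backend = ClusterBackend([Resource("node01", "cpu", 4),
                          Resource("node02", "cpu", 8)])
print(len(backend.discover()))    # 2
print(backend.reserve("node01"))  # True
print(backend.reserve("node01"))  # False: already reserved exclusively
```

Because the scheduler and user-facing components depend only on `HardwareLayer`, heterogeneous components can be added to a cluster, or to a grid, without changing the management code above the abstraction.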