ZR-00-38 - Konrad-Zuse-Zentrum für Informationstechnik...

Takustraße 7, D-14195 Berlin-Dahlem, Germany
Konrad-Zuse-Zentrum für Informationstechnik Berlin

Axel Keller and Alexander Reinefeld
Anatomy of a Resource Management System for HPC Clusters
ZIB-Report 00-38 (November 2000)
Anatomy of a Resource Management System for HPC Clusters

Axel Keller (1) and Alexander Reinefeld (2)
(1) Paderborn Center for Parallel Computing (PC²)
(2) Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)
kel@upb.de, ar@zib.de

Abstract

Workstation clusters are often used not only for high-throughput computing in time-sharing mode but also for running complex parallel jobs in space-sharing mode. This poses several difficulties for the resource management system, which must be able to reserve computing resources for exclusive use and also to determine an optimal process mapping for a given system topology.

On the basis of our CCS software, we describe the anatomy of a modern resource management system. Like Codine, Condor, and LSF, CCS provides mechanisms for user-friendly system access and cluster management. But unlike them, CCS is targeted at the effective support of space-sharing parallel computers and even metacomputers. Among other features, CCS provides a versatile resource description facility, topology-based process mapping, pluggable schedulers, and hooks to metacomputer management.

To appear in: Annual Review of Scalable Computing, Vol. 3, 2001.
Author for correspondence: Alexander Reinefeld, ar@zib.de
Contents

1 Introduction
  1.1 CCS Overview
  1.2 CCS Target Platforms
  1.3 Scope and Organisation of this Paper
2 CCS Architecture
  2.1 User Interface
  2.2 User Access Manager
    2.2.1 Authentication
    2.2.2 Authorization
    2.2.3 Accounting
  2.3 Scheduling and Partitioning
    2.3.1 Scheduling
    2.3.2 Partitioning
  2.4 Access and Job Control
  2.5 Performance Issues
  2.6 Fault Tolerance
    2.6.1 Virtual Terminal Concept
    2.6.2 Alive Checks
    2.6.3 Node Checking
  2.7 Modularity
    2.7.1 Operator Shell
    2.7.2

