
these abilities are expected to be available in future versions of YARN.

Capacity Management with Queues

The Capacity scheduler is designed to allow organizations to share compute clusters using the very familiar notion of first-in, first-out queues. YARN doesn’t assign whole nodes to queues. Instead, queues own a fraction of the capacity of the cluster, which can be fulfilled from any number of nodes in a dynamic fashion.

Scheduling is the process of matching the resource requirements of multiple applications from various users, each submitted to different queues at multiple levels in the queue hierarchy, with the free capacity available at any point in time on the nodes in the cluster.

Administrators configure each queue with a fraction of the capacity of the whole cluster. In our example, assuming that the administrators decide to share the cluster resources between the grumpy-engineers, finance-wizards, and marketing-moguls in a 6:1:3 ratio, the corresponding queue configuration will be as follows:

    Property: yarn.scheduler.capacity.root.grumpy-engineers.capacity
    Value: 60

    Property: yarn.scheduler.capacity.root.finance-wizards.capacity
    Value: 10

    Property: yarn.scheduler.capacity.root.marketing-moguls.capacity
    Value: 30

YARN is built around the fundamental requirements of fault tolerance and elasticity. In a YARN cluster built out of commodity hardware, subcomponents of a node such as disks, or even whole nodes, can go down for any of several reasons. In addition, depending on workloads and historical cluster usage, administrators may choose either to add new physical machines or to take away existing nodes to account for underutilization. Any of these changes will cause corresponding variations in the cluster capacity that the Capacity scheduler sees for the sake of scheduling. Queue capacity is configured in terms of percentages for this reason; this scheme ensures that organizations and suborganizations can reason well about their shares and guarantees irrespective of small variations in the total cluster capacity.

As discussed in the section dealing with hierarchical queues, there is a capacity planning problem at the suborganization level. Continuing with our example, let’s assume the grumpy-engineers decide to share their capacity between the infinite-monkeys and the pesky-testers in a 1:4 ratio (so that testing of YARN gets as many resources as possible). The corresponding configuration should be as follows, again given in terms of percentages:

    Property: yarn.scheduler.capacity.root.grumpy-engineers.infinite-monkeys.capacity
    Value: 20

    Property: yarn.scheduler.capacity.root.grumpy-engineers.pesky-testers.capacity
    Value: 80

Note that the sum of capacities at any level in the hierarchy should be no more than 100% (for obvious reasons).
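
The arithmetic behind these percentages can be checked with a short script. What follows is only a sketch, not part of YARN: the helper names (queue_percentages, check_sibling_sums, absolute_share) are hypothetical, and it assumes that a nested queue's share of the whole cluster is the product of the percentages along its path from the root. It reuses the property names and values from the example configuration.

```python
# Illustrative helpers only; none of these functions exist in YARN.
PREFIX = "yarn.scheduler.capacity."
SUFFIX = ".capacity"

# Percentages copied from the example configuration in the text.
config = {
    PREFIX + "root.grumpy-engineers.capacity": 60,
    PREFIX + "root.finance-wizards.capacity": 10,
    PREFIX + "root.marketing-moguls.capacity": 30,
    PREFIX + "root.grumpy-engineers.infinite-monkeys.capacity": 20,
    PREFIX + "root.grumpy-engineers.pesky-testers.capacity": 80,
}


def queue_percentages(config):
    """Map each queue path (e.g. 'root.finance-wizards') to its configured percentage."""
    return {
        key[len(PREFIX):-len(SUFFIX)]: value
        for key, value in config.items()
        if key.startswith(PREFIX) and key.endswith(SUFFIX)
    }


def check_sibling_sums(percentages):
    """Return the total per parent queue; raise ValueError if any total exceeds 100."""
    totals = {}
    for path, pct in percentages.items():
        parent = path.rsplit(".", 1)[0]
        totals[parent] = totals.get(parent, 0) + pct
    for parent, total in totals.items():
        if total > 100:
            raise ValueError(f"children of {parent} sum to {total}%")
    return totals


def absolute_share(path, percentages):
    """Percent of the whole cluster, assuming shares multiply down the hierarchy."""
    share = 100.0
    parts = path.split(".")
    for depth in range(2, len(parts) + 1):  # start below 'root', which owns 100%
        share = share * percentages[".".join(parts[:depth])] / 100
    return share


if __name__ == "__main__":
    pcts = queue_percentages(config)
    check_sibling_sums(pcts)
    for queue in sorted(pcts):
        print(queue, absolute_share(queue, pcts))
```

Under that assumption, the 1:4 split of the grumpy-engineers' 60% works out to 12% of the whole cluster for the infinite-monkeys and 48% for the pesky-testers, while the children at each level sum to exactly 100.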