In This Section

  • Overview of Amazon EMR
  • Benefits of Using Amazon EMR
  • Overview of Amazon EMR Architecture

Overview of Amazon EMR

This topic provides an overview of Amazon EMR clusters, including how to submit work to a cluster, how that data is processed, and the various states that the cluster goes through during processing.

In This Topic

  • Understanding Clusters and Nodes
  • Submitting Work to a Cluster
  • Processing Data
  • Understanding the Cluster Lifecycle

Understanding Clusters and Nodes

The central component of Amazon EMR is the cluster. A cluster is a collection of Amazon Elastic Compute Cloud (Amazon EC2) instances. Each instance in the cluster is called a node. Each node has a role within the cluster, referred to as the node type. Amazon EMR also installs different software components on each node type, giving each node a role in a distributed application like Apache Hadoop.

The node types in Amazon EMR are as follows:

  • Master node: A node that manages the cluster by running software components to coordinate the distribution of data and tasks among other nodes for processing. The master node tracks the status of tasks and monitors the health of the cluster. Every cluster has a master node, and it's possible to create a single-node cluster with only the master node.
  • Core node: A node with software components that run tasks and store data in the Hadoop Distributed File System (HDFS) on your cluster. Multi-node clusters have at least one core node.
  • Task node: A node with software components that only runs tasks and does not store data in HDFS. Task nodes are optional.

The following diagram represents a cluster with one master node and four core nodes.

Submitting Work to a Cluster

When you run a cluster on Amazon EMR, you have several options as to how you specify the work that needs to be done.

  • Provide the entire definition of the work to be done in functions that you specify as steps when you create a cluster. This is typically done for clusters that process a set amount of data and then terminate when processing is complete.
  • Create a long-running cluster and use the Amazon EMR console, the Amazon EMR API, or the AWS CLI to submit steps, which may contain one or more jobs. For more information, see Submit Work to a Cluster (p. 345) in the Amazon EMR Management Guide.
  • Create a cluster, connect to the master node and other nodes as required using SSH, and use the interfaces that the installed applications provide to perform tasks and submit queries, either through scripts or interactively. For more information, see the Amazon EMR Release Guide.

Processing Data

When you launch your cluster, you choose the frameworks and applications to install for your data processing needs. To process data in your Amazon EMR cluster, you can submit jobs or queries directly to installed applications, or you can run steps in the cluster.
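
As a rough illustration of submitting steps to a long-running cluster through the Amazon EMR API, the following Python sketch builds a step definition and passes it to a client object. It is a minimal sketch, not the guide's own example: the helper names build_step and submit_steps are hypothetical, the step is assumed to run through command-runner.jar, and the client is assumed to behave like the boto3 EMR client, whose add_job_flow_steps call accepts JobFlowId and Steps and returns StepIds.

```python
from typing import Any, Dict, List


def build_step(name: str, command: List[str],
               action_on_failure: str = "CONTINUE") -> Dict[str, Any]:
    """Build one step definition (hypothetical helper).

    The dictionary follows the shape that the EMR API uses for steps.
    command-runner.jar is assumed to be available on the cluster.
    """
    if not command:
        raise ValueError("a step needs at least one command argument")
    return {
        "Name": name,
        # What the cluster does if this step fails, for example
        # CONTINUE, CANCEL_AND_WAIT, or TERMINATE_CLUSTER.
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": list(command),
        },
    }


def submit_steps(emr_client: Any, cluster_id: str,
                 steps: List[Dict[str, Any]]) -> List[str]:
    """Submit steps to a running cluster and return the new step IDs.

    emr_client is assumed to expose add_job_flow_steps, as the boto3
    EMR client does.
    """
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]


# Possible usage, assuming boto3 is installed and AWS credentials are
# configured; the cluster ID and S3 path below are placeholders:
#
#   import boto3
#   emr = boto3.client("emr")
#   step = build_step("example-job",
#                     ["spark-submit", "s3://example-bucket/job.py"])
#   step_ids = submit_steps(emr, "j-EXAMPLECLUSTER", [step])
```

Keeping the step definition separate from the submission call makes it possible to inspect or log the definition before anything is sent to the cluster.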