You can use one of several methods to overcome the 256-step limit in AMI versions earlier than 3.1.1 and 2.4.8:

1. Have each step submit several jobs to Hadoop. This does not allow you unlimited steps in AMI versions earlier than 3.1.1 and 2.4.8, but it is the easiest solution if you need a fixed number of steps greater than 256.
2. Write a workflow program that runs in a step on a long-running cluster and submits jobs to Hadoop. The workflow program can do one of the following:
   • Listen to an Amazon SQS queue to receive information about new steps to run.
   • Check an Amazon S3 bucket on a regular schedule for files containing information about the new steps to run.
3. Write a workflow program that runs on an Amazon EC2 instance outside Amazon EMR and submits jobs to your long-running clusters using SSH.
4. Connect to your long-running cluster via SSH and submit Hadoop jobs using the Hadoop API. For more information, see JobClient.
5. Connect to the master node and submit jobs to the cluster. You can connect using an SSH client, such as PuTTY or OpenSSH, and manually submit jobs to the cluster, or you can use the ssh subcommand in the AWS CLI to both connect and submit jobs. For more information about establishing an SSH connection with the master node, see Connect to the Master Node Using SSH (p. 313). For more information about interactively submitting Hadoop jobs, see Submit Hadoop Jobs Interactively (p. 349).

Automate Recurring Clusters with AWS Data Pipeline

AWS Data Pipeline is a service that automates the movement and transformation of data. You can use it to schedule moving input data into Amazon S3 and to schedule launching clusters to process that data. For example, if you have a web server recording traffic logs and want to run a weekly cluster to analyze the traffic data, you can use AWS Data Pipeline to schedule those clusters.
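The queue-driven workflow program described in option 2 above can be sketched as follows. This is a minimal illustration only: the message format, field names, and helper functions are hypothetical, and it uses an in-memory queue so it is self-contained. A real implementation would poll an Amazon SQS queue (for example with boto3) and hand each translated step to the Amazon EMR AddJobFlowSteps API instead of a stub callback.

```python
# Hypothetical sketch of a workflow program for option 2: drain a queue
# of step descriptions and submit each one to a long-running cluster.
# The in-memory Queue stands in for Amazon SQS in this illustration.
import json
from queue import Queue, Empty

def build_step(message):
    """Translate a JSON queue message into an EMR-style step definition.
    The message fields ("name", "jar", "args") are assumed for this sketch."""
    body = json.loads(message)
    return {
        "Name": body["name"],
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": body["jar"],
            "Args": body.get("args", []),
        },
    }

def run_workflow(queue, submit, poll_limit=10):
    """Poll the queue up to poll_limit times and hand each step to
    `submit` (in practice, a call to the AddJobFlowSteps API).
    Returns the list of steps that were submitted."""
    submitted = []
    for _ in range(poll_limit):
        try:
            message = queue.get_nowait()
        except Empty:
            break  # queue drained; a real program would sleep and re-poll
        step = build_step(message)
        submit(step)
        submitted.append(step)
    return submitted

if __name__ == "__main__":
    # Example: enqueue one step description and run the workflow loop.
    q = Queue()
    q.put(json.dumps({"name": "wordcount",
                      "jar": "s3://examplebucket/wordcount.jar",
                      "args": ["input", "output"]}))
    for step in run_workflow(q, submit=lambda s: None):
        print(step["Name"])
```

The same loop structure applies to option 3; only the submission mechanism changes (SSH into the cluster rather than an API call from within a step).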
AWS Data Pipeline is a data-driven workflow service, so one task (launching the cluster) can be dependent on another task (moving the input data to Amazon S3). It also has robust retry functionality. For more information about AWS Data Pipeline, see the AWS Data Pipeline Developer Guide, especially the tutorials regarding Amazon EMR:

• Tutorial: Launch an Amazon EMR Job Flow
• Getting Started: Process Web Logs with AWS Data Pipeline, Amazon EMR, and Hive
• Tutorial: Amazon DynamoDB Import and Export Using AWS Data Pipeline
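As a rough sketch of how the pieces fit together, a weekly log-analysis pipeline definition might contain objects like the following. The bucket names, instance types, and schedule values are placeholders, and the exact fields and syntax should be checked against the AWS Data Pipeline Developer Guide.

```json
{
  "objects": [
    {
      "id": "WeeklySchedule",
      "type": "Schedule",
      "startDateTime": "2014-01-01T00:00:00",
      "period": "7 days"
    },
    {
      "id": "MyEmrCluster",
      "type": "EmrCluster",
      "masterInstanceType": "m1.small",
      "coreInstanceType": "m1.small",
      "coreInstanceCount": "2",
      "schedule": { "ref": "WeeklySchedule" }
    },
    {
      "id": "AnalyzeLogs",
      "type": "EmrActivity",
      "runsOn": { "ref": "MyEmrCluster" },
      "step": "s3://examplebucket/wordcount.jar,s3://examplebucket/logs/,s3://examplebucket/output/",
      "schedule": { "ref": "WeeklySchedule" }
    }
  ]
}
```

Here the EmrActivity depends on the EmrCluster it runs on, so AWS Data Pipeline launches the cluster on each scheduled run before submitting the step.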
Amazon EMR Management Guide: What Tools are Available for Troubleshooting?

Troubleshoot a Cluster

A cluster hosted by Amazon EMR runs in a complex ecosystem made up of several types of open-source software, custom application code, and Amazon Web Services. An issue in any of these parts can cause the cluster to fail or take longer than expected to complete. The following topics will help you figure out what has gone wrong in your cluster and give you suggestions on how to fix it.