Amazon EMR Management Guide

1. Open the Amazon EMR console.
2. Choose Clusters.
3. Choose the Name of the cluster.
4. Under Security and access, choose the Security groups for Master link.
5. Choose ElasticMapReduce-master from the list.
6. Choose Inbound, Edit.
7. Find the rule with the following settings and choose the x icon to delete it:
   Type: SSH
   Port: 22
   Source: Custom 0.0.0.0/0
8. Scroll to the bottom of the list of rules and choose Add Rule.
9. For Type, select SSH. This automatically enters TCP for Protocol and 22 for Port Range.
10. For Source, select My IP. This automatically adds the IP address of your client computer as the source address. Alternatively, you can add a range of Custom trusted client IP addresses and choose Add rule to create additional rules for other clients. In many network environments, IP addresses are allocated dynamically, so you may need to periodically edit security group rules to update the IP address of trusted clients.
11. Choose Save.
12. Optionally, choose ElasticMapReduce-slave from the list and repeat the steps above to allow SSH client access to core and task nodes from trusted clients.

Step 4: Process Data By Running The Hive Script as a Step

With your cluster up and running, you can now submit a Hive script. In this tutorial, you submit the Hive script as a step using the Amazon EMR console. In Amazon EMR, a step is a unit of work that contains one or more jobs. As you learned in Step 2: Launch Your Sample Amazon EMR Cluster (p. 12), you can submit steps to a long-running cluster, which is what we do in this step. You can also specify steps when you create a cluster, or you could connect to the master node, create the script in the local file system, and run it using the command line, for example hive -f Hive_CloudFront.q.

Understanding The Data And Script

The sample data and script that you use in this tutorial are already available in an Amazon S3 location that you can access.

The sample data is a series of Amazon CloudFront access log files. For more information about CloudFront and log file formats, see Amazon CloudFront Developer Guide. The data is stored in Amazon S3 at s3://region.elasticmapreduce.samples/cloudfront/data, where region is your region, for example, us-west-2. When you enter the location as you submit the step, you omit the cloudfront/data portion because the script adds it.
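As a rough illustration of how the region fits into the sample data location, the following Python sketch builds both forms of the path described above. The function name and dictionary keys are hypothetical and are not part of Amazon EMR; whether the console expects a trailing slash on the input location is not stated here, so the sketch leaves it off.

```python
def sample_locations(region: str) -> dict:
    """Build the sample S3 locations for a region (hypothetical helper)."""
    # Bucket name pattern taken from the text: <region>.elasticmapreduce.samples
    base = f"s3://{region}.elasticmapreduce.samples"
    return {
        # Location to enter when submitting the step: the script itself
        # appends cloudfront/data, so that suffix is omitted here.
        "step_input": base,
        # Full location where the CloudFront log files are stored.
        "data": f"{base}/cloudfront/data",
    }


locations = sample_locations("us-west-2")
print(locations["step_input"])
print(locations["data"])
```

For us-west-2, the second value matches the full data path given above, and the first is the same path without the cloudfront/data portion.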
Each entry in the CloudFront log files provides details about a single user request in the following format:

2014-07-05 20:00:00 LHR3 4260 10.0.0.15 GET eabcd12345678.cloudfront.net /test-image-1.jpeg 200 - Mozilla/5.0%20(MacOS;%20U;%20Windows%20NT%205.1;%20en-US;%20rv:1.9.0.9)%20Gecko/2009040821%20IE/3.0.9

The sample script calculates the total number of requests per operating system over a specified time frame. The script uses HiveQL, which is a SQL-like scripting language for data warehousing and analysis.
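To make the aggregation concrete, here is a minimal Python sketch of the same idea: count requests per operating system within a date range. This is not the Hive script from the tutorial. The field names, the whitespace split, and the rule that the operating system is the first token inside the parentheses of the decoded user agent are all assumptions made for illustration.

```python
import re
from collections import Counter
from urllib.parse import unquote

# The sample log entry shown above, joined into a single line.
SAMPLE = (
    "2014-07-05 20:00:00 LHR3 4260 10.0.0.15 GET "
    "eabcd12345678.cloudfront.net /test-image-1.jpeg 200 - "
    "Mozilla/5.0%20(MacOS;%20U;%20Windows%20NT%205.1;%20en-US;"
    "%20rv:1.9.0.9)%20Gecko/2009040821%20IE/3.0.9"
)

# Hypothetical field names, assigned in the order the values appear.
FIELDS = (
    "date", "time", "location", "bytes", "request_ip", "method",
    "host", "uri", "status", "referrer", "user_agent",
)


def parse_entry(line):
    """Split one log line into named fields and decode the user agent."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    entry = dict(zip(FIELDS, values))
    entry["user_agent"] = unquote(entry["user_agent"])  # %20 becomes a space
    return entry


def operating_system(user_agent):
    """Assumption: the OS is the first token inside the parentheses."""
    match = re.search(r"\(([^;)]+)", user_agent)
    return match.group(1).strip() if match else "unknown"


def requests_per_os(lines, start, end):
    """Count requests per OS for dates between start and end, inclusive."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if start <= entry["date"] <= end:
            counts[operating_system(entry["user_agent"])] += 1
    return counts


print(requests_per_os([SAMPLE], "2014-07-05", "2014-08-05"))
```

Run against the single sample entry, this counts one request for MacOS. A HiveQL query would express the same filter and count declaratively over the full set of log files rather than looping over lines.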