If your input data is in a format other than the default text files, you can use the Hadoop interface InputFormat to specify other input types. You can even create a subclass of the FileInputFormat class to handle custom data types. For more information, see api/org/apache/hadoop/mapred/InputFormat.html.

If you are using Hive, you can use a serializer/deserializer (SerDe) to read data in from a given format into HDFS. For more information, see .

How to Get Data Into Amazon EMR

Amazon EMR provides several ways to get data onto a cluster. The most common way is to upload the data to Amazon S3 and use the built-in features of Amazon EMR to load the data onto your cluster. You can also use the Distributed Cache feature of Hadoop to transfer files from a distributed file system to the local file system. The implementation of Hive provided by Amazon EMR (Hive version 0.7.1.1 and later) includes functionality that you can use to import and export data between DynamoDB and an Amazon EMR cluster. If you have large amounts of on-premises data to process, you may find the AWS Direct Connect service useful.

Topics

  • Upload Data to Amazon S3 (p. 41)
  • Import files with Distributed Cache (p. 45)
  • How to Process Compressed Files (p. 48)
  • Import DynamoDB Data into Hive (p. 48)
  • Connect to Data with AWS DirectConnect (p. 48)
  • Upload Large Amounts of Data with AWS Import/Export (p. 48)

Upload Data to Amazon S3

For information on how to upload objects to Amazon S3, see Add an Object to Your Bucket in the Amazon Simple Storage Service Getting Started Guide. For more information about using Amazon S3 with Hadoop, see .

Topics
Amazon EMR Management Guide
Prepare Input Data

  • Create and Configure an Amazon S3 Bucket (p. 42)
  • Configure Multipart Upload for Amazon S3 (p. 42)
  • Best Practices (p. 44)

Create and Configure an Amazon S3 Bucket

Amazon EMR uses the AWS SDK for Java with Amazon S3 to store input data, log files, and output data. Amazon S3 refers to these storage locations as buckets. Buckets have certain restrictions and limitations to conform with Amazon S3 and DNS requirements. For more information, see Bucket Restrictions and Limitations in the Amazon Simple Storage Service Developer Guide.

This section shows you how to use the Amazon S3 AWS Management Console to create and then set permissions for an Amazon S3 bucket. You can also create and set permissions for an Amazon S3 bucket using the Amazon S3 API or AWS CLI. You can also use Curl along with a modification to pass the appropriate authentication parameters for Amazon S3. See the following resources:

  • To create a bucket using the console, see Create a Bucket in the Amazon Simple Storage Service Console User Guide.
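
Because bucket names must conform with Amazon S3 and DNS requirements, it can help to check a candidate name locally before you try to create the bucket. The following is a minimal sketch in Python, not part of Amazon EMR or the AWS SDK. The function name is hypothetical, and the rules it encodes (3 to 63 characters; lowercase letters, digits, dots, and hyphens; a letter or digit at each end; no consecutive dots; not formatted as an IP address) are an assumed subset of the restrictions. Treat Bucket Restrictions and Limitations as the authoritative list.

```python
import ipaddress
import re

# Assumed rule: 3-63 characters drawn from lowercase letters, digits, dots,
# and hyphens, beginning and ending with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_dns_compatible_bucket_name(name: str) -> bool:
    """Return True if `name` passes the assumed subset of bucket naming rules."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # Assumed rule: empty DNS labels (two dots in a row) are not allowed.
    if ".." in name:
        return False
    # Assumed rule: the name must not look like an IPv4 address.
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    for candidate in ("my-emr-input-data", "My_Bucket", "192.168.5.4"):
        print(candidate, is_dns_compatible_bucket_name(candidate))
```

A check like this catches only obvious naming mistakes. The name must also be unique across Amazon S3, which can be confirmed only when you actually create the bucket.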