
Amazon EMR Management Guide

Considerations and Limitations

Consider the following when using notebook-scoped libraries:

  • You can uninstall only the libraries that were installed using the install_pypi_package API. You cannot uninstall any libraries installed on the cluster.
  • If the same library is installed with different versions on the cluster and as a notebook-scoped library, the notebook-scoped library version overrides the cluster library version.

Working With Notebook-Scoped Libraries

To install libraries, your Amazon EMR cluster must have access to the PyPI repository where the libraries are located. For example, for clusters in private subnets, you may need to configure network address translation (NAT) and provide a path for the cluster to access the repository that is located outside the cluster’s VPC. For more information about configuring external access for different network configurations, see Scenarios and Examples in the Amazon VPC User Guide.

By default, Python 2 is used to create the environment. To use Python 3, reconfigure the notebook session by running the following command in a notebook cell to set PySpark properties.

    %%configure -f
    { "conf": {
        "spark.pyspark.python": "python3",
        "spark.pyspark.virtualenv.enabled": "true",
        "spark.pyspark.virtualenv.type": "native",
        "spark.pyspark.virtualenv.bin.path": "/usr/bin/virtualenv"
    }}

The following examples demonstrate commands to list, install, and uninstall libraries from within a notebook cell using the PySpark APIs.

Example – Listing Current Libraries

The following command lists the Python packages available for the current Spark notebook session. The list includes both libraries installed on the cluster and notebook-scoped libraries.

    sc.list_packages()

Example – Installing the Celery Library

The following command installs the Celery library as a notebook-scoped library.

    sc.install_pypi_package("celery")

After installing the library, the following command confirms that the library is available on the Spark driver and executors.

    import celery
    sc.range(1, 10000, 1, 100).map(lambda x: celery.__version__).collect()

Example – Installing the Arrow Library, Specifying the Version and Repository

The following command installs the Arrow library as a notebook-scoped library, with a specification of the library version and repository URL.

    sc.install_pypi_package("arrow==0.14.0", "")

Example – Uninstalling a Library

The following command uninstalls the Arrow library, removing it as a notebook-scoped library from the current session.

    sc.uninstall_package("arrow")

Associate Git Repositories with Amazon EMR Notebooks

You can associate Git repositories with your Amazon EMR notebooks to save your notebooks in a version-controlled environment. You can associate up to three repositories with a notebook. The Git repositories must be hosted on one of the following web-based Git hosting services: GitHub or Bitbucket.

Associating Git repositories with your notebook can be useful for:
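
As a supplement to the Python 3 configuration cell shown under Working With Notebook-Scoped Libraries: the body of that cell has to be valid JSON, so one way to avoid quoting or brace mistakes is to generate the text rather than type it by hand. The following is a minimal sketch and is not taken from the Amazon EMR documentation. build_configure_cell is a hypothetical helper name introduced here for illustration, and the property names and values are copied from the cell shown in this section.

```python
import json

# Spark properties copied from the Python 3 configuration cell in this section.
PYTHON3_CONF = {
    "spark.pyspark.python": "python3",
    "spark.pyspark.virtualenv.enabled": "true",
    "spark.pyspark.virtualenv.type": "native",
    "spark.pyspark.virtualenv.bin.path": "/usr/bin/virtualenv",
}


def build_configure_cell(conf):
    """Return the text of a %%configure cell for the given Spark properties.

    Hypothetical helper for illustration only; it formats text and does not
    contact a cluster or a notebook session.
    """
    # json.dumps guarantees matching braces and correctly quoted strings.
    body = json.dumps({"conf": conf}, indent=2)
    return "%%configure -f\n" + body


cell_text = build_configure_cell(PYTHON3_CONF)
print(cell_text)
```

The printed text can be pasted into a notebook cell as-is. Keeping the properties in a dictionary also makes it easy to review exactly which settings a session will be started with.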