It has a strong set of preprocessing tools that make it easy to load your data in, and then you have a large library of algorithms at your fingertips, so you can quickly try out ideas until you find an approach that works for your problem. The command-line interface allows you to apply exactly the same code in an automated way for production.

Mahout

Mahout is an open source framework that can run common machine learning algorithms on massive datasets. To achieve that scalability, most of the code is written as parallelizable jobs on top of Hadoop. It comes with algorithms to perform a lot of common tasks, like clustering and classifying objects into groups, recommending items based on other users’ behaviors, and spotting attributes that occur together a lot. In practical terms, the framework makes it easy to use analysis techniques to implement features such as Amazon’s “People who bought this also bought” recommendation engine on your own site. It’s a heavily used project with an active community of developers and users, and it’s well worth trying if you have any significant amount of transaction or similar data that you’d like to get more value out of.

Introducing Mahout
Using Mahout with Cassandra

scikits.learn

It’s hard to find good off-the-shelf tools for practical machine learning. Many of the projects are aimed at students and researchers who want access to the inner workings of the algorithms, which can be off-putting when you’re looking for more of a black box to solve a particular problem. That’s a gap that scikits.learn really helps to fill. It’s a beautifully documented and easy-to-use Python package offering a high-level interface to many standard machine learning techniques. It collects most techniques that fall under the standard definition of machine learning (taking a training dataset and using that to predict something useful about data received later) and offers a common way of connecting them together and swapping them out. This makes it a very fruitful sandbox for experimentation and rapid prototyping, with a very easy path to using the same code in production once it’s working well.

Face Recognition using scikits.learn

Chapter 8: Machine Learning
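To make the "People who bought this also bought" idea from the Mahout section more concrete, here is a minimal single-machine sketch in plain Python. It is not Mahout code and does not use Mahout's API or Hadoop; the function names and the sample orders are hypothetical, chosen only to illustrate counting which items occur together.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count, for each item, how often every other item shares a basket with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Use a set so a repeated item in one basket is only counted once.
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical purchase history: one list of item names per order.
orders = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "sleeping bag"],
    ["stove", "fuel"],
]

counts = build_cooccurrence(orders)
print(also_bought(counts, "tent"))
```

A framework built for massive datasets would split this counting across many machines; the sketch only shows the shape of the computation on data that fits in memory.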
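The scikits.learn description above mentions training on one dataset, predicting on data received later, and swapping techniques behind a common interface. The sketch below illustrates that pattern in plain Python. The `fit` and `predict` method names follow the convention the package is known for, but the two classes, the `evaluate` helper, and the sample data are hypothetical toy stand-ins, not the package's own estimators.

```python
import math


class NearestCentroid:
    """Toy classifier: predict the label whose training mean is closest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # Average the per-label sums to get one centroid per label.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda lab: math.dist(row, self.centroids_[lab]))
            for row in X
        ]


class NearestNeighbor:
    """Toy classifier: predict the label of the single closest training row."""

    def fit(self, X, y):
        self.rows_ = list(zip(X, y))
        return self

    def predict(self, X):
        return [min(self.rows_, key=lambda pair: math.dist(row, pair[0]))[1] for row in X]


def evaluate(model, X_train, y_train, X_new):
    """Train on earlier data, then predict labels for data received later."""
    return model.fit(X_train, y_train).predict(X_new)


# Hypothetical training data and later observations.
X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]]
y_train = ["small", "small", "large", "large"]
X_new = [[0.9, 1.1], [4.8, 5.1]]

# Because both models expose the same two methods, swapping is a one-line change.
for model in (NearestCentroid(), NearestNeighbor()):
    print(type(model).__name__, evaluate(model, X_train, y_train, X_new))
```

The calling code never needs to know which technique it is driving, which is what makes this style convenient for quick experiments.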
CHAPTER 9 Visualization

One of the best ways to communicate the meaning of data is by extracting the important parts and presenting them graphically. This is helpful both for internal use, as an exploration technique to spot patterns that aren’t obvious from the raw values, and as a way to succinctly present end users with understandable results. As the Web has turned graphs from static images to interactive objects, the lines between presentation and exploration have blurred. The possibilities of the new medium have led to some of the fantastic new tools I cover in this section.

Gephi

Gephi is an open source Java application that creates network visualizations from raw edge and node graph data. It’s very useful for understanding social network information; one of the project’s founders was hired by LinkedIn, and Gephi is now used for LinkedIn visualizations. There are several different layout algorithms, each with multiple parameters you can tweak to arrange the positions of the nodes in your data. If