pandas.pptx - CS 5163 Introduction to Data Science Pandas...

CS 5163: Introduction to Data Science Pandas and data I/O

Why pandas?
• One of the most popular libraries that data scientists use
• Labeled axes to avoid misalignment of data
  • Does Data[:, 2] represent weight or weight2?
  • When merging two tables, the tables may not contain the same rows
• Missing values or special values may need to be removed or replaced

        height  Weight  Weight2  age  Gender
Amy     160     125     126      32   2
Bob     170     167     155      -1   1
Chris   168     143     150      28   1
David   190     182     NA       42   1
Ella    175     133     138      23   2
Frank   172     150     148      45   1

        salary  Credit score
Alice   50000   700
Bob     NA      670
Chris   60000   NA
David   -99999  750
Ella    70000   685
Tom     45000   660
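
The motivation above can be sketched with the slide's own two tables. This is a minimal illustration, not the lecture's code: the lowercase column names, the variable names `body` and `finance`, and the reading of -1 and -99999 as placeholder codes for "unknown" are all assumptions.

```python
import numpy as np
import pandas as pd

# Data copied from the two tables on the slide; "NA" cells become np.nan.
body = pd.DataFrame(
    {
        "height": [160, 170, 168, 190, 175, 172],
        "weight": [125, 167, 143, 182, 133, 150],
        "weight2": [126, 155, 150, np.nan, 138, 148],
        "age": [32, -1, 28, 42, 23, 45],
        "gender": [2, 1, 1, 1, 2, 1],
    },
    index=["Amy", "Bob", "Chris", "David", "Ella", "Frank"],
)
finance = pd.DataFrame(
    {
        "salary": [50000, np.nan, 60000, -99999, 70000, 45000],
        "credit_score": [700, 670, np.nan, 750, 685, 660],
    },
    index=["Alice", "Bob", "Chris", "David", "Ella", "Tom"],
)

# Assumption: -1 and -99999 are placeholder codes, so treat them as missing.
body["age"] = body["age"].replace(-1, np.nan)
finance["salary"] = finance["salary"].replace(-99999, np.nan)

# Columns are selected by label, so weight and weight2 cannot be confused.
weights = body["weight"]

# An outer join aligns rows by name and keeps names found in only one table.
merged = body.join(finance, how="outer")
print(merged)
```

Rows such as Amy (no financial record) and Tom (no body measurements) survive the join with NaN in the columns that have no data for them.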

Overview
• Created by Wes McKinney in 2008, now maintained by Jeff Reback and many others
  • McKinney is the author of one of the textbooks: Python for Data Analysis
• Powerful and productive Python data analysis and management library
• Panel Data System (the name "pandas" derives from "panel data")
• It is an open source product

Overview - 2
• Python library that provides data analysis features similar to R, MATLAB, and SAS
• Rich data structures and functions that make working with structured data fast, easy, and expressive
• Built on top of NumPy
• Key components provided by pandas:
  • Series
  • DataFrame

From now on:
In [664]: from pandas import Series, DataFrame
In [665]: import pandas as pd
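
As a rough illustration of how the two key components relate, the sketch below builds a small DataFrame and pulls one column out of it as a Series. The values are taken from the earlier example table; the variable names are hypothetical.

```python
import pandas as pd

# A DataFrame is a table whose columns share one row index.
df = pd.DataFrame(
    {"height": [160, 168], "age": [32, 28]},
    index=["Amy", "Chris"],
)

# Selecting a single column by label returns a Series with the same index.
heights = df["height"]
print(type(heights).__name__)
print(heights)
```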

Series
• One-dimensional array-like object
• Contains an array of data (of any NumPy data type) with an associated index. (Index labels can be strings, integers, or other data types.)
• By default, a Series is indexed from 0 to N - 1, where N is its size

In [666]: obj = Series([4, 7, -5, 3])
In [667]: obj
Out[667]:
0    4
1    7
2   -5
3    3
dtype: int64

In [668]: obj.values
Out[668]: array([ 4, 7, -5, 3], dtype=int64)

In [669]: obj.index
Out[669]: RangeIndex(start=0, stop=4, step=1)
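
The default-index rule can be checked directly. This is a small sketch using the same data as the slide; `to_numpy()` is used here as an alternative to the `.values` attribute shown above.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# With no index argument, labels run from 0 to len(obj) - 1.
labels = list(obj.index)
print(labels)

# The data itself is held in a NumPy array, so NumPy methods apply to it.
arr = obj.to_numpy()
print(arr.sum())
```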

Series – referencing elements

In [670]: obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
In [671]: obj2
Out[671]:
d    4
b    7
a   -5
c    3
dtype: int64

In [672]: obj2.index
Out[672]: Index(['d', 'b', 'a', 'c'], dtype='object')

In [673]: obj2.values
Out[673]: array([ 4, 7, -5, 3], dtype=int64)

In [674]: obj2['a']
Out[674]: -5

In [675]: obj2['d'] = 10
In [677]: obj2[['d', 'c', 'a']]
Out[677]:
d    10
c     3
a    -5
dtype: int64

In [692]: obj2[:2]
Out[692]:
d    10
b     7
dtype: int64

In [818]: obj2.a
Out[818]: -5
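
The session above mixes label lookups (`obj2['a']`) with a positional slice (`obj2[:2]`). One way to keep the two apart, offered here as a sketch and not as the lecture's recommendation, is to use `.loc` for labels and `.iloc` for positions.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
obj2.loc["d"] = 10  # assignment by label, same effect as obj2['d'] = 10

# .loc always interprets its argument as a label.
by_label = obj2.loc["a"]

# .iloc always interprets its argument as an integer position.
first_two = obj2.iloc[:2]

# Label slices include the end label, unlike positional slices.
label_slice = obj2.loc["d":"a"]
print(label_slice)
```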

Series – array/dict operations
• NumPy array operations can also be applied, and they preserve the index-value link
• Can be thought of as a dict. Can be constructed from a dict directly.

In [694]: obj2[obj2>0]
Out[694]:
d    10
b     7
c     3
dtype: int64

In [699]: obj2**2
Out[699]:
d    100
b     49
a     25
c      9
dtype: int64

In [700]: 'b' in obj2
Out[700]: True

In [702]: obj3 = Series({'a': 10, 'b': 5, 'c': 30})
In [703]: obj3
Out[703]:
a    10
b     5
c    30
dtype: int64
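
A brief sketch of both views of a Series follows. It reuses the values `obj2` holds at this point in the session; `np.abs`, `get`, and `to_dict` are additional calls that do not appear on the slide.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([10, 7, -5, 3], index=["d", "b", "a", "c"])

# Array view: a NumPy function returns a Series with the labels intact.
magnitudes = np.abs(obj2)

# Array view: a boolean mask keeps the matching label-value pairs.
positive = obj2[obj2 > 0]

# Dict view: membership tests and get() work on the index labels.
has_b = "b" in obj2
fallback = obj2.get("z", 0)  # "z" is not a label, so the default is returned

# Dict view: round trip between a Series and a plain dict.
as_dict = obj2.to_dict()
obj3 = pd.Series({"a": 10, "b": 5, "c": 30})
```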

Series – null values

In [704]: sdata = {'Texas': 10, 'Ohio': 20, 'Oregon': 15, 'Utah': 18}
In [705]: states = ['Texas', 'Ohio', 'Oregon', 'Iowa']
In [706]: obj4 = Series(sdata, index=states)
In [707]: obj4
Out[707]:
Texas     10.0
Ohio      20.0
Oregon    15.0
Iowa       NaN    (missing value)
dtype: float64

In [708]: pd.isnull(obj4)
Out[708]:
Texas     False
Ohio      False
Oregon    False
Iowa       True
dtype: bool

In [709]: pd.notnull(obj4)
Out[709]:
Texas      True
Ohio       True
Oregon     True
Iowa      False
dtype: bool

In [717]: obj4[obj4.notnull()]
Out[717]:
Texas     10.0
Ohio      20.0
Oregon    15.0
dtype: float64
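
Two common follow-ups to detecting missing values are dropping them or filling them in. The sketch below rebuilds `obj4` from the slide; `dropna`, `fillna`, and `isna` are not shown on the slide, and the fill value of 0 is an arbitrary choice for illustration.

```python
import pandas as pd

sdata = {"Texas": 10, "Ohio": 20, "Oregon": 15, "Utah": 18}
states = ["Texas", "Ohio", "Oregon", "Iowa"]

# Only labels listed in `states` are kept: Utah is dropped, Iowa has no data.
obj4 = pd.Series(sdata, index=states)

# The integers become float64 because NaN is a floating-point value.
print(obj4.dtype)

dropped = obj4.dropna()   # same result as obj4[obj4.notnull()]
filled = obj4.fillna(0)   # replace the missing entry with a chosen value
n_missing = int(obj4.isna().sum())
```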

Series – auto alignment

In [714]: obj5
Out[714]:
Ohio      20
Oregon    15
Texas     10
Utah      18
dtype: int64

In [707]: obj4
Out[707]:
Texas     10.0
Ohio      20.0
Oregon    15.0
Iowa       NaN
dtype: float64

In [715]: obj5 + obj4
Out[715]:
Iowa       NaN
Ohio      40.0
Oregon    30.0
Texas     20.0
Utah       NaN
dtype: float64
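
The slide does not show how `obj5` was created, so the sketch below builds it explicitly from the displayed values (an assumption). It reproduces the aligned sum and then shows `Series.add` with `fill_value`, one possible way to treat a label missing from one operand as 0.

```python
import pandas as pd

# Assumption: obj5 is reconstructed from the values printed on the slide.
obj5 = pd.Series([20, 15, 10, 18], index=["Ohio", "Oregon", "Texas", "Utah"])
obj4 = pd.Series(
    [10.0, 20.0, 15.0, float("nan")],
    index=["Texas", "Ohio", "Oregon", "Iowa"],
)

# Values are matched by label, not by position; unmatched labels give NaN.
total = obj5 + obj4
print(total)

# fill_value substitutes 0 where only one side is missing.
# Iowa stays NaN because it is missing on both sides.
filled_total = obj5.add(obj4, fill_value=0)
print(filled_total)
```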