
STATS 507 Data Analysis in Python Lecture 10: Basics of pandas
Pandas
Open-source library of data analysis tools.
Low-level ops are implemented in Cython (C + Python = Cython, often faster).
Database-like structures, largely similar to those available in R.
Optimized for the most common operations, e.g., vectorized operations and operations on rows of a table.
From the documentation:
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.
Basic Data Structures
Series: represents a one-dimensional labeled array.
  "Labeled" just means that there is an index into the array.
  Supports vectorized operations.
DataFrame: a table of rows with labeled columns.
  Like a spreadsheet or an R data frame.
  Supports numpy ufuncs (provided the data are numeric).
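As a rough illustration of the two structures above, here is a minimal sketch. The values and the column names 'x' and 'y' are made up for this example and do not come from the lecture.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional array with an index (labels) attached.
s = pd.Series([1.0, 4.0, 9.0])
doubled = s * 2            # vectorized: applied to every entry at once

# A DataFrame is a table whose columns are labeled.
# The column names 'x' and 'y' are hypothetical.
df = pd.DataFrame({'x': [1.0, 4.0, 9.0], 'y': [16.0, 25.0, 36.0]})
roots = np.sqrt(df)        # a numpy ufunc applied elementwise to numeric data

print(doubled)
print(roots)
```

Note that np.sqrt returns another DataFrame with the same row and column labels, rather than a bare numpy array.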
pandas Series
Can create a pandas Series from any array-like structure (e.g., numpy array, Python list, dict). By default, indices are integers, starting from 0, just like you’re used to. But we can specify a different set of indices if we so choose. Pandas tries to infer the data type (dtype) of the entries automatically. Warning: providing too few or too many indices raises a ValueError.
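The following sketch shows one way the default index, a custom index, and the length mismatch might look in practice. The data and the labels 'a' through 'd' are invented for illustration.

```python
import numpy as np
import pandas as pd

numbers = np.array([2, 3, 5, 7])

# Default index: integers 0, 1, 2, 3.
s_default = pd.Series(numbers)

# Custom index: one label per entry (labels are hypothetical).
s_labeled = pd.Series(numbers, index=['a', 'b', 'c', 'd'])

# Too few (or too many) labels for an array of values raises ValueError.
try:
    pd.Series(numbers, index=['a', 'b'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(s_default.dtype)     # an integer dtype inferred from the data
print(s_labeled)
print(mismatch_raised)
```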
pandas Series
Can create a Series from a dictionary. Keys become indices. If we also pass an explicit index, any index that doesn’t appear in the dictionary (here, ‘cthulu’) is assigned NaN, the standard “missing data” symbol.
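A small sketch of the dictionary case follows. Only the label 'cthulu' comes from the slide; the other keys and all of the values are assumptions made for this example.

```python
import pandas as pd

# Hypothetical dictionary; 'cthulu' is deliberately not a key.
counts = {'dagon': 3, 'hastur': 7}

# Requesting an index label that is missing from the dict yields NaN.
s = pd.Series(counts, index=['dagon', 'hastur', 'cthulu'])

print(s)
print(s.isna())
```

Because NaN is a floating-point value, the integer entries are stored as floats in the resulting Series.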
pandas Series
Indexing works like you’re used to and supports slices, but not negative indexing. Indexing a single entry returns a scalar (for integer data, an object of type np.int64). Slicing returns another pandas Series.
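The behavior described above can be sketched as follows, assuming a Series with the default integer index. The use of .iloc at the end is an addition not mentioned on the slide; it is the standard pandas accessor for purely positional lookups.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, 5, 7, 11])

first = s[0]        # a single entry: a numpy integer scalar
middle = s[1:3]     # a slice: another Series, indices 1 and 2 come along

# With the default integer index, -1 is looked up as a label, not a position,
# so it is not found.
try:
    s[-1]
    negative_ok = True
except KeyError:
    negative_ok = False

last = s.iloc[-1]   # positional access does accept negative positions

print(type(first), type(middle))
print(negative_ok, last)
```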
pandas Series
Caution: indices need not be unique in a pandas Series. This will only cause an error if/when you perform an operation that requires unique indices.
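A minimal sketch of duplicate indices, with made-up labels and values. Here reindex is used as one example of an operation that requires unique indices; the slide does not say which operation it had in mind.

```python
import pandas as pd

# The label 'a' appears twice; creating the Series is not an error.
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

both = s['a']                 # looking up 'a' returns both matching entries
unique = s.index.is_unique    # False for this Series

# reindex needs unique labels, so it fails here.
try:
    s.reindex(['a', 'b'])
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(both)
print(unique, reindex_failed)
```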
pandas Series
Series objects are like np.ndarray objects, so they support all the same kinds of slice operations, but note that the indices come along with the slices. Series objects even support most numpy functions that act on arrays.
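To illustrate, here is a short sketch with invented data showing a slice, a boolean mask, and a numpy function applied to a Series. In each case the result keeps the matching index labels.

```python
import numpy as np
import pandas as pd

# Labels and values are hypothetical.
s = pd.Series([1.0, 4.0, 9.0, 16.0], index=['a', 'b', 'c', 'd'])

middle = s[1:3]      # positional slice; labels 'b' and 'c' come along
large = s[s > 5.0]   # boolean mask, as with a numpy array
roots = np.sqrt(s)   # numpy function applied elementwise; index preserved
total = np.sum(s)    # reductions work too

print(middle)
print(large)
print(roots)
print(total)
```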
pandas Series
Series objects are dict-like, in that we can access and update entries via their keys. Like a dictionary, accessing a non-existent key is a KeyError. Note: the error message shown on the original slide is cropped, but you get the idea. Not shown: Series also support the in operator: x in s checks whether x appears as an index of Series s. Series also support the dictionary get method.
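The dict-like operations above might look like this in practice. The keys and values are made up for the example.

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2})

s['b'] = 20          # update an existing entry via its key
s['c'] = 3           # assigning to a new key adds an entry

# Accessing a key that does not exist raises KeyError, as with a dict.
try:
    s['z']
    missing_raised = False
except KeyError:
    missing_raised = True

has_a = 'a' in s         # True: 'a' is an index label
has_1 = 1 in s           # False: `in` checks the index, not the values
fallback = s.get('z', 0) # get returns the default instead of raising

print(s)
print(missing_raised, has_a, has_1, fallback)
```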
pandas Series
Entries of a Series can be of (almost) any type, and they may be mixed (e.g., some floats, some ints, some strings, etc.), but they cannot be sequences. More information on indexing: ocs/stable/indexing.html
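As a quick sketch of mixed entry types (the values are invented for illustration), pandas falls back to the generic object dtype when the entries do not share a single numeric type:

```python
import pandas as pd

# One int, one float, and one string in the same Series.
mixed = pd.Series([1, 2.5, 'three'])

# Each entry keeps its own Python type.
entry_types = [type(value).__name__ for value in mixed]

print(mixed.dtype)
print(entry_types)
```

Vectorized numeric operations are generally not available on such a Series, so mixed types are best kept for data that really is heterogeneous.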
