STATS 507: Data Analysis in Python
Lecture 10: Basics of pandas
Pandas
Open-source library of data analysis tools.
Low-level operations are implemented in Cython (C + Python = Cython, often faster).
Database-like structures, largely similar to those available in R.
Optimized for the most common operations, e.g., vectorized operations and operations on rows of a table.
From the documentation: pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.
Basic Data Structures
Series: represents a one-dimensional labeled array.
  "Labeled" just means that there is an index into the array.
  Supports vectorized operations.
DataFrame: a table of rows with labeled columns.
  Like a spreadsheet or an R data frame.
  Supports numpy ufuncs (provided the data are numeric).
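As a rough illustration of the two structures above, the following sketch builds a small Series and DataFrame and applies a vectorized operation and a numpy ufunc to them. The values, labels, and variable names are made up for the example and are not from the lecture.

```python
import numpy as np
import pandas as pd

# A Series: a one-dimensional array with an index (labels).
s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# Vectorized arithmetic applies elementwise and keeps the labels.
doubled = s * 2

# numpy ufuncs work on numeric Series as well.
roots = np.sqrt(s)

# A DataFrame: a table whose columns are labeled.
df = pd.DataFrame({"x": [1.0, 4.0, 9.0], "y": [10.0, 20.0, 30.0]})
df_roots = np.sqrt(df)  # the ufunc is applied to every (numeric) cell

print(doubled)
print(roots)
print(df_roots)
```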
pandas Series
By default, indices are integers, starting from 0, just like you're used to.
But we can specify a different set of indices if we so choose.
Can create a pandas Series from any array-like structure (e.g., numpy array, Python list, dict).
Pandas tries to infer the data type automatically.
Warning: providing too few or too many indices is a ValueError.
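A minimal sketch of these points, using made-up values and labels: a Series with the default index, one with a custom index, one built from a numpy array, and the ValueError raised when the number of indices does not match the number of values.

```python
import numpy as np
import pandas as pd

# Default index: integers starting from 0.
s_default = pd.Series([3, 1, 4])

# Custom index: one label per value.
s_custom = pd.Series([3, 1, 4], index=["x", "y", "z"])

# From a numpy array; the dtype is inferred from the data.
s_float = pd.Series(np.array([2.5, 3.5]))

print(s_default.dtype, s_float.dtype)

# Too few (or too many) indices raises a ValueError.
mismatch_error = None
try:
    pd.Series([3, 1, 4], index=["x", "y"])
except ValueError as err:
    mismatch_error = err

print(repr(mismatch_error))
```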
pandas Series
Can create a Series from a dictionary. Keys become indices.
Index 'cthulu' doesn't appear in the dictionary, so pandas assigns it NaN, the standard "missing data" symbol.
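The slide's original code is not reproduced here, so the following is only a sketch of the behavior it describes. The dictionary keys 'dagon' and 'hastur' and their values are hypothetical. Only the missing index 'cthulu' comes from the slide.

```python
import pandas as pd

# Hypothetical dictionary; the keys become the index.
d = {"dagon": 42, "hastur": 7}
s = pd.Series(d)

# Passing an explicit index selects and orders the entries.
# 'cthulu' is not a key of d, so its value is NaN (missing data).
s2 = pd.Series(d, index=["dagon", "cthulu", "hastur"])

print(s)
print(s2)
print(pd.isna(s2["cthulu"]))
```

Note that introducing NaN changes the inferred dtype to floating point, since NaN is a float value.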
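pandas Series
Indexing works like you're used to and supports slices, but not negative indexing: with the default integer index, s[-1] is looked up as a label rather than counted from the end (use s.iloc[-1] for positional access).
Indexing a single entry returns a scalar (in the slide's example, an object of type np.int64).
Slicing returns another pandas Series.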
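A small sketch of the indexing behavior described above, with made-up values. It assumes the default integer index. The use of .iloc for positional access from the end is an addition not shown on the slide.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40])  # default integer index 0..3

first = s[0]      # a single entry: a numpy integer scalar
middle = s[1:3]   # a slice: another Series, with its indices carried along

# With the default integer index, -1 is treated as a label, not a position,
# so this lookup fails with a KeyError.
try:
    s[-1]
    negative_ok = True
except KeyError:
    negative_ok = False

last = s.iloc[-1]  # positional access from the end

print(type(first), type(middle), negative_ok, last)
```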
pandas Series
Caution: indices need not be unique in a pandas Series. This will only cause an error if/when you perform an operation that requires unique indices.
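To make the caution concrete, here is a sketch with a deliberately duplicated label. The choice of reindex as the operation that requires unique indices is an assumption for illustration. The slide does not say which operation it had in mind.

```python
import pandas as pd

# Creating a Series with a repeated label is allowed.
s = pd.Series([1, 2, 3], index=["a", "a", "b"])
print(s.index.is_unique)

# Looking up a duplicated label returns all matching entries as a Series.
dup = s["a"]
print(dup)

# reindex needs unique labels, so the error only shows up here.
reindex_error = None
try:
    s.reindex(["a", "b"])
except ValueError as err:
    reindex_error = err

print(repr(reindex_error))
```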
pandas Series
Series objects are like np.ndarray objects, so they support all the same kinds of slice operations, but note that the indices come along with the slices.
Series objects even support most numpy functions that act on arrays.
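The following sketch, with made-up data, shows a boolean-mask selection and a positional slice that both keep their labels, followed by a numpy function applied directly to the Series. Using .iloc for the positional slice is a choice made here to keep the example unambiguous.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0, 3.0], index=["a", "b", "c", "d"])

# Boolean-mask selection, as with an ndarray; the labels come along.
above_median = s[s > s.median()]

# Positional slice; again the labels come along.
middle = s.iloc[1:3]

# A numpy function applied elementwise returns a Series with the same index.
exps = np.exp(s)

print(above_median)
print(middle)
print(exps)
```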
pandas Series
Series objects are dict-like, in that we can access and update entries via their keys.
Like a dictionary, accessing a non-existent key is a KeyError.
Not shown: Series also support the in operator: x in s checks if x appears as an index of Series s. Series also support the dictionary get method.
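A sketch of the dict-like behavior, including the in operator and get method that the slide mentions but does not show. The labels and values are invented for the example.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

s["b"] = 20   # update an existing entry via its key
s["d"] = 4    # assigning to a new key adds an entry

# Accessing a key that is not in the index raises a KeyError.
try:
    s["z"]
    missing_raises = False
except KeyError:
    missing_raises = True

# `in` checks the index (the keys), not the values.
has_d = "d" in s
has_20 = 20 in s   # False: 20 is a value, not an index label

# get returns None (or a supplied default) instead of raising.
maybe = s.get("z")
fallback = s.get("z", -1)

print(missing_raises, has_d, has_20, maybe, fallback)
```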
pandas Series
Entries of a Series can be of (almost) any type, and they may be mixed (e.g., some floats, some ints, some strings, etc.), but sequences as entries are not well supported (for example, assigning a list to a single entry may raise an error).
More information on indexing: ocs/stable/indexing.html
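As a brief illustration of mixed entry types, the sketch below stores an int, a float, and a string in one Series. The values are made up. When the types are mixed like this, pandas falls back to the generic object dtype, and each entry keeps its own Python type.

```python
import pandas as pd

# Mixed entries: an int, a float, and a string.
s = pd.Series([1, 2.5, "three"], index=["a", "b", "c"])

print(s.dtype)  # the generic object dtype

# Each entry keeps its own type.
entry_types = [type(v) for v in s]
print(entry_types)
```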