W7 Data Wrangling - Join, Combine and Reshape - Jupyter Notebook.pdf

10/26/2019 W7 Data Wrangling - Join, Combine and Reshape - Jupyter Notebook localhost:8888/notebooks/W7 Data Wrangling - Join%2C Combine and Reshape.ipynb 1/44

Join, Combine and Reshape

Data may be spread across a number of files or databases. Combining, joining and rearranging data is an important skill.

1. Hierarchical Indexing

Hierarchical indexing puts multiple index levels on an axis, so higher-dimensional data can be stored in a lower-dimensional form. When looking at a Series or DataFrame with a multi-index, you will see "gaps" in the higher index level; a gap means "same as the one above".

MultiIndex table example

In [1]:
    import warnings
    warnings.simplefilter(action='ignore', category=FutureWarning)  # disable Fu
    import pandas as pd
    import numpy as np

In [2]:
    # 2d data in 1d form
    data = pd.Series(np.arange(100, 109),
                     index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                            [1, 2, 3, 1, 3, 1, 2, 2, 3]])  # same length so that th
    data

Out[2]:
    a  1    100
       2    101
       3    102
    b  1    103
       3    104
    c  1    105
       2    106
    d  2    107
       3    108
    dtype: int32
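The "gaps" in Out[2] are only a display convention. As a minimal sketch (rebuilding the same Series; the variable names `full_labels`, `second_label`, `n_levels` and `value_a2` are illustrative and not from the notebook), every row still carries a complete (outer, inner) label:

```python
import numpy as np
import pandas as pd

# Same data as the notebook's In [2]: nine values under a two-level index.
data = pd.Series(
    np.arange(100, 109),
    index=[list("aaabbccdd"), [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

# Each entry of the index is a full tuple, even where the display leaves a gap.
full_labels = list(data.index)
second_label = full_labels[1]      # ('a', 2), although the printout hides the 'a'

n_levels = data.index.nlevels      # number of index levels on this axis: 2
value_a2 = data.loc[("a", 2)]      # select one element with a complete label: 101
```

Selecting with a full tuple, as in the last line, returns a single value rather than a sub-Series.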
In [3]:
    data.index  # levels are the unique set, labels are the index of levels

Out[3]:
    MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
               labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1, 1, 2]])

'Partial indexing' enables us to concisely select subsets of data. Selection is also possible on the "inner" level of the index.

In [4]:
    data['b']  # select a subgroup

Out[4]:
    1    103
    3    104
    dtype: int32

In [5]:
    data['b':'c']  # select two subgroups

Out[5]:
    b  1    103
       3    104
    c  1    105
       2    106
    dtype: int32

In [6]:
    data.loc[['b', 'd']]  # select a list of subgroups

Out[6]:
    b  1    103
       3    104
    d  2    107
       3    108
    dtype: int32

In [7]:
    data.loc[:, 3]  # for each subgroup, choose the element at index 3

Out[7]:
    a    102
    b    104
    d    108
    dtype: int32

Hierarchical indexing plays an important role in reshaping data and in group-based operations, e.g. forming a pivot table. You can rearrange the data into a DataFrame using its 'unstack' method. The inverse operation of 'unstack' is 'stack'.
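The relationship between 'unstack' and 'stack' can be sketched as a round trip. This is an illustration under assumptions, not the notebook's own cell: the names `wide`, `long` and `restored` are made up here, and the explicit `dropna()` is a precaution in case the installed pandas version keeps the missing (outer, inner) pairs when stacking.

```python
import numpy as np
import pandas as pd

data = pd.Series(
    np.arange(100, 109),
    index=[list("aaabbccdd"), [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

# unstack: the inner index level becomes the columns of a 4 x 3 DataFrame.
# Pairs that do not exist in the Series, such as ('b', 2), show up as NaN.
wide = data.unstack()

# stack: fold the columns back into an inner index level.
# dropna() removes any rows for pairs that were never in the original data.
long = wide.stack().dropna()

# The NaN values forced a float dtype; cast back to the original integer dtype.
restored = long.astype(data.dtype)
```

The float values produced by the round trip come from the NaN placeholders introduced by 'unstack', which is why the cast back to the original dtype is needed to recover the Series exactly.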
In [8]:
    data

Out[8]:
    a  1    100
       2    101
       3    102
    b  1    103
       3    104
    c  1    105
       2    106
    d  2    107
       3    108
    dtype: int32

In [9]:
    data.unstack()  # re-arrange 1d form into 2d form by the two indices

Out[9]:
           1      2      3
    a  100.0  101.0  102.0
    b  103.0    NaN  104.0
    c  105.0  106.0    NaN
    d    NaN  107.0  108.0

In [10]:
    data.unstack().stack()  # switch between 1d and 2d form, i.e., series vs data

Out[10]:
    a  1    100.0
       2    101.0
       3    102.0
    b  1    103.0
       3    104.0
    c  1    105.0
       2    106.0
    d  2    107.0
       3    108.0
    dtype: float64

In a DataFrame, either axis can have a hierarchical index. The levels of a hierarchical index can have names, and these names are shown in the console output. NOTE - Be careful not to mix up index names with row labels. With partial column indexing, we can select groups of columns. A 'MultiIndex' can be created by itself and then reused.
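Creating a 'MultiIndex' on its own and reusing it might look like the sketch below. It uses `pd.MultiIndex.from_arrays`; the two frames `first` and `second` and their values are hypothetical and only serve to show the same index object attached to more than one DataFrame.

```python
import numpy as np
import pandas as pd

# Build the column index once, with a name for each level.
columns = pd.MultiIndex.from_arrays(
    [["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
    names=["state", "color"],
)

# Two hypothetical frames that share the same hierarchical columns.
first = pd.DataFrame(np.arange(6).reshape(2, 3), columns=columns)
second = pd.DataFrame(np.ones((2, 3)), columns=columns)

shared = first.columns.equals(second.columns)   # both frames carry the same index
level_names = list(first.columns.names)         # the level names travel with it
```

Because the names are stored on the index itself, they do not have to be assigned again for each frame.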
In [11]:
    frame = pd.DataFrame(np.arange(12).reshape((4, 3)), index=[[

Out[11]:
         Ohio     Colorado
        Green Red    Green
    a 1     0   1        2
      2     3   4        5
    b 1     6   7        8
      2     9  10       11

In [12]:

Out[12]:
    state      Ohio     Colorado
    color     Green Red    Green
    key1 key2
    a    1        0   1        2
         2        3   4        5
    b    1        6   7        8
         2        9  10       11

In [13]:

Out[13]:
    color      Green  Red
    key1 key2
    a    1         0    1
         2         3    4
    b    1         6    7
         2         9   10
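The code for cells In [11] to In [13] is cut off in this preview. The sketch below is one way to produce frames matching Out[11] to Out[13]; it is an assumption inferred from the outputs, not the notebook's actual code, and the name `ohio` is illustrative.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction: hierarchical labels on both the rows and the columns.
frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)

# Name the levels; these names appear in the printed output (as in Out[12])
# and are separate from the row labels themselves.
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Partial column indexing: selecting the outer label returns the whole
# group of columns under it (as in Out[13]).
ohio = frame["Ohio"]
```

Selecting `frame["Ohio"]` drops the outer column level, so the result keeps only the 'color' level on its columns while the row index is unchanged.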