import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [164]: np.abs(frame)
Out[164]:
               b         d         e
Utah    1.011066  1.440771  0.980569
Ohio    1.376996  0.230285  0.480924
Texas   1.266911  0.271005  2.140763
Oregon  0.475369  0.246525  1.084174

In [165]: f = lambda x: x.max() - x.min()
          frame.apply(f)  # applies f to each column (a DataFrame is a collection of column Series)
Out[165]:
b    2.643907
d    1.711776
e    3.224937
dtype: float64

In [166]: frame.apply(f, axis='columns')  # applies f to each row
Out[166]:
Utah      2.451837
Ohio      1.607281
Texas     3.407674
Oregon    1.559543
dtype: float64

Most of the common array statistics (such as sum and mean) are already DataFrame methods, so using apply is not necessary for them. The function passed to apply does not have to return a scalar value; it can also return a Series.

In [167]: def f(x):
              # for each column, return its minimum and maximum as a Series
              return pd.Series([x.min(), x.max()], index=['min', 'max'])
          frame.apply(f)
Out[167]:
            b         d         e
min -1.266911 -1.440771 -1.084174
max  1.376996  0.271005  2.140763

Element-wise Python functions can be applied to pandas objects using applymap. The name comes from the fact that Series has a map method for applying a function to each element. (In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of the equivalent DataFrame.map.)

In [168]: shortformat = lambda x: '%.2f' % x
          frame.applymap(shortformat)  # apply shortformat to every element
Out[168]:
            b      d      e
Utah     1.01  -1.44   0.98
Ohio     1.38  -0.23   0.48
Texas   -1.27   0.27   2.14
Oregon   0.48   0.25  -1.08
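As a self-contained check of the apply and applymap behavior described above, the sketch below uses a small fixed frame (the values are our own, chosen so the results are easy to verify, because the notebook's frame is random). The names spread and min_max are illustrative. It assumes a reasonably recent pandas; since DataFrame.map only exists from pandas 2.1 on, the code falls back to applymap on older versions.

```python
import pandas as pd

# Hypothetical fixed data standing in for the notebook's random frame.
frame = pd.DataFrame([[1.0, -2.0, 0.5],
                      [3.0, 0.25, -1.0]],
                     columns=list('bde'), index=['Utah', 'Ohio'])

spread = lambda x: x.max() - x.min()

col_spread = frame.apply(spread)                  # one value per column
row_spread = frame.apply(spread, axis='columns')  # one value per row

# The built-in reductions give the same per-column result without apply.
assert col_spread.equals(frame.max() - frame.min())

def min_max(x):
    # Returning a Series makes apply assemble a DataFrame with rows 'min'/'max'.
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

summary = frame.apply(min_max)

shortformat = lambda x: '%.2f' % x
# DataFrame.map is the pandas >= 2.1 name; older versions only have applymap.
if hasattr(pd.DataFrame, 'map'):
    formatted = frame.map(shortformat)
else:
    formatted = frame.applymap(shortformat)

print(col_spread, row_spread, summary, formatted, sep='\n\n')
```

The assert inside the block illustrates the point about built-in statistics: for a simple column reduction, frame.max() - frame.min() is both shorter and faster than apply.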
In [169]: print(type(frame['e']))
          print(frame['e'])
          frame['e'].map(shortformat)
<class 'pandas.core.series.Series'>
Utah      0.980569
Ohio      0.480924
Texas     2.140763
Oregon   -1.084174
Name: e, dtype: float64
Out[169]:
Utah       0.98
Ohio       0.48
Texas      2.14
Oregon    -1.08
Name: e, dtype: object

Sorting and Ranking

To sort lexicographically by row or column label, use the sort_index method, which returns a new, sorted object. On a DataFrame you can sort by the index on either axis. Data is sorted in ascending order by default, but it can also be sorted in descending order (ascending=False).

In [170]: obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
          obj.sort_index()
Out[170]:
a    1
b    2
c    3
d    0
dtype: int64

In [171]: frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                               index=['three', 'one'],
                               columns=['d', 'a', 'b', 'c'])
          frame.sort_index()
Out[171]:
       d  a  b  c
one    4  5  6  7
three  0  1  2  3

In [172]: frame.sort_index(axis=1)  # sort by column name
Out[172]:
       a  b  c  d
three  1  2  3  0
one    5  6  7  4
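A minimal sketch of sort_index on both axes and in both directions, plus Series.map with a plain function. The Series values and the labeling lambda are illustrative assumptions; only the index and column labels follow the notebook.

```python
import pandas as pd

# Illustrative values; only the index labels follow the notebook.
obj = pd.Series([10, 20, 30, 40], index=['d', 'a', 'b', 'c'])

asc = obj.sort_index()                  # labels a, b, c, d
desc = obj.sort_index(ascending=False)  # labels d, c, b, a

# Series.map applies a function to each element (hypothetical formatter).
labels = obj.map(lambda v: 'v%d' % v)

frame = pd.DataFrame([[0, 1, 2, 3], [4, 5, 6, 7]],
                     index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
by_rows = frame.sort_index()        # sorts the row labels
by_cols = frame.sort_index(axis=1)  # sorts the column labels

# sort_index returns new objects; the original frame keeps its order.
print(asc, desc, labels, by_rows, by_cols, sep='\n\n')
print(list(frame.index), list(frame.columns))
```

Because sort_index returns a copy, the last line still shows the original label order; reassign (for example frame = frame.sort_index()) to keep the sorted result.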
In [173]: frame.sort_index(axis=1, ascending=False)
Out[173]:
       d  c  b  a
three  0  3  2  1
one    4  7  6  5

To sort a Series by its values, use sort_values. Missing values are sorted to the end of the Series by default.

In [174]: obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
          obj.sort_values()  # NaN at end
Out[174]:
4   -3.0
5    2.0
0    4.0
2    7.0
1    NaN
3    NaN
dtype: float64

When sorting a DataFrame, you can use the values in one or more columns as sort keys. To do so, pass one or more column names to the by option of sort_values.

In [175]: frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
          frame
Out[175]:
   b  a
0  4  0
1  7  1
2 -3  0
3  2  1

In [176]: frame.sort_values(by='b')
Out[176]:
   b  a
2 -3  0
3  2  1
0  4  0
1  7  1

In [177]: frame.sort_values(by=['a', 'b'])  # sort by a first; ties are broken by b
Out[177]:
   b  a
2 -3  0
0  4  0
3  2  1
1  7  1
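Using the same data as In [174] and In [175], the sketch below shows two sort_values options that the notebook does not use: na_position='first' to move missing values to the front, and a list passed to ascending so that each sort key gets its own direction.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
nan_last = obj.sort_values()                      # default: NaN at the end
nan_first = obj.sort_values(na_position='first')  # NaN moved to the front

frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
# Sort by 'a' ascending; among rows with equal 'a', sort 'b' descending.
mixed = frame.sort_values(by=['a', 'b'], ascending=[True, False])

print(nan_last, nan_first, mixed, sep='\n\n')
```

The ascending list must have one entry per column in by; here the rows with a == 0 come first, ordered 4 then -3.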
Ranking assigns ranks from 1 through the number of valid data points in an array. By default, the rank method of pandas objects breaks ties by assigning each tied group the mean of the ranks its members would otherwise occupy. Ranks can instead be assigned in the order in which the values are observed in the data by passing method='first'; among tied values, the one observed first receives the smaller rank.
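The ranking examples themselves are not reproduced here, so the following sketch illustrates the rank behavior just described on a small Series of our own choosing: the default mean-rank tie breaking (method='average'), method='first', and NaN values, which are not valid data points and receive no rank. It also uses ascending=False with method='max', an option not mentioned in the text, to show descending ranks where every tied value gets the group's highest rank.

```python
import numpy as np
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

avg = obj.rank()                  # default method='average': ties share the mean rank
first = obj.rank(method='first')  # ties ranked in order of appearance
# Descending ranks; method='max' gives every tied value the group's highest rank.
desc_max = obj.rank(ascending=False, method='max')

# NaN is not a valid data point, so it gets no rank (stays NaN).
with_nan = pd.Series([3, np.nan, 1]).rank()

print(pd.DataFrame({'value': obj, 'average': avg,
                    'first': first, 'desc_max': desc_max}))
print(with_nan)
```

For example, the two 7s would occupy ranks 6 and 7, so the default gives both 6.5, while method='first' gives 6 to the one at position 0 and 7 to the one at position 2.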
  • Spring '17
  • Zhi Wei

  • Left Quote Icon

    Student Picture

  • Left Quote Icon

    Student Picture

  • Left Quote Icon

    Student Picture