Transposing data
Use .T to transpose a DataFrame; the row and column labels are swapped along with the data.
# -*- coding: utf-8 -*-
import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data=d)
print(df)
print('-----------')
print(df.T)
Output:
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9
-----------
      0  1  2
col1  1  2  3
col2  4  5  6
col3  7  8  9
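As a small additional sketch (the row labels 'r1' to 'r3' and the variable names are made up for illustration), the following shows that .T swaps the index and the columns, and that transposing twice gives back the original layout.

```python
import pandas as pd

# Hypothetical example data with explicit row labels
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]},
                  index=['r1', 'r2', 'r3'])

transposed = df.T

# Row labels become column labels and vice versa
print(list(transposed.index))    # ['col1', 'col2']
print(list(transposed.columns))  # ['r1', 'r2', 'r3']

# Transposing twice restores the original frame
print(transposed.T.equals(df))
```

Note that .T returns a new object; df itself keeps its original orientation.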
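Modifying data
1. Assign through an index to modify an entire row or column.
2. Assigning a single value through chained indexing (for example df.loc[[1]]['col1'] = ...) does not take effect, because the assignment goes to a temporary copy. Use df.at to modify a single value; the new value should preferably have the same data type as the old one.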
# -*- coding: utf-8 -*-
import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [4, '66', 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data=d)
print(df)
print('------------------')
# Replace an entire column
df['col1'] = 'aaa'
print(df)
print('-------------------')
# Replace a single value with df.at
df.at[1, 'col2'] = 'py'
print(df)
print('-----------')
# Chained indexing writes to a temporary copy, so df is not modified
df.loc[[1]]['col1'] = 'bbb'
print('unchanged')
print(df)
Output:
   col1 col2  col3
0     1    4     7
1     2   66     8
2     3    6     9
------------------
  col1 col2  col3
0  aaa    4     7
1  aaa   66     8
2  aaa    6     9
-------------------
  col1 col2  col3
0  aaa    4     7
1  aaa   py     8
2  aaa    6     9
-----------
unchanged
  col1 col2  col3
0  aaa    4     7
1  aaa   py     8
2  aaa    6     9
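Besides df.at, a single .loc call that takes both the row label and the column label also writes to the original frame. The sketch below is an illustration with made-up values, not part of the original script.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

# Chained indexing such as df.loc[[1]]['col1'] = ... assigns to a temporary copy.
# One .loc call with both the row and the column label writes to df itself.
df.loc[1, 'col1'] = 20

# .loc also accepts a list of row labels to update several cells at once
df.loc[[0, 2], 'col3'] = 0

print(df)
```

df.at is limited to one cell at a time, while .loc can address a single cell, a list of labels, or a slice.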
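Deleting data
1. del removes a column in place.
2. drop removes rows or columns.
df.drop(['a', 'd'], axis=0) removes rows; axis=0 is the default
df.drop(['a', 'd'], axis=1) removes columns
df.drop(['a', 'd'], axis=1, inplace=False) returns a new df and leaves the original unchanged; False is the default
df.drop(['a', 'd'], axis=1, inplace=True) modifies the original df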
# -*- coding: utf-8 -*-
import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [4, '66', 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data=d)
print(df)
print('------------------')
# Delete a column with del
del df['col1']
print(df)
print('------------')
# Delete a row with drop; returns a new DataFrame
res = df.drop([1])
print(res)
print('-------------')
# Delete a column in place
df.drop(['col2'], axis=1, inplace=True)
print(df)
Output:
   col1 col2  col3
0     1    4     7
1     2   66     8
2     3    6     9
------------------
  col2  col3
0    4     7
1   66     8
2    6     9
------------
  col2  col3
0    4     7
2    6     9
-------------
   col3
0     7
1     8
2     9
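drop also accepts the index= and columns= keywords, which say directly whether the labels are rows or columns without an axis argument. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

# index= names row labels to remove; same effect as drop([1], axis=0)
without_row = df.drop(index=[1])

# columns= names column labels to remove; same effect as drop([...], axis=1)
without_cols = df.drop(columns=['col1', 'col3'])

print(without_row)
print(without_cols)

# The original is untouched because inplace defaults to False
print(df.shape)
```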
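Alignment
Arithmetic between two DataFrames aligns them on row and column labels; a label that exists in only one of them produces NaN in the result.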
# -*- coding: utf-8 -*-
import pandas as pd

d1 = {'col1': [1, 2], 'col2': [3, 4]}
d2 = {'col1': [4, 8], 'col2': [7, 9], 'col3': [1, 2]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
df = df1 + df2
print(df)
Output:
   col1  col2  col3
0     5    10   NaN
1    10    13   NaN
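If NaN is not wanted for labels that exist in only one frame, one option is the add method with fill_value, which substitutes a default for the missing side before adding. The sketch below reuses the two frames from the example; the choice of 0 as the fill value is an assumption made for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [4, 8], 'col2': [7, 9], 'col3': [1, 2]})

# Treat cells that are missing from one operand as 0 before adding
total = df1.add(df2, fill_value=0)
print(total)
```

A cell that is missing from both operands would still come out as NaN.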
Sorting
1) Sorting by value
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
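Parameters:
by: string or list of strings. With axis=0, by is one or more column names; with axis=1, by is one or more row labels.
axis: default 0, which reorders the rows by the values in the given column(s) (a vertical sort); 1 reorders the columns by the values in the given row(s) (a horizontal sort).
ascending: bool, True for ascending order. With by=['col_a', 'col_b'] it can also be a list such as [True, False], meaning the first field is sorted ascending and the second descending.
inplace: bool, whether to replace the existing DataFrame with the sorted one.
kind: sorting algorithm, {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’. It rarely needs to be changed.
na_position: {‘first’, ‘last’}, default ‘last’, which puts missing values at the end.
Sorting by a single column and by multiple columns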
# -*- coding: utf-8 -*-
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 3, 2], 'a': [4, 3, 2, 1], 'c': [1, 3, 8, 2]}, index=[2, 0, 1, 3])
print(df)
print('------------')
# Sort by a single column
df1 = df.sort_values(by='b', axis=0)
print(df1)
print('--------------')
# Sort by multiple columns: 'b' descending, then 'a' ascending
df2 = df.sort_values(by=['b', 'a'], axis=0, ascending=[False, True])
print(df2)
Output:
   b  a  c
2  1  4  1
0  2  3  3
1  3  2  8
3  2  1  2
------------
   b  a  c
2  1  4  1
0  2  3  3
3  2  1  2
1  3  2  8
--------------
   b  a  c
1  3  2  8
3  2  1  2
0  2  3  3
2  1  4  1
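The na_position parameter described above has no effect in this example because the data contains no missing values. The following sketch, with made-up data that includes one NaN, shows where the missing value ends up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a missing value in column 'b'
df = pd.DataFrame({'b': [2.0, np.nan, 1.0, 3.0], 'a': [4, 3, 2, 1]})

# Descending sort on 'b'; the missing value stays last by default
desc = df.sort_values(by='b', ascending=False)

# na_position='first' moves the missing value to the top instead
na_first = df.sort_values(by='b', na_position='first')

print(desc)
print(na_first)
```

In both calls the position of NaN is controlled only by na_position, not by ascending.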
2) Sorting by index
Use the .sort_index method.
Parameters:
sort_index(axis=0,level=None,ascending=True,inplace=False,kind='quicksort',na_position='last',sort_remaining=True,by=None)
axis:{0 or ‘index’, 1 or ‘columns’}, default 0
The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
level:int or level name or list of ints or list of level names
If not None, sort on values in specified index level(s).
ascending:bool, default True
Sort ascending vs. descending.
inplace:bool, default False
If True, perform operation in-place.
kind:{‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also numpy.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position:{‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.
sort_remaining:bool, default True
If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
# -*- coding: utf-8 -*-
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 2, 3], 'a': [4, 3, 2, 1], 'c': [1, 3, 8, 2]}, index=[2, 0, 1, 3])
print(df)
print('----------')
# By default, sort by row labels in ascending order
df1 = df.sort_index()
print(df1)
print('-------------')
# Sort by column labels in descending order
df2 = df.sort_index(axis=1, ascending=False)
print(df2)
Output:
   b  a  c
2  1  4  1
0  2  3  3
1  2  2  8
3  3  1  2
----------
   b  a  c
0  2  3  3
1  2  2  8
2  1  4  1
3  3  1  2
-------------
   c  b  a
2  1  1  4
0  3  2  3
1  8  2  2
3  2  3  1
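The level and sort_remaining parameters only matter for a multilevel index, which the example above does not use. As a rough sketch, assuming a two-level index with the made-up level names 'outer' and 'inner':

```python
import pandas as pd

# Hypothetical two-level index
index = pd.MultiIndex.from_tuples(
    [('b', 2), ('a', 1), ('b', 1), ('a', 2)],
    names=['outer', 'inner'])
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# Sort on the 'inner' level first; with sort_remaining=True (the default)
# ties are then ordered by the remaining 'outer' level
by_inner = df.sort_index(level='inner')
print(by_inner)

# Without level, the levels are sorted from the outermost inward
by_all = df.sort_index()
print(by_all)
```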