pandas 小抄

发表于： 2022-08-21 更新于： 2023-02-22 分类于：开发

字数： 392 阅读：≈ 1分钟

pandas 是什么

pandas 是 Python 实现的一个数据处理的工具，可以非常方便的处理数据库、表格中的数据，进行聚合、拆分等操作。

pandas 常用方法

利用 Pandas 处理数据时一些常用的方法。

cumprod()

用来查找到目前为止在任何轴上看到的值的累积乘积，每个单元格都填充。

import pandas as pd

df = pd.DataFrame({"A":[5, 3, 6, 4],
                   "B":[11, 2, 4, 3],
                   "C":[4, 3, 8, 5],
                   "D":[5, 4, 2, 8]})

print(df)
print(df.cumprod(axis=0))

resample()

主要提供重新采样的方式，可以进行降采样，也可以升采样，主要针对采样的频率。

降采样

import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2021', periods=100, freq='D') # 按天生成数据

ts = pd.Series(np.random.randn(len(rng)), index=rng)

rae = ts.resample('M').mean() # 按月进行重采样，用平均值填充
print(ts)
print(rae)

升采样

import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2021', periods = 20, freq='3D')

ts = pd.Series(np.random.randn(len(rng)), index=rng)
print(ts.head())

print(ts.resample('D').asfreq().head())

cumsum()

这个方法与 cumprod() 方法类似，主要是求和。

import pandas as pd

df = pd.DataFrame({"A":[5, 3, 6, 4],
                   "B":[11, 2, 4, 3],
                   "C":[4, 3, 8, 5],
                   "D":[5, 4, 2, 8]})

print(df)
print(df.cumsum(axis=0))

rolling().sum()

滚动求和方法，这个方法与 cumsum() 的区别在于不累计，而是按提供的值进行求和。

import pandas as pd
amount=pd.Series([100,90,110,160,110,130,80,90,199,159])

print(amount.rolling(3).sum())

当然，同样的方法还包括 mean(),var(),std() 等。

shift()

shift() 方法主要是用来移动数据的，他会按具体的数字进行移动。

import pandas as pd

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))
print(df)
print(df.shift(periods=3))