Commit 43cc65f

pandas-setting-with-copy
1 parent 765ad72 commit 43cc65f

1 file changed

Lines changed: 87 additions & 0 deletions

README.md

@@ -2,6 +2,8 @@
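
Feel the Beauty of Python | 1. Python Basics | 2. Python Strings and Regular Expressions | 3. Python Files and Dates | 4. Python's Three Power Tools | 5. Python Plotting | 6. Python Pitfalls | 7. Third-Party Python Packages | 8. Must-Know Machine Learning and Deep Learning Algorithms | 9. Python in Practice | 10. Pandas Data Analysis Case Studies

> Currently writing Chapter 11: Mastering Flask Web Development Step by Step
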
@@ -7079,6 +7081,91 @@ Out[6]:

In other words, the length of the dummy vector equals the number of unique characters in the input string.

#### 15 The Annoying SettingWithCopyWarning!!!

Pandas is wonderful for working with data, as anyone who has used it knows!

Almost everyone who uses Pandas runs into this warning sooner or later:

*SettingWithCopyWarning*

It is really annoying!

Newcomers to Pandas in particular often have no idea why a message like this pops up:

```text
d:\source\test\settingwithcopy.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
```

Ultimately, the warning appears because the code contains a `chained operation`...

So what is a `chained operation`?

Something like this:

```python
tmp = df[df.a < 4]  # boolean filtering returns a copy of the matching rows
tmp['c'] = 200      # assigning into that copy triggers SettingWithCopyWarning
```

Just remember this most typical case for now: filter first, then assign to the filtered result.

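The same pattern is easier to recognize when both steps are written on one line. The sketch below is an illustration only: the small `df` and the starting value of `100` are assumptions chosen for the demo, and the warning is silenced so the snippet runs quietly. It shows that the assignment lands on a temporary copy and never reaches `df`.

```python
import warnings

import pandas as pd

# Hypothetical demo data, not taken from any real dataset
df = pd.DataFrame({'a': [1, 3, 5], 'b': [4, 2, 7]}, index=['a', 'b', 'c'])
df['c'] = 100

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the warning for this demo only
    # Two indexing steps back to back: df[...] builds a temporary copy,
    # then ['c'] = 200 writes into that temporary, not into df.
    df[df.a < 4]['c'] = 200

# df still holds the original values
unchanged = df['c'].tolist() == [100, 100, 100]
print(unchanged)
```

Splitting the two steps across two lines with a named variable such as `tmp` does not change this behavior; it only makes the chain harder to spot.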
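Some people then ask: when this warning shows up, do I need to pay attention to it?

If the result is wrong, you certainly do; if the result is what you intended, you can ignore it.

Here is an example:
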
```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5], 'b': [4, 2, 7]}, index=['a', 'b', 'c'])
df.loc[df.a < 4, 'c'] = 100  # one .loc call selects rows and column together and writes to df
print(df)
print("it's ok")

tmp = df[df.a < 4]  # boolean filtering returns a copy of the matching rows
tmp['c'] = 200      # changes only tmp and triggers SettingWithCopyWarning
print('-----tmp------')
print(tmp)
print('-----df-------')
print(df)
```

Output:
```text
   a  b      c
a  1  4  100.0
b  3  2  100.0
c  5  7    NaN
it's ok
d:\source\test\settingwithcopy.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tmp['c'] = 200
-----tmp------
   a  b    c
a  1  4  200
b  3  2  200
-----df-------
   a  b      c
a  1  4  100.0
b  3  2  100.0
c  5  7    NaN
```
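
The code after the `it's ok` line performs a chained assignment, and the result is wrong: `tmp` changed, but the value was never written to `df`. In this case the warning must be heeded.

The code before the `it's ok` line is the correct way to do it.

In short, avoid chained operations wherever possible. How? Use `.loc[row_indexer,col_indexer]`, just as the warning message itself suggests.
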
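As a rough sketch of that advice, the two usual intentions can be written out separately. The data reuses the example's values; the explicit `.copy()` in the second case is not part of the text above but a common pandas idiom, added here as an assumption about what is wanted when the subset is meant to be independent of `df`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5], 'b': [4, 2, 7]}, index=['a', 'b', 'c'])

# Intention 1: change df itself.
# Select rows and column in a single .loc call so the write goes to df.
df.loc[df.a < 4, 'c'] = 100

# Intention 2: work on an independent subset.
# An explicit copy states that tmp is a separate object, so assigning
# to it is unambiguous and df is left alone on purpose.
tmp = df[df.a < 4].copy()
tmp['c'] = 200

print(df)
print(tmp)
```

Either way, the code says which object is supposed to change, which is exactly the ambiguity the warning is pointing at.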
### 11. Mastering Flask Web Development Step by Step
