Hello,
First, thank you so much for these great tutorials. There have been a number of warnings regarding the use of "just the indexing operator" for quite a while, and the explanations of .loc and .iloc were tremendously helpful.
I'm writing to recommend that, in the article on assignment, you add an example of assigning a new column from a boolean selection that returns a boolean Series. Take, for example, the following:
criteria = df['some_col'] > some_number
criteria.head()
0 True
1 False
2 True
4 True
6 False
Using just the assignment operator...
df['new_col'] = df['some_col'] > some_number
...works but yields the warning:
Try using .loc[row_indexer,col_indexer] = value instead
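A minimal sketch of one way to avoid the warning, assuming df was produced by filtering a larger DataFrame (the gaps in the index shown above hint at this, but it is an assumption). The data, the `keep` column, and the name `full` are made up for illustration:

```python
import pandas as pd

# Hypothetical source frame; 'keep' is a placeholder filter column.
full = pd.DataFrame({
    "some_col": [5, 1, 7, 2, 9, 3, 0],
    "keep": [True, True, True, False, True, False, True],
})
some_number = 4

# Take an explicit copy of the filtered rows so that df is an
# independent DataFrame rather than a slice of 'full'.
df = full[full["keep"]].copy()

# Assign the boolean Series through .loc, as the warning text suggests.
df.loc[:, "new_col"] = df["some_col"] > some_number
```

With the explicit .copy(), the plain df['new_col'] = ... form should also work without the warning; the .loc form is shown only because the warning asks for it.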
The closest example I've found in your article is this one:
last_name = pd.Series(data=['Smith', 'Jones', 'Williams', 'Green', 'Brown', 'Simpson', 'Peters'],
                      index=['Tom', 'Niko', 'Penelope', 'Aria', 'Sofia', 'Dean', 'Zach'])
last_name
df['last_name'] = last_name
However, at least in pandas 0.19.2, this still yields the same warning. After searching around a bit, I found this Stack Overflow discussion, which states that from pandas 0.16.0 onward the best way to do this is to use the assign method in the following manner:
criteria = df['some_col'] > some_number
df = df.assign(new_col_name=criteria)  # note: no quotes on new_col_name; assign returns a new DataFrame, so keep the result
Which seems to work well for me.
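For anyone trying this, here is a small self-contained sketch of the assign approach. The data and the names `result`, `col`, and `result_str` are invented for illustration; the point is that assign returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical data with a non-contiguous index, as in the output above.
df = pd.DataFrame({"some_col": [5, 1, 7, 9, 0]}, index=[0, 1, 2, 4, 6])
some_number = 4

criteria = df["some_col"] > some_number

# The keyword name becomes the column name; df itself is not modified.
result = df.assign(new_col_name=criteria)

# If the column name is held in a string (or is not a valid Python
# identifier), it can be passed with dict unpacking instead.
col = "new col"
result_str = df.assign(**{col: criteria})
```

Because the keyword is an ordinary Python argument name, it is written without quotes, which is why the string form needs the `**{...}` unpacking.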
Alternatively, I suppose you could simply note which pandas version the tutorial was written for.
Thanks again for this wonderful guide!