Comments (10)
This is exactly the thing we are trying to solve. replace was previously casting your dtypes and will stop doing so in pandas 3
But it is unclear how to replace and cast. E.g. when I have the integers [0, 1], which stand for male and female:
df.gender = df.gender.astype(str)
df.gender = df.gender.replace({'0': 'male', '1': 'female'})
Is that the solution you have in mind? From a user's perspective it smells like a workaround.
The other way around is nearly impossible because I cannot cast a word stored as str to an integer.
print(df.gender) # ['male', 'male', 'female']
df.gender = df.gender.astype(int) # <-- ERROR
df.gender = df.gender.replace({'male': 0, 'female': 1})
What is wrong with casting in replace()?
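For the recoding described above, one possible alternative that sidesteps the casting question is Series.map, which builds a new Series from a lookup table and infers the result dtype. This is only a sketch of that idea, not the approach recommended by the pandas maintainers in this thread; the small DataFrame is made up to match the example.

```python
import pandas as pd

# Made-up data matching the example: 0 stands for male, 1 for female.
df = pd.DataFrame({"gender": [0, 0, 1]})

# Integer codes -> string labels.
labels = df["gender"].map({0: "male", 1: "female"})

# String labels -> integer codes; astype makes the target dtype explicit.
codes = labels.map({"male": 0, "female": 1}).astype("int64")
```

Note that map turns any value missing from the dictionary into NaN, whereas replace leaves unmatched values untouched.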
from pandas.
I got here trying to understand what pd.set_option('future.no_silent_downcasting', True) does.
The message I get is from .fillna(), which is the same message as for .ffill() and .bfill(). So I'm posting this here in case someone is looking for the same answer using the mentioned functions. This is the warning message I get:
FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version.
Call result.infer_objects(copy=False) instead.
To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
Maybe the confusion arises from the way the message is phrased. I find it confusing; it creates more questions than answers:
- Do I need to do some downcasting?
- Am I doing some downcasting somewhere that I am not aware of?
- When the message says "Call result.infer_objects(copy=False) instead.", is it telling me to call it before the function I'm trying to use, or after? Is it telling me not to use the function? (I guess not, since infer_objects should do something different than replace or one of the fill functions.)
- By using pd.set_option('future.no_silent_downcasting', True), am I removing the downcasting or am I making the downcasting not silent? Maybe both?
From what I understand, pd.set_option('future.no_silent_downcasting', True) removes the downcasting that these functions do, and if some downcasting were needed an error would be raised, but please correct me if I'm wrong.
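A small experiment can help answer the questions above. The sketch below assumes pandas 2.2 or later; no_silent_downcasting is a hypothetical helper (not part of pandas) that enables the option only if the installed version knows it. As far as I can tell, no error is raised with the option on: the result simply keeps object dtype, and infer_objects is called afterwards on the result when a narrower dtype is wanted.

```python
import contextlib
import pandas as pd

def no_silent_downcasting():
    # Hypothetical helper: enable the option when this pandas version
    # has it, otherwise return a do-nothing context manager.
    try:
        pd.get_option("future.no_silent_downcasting")
    except KeyError:
        return contextlib.nullcontext()
    return pd.option_context("future.no_silent_downcasting", True)

s = pd.Series([1, None], dtype=object)

with no_silent_downcasting():
    filled = s.fillna(0)             # no error; the dtype is left alone
    result = filled.infer_objects()  # explicit conversion, called after fillna

print(filled.dtype, result.dtype)
```

So the option removes the implicit downcast rather than making it louder, and the cast becomes a separate, visible step.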
from pandas.
Just do
pandas.set_option("future.no_silent_downcasting", True)
as suggested in the Stack Overflow question.
The series will retain object dtype in pandas 3.0 instead of casting to int64.
from pandas.
I'm having this problem as well. I have the feeling it's related to .replace changing the types of the values (as one Stack Overflow commenter implied). Altering the original example slightly:
from pandas import Series

s = Series(['foo', 'bar'])
replace_dict = {'foo': '1', 'bar': '2'} # replacements maintain original types
s = s.replace(replace_dict)
makes the warning go away.
I agree with @buhtz in that setting the "future" option isn't really getting at the root of understanding how to make this right. I think the hard part for most of us who have relied on .replace is that we never thought of it as doing any casting -- it was replacing. Now the semantics seem to have changed. It'd be great to reopen this issue to clarify the thinking, intention, and direction so that we can come up with appropriate work-arounds.
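To check on your own pandas version which replacements trigger the warning, something like the following may help. emits_downcast_warning is a hypothetical helper written for this illustration; it only looks for a FutureWarning whose text mentions "Downcasting", which is an assumption about the wording based on the messages quoted in this thread.

```python
import warnings
import pandas as pd

def emits_downcast_warning(func):
    """Return True if calling func() emits a downcasting FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    return any(
        issubclass(w.category, FutureWarning) and "Downcasting" in str(w.message)
        for w in caught
    )

s = pd.Series(["foo", "bar"], dtype=object)

# String -> int replacements change the kind of values held by the column.
mixed = emits_downcast_warning(lambda: s.replace({"foo": 1, "bar": 2}))

# String -> string replacements keep the column as it was.
same = emits_downcast_warning(lambda: s.replace({"foo": "1", "bar": "2"}))

print("str -> int warns:", mixed)
print("str -> str warns:", same)
```

The first result depends on the installed pandas version, so no expected value is given for it here.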
from pandas.
So... I did some digging and I think I have a better grasp of what's going on with this FutureWarning, so I wrote an article on Medium to explain what's happening. If you want to give it a read, here it is:
Deciphering the cryptic FutureWarning for .fillna in Pandas 2
Long story short, do:
with pd.option_context('future.no_silent_downcasting', True):
    # Do your thing with fillna, ffill, bfill, replace... and possibly use infer_objects if needed
    ...
from pandas.
I feel like this thread is starting to become a resource. In that spirit:
I just experienced another case where .replace would have been amazing, but I now need an alternative: a column of strings that are meant to be floats, where the only "offending" values are empty strings (meant to be NaNs). Consider:
import pandas as pd

records = [
    {'a': ''},
    {'a': 12.3},
]
df = pd.DataFrame.from_records(records)
I would have first reached for .replace. Now I consider .fillna, but that doesn't work either. Using .assign with pd.to_numeric does the trick:
In [1]: df.dtypes
Out[1]:
a    object
dtype: object
In [2]: x = df.assign(a=lambda x: pd.to_numeric(x['a']))
In [3]: x
Out[3]:
      a
0   NaN
1  12.3
In [4]: x.dtypes
Out[4]:
a    float64
dtype: object
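If the column may also contain other non-numeric placeholders, pd.to_numeric accepts errors="coerce", which turns anything it cannot parse into NaN instead of raising. A minimal sketch with made-up values (the "n/a" entry is an assumption added for illustration, not part of the example above):

```python
import pandas as pd

# Made-up column: numbers stored as strings, plus two placeholders.
raw = pd.Series(["", "12.3", "n/a"], dtype=object)

# Unparseable entries become NaN; the result is a float column.
clean = pd.to_numeric(raw, errors="coerce")

print(clean.dtype)  # float64
```

The trade-off is that coerce hides genuinely malformed values, so it is worth counting the NaNs before and after.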
from pandas.
pandas.set_option("future.no_silent_downcasting", True)
But doesn't this just deactivate the message without modifying the behavior?
To my understanding the behavior is the problem and needs to be solved. Or not?
My intention is to extinguish the fire and not just turn off the fire alarm but let the house burn down.
from pandas.
is that we never thought of it as doing any casting
This is exactly the thing we are trying to solve. replace was previously casting your dtypes and will stop doing so in pandas 3
from pandas.
The other way around is nearly not possible because I can not cast a str word to an integer.
One alternative (although I realise an "alternative" that does not involve .replace may not be what was actually desired) is to use categoricals with .assign:
import pandas as pd
df = pd.DataFrame(['male', 'male', 'female'], columns=['gender']) # from the original example
genders = pd.Categorical(df['gender'])
df = df.assign(gender=genders.codes)
If semantically similar data is spread across multiple columns, it gets a little more involved:
import random
import numpy as np
import pandas as pd

def create_data(columns):
    genders = ['male', 'male', 'female']
    for i in columns:
        yield (i, genders.copy())
        random.shuffle(genders)

# Create the dataframe
columns = [f'gender_{x}' for x in range(3)]
df = pd.DataFrame(dict(create_data(columns)))

# Incorporate all relevant data into the categorical
view = (df
        .filter(items=columns)
        .unstack())
categories = pd.Categorical(view)
values = np.hsplit(categories.codes, len(columns))
to_replace = dict(zip(columns, values))
df = df.assign(**to_replace)
which I think is what the Categorical documentation is trying to imply.
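One detail worth checking in the single-column example: when no categories are given, pd.Categorical sorts them, so the codes come out as female=0, male=1 rather than the male=0, female=1 mapping from the original question. Passing categories explicitly pins the order. A short sketch, using the same three values:

```python
import pandas as pd

df = pd.DataFrame({"gender": ["male", "male", "female"]})

# Default: categories are sorted, so 'female' gets code 0.
default_codes = pd.Categorical(df["gender"]).codes

# Explicit category order: 'male' -> 0, 'female' -> 1.
explicit = pd.Categorical(df["gender"], categories=["male", "female"])
explicit_codes = explicit.codes

# The codes can be turned back into labels through the categories index.
round_trip = list(explicit.categories.take(explicit_codes))

print(list(default_codes))   # [1, 1, 0]
print(list(explicit_codes))  # [0, 0, 1]
```

Values not listed in categories receive the code -1, which is easy to overlook when the codes are stored back into the frame.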
from pandas.
From your code:
x = df.assign(a=lambda x: pd.to_numeric(x['a']))
I would do it like this; it feels a little cleaner and easier to read:
df['a'] = pd.to_numeric(df['a'])
You said you wanted to use replace. If you want to use it, you can do this:
with pd.option_context('future.no_silent_downcasting', True):
    df2 = (df
           .replace('', float('nan'))  # Replace empty strings with NaN
           .infer_objects()            # Allow pandas to try to "infer better dtypes"
           )
df2.dtypes
# a    float64
# dtype: object
A note about
Now I consider .fillna, but that doesn't work either.
That would not work because .fillna fills NA values, but '' (empty string) is not NA (see Filling missing data).
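This is quick to confirm. In the sketch below (made-up values), isna flags only the None entry, so fillna leaves the empty string in place:

```python
import pandas as pd

s = pd.Series(["", None, "12.3"], dtype=object)

na_mask = s.isna()     # only None counts as missing
filled = s.fillna("0") # the empty string is not touched

print(na_mask.tolist())  # [False, True, False]
print(filled.tolist())   # ['', '0', '12.3']
```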
from pandas.