Comments (7)

Nrezhang commented on September 27, 2024
>>> import pandas as pd
>>> df = pd.DataFrame({'column': [0.0, 1.0, 2.0]})
>>> df.dtypes
column    float64
dtype: object
>>> df.convert_dtypes()
   column
0       0
1       1
2       2
>>> df.convert_dtypes().dtypes
column    Int64
dtype: object

I can confirm that the issue is reproducible and is not the intended behavior. Creating a DataFrame directly with the same data as newdf gives the intended behavior, shown above.

from pandas.

Aloqeely commented on September 27, 2024

It seems like convert_dtypes does not do any conversion if the existing dtypes already support pd.NA.
This might be intended, because originally the point of convert_dtypes was to encourage users to use pandas ExtensionDtypes instead of numpy dtypes, but that conflicts with the documentation: "Convert columns to the best possible dtypes".
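
To make the distinction concrete, here is a minimal sketch contrasting a NumPy-backed float64 column with a nullable Float64 column holding the same values. The column name is illustrative, and the Float64 result is what this thread reports, so it may differ on other pandas versions.

```python
import pandas as pd

# Same whole-number data, stored two ways.
numpy_backed = pd.DataFrame({"column": [0.0, 1.0, 2.0]})  # float64 (NumPy)
nullable = pd.DataFrame({"column": [0.0, 1.0, 2.0]}, dtype="Float64")  # already supports pd.NA

numpy_result = numpy_backed.convert_dtypes()
nullable_result = nullable.convert_dtypes()

# NumPy floats with no fractional part are converted to Int64.
print(numpy_result.dtypes["column"])
# Reported in this thread to stay Float64, since the dtype already supports pd.NA.
print(nullable_result.dtypes["column"])
```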

mroeschke commented on September 27, 2024

Thanks for the issue @caballerofelipe, but this is the expected behavior of convert_dtypes. As mentioned, it's only intended to convert to a dtype that supports pd.NA.

I believe the functionality you're expecting is in to_numeric(downcast=), so closing. https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html
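
As a rough illustration of the to_numeric(downcast=) suggestion, the sketch below downcasts a float column whose values are all whole numbers. Note that downcast picks the smallest NumPy integer dtype that fits, so the result is not the nullable Int64. The Series contents are illustrative.

```python
import pandas as pd

whole = pd.Series([0.0, 1.0, 2.0])       # float64, every value is a whole number
fractional = pd.Series([0.0, 1.0, 3.3])  # float64, one value has a fractional part

whole_down = pd.to_numeric(whole, downcast="integer")
fractional_down = pd.to_numeric(fractional, downcast="integer")

print(whole_down.dtype)       # a small NumPy integer dtype
print(fractional_down.dtype)  # stays float64 because 3.3 cannot be stored as an integer
```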

Aloqeely commented on September 27, 2024

@mroeschke don't you think the doc is incorrect though?
It says it converts columns to the best possible dtypes that support pd.NA, but that is not actually the case. If it were, it would have converted from Float64 to Int64.

mroeschke commented on September 27, 2024

I guess "best possible" is a bit too subjective, so I would call the doc unclear rather than incorrect. A doc improvement changing "best possible" to "convert a numpy type to a type that supports pd.NA" would probably be better.

caballerofelipe commented on September 27, 2024

I believe using Int64 instead of Float64 is "best" when I don't need decimals. For instance, from the point of view of legibility, an int is easier to read than a number with a point and a zero (without doing some formatting). Also, Int64 represents integers exactly up to 2**63 - 1, while a 64-bit float only does so up to 2**53.

Is there a processing reason for not changing from Float64 to Int64? Is it expensive somehow? (This is not a rhetorical question; I don't know the answer.)

Also, is it more expensive than going from float64 (lowercase f) to Int64 (capital I)?

Also, maybe the function could have a parameter to make it do what I thought it was going to do?

caballerofelipe commented on September 27, 2024

So I found a workaround for what I want: letting pandas change to Int64 when no decimals are present.

In Step 6, instead of doing newdf.convert_dtypes(), you can force a simpler dtype with newdf.astype('object').convert_dtypes(). It's one more step than I would have liked, but it works.

Full Example
import pandas as pd

df = pd.DataFrame({'column': [0.0, 1.0, 2.0, 3.3]})
df = df.convert_dtypes()
print(df.dtypes)
# Returns
# column    Float64
# dtype: object

newdf = df.iloc[:-1]
print(newdf)
# Returns
#    column
# 0     0.0
# 1     1.0
# 2     2.0

newdf_convert = newdf.convert_dtypes()
print(newdf_convert.dtypes)
print(newdf_convert)
# Returns
# column    Float64
# dtype: object
#    column
# 0     0.0
# 1     1.0
# 2     2.0

newdf_astype_convert = newdf.astype('object').convert_dtypes()
print(newdf_astype_convert.dtypes)
print(newdf_astype_convert)
# Returns
# column    Int64
# dtype: object
#    column
# 0       0
# 1       1
# 2       2

# You could also use a more complex way to obtain int64 (lowercase i) or float64 (lowercase f)
newdf_astype_convert_int64 = (
    newdf
    .astype('object')
    .convert_dtypes()  # To dtype with pd.NA
    .astype('object')
    .replace(pd.NA, float('nan'))  # Remove pd.NA created before
    .infer_objects()
)
print(newdf_astype_convert_int64.dtypes)
print(newdf_astype_convert_int64)
# Returns
# column    int64
# dtype: object
#    column
# 0       0
# 1       1
# 2       2

The function convert_dtypes could have a parameter 'simplify_dtypes' (or maybe a better keyword that I haven't thought of) that would do the same thing without much implementation effort: convert_dtypes(simplify_dtypes=True) would do .astype('object') before the actual conversion.
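
Until such a parameter exists, the same idea can be sketched as a small wrapper. The name convert_dtypes_simplified is hypothetical and not part of pandas; it only packages the astype('object') workaround described above.

```python
import pandas as pd


def convert_dtypes_simplified(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper, not part of pandas.

    Casts to object first so that convert_dtypes re-infers each column from
    its values instead of keeping an existing nullable dtype.
    """
    return df.astype("object").convert_dtypes()


# Two nullable Float64 columns: one with whole numbers only, one with a fraction.
df = pd.DataFrame(
    {"whole": [0.0, 1.0, 2.0], "fractional": [0.0, 1.0, 3.3]},
    dtype="Float64",
)
simplified = convert_dtypes_simplified(df)
print(simplified.dtypes)
```

The round trip through object dtype is likely to be slow on large frames, so this is better suited to a one-off cleanup than to a hot path.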

Also, you could use this to simplify "even further" to int64 (lowercase i) or float64 (lowercase f); see the full example. You would do: df.astype('object').convert_dtypes().astype('object').replace(pd.NA, float('nan')).infer_objects(). You might want to do this inside a with pd.option_context('future.no_silent_downcasting', True): block, though, because of the replace() in there (see this issue).

Edit: Added .replace(pd.NA, float('nan')) in the example to allow conversion to float64 when a nan is present.
