
Comments (9)

stinodego commented on July 1, 2024

We will rename fill_null to fill_nulls. This aligns with has_nulls / drop_nulls.

Also for fill_nan -> fill_nans.

from polars.

JulianCologne commented on July 1, 2024

We will rename fill_null to fill_nulls. This aligns with has_nulls / drop_nulls.

Also for fill_nan -> fill_nans.

Different opinion here! IMO we should drop the "s" in all cases.

Reasons

  • fill_nan / fill_null are row-level operations. For each row you fill a single nan / null. This is also in line with column identifiers, which should be singular instead of plural ("name" instead of "names", "age" instead of "ages", "value" instead of "values", ...)
  • has_nulls is strictly speaking semantically incorrect. nulls as a plural implies that it checks for multiple null values in a column, which is not the case! It returns true as soon as the column contains a single null (potentially more) -> should be has_null
  • drop_nulls: strong opinion for a rename to drop_null_rows, which would make it much more descriptive! This was probably "copied" from pandas dropna, which is quite different because it can also remove columns. Otherwise I would also lean towards drop_null


stinodego commented on July 1, 2024
  • fill_nan / fill_null are row-level operations. For each row you fill a single nan / null. This is also in line with column identifiers, which should be singular instead of plural ("name" instead of "names", "age" instead of "ages", "value" instead of "values", ...)

Not sure. fill_nulls says "fill all the nulls in this column with value X". What makes you think this is a row-level operation?

  • has_nulls is strictly speaking semantically incorrect. nulls as a plural implies that it checks for multiple null values in a column, which is not the case! It returns true as soon as the column contains a single null (potentially more) -> should be has_null

I have considered this, but you conveniently gloss over the fact that has_null isn't quite correct either, because it also returns true with multiple nulls. Neither has_null nor has_nulls is completely correct, but contains_at_least_one_null is too long, so we have to choose.

  • drop_nulls: strong opinion for a rename to drop_null_rows, which would make it much more descriptive! This was probably "copied" from pandas dropna, which is quite different because it can also remove columns. Otherwise I would also lean towards drop_null

An expression doesn't have any rows, so drop_null_rows makes no sense.

For what it's worth, personally I feel like fill_null, drop_null, has_null feel better, but I cannot really make a good argument for it.

Anyway, I'll sleep on this one before merging it, but I'm not convinced at all by your arguments here.


gab23r commented on July 1, 2024

I personally prefer the singular form as well. These functions drop, fill, or check for the existence of any null value, so it makes sense to remove the s.

Moreover, I think fill_null is used much more than drop_nulls and has_nulls, so it will break less code.


JulianCologne commented on July 1, 2024

I have considered this, but you conveniently gloss over the fact that has_null isn't quite correct either, because it also returns true with multiple nulls. Neither has_null nor has_nulls is completely correct, but contains_at_least_one_null is too long, so we have to choose.

has_null is completely correct imo. It answers the question of whether the column has/contains null. The quantity is irrelevant here 🤓.

Thinking about this a bit more, the best solution imo would actually be to completely remove has_null(s) and introduce Expr.contains (contains(None) with a fast path) as a super-function. Everyone is familiar with this concept. (Also, contains isn't called contains_at_least_one because that is implied.)

Thoughts? 💭
I think that would improve the API (there are already five contains methods, but none on Expr yet) 🤓😎
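To make the proposal concrete, here is a minimal pure-Python sketch of what a contains with a null fast path could look like. The `Column` class, its `null_count` attribute, and the `contains` method are all hypothetical names invented for this illustration; nothing here is polars' actual API or implementation.

```python
from __future__ import annotations

from typing import Any


class Column:
    """Toy stand-in for a column (hypothetical, not a polars type)."""

    def __init__(self, values: list[Any]) -> None:
        self.values = values
        # Assumption: the null count is known up front, so a null lookup
        # does not need to scan the values again.
        self.null_count = sum(v is None for v in values)

    def contains(self, item: Any) -> bool:
        if item is None:
            # Fast path: answer from the stored null count.
            return self.null_count > 0
        # General path: scan the non-null values for a match.
        return any(v == item for v in self.values if v is not None)


col = Column([1, None, 3])
print(col.contains(None), col.contains(3), col.contains(2))
```

As with has_null(s), the result is true whether the column holds one null or many, which is the "at least one" meaning that the name contains already implies.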


lyngc commented on July 1, 2024

Also prefer singular


stinodego commented on July 1, 2024

Moreover, I think fill_null is used much more than drop_nulls and has_nulls, so it will break less code.

Not sure about fill_null vs drop_nulls frequency, but has_nulls was added very recently so renaming it will be very low impact.


ritchie46 commented on July 1, 2024

After some thought I agree with @JulianCologne. I was thinking about this yesterday and is_null is an elementwise question. The same can be said for fill_null.

I think we should ask if it is a single row/elementwise question and if so go for singular.

drop_nulls isn't an elementwise operation, so it is fine for it to be plural here, as the longer version would be drop_null_rows.
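The elementwise distinction can be sketched with plain Python lists, using None as the null value. The helper functions below only borrow the names from this thread; they are illustrative stand-ins and not polars' implementation.

```python
from __future__ import annotations

from typing import Optional


def is_null(values: list[Optional[float]]) -> list[bool]:
    # Elementwise: one answer per input element, same length as the input.
    return [v is None for v in values]


def fill_null(values: list[Optional[float]], fill: float) -> list[float]:
    # Elementwise: each element is replaced (or kept) independently.
    return [fill if v is None else v for v in values]


def drop_nulls(values: list[Optional[float]]) -> list[float]:
    # Not elementwise: the output can be shorter than the input.
    return [v for v in values if v is not None]


def has_nulls(values: list[Optional[float]]) -> bool:
    # Not elementwise: the whole column collapses to a single boolean.
    return any(v is None for v in values)


data = [1.0, None, 3.0, None]
print(is_null(data))
print(fill_null(data, 0.0))
print(drop_nulls(data))
print(has_nulls(data))
```

Under this rule of thumb, is_null and fill_null keep the input length and read naturally in the singular, while drop_nulls changes the length of the result.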

I want to put this on hold as I don't think this merits a change at all.


stinodego commented on July 1, 2024

I want to put this on hold as I don't think this merits a change at all.

Agree. Status quo is fine, I think. I'll close this for now.

