ENH: generalize `__init__` on a `dict` to `collections.abc.Mapping` and `__getitem__` on a `list` to `collections.abc.Sequence` (pandas issue, OPEN)

MilesCranmer commented on August 28, 2024

ENH: generalize `__init__` on a `dict` to `collections.abc.Mapping` and `__getitem__` on a `list` to `collections.abc.Sequence`

from pandas.

Comments (10)

mrkn commented on August 28, 2024

I opened a pull request proposing Mapping support in DataFrame construction: #58814.

Aloqeely commented on August 28, 2024

#58814 only fixes the issue of constructing a DataFrame from a Mapping, but a Mapping can still behave unexpectedly in other methods. Take this example from the docs (using #58814's DictWrapper class):

import pandas as pd

# DictWrapper is the Mapping class from #58814 (its definition is not shown here).
df = pd.DataFrame({"num_legs": [2, 4], "num_wings": [2, 0]}, index=["falcon", "dog"])

values = {"num_wings": [0, 3]}
my_dict = DictWrapper(values)  # a Mapping, not a dict

print(df.isin(values))  # Correct result
print(df.isin(my_dict))  # Wrong result

A quick search shows 100+ isinstance(_, dict) checks and also 100+ isinstance(_, list) checks, so I think if we're going to support Mapping for DataFrame construction then we'd have to support it everywhere else too.
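A minimal sketch of why those checks matter, using only the standard library. The DictWrapper below is a hypothetical stand-in written for illustration, not the class from #58814: it implements the Mapping protocol without subclassing dict, so isinstance(_, dict) rejects it while isinstance(_, Mapping) accepts it.

```python
from collections.abc import Mapping


class DictWrapper(Mapping):
    """Hypothetical read-only Mapping wrapping a dict (not the class from #58814)."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


values = {"num_wings": [0, 3]}
my_dict = DictWrapper(values)

# A dict-only check rejects the wrapper; a Mapping check accepts both objects.
print(isinstance(my_dict, dict))     # False
print(isinstance(my_dict, Mapping))  # True
print(isinstance(values, Mapping))   # True

# Iterating a Mapping yields its keys, not its values.
print(list(my_dict))  # ['num_wings']
```

Because iterating a Mapping yields only its keys, a code path that treats any non-dict argument as a plain list-like would see ['num_wings'] instead of the values. That is one plausible way to end up with a wrong result and no error, though the exact path inside pandas is not confirmed here.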

jbrockmendel commented on August 28, 2024

I’m -0.5 on this. Internally we would just convert to dict/list anyway. I’d rather users do that where necessary and avoid the perf penalty in the cases we do support.
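A minimal sketch of that user-side conversion, assuming the input implements the standard Mapping or Sequence protocol. MappingProxyType and UserList are used here only as convenient standard-library stand-ins for "dict-like but not a dict" and "list-like but not a list".

```python
from collections import UserList
from types import MappingProxyType

# Stand-ins for objects that behave like dict / list without being one.
mapping_like = MappingProxyType({"a": [1, 2], "b": [3, 4]})
sequence_like = UserList([0, 1])

# Shallow conversions: the containers are new, the elements are shared.
as_dict = dict(mapping_like)
as_list = list(sequence_like)

print(type(as_dict).__name__, as_dict)
print(type(as_list).__name__, as_list)
```

The resulting as_dict and as_list are plain built-ins, so they pass any isinstance(_, dict) or isinstance(_, list) check and could be handed to the DataFrame constructor or used as an indexer.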

MilesCranmer commented on August 28, 2024

Ah, there would be a performance penalty? I would have thought it would just be changing an isinstance(i, list) to isinstance(i, Sequence)?
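A rough way to compare the two checks with the standard library alone. The helper accepted_by is a hypothetical name introduced for this sketch. It shows which built-in types each check accepts and times both checks; the timings depend on the machine and Python version, so no numbers are claimed here.

```python
import timeit
from collections.abc import Sequence


def accepted_by(check_type, objects):
    """Return the type names of the objects that pass isinstance(obj, check_type)."""
    return [type(obj).__name__ for obj in objects if isinstance(obj, check_type)]


candidates = [[1, 2], (1, 2), range(2), "ab", b"ab", {1, 2}, {"a": 1}]

print(accepted_by(list, candidates))
print(accepted_by(Sequence, candidates))

# Rough cost of each check over many calls; absolute numbers vary by machine.
x = [1, 2]
t_list = timeit.timeit(lambda: isinstance(x, list), number=100_000)
t_seq = timeit.timeit(lambda: isinstance(x, Sequence), number=100_000)
print(f"list: {t_list:.4f}s  Sequence: {t_seq:.4f}s")
```

One wrinkle: str and bytes also count as Sequence, so widening a list check would probably need to special-case them rather than being a pure one-token swap.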

Aloqeely commented on August 28, 2024

Slightly unrelated, but have you tried the Julia pandas wrapper? I believe it automatically converts a Julia Vector/Dict to a Python list/dict for you.

MilesCranmer commented on August 28, 2024

Thanks. Sadly, that one appears to use the older PyCall.jl instead of the newer PythonCall.jl, so it is not (yet) compatible.

It's not a big deal if not possible. I guess it's a bit of a sharp edge, especially as it doesn't throw an error, but for users who google this, the workaround does work. I just thought it seemed like it might be more duck-typey/pythonic if any dict-like input was acceptable for initializing from a dict (and similar for sequences) rather than only explicit dicts – I could imagine other cases where it might be useful to have this. But I understand this is totally subjective!

mrkn commented on August 28, 2024

@jbrockmendel I checked for performance degradation from accepting Mapping. The asv benchmarks reported BENCHMARKS NOT SIGNIFICANTLY CHANGED. See mrkn#1 for more details and the patch I applied.

Note that I don't have much knowledge of the pandas internals, so this patch may be insufficient for accepting a Mapping when creating a DataFrame.

cjdoris commented on August 28, 2024

> I’m -0.5 on this. Internally we would just convert to dict/list anyway. I’d rather users do that where necessary and avoid the perf penalty in the cases we do support.

In the case of dict -> Mapping this isn't true. The DataFrame constructor calls dict_to_mgr(data, ...):

mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)

which internally just calls data.keys() and data[key] for each key and converts them to some other internal data structure, so it doesn't matter whether the source is a dict or any other Mapping.

AFAICT the only performance concern is in the actual type check isinstance(data, dict) -> isinstance(data, Mapping) which I presume is negligible (and backed up by mrkn's post).

I haven't looked at the indexing code to see if the same conclusions hold for list -> Sequence there.
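A minimal sketch of that argument, assuming the consuming code really does touch only keys() and data[key]. The function columns_from_mapping is a hypothetical helper written for illustration and is not pandas' dict_to_mgr; it shows that such code gives identical results for a dict and for another Mapping.

```python
from collections.abc import Mapping
from types import MappingProxyType


def columns_from_mapping(data):
    """Hypothetical helper: gather (name, values) pairs via keys() and data[key] only."""
    if not isinstance(data, Mapping):
        raise TypeError(f"expected a Mapping, got {type(data).__name__}")
    return [(key, list(data[key])) for key in data.keys()]


plain = {"num_legs": [2, 4], "num_wings": [2, 0]}
proxy = MappingProxyType(plain)  # a Mapping that is not a dict

print(columns_from_mapping(plain))
print(columns_from_mapping(proxy))
```

Under that assumption the only dict-specific step is the isinstance check at the top, which is why the type check is the only place a cost difference could show up.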
