ENH: generalize `__init__` on a `dict` to `collections.abc.Mapping` and `__getitem__` on a `list` to `collections.abc.Sequence`

MilesCranmer commented on August 28, 2024

ENH: generalize `__init__` on a `dict` to `collections.abc.Mapping` and `__getitem__` on a `list` to `collections.abc.Sequence`

Comments (10)

mrkn commented on August 28, 2024

I opened a pull request, #58814, that proposes Mapping support in DataFrame construction.

Aloqeely commented on August 28, 2024

#58814 only fixes constructing a DataFrame from a Mapping. Passing a Mapping to other methods can still give unexpected results. Take this example from the docs (using #58814's DictWrapper class):

import pandas as pd

df = pd.DataFrame({"num_legs": [2, 4], "num_wings": [2, 0]}, index=["falcon", "dog"])

values = {"num_wings": [0, 3]}
my_dict = DictWrapper(values)  # <-- a Mapping that is not a dict (DictWrapper is defined in #58814)

print(df.isin(values))  # Correct result
print(df.isin(my_dict))  # Wrong result

A quick search shows 100+ isinstance(_, dict) checks and 100+ isinstance(_, list) checks, so I think that if we're going to support Mapping for DataFrame construction, we'd have to support it everywhere else too.
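
For readers without #58814 at hand, the underlying pitfall can be reproduced without pandas. The sketch below uses a hypothetical DictWrapper (a stand-in written for illustration, not necessarily the class defined in #58814) to show why code guarded by isinstance(_, dict) skips a Mapping, along with the explicit conversion that serves as the workaround.

```python
from collections.abc import Mapping


class DictWrapper(Mapping):
    """Hypothetical read-only Mapping that wraps a dict without subclassing it."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


values = {"num_wings": [0, 3]}
my_dict = DictWrapper(values)

# A dict-only check rejects the wrapper, so dict-specific code paths are skipped.
print(isinstance(my_dict, dict))     # False
# A Mapping check accepts both the wrapper and a plain dict.
print(isinstance(my_dict, Mapping))  # True
print(isinstance(values, Mapping))   # True

# Workaround: convert explicitly before handing the object to pandas.
as_dict = dict(my_dict)
print(as_dict == values)             # True
```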

jbrockmendel commented on August 28, 2024

I’m -0.5 on this. Internally we would just convert to dict/list anyway. I’d rather users do that where necessary and avoid the perf penalty in the cases we do support.

MilesCranmer commented on August 28, 2024

Ah, there would be a performance penalty? I would have thought it would just be changing an isinstance(i, list) to isinstance(i, Sequence)?
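
As a rough illustration of what that one-line swap would change, the sketch below compares the two checks on a few built-in types. accepts_list and accepts_sequence are hypothetical helpers, not pandas functions. One thing to keep in mind is that str and bytes are also Sequences, so a broader check would probably need to exclude them explicitly.

```python
from collections.abc import Sequence


def accepts_list(obj):
    """Hypothetical stand-in for an existing isinstance(obj, list) guard."""
    return isinstance(obj, list)


def accepts_sequence(obj):
    """Hypothetical generalized guard; str and bytes are excluded on purpose."""
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))


candidates = {
    "list": [1, 2],
    "tuple": (1, 2),
    "range": range(2),
    "str": "ab",
}

# Print, for each type, whether the narrow and the broad guard accept it.
for name, obj in candidates.items():
    print(name, accepts_list(obj), accepts_sequence(obj))
```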

Aloqeely commented on August 28, 2024

Slightly unrelated, but have you tried the Julia pandas wrapper? I believe it automatically converts a Julia Vector/Dict to a Python list/dict for you.

MilesCranmer commented on August 28, 2024

Thanks, sadly that one appears to use the older PyCall.jl instead of the newer PythonCall.jl, so it is not (yet) compatible.

It's not a big deal if not possible. I guess it's a bit of a sharp edge, especially as it doesn't throw an error, but for users who google this, the workaround does work. I just thought it seemed like it might be more duck-typey/pythonic if any dict-like input was acceptable for initializing from a dict (and similar for sequences) rather than only explicit dicts – I could imagine other cases where it might be useful to have this. But I understand this is totally subjective!

mrkn commented on August 28, 2024

@jbrockmendel I checked whether accepting Mapping degrades performance. The asv benchmarks reported BENCHMARKS NOT SIGNIFICANTLY CHANGED. See mrkn#1 for more details and the patch I applied.

Note that I don't have much knowledge of the pandas internals, so this patch may not be sufficient to support creating a DataFrame from a Mapping.

cjdoris commented on August 28, 2024

> I’m -0.5 on this. Internally we would just convert to dict/list anyway. I’d rather users do that where necessary and avoid the perf penalty in the cases we do support.

In the case of dict -> Mapping this isn't true. The DataFrame constructor calls dict_to_mgr(data, ...) (pandas/pandas/core/frame.py, line 763 in 2aa155a: mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)), which internally just calls data.keys() and data[key] for each key and converts them to some other internal data structure, so it doesn't matter whether the source is a dict or any other Mapping.
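
To illustrate the point, here is a minimal sketch of a consumer that, like the description of dict_to_mgr above, only touches keys() and item access. columns_from_mapping and ReadOnlyColumns are hypothetical names made up for this example and are not pandas internals.

```python
from collections.abc import Mapping


def columns_from_mapping(data):
    """Hypothetical consumer: uses only data.keys() and data[key]."""
    return [(key, list(data[key])) for key in data.keys()]


class ReadOnlyColumns(Mapping):
    """Hypothetical Mapping that is not a dict subclass."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


source = {"num_legs": [2, 4], "num_wings": [2, 0]}

# The consumer produces the same result for a dict and for any other Mapping.
from_dict = columns_from_mapping(source)
from_mapping = columns_from_mapping(ReadOnlyColumns(source))
print(from_dict == from_mapping)
```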

AFAICT the only performance concern is in the actual type check isinstance(data, dict) -> isinstance(data, Mapping) which I presume is negligible (and backed up by mrkn's post).
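
The cost of the check itself can be estimated with timeit. This is only a rough sketch: it measures the isinstance call in isolation rather than DataFrame construction, and the absolute numbers depend on the machine and the Python version.

```python
import timeit
from collections.abc import Mapping

data = {"a": [1, 2]}

# Time each check on its own; n is an arbitrary repetition count.
n = 200_000
t_dict = timeit.timeit(lambda: isinstance(data, dict), number=n)
t_mapping = timeit.timeit(lambda: isinstance(data, Mapping), number=n)

print(f"isinstance(data, dict):    {t_dict / n * 1e9:.0f} ns per call")
print(f"isinstance(data, Mapping): {t_mapping / n * 1e9:.0f} ns per call")
```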

I haven't looked at the indexing code to see if the same conclusions hold for list -> Sequence there.
