Comments (10)
I made a pull request proposing Mapping support in DataFrame construction: #58814.
from pandas.
#58814 only fixes the issue of constructing a DataFrame from a Mapping, but a Mapping can still behave unexpectedly in other methods. Take this example from the docs (using #58814's DictWrapper class):
df = pd.DataFrame({"num_legs": [2, 4], "num_wings": [2, 0]}, index=["falcon", "dog"])
values = {"num_wings": [0, 3]}
my_dict = DictWrapper(values) # <-- Mapping
print(df.isin(values)) # Correct result
print(df.isin(my_dict)) # Wrong result
A quick search shows 100+ results for isinstance(_, dict) checks and also 100+ for isinstance(_, list) checks, so I think if we're going to support Mapping for DataFrame construction then we'd have to support it everywhere else too.
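To make the failure mode concrete, here is a minimal sketch in plain Python. The DictWrapper below is a hypothetical stand-in (not the class from #58814), and handle() is an illustrative function, not pandas code: it only mimics the pattern of an isinstance(_, dict) check followed by a list-like fallback.

```python
from collections.abc import Mapping


class DictWrapper(Mapping):
    """Hypothetical read-only Mapping that is not a dict subclass."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def handle(values):
    """Illustrative dispatch (not pandas code): dict branch, else list-like."""
    if isinstance(values, dict):
        return ("dict", sorted(values))
    # A non-dict Mapping lands here; iterating it yields only its keys.
    return ("list-like", list(values))


values = {"num_wings": [0, 3]}
wrapped = DictWrapper(values)

print(handle(values))         # takes the dict branch
print(handle(wrapped))        # falls through to the list-like branch
print(handle(dict(wrapped)))  # workaround: convert to a real dict first
```

The wrapped input raises no error. It is silently treated as a list of its keys, which is why this kind of mismatch can go unnoticed.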
from pandas.
I’m -0.5 on this. Internally we would just convert to dict/list anyway. I’d rather users do that where necessary and avoid the perf penalty in the cases we do support.
from pandas.
Ah, there would be a performance penalty? I would have thought it would just be changing an isinstance(i, list) to isinstance(i, Sequence)?
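One way to look at the cost of the check in isolation is a timeit micro-benchmark. This is only a rough sketch: the numbers depend on the machine and Python version, and they say nothing about pandas as a whole (the asv benchmarks are the right tool for that).

```python
import timeit
from collections.abc import Sequence

sample = [1, 2, 3]

# Time each check in isolation; absolute numbers vary by machine and Python version.
n = 200_000
t_list = timeit.timeit(lambda: isinstance(sample, list), number=n)
t_seq = timeit.timeit(lambda: isinstance(sample, Sequence), number=n)

print(f"isinstance(x, list):     {t_list / n * 1e9:.0f} ns per call")
print(f"isinstance(x, Sequence): {t_seq / n * 1e9:.0f} ns per call")

# The two checks also differ in what they accept: tuples and strings
# are Sequences but not lists.
print(isinstance((1, 2), list), isinstance((1, 2), Sequence))
print(isinstance("ab", list), isinstance("ab", Sequence))
```

Besides timing, the last two lines show that swapping list for Sequence is not purely a widening of the same check: str is also a Sequence, so any such change would need to keep treating strings separately.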
from pandas.
Slightly unrelated, but have you tried the Julia pandas wrapper? I believe it automatically does the conversion from Julia Vector/Dict to a Python list/dict for you.
from pandas.
Thanks, sadly that one looks to use the older PyCall.jl instead of the newer PythonCall.jl, so not (yet) compatible.
It's not a big deal if not possible. I guess it's a bit of a sharp edge, especially as it doesn't throw an error, but for users who google this, the workaround does work. I just thought it seemed like it might be more duck-typey/pythonic if any dict-like input was acceptable for initializing from a dict (and similar for sequences) rather than only explicit dicts – I could imagine other cases where it might be useful to have this. But I understand this is totally subjective!
from pandas.
@jbrockmendel I checked for performance degradation from accepting Mapping. The asv benchmark reported BENCHMARKS NOT SIGNIFICANTLY CHANGED. See mrkn#1 for more details and the patch I applied.
Note that I don't have much knowledge of the pandas internals, so this patch may be insufficient for accepting a Mapping when creating a DataFrame.
from pandas.
I’m -0.5 on this. Internally we would just convert to dict/list anyway. I’d rather users do that where necessary and avoid the perf penalty in the cases we do support.
In the case of dict -> Mapping this isn't true: the DataFrame constructor calls dict_to_mgr(data, ...) (line 763 in 2aa155a), which uses data.keys() and data[key] for each key and converts them to some other internal data structure, so it doesn't matter whether the source is a dict or any other Mapping.
AFAICT the only performance concern is in the actual type check isinstance(data, dict) -> isinstance(data, Mapping), which I presume is negligible (and backed up by mrkn's post).
I haven't looked at the indexing code to see if the same conclusions hold for list -> Sequence there.
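The point about keys() and data[key] can be sketched without pandas. The to_columns() helper below is hypothetical and much simpler than dict_to_mgr; it only illustrates that code which reads its input through the Mapping interface produces the same result for a dict and for any other Mapping. MappingProxyType from the standard library serves as the non-dict Mapping.

```python
from collections.abc import Mapping
from types import MappingProxyType


def to_columns(data):
    """Hypothetical stand-in for a constructor helper (not dict_to_mgr itself).

    Only data.keys() and data[key] are used, so any Mapping works.
    """
    if not isinstance(data, Mapping):
        raise TypeError("expected a Mapping")
    keys = list(data.keys())
    arrays = [list(data[key]) for key in keys]
    return keys, arrays


plain = {"num_legs": [2, 4], "num_wings": [2, 0]}
proxy = MappingProxyType(plain)  # a Mapping that is not a dict

print(isinstance(proxy, dict))                 # the proxy is not a dict
print(to_columns(plain) == to_columns(proxy))  # same columns either way
```

In this sketch the only place the input type matters is the isinstance check at the top, which matches the observation that the remaining work is identical for both inputs.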
from pandas.