Comments (3)
Should I insert that here at current L97?
I think this should go in the notes section, which is below See Also; L190 here.
When you say hash-based, I assume this means that it always uses __hash__()? __eq__() is never considered? Should this be mentioned?
No - hash is used to produce a not necessarily unique but often distinct number; __eq__ is then used in the case of conflicts. This is why a requirement for writing a hash function is that two objects that are equal must have equal hashes.
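To make the two-step lookup concrete, here is a minimal sketch using a plain Python dict as the hash table. It is not pandas' implementation; the Key class and the group_by_key helper are hypothetical names invented for illustration. The hash is deliberately coarse so that distinct keys collide, and __eq__ then decides which colliding keys belong together.

```python
class Key:
    """Hypothetical key type whose hash is deliberately coarse."""

    def __init__(self, name, bucket):
        self.name = name
        self.bucket = bucket

    def __hash__(self):
        # Many distinct keys share one hash value (forced collisions).
        return hash(self.bucket)

    def __eq__(self, other):
        # Equality decides whether colliding keys are really the same key.
        return isinstance(other, Key) and self.name == other.name


def group_by_key(pairs):
    """Group (key, value) pairs, using a plain dict as the hash table."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


a, b, c = Key("x", 0), Key("y", 0), Key("x", 0)
groups = group_by_key([(a, 1), (b, 2), (c, 3)])

# a and b collide on hash but are not equal, so they stay separate;
# a and c are equal (and hash alike), so they share one group.
print(len(groups))   # 2
print(groups[a])     # [1, 3]
```

All three keys have the same hash here, yet only the two that compare equal end up in the same group.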
I also just realized that the string workaround obviously has its own drawbacks, as it would now group the string '1' and the int 1 as equals, but I guess we don't have to provide a general solution.
That's a good point - this makes me quite hesitant to mention the workaround.
from pandas.
Thanks for the report. I think a line in the notes section of the docstring on DataFrame.groupby and Series.groupby would be appropriate, something like:
The implementation of groupby is hash-based, meaning in particular that objects that compare as equal will be considered as the same group. An exception to this is that pandas has special handling of NA values: any NA values will be collapsed to a single group, regardless of how they compare. See the user guide for more details.
An example in the user guide and your suggested workaround could then be linked to in the last sentence.
Would you be interested in submitting a PR?
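As a rough illustration of why NA values need special handling in a hash-based scheme, the sketch below groups keys with a plain Python dict, once naively and once with NA keys mapped to a single sentinel. This is only an analogy, not how pandas does it internally; is_na, NA_GROUP and group_collapsing_na are hypothetical names made up for this example.

```python
import math


def is_na(value):
    """Hypothetical NA test: None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Single sentinel key standing in for "the NA group" (an assumption of
# this sketch, not a pandas object).
NA_GROUP = object()


def group_collapsing_na(pairs):
    """Group (key, value) pairs, sending every NA key to one group."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(NA_GROUP if is_na(key) else key, []).append(value)
    return groups


pairs = [(float("nan"), 1), (float("nan"), 2), (None, 3), ("a", 4)]

# Naive hash-based grouping: NaN != NaN, so each NaN object is its own key.
naive = {}
for key, value in pairs:
    naive.setdefault(key, []).append(value)

collapsed = group_collapsing_na(pairs)

print(len(naive))            # 4
print(collapsed[NA_GROUP])   # [1, 2, 3]
```

Without the extra step, two NaN keys never land in the same group because they do not compare equal, which is the behavior the proposed note describes pandas as overriding.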
Thanks for the suggestion. I'd be happy to open a PR on this. Should I insert that here at current L97?
pandas/pandas/core/shared_docs.py
Lines 95 to 99 in bdc79c1
Should I add an example?
I just have another follow-up question: When you say hash-based, I assume this means that it always uses __hash__()? __eq__() is never considered? Should this be mentioned? Maybe in the linked user guide? The implication is that a simple a == b test during debugging as I did above is not actually the right thing to look at (although proper code should maintain a.__eq__(b) == True => a.__hash__() == b.__hash__(), but bugs do happen...).
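A small sketch of the failure mode described here, again with a plain dict standing in for a hash table rather than pandas itself. The Broken class is hypothetical and violates the contract on purpose: its instances compare equal but hash differently, so a == b looks fine while hash-based grouping still separates them (behavior shown is that of CPython's dict).

```python
class Broken:
    """Hypothetical class that violates the hash/eq contract."""

    _counter = 0

    def __init__(self, name):
        self.name = name
        Broken._counter += 1
        self._serial = Broken._counter

    def __eq__(self, other):
        return isinstance(other, Broken) and self.name == other.name

    def __hash__(self):
        # Bug: the hash depends on a per-instance serial number,
        # not on the field that __eq__ compares.
        return self._serial


a, b = Broken("x"), Broken("x")

groups = {}
for key in (a, b):
    groups.setdefault(key, []).append(key.name)

print(a == b)              # True
print(hash(a) == hash(b))  # False
print(len(groups))         # 2
```

When debugging unexpected groups, comparing hash(a) and hash(b) alongside a == b catches this kind of bug.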
I also just realized that the string workaround obviously has its own drawbacks, as it would now group the string '1' and the int 1 as equals, but I guess we don't have to provide a general solution.
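The drawback can be shown without pandas. The sketch below assumes the workaround amounts to converting each key with str() before grouping; group_by is a hypothetical helper over a plain dict, not a pandas function.

```python
def group_by(values, key=lambda v: v):
    """Group values in a dict under key(value)."""
    groups = {}
    for v in values:
        groups.setdefault(key(v), []).append(v)
    return groups


values = [1, "1", 2]

plain = group_by(values)             # keys used as they are
as_text = group_by(values, key=str)  # string workaround

print(len(plain))    # 3
print(as_text["1"])  # [1, '1']
```

With the original keys, the int 1 and the string '1' are separate groups; after conversion to strings they merge into one.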