pyladieslondon-sprints's Introduction

Hello everybody,

I work on pandas (as part of my job) and polars (as a volunteer), and am mostly interested in time series.

I've also written some code quality tools:

  • cython-lint
  • absolufy-imports (superseded by ruff)
  • nbQA (probably soon-to-be superseded by ruff)
  • auto-walrus (very silly, not recommended, hopefully won't be implemented in ruff)

pyladieslondon-sprints's Issues

Add examples from Jezrael's answers

Here's the list, from https://stackoverflow.com/users/2901002/jezrael

If any of the existing docstrings don't have examples, take an example from one of these answers and add it. Likewise, if there's a really good example from these answers which isn't present in an existing docstring, please add it.

pivot dupe
https://stackoverflow.com/q/47152691/
boolean indexing dupe
https://stackoverflow.com/q/17071871
idxmax + groupby dupe
https://stackoverflow.com/q/15705630
idxmin + groupby dupe
https://stackoverflow.com/q/23394476
melt dupe
https://stackoverflow.com/q/28654047
explode dupe
https://stackoverflow.com/q/12680754
cumcount dupe
https://stackoverflow.com/q/23435270
map dupe
https://stackoverflow.com/q/24216425
groupby+size+unstack dupe
https://stackoverflow.com/q/39132742
https://stackoverflow.com/q/38278603
sorting inplace dupe
https://stackoverflow.com/q/42613581
factorize dupe
https://stackoverflow.com/q/39357882
groupby+size dupe
https://stackoverflow.com/q/19384532
groupby+mean dupe
https://stackoverflow.com/q/30482071
transform sum dupe
https://stackoverflow.com/q/30244952
transform size dupe
https://stackoverflow.com/q/37189878
keyerror dupe
https://stackoverflow.com/q/43736163
merge/map dupe
https://stackoverflow.com/q/53010406
value_counts dupe
https://stackoverflow.com/q/15411158
numpy select, where dupe
https://stackoverflow.com/q/19913659
wide_to_long dupe
https://stackoverflow.com/q/55766565
reset_index dupe
https://stackoverflow.com/q/36932759

Please leave a comment letting others know which example you will be adding, and only take 1-2 examples at a time.
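To give a flavour of what such an example might look like, here is a minimal sketch of the idxmax + groupby pattern from the list above. The data are made up for illustration and are not taken from any specific answer:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"group": ["a", "a", "b", "b", "b"], "value": [1, 3, 5, 2, 4]})

# For each group, idxmax returns the index label of the row with the largest "value";
# passing those labels to .loc selects the full rows.
rows = df.loc[df.groupby("group")["value"].idxmax()]
print(rows)
```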

remove pd_array from tests

Currently, lots of tests use pd_array, e.g. pandas/tests/arithmetic/common.py.
It would be better to import array from pandas and use that instead.
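For reference, a minimal sketch of building an array with array imported from pandas. The values and the "Int64" dtype are arbitrary choices for illustration:

```python
import pandas as pd
from pandas import array

# Request the nullable integer dtype explicitly, so the missing value becomes pd.NA.
arr = array([1, 2, None], dtype="Int64")
print(arr)
```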

Move grep checks to pre-commit

  • 'Check for use of exec'
  • 'Check for pytest warns'
  • 'Check for pytest raises without context'
  • 'Check for use of builtin filter function'
  • 'Check for invalid testing'
  • 'Check for invalid EA testing'
  • 'Check for deprecated messages without sphinx directive'
  • 'Check for backticks incorrectly rendering because of missing spaces'
  • 'Check that unittest.mock is not used (pytest builtin monkeypatch fixture should be used instead)'
  • 'Check for use of {foo!r} instead of {repr(foo)}'
  • 'Linting .pyx code for spacing conventions in casting'

Check the pygrep hooks in https://github.com/pandas-dev/pandas/blob/master/.pre-commit-config.yaml
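pygrep hooks behave like an inverted grep: the hook fails if its regex matches any line in the checked files. A rough Python sketch of that idea, using the {foo!r} pattern from the list above. The helper find_unwanted and the in-memory "files" are hypothetical, and this is not pre-commit's actual implementation:

```python
import re
from typing import Dict, List, Tuple


def find_unwanted(pattern: str, files: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (filename, line number, line) for every line matching pattern.

    Hypothetical helper mimicking a pygrep hook: any match counts as a failure.
    """
    regex = re.compile(pattern)
    matches = []
    for filename, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                matches.append((filename, lineno, line))
    return matches


# In-memory stand-ins for source files.
files = {
    "good.py": 'msg = f"got {repr(value)}"\n',
    "bad.py": 'x = 1\nmsg = f"got {value!r}"\n',
}

problems = find_unwanted(r"!r}", files)
for filename, lineno, line in problems:
    print(f"{filename}:{lineno}:{line}")

# A pygrep-style hook would exit non-zero here, because there is at least one match.
exit_code = 1 if problems else 0
```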

Reorganise pre-commit-config

Currently, it's structured quite randomly.

Let's instead have the following, both sorted alphabetically:

  • third-party hooks
  • local hooks
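A minimal sketch of the proposed ordering, assuming the config has already been parsed from YAML into plain Python dicts (the parsing step is omitted). The repo URLs and hook ids below are placeholders, not pandas' actual config:

```python
from typing import Any, Dict, List


def reorganise(repos: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Third-party repos first (sorted by URL), then local repos with hooks sorted by id.

    Returns new dicts rather than mutating the input.
    """
    third_party = sorted(
        (repo for repo in repos if repo["repo"] != "local"),
        key=lambda repo: repo["repo"].lower(),  # case-insensitive, i.e. alphabetical
    )
    local = [
        {**repo, "hooks": sorted(repo["hooks"], key=lambda hook: hook["id"])}
        for repo in repos
        if repo["repo"] == "local"
    ]
    return third_party + local


# Placeholder config, shaped like the "repos" list of a .pre-commit-config.yaml.
repos = [
    {"repo": "local", "hooks": [{"id": "unwanted-patterns"}, {"id": "flake8-rst"}]},
    {"repo": "https://github.com/PyCQA/isort", "hooks": [{"id": "isort"}]},
    {"repo": "https://github.com/psf/black", "hooks": [{"id": "black"}]},
]

result = reorganise(repos)
for repo in result:
    print(repo["repo"], [hook["id"] for hook in repo["hooks"]])
```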

Add tests to pandas-dev-flaker

Currently, pandas has a lot of code quality checks. I've recently started work on writing a flake8 plugin, which unifies the existing checks and which will make its way into pandas once ready (hopefully within 1-2 weeks).

The repository is here: https://github.com/MarcoGorelli/pandas-dev-flaker

The issues to work on are listed at https://github.com/MarcoGorelli/pandas-dev-flaker/issues

If you'd like to work on an issue there, please leave a comment saying that you're working on it.
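For anyone new to flake8 plugins, here is a minimal sketch of an AST-based checker. The layout (a tree argument to __init__ and a run method yielding (line, column, message, type) tuples) follows flake8's plugin interface; the class name and the error code PDX001 are hypothetical, and this is not how pandas-dev-flaker itself is structured. Registering the plugin with flake8 (through a flake8.extension entry point) is omitted:

```python
import ast
from typing import Generator, Tuple, Type


class ExecChecker:
    """Hypothetical flake8 plugin flagging calls to the builtin exec."""

    name = "exec-checker"
    version = "0.1.0"

    def __init__(self, tree: ast.AST) -> None:
        self.tree = tree

    def run(self) -> Generator[Tuple[int, int, str, Type["ExecChecker"]], None, None]:
        for node in ast.walk(self.tree):
            # Only bare calls like exec(...), not attribute calls such as obj.exec(...).
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "exec"
            ):
                yield node.lineno, node.col_offset, "PDX001 found use of exec", type(self)


source = "x = 1\nexec('y = 2')\nobj.exec('z')\n"
errors = list(ExecChecker(ast.parse(source)).run())
print(errors)
```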

Migrate grep checks from ci/code_checks.sh to pre-commit

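ci/code_checks.sh contains lots of grep-based checks. These should be moved to pre-commit, which comes with a load of benefits: for example, contributors can run the checks locally before each commit, rather than only finding out about failures on CI.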

The following should be moved to the hook with id unwanted-patterns:

    MSG='Check for use of exec' ; echo $MSG
    invgrep -R --include="*.py*" -E "[^a-zA-Z0-9_]exec\(" pandas
    RET=$(($RET + $?)) ; echo $MSG "DONE"
    MSG='Check for use of builtin filter function' ; echo $MSG
    invgrep -R --include="*.py" -P '(?<!def)[\(\s]filter\(' pandas
    RET=$(($RET + $?)) ; echo $MSG "DONE"
    MSG='Check for deprecated messages without sphinx directive' ; echo $MSG
    invgrep -R --include="*.py" --include="*.pyx" -E "(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)" pandas
    RET=$(($RET + $?)) ; echo $MSG "DONE"
    MSG='Check for use of {foo!r} instead of {repr(foo)}' ; echo $MSG
    invgrep -R --include=*.{py,pyx} '!r}' pandas
    RET=$(($RET + $?)) ; echo $MSG "DONE"
    echo $MSG "DONE"

The following should be moved to the hook with id unwanted-patterns-in-tests:

    MSG='Check for pytest warns' ; echo $MSG
    invgrep -r -E --include '*.py' 'pytest\.warns' pandas/tests/
    RET=$(($RET + $?)) ; echo $MSG "DONE"
    MSG='Check for invalid testing' ; echo $MSG
    invgrep -r -E --include '*.py' --exclude testing.py '(numpy|np)(\.testing|\.array_equal)' pandas/tests/
    RET=$(($RET + $?)) ; echo $MSG "DONE"
    MSG='Check that unittest.mock is not used (pytest builtin monkeypatch fixture should be used instead)' ; echo $MSG
    invgrep -r -E --include '*.py' '(unittest(\.| import )mock|mock\.Mock\(\)|mock\.patch)' pandas/tests/
    RET=$(($RET + $?)) ; echo $MSG "DONE"

All that needs copying is the regex pattern, along with a comment explaining what it does. You should also run pre-commit run unwanted-patterns --all-files and make sure it still passes.
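Before copying a pattern across, it can help to check it against a few sample lines: pygrep hooks search with Python's re module, whereas the original checks use grep -E or grep -P, so it's worth confirming each pattern still behaves as intended. A quick sketch; the patterns are copied from the checks above and the sample lines are made up:

```python
import re

# pattern -> list of (sample line, whether the pattern should flag it)
cases = {
    r"[^a-zA-Z0-9_]exec\(": [("    exec(code)", True), ("    myexec(code)", False)],
    r"(?<!def)[\(\s]filter\(": [
        ("x = list(filter(None, y))", True),
        ("    def filter(self, items):", False),
        ("df.filter(items=['a'])", False),
    ],
    r"(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)": [
        ("Deprecated: use foo instead", True),
        (".. deprecated:: 1.0.0", False),
    ],
    r"pytest\.warns": [("with pytest.warns(FutureWarning):", True)],
    r"(unittest(\.| import )mock|mock\.Mock\(\)|mock\.patch)": [
        ("from unittest import mock", True),
        ("def test_foo(monkeypatch):", False),
    ],
}

mismatches = []
for pattern, samples in cases.items():
    for line, expected in samples:
        found = re.search(pattern, line) is not None
        if found != expected:
            mismatches.append((pattern, line))
        print(f"{'MATCH' if found else 'ok   '}  {repr(pattern)}  on  {repr(line)}")
```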

Getting started

Welcome!

Welcome to the 6th edition of "Let's contribute to pandas"! The format will be as follows:

  • 5-10 minutes: brief introduction, overview of issues you could choose to work on
  • rest of the session: your chance to work on issues and ask questions!

Here are some things you could choose to work on:

Remember - it's OK to push work that isn't perfect or finished, or that you're not 100% sure of!


Please visit this spreadsheet and use it throughout the session to track your progress: https://docs.google.com/spreadsheets/d/1k3KTsL6x57K_qUFp9IqUI2_YQgyXKZ2gGgON6vTnbOE/edit?usp=sharing


In general, please refer to the contributing guide for pandas-dev/pandas: https://pandas.pydata.org/pandas-docs/dev/development/contributing.html

Move np.array_equal check to pre-commit

This check is currently here:

https://github.com/pandas-dev/pandas/blob/9bfa67cf028deec509d4e07138f6659bd2b342fe/ci/code_checks.sh#L83-L86

it should be moved to the bottom of

https://github.com/pandas-dev/pandas/blob/822db7a53fdcf8d860aaa8b51da4b767b3f56fad/.pre-commit-config.yaml#L136-L156

The task here is:

  1. copy the regular expression (numpy|np)(\.testing|\.array_equal) into .pre-commit-config.yaml
  2. remove this check from ci/code_checks.sh
  3. git add, git commit, git push 🚀
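To double-check step 1, here is a quick sketch of what the copied regex does and does not flag. The sample lines are made up; pygrep hooks search each line with Python's re module:

```python
import re

PATTERN = re.compile(r"(numpy|np)(\.testing|\.array_equal)")

samples = [
    "np.testing.assert_array_equal(result, expected)",
    "assert numpy.array_equal(result, expected)",
    "tm.assert_numpy_array_equal(result, expected)",
    "expected = np.array([1, 2, 3])",
]

# search (not match), so the pattern can occur anywhere in the line.
flagged_lines = [line for line in samples if PATTERN.search(line)]
for line in samples:
    print(f"{'flagged' if line in flagged_lines else 'ok     '}  {line}")
```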

Function overloads

The issue is to use typing.overload to make the return type of some functions more precise.

An introductory video on typing.overload: https://youtu.be/rY9NZ-tXiDQ

A (work-in-progress) blog post on this topic: https://m-e-gorelli.medium.com/making-sense-of-typing-overload-437e6deecade

The following methods all return something like Optional[FrameOrSeries]. We should overload them, so that they return None if inplace=True and FrameOrSeries if inplace=False.

https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py

  • drop_duplicates
  • drop

https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py


From the pandas codebase, see set_axis and reset_index for examples
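A minimal sketch of the overload pattern on a hypothetical, self-contained class (not pandas' actual code), assuming Python 3.8+ for typing.Literal. With these overloads, a type checker such as mypy infers Frame for drop_duplicates() and None for drop_duplicates(inplace=True); only the final, undecorated definition runs:

```python
from __future__ import annotations

from typing import List, Literal, Optional, overload


class Frame:
    """Hypothetical stand-in for a pandas object, used only to show the pattern."""

    def __init__(self, data: List[int]) -> None:
        self.data = data

    @overload
    def drop_duplicates(self, *, inplace: Literal[False] = ...) -> Frame:
        ...

    @overload
    def drop_duplicates(self, *, inplace: Literal[True]) -> None:
        ...

    @overload
    def drop_duplicates(self, *, inplace: bool = ...) -> Optional[Frame]:
        ...

    def drop_duplicates(self, *, inplace: bool = False) -> Optional[Frame]:
        # dict.fromkeys keeps the first occurrence of each value, in order.
        deduplicated = list(dict.fromkeys(self.data))
        if inplace:
            self.data = deduplicated
            return None
        return Frame(deduplicated)


frame = Frame([1, 1, 2])
new_frame = frame.drop_duplicates()  # type checker: Frame
print(new_frame.data)
```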

specify fewer errors in validate docstrings

We currently have a check which runs scripts/validate_docstrings.py, but only with the following error codes:

--errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS02,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03

Some error codes currently excluded are:

  • SS03
  • PR08
  • SA04
  • GL01
  • EX01
  • SS06

If you just run python scripts/validate_docstrings.py, you will see these and plenty more.

So, to take on this issue, you should:

  1. Pick an extra error code to take on
  2. Run validate_docstrings.py with that error code, for example:
     python scripts/validate_docstrings.py --errors=SS01
  3. Pick 5-10 failing files, and fix them so that they pass this check!

Please comment with:

  1. which error code you will work on
  2. which files you will fix
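To get a feel for what an error code means before fixing files, here is a deliberately simplified sketch of an SS03-style check ("Summary does not end with a period"). The helper name is hypothetical, and it approximates, rather than reproduces, the logic behind scripts/validate_docstrings.py:

```python
import inspect


def summary_ends_with_period(obj) -> bool:
    """Hypothetical helper: does the first paragraph of obj's docstring end with '.'?"""
    doc = inspect.getdoc(obj) or ""
    # Treat the summary as the first paragraph, i.e. everything before the first blank line.
    summary = doc.split("\n\n")[0].strip()
    return summary.endswith(".")


def before(a, b):
    """Return the sum of a and b"""
    return a + b


def after(a, b):
    """Return the sum of a and b."""
    return a + b


for func in (before, after):
    status = "ok" if summary_ends_with_period(func) else "SS03"
    print(f"{func.__name__}: {status}")
```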

Add YD01 to validate_docstrings codes

and fix related errors:

/Users/marcogorelli/pandas-dev/pandas/core/groupby/groupby.py:753:YD01:pandas.core.resample.Resampler.__iter__:No Yields section found
/Users/marcogorelli/pandas-dev/pandas/core/groupby/groupby.py:753:YD01:pandas.core.groupby.GroupBy.__iter__:No Yields section found
/Users/marcogorelli/pandas-dev/pandas/core/groupby/groupby.py:2364:YD01:pandas.core.groupby.GroupBy.ohlc:No Yields section found
/Users/marcogorelli/pandas-dev/pandas/core/frame.py:1345:YD01:pandas.DataFrame.itertuples:No Yields section found
/Users/marcogorelli/pandas-dev/pandas/core/frame.py:9552:YD01:pandas.DataFrame.round:No Yields section found
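For reference, a sketch of a numpydoc-style docstring that includes a Yields section, written for a hypothetical generator rather than for any of the pandas methods listed above:

```python
from typing import Iterator, List, Sequence


def iter_chunks(values: Sequence[int], size: int) -> Iterator[List[int]]:
    """
    Iterate over values in chunks of a given size.

    Parameters
    ----------
    values : sequence of int
        Values to split into chunks.
    size : int
        Maximum number of values in each chunk.

    Yields
    ------
    list of int
        The next chunk; the last one may be shorter than size.
    """
    for start in range(0, len(values), size):
        yield list(values[start : start + size])


print(list(iter_chunks([1, 2, 3, 4, 5], 2)))
```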
