I wanted to discuss naming conventions for the various functions and arguments for con

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

Thanks for raising this issue, <a class="user-mention notranslate" data-hovercard-type

Cool! utils.py sounds like a good place. <p dir="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

[ENH] Naming conventions about pyjanitor HOT 21 CLOSED

pyjanitor-devs commented on May 22, 2024

[ENH] Naming conventions

from pyjanitor.

Comments (21)

hectormz commented on May 22, 2024 4

Thanks @ericmjl , explicit is good!
I actually came across this example on SO:

import functools
import warnings

def _deprecated_alias(**aliases):
    def deco(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            _rename_kwargs(f.__name__, kwargs, aliases)
            return f(*args, **kwargs)
        return wrapper
    return deco

def _rename_kwargs(func_name, kwargs, aliases):
    for alias, new in aliases.items():
        if alias in kwargs:
            if new in kwargs:
                raise TypeError('{} received both {} and {}'.format(
                    func_name, alias, new))
            warnings.warn('{} is deprecated; use {}'.format(alias, new),
                          DeprecationWarning)
            kwargs[new] = kwargs.pop(alias)

using this as a decorator we would be able to do:

@_deprecated_alias(a='alpha', b='beta')
def simple_sum(alpha, beta):
    return alpha + beta

print(simple_sum(alpha=6, beta=9))
print(simple_sum(a=6, b=9))

With this method, updating the argument names would be much faster, and when the deprecation period has expired, we only need to delete one/two lines instead of 6+ lines currently. If we used this approach, I would make the above functions more human-readable.

What do you think?

from pyjanitor.

ericmjl commented on May 22, 2024 2

Thanks for raising this issue, @szuckerman! I’m also in favor of consistent naming schemes.

Breaking this down a bit:

Columns

I vote in favor of “col_name”. It is more explicit than “column”.

Verbose vs Concise Functions

Zen of Python states that explicit is better than implicit, with no qualifiers afterwards. For that reason, I vote in favor of verbose names.

from pyjanitor.

szuckerman commented on May 22, 2024 2

Relevant PyCon talk on refactoring and adding warnings for legacy code. The actual code for warnings comes in around 17:00.

from pyjanitor.

hectormz commented on May 22, 2024 2

Following advice in that talk, we could update expand_column() as follows:

def expand_column(df, sep, column_name=None, concat=True, **kwargs):
    """
    Expand a categorical column with multiple labels into dummy-coded columns.

    Super sugary syntax that wraps :py:meth:`pandas.Series.str.get_dummies`.

    Functional usage example:

    .. code-block:: python

        df = expand_column(df, column_name='col_name',
                           sep=', ')  # note space in sep

    Method chaining example:

    .. code-block:: python

        import pandas as pd
        import janitor
        df = pd.DataFrame(...).expand_column(df, column_name='col_name', sep=', ')

    :param df: A pandas DataFrame.
    :param column_name: A `str` indicating which column to expand.
    :param sep: The delimiter. Example delimiters include `|`, `, `, `,` etc.
    :param bool concat: Whether to return the expanded column concatenated to
        the original dataframe (`concat=True`), or to return it standalone
        (`concat=False`).
    """
    if kwargs and column_name is not None:
        raise TypeError('Mixed usage of column and column_name')
    if column_name is None:
        warnings.warn('Use of column is deprecated. You should use column_name.')
        column_name = kwargs['column']
    expanded = df[column_name].str.get_dummies(sep=sep)
    if concat:
        df = df.join(expanded)
        return df
    else:
        return expanded

One notable consequence of this, is having to set column_name as None, which requires us to move its position in the list of arguments (during the deprecation period).

The way it's written also assumes that additional arguments received would only relate to deprecated usage, and include column.

Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit, *_col_* and *_column_* are both used throughout the code.

from pyjanitor.

hectormz commented on May 22, 2024 1

I'd be interested in taking on this issue if no one has done so yet. Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version?

from pyjanitor.

hectormz commented on May 22, 2024 1

Cool! utils.py sounds like a good place.

I'll link to the SO post in the _deprecated_alias docstring.
I'll start with a PR that includes these new functions, and include some tests to make sure it works/fails/warns as expected. After that, I'm fine updating the previously changed functions and moving forward. I actually was considering revisiting those functions anyway after learning that there was an actual DeprecationWarning subclass.

from pyjanitor.

ericmjl commented on May 22, 2024

Labelling this a "good first issue" for newcomers, as it basically involves a find-and-replace.

from pyjanitor.

ericmjl commented on May 22, 2024

@HectorM14 yes, that would be great! Please do go ahead.

Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version?

Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used.

Personally, I would prefer one function at a time, which makes it easier to review, but at the same time, I think we should trust your judgment - especially if there's things that are logically groupable, feel free to PR in them together. Looking forward to what you've got; also, don't hesitate to ping in here!

from pyjanitor.

hectormz commented on May 22, 2024

Great, I'll use my best judgement for PR scope. Is there a standard way to deprecate/discourage old argument names? I was thinking of using decorators to keep the method APIs clear about their usage going forward.

from pyjanitor.

ericmjl commented on May 22, 2024

Thanks @HectorM14!

Is there a standard way to deprecate/discourage old argument names?

To the best of my knowledge, the standard way to do this should be to insert a conditional checking to see if that old argument name was called or not, and then inside the conditional:

Raise a warning.
Assign the value of the old argument to a new variable that has the same name as the new argument.

For example:

from warning import warning

def deprecation_warning(old_arg, new_arg):
    """generic warning function."""
    warning(f"{argname.__name__} is deprecated, and will be replaced with {new_arg.__name__}")

def func(df, old_arg1, arg2, arg3, new_arg):
    """docstring here"""
    if old_arg1 is not None:
        deprecation_warning(old_arg1, new_arg)
        new_arg = old_arg1
    # continue code

I was thinking of using decorators to keep the method APIs clear about their usage going forward.

That might actually be better than what I had up there! I'd like to see the pattern in the first PR, it'd be educational for myself.

from pyjanitor.

ericmjl commented on May 22, 2024

Oh cool, thanks for sharing that, @szuckerman!

from pyjanitor.

ericmjl commented on May 22, 2024

@HectorM14 those are great points you've raised 😄.

Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit, *_col_* and *_column_* are both used throughout the code.

I am guilty of being inconsistent here, as I sometimes use col and sometimes use column and sometimes use colname. Yes, being consistent is a great thing. I would personally vote for column_name, in adherence to "explicit is better than implicit", but if there are better reasons (brevity, perhaps?) for other conventions, not a problem! I have no strong opinions on this one, as long as we can make it towards consistency in the end!

from pyjanitor.

hectormz commented on May 22, 2024

I like column_name too. I can aim for being explicit with these names unless they become too unwieldy. I'll start making individual PRs for each updated function. Updating variables relating columns is easy, but if anyone finds other variables that should be updated, they can list them here.

from pyjanitor.

hectormz commented on May 22, 2024

Updating names is becoming a bit more complicated. For example we currently have:

def coalesce(df, columns, new_column_name):

Following the current style we've been using, we would update this to:

def coalesce(df, new_column_name, columns=None, **kwargs):

But quick, intuitive usage of this function from example.py is:

df = (
    df
    .coalesce(["certification", "certification_1"], "certification")

With the proposed changes, this would become (due to rearranging of argument order):

df = (
    df
    .coalesce(column_names=["certification", "certification_1"], new_column_name="certification")
)

Which is excessively more complicated. So our fix will have to be more complex/clever to keep it clean for the user.

from pyjanitor.

ericmjl commented on May 22, 2024

@HectorM14 I wouldn't worry too much about the kwargs. In Python, explicit is better than implicit, generally speaking. I find myself writing out kwarg names even though it is sometimes easier to not do so, because I recognize that it makes the code more readable, actually. I would encourage you to keep going with the PRs 😄.

from pyjanitor.

ericmjl commented on May 22, 2024

This is a really great find! I'm all for reducing the maintenance burden 😄.

Perhaps this function can be inserted into utils.py? Be sure to acknowledge the source by linking to SO!

Also, does this mean you'd redo the name changes already PR'd? I don't want to put too much pressure on you, but if you'd do so, that'd be superb for maintenance purposes!

from pyjanitor.

hectormz commented on May 22, 2024

I have been merging from upstream before committing, should I be rebasing instead?

from pyjanitor.

ericmjl commented on May 22, 2024

Rebasing works as well! It's a suggestion from @zbarry, which, frankly speaking, I rarely do, because it's just easier to remember git fetch upstream; git merge upstream/dev. 😄

That said, good practices are good to use, so I'm going to try moving to rebasing now.

from pyjanitor.

zbarry commented on May 22, 2024

@HectorM14 - that decorator is a great find. Thanks for adding it!

from pyjanitor.

hectormz commented on May 22, 2024

Yes! It's a very clean method that I would have been happy to find in the Python standard library. I assume most projects go through name changes and deprecation at some point. 🤷‍♂️

from pyjanitor.

hectormz commented on May 22, 2024

Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used.

I didn't think of this previously, but should I add some tag comment to the decorators to indicate under which release the names were deprecated? After the decided major/minor release, it'll be easy to search for #0.16x etc and remove those lines. In a similar vein, in the unlikely event a function has multiple rounds of deprecation at different releases, a second decorator could be added for the new deprecation (unless that's just too complicated).

from pyjanitor.

[ENH] Naming conventions about pyjanitor HOT 21 CLOSED

Comments (21)

Columns

Verbose vs Concise Functions

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent