Giter Site home page Giter Site logo

Comments (21)

hectormz avatar hectormz commented on May 22, 2024 4

Thanks @ericmjl , explicit is good!
I actually came across this example on SO:

import functools
import warnings

def _deprecated_alias(**aliases):
    def deco(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            _rename_kwargs(f.__name__, kwargs, aliases)
            return f(*args, **kwargs)
        return wrapper
    return deco

def _rename_kwargs(func_name, kwargs, aliases):
    for alias, new in aliases.items():
        if alias in kwargs:
            if new in kwargs:
                raise TypeError('{} received both {} and {}'.format(
                    func_name, alias, new))
            warnings.warn('{} is deprecated; use {}'.format(alias, new),
                          DeprecationWarning)
            kwargs[new] = kwargs.pop(alias)

using this as a decorator we would be able to do:

@_deprecated_alias(a='alpha', b='beta')
def simple_sum(alpha, beta):
    return alpha + beta

print(simple_sum(alpha=6, beta=9))
print(simple_sum(a=6, b=9))

With this method, updating the argument names would be much faster, and when the deprecation period has expired, we only need to delete one/two lines instead of 6+ lines currently. If we used this approach, I would make the above functions more human-readable.

What do you think?

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024 2

Thanks for raising this issue, @szuckerman! I’m also in favor of consistent naming schemes.

Breaking this down a bit:

Columns

I vote in favor of β€œcol_name”. It is more explicit than β€œcolumn”.

Verbose vs Concise Functions

Zen of Python states that explicit is better than implicit, with no qualifiers afterwards. For that reason, I vote in favor of verbose names.

from pyjanitor.

szuckerman avatar szuckerman commented on May 22, 2024 2

Relevant PyCon talk on refactoring and adding warnings for legacy code. The actual code for warnings comes in around 17:00.

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024 2

Following advice in that talk, we could update expand_column() as follows:

def expand_column(df, sep, column_name=None, concat=True, **kwargs):
    """
    Expand a categorical column with multiple labels into dummy-coded columns.

    Super sugary syntax that wraps :py:meth:`pandas.Series.str.get_dummies`.

    Functional usage example:

    .. code-block:: python

        df = expand_column(df, column_name='col_name',
                           sep=', ')  # note space in sep

    Method chaining example:

    .. code-block:: python

        import pandas as pd
        import janitor
        df = pd.DataFrame(...).expand_column(df, column_name='col_name', sep=', ')

    :param df: A pandas DataFrame.
    :param column_name: A `str` indicating which column to expand.
    :param sep: The delimiter. Example delimiters include `|`, `, `, `,` etc.
    :param bool concat: Whether to return the expanded column concatenated to
        the original dataframe (`concat=True`), or to return it standalone
        (`concat=False`).
    """
    if kwargs and column_name is not None:
        raise TypeError('Mixed usage of column and column_name')
    if column_name is None:
        warnings.warn('Use of column is deprecated. You should use column_name.')
        column_name = kwargs['column']
    expanded = df[column_name].str.get_dummies(sep=sep)
    if concat:
        df = df.join(expanded)
        return df
    else:
        return expanded

One notable consequence of this, is having to set column_name as None, which requires us to move its position in the list of arguments (during the deprecation period).

The way it's written also assumes that additional arguments received would only relate to deprecated usage, and include column.

Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit, *_col_* and *_column_* are both used throughout the code.

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024 1

I'd be interested in taking on this issue if no one has done so yet. Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version?

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024 1

Cool! utils.py sounds like a good place.

I'll link to the SO post in the _deprecated_alias docstring.
I'll start with a PR that includes these new functions, and include some tests to make sure it works/fails/warns as expected. After that, I'm fine updating the previously changed functions and moving forward. I actually was considering revisiting those functions anyway after learning that there was an actual DeprecationWarning subclass.

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024

Labelling this a "good first issue" for newcomers, as it basically involves a find-and-replace.

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024

@HectorM14 yes, that would be great! Please do go ahead.

Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version?

Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used.

Personally, I would prefer one function at a time, which makes it easier to review, but at the same time, I think we should trust your judgment - especially if there's things that are logically groupable, feel free to PR in them together. Looking forward to what you've got; also, don't hesitate to ping in here!

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024

Great, I'll use my best judgement for PR scope. Is there a standard way to deprecate/discourage old argument names? I was thinking of using decorators to keep the method APIs clear about their usage going forward.

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024

Thanks @HectorM14!

Is there a standard way to deprecate/discourage old argument names?

To the best of my knowledge, the standard way to do this should be to insert a conditional checking to see if that old argument name was called or not, and then inside the conditional:

  1. Raise a warning.
  2. Assign the value of the old argument to a new variable that has the same name as the new argument.

For example:

from warning import warning

def deprecation_warning(old_arg, new_arg):
    """generic warning function."""
    warning(f"{argname.__name__} is deprecated, and will be replaced with {new_arg.__name__}")

def func(df, old_arg1, arg2, arg3, new_arg):
    """docstring here"""
    if old_arg1 is not None:
        deprecation_warning(old_arg1, new_arg)
        new_arg = old_arg1
    # continue code

I was thinking of using decorators to keep the method APIs clear about their usage going forward.

That might actually be better than what I had up there! I'd like to see the pattern in the first PR, it'd be educational for myself.

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024

Oh cool, thanks for sharing that, @szuckerman!

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024

@HectorM14 those are great points you've raised πŸ˜„.

Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit, *_col_* and *_column_* are both used throughout the code.

I am guilty of being inconsistent here, as I sometimes use col and sometimes use column and sometimes use colname. Yes, being consistent is a great thing. I would personally vote for column_name, in adherence to "explicit is better than implicit", but if there are better reasons (brevity, perhaps?) for other conventions, not a problem! I have no strong opinions on this one, as long as we can make it towards consistency in the end!

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024

I like column_name too. I can aim for being explicit with these names unless they become too unwieldy. I'll start making individual PRs for each updated function. Updating variables relating columns is easy, but if anyone finds other variables that should be updated, they can list them here.

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024

Updating names is becoming a bit more complicated. For example we currently have:

def coalesce(df, columns, new_column_name):

Following the current style we've been using, we would update this to:

def coalesce(df, new_column_name, columns=None, **kwargs):

But quick, intuitive usage of this function from example.py is:

df = (
    df
    .coalesce(["certification", "certification_1"], "certification")

With the proposed changes, this would become (due to rearranging of argument order):

df = (
    df
    .coalesce(column_names=["certification", "certification_1"], new_column_name="certification")
)

Which is excessively more complicated. So our fix will have to be more complex/clever to keep it clean for the user.

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024

@HectorM14 I wouldn't worry too much about the kwargs. In Python, explicit is better than implicit, generally speaking. I find myself writing out kwarg names even though it is sometimes easier to not do so, because I recognize that it makes the code more readable, actually. I would encourage you to keep going with the PRs πŸ˜„.

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024

This is a really great find! I'm all for reducing the maintenance burden πŸ˜„.

Perhaps this function can be inserted into utils.py? Be sure to acknowledge the source by linking to SO!

Also, does this mean you'd redo the name changes already PR'd? I don't want to put too much pressure on you, but if you'd do so, that'd be superb for maintenance purposes!

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024

I have been merging from upstream before committing, should I be rebasing instead?

from pyjanitor.

ericmjl avatar ericmjl commented on May 22, 2024

Rebasing works as well! It's a suggestion from @zbarry, which, frankly speaking, I rarely do, because it's just easier to remember git fetch upstream; git merge upstream/dev. πŸ˜„

That said, good practices are good to use, so I'm going to try moving to rebasing now.

from pyjanitor.

zbarry avatar zbarry commented on May 22, 2024

@HectorM14 - that decorator is a great find. Thanks for adding it!

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024

Yes! It's a very clean method that I would have been happy to find in the Python standard library. I assume most projects go through name changes and deprecation at some point. πŸ€·β€β™‚οΈ

from pyjanitor.

hectormz avatar hectormz commented on May 22, 2024

Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used.

I didn't think of this previously, but should I add some tag comment to the decorators to indicate under which release the names were deprecated? After the decided major/minor release, it'll be easy to search for #0.16x etc and remove those lines. In a similar vein, in the unlikely event a function has multiple rounds of deprecation at different releases, a second decorator could be added for the new deprecation (unless that's just too complicated).

from pyjanitor.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.