Comments (21)
Thanks @ericmjl , explicit is good!
I actually came across this example on SO:
import functools
import warnings
def _deprecated_alias(**aliases):
def deco(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
_rename_kwargs(f.__name__, kwargs, aliases)
return f(*args, **kwargs)
return wrapper
return deco
def _rename_kwargs(func_name, kwargs, aliases):
for alias, new in aliases.items():
if alias in kwargs:
if new in kwargs:
raise TypeError('{} received both {} and {}'.format(
func_name, alias, new))
warnings.warn('{} is deprecated; use {}'.format(alias, new),
DeprecationWarning)
kwargs[new] = kwargs.pop(alias)
using this as a decorator we would be able to do:
@_deprecated_alias(a='alpha', b='beta')
def simple_sum(alpha, beta):
return alpha + beta
print(simple_sum(alpha=6, beta=9))
print(simple_sum(a=6, b=9))
With this method, updating the argument names would be much faster, and when the deprecation period has expired, we only need to delete one/two lines instead of 6+ lines currently. If we used this approach, I would make the above functions more human-readable.
What do you think?
from pyjanitor.
Thanks for raising this issue, @szuckerman! Iβm also in favor of consistent naming schemes.
Breaking this down a bit:
Columns
I vote in favor of βcol_nameβ. It is more explicit than βcolumnβ.
Verbose vs Concise Functions
Zen of Python states that explicit is better than implicit, with no qualifiers afterwards. For that reason, I vote in favor of verbose names.
from pyjanitor.
Relevant PyCon talk on refactoring and adding warnings
for legacy code. The actual code for warnings comes in around 17:00.
from pyjanitor.
Following advice in that talk, we could update expand_column()
as follows:
def expand_column(df, sep, column_name=None, concat=True, **kwargs):
"""
Expand a categorical column with multiple labels into dummy-coded columns.
Super sugary syntax that wraps :py:meth:`pandas.Series.str.get_dummies`.
Functional usage example:
.. code-block:: python
df = expand_column(df, column_name='col_name',
sep=', ') # note space in sep
Method chaining example:
.. code-block:: python
import pandas as pd
import janitor
df = pd.DataFrame(...).expand_column(df, column_name='col_name', sep=', ')
:param df: A pandas DataFrame.
:param column_name: A `str` indicating which column to expand.
:param sep: The delimiter. Example delimiters include `|`, `, `, `,` etc.
:param bool concat: Whether to return the expanded column concatenated to
the original dataframe (`concat=True`), or to return it standalone
(`concat=False`).
"""
if kwargs and column_name is not None:
raise TypeError('Mixed usage of column and column_name')
if column_name is None:
warnings.warn('Use of column is deprecated. You should use column_name.')
column_name = kwargs['column']
expanded = df[column_name].str.get_dummies(sep=sep)
if concat:
df = df.join(expanded)
return df
else:
return expanded
One notable consequence of this, is having to set column_name as None, which requires us to move its position in the list of arguments (during the deprecation period).
The way it's written also assumes that additional arguments received would only relate to deprecated usage, and include column
.
Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit, *_col_*
and *_column_*
are both used throughout the code.
from pyjanitor.
I'd be interested in taking on this issue if no one has done so yet. Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version?
from pyjanitor.
Cool! utils.py
sounds like a good place.
I'll link to the SO post in the _deprecated_alias
docstring.
I'll start with a PR that includes these new functions, and include some tests to make sure it works/fails/warns as expected. After that, I'm fine updating the previously changed functions and moving forward. I actually was considering revisiting those functions anyway after learning that there was an actual DeprecationWarning
subclass.
from pyjanitor.
Labelling this a "good first issue" for newcomers, as it basically involves a find-and-replace.
from pyjanitor.
@HectorM14 yes, that would be great! Please do go ahead.
Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version?
Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used.
Personally, I would prefer one function at a time, which makes it easier to review, but at the same time, I think we should trust your judgment - especially if there's things that are logically groupable, feel free to PR in them together. Looking forward to what you've got; also, don't hesitate to ping in here!
from pyjanitor.
Great, I'll use my best judgement for PR scope. Is there a standard way to deprecate/discourage old argument names? I was thinking of using decorators to keep the method APIs clear about their usage going forward.
from pyjanitor.
Thanks @HectorM14!
Is there a standard way to deprecate/discourage old argument names?
To the best of my knowledge, the standard way to do this should be to insert a conditional checking to see if that old argument name was called or not, and then inside the conditional:
- Raise a warning.
- Assign the value of the old argument to a new variable that has the same name as the new argument.
For example:
from warning import warning
def deprecation_warning(old_arg, new_arg):
"""generic warning function."""
warning(f"{argname.__name__} is deprecated, and will be replaced with {new_arg.__name__}")
def func(df, old_arg1, arg2, arg3, new_arg):
"""docstring here"""
if old_arg1 is not None:
deprecation_warning(old_arg1, new_arg)
new_arg = old_arg1
# continue code
I was thinking of using decorators to keep the method APIs clear about their usage going forward.
That might actually be better than what I had up there! I'd like to see the pattern in the first PR, it'd be educational for myself.
from pyjanitor.
Oh cool, thanks for sharing that, @szuckerman!
from pyjanitor.
@HectorM14 those are great points you've raised π.
Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit,
*_col_*
and*_column_*
are both used throughout the code.
I am guilty of being inconsistent here, as I sometimes use col
and sometimes use column
and sometimes use colname
. Yes, being consistent is a great thing. I would personally vote for column_name
, in adherence to "explicit is better than implicit", but if there are better reasons (brevity, perhaps?) for other conventions, not a problem! I have no strong opinions on this one, as long as we can make it towards consistency in the end!
from pyjanitor.
I like column_name too. I can aim for being explicit with these names unless they become too unwieldy. I'll start making individual PRs for each updated function. Updating variables relating columns is easy, but if anyone finds other variables that should be updated, they can list them here.
from pyjanitor.
Updating names is becoming a bit more complicated. For example we currently have:
def coalesce(df, columns, new_column_name):
Following the current style we've been using, we would update this to:
def coalesce(df, new_column_name, columns=None, **kwargs):
But quick, intuitive usage of this function from example.py
is:
df = (
df
.coalesce(["certification", "certification_1"], "certification")
With the proposed changes, this would become (due to rearranging of argument order):
df = (
df
.coalesce(column_names=["certification", "certification_1"], new_column_name="certification")
)
Which is excessively more complicated. So our fix will have to be more complex/clever to keep it clean for the user.
from pyjanitor.
@HectorM14 I wouldn't worry too much about the kwargs. In Python, explicit is better than implicit, generally speaking. I find myself writing out kwarg names even though it is sometimes easier to not do so, because I recognize that it makes the code more readable, actually. I would encourage you to keep going with the PRs π.
from pyjanitor.
This is a really great find! I'm all for reducing the maintenance burden π.
Perhaps this function can be inserted into utils.py
? Be sure to acknowledge the source by linking to SO!
Also, does this mean you'd redo the name changes already PR'd? I don't want to put too much pressure on you, but if you'd do so, that'd be superb for maintenance purposes!
from pyjanitor.
I have been merging from upstream
before committing, should I be rebasing instead?
from pyjanitor.
Rebasing works as well! It's a suggestion from @zbarry, which, frankly speaking, I rarely do, because it's just easier to remember git fetch upstream; git merge upstream/dev
. π
That said, good practices are good to use, so I'm going to try moving to rebasing now.
from pyjanitor.
@HectorM14 - that decorator is a great find. Thanks for adding it!
from pyjanitor.
Yes! It's a very clean method that I would have been happy to find in the Python standard library. I assume most projects go through name changes and deprecation at some point. π€·ββοΈ
from pyjanitor.
Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used.
I didn't think of this previously, but should I add some tag comment to the decorators to indicate under which release the names were deprecated? After the decided major/minor release, it'll be easy to search for #0.16x
etc and remove those lines. In a similar vein, in the unlikely event a function has multiple rounds of deprecation at different releases, a second decorator could be added for the new deprecation (unless that's just too complicated).
from pyjanitor.
Related Issues (20)
- add row count for conditional_join
- RuntimeWarning: subpackages can technically be lazily loaded HOT 16
- explode_levels
- Not able to import janitor.clean_name function - ImportError: cannot import name 'ABCPandasArray' from 'pandas.core.dtypes.generic' HOT 2
- Typos in repository
- expand function
- [INFRA] Switch over to pyproject.toml
- Support efficient json extraction within a pandas column HOT 1
- [ENH] implement full numba version of a single conditional_join
- deprecation warning for pivot_longer HOT 1
- Return only matching indices for `conditional_join`
- [ENH] cython a subset of _range_join_indices and equi join HOT 4
- extend `col` powers for index selection HOT 1
- dtype conversion on index
- `conditional_join` fails on mac for `equi-join` and numba HOT 1
- Outdated version in conda forge HOT 1
- extend `row_to_names` to support multiindex
- `sheet_name` not required in jn.xlsx_table
- Problems with equalities in contional_join HOT 18
- Make clean_names() compatible with polars and geopandas dataframes HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyjanitor.