Comments (11)
@ericmjl I'm revisiting this one again.
So currently in some of the optional tests we have this pattern:
try:
import pyspark
import janitor.spark # noqa: F401
except ImportError:
pyspark = None
@pytest.mark.spark_functions
@pytest.mark.skipif(pyspark is None, reason="pyspark tests only required for CI")
def test_clean_names_method_chain(spark_df):
...
There are some other options I learned recently. We can instead skip the whole test file if an import is unsuccessful:
import pytest
pyspark = pytest.importorskip("pyspark")
I think it is very useful to:
- run certain suites (
biology
,chemistry
, etc...) you are working on - run all tests locally without errors for missing packages
- ensure that CI doesn't ignore skipped tests if a package doesn't get installed somehow
from pyjanitor.
@szuckerman that's a really great catch!
In general, I have been dependent on conda since grad school - only because conda helps with packaging of non-Python dependencies of Python and other language data science packages. Admittedly, venv
has been a second-class citizen in my mind - but only out of convenience, really.
I think what I'll do is go in and mark the tests with their appropriate submodule name, so we can do things like:
$ pytest -m biology
or
$ pytest -m functions
That way, they can be selectively run if necessary.
from pyjanitor.
Are the concerns raised here still outstanding @ericmjl, @szuckerman? I suppose this is related in a way to #551.
I'm going through the oldest issues to see what can be done.
from pyjanitor.
I think we might need to update this issue - perhaps it should be renamed, "Mark tests with skipif for optional dependencies".
Doing so would be pretty rad - we could skip all of the biology/chemistry/spark functions if they can't be imported successfully.
One thing I would prefer to remain as is, though, is environment.yml
- using it keeps the Azure pipelines builds cleaner by abstracting out the exact packages we want installed during CI builds.
from pyjanitor.
I agree. Having them as optional for users but required in CI is important.
from pyjanitor.
@hectormz, this looks like a very good idea. I can definitely see how it’s useful for those who don’t want to install the entire suite of optional dependencies.
If you could help me a bit here, I am not quite following the 3rd point:
ensure that CI doesn't ignore skipped tests if a package doesn't get installed somehow
If I understand correctly, on the CI system, if a package doesn’t get installed somehow, we want the tests to fail, rather than skip them. Is that part of the point you’re making with point 3? Or am I simply overthinking things here? 😸
from pyjanitor.
@ericmjl Exactly! This is very related to a PR I'm working on for arviz
arviz-devs/arviz#1113
We're still figuring out the best way, either monkeypatching or writing our own importorskip
, but having some method that involves creating an environment variable in CI. Let me.know your thoughts or we can see how that PR is resolved and do something similar here. 😉
from pyjanitor.
@ericmjl With arviz-devs/arviz#1113 having been merged, do you want to try a similar approach to what was eventually executed in that PR?
from pyjanitor.
Yes! Let's do that. This would be good for the pyspark and domain-specific modules, right?
I looked briefly at the code, and I think this is the block that would help, am I right about it?
pytestmark = pytest.mark.skipif( # pylint: disable=invalid-name
(importlib.util.find_spec("numba") is None) & ~running_on_ci(),
reason="test requires numba which is not installed",
)
from pyjanitor.
Right. That decorator will skip the test if the module is not installed (but not running on CI)
There's also
from ..helper import importorskip
numba= importorskip("numba") # pylint: disable=invalid-name
Which would skip the whole file of tests (Useful for tests that are grouped in same file like pyspark IIRC)
This relies on a custom importorskip
that checks for an environment variable we'd set in the Azure CI:
def running_on_ci() -> bool:
"""Return True if running on CI machine."""
return os.environ.get("ARVIZ_CI_MACHINE") is not None
def importorskip(
modname: str, minversion: Optional[str] = None, reason: Optional[str] = None
) -> Any:
"""Import and return the requested module ``modname``.
Doesn't allow skips on CI machine.
Borrowed and modified from ``pytest.importorskip``.
:param str modname: the name of the module to import
:param str minversion: if given, the imported module's ``__version__``
attribute must be at least this minimal version, otherwise the test is
still skipped.
:param str reason: if given, this reason is shown as the message when the
module cannot be imported.
:returns: The imported module. This should be assigned to its canonical
name.
Example::
docutils = pytest.importorskip("docutils")
"""
# ARVIZ_CI_MACHINE is True if tests run on CI, where ARVIZ_CI_MACHINE env variable exists
ARVIZ_CI_MACHINE = running_on_ci()
if ARVIZ_CI_MACHINE:
import warnings
compile(modname, "", "eval") # to catch syntaxerrors
with warnings.catch_warnings():
# make sure to ignore ImportWarnings that might happen because
# of existing directories with the same name we're trying to
# import but without a __init__.py file
warnings.simplefilter("ignore")
__import__(modname)
mod = sys.modules[modname]
if minversion is None:
return mod
verattr = getattr(mod, "__version__", None)
if minversion is not None:
if verattr is None or Version(verattr) < Version(minversion):
raise Skipped(
"module %r has __version__ %r, required is: %r"
% (modname, verattr, minversion),
allow_module_level=True,
)
return mod
else:
return pytest.importorskip(modname=modname, minversion=minversion, reason=reason)
from pyjanitor.
This second option makes sense if it is very clear that the test .py file is dedicated to pyspark
for example.
from pyjanitor.
Related Issues (20)
- RuntimeWarning: subpackages can technically be lazily loaded HOT 16
- explode_levels
- Not able to import janitor.clean_name function - ImportError: cannot import name 'ABCPandasArray' from 'pandas.core.dtypes.generic' HOT 2
- Typos in repository
- expand function
- [INFRA] Switch over to pyproject.toml
- Support efficient json extraction within a pandas column HOT 1
- [ENH] implement full numba version of a single conditional_join
- deprecation warning for pivot_longer HOT 1
- Return only matching indices for `conditional_join`
- [ENH] cython a subset of _range_join_indices and equi join HOT 4
- extend `col` powers for index selection HOT 1
- dtype conversion on index
- `conditional_join` fails on mac for `equi-join` and numba HOT 1
- Outdated version in conda forge HOT 1
- extend `row_to_names` to support multiindex
- `sheet_name` not required in jn.xlsx_table
- Problems with equalities in contional_join HOT 18
- Make clean_names() compatible with polars and geopandas dataframes HOT 6
- implement similar functions for polars
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyjanitor.