Giter Site home page Giter Site logo

Comments (11)

hectormz avatar hectormz commented on May 18, 2024 1

@ericmjl I'm revisiting this one again.
So currently in some of the optional tests we have this pattern:

try:
    import pyspark

    import janitor.spark  # noqa: F401
except ImportError:
    pyspark = None


@pytest.mark.spark_functions
@pytest.mark.skipif(pyspark is None, reason="pyspark tests only required for CI")
def test_clean_names_method_chain(spark_df):
...

There are some other options I learned recently. We can instead skip the whole test file if an import is unsuccessful:

import pytest
pyspark = pytest.importorskip("pyspark")

I think it is very useful to:

  • run certain suites (biology, chemistry, etc...) you are working on
  • run all tests locally without errors for missing packages
  • ensure that CI doesn't ignore skipped tests if a package doesn't get installed somehow

from pyjanitor.

ericmjl avatar ericmjl commented on May 18, 2024

@szuckerman that's a really great catch!

In general, I have been dependent on conda since grad school - only because conda helps with packaging of non-Python dependencies of Python and other language data science packages. Admittedly, venv has been a second-class citizen in my mind - but only out of convenience, really.

I think what I'll do is go in and mark the tests with their appropriate submodule name, so we can do things like:

$ pytest -m biology

or

$ pytest -m functions

That way, they can be selectively run if necessary.

from pyjanitor.

hectormz avatar hectormz commented on May 18, 2024

Are the concerns raised here still outstanding @ericmjl, @szuckerman? I suppose this is related in a way to #551.

I'm going through the oldest issues to see what can be done.

from pyjanitor.

ericmjl avatar ericmjl commented on May 18, 2024

I think we might need to update this issue - perhaps it should be renamed, "Mark tests with skipif for optional dependencies".

Doing so would be pretty rad - we could skip all of the biology/chemistry/spark functions if they can't be imported successfully.

One thing I would prefer to remain as is, though, is environment.yml - using it keeps the Azure pipelines builds cleaner by abstracting out the exact packages we want installed during CI builds.

from pyjanitor.

zjpoh avatar zjpoh commented on May 18, 2024

I agree. Having them as optional for users but required in CI is important.

from pyjanitor.

ericmjl avatar ericmjl commented on May 18, 2024

@hectormz, this looks like a very good idea. I can definitely see how it’s useful for those who don’t want to install the entire suite of optional dependencies.

If you could help me a bit here, I am not quite following the 3rd point:

ensure that CI doesn't ignore skipped tests if a package doesn't get installed somehow

If I understand correctly, on the CI system, if a package doesn’t get installed somehow, we want the tests to fail, rather than skip them. Is that part of the point you’re making with point 3? Or am I simply overthinking things here? 😸

from pyjanitor.

hectormz avatar hectormz commented on May 18, 2024

@ericmjl Exactly! This is very related to a PR I'm working on for arviz arviz-devs/arviz#1113

We're still figuring out the best way, either monkeypatching or writing our own importorskip, but having some method that involves creating an environment variable in CI. Let me.know your thoughts or we can see how that PR is resolved and do something similar here. 😉

from pyjanitor.

hectormz avatar hectormz commented on May 18, 2024

@ericmjl With arviz-devs/arviz#1113 having been merged, do you want to try a similar approach to what was eventually executed in that PR?

from pyjanitor.

ericmjl avatar ericmjl commented on May 18, 2024

Yes! Let's do that. This would be good for the pyspark and domain-specific modules, right?

I looked briefly at the code, and I think this is the block that would help, am I right about it?

pytestmark = pytest.mark.skipif(  # pylint: disable=invalid-name
    (importlib.util.find_spec("numba") is None) & ~running_on_ci(),
    reason="test requires numba which is not installed",
)

from pyjanitor.

hectormz avatar hectormz commented on May 18, 2024

Right. That decorator will skip the test if the module is not installed (but not running on CI)

There's also

from ..helper import importorskip
numba= importorskip("numba")  # pylint: disable=invalid-name

Which would skip the whole file of tests (Useful for tests that are grouped in same file like pyspark IIRC)

This relies on a custom importorskip that checks for an environment variable we'd set in the Azure CI:

def running_on_ci() -> bool:
    """Return True if running on CI machine."""
    return os.environ.get("ARVIZ_CI_MACHINE") is not None


def importorskip(
    modname: str, minversion: Optional[str] = None, reason: Optional[str] = None
) -> Any:
    """Import and return the requested module ``modname``.
        Doesn't allow skips on CI machine.
        Borrowed and modified from ``pytest.importorskip``.
    :param str modname: the name of the module to import
    :param str minversion: if given, the imported module's ``__version__``
        attribute must be at least this minimal version, otherwise the test is
        still skipped.
    :param str reason: if given, this reason is shown as the message when the
        module cannot be imported.
    :returns: The imported module. This should be assigned to its canonical
        name.
    Example::
        docutils = pytest.importorskip("docutils")
    """
    # ARVIZ_CI_MACHINE is True if tests run on CI, where ARVIZ_CI_MACHINE env variable exists
    ARVIZ_CI_MACHINE = running_on_ci()
    if ARVIZ_CI_MACHINE:
        import warnings

        compile(modname, "", "eval")  # to catch syntaxerrors

        with warnings.catch_warnings():
            # make sure to ignore ImportWarnings that might happen because
            # of existing directories with the same name we're trying to
            # import but without a __init__.py file
            warnings.simplefilter("ignore")
            __import__(modname)
        mod = sys.modules[modname]
        if minversion is None:
            return mod
        verattr = getattr(mod, "__version__", None)
        if minversion is not None:
            if verattr is None or Version(verattr) < Version(minversion):
                raise Skipped(
                    "module %r has __version__ %r, required is: %r"
                    % (modname, verattr, minversion),
                    allow_module_level=True,
                )
        return mod
    else:
        return pytest.importorskip(modname=modname, minversion=minversion, reason=reason)

from pyjanitor.

hectormz avatar hectormz commented on May 18, 2024

This second option makes sense if it is very clear that the test .py file is dedicated to pyspark for example.

from pyjanitor.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.