Comments (12)

WillAyd commented on June 29, 2024

> I think pip install pandas and conda install pandas should install PyArrow, and possibly Matplotlib and other dependencies. And there should be a way to install pandas without any optional dependencies, pip install pandas[core] and conda install pandas-core, or whatever makes sense and is feasible.

As long as PDEP-10 holds, I think pyarrow is a core package. Outside of that, how much of a difference is this expected to make? I think there is also a downside to having separate packages, because then you start to fragment the user base.

Dr-Irv commented on June 29, 2024

I like the idea of having a "minimal" installation that covers most common use cases and avoids downloading unneeded packages. I would suggest the name minipandas, akin to how Miniconda and Anaconda are the minimal and full versions of the same distribution.

With respect to the pyarrow issue, we'd then have to make sure that minipandas would work without pyarrow being installed.
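
One common way to keep a minimal install working without pyarrow is to import optional dependencies lazily and fail with an actionable message only when a feature that needs them is actually used. The sketch below assumes a hypothetical helper `import_optional` and a hypothetical `minipandas[arrow]` install hint; neither is an existing pandas API.

```python
import importlib
from types import ModuleType


def import_optional(name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint.

    `extra` is a hypothetical pip extra name, used only to build the message.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"Missing optional dependency '{name}'. "
            f"Install it with: pip install minipandas[{extra}]"
        ) from exc


def to_arrow_table(columns: dict):
    # Only this feature needs pyarrow; importing the package itself does not.
    pa = import_optional("pyarrow", extra="arrow")
    return pa.table(columns)
```

With this pattern, a minimal install only breaks at the call site of an Arrow-specific feature, and the error tells the user which extra to add.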

One other thought: I imagine the current test suite would have to be split into tests appropriate for minipandas and for pandas, and there would be an additional burden when building distributions. We'd also have to carefully examine the docs to determine which parts need a "full pandas" label, indicating that the full package (or specific dependencies) is needed for them to work.
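
Rather than physically splitting the suite, one option is to keep a single suite and skip tests whose optional dependency is missing, so the same files serve both a minimal and a full install. A sketch using only the standard library's unittest; the test classes are hypothetical placeholders (pytest users would typically use pytest.importorskip for the same purpose).

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    # For a top-level name, find_spec returns None when the module is not installed.
    return importlib.util.find_spec(name) is not None


HAS_PYARROW = has_module("pyarrow")


class CoreTests(unittest.TestCase):
    def test_runs_on_minimal_install(self):
        # Placeholder for a core test that must pass with or without optional deps.
        self.assertEqual(sum([1, 2, 3]), 6)


@unittest.skipUnless(HAS_PYARROW, "requires pyarrow (full install only)")
class ArrowTests(unittest.TestCase):
    def test_arrow_roundtrip(self):
        import pyarrow as pa

        arr = pa.array([1, 2, 3])
        self.assertEqual(arr.to_pylist(), [1, 2, 3])
```

Running `python -m unittest` on such a module passes on both installs; on a minimal one the Arrow tests are reported as skipped rather than failed.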

mroeschke commented on June 29, 2024

I do think this would be useful, but mainly for considering how the code is packaged, and less about how dependencies are bundled (which seems to be the focus here?). For pip installations we have the pip extras set up, and understandably conda doesn't have something like that (yet). If this re-packaging is to make the conda installation story nicer, I'm not sure it's worth it.
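
For context on how pip extras work: an optional dependency group is recorded in the package metadata as Requires-Dist entries carrying an `extra == "name"` environment marker, and `importlib.metadata.requires(...)` returns those strings for an installed distribution. The sketch below groups such strings by extra. The requirement list is hypothetical (not pandas' real metadata), and the regex only handles the simple trailing `; extra == "..."` form; a real tool should parse markers with the packaging library instead.

```python
import re
from collections import defaultdict

# Hypothetical Requires-Dist strings, in the shape importlib.metadata.requires()
# returns them. This is NOT pandas' actual dependency list.
REQUIRES = [
    "numpy",
    "python-dateutil",
    'pyarrow; extra == "pyarrow"',
    'matplotlib; extra == "plot"',
    'pyarrow; extra == "all"',
    'matplotlib; extra == "all"',
]

# Matches only a trailing marker of the form: ; extra == "name"
EXTRA_MARKER = re.compile(r";\s*extra\s*==\s*['\"]([^'\"]+)['\"]\s*$")


def group_by_extra(requires: list) -> dict:
    """Map each extra name ('' for unconditional deps) to its requirements."""
    groups = defaultdict(list)
    for req in requires:
        match = EXTRA_MARKER.search(req)
        if match:
            groups[match.group(1)].append(req[: match.start()].strip())
        else:
            groups[""].append(req.strip())
    return dict(groups)
```

Here the '' group is what a plain `pip install` pulls in; everything else is only installed when the user names the extra.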

Just noting that core seems to be the "common" suffix for minimal packages in Python too:

https://anaconda.org/conda-forge/jupyter_core
https://anaconda.org/conda-forge/dask-core
https://anaconda.org/conda-forge/poetry-core
https://anaconda.org/conda-forge/botocore

datapythonista commented on June 29, 2024

> which seems to be the focus here?

My main point is about the UX, anything else I'm personally flexible and can be discussed later.

I think pip install pandas and conda install pandas should install PyArrow, and possibly Matplotlib and other dependencies. And there should be a way to install pandas without any optional dependencies, pip install pandas[core] and conda install pandas-core, or whatever makes sense and is feasible.

datapythonista commented on June 29, 2024

The difference is that, by default, users will get our recommended dependencies, unlike now, since the main packages will add them, while still leaving users the option to install a version with no optional dependencies.

Making up the numbers, but if 20% of users have PyArrow now, maybe we'd get to 80%, making pandas faster for many users who don't know or don't care much about what to install, and who trust us to provide what they need by default.

I personally don't see the fragmentation problem you mention. This solution has been used for decades in the Linux world. If you want KDE, for example, you just install the kde package and you get a text editor, a calculator, a calendar... If you have a reason not to have everything that KDE provides, you can still install kde-core and the specific packages you want. I wouldn't say KDE users are fragmented because of this, or that pandas users will be. We are already dealing with a user base where each individual has a different set of dependencies. We'll change the percentage of users who have some of the pandas optional dependencies, but other than that I personally don't see a significant change or any drawback. The pandas installed will be exactly the same: the one in pandas-core, which both the pandas and the pandas-core packages will install.
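
The metapackage model described here (a pandas package with no code of its own that depends on pandas-core plus recommended extras) can be illustrated with a toy transitive resolver. The dependency graph below is hypothetical and heavily simplified (no versions, most transitive dependencies omitted); it only shows that both install targets contain the same pandas-core and differ only in the extras pulled in alongside it.

```python
# Hypothetical, simplified dependency graph: package name -> direct dependencies.
GRAPH = {
    "pandas": ["pandas-core", "pyarrow", "matplotlib"],  # metapackage, no code of its own
    "pandas-core": ["numpy", "python-dateutil"],         # ships the actual library
    "pyarrow": ["numpy"],
    "matplotlib": ["numpy"],
    "numpy": [],
    "python-dateutil": [],
}


def resolve(package: str, graph: dict) -> set:
    """Return the package plus everything it depends on, transitively."""
    seen = set()
    stack = [package]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(graph[name])
    return seen


full = resolve("pandas", GRAPH)
minimal = resolve("pandas-core", GRAPH)
```

In this toy graph, `minimal` is a strict subset of `full`, and the difference is just the metapackage itself plus pyarrow and matplotlib.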

bashtage commented on June 29, 2024

There is already a great mechanism and all that is needed are some recommendations like installing pandas[all] (or full or kitchen-sink) and possibly other subsets like pandas[io].
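
If subsets such as an io group existed alongside an all group, one way to keep all from drifting out of sync is to derive it as the union of the subsets at build time. A minimal sketch; the extras names and their contents are hypothetical, not pandas' actual extras.

```python
# Hypothetical extras groups (distribution names only, no version pins).
EXTRAS = {
    "io": ["pyarrow", "openpyxl", "fsspec"],
    "plot": ["matplotlib"],
    "performance": ["numexpr", "bottleneck"],
}


def with_all_extra(extras: dict) -> dict:
    """Return a copy of `extras` with an 'all' group that is the union of every subset."""
    combined = sorted({dep for deps in extras.values() for dep in deps})
    return {**extras, "all": combined}
```

The input dict is left untouched, so the generated groups can be written out to packaging metadata without hand-maintaining the all list.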

I think it would be a mistake to try and redefine pandas to be some huge set of dependencies, and to introduce some other package to be the current pandas.

rhshadrach commented on June 29, 2024

I think this is well known, but it feels worth stating anyway: no matter how it's implemented, if there are ways of using pandas without pyarrow, then we have to maintain both "pandas with pyarrow" and "pandas without pyarrow", which to me was the main reason for PDEP-10.

If pyarrow is always opt-in, then I don't see much issue with this. But if we are having e.g. "string[pyarrow] when pyarrow is installed and otherwise numpy object" type inference, then users will have different behavior in pandas itself depending on whether a third party package is installed or not. That seems like a very bad user experience to me.
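
The concern can be made concrete with a toy inference function whose result for the same input depends on whether pyarrow happens to be importable. This is a hypothetical illustration of the behaviour being criticised, not pandas' actual inference code; dtypes are represented as plain strings.

```python
import importlib.util
from typing import Optional


def pyarrow_available() -> bool:
    return importlib.util.find_spec("pyarrow") is not None


def infer_string_dtype(values: list, arrow_available: Optional[bool] = None) -> str:
    """Toy inference: the answer for the same data depends on the environment."""
    if arrow_available is None:
        arrow_available = pyarrow_available()
    if all(isinstance(v, str) for v in values):
        return "string[pyarrow]" if arrow_available else "object"
    return "object"
```

The same list of strings yields "string[pyarrow]" on one machine and "object" on another, so code tested in one environment can behave differently in another.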

attack68 commented on June 29, 2024

> I think this is well known, but it feels worth stating anyway: no matter how it's implemented, if there are ways of using pandas without pyarrow, then we have to maintain both "pandas with pyarrow" and "pandas without pyarrow", which to me was the main reason for PDEP-10.

That was my understanding of one of the core reasons for PDEP-10. I was one of the few people who voted against PDEP-10, but now that it has been accepted, aren't we supposed to stick with it? Unless a new PDEP or an amendment to it is put forward, surely this is out of scope until then. According to the PDEP, the warning should also be retained, and not repealed by a close majority vote that might not follow PDEP rules either.

datapythonista commented on June 29, 2024

I agree, and it's certainly not the goal of this issue to cancel PDEP-10. Also, while having two packages could be used to install PyArrow more broadly without requiring it, the scope of what I'm discussing here is not limited to PyArrow; it could apply to other dependencies that we recommend (or assume users are most likely to want) but don't want to force, for example Matplotlib.

From the previous discussions, it seems that several people are interested in not moving forward with PDEP-10, at least as is. I fully agree that this issue is not where we want to decide or even discuss that. But if there is interest in implementing the two packages for default and minimal dependencies, I think it can make a difference for future discussions on requiring Arrow.

And clearly, this issue doesn't help with cleaning "if pyarrow" branches out of our codebase, or with having to deal with two separate cases. The main change I envision is a significant increase in the number of users who have PyArrow installed.

I'm personally +1 on moving forward with PDEP-10, fully requiring PyArrow and keeping the warning, but if many people dislike the PDEP now, I think we'll have to have a new discussion.

lithomas1 commented on June 29, 2024

> There is already a great mechanism and all that is needed are some recommendations like installing pandas[all] (or full or kitchen-sink) and possibly other subsets like pandas[io].

> I think it would be a mistake to try and redefine pandas to be some huge set of dependencies, and to introduce some other package to be the current pandas.

Agreed with this. Extras should stay extras.

IIRC, the -core thing is probably specific to conda-forge; I've never seen it used with a project on PyPI.

fangchenli commented on June 29, 2024

> There is already a great mechanism and all that is needed are some recommendations like installing pandas[all] (or full or kitchen-sink) and possibly other subsets like pandas[io].

> I think it would be a mistake to try and redefine pandas to be some huge set of dependencies, and to introduce some other package to be the current pandas.

Agree. And we should have more detailed installation instructions to educate users on using extras.

And I think we could have a pandas-core containing all (or some) of the extension modules, so we could have more fine-grained tests and benchmarks. It would also speed up CI and improve the developer experience.

datapythonista commented on June 29, 2024

Thanks all for the feedback. It doesn't seem there is much interest in moving forward with this at this point. I guess something similar could be considered in the future for conda-forge, which doesn't have extras like pip, but I'll close this issue, since it was specifically about making the "normal" pandas package install a subset of optional dependencies, and that doesn't have much support.
