Comments (2)
I've thought about this a lot, and I think we're getting closer to this world. However, my main concern is that this generic dataframe schema would have to include a superset of all the options for all of the dataframe libraries. I think eventually we'll nail down a "common dataframe schema api to rule them all", in which case this concern is less of an issue.
We recently introduced a generic dataframe api: https://github.com/unionai-oss/pandera/tree/main/pandera/api/dataframe which is where this dispatching might happen. Currently pandas and polars schemas inherit from these classes (pyspark still needs to be done).
If folks engage with this issue (👍 or comment/discuss) we can prioritize this effort. In the meantime, @DavidSlayback, if you can write down a spec for how this would all work, perhaps with a code snippet sketching how dispatching would work, that would get the ball rolling.
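To make the dispatching idea concrete, here is a rough sketch of one way it could look, using `functools.singledispatch` to pick a backend from the type of the object being validated. Everything in it is hypothetical: `GenericSchema`, `_validate`, and the stand-in frame classes `DictFrame` and `RowFrame` are made up for illustration and are not part of pandera's API. Real backends would presumably register `pandas.DataFrame`, `polars.DataFrame`, and so on instead.

```python
from functools import singledispatch


class GenericSchema:
    """Hypothetical framework-agnostic schema: maps column name -> dtype label."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def validate(self, df):
        # Hand off to whichever backend is registered for type(df).
        return _validate(df, self)


@singledispatch
def _validate(df, schema):
    # Fallback used when no backend is registered for this type.
    raise TypeError(f"no backend registered for {type(df).__name__}")


class DictFrame(dict):
    """Stand-in for a column-oriented frame: column name -> list of values."""


class RowFrame(list):
    """Stand-in for a row-oriented frame: a list of row dicts."""


@_validate.register(DictFrame)
def _validate_dict_frame(df, schema):
    # Column-oriented backend: columns are the dict keys.
    missing = [name for name in schema.columns if name not in df]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


@_validate.register(RowFrame)
def _validate_row_frame(df, schema):
    # Row-oriented backend: every row must carry every schema column.
    for i, row in enumerate(df):
        missing = [name for name in schema.columns if name not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")
    return df


schema = GenericSchema({"a": "int", "b": "str"})
schema.validate(DictFrame({"a": [1, 2], "b": ["x", "y"]}))
schema.validate(RowFrame([{"a": 1, "b": "x"}]))
```

The single schema object never needs to know which libraries exist; each backend only has to register itself. The superset-of-options concern still applies, though, since `GenericSchema` would have to express every option any backend understands.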
from pandera.
Sure, I'll try to sketch something up later this week when I'm free!