Could query operators be defined here as well, so that table packages just implement their efficient methods while a generic fallback exists? For example: select, filter, join, mutate, groupby, distinct, order, etc.
from dataapi.jl.
We would need to agree on at least one positional argument, at a specified position, whose type restriction is a type defined in the implementing package.
Just as a comment from the DataFrames.jl side, regarding "Mutate/Transform (make or replace columns)":
this will now be handled as part of the select functionality (with the :oldcol => function => :newcol pattern).
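To make the :oldcol => function => :newcol pattern concrete, here is a minimal Base-only sketch. It assumes a table represented as a NamedTuple of column vectors, and select_pairs is a hypothetical helper, not DataFrames.jl's select (which supports many more forms).

```julia
# Hypothetical helper, not DataFrames.jl's implementation. A "table" here is a
# NamedTuple of column vectors; each spec is either a column name to keep or
# a `:oldcol => fun => :newcol` pair, where `fun` receives the whole column.
function select_pairs(tbl::NamedTuple, specs...)
    names = Symbol[]
    cols = Any[]
    for spec in specs
        if spec isa Symbol
            push!(names, spec)            # keep an existing column as-is
            push!(cols, tbl[spec])
        elseif spec isa Pair && spec.second isa Pair
            # `=>` is right-associative, so this is :old => (fun => :new)
            src = spec.first
            fun, dst = spec.second.first, spec.second.second
            push!(names, dst)
            push!(cols, fun(tbl[src]))    # apply fun to the whole source column
        else
            throw(ArgumentError("unsupported column specification: $spec"))
        end
    end
    return NamedTuple{Tuple(names)}(Tuple(cols))
end

tbl = (a = [1, 2, 3], b = [10, 20, 30])
out = select_pairs(tbl, :a, :b => (x -> 2 .* x) => :b2)   # keeps :a, derives :b2 from :b
```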
That would indeed be useful for JuliaStats/StatsBase.jl#527.
Hmm... since JuliaData/Tables.jl#82 was closed, Tables.jl itself no longer uses Requires.jl, which has drastically improved its load time (consistently around 0.06s on my laptop: ~0.01s for loading dependencies and ~0.05s for the Tables.jl definitions themselves). Is that really too heavy? I can certainly understand the "separation of concerns" argument, but there I believe the right answer is to wait for JuliaLang/Pkg.jl#1285, which would allow "glue" code to be properly separated into its own glue module.
I do agree with the sink concern, namely that separating out the API makes it easier for packages to be sources but not sinks (since the Tables.rows fallback would remain in Tables.jl).
We also already have the property that an object can be a "table" without explicitly depending on Tables.jl, by being an iterator of property-accessible objects (rows); this isn't supported for columns, though.
All in all, I think packages should just decide whether they want to take the Tables.jl dependency or not, or wait for proper glue package support.
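As a rough illustration of the row-based route (a table as an iterator of property-accessible objects), here is a Base-only sketch of turning such rows into columns. rows_to_columns and Point are hypothetical names, and this is not Tables.jl's implementation; in particular, it fixes each column's element type from the first row, so rows whose field types vary would need type widening that this sketch does not do.

```julia
# Hypothetical helper: collect an iterator of rows (any objects supporting
# propertynames/getproperty) into a NamedTuple of column vectors.
function rows_to_columns(rows)
    it = iterate(rows)
    it === nothing && return NamedTuple()   # no rows: no columns
    first_row, state = it
    names = Tuple(propertynames(first_row))
    # One vector per column; element types are taken from the first row.
    cols = Tuple([getproperty(first_row, n)] for n in names)
    while true
        it = iterate(rows, state)
        it === nothing && break
        row, state = it
        for (col, n) in zip(cols, names)
            push!(col, getproperty(row, n))
        end
    end
    return NamedTuple{names}(cols)
end

# Any struct works as a row; nothing here depends on Tables.jl.
struct Point
    x::Int
    y::Float64
end

pcols = rows_to_columns([Point(1, 2.0), Point(3, 4.5)])
nt_cols = rows_to_columns([(a = 1, b = "u"), (a = 2, b = "v")])
```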
The problem is that packages don't really agree on the API of most of these functions, though it would probably be useful to file an issue for each one to discuss that.
Each package would be free to define its own API, but packages could also implement a common one. I think it should cover the operations offered by Query.jl-style interfaces (Julia/LINQ style, Tidyverse, maditr), like... Should we first define which operations we want to support, and then design the API?
At the very least, we would need packages to agree on a common API to ensure that no method ambiguities or incompatibilities arise.
Why would that be the case?
select(tbl::Any, cols::Symbol...) = NamedTuple{cols}(Tables.columntable(tbl)) # A generic fallback
DataFrames would then add:
select(df::AbstractDataFrame, cols::Symbol...) = ... # The definition for the tabular struct a package provides
You should define a bare select function without defining any methods (Julia allows this: function select end).
The problem with your definition is that some packages might accept something other than Symbol for cols. A basic example would be:
In one package:
select(tbl::Any, cols::Symbol...)
In the other package:
select(tbl::AbstractDataFrame, cols::Any...)
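and you are toast: a call such as select(df, :a) becomes ambiguous, because neither method is more specific than the other.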
That is, if I understand correctly, what @nalimilan meant by a common API. A minimal requirement is to agree on the type of a single positional argument at a fixed position.
The problem in your case is that using Any is type piracy from Base. Of course it cannot always be avoided, but the way to resolve type piracy when it is unavoidable is exactly what I proposed, i.e. agreeing on a positional argument that is guaranteed not to be subject to type piracy (if someone knows a better way to handle this, please comment).
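The clash described above can be reproduced in plain Julia. In this sketch, SharedAPI and MyTable are hypothetical stand-ins for a shared API package and a table type owned by one package, and the two methods mirror the two signatures in the example. The last method shows one coordinated fix: a method covering exactly the intersection of the two signatures.

```julia
# SharedAPI and MyTable are hypothetical stand-ins, not real packages.
module SharedAPI
function select end   # bare generic function with no methods yet
end

struct MyTable        # a table type owned by "package B"
    cols::Dict{Symbol,Vector{Int}}
end

# "Package A": any table, but column selectors must be Symbols
SharedAPI.select(tbl::Any, cols::Symbol...) = "package A fallback"
# "Package B": its own table type, but any kind of column selector
SharedAPI.select(tbl::MyTable, cols::Any...) = "package B method"

t = MyTable(Dict(:a => [1, 2]))

SharedAPI.select((a = [1, 2],), :a)   # only A's method applies
SharedAPI.select(t, "a")              # only B's method applies

# Both methods apply to (MyTable, Symbol) and neither is more specific,
# so Julia throws a MethodError reporting the ambiguity.
result = try
    SharedAPI.select(t, :a)
catch err
    err
end

# A method for exactly the intersection resolves the ambiguity.
SharedAPI.select(tbl::MyTable, cols::Symbol...) = "intersection method"
fixed = SharedAPI.select(t, :a)
```

In practice the two packages would not know about each other, so nobody would write that intersection method; that is the coordination problem a common API has to solve up front.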
Aye. The API design would have to be drafted similarly to, for example, the Abstraction for Statistical Models in StatsBase.jl.
If the API is defined as select(tbl, cols::Symbol...), then each package would have to transform those arguments into whatever form its own implementation dispatches on.
One decision is whether to have a generic fallback method, no methods at all, or a fallback that throws error("$method is not defined for $(typeof(obj)).").
The Any is not type piracy, since the method is DataAPI.select, which is not defined in Base. Packages would import and extend DataAPI.select, but restrict the first argument to the tabular struct they provide (i.e., no type piracy). This is akin to the Statistical Model API, which requires every method in the API to have a first argument that is <:StatsBase.StatisticalModel or <:StatsBase.RegressionModel.
We don't require <: Tabular or something similar, but that should be fine, since each package would define the method for a tabular struct that it defines itself. What we would need to define here is:
- What capabilities do we want the API to support?
- Namespace (naming for the methods)
- Method definition (arguments, expected results, default behavior)
The benefit of defining the API, rather than just reserving names, is that users get a universal front-end for querying tables while each tabular representation still uses its own efficient, struct-specific internals. Packages can still provide their own flavor of interacting with their structs regardless.
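The layout being proposed (a bare generic function in a shared package, extended by each package only for the tabular struct it owns, plus the error-fallback option mentioned above) might look roughly like this. QueryAPI, ColTablePkg and ColTable are hypothetical names, and select(tbl, cols::Symbol...) is just the signature suggested in this thread, not an agreed contract.

```julia
# Hypothetical sketch: QueryAPI, ColTablePkg and ColTable are made-up names.
module QueryAPI

"""
    select(tbl, cols::Symbol...)

Return a table with only the columns `cols` of `tbl` (assumed contract).
"""
function select end

# Option: an informative error instead of a silent generic fallback.
select(obj, cols::Symbol...) = error("select is not defined for $(typeof(obj)).")

end # module QueryAPI

module ColTablePkg

import ..QueryAPI   # extend the shared function rather than defining a new one

# A tabular struct owned by this package.
struct ColTable
    names::Vector{Symbol}
    columns::Vector{Vector{Any}}
end

# The first argument is a type this package owns, so this is not type piracy.
function QueryAPI.select(t::ColTable, cols::Symbol...)
    for c in cols
        c in t.names || throw(ArgumentError("unknown column :$c"))
    end
    idx = Int[findfirst(==(c), t.names) for c in cols]
    return ColTable(collect(Symbol, cols), t.columns[idx])
end

end # module ColTablePkg

t = ColTablePkg.ColTable([:a, :b], [Any[1, 2], Any[3, 4]])
s = QueryAPI.select(t, :b)   # dispatches to ColTablePkg's method

# A type nobody implemented select for hits the error fallback.
msg = try
    QueryAPI.select((a = [1],), :a)
catch err
    sprint(showerror, err)
end
```

Note that a package which instead defined select(::ColTable, ::Any...) would make calls like select(t, :b) ambiguous against this fallback, which is why agreeing on the signature matters.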
Yes, that's not type piracy as long as packages only add methods whose first argument is of a type they own. But for that function to be generically usable, we should define at least some common signatures that are expected to work (e.g. Symbol varargs).
Quoting the comment above: "Yes that's not type piracy as long as packages only add methods with the first argument being of a type they own."
This is exactly what I postulated.
Some potential features:
- Select columns
- Filter rows
- Join tables (inner join, left join, right join, full join, anti join, semi join; also allow Cartesian joins)
- Mutate/Transform (make or replace columns)
- Rename columns
- Rearrange columns
- Arrange rows (sort)
- Eliminate duplicates (unique / distinct)
- Reshape Stack/Unstack | Melt/Cast
- Add rows / append (SQL UNION)
- Aggregate / Summarise
- Groupby (split, apply, combine)
- Count/sequential (.N)
I don't know which ones I am missing...
Probably also something for handling schemas/types (Unitful / CategoricalArrays / Missing).
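To give a feel for what generic fallbacks for a few of the listed operations (filter rows, arrange rows, eliminate duplicates) could look like, here is a Base-only sketch over a column table represented as a NamedTuple of equal-length vectors. filter_rows, arrange, distinct_rows, row and nrows are hypothetical names chosen for illustration; a real API would still have to settle naming and signatures, and table packages would override these with efficient methods.

```julia
# Hypothetical Base-only fallbacks over a "column table": a NamedTuple of
# equal-length column vectors. Table packages would override these.

# Number of rows (0 for a table with no columns).
nrows(tbl::NamedTuple) = isempty(tbl) ? 0 : length(first(tbl))

# The i-th row as a NamedTuple, e.g. (id = 3, score = 0.5).
row(tbl::NamedTuple, i::Integer) = map(col -> col[i], tbl)

# Filter rows: keep rows for which pred(row) is true.
function filter_rows(pred, tbl::NamedTuple)
    keep = Bool[pred(row(tbl, i)) for i in 1:nrows(tbl)]
    return map(col -> col[keep], tbl)
end

# Arrange rows: reorder all columns by the sort permutation of column `by`.
function arrange(tbl::NamedTuple, by::Symbol; rev::Bool = false)
    perm = sortperm(tbl[by]; rev = rev)
    return map(col -> col[perm], tbl)
end

# Eliminate duplicates: keep the first occurrence of each distinct row.
function distinct_rows(tbl::NamedTuple)
    idx = unique(i -> row(tbl, i), 1:nrows(tbl))
    return map(col -> col[idx], tbl)
end

tbl = (id = [3, 1, 2, 1], score = [0.5, 0.9, 0.7, 0.9])
high = filter_rows(r -> r.score > 0.6, tbl)
srt = arrange(tbl, :id)
uq = distinct_rows(tbl)
```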
Aye. SQL-ish and parsimonious.