Just some notes on adding Arrow support in Arcon...
Elements in Arcon look like this:
pub struct ArconElement<A: ArconType> {
    #[prost(message, tag = "1")]
    pub data: Option<A>,
    #[prost(message, tag = "2")]
    pub timestamp: Option<u64>,
}
Let's say our data is the following:
#[arcon]
pub struct ArconStruct {
    #[prost(uint32, tag = "1")]
    pub id: u32,
    #[prost(string, tag = "2")]
    pub name: String,
}
We would like to be able to convert a batch of ArconElements into the following Arrow columnar data:
| id: UInt32 | name: Utf8 | timestamp: UInt64 |
|---|---|---|
| 10 | test | 11535324290 |
| 15 | test2 | 11535324299 |
| ... | ... | ... |
This conversion would happen through a function like the one below.
fn to_arrow(elems: Vec<ArconElement<ArconStruct>>) -> arrow::record_batch::RecordBatch {
    ...
}
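To make the row-to-column step concrete, here is a minimal, dependency-free sketch of the transposition. It is not Arcon's or Arrow's API: the types are simplified copies of the structs above (without the prost/arcon attributes), and `ColumnarBatch` and `to_columns` are hypothetical names. Plain `Vec`s stand in for the Arrow arrays (UInt32, Utf8, UInt64) that a real `to_arrow` would build, and skipping elements with no payload is an assumption.

```rust
/// Simplified stand-in for the data struct above (no prost/arcon attributes).
#[derive(Debug, Clone, PartialEq)]
pub struct ArconStruct {
    pub id: u32,
    pub name: String,
}

/// Simplified stand-in for ArconElement (no ArconType bound).
#[derive(Debug, Clone, PartialEq)]
pub struct ArconElement<A> {
    pub data: Option<A>,
    pub timestamp: Option<u64>,
}

/// Hypothetical column-oriented batch: one Vec per column.
/// A real implementation would hold Arrow arrays instead of Vecs.
#[derive(Debug, Default, PartialEq)]
pub struct ColumnarBatch {
    pub id: Vec<u32>,
    pub name: Vec<String>,
    /// `None` marks a missing timestamp (a null in Arrow terms).
    pub timestamp: Vec<Option<u64>>,
}

/// Transposes row-based elements into per-column vectors.
pub fn to_columns(elems: Vec<ArconElement<ArconStruct>>) -> ColumnarBatch {
    let mut batch = ColumnarBatch {
        id: Vec::with_capacity(elems.len()),
        name: Vec::with_capacity(elems.len()),
        timestamp: Vec::with_capacity(elems.len()),
    };
    for elem in elems {
        // Assumption: elements without a payload are skipped entirely,
        // so all three columns keep the same length.
        if let Some(data) = elem.data {
            batch.id.push(data.id);
            batch.name.push(data.name);
            batch.timestamp.push(elem.timestamp);
        }
    }
    batch
}
```

Each column ends up with one entry per element that carried data, which is the shape a record batch needs before the vectors are handed to Arrow array builders.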
Potential Use Cases
Serve a materialised view of "up to date" data where users may:
- Submit SQL queries to run within Arcon
- Efficiently retrieve Arrow data through Arrow Flight (gRPC) to other applications (e.g., Pandas, Spark).
Cons: It is hard to avoid data duplication, since our default storage format is row-based while Arrow is columnar and not meant for long-term storage. However, as long as the conversion is done in batches, it should be okay.
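One possible way to keep the conversion batched is to buffer rows and only hand them off for conversion once a batch is full. The sketch below is an illustration under that assumption, not part of Arcon: `BatchBuffer`, `push`, and `flush` are hypothetical names, and the batch size is left to the caller.

```rust
/// Hypothetical helper: buffers row-based items and releases them
/// in fixed-size batches, so conversion cost is paid once per batch.
pub struct BatchBuffer<T> {
    rows: Vec<T>,
    batch_size: usize,
}

impl<T> BatchBuffer<T> {
    pub fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self {
            rows: Vec::with_capacity(batch_size),
            batch_size,
        }
    }

    /// Adds one row; returns a full batch once `batch_size` rows are buffered.
    pub fn push(&mut self, row: T) -> Option<Vec<T>> {
        self.rows.push(row);
        if self.rows.len() >= self.batch_size {
            // Swap in a fresh buffer and return the filled one.
            let fresh = Vec::with_capacity(self.batch_size);
            Some(std::mem::replace(&mut self.rows, fresh))
        } else {
            None
        }
    }

    /// Drains whatever is buffered, e.g. on shutdown or on a timer.
    pub fn flush(&mut self) -> Option<Vec<T>> {
        if self.rows.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.rows))
        }
    }
}
```

Each `Vec` returned by `push` or `flush` would then be passed to the columnar conversion, so at most one batch of rows exists in both formats at a time.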