Comments (1)
Python handles objects by reference, so the loader can keep a handle on the transformer and pass it each incoming dataframe before running the orchestrator. Here is a simple pseudocode implementation for your problem:
    from typing import Optional

    from pyspark.sql import DataFrame

    from atc.etl import Transformer, Orchestrator, Loader


    class BatchTransformer(Transformer):
        def __init__(self):
            super().__init__()
            self.batch_df: Optional[DataFrame] = None

        def setInputs(self, batch_df):
            self.batch_df = batch_df

        def process(self, df: DataFrame) -> DataFrame:
            df = df.join(self.batch_df)  # etc, your logic
            self.batch_df = None  # we have consumed it
            return df


    class StreamLoader(Loader):
        def __init__(self):
            super().__init__()
            self.transformer = BatchTransformer()
            self.orchestrator = (
                Orchestrator()
                # .extract_from(BatchExtractor())
                .transform_with(self.transformer)
                # .load_into(BatchLoader())
            )

        def save(self, df: DataFrame) -> None:
            self.transformer.setInputs(df)
            self.orchestrator.execute()
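
To try the pattern without a Spark session, here is a minimal runnable sketch that replaces the atc and pyspark classes with plain-Python stand-ins. Everything in it is an assumption made for illustration: rows are dicts, the join is an inner join on a hypothetical `id` key, and `extract` and `sink` are hypothetical stand-ins for the commented-out `BatchExtractor` and `BatchLoader`. It only shows the reference-sharing idea, not the real atc API.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]  # stand-in for one DataFrame row


class BatchTransformer:
    """Plain-Python stand-in for the Transformer subclass above."""

    def __init__(self) -> None:
        self.batch: Optional[List[Row]] = None

    def set_inputs(self, batch: List[Row]) -> None:
        # Store a reference to the incoming batch; nothing is copied.
        self.batch = batch

    def process(self, rows: List[Row]) -> List[Row]:
        if self.batch is None:
            raise RuntimeError("set_inputs() must be called before process()")
        # Inner join on the (assumed) "id" key.
        lookup = {r["id"]: r for r in self.batch}
        joined = [{**r, **lookup[r["id"]]} for r in rows if r["id"] in lookup]
        self.batch = None  # we have consumed it
        return joined


class StreamLoader:
    """Plain-Python stand-in for the Loader subclass above."""

    def __init__(self, extract: Callable[[], List[Row]], sink: List[Row]) -> None:
        self.transformer = BatchTransformer()
        self.extract = extract  # plays the role of the batch extractor
        self.sink = sink        # plays the role of the batch loader

    def save(self, batch: List[Row]) -> None:
        # Hand the batch to the transformer we hold a reference to, then run
        # the extract -> transform -> load steps in order.
        self.transformer.set_inputs(batch)
        self.sink.extend(self.transformer.process(self.extract()))
```

Calling `save` once per incoming batch joins that batch against whatever `extract` returns and appends the result to `sink`. Because the transformer clears its stored batch after each run, calling `process` again without a fresh `set_inputs` raises an error instead of silently reusing stale data.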
from atc-dataplatform.