inmobi / archived-data-bus
License: Other
LocalStreamService, MergedStreamService, and MirrorStreamService use the following commit protocol to guarantee zero data loss.
Side effect - Databus (MergedStreamService, MirrorStreamService) can stop consuming data, which stalls DATABUS.
Current solution - Set up operational monitoring on the /databus/system/mirrors and /databus/system/consumers directories for each instance of DATABUS. If the number of files in either directory exceeds a threshold (10), stop DATABUS. Concatenate (cat) these files into a single file in the respective directory, checking that the file exists on HDFS and skipping it otherwise. Then start DATABUS again.
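The manual workaround above can be sketched roughly as follows. This is a minimal illustration on a local filesystem, not the DATABUS tooling itself: the function name `compact_listing_dir`, the output file name `merged`, and the reading that each file holds one data path per line are all assumptions made for the example, and `path_exists` stands in for an HDFS existence check. Stopping and starting DATABUS is not modeled.

```python
import os
import tempfile

THRESHOLD = 10  # file-count threshold mentioned in the text


def compact_listing_dir(listing_dir, path_exists=os.path.exists, threshold=THRESHOLD):
    """Merge the files in listing_dir into one file when there are too many.

    Assumption: each file lists one data path per line. Entries whose path
    no longer exists are skipped. Returns True if a compaction was done.
    """
    names = sorted(
        n for n in os.listdir(listing_dir)
        if os.path.isfile(os.path.join(listing_dir, n))
    )
    if len(names) <= threshold:
        return False  # below the threshold, nothing to do

    kept = []
    for name in names:
        with open(os.path.join(listing_dir, name)) as fh:
            for line in fh:
                path = line.strip()
                if path and path_exists(path):  # skip entries that are gone
                    kept.append(path)

    # Write the merged file under a temporary name first, so that a crash
    # halfway through never leaves a half-written "merged" file behind.
    fd, tmp_path = tempfile.mkstemp(dir=listing_dir, prefix=".merged-")
    with os.fdopen(fd, "w") as out:
        out.write("".join(p + "\n" for p in kept))
    for name in names:
        os.remove(os.path.join(listing_dir, name))
    os.replace(tmp_path, os.path.join(listing_dir, "merged"))
    return True
```

Against a real cluster, the directory listing, reads, and existence check would go through an HDFS client instead of `os`; the control flow would stay the same.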
Today MM can be a single-digit number. The same holds for the dd, hr, and mn fields.
Consumers that use OOZIE to trigger their workflows based on the existence of a directory run into issues, because OOZIE does an exact match.
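One way to make the directory names predictable for exact matching is to zero-pad every date field. The sketch below shows the idea only; the function name `minute_dir`, the base path `/databus/streams`, and the stream name `clicks` are hypothetical and not taken from the DATABUS code.

```python
from datetime import datetime


def minute_dir(base: str, stream: str, ts: datetime) -> str:
    """Build a per-minute directory path with fixed-width date fields."""
    # strftime pads month, day, hour, and minute to two digits,
    # so March is always "03" and never "3".
    return f"{base}/{stream}/{ts.strftime('%Y/%m/%d/%H/%M')}"


# Hypothetical base path and stream name, for illustration only.
print(minute_dir("/databus/streams", "clicks", datetime(2012, 3, 5, 7, 9)))
# prints /databus/streams/clicks/2012/03/05/07/09
```

With fixed-width fields, a consumer can compute the expected directory name from a timestamp alone and test for it with an exact match.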
Issue1 -
Issue2 -
Fix - this file should be created in the destination cluster where data is being pulled, under a timestamp-based name inside tmp.
LocalStreamService, MergedStreamService, and MirrorStreamService use the following commit protocol to guarantee zero data loss.
A transaction in DATABUS consists of moving multiple paths, and DATABUS can fail partway through, which makes atomicity tricky.
Fails after 1.(i) - No issues.
Fails after 1.(ii) - No issues. The next run will see that this path isn't copied and will pick up this PATH (no replay).
Fails after 1.(iii) - The next run will see that this PATH is already copied, skip it, and delete it from ZK (avoids data replay).
Fails after 1.(iv) - No issues.
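The recovery behaviour in the failure cases above amounts to an idempotent check on the next run: copy a path only if it is not already at the destination, and clear its record once it is settled. The sketch below illustrates that idea with in-memory sets standing in for ZK and the destination cluster. The function name `recover_pending` and its parameters are hypothetical, and clearing the record right after a fresh copy is an assumption about the protocol rather than something the text states.

```python
def recover_pending(pending, copied, copy_path):
    """Settle paths left over from a failed run without replaying data.

    pending   -- mutable set of paths still recorded in the ZK stand-in
    copied    -- mutable set of paths already present at the destination
    copy_path -- callable that copies one path to the destination
    Returns (recopied, skipped) as sorted lists.
    """
    recopied, skipped = [], []
    for path in sorted(pending):   # sorted() makes a copy, so the set can shrink
        if path in copied:
            skipped.append(path)   # already copied: skip it, no replay
        else:
            copy_path(path)        # not copied yet: this run picks it up
            copied.add(path)
            recopied.append(path)
        pending.discard(path)      # stand-in for deleting the record from ZK
    return recopied, skipped


# Hypothetical paths: one was copied before the failure, one was not.
pending = {"/streams/a", "/streams/b"}
copied = {"/streams/a"}
print(recover_pending(pending, copied, lambda p: None))
```

Because the check is repeated on every run, the function can itself be interrupted and rerun without copying any path twice.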
This issue is to add the Maven release plugin to the databus POM.
DataMover is resilient to failures while moving data to intermediate directories. It has a cleanupMode, which it enters on every run to fix any issues left by the last run.
If the previous run of DataConsumer wasn't successful, the tmpPath (/databus/system/tmp//jobIN) might hold old data that wasn't committed. A new run should clean up this directory to avoid a mismatch in the amount of data moved and any duplicate replays this could cause.
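The cleanup step described above could look something like the following. It is a local-filesystem sketch only; the function name `reset_tmp_dir` is hypothetical, and a real run would perform the same delete-and-recreate against HDFS.

```python
import shutil
from pathlib import Path


def reset_tmp_dir(tmp_path):
    """Remove leftovers from a previous run and return an empty tmp directory."""
    tmp = Path(tmp_path)
    if tmp.exists():
        shutil.rmtree(tmp)     # drop uncommitted data from the last run
    tmp.mkdir(parents=True)    # start the new run from an empty directory
    return tmp
```

Calling this at the start of every run, before any data is moved, means the amount of data in the tmp directory always reflects the current run only.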