
airavata-data-lake's Introduction

Apache Airavata Data Lake


Apache Airavata use cases capture data from observational and experimental instruments, and from computations run through Airavata's orchestration capabilities. As data volumes grow, harvesting the data, capturing its metadata, and presenting it for easy, controlled access become unmanageable.

The Airavata Data Lake will bundle standalone services to catalog data in various data stores, extract and capture semantics and metadata descriptions of the data, and preserve the associated data provenance. The data lake will provide APIs along with query and search capabilities to find and retrieve data programmatically, and to support building interactive user applications and data analysis applications on top of it.

Airavata Data Lake Overview

The Airavata Data Lake will provide file watchers and other trigger capabilities to ingest data from scientific instruments as it becomes available. The framework will support pluggable data parsers that read structured and unstructured data files and extract meaningful descriptions.
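The pluggable-parser idea can be illustrated with a minimal Python sketch. This is not the project's API: the registry `PARSERS`, the decorator `register_parser`, and the function `extract_metadata` are hypothetical names chosen for illustration, and the sketch assumes parsers are selected by file extension.

```python
import csv
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry (not part of Airavata): file extension -> parser function.
PARSERS: Dict[str, Callable[[Path], dict]] = {}


def register_parser(extension: str):
    """Register a parser for one file extension, e.g. ".csv"."""
    def decorator(func: Callable[[Path], dict]) -> Callable[[Path], dict]:
        PARSERS[extension.lower()] = func
        return func
    return decorator


@register_parser(".csv")
def parse_csv(path: Path) -> dict:
    """Describe a CSV file by its header columns and number of data rows."""
    with path.open(newline="") as handle:
        rows = list(csv.reader(handle))
    header = rows[0] if rows else []
    return {"format": "csv", "columns": header, "row_count": max(len(rows) - 1, 0)}


def extract_metadata(path: Path) -> dict:
    """Collect generic file metadata, then add whatever a matching parser extracts."""
    metadata = {"file_name": path.name, "size_bytes": path.stat().st_size}
    parser = PARSERS.get(path.suffix.lower())
    if parser is not None:
        metadata.update(parser(path))
    return metadata
```

In this sketch a file watcher would call `extract_metadata` for each newly arrived file. Files with no registered parser still receive the generic name and size fields.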

A bundled data replica catalog will associate pointers to data at multiple storage locations. The replica catalog maps logical file names to physical locations. Data Lake client SDKs will provide API functions to query replica locations and resolve a logical name into multiple physical file locations. The client will be bundled with access protocols to retrieve the data or to embed it into computational pipelines.
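The mapping from logical file names to physical locations can be sketched as a small in-memory structure. The names `Replica`, `ReplicaCatalog`, `register`, and `resolve`, as well as the example URIs, are assumptions made for illustration and do not reflect the actual Data Lake SDK.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    """One physical copy of a file (hypothetical fields)."""
    storage_id: str  # identifier of the storage system holding the copy
    uri: str         # physical location, including the access protocol


class ReplicaCatalog:
    """In-memory map from a logical file name to its physical replicas."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[Replica]] = defaultdict(list)

    def register(self, logical_name: str, replica: Replica) -> None:
        """Record a physical copy, ignoring exact duplicates."""
        if replica not in self._replicas[logical_name]:
            self._replicas[logical_name].append(replica)

    def resolve(self, logical_name: str) -> List[Replica]:
        """Return every known physical location, or an empty list if none."""
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
catalog.register("lfn://lab/run-001.dat", Replica("cluster-scratch", "sftp://hpc.example.org/scratch/run-001.dat"))
catalog.register("lfn://lab/run-001.dat", Replica("archive", "s3://example-bucket/run-001.dat"))
locations = [r.uri for r in catalog.resolve("lfn://lab/run-001.dat")]
```

A client would typically pick one of the returned locations based on proximity or the access protocols it supports. A production catalog would also need persistent storage and access control, which this sketch omits.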

By interfacing with the Airavata Managed File Transfer service, data can be moved and archived into longer-term persistent storage such as tape archives. The data archives will be indexed and searchable.

The Data Lake's provenance will capture the parameters that influenced the production or modification of the data. An abstraction API will enable plugging in fine-grained provenance based on the Airavata tenant context. By interfacing with Airavata Orchestration Services, pointers to the experiment catalog will enable reconstructing the underlying computations.

airavata-data-lake's People

Contributors

dimuthuupe, dinukadesilva, isururanawaka, machristie, smarru


airavata-data-lake's Issues

Improve readme

The readme for airavata-data-lake needs improvements:

  • Instructions to set up a dev environment
  • How to contribute section
  • Additional details about the architecture

Externalize Envoy Rest Proxy

Currently the Envoy proxy runs as a Docker container without a container management platform, which causes the service to shut down intermittently.
