Topic: datalake
Something interesting about datalake
datalake,AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
User: abdullahkhawer
datalake,Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Organization: activeloopai
Home Page: https://activeloop.ai
datalake,Self-managed thirdparty dependencies for Apache Doris
Organization: apache
Home Page: https://doris.apache.org
datalake,Apache Doris Website
Organization: apache
Home Page: https://doris.apache.org/
datalake,Upserts, Deletes And Incremental Processing on Big Data.
Organization: apache
Home Page: https://hudi.apache.org/
datalake,AWS Lake Formation makes it easy to set up, secure, and manage your data lakes. It also supports data discovery through the metadata search capabilities of Lake Formation in the console, with metadata search results restricted by column permissions.
Organization: aws-big-data-projects
datalake,A Data Platform built for AWS, powered by Kubernetes.
Organization: awslabs
Home Page: https://awslabs.github.io/aws-orbit-workbench/
datalake,Code/Notes for the Data Engineering Zoomcamp by DataTalksClub
User: balajirvp
datalake,A serverless datalake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. Through a series of best practices, it guides you in building a serverless datalake.
User: bluishglc
datalake,Use SQL to build ELT pipelines on a data lakehouse.
Organization: cuebook
Home Page: https://cuelake.cuebook.ai
datalake,Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Organization: datalinkdc
Home Page: http://www.dinky.org.cn
datalake,World's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.
Organization: datastrato
Home Page: https://datastrato.ai/docs/
datalake,Threat Detection and Visualization
Organization: datatech-solutions
datalake,A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Organization: datavault-uk
Home Page: https://www.automate-dv.com
datalake,The DataLake GraphQL Wrapper provides a GraphQL API for presto/trino.
Organization: dbsystel
datalake,Apiary provides modules which can be combined to create a federated cloud data lake
Organization: expediagroup
datalake,Terraform scripts for deploying Apiary Data Lake
Organization: expediagroup
Home Page: https://github.com/ExpediaGroup/apiary
datalake,A Git-like version control file system for data lineage & data collaboration.
Organization: gitdataai
Home Page: https://jiaozifs.com
datalake,A library to accelerate ML and ETL pipelines by connecting all data sources
User: hifxit
datalake,Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
User: izhangzhihao
datalake,The Internals of Delta Lake
Organization: japila-books
Home Page: https://books.japila.pl/delta-lake-internals
datalake,Awesome list for data pipelines
User: kennethanceyer
Home Page: https://github.com/KennethanCeyer/awesome-data-pipeline
datalake,LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Organization: lakesoul-io
Home Page: https://lakesoul-io.github.io/
datalake,Apache Spark 3 - Structured Streaming Course Material
User: learningjournal
Home Page: https://www.learningjournal.guru
datalake,Apache Spark Course Material
User: learningjournal
Home Page: https://www.learningjournal.guru
datalake,A collection of Apache Hudi resources
User: leesf
datalake,The LeoFS Storage System
Organization: leo-project
Home Page: https://leo-project.net/leofs/
datalake,Open Control Plane for Tables in Data Lakehouse
Organization: linkedin
Home Page: https://www.openhousedb.org/
datalake,Data Pipeline from the Global Historical Climatology Network DataSet
User: marcosmjd
datalake,This repository helps you learn Databricks concepts through examples. It covers all the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The end of the course also covers a few case studies.
User: martandsingh
datalake,Python idiomatic SDK for Cortex™ Data Lake.
Organization: paloaltonetworks
Home Page: https://cortex.pan.dev/docs/data_lake/develop/cdl_python_installation
datalake,Postgres for Search and Analytics
Organization: paradedb
Home Page: https://paradedb.com
datalake,A curated list of open source tools used in analytical stacks and data engineering ecosystem
Organization: pracdata
datalake,Terraform script to deploy almost all Azure Data Services
User: rlevchenko
Home Page: https://rlevchenko.com/2020/09/15/deploy-azure-data-services-with-terraform/
datalake,A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
User: samber
datalake,A platform for extracting and shipping security value from your data lake to Sentinel.
User: seyed-nouraie
datalake,Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
Organization: sinaptik-ai
Home Page: https://pandas-ai.com
datalake,StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Recipient of InfoWorld’s 2023 BOSSIE Award for best open source software.
Organization: starrocks
Home Page: https://starrocks.io
datalake,lakeFS - Data version control for your data lake | Git for data
Organization: treeverse
Home Page: https://docs.lakefs.io
datalake,Roota is a public-domain language of threat detection and response that combines native queries from a SIEM, EDR, XDR, or Data Lake with standardized metadata and threat intelligence to enable automated translation into other languages
User: uncoderio
Home Page: https://roota.io
datalake,An IDE and translation engine for detection engineers and threat hunters. Be faster, write smarter, keep 100% privacy.
User: uncoderio
Home Page: https://uncoder.io
datalake,Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
User: vim89
datalake,Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.
Organization: webankfintech
datalake,Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Organization: zinggai