obenner / data-engineering-interview-questions

More than 2000 data engineer interview questions.

data-engineering interview-questions interview hadoop hadoop-hdfs spark flink sql kafka hive impala airflow aws azure cassandra flume hbase avro nifi data-structures

data-engineering-interview-questions's Introduction

More than 2000 questions for preparing for a Data Engineer interview.

Interview questions for Data Engineers

Databases and Data Warehouses
GitHub Repo Official page Questions Description Useful links
Cassandra Cassandra Apache Cassandra Cassandra is a distributed, wide-column store, NoSQL database management system. Awesome Cassandra
Greenplum Greenplum Greenplum Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. Awesome Greenplum
MongoDB MongoDB MongoDB MongoDB is a document-oriented database. Awesome MongoDB
Hbase Hbase Apache Hbase HBase is an open-source non-relational distributed database. Awesome HBase
Hive Hive Apache Hive Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Awesome Hive
Amazon DynamoDB Amazon DynamoDB Amazon DynamoDB is a fully managed proprietary NoSQL database service. Awesome DynamoDB Awesome AWS
Amazon Redshift Amazon Redshift Amazon Redshift is a data warehouse product. Amazon Redshift Utilities Awesome AWS
BigQuery BigQuery GCP BigQuery is a fully-managed, serverless data warehouse. Awesome BigQuery
Bigtable Bigtable GCP Bigtable is a fully managed wide-column and key-value NoSQL database service. Awesome Bigtable
Data Formats
Avro Avro Apache Avro Avro is a row-oriented remote procedure call and data serialization framework. Awesome Avro
Parquet Parquet Apache Parquet Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval. TODO
Delta Delta Delta Delta Lake is a storage framework that enables building a Lakehouse architecture with compute engines. Delta examples
Big Data Frameworks
Airflow Airflow Apache Airflow Apache Airflow is a workflow management platform for data engineering pipelines. Awesome Airflow
Flume Flume Apache Flume Apache Flume is a distributed, reliable, and available software for efficiently collecting, aggregating, and moving large amounts of log data. TODO
Hadoop Hadoop Apache Hadoop Apache Hadoop is a collection of software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. Awesome Hadoop
Impala Impala Apache Impala Apache Impala is a parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop. TODO
Kafka Kafka Apache Kafka Apache Kafka is a distributed event store and stream-processing platform. Awesome Kafka
NiFi NiFi Apache NiFi Apache NiFi is a software project designed to automate the flow of data between software systems. Awesome NiFi
Spark Spark Apache Spark Apache Spark is a unified analytics engine for large-scale data processing. Awesome Spark
Flink Flink Apache Flink Apache Flink is a unified stream-processing and batch-processing framework. Awesome Flink
Kubernetes Kubernetes Kubernetes Kubernetes is a system for managing containerized applications across multiple hosts. Awesome Kubernetes
Cloud providers
AWS AWS Amazon Web Services Amazon Web Services is an online platform that provides scalable and cost-effective cloud computing solutions. Awesome AWS
Azure Azure Microsoft Azure Microsoft Azure is Microsoft's public cloud computing platform. Awesome Azure
GCP GCP Google Cloud Platform Google Cloud Platform is a suite of cloud computing services. Awesome GCP
Theory
DWHA DWH Architectures A data warehouse architecture is a method of defining the overall architecture of data communication, processing, and presentation that exists for end-client computing within the enterprise. Awesome databases
Data Structures Data Structures A data structure is a specialized format for organizing, processing, retrieving, and storing data. TODO
SQL SQL SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS). Awesome SQL
Data visualization tools/BI
Tableau Tableau Tableau is a powerful data visualization tool used in business intelligence. TODO
Looker Looker Looker is an enterprise platform for BI, data applications, and embedded analytics that helps you explore and share insights in real time. TODO
Superset Apache Superset Apache Superset Superset is a modern data exploration and data visualization platform. TODO

Contribution

Please contribute to this repository to help make it better. Any change, such as a new question, a code improvement, or a documentation improvement, is very welcome.

Contributors

khoramism, obenner, piyush-an, wingkwong
