
data-engineering-community / data-engineering-wiki


The best place to learn data engineering. Built and maintained by the data engineering community.

Home Page: https://dataengineering.wiki

License: Creative Commons Zero v1.0 Universal

CSS 72.40% JavaScript 27.60%
data data-engineer data-engineering data-modeling data-pipelines database etl sql

data-engineering-wiki's People

Contributors

adrianbr, chris2shehu, dejii, digitalghost-dev, icharo-tb, jphaus, mattppal, mihaitodor, mlee156, peterwzhang, pgyogesh, radujica, rskriegs, shaounak, sonia1goyal, tfehring, ykdojo

data-engineering-wiki's Issues

Creating a SQL "bible" of sorts

I've been learning SQL for the past half year and the concepts aren't sticking with me the way they do with Python. I've been making a dictionary of SQL concepts and tools, and I realized that the wiki could benefit from one. Here is an example from my Notion page.

What do you think about building a section or page for SQL-related concepts? I know there is already a SQL page, but a quick reference guide like this could be useful since SQL is a huge part of the data world.

Link Checker Report

Summary

Status Count
🔍 Total 1037
✅ Successful 1035
⏳ Timeouts 0
🔀 Redirected 0
👻 Excluded 1
❓ Unknown 0
🚫 Errors 1

Errors per input

Errors in Index.md

Link Checker Report

Summary

Status Count
🔍 Total 974
✅ Successful 970
⏳ Timeouts 0
🔀 Redirected 0
👻 Excluded 1
❓ Unknown 0
🚫 Errors 3

Errors per input

Errors in Tools/Metricflow.md

Errors in Tools/Databases/ClickHouse.md

Link Checker Report

Summary

Status Count
🔍 Total 1037
✅ Successful 1035
⏳ Timeouts 0
🔀 Redirected 0
👻 Excluded 1
❓ Unknown 0
🚫 Errors 1

Errors per input

Errors in Concepts/Graph Database.md

Side Pane Tool category not organized || Need more subfolders

When you expand the Tools section in the sidebar, you get categories from Data Ingestion through Workflow Orchestrators, and after that all the remaining tools are listed outside any category.

Could we add more folders, for example an AWS Cloud folder with subfolders such as Storage, put S3 inside that, and do the same for all the other tools?

Or, to keep it simple, put all AWS tools inside one AWS folder, all Azure-related tools inside an Azure folder, and so on.

To recreate this issue:

OR

  • Click on this link

If possible, assign this task to me and I can help organize it.

Link Checker Report

Summary

Status Count
🔍 Total 1037
✅ Successful 1031
⏳ Timeouts 0
🔀 Redirected 0
👻 Excluded 1
❓ Unknown 0
🚫 Errors 5

Errors per input

Errors in FAQ/FAQ.md

Errors in FAQ/What is the difference between a Data Engineer and X.md

Errors in Guides/Testing Your Data Pipeline.md

Errors in Guides/Data Governance Guide.md

Errors in Guides/SQL Guide.md

Link Checker Report

Summary

Status Count
🔍 Total 993
✅ Successful 991
⏳ Timeouts 0
🔀 Redirected 0
👻 Excluded 1
❓ Unknown 0
🚫 Errors 1

Errors per input

Errors in Concepts/Graph Database.md

Changing order of folders

@JPHaus, thank you for sharing the 'publish.js' file that you are using to change the order in which files appear in the navigation bar.
Do you know whether, and how, the publish.js file can be changed to also reorder the folders in the navigation bar, not just the notes?
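
One way to think about the ordering part of this question is sketched below. This is only a minimal illustration of sorting folder names by a preferred order. The function name `sortByPreferredOrder` and the example order are hypothetical and are not taken from the wiki's actual publish.js. How the sorted names would then be applied to the navigation bar depends on the markup Obsidian Publish generates, which this sketch does not assume.

```javascript
// Hypothetical helper (not from the wiki's publish.js): returns a new array
// with names listed in `preferredOrder` first, in that order, followed by
// all remaining names in alphabetical order.
function sortByPreferredOrder(names, preferredOrder) {
  // Map each preferred name to its position for quick lookup.
  const rank = new Map(preferredOrder.map((name, index) => [name, index]));

  // Copy the input so the caller's array is left untouched.
  return [...names].sort((a, b) => {
    const rankA = rank.has(a) ? rank.get(a) : Infinity;
    const rankB = rank.has(b) ? rank.get(b) : Infinity;
    if (rankA !== rankB) {
      return rankA < rankB ? -1 : 1;
    }
    // Both unlisted (or the same name): fall back to alphabetical order.
    return a.localeCompare(b);
  });
}

// Example with folder names that appear in the wiki; the preferred order
// itself is an assumption made for illustration.
const folders = ["Tools", "Concepts", "FAQ", "Guides"];
console.log(sortByPreferredOrder(folders, ["Guides", "Concepts"]));
```

Keeping the ordering logic in a pure function like this makes it easy to check on its own before wiring it into any page script.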

Learning DE Roadmap

A frequently asked question in the community is how to find a structured roadmap for learning data engineering, and it's about time we started addressing it. We currently have a getting started guide, but it's not detailed enough and was always meant to be improved.

It can be a complex question to answer, but we can simplify it by adding a few constraints. Since most of the people asking are new to DE or trying to transition, we should focus on skills for junior/entry-level and mid-level roles. While there aren't many junior roles at the moment, it can still be useful to make the distinction for foundational skills. To make it as general as possible, I believe we should exclude tools and requirements that only apply to FAANG-like companies, since they are more niche and FAANG companies have often developed their own internal tooling to solve their unique problems. Finally, the focus should be on core concepts instead of tooling. While we can mention specific tools, we should avoid directly recommending them and instead point learners to pages that list the currently popular tools, to keep this resource as evergreen as possible (example: workflow orchestration popular tools).

While I don't believe a diagram is a requirement, I do think it could be helpful if we can get it to render nicely in Mermaid, because we can then make it interactive and link to other notes in the wiki like we do with other diagrams. The canvas feature is not yet supported in Obsidian Publish, so we would probably use a Mermaid flowchart for now.

Existing popular roadmap shared in the community:

For V1, please share any thoughts/ideas/constructive criticism on the structure and core concepts. I'll start a new branch after Christmas and start something we can work from.

Feature: Project section?

Hey, I've noticed that the subreddit for r/DataEngineering tends to have the same question over and over about projects and people asking for examples. How about adding a section in the wiki for people's open source data engineering projects?

Contributing by adding content - how about outlining potential topics?

Hi everyone, I've read through the wiki and I think it's great, but I also think it lacks information on several concepts and tools that I personally find relevant for data engineers. What do you think about gathering a list of such topics first, as a sort of to-do list? I think it could enable more contributions, since potential contributors would be more encouraged if they knew exactly which topics they could focus on and whether their ideas are within the scope of a data engineering wiki.

Some examples of things that I think could be added, even if some of them are basic or straightforward:

  • entries on message brokers/queues: what they are, plus some sample tools such as Apache Kafka
  • infrastructure topics: virtualization, containerization, infrastructure as code, and tools like Docker, Kubernetes, and Terraform
  • data visualization, including some sample industry-standard tools (for instance Power BI) and open-source ones (such as Superset)
  • maybe some more programming topics (more languages, for example R or Rust, some OOP concepts, Big O notation, etc.)

Content on Data Contracts?

👋 hi folks - just discovered data-engineering-wiki today. What a great resource. With all the noise in the space, I love the idea of a shared, vendor-neutral space to create an in-depth handbook for the community.

I'm no expert on Data Contracts, but noticed that it's a hot topic these days, and I would love to have some reference content here. I'm wondering if y'all think that would be valuable and if so where that content might land. I'm not necessarily volunteering to author this (but I could probably be convinced to do some research and start a draft PR for collaboration 🙂).

Addition of more Tools || New pages

Add tools such as:

  • Power Automate, Power Apps

And some visualization tools such as:

  • Power BI, QlikView, Qlik Sense, Tableau

This will help users see all the relevant tools in one place.
