srikanthgr1 / databricks-certified-data-engineer-professional-questions

This project forked from amrit-hub/databricks-certified-data-engineer-professional-questions

This repo contains "Databricks Certified Data Engineer Professional" Questions and related docs.

Home Page: https://amrit-hub.github.io/Databricks-Certified-Data-Engineer-Professional-Questions/

Databricks Certified Data Engineer Professional Questions

Suggestions

Repo link

Topics

I noted down these topics from memory.

  1. Read parameters using dbutils.widgets.text and then dbutils.widgets.get
  2. Read access to a prod notebook for a new data engineer to review - Can Read
  3. Attach notebook to cluster and run - Can Restart
  4. Production DLT pipelines in job cluster or all-purpose cluster
  5. CTAS - executes the load every time or only at table creation
  6. Scope access control to read prod secrets - Read on scope or secret
  7. %sh runs on driver node only
  8. If there is filter in query - file statistics in transaction log
  9. Vacuum run on shallow clone table - Error
  10. static_df.join(streaming_df) - which is not possible- left/inner/right
  11. Source is CDC - use merge into or leverage CDF feature
  12. How to find difference between previous and present commit
  13. Nightly job to overwrite a table for the business team with least latency - write to table nightly or create view
  14. What is Optimize Table - File target of 1GB?
  15. from code - .withWatermark for data delayed by 10 minutes
  16. from code - aggregate on source and overwrite/append to target
  17. Email notification in job run where mean(temp) > 120. Received email 3 times, why?
  18. Checkpoint should be unique for each stream
  19. Autoloader scenario based question in bronze with history and do update in target
  20. Streaming deduplication scenario from code question
  21. from code - batch load will overwrite/append?
  22. in CDF, if readChangeFeed start version is 0, and append, will there be deduplication?
  23. from code - Upsert- identify table is SCD 1 or 2
  24. To avoid performance issue, decrease trigger time or not
  25. delta table file skipping column decision based question - first 32 columns, nested or not
  26. Grant usage and grant select on delta - what will it do?
  27. how to create unmanaged table?
  28. good candidate for partition column - date col
  29. Alter table xx rename xx - what happens in transaction log
  30. add check constraint error and recommendation
  31. tbl properties + comments + partition details - describe history/extended/detail?
  32. question on delta lake file statistics
  33. Ganglia UI - question on detecting spill
  34. repo branch missing on local - how to get that branch with latest code changes
  35. delete from A where id in (select id from B) - can we time travel and see those deleted records? and how to prevent?
  36. what is the difference between DBFS and mounts
  37. api 2.0/jobs/create is executed 3 times using a JSON. what will happen? will it execute or create 3 jobs?
  38. What is DBFS
  39. install python lib - %pip install
  40. Task 1 has downstream task 2 and 3 in parallel. 1 passes, 2 passes, 3 fails - partially completed?
  41. streaming job retries in prod - job cluster, unlimited retries and 1 max concurrent run
  42. clone existing job and version it- how to do using databricks cli
  43. large JSON of 1 TB converted to parquet files with partition size of 512 MB - since it's not a delta table - read > narrow transformation > repartition 2048 (1 TB * 1024 * 1024 / 512) > convert to parquet? order of these steps?
  44. from code - drop duplicates on batch read and append - what happens in the target table, is it deduplicated?
  45. one column was missed in prod from Kafka - to avoid this in future, write to bronze to keep a full replayable history
  46. question on giving access control to users
  47. pyspark.sql.functions.broadcast - what is the use - distribute to all worker nodes?
  48. from code - join on orders_id, when not matched - insert * - what it does
  49. def bronze_load is given. write silver load function so that it is transformed and updated downstream
  50. case when is_member("group") then email else 'redacted' end as email, lsv from table - what output if not member of group
  51. Ganglia UI - question on viewing logs
  52. how to get the tasks of a multi-task run - 2.0/jobs/list or get, or 2.0/jobs/runs/list or get
  53. what is unit testing
  54. in dev, multiple display() is executed repeatedly. what happens in prod?
  55. from delta, read has option("readChangeFeed") - will it work on source delta table with no CDC
  56. from code - identify tumbling or sliding window
  57. Question on performance tuning spark.sql.files.maxPartitionBytes, spark.sql.shuffle.partitions
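
The repartition figure in topic 43 is plain arithmetic: total data size divided by the target file size. A minimal sketch of that calculation in plain Python, assuming binary units (1 TB = 1024 * 1024 MB); `partition_count` is a hypothetical helper name, not a Spark or Databricks function.

```python
import math

MB_PER_TB = 1024 * 1024  # assumption: binary units, as in the topic note


def partition_count(total_tb: float, target_mb: float) -> int:
    """Number of partitions needed so each holds roughly target_mb of data."""
    total_mb = total_tb * MB_PER_TB
    # Round up so that no partition exceeds the target size.
    return math.ceil(total_mb / target_mb)


print(partition_count(1, 512))  # 2048
```

The result is the value you would pass to a repartition step before writing the parquet files.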

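For topic 56, the difference between a tumbling and a sliding window comes down to whether the slide interval equals the window length. The sketch below is plain Python with no Spark dependency; `windows_for` is a hypothetical helper and times are plain integers in minutes. It only illustrates the idea behind the `windowDuration` and `slideDuration` arguments of `pyspark.sql.functions.window`, not Spark's implementation.

```python
def windows_for(ts: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Return every [start, end) window of length `size`, starting at
    multiples of `slide`, that contains the event time `ts`."""
    start = (ts // slide) * slide  # latest window start at or before ts
    starts = []
    while start > ts - size:  # window [start, start + size) still covers ts
        starts.append(start)
        start -= slide
    return [(s, s + size) for s in sorted(starts)]


# Tumbling: slide == size, so each event falls in exactly one window.
print(windows_for(12, size=10, slide=10))  # [(10, 20)]

# Sliding: slide < size, so windows overlap and an event can fall in several.
print(windows_for(12, size=10, slide=5))   # [(5, 15), (10, 20)]
```

In exam code, a window call with only a duration is tumbling, while one with a shorter slide duration is sliding.
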
Must Read hyperlinks

Whatever else you do, please read these Databricks docs. Pay attention to the Important callouts on these pages and to the questions at the end of some pages.

  1. Data skipping with Z-order indexes for Delta Lake
  2. Clone a table on Databricks
  3. Delta table streaming reads and writes
  4. Structured Streaming Programming Guide - Spark 3.5.0 Documentation
  5. Configure Structured Streaming trigger intervals
  6. Configure Delta Lake to control data file size
  7. Introducing Stream-Stream Joins in Apache Spark 2.3
  8. Best Practices for Using Structured Streaming in Production - The Databricks Blog
  9. What is Auto Loader?
  10. Upsert into a Delta Lake table using merge
  11. Use Delta Lake change data feed on Databricks
  12. Apply watermarks to control data processing thresholds
  13. Use foreachBatch to write to arbitrary data sinks
  14. How to Simplify CDC With Delta Lake's Change Data Feed
  15. VACUUM
  16. Jobs access control
  17. Cluster access control
  18. Secret access control
  19. Hive metastore privileges and securable objects (legacy)
  20. Data objects in the Databricks lakehouse
  21. Constraints on Databricks
  22. When to partition tables on Databricks
  23. Manage clusters
  24. Export and import Databricks notebooks
  25. Unit testing for notebooks
  26. Databricks SQL Statement Execution API – Announcing the Public Preview
  27. Transform data with Delta Live Tables
  28. Manage data quality with Delta Live Tables
  29. Simplified change data capture with the APPLY CHANGES API in Delta Live Tables
  30. Monitor Delta Live Tables pipelines
  31. Load data with Delta Live Tables
  32. What is Delta Live Tables?
  33. Solved: Re: What is the difference between Streaming live ... - Databricks - 17121
  34. What are all the Delta things in Databricks?
  35. Parameterized queries with PySpark
  36. Recover from Structured Streaming query failures with workflows
  37. Jobs API 2.0
  38. OPTIMIZE
  39. Adding and Deleting Partitions in Delta Lake tables
  40. What is the Databricks File System (DBFS)?
  41. Mounting cloud object storage on Databricks
  42. Databricks widgets
  43. Performance Tuning

Contributors

amrit-hub