
mvp_pucrio_dataengineering

MVP Data Engineering - Data Science & Analytics Post-Graduation Program of PUC-Rio

Student: Dr. Vagner Zeizer Carvalho Paes

Professors: Tatiana Escovedo, Fernanda Baião, Marcos Villas, Anthony Seabra, Silvio Alonso, and Victor Almeida

Files and Folder Structure

  • ./

    • DataCatalog_MVPIII.xlsx: the data catalog of the Crimes in Paraná dataset in the bronze layer;
    • MVP_III_CrimesPR.ipynb: the Jupyter (IPython) notebook summarizing the MVP's results;
    • MVP_III_CrimesPR.pdf: a PDF export of the notebook MVP_III_CrimesPR.ipynb, which summarizes the MVP's results.
  • ./assets/

    • Screenshots that serve as evidence of the work done, shown and fully documented in the notebook MVP_III_CrimesPR.ipynb.
  • ./data/

    • cleaned_CrimesPR.csv: cleaned data (same as in the silver layer) on different types of crimes in the State of Paraná, Brazil, from 2018 to 2023, obtained by cleaning and transforming the raw data into a format suitable for subsequent analysis;
    • CrimesPR.csv: raw data (same as in the bronze layer) on different types of crimes in the State of Paraná, Brazil, from 2018 to 2023;
    • CrimesPR_Metropolitan_Curitiba_statistics.xlsx: descriptive statistics for the main municipalities in the Curitiba metropolitan area.
  • ./notebooks/

    • DataBricks_ETL_SQLqueries_PySpark.ipynb: the script used in Databricks to run the full ETL pipeline and perform the SQL queries;
    • EDA_CrimesPR.ipynb: a comprehensive Exploratory Data Analysis (EDA) in Python;
    • ETL.ipynb: the full Extract, Transform, and Load (ETL) pipeline in Python;
    • queries_crimes.sql: the SQL queries used to answer the business questions defined at the beginning of the project.
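
The step that turns the raw bronze-layer file (CrimesPR.csv) into the cleaned silver-layer file (cleaned_CrimesPR.csv) can be illustrated with the minimal sketch below. It is not the code from ETL.ipynb: the column names (municipio, ano, ocorrencias), the semicolon delimiter, and the function names bronze_to_silver and write_silver are hypothetical assumptions, since the real schema is documented in DataCatalog_MVPIII.xlsx. The sketch trims and normalizes municipality names, casts numeric fields, drops rows with missing counts, and removes exact duplicates, using only the Python standard library.

```python
import csv
import io
from typing import Dict, List, TextIO

# HYPOTHETICAL bronze-layer sample: the real CrimesPR.csv columns and
# delimiter may differ (see DataCatalog_MVPIII.xlsx for the actual schema).
RAW_CSV = """municipio;ano;ocorrencias
 CURITIBA ;2018;10
Curitiba;2019;
Londrina;2019;7
Londrina;2019;7
"""

FIELDS = ["municipio", "ano", "ocorrencias"]  # hypothetical column names


def bronze_to_silver(raw_text: str, delimiter: str = ";") -> List[Dict]:
    """Clean raw rows: normalize names, cast types, drop incomplete rows and duplicates."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    seen = set()
    cleaned = []
    for row in reader:
        municipality = row["municipio"].strip().title()  # " CURITIBA " -> "Curitiba"
        count_text = row["ocorrencias"].strip()
        if not municipality or not count_text:
            continue  # drop rows with missing values
        record = (municipality, int(row["ano"]), int(count_text))
        if record in seen:
            continue  # drop exact duplicates
        seen.add(record)
        cleaned.append(dict(zip(FIELDS, record)))
    return cleaned


def write_silver(rows: List[Dict], stream: TextIO) -> None:
    """Write the cleaned rows as comma-separated CSV to any text stream or open file."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    silver_rows = bronze_to_silver(RAW_CSV)
    print(silver_rows)
    # [{'municipio': 'Curitiba', 'ano': 2018, 'ocorrencias': 10}, {'municipio': 'Londrina', 'ano': 2019, 'ocorrencias': 7}]
```

Passing an open file handle (e.g. one opened with newline="") to write_silver would persist the silver layer to disk; the project itself performs these steps in ETL.ipynb and in Databricks with PySpark.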

LICENSE

The data used in this project are public and were made available by the Brazilian government under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license allows anyone to reuse, distribute, and modify the data for any purpose, including commercial purposes, as long as appropriate credit is given to the original source.

Contributors

  • vzeizer
