mikelvallejo / proyecto_tl

This project forked from ironhack-data-madrid-abril-2021/proyecto_tl

Data cleaning, transformation and load.

Rich Text Format 0.12% Jupyter Notebook 96.63% Python 3.25%

proyecto_tl's Introduction

TL Project

Introduction

In this project, we focus on the transformation and load of a specific dataframe. After that, we can analyze the data and get insights about shark attacks around the world in past years.

You can find 3 main files in my repo:

  • The TL Project_final.py Python file, where you can find the clean code.
  • The Playground.ipyb Jupyter notebook, where you can see how the data was cleaned and the checks that were made in the process.
  • The .sql files, where you can see the queries I used to upload the clean set to SQL in order to get the insights.

Instructions for the analyst - how data was modified

At first the dataset was a bit messy and full of nulls. These are the main changes that were made during the transformation and data cleaning process:

  • All the rows in which every value was null were deleted.
  • After that, the dataframe was reduced by deleting every row with fewer than 3 non-null values.
  • No columns were deleted during this approach.
  • Null values in the columns were replaced with the value 'Unknown', with 3 exceptions: the pdf column ('no pdf') and both href columns ('no link').
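
The rules above could be sketched with pandas roughly as follows. This is a minimal illustration, not the code from TL Project_final.py: the function name `clean_attacks`, the sample rows, and the second href column name (`href formula`) are assumptions made for the example.

```python
import pandas as pd


def clean_attacks(df, pdf_col="pdf", href_cols=("href", "href formula")):
    """Apply the cleaning rules listed above and return a new dataframe.

    The default column names are assumptions; adjust them to the real data.
    """
    # Rule 1: drop rows where every value is null.
    out = df.dropna(how="all")
    # Rule 2: drop rows with fewer than 3 non-null values.
    out = out.dropna(thresh=3)
    # Rule 4: fill the remaining nulls, with special labels for pdf and href.
    fill_values = {col: "Unknown" for col in out.columns}
    if pdf_col in out.columns:
        fill_values[pdf_col] = "no pdf"
    for col in href_cols:
        if col in out.columns:
            fill_values[col] = "no link"
    return out.fillna(value=fill_values)


# Made-up rows for illustration only; they are not taken from the dataset.
raw = pd.DataFrame(
    {
        "Country": ["USA", None, "AUSTRALIA", "USA", None],
        "Type": ["Unprovoked", None, None, "Unprovoked", "Provoked"],
        "Activity": ["Surfing", None, None, "Swimming", "Fishing"],
        "pdf": ["a.pdf", None, None, None, "e.pdf"],
        "href": ["http://example.com/a", None, None, None, None],
    }
)

cleaned = clean_attacks(raw)
print(cleaned)
```

Rule 3 (no columns deleted) holds because both `dropna` calls work on rows only. The first `dropna` is already implied by the second, but it is kept so that the sketch mirrors the order of the steps described above.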

Insights

  • Most of the attacks happen in summer, and there are fewer in winter.
  • Attacks were registered mainly in the USA and Australia.
  • Shark attacks were mostly unprovoked and non-fatal, and mostly affected males.
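
As a rough idea of how counts like these can be obtained once the clean set is in SQL, the sketch below loads a small dataframe into an in-memory SQLite database and groups it by country. It is only an illustration: the SQL engine, the table name `attacks`, the column names and the sample rows are all assumptions, and the queries in the .sql files of the repo may look different.

```python
import sqlite3

import pandas as pd

# Made-up rows for illustration only; they are not taken from the dataset.
attacks = pd.DataFrame(
    {
        "country": ["USA", "USA", "AUSTRALIA", "USA", "AUSTRALIA", "BRAZIL"],
        "attack_type": [
            "Unprovoked",
            "Unprovoked",
            "Unprovoked",
            "Provoked",
            "Unprovoked",
            "Unprovoked",
        ],
    }
)

# Load the dataframe into a throwaway in-memory SQLite database.
con = sqlite3.connect(":memory:")
attacks.to_sql("attacks", con, index=False)

# Count attacks per country, most frequent first.
by_country = pd.read_sql_query(
    """
    SELECT country, COUNT(*) AS n_attacks
    FROM attacks
    GROUP BY country
    ORDER BY n_attacks DESC, country
    """,
    con,
)
con.close()

print(by_country)
```

The same pattern, with a different `GROUP BY` column, would cover the other breakdowns (season, attack type, fatality, sex).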

proyecto_tl's People

Contributors

mikelvallejo, yonatanra
