project-webscraping-apis's Introduction

Welcome to the Data Thieves Project!

In this project you will get all the data yourself! 😱 By now, getting pre-made databases is way too simple for you. It's high time you started leveraging your freshly learnt skills and put your inner hacker into practice. Discover all the possibilities that web data has to offer; use your creativity and discuss as a group how you can reach the next level!

There will be plenty of projects where the focus will be on doing analysis. For this one, we want you to develop your own product or new features for an existing business. You are welcome to use the analytical tools you have learned as well, but the focus should be on functionality this time.

Project Description

In this project, you will choose a topic and collect all the data by yourself. Luckily, there are hundreds of free APIs you can pull data from, and for the cases where there's no API, you'll have a chance to practice your cool web-scraping skills. If needed, you can later enrich your data with other data sources. Always with a business sense in mind, you'll have a chance to dive deep into a topic of your choosing and come up with a new product or new features. You don't need to reinvent the wheel; sometimes looking at things from a different perspective is all it takes to develop great solutions for specific users.
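As a rough starting point for the API side, the sketch below uses only the Python standard library to build a request URL, fetch a JSON response, and keep the fields you care about. The endpoint `https://api.example.com/v1/items`, the parameter names, and the helper names (`build_url`, `fetch_json`, `pick_fields`) are all hypothetical; every real API has its own URL scheme, authentication, and rate limits, so check its documentation first.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_url(base_url, params):
    """Append query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    # The User-Agent value is an arbitrary placeholder.
    request = Request(url, headers={"User-Agent": "data-thieves-project"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_fields(records, fields):
    """Keep only the fields of interest from each record."""
    return [{name: record.get(name) for name in fields} for record in records]


if __name__ == "__main__":
    # Hypothetical endpoint and parameters, for illustration only.
    url = build_url("https://api.example.com/v1/items", {"q": "mario kart", "page": 1})
    print(url)
    # data = fetch_json(url)  # uncomment once you point at a real API
```

Many projects use the third-party `requests` package for the same job; the standard-library version is shown here only so the sketch runs without installing anything.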

Project Goals

  • Learn how to develop an interesting question and find the data to answer it.
  • Learn how to obtain data from different sources, including APIs, web-scraping and maybe even building some automated scrapers.
  • Your data and pipeline should form the core of the project and enable you to build your business idea.
  • In case you need additional data to improve the quality of your project, feel free to enrich it with other open datasets available online.
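For the web-scraping goal, here is a minimal sketch, assuming the page you want exposes the items of interest as `<h2>` headings. It uses the standard library's `html.parser`; the class and function names (`HeadingScraper`, `scrape_headings`) and the choice of tag are illustrative assumptions, and a real page will need its own selectors.

```python
from html.parser import HTMLParser


class HeadingScraper(HTMLParser):
    """Collect the text of every <h2> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._inside_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._inside_h2 = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is still collected.
        if self._inside_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._inside_h2:
            self.headings.append("".join(self._buffer).strip())
            self._inside_h2 = False


def scrape_headings(html):
    """Return the text of all <h2> headings found in an HTML string."""
    parser = HeadingScraper()
    parser.feed(html)
    parser.close()
    return parser.headings


if __name__ == "__main__":
    sample = "<html><body><h2> First </h2><p>x</p><h2>Second <b>bold</b></h2></body></html>"
    print(scrape_headings(sample))
```

Third-party parsers such as Beautiful Soup are a common alternative for messier pages. Whichever tool you use, check the site's terms of use and `robots.txt` before scraping it.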

Requirements

  • You must plan your project. That is why creating a Kanban or Trello Board is mandatory. You have a template for Trello here.
  • You CAN'T CODE until your project is planned.
  • Create a .gitignore file and include it in your repository.
  • Your project must include data from (at least) 2 different data sources (APIs & web, dataset & APIs, ...)
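One way to think about the two-source requirement is as a join: records from one source are enriched with fields from the other through a shared key. The sketch below is only an illustration with plain dictionaries; the function name `enrich`, the key `title`, and the sample records are made up, and in practice you might prefer a DataFrame merge.

```python
def enrich(primary, extra, key):
    """Left-join `extra` onto `primary` using a shared key.

    Rows in `primary` without a match are kept unchanged.
    """
    lookup = {row[key]: row for row in extra}
    merged = []
    for row in primary:
        match = lookup.get(row[key], {})
        # Values from `primary` win when both sources define a field.
        merged.append({**match, **row})
    return merged


if __name__ == "__main__":
    # Made-up records standing in for an API result and a scraped table.
    from_api = [{"title": "A", "year": 2001}, {"title": "B", "year": 2005}]
    from_scraping = [{"title": "A", "rating": 8.1}]
    print(enrich(from_api, from_scraping, key="title"))
```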

Deliverables

You are required to turn in the following:

  1. GitHub Repository with scripts and notebooks used, README and a requirements.txt file.
  2. The README file should describe your project and explain how the pipeline works.
  3. The requirements file should list all the packages someone needs to run your scripts/notebooks. Always assume that whoever uses your code will do so in a new environment, where the only packages installed are the ones your project requires to run. Ideally, each package version should also be specified (e.g. scipy==1.4.1).
  4. Links to any external data you used (these should also be in the README file).
  5. Slides for your presentation.
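To illustrate the pinned-version format asked for in the requirements file, the sketch below builds `name==version` lines for a list of packages using `importlib.metadata` from the standard library. The function name `pinned_requirements` and its `lookup` parameter are assumptions made for this example; running `pip freeze` inside a clean virtual environment is the more common way to get the same information.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages, lookup=version):
    """Return requirements.txt content with one `name==version` line per package.

    `lookup` maps a package name to its installed version; it defaults to
    importlib.metadata.version and can be swapped out for testing.
    """
    lines = []
    for name in sorted(packages):
        try:
            lines.append(f"{name}=={lookup(name)}")
        except PackageNotFoundError:
            # Not installed in this environment: leave it unpinned.
            lines.append(name)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # List only the packages your project actually imports.
    print(pinned_requirements(["scipy", "pandas"]), end="")
```

Writing the returned string to `requirements.txt` lets someone else recreate the environment with `pip install -r requirements.txt`.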

Mentoring

Your TAs and LT will be your mentors! We will:

  • Follow your project in general; after your group, your mentor will be the person who knows the most about it.
  • Check whether you are keeping up with your tasks, what your blockers are, etc.
  • Help/support you in specific questions.

Your mentor is NOT meant to:

  • Know everything.
  • Be your manager. You have to be the responsible person to do the tasks!

Schedule

This one is on you! You already have a couple of projects under your belt and are getting better and better at organising and working as a team. Let's keep building that momentum.

README File structure

The README will be your paper and acts as a guide to your project. From describing the idea to walking through how to use your scripts, the README is the welcome file for any user who encounters your project.

The structure should be:

  1. Title of the project
  2. Introduction to your project.
  3. Explanation of how your pipeline works (APIs and/or web-scraping, depending on each case).
  4. Links to sources of external data you used to enrich your project (if applicable).
  5. Conclusions after your analysis / product / feature(s) development.
  6. Further questions.

Presentation

You will have 10 minutes to present your project. Below are some ideas for slides you could include in your presentation; those marked with an (M) are mandatory!

  • (M) Title of the project
  • (M) Business idea.
  • (M) Technical developments.
  • Main challenges & strengths.
  • Product / Feature showcasing.
  • Main insights.
  • Questions you couldn't answer.
  • Something funny that happened during the project.
  • Things you learned during this project.
  • If you could start from scratch, what would you do differently?

Tips & Tricks

  • First, choose your topic and look for APIs that can provide the data you need.
  • Before you start coding and integrating more data, propose some interesting questions you could answer with the data you have.

Resources

Lists

AnyAPI
Top 50 Most Popular APIs on RapidAPI
18 Fun APIs For Your Next Project

Some Ideas

WeatherBit
Strava
GitHub
Twitter
LastFM
Spotify
NYTimes
News
Reddit
Medium
Twitch
IGDB
OMDB
GIPHY
StackExchange
YouTube
TheSportsDB
NBA API

Paper Examples

Data Analysis with Python
The Best Mario Kart Character According To Data Science
