Free Code Camp Articles Scraper

This is a freeCodeCamp.org articles scraper, created to practice data mining from websites in Ruby.

This scraper collects data from the first pages of freeCodeCamp.org and loads each article's title, author and tag. It then processes that data to find, for example, the authors who published the most articles recently and the most used tags.
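The aggregation step can be pictured with a small Ruby sketch. It is not the project's code. The `Article` struct and the `rank_by` helper are hypothetical, and the sketch assumes Ruby 2.7+ for `Enumerable#tally`. It counts how often each author or tag appears, sorts by frequency, and returns every result unless a limit is given.

```ruby
# Hypothetical article record; the project's own classes may differ.
Article = Struct.new(:title, :author, :tag)

# Count how often each value of `field` appears and rank by frequency
# (highest first, ties broken alphabetically). When `limit` is nil,
# every result is returned; otherwise only the first `limit` entries.
def rank_by(articles, field, limit = nil)
  ranked = articles.map(&field).tally.sort_by { |value, count| [-count, value] }
  limit ? ranked.first(limit) : ranked
end

articles = [
  Article.new('Intro to Ruby', 'Ana', 'Ruby'),
  Article.new('Scraping 101', 'Ben', 'Web Scraping'),
  Article.new('Blocks and Procs', 'Ana', 'Ruby')
]

rank_by(articles, :author)  # => [["Ana", 2], ["Ben", 1]]
rank_by(articles, :tag, 1)  # => [["Ruby", 2]]
```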

Built With

  • Ruby
  • HTTParty (requests)
  • Nokogiri (html parsing)
  • Byebug (debugging)
  • RSpec (tests)
  • Rubocop (linter)

Getting started

Prerequisites

  • To run this project, you must have Ruby installed (you can get it from the official Ruby website, ruby-lang.org).
  • To run the tests, you need to install RSpec: $ gem install rspec

Setup

  • Clone this repository to your local machine or download the files.
  • Run $ bundle install to install gem dependencies.

Usage

  • Navigate to the project folder.

  • Run the following command on terminal:

    $ ./bin/main.rb
    
  • To run the specs for this project, use the following command.

    $ rspec -f d
    
  • To adjust the number of pages to scrape, change the argument passed to Scraper.scrap(numberOfPagesToScrap). The current value is 20, which fetches the 500 latest articles; if no argument is given, the default is 10 pages. Please avoid large values here: there is no reason to overload freecodecamp.org with our requests.

    articles = Scraper.scrap(20)  # line 20 of main.rb
    
  • To change the number of filtered results, change the second argument passed to the Filter class methods. If no second argument is given, all results are returned.
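The page-count trade-off can be made concrete with a hedged sketch of a page-limited scrape that defaults to 10 pages. Several details are assumptions, not the project's API: the URL pattern `https://www.freecodecamp.org/news/page/N/`, the figure of 25 articles per page (inferred from 20 pages yielding 500 articles), and the helper names `page_urls` and `scrape_pages`. The fetcher is passed in so the sketch runs offline; in practice it might wrap `HTTParty.get(url).body`.

```ruby
# Assumed pagination scheme for freeCodeCamp News (not taken from the project).
BASE_URL = 'https://www.freecodecamp.org/news'
# Inferred from the README: 20 pages -> 500 articles.
ARTICLES_PER_PAGE = 25

# Hypothetical helper: URLs for the first `pages` listing pages.
# The default of 10 mirrors the default described for Scraper.scrap.
def page_urls(pages = 10)
  (1..pages).map { |n| n == 1 ? "#{BASE_URL}/" : "#{BASE_URL}/page/#{n}/" }
end

# Hypothetical helper: fetch each page in order, sleeping `delay` seconds
# between requests to keep the load on the server low. `fetch` is any
# callable that takes a URL and returns the page body, for example
# ->(url) { HTTParty.get(url).body }.
def scrape_pages(pages = 10, delay: 1, fetch:)
  page_urls(pages).each_with_index.map do |url, index|
    sleep(delay) if index.positive?
    fetch.call(url)
  end
end

page_urls.size                          # => 10
page_urls(20).size * ARTICLES_PER_PAGE  # => 500
```

Injecting the fetcher keeps the pagination and throttling logic separate from HTTP, so it can be checked without sending any requests to freecodecamp.org.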

User Interface

  • After running the application, choose one of the options displayed in the panel. The options are shown again after each choice until the 'Exit' option is chosen.

     [=====================( Options Available )======================]
    
     1. List top 10 cover tags from articles
     2. List top 10 most published authors
     3. List last 25 articles
     4. List all cover tags from articles
     5. List all loaded articles title
     6. Exit
    
     >> Choose an option: 
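The options loop described above can be sketched as follows. This is an illustration rather than the project's implementation. `run_menu` and the `handlers` hash are hypothetical, and input/output are injectable (defaulting to `$stdin`/`$stdout`) so the loop can be exercised with `StringIO`.

```ruby
require 'stringio'

MENU = [
  'List top 10 cover tags from articles',
  'List top 10 most published authors',
  'List last 25 articles',
  'List all cover tags from articles',
  'List all loaded articles title',
  'Exit'
].freeze

# Print the panel, read a choice and repeat until option 6 (Exit) is chosen.
# `handlers` maps option numbers to callables that receive the output IO.
def run_menu(handlers, input: $stdin, output: $stdout)
  loop do
    output.puts '[=====================( Options Available )======================]'
    MENU.each_with_index { |label, i| output.puts " #{i + 1}. #{label}" }
    output.print ' >> Choose an option: '
    line = input.gets
    break if line.nil?            # end of input: stop instead of looping forever
    choice = line.strip.to_i
    break if choice == MENU.size  # 6. Exit
    handler = handlers[choice]
    handler ? handler.call(output) : output.puts('Invalid option.')
  end
end

# Example: simulate choosing option 1, an invalid option, then Exit.
out = StringIO.new
run_menu({ 1 => ->(io) { io.puts 'top tags...' } },
         input: StringIO.new("1\n9\n6\n"), output: out)
```

Treating end of input as Exit keeps the loop from spinning forever if stdin is closed.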
    

👤 Author

๐Ÿค Contributing

Contributions, issues and feature requests are welcome! Feel free to check the issues page.

Show your support

Give a โญ๏ธ if you like this project!

Acknowledgments

All data gathered here came from freeCodeCamp.org website and is used only for study. When using this project, please avoid creating many requests on their server.

Contributors

flpfar