HTML Entries

This Ruby gem helps to fetch entry data from HTML according to the instructions you provide.

Installation

Console:

$ gem install html_entry

Gemfile:

gem 'html_entry', '~> 0.1.0'

Usage

Extended examples

Please take a look at an extended version of "Quick Start" in the file tests/example/fetch_url.rb.

Please review the tests for further examples.

Quick Start

This example shows how to fetch the top questions from the https://stackoverflow.com/ home page.

require 'pp'
require 'nokogiri'
require 'open-uri'
require 'html_entry'

fetcher              = HtmlEntry::PageFetcher.new
fetcher.instructions = {
    # CSS selector for the block that contains an entry.
    # It may match several blocks, as in this example.
    block: {
        type:     :selector,
        selector: '#question-mini-list .question-summary',
    },
    # Next, describe the attributes of each entity.
    entity:
           [
               # If the fetched node(s) hold data for several attributes,
               # describe all of them.
               # In this example: the question "summary" and "url".
               {
                   selector: '.summary h3 a',
                   data:     {
                       summary: {},
                       url:     {
                           type:      :attribute,
                           attribute: 'href',
                       },
                   }
               },
               {
                   selector: '.votes > .mini-counts > span',
                   data:     {
                       votes: {
                           filter: :to_i
                       }
                   }
               },
           ]
}

# With open-uri, use URI.open: Kernel#open no longer opens URLs since Ruby 3.0.
items = fetcher.fetch Nokogiri::HTML(
    URI.open('https://stackoverflow.com/')
)

# show items in terminal
items.each do |item|
  puts <<-OUTPUT
Question: "#{item[:summary]}"
votes:    #{item[:votes]}

OUTPUT
end
puts 'Element data:'
puts
pp items.last

Output

Question: "Can't see the alerts I created on Azure Portal"
votes:    4

[ ... ]

Element data:

{:summary=>"Scrapy on aws ec2 ubuntu redirect for booking.com",
 :url=>"/questions/52056897/scrapy-on-aws-ec2-ubuntu-redirect-for-booking-com",
 :votes=>2,
}
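
The `:url` value above is relative to the site root, because it is the raw `href` attribute. If an absolute URL is needed, one option is to resolve it against the page address with `URI.join` from Ruby's standard library. This is a small sketch, separate from html_entry; the variable names are illustrative.

```ruby
require 'uri'

# Address of the page the entries were fetched from.
base = 'https://stackoverflow.com/'

# Relative href as returned in item[:url].
relative = '/questions/52056897/scrapy-on-aws-ec2-ubuntu-redirect-for-booking-com'

# Resolve the relative reference against the base address.
absolute = URI.join(base, relative).to_s
puts absolute
```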
