linkser's Introduction

Linkser

Linkser is a link parser for Ruby. It takes a URI, tries to dereference it, and returns the relevant information about the resource.

Installation

Add gem 'linkser' to your Gemfile and run bundle update.

Using Linkser

l = Linkser.parse 'https://github.com/ging/linkser'
l.title #=> "linkser"
l.description #=> "linkser - Linkser is a link parser for Ruby. It gets an URI, tries to dereference it and returns the relevant information about the resource."

y = Linkser.parse 'http://youtube.com/someyoutubevideo'
y.title #=> the title of the video
y.images #=> the thumbnails of the video
y.resource.url #=> the url of the video

linkser's People

Contributors

atd, deevis, rafaelgg, roendal, rubenrails

linkser's Issues

invalid byte sequence in UTF-8

1.9.3p374 :001 > require 'linkser'
 => true 
1.9.3p374 :002 > l = Linkser.parse 'http://sports.163.com/nba/'
 => #<Linkser::Objects::HTML:0x007f92019e99c8 @url="http://sports.163.com/nba/", @last_url="http://sports.163.com/nba/", @head=#<Net::HTTPOK 200 OK readbody=true>, @options={}> 
1.9.3p374 :003 > l.title
encoding error : input conversion failed due to input error, bytes 0xC4 0x4E 0x42 0x41
 => "NBA,NBAֱҥ,\xD7钭ㄒ档" 
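The report does not say which charset the page is served in. A minimal sketch of one possible workaround, assuming the raw body is in a non-UTF-8 charset such as GBK and that the charset name is known (for example from the Content-Type header), is to transcode the bytes to UTF-8 before parsing. The to_utf8 helper is hypothetical and is not part of Linkser:

```ruby
# Hypothetical helper, not part of Linkser: transcode a raw response body
# from a known source charset to UTF-8, replacing bytes that cannot be mapped.
def to_utf8(raw, charset)
  raw.dup
     .force_encoding(charset)                              # label the bytes with their real charset
     .encode('UTF-8', invalid: :replace, undef: :replace)  # transcode, never raise on bad bytes
end

# Simulate a GBK-encoded body as it would arrive off the wire (raw bytes).
gbk_body = "\u76F4\u64AD".encode('GBK').force_encoding('BINARY')

title = to_utf8(gbk_body, 'GBK')
puts title.encoding       # UTF-8
puts title.valid_encoding?
```

The charset argument is an assumption here; in practice it would have to be detected from the response headers or a meta tag.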

build_images

If a web page has no images, Linkser retrieves images from its CSS files. This can take a long time to return results if the site has many CSS files.

Use last_url and dynamic scheme

A lot of sites redirect from http to https. We need to take that into account when generating the complete URL for images.

> l = Linkser.parse "http://facebook.com"
Redirecting to https://facebook.com/
Redirecting to https://www.facebook.com/
=> #<Linkser::Objects::HTML:0x007f82a9ea8930 @url="http://facebook.com", @last_url="https://www.facebook.com/", @head=#<Net::HTTPOK 200 OK readbody=true>, @options={}>

> l.images
=> []

The problem here is that the complete_url method uses the original URL being parsed instead of last_url, the URL reached after following the redirects.

Also, within the complete_url method we need to use a dynamic scheme string, instead of always assuming the http:// scheme.
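One way to get both behaviours is to resolve each image path against the final URL with Ruby's standard URI.join, which inherits the scheme from the base. This is only a sketch: resolve_image_url is a hypothetical name, not Linkser's actual complete_url, and base_url is assumed to be the value of last_url.

```ruby
require 'uri'

# Hypothetical helper, not Linkser's complete_url.
# base_url is assumed to be the URL reached after redirects (last_url).
def resolve_image_url(src, base_url)
  # URI.join resolves relative, root-relative and protocol-relative
  # references against the base, keeping the base's scheme where needed.
  URI.join(base_url, src).to_s
end

base = 'https://www.facebook.com/'

puts resolve_image_url('/images/logo.png', base)            # root-relative path
puts resolve_image_url('//static.example.com/a.png', base)  # protocol-relative URL
puts resolve_image_url('http://example.com/b.png', base)    # already absolute
```

Because the scheme comes from the base URL, nothing in the helper hard-codes http://.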

Remove leading and trailing whitespace from img_src

An accidental tab or newline in the src attribute, as in <img src="\nhttp://example.ads.com/sample.jpg"/>, will make Linkser think that it's a relative path (i.e. it doesn't begin with 'http' or '/') and thus concatenate it as a relative path, raising the following error:

URI::InvalidURIError (bad URI(is not URI?)
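A minimal sketch of the proposed fix, assuming the src value is available as a plain String before it is classified as absolute or relative. The clean_src name is hypothetical and does not exist in Linkser:

```ruby
require 'uri'

# Hypothetical helper, not part of Linkser: remove leading and trailing
# whitespace (spaces, tabs, newlines) from an img src value.
def clean_src(src)
  src.strip
end

raw_src = "\n\thttp://example.ads.com/sample.jpg "
src = clean_src(raw_src)

puts src.start_with?('http')  # the absolute-URL check now sees the real prefix
puts URI.parse(src).host      # parses without raising URI::InvalidURIError
```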

complete_url fails if src is nil

If an image doesn't have a src attribute (e.g. a placeholder image for CSS or JS image replacement), Nokogiri will still return that element, causing the complete_url method to raise the following error:

NoMethodError: undefined method 'index' for nil:NilClass
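With Nokogiri, reading a missing attribute (for example img['src']) returns nil, so one option is to filter those values out before building URLs. The sketch below works on a plain array of src values so that it runs without Nokogiri. usable_srcs is a hypothetical name, and how Linkser collects the values is an assumption:

```ruby
# Hypothetical helper, not part of Linkser: keep only src values that can
# actually be turned into a URL, dropping nil and blank entries.
def usable_srcs(srcs)
  srcs.reject { |src| src.nil? || src.strip.empty? }
end

# Values as they might come back from a page: one image has no src
# attribute (nil) and one has an empty attribute.
collected = [nil, '', '/images/logo.png', 'http://example.com/a.jpg']

usable_srcs(collected).each { |src| puts src }
```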
