charlotte-ruby / image_scraper
simple utility that pulls image URLs from a web page
License: MIT License
I noticed the following build warnings on CI.
From CI
/home/travis/build/charlotte-ruby/image_scraper/lib/image_scraper/client.rb:10: warning: URI.escape is obsolete
/home/travis/build/charlotte-ruby/image_scraper/lib/image_scraper/client.rb:18: warning: calling URI.open via Kernel#open is deprecated, call URI.open directly or use URI#open
/home/travis/build/charlotte-ruby/image_scraper/lib/image_scraper/client.rb:38: warning: URI.escape is obsolete
/home/travis/build/charlotte-ruby/image_scraper/lib/image_scraper/client.rb:39: warning: URI.escape is obsolete
/home/travis/build/charlotte-ruby/image_scraper/lib/image_scraper/client.rb:52: warning: calling URI.open via Kernel#open is deprecated, call URI.open directly or use URI#open
/home/travis/build/charlotte-ruby/image_scraper/lib/image_scraper/client.rb:67: warning: URI.escape is obsolete
/home/travis/build/charlotte-ruby/image_scraper/lib/image_scraper/client.rb:84: warning: URI.escape is obsolete
Will try to submit a PR for this shortly.
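For anyone picking this up, here is a minimal sketch of how both warnings are commonly addressed. The helper names `escape_url` and `fetch_page` are hypothetical and are not taken from `client.rb`; the sketch assumes the existing code only needs RFC 2396 style escaping, which is what `URI.escape` provided before it was removed in Ruby 3.0.

```ruby
require "uri"
require "open-uri"

# Hypothetical helpers for illustration; not part of image_scraper.

# URI::RFC2396_Parser#escape performs the same escaping that the obsolete
# URI.escape used to perform.
URL_ESCAPER = URI::RFC2396_Parser.new

def escape_url(url)
  URL_ESCAPER.escape(url)
end

# Call URI.open explicitly instead of relying on Kernel#open to
# dispatch to open-uri.
def fetch_page(url)
  URI.open(escape_url(url), &:read)
end
```

`escape_url("http://example.com/a b.jpg")` returns `"http://example.com/a%20b.jpg"`. `fetch_page` performs a real HTTP request, so it is best exercised against a stubbed response in the test suite.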
This looks like a basic HTML issue, but it's causing a bad URI error:
url
http://www.amazon.com/Planet-Two-Disc-Digital-Combo-Blu-ray/dp/B004LWZW4W/ref=sr_1_1?s=movies-tv&ie=UTF8&qid=1324771542&sr=1-1
error
(bad URI(is not URI?): %20http://g-ecx.images-amazon.com/images/G/01/SIMON/IsaacsonWalter._V164348457_.jpg):
faulty html sample
<img height="300" src=" http://g-ecx.images-amazon.com/images/G/01/SIMON/IsaacsonWalter._V164348457_.jpg" style="float: right;" width="450">
Looks like the scraper throws the error when an image has a space as the first character of its src attribute, which is apparently a common mistake on amazon.com.
I can give you the full trace if that would help, but it seems pretty straightforward. I played with a few lines trying to get it to just ignore the image if the first character was blank, to no avail. (Still new to Rails.) Let me know what other information I can get you to help out.
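One possible workaround, sketched under assumptions: trim the `src` value before parsing it, and skip the image if it still does not parse. `parse_image_src` is a hypothetical helper name, not a method of the gem.

```ruby
require "uri"

# Hypothetical helper for illustration; not part of image_scraper.
# Strips surrounding whitespace from a raw src value, then parses it.
# Returns nil instead of raising when the value is not a valid URI,
# so the caller can simply ignore that image.
def parse_image_src(raw)
  URI.parse(raw.to_s.strip)
rescue URI::InvalidURIError
  nil
end
```

With this, the Amazon sample above (a `src` beginning with a space) parses to the intended URL, while a value that is still malformed after trimming yields `nil` and can be dropped with `compact`.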
Getting this error when trying to scrape images from certain websites (e.g. http://www.wired.com).
Is this an issue with whitespace or quotes not being completely stripped out?
URI::InvalidURIError in StaticPagesController#home
bad URI(is not URI?): %22http://www.wired.com/js/scrolldock/i/sub_promo_bg_solid.gif%22
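The `%22` on both ends of the URL is an escaped double quote, so the value was probably still wrapped in quotes when it was escaped. A minimal sketch of stripping wrapping quotes first, assuming the raw value looks like `"http://..."` (the helper name `strip_wrapping_quotes` is hypothetical and is not the gem's own `strip_quotes`):

```ruby
# Hypothetical helper for illustration; not the gem's strip_quotes.
# Removes surrounding whitespace, then any run of single or double
# quotes at the very start or very end of the value.
def strip_wrapping_quotes(value)
  value.strip.gsub(/\A["']+|["']+\z/, "")
end
```

Quotes inside the URL are left alone; only the wrapping ones are removed, so the result can be escaped and parsed as usual.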
I'm a bit new to Rails. I was excited to find your scraper, as I believe it will do exactly what I want it to. However, I keep getting this nil error when scraping any amazon.com URL.
example "http://www.amazon.com/OtterBox-Universal-Defender-Silicone-Plastic/dp/B004N7EY5S"
It appears that the strip_quotes function is the only thing using gsub, and it has issues when it's given an empty URL.
My thought was that I could just set :include_css_images=>false, since that function only seems to be called when handling stylesheet URLs, but that did not fix the issue.
Again, I'm new to Rails, so I wish I could give more info that may help. If I'm just clueless and missing something obvious, then I do apologize. My hope is only to help make this gem better.
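If the failure really is `gsub` being called on `nil`, one way to sidestep it is to filter out missing and blank values before any string processing happens. This is only a sketch under that assumption; `usable_sources` is a hypothetical helper and not part of the gem.

```ruby
# Hypothetical helper for illustration; not part of image_scraper.
# Converts every entry to a string (nil becomes ""), trims whitespace,
# and drops entries that end up empty, so later calls such as gsub
# never receive nil.
def usable_sources(sources)
  sources.map { |src| src.to_s.strip }.reject(&:empty?)
end
```

For example, `usable_sources([nil, "", "  ", "http://a/b.png"])` returns `["http://a/b.png"]`.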