Comments (1)
Hi Andrej,
Agreed. It's kind of grown organically over time. Sounds like it's due for a v2. :) Regarding your points:
- 110% agreed, it should be refactored. There are a number of helper classes too that should just be part of the main class.
- Agreed on this one, I was thinking of splitting it into something similar to resque_web. Possibly bundling it as a separate gem too, because all the assets for the GUI make the gem too big.
- Yes, namespacing and refactoring should hopefully solve this.
- Yes, could do. Thinking about it, explicit declaration of a queue system would be good, especially if you're not using one: fewer dependencies.
- Hadn't thought about logging, good suggestion. The debug and quiet options can be removed and output for those can be mapped to a log level.
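As a rough illustration of that last point, the existing debug and quiet options could be translated into a log level along these lines. This is only a sketch: the helper name `log_level_for` and the exact mapping (debug to DEBUG, quiet to ERROR, otherwise INFO) are assumptions, not anything cobweb currently ships.

```ruby
require "logger"

# Hypothetical helper: translate the legacy :debug and :quiet flags
# into a severity understood by Ruby's standard Logger.
def log_level_for(options = {})
  return Logger::DEBUG if options[:debug]
  return Logger::ERROR if options[:quiet]
  Logger::INFO
end

logger = Logger.new($stdout)
logger.level = log_level_for(quiet: true)

logger.info("fetching page")    # suppressed, because the level is ERROR
logger.error("request failed")  # still written
```

With a mapping like this, callers that still pass the old flags would keep working while all output goes through one logger.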
Cool, so all good suggestions. I've been hesitant to make breaking changes, but will start a v2 branch next time I get a chance. I'll give you a shout once it's there, and if you've got any changes, fire a pull request to it.
thanks,
Stewart.
On 18 Nov 2014, at 08:44, Andrej Jančič [email protected] wrote:
Hi, first I want to say thank you for sharing this crawler and for the work you put in it.
Here is our experience with it and thoughts for improvements. I would be happy to know if you agree and if you would like to get this implemented (we can contribute of course).
We have a repository of code that we use for doing lots of data processing with Resque.
We tried to use cobweb within our repository and here are our issues:
- Name conflicts. Classes are declared on a global level. Classes declared in cobweb should be namespaced in a module. Example: Cobweb::Stats
- Sinatra is loaded by default. We run our code on multiple machines with multiple processes. As I understand it, Sinatra's purpose is to provide a UI for stats. We don't need or want it loaded every time on all boxes, consuming memory and slowing down the boot time of our app. So this should be optional (example: require 'cobweb-web', or a separate gem).
- The files directive in the gemspec. Everything you put in the files directive can be loaded automatically. This again exposes naming conflicts. For example, we use Fozzie, which declares a Stats module. But when you do require 'stats', you don't know which one is going to be loaded.
- Sidekiq vs Resque. This could be left to the programmer's decision, and I would avoid auto-detection.
- Logging should be configurable, and puts statements should not be used. Example: Cobweb.logger = Logger.new
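To make the namespacing point concrete, here is a minimal sketch of how a namespaced class can coexist with a top-level constant of the same name. The top-level `Stats` module below is a hypothetical stand-in for the one a library such as Fozzie might define, and the methods on `Cobweb::Stats` are invented for illustration; they are not cobweb's actual API.

```ruby
# Hypothetical stand-in for a third-party library that
# defines a top-level Stats module.
module Stats
  def self.describe
    "third-party stats"
  end
end

# Namespaced version: this defines Cobweb::Stats and leaves ::Stats alone.
# Declaring a top-level `class Stats` here instead would raise a TypeError,
# because Stats is already defined as a module.
module Cobweb
  class Stats
    def initialize
      @counts = Hash.new(0)
    end

    # Increment the counter stored under the given key.
    def record(key)
      @counts[key] += 1
    end

    # Return how many times the key has been recorded.
    def count(key)
      @counts[key]
    end
  end
end

crawl_stats = Cobweb::Stats.new
crawl_stats.record(:pages_crawled)
puts crawl_stats.count(:pages_crawled)
puts ::Stats.describe
```

Placing the file under a namespaced path as well (for example lib/cobweb/stats.rb, loaded with require 'cobweb/stats') would probably also address the ambiguous require 'stats' case, though that depends on how the gem's load path is set up.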
In conclusion, this is what I have in mind:
require 'cobweb-resque'
OR
require 'cobweb-sidekick'
require 'cobweb-web' # optional
Cobweb.logger = Logger.new("crawler.log")
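One possible way to support the `Cobweb.logger =` assignment above is a module-level accessor with a sensible default. The sketch below is an assumption about how this could be done, not the gem's existing code; the default of logging to standard output is likewise a guess.

```ruby
require "logger"

module Cobweb
  class << self
    # Allow callers to supply their own logger:
    #   Cobweb.logger = Logger.new("crawler.log")
    attr_writer :logger

    # Fall back to a logger on standard output when none was configured.
    def logger
      @logger ||= Logger.new($stdout)
    end
  end
end

# Library code would then call the logger instead of puts.
Cobweb.logger.info("crawl started")
```

Because the logger is looked up on every call, an application can swap it at boot time without touching the crawler code.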