Hello, As far as I can see, the generated hash for each page doesn't

Should it be possible to add "depth" in the data hash?

ABrisset commented on July 20, 2024

Should it be possible to add "depth" in the data hash?

from cobweb.

Comments (3)

stewartmckee commented on July 20, 2024

I'm assuming you mean minimum depth. One of the misconceptions about navigation is that there is only one way to reach a page. The depth of a page can differ depending on the route you take to get to it. Also, where is the homepage? Is it the page you started the crawl from, or the page with the shortest URL?

If we took it as the first page that was crawled and passed a depth number down with the crawl, it would not be guaranteed to give accurate results, as each page is only processed once. If a page was linked to from the homepage (depth 1) but was actually crawled via a sub-page of the homepage, it would be given a depth of 2.
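A minimal sketch of that failure mode, assuming a plain hash of links and a last-in-first-out job order (both are illustrative assumptions, not Cobweb's actual data structures or scheduling). The name `crawl_order_depths` is hypothetical:

```ruby
# Hypothetical model: each job carries the depth it was queued with,
# and a page keeps the depth of whichever job reaches it first.
def crawl_order_depths(links, root)
  depths = {}
  pending = [[root, 0]]            # jobs waiting to run: [url, depth]
  until pending.empty?
    url, depth = pending.pop       # LIFO: the most recently queued job runs first
    next if depths.key?(url)       # each page is only processed once
    depths[url] = depth
    (links[url] || []).each { |target| pending.push([target, depth + 1]) }
  end
  depths
end

# "/b" is linked directly from the homepage, but also from "/a".
links = { "/" => ["/b", "/a"], "/a" => ["/b"], "/b" => [] }
recorded = crawl_order_depths(links, "/")
# "/b" ends up with depth 2 here because the job queued by "/a" ran first.
```

With a different job order the same page could be recorded at depth 1, which is why a depth passed down during the crawl cannot be relied on.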

It's something to think about. I suppose if you specified a page as the root and then, after the crawl completed, processed all crawled pages to find the shortest route (we have the data for that), that would give the most accurate results. But again, HTML navigation is not a tree structure; it's a node graph with multiple parents and interconnections.
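As a rough sketch of that post-processing step, a breadth-first search from the chosen root gives the minimum depth of every reachable page. The `links` hash and the `minimum_depths` helper are assumptions for illustration; how the link data would actually be read back from a completed crawl is not shown here:

```ruby
# Hypothetical link graph: each key is a URL, each value the URLs it links to.
def minimum_depths(links, root)
  depths = { root => 0 }
  queue = [root]
  until queue.empty?
    url = queue.shift                   # FIFO: pages are visited in order of distance
    (links[url] || []).each do |target|
      next if depths.key?(target)       # first visit is already the shortest route
      depths[target] = depths[url] + 1
      queue << target
    end
  end
  depths
end

links = {
  "/"  => ["/a", "/b"],
  "/a" => ["/b", "/c"],
  "/b" => ["/c"],
  "/c" => ["/"]                         # cycles are fine: visited pages are skipped
}
shortest = minimum_depths(links, "/")
# shortest is {"/"=>0, "/a"=>1, "/b"=>1, "/c"=>2}
```

Because the search works on the full graph after the crawl, the result does not depend on the order in which pages happened to be processed.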


nikhgupta commented on July 20, 2024

That's correct, and it would be inaccurate to report depth while processing the content. However, is there a way we can limit the crawl to a certain depth?

Let's say we start from the seed URL and only want to go 2 pages deep within the navigation. Is that possible with CobWeb? This is certainly possible with the Anemone crawler, but it is an old gem now. I love the way CobWeb uses Sidekiq/Resque jobs, and would really prefer to limit the crawl depth for the crawler.

By the way, thanks again for the awesome gem. Really useful.


colnpanic commented on July 20, 2024

I agree on both points: this is a really cool gem 👍 and I would like to have a "max_depth" option. I totally understand that we're not dealing with tree data and that "depth" is relative, but it would still be useful. The nice thing it would give you is a chance to do a quick test of the "core" links from a page, following just a couple of levels so you can preview some results without waiting for the whole site to process.
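To make the request concrete, here is a minimal standalone sketch of what a depth-limited crawl could look like. This is not Cobweb code: `max_depth`, `crawl_with_max_depth`, and the `fetch_links` callable are all hypothetical names, and the in-memory `site` hash stands in for real HTTP fetching:

```ruby
require "set"

# Breadth-first crawl that stops following links once max_depth is reached.
# fetch_links is any callable that returns the URLs a page links to.
def crawl_with_max_depth(seed, max_depth, fetch_links)
  visited = Set.new([seed])
  queue = [[seed, 0]]
  until queue.empty?
    url, depth = queue.shift
    yield url, depth if block_given?      # hand the page to the caller
    next if depth >= max_depth            # do not follow links any deeper
    fetch_links.call(url).each do |target|
      next unless visited.add?(target)    # add? returns nil if already seen
      queue << [target, depth + 1]
    end
  end
  visited.to_a
end

# Stand-in for a real site: "/c" is three clicks from the seed.
site = { "/" => ["/a"], "/a" => ["/b"], "/b" => ["/c"], "/c" => [] }
pages = crawl_with_max_depth("/", 2, ->(url) { site.fetch(url, []) })
# pages is ["/", "/a", "/b"]; "/c" is never queued.
```

The depth used here is relative to the seed and to breadth-first order, so it is only a preview limit, not a claim about the "true" depth of any page.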

