Comments (3)

svenaas commented on July 20, 2024

We would benefit from :max_pages or :max_time options, especially in development and test environments.

from cobweb.

stewartmckee commented on July 20, 2024

If the exception is raised then would you want the whole crawl to stop at that point?

I think crawl_limit already gives you what max_pages would. There is also a crawl_limit_by_page boolean, which I think is false by default. crawl_limit is the maximum number of URLs, and if crawl_limit_by_page is set to true, the limit only applies to text/html content.
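To make the counting rule concrete, here is a minimal sketch of how the two options could interact. It is not cobweb's actual code: the `CrawlCounter` class and its `record` method are hypothetical names invented for illustration, and only `crawl_limit`, `crawl_limit_by_page`, and the idea of a `within_crawl_limits` check come from the comment above.

```ruby
# Hypothetical stand-in for the limit check; NOT cobweb's implementation.
class CrawlCounter
  attr_reader :counted

  def initialize(crawl_limit: nil, crawl_limit_by_page: false)
    @crawl_limit = crawl_limit
    @crawl_limit_by_page = crawl_limit_by_page
    @counted = 0
  end

  # Record a fetched URL; it is counted only if it is subject to the limit.
  def record(mime_type)
    @counted += 1 if counts_toward_limit?(mime_type)
  end

  # True while the crawl may continue (no limit set, or limit not reached).
  def within_crawl_limits?
    @crawl_limit.nil? || @counted < @crawl_limit
  end

  private

  # With crawl_limit_by_page enabled, only text/html responses are counted.
  def counts_toward_limit?(mime_type)
    !@crawl_limit_by_page || mime_type == "text/html"
  end
end

counter = CrawlCounter.new(crawl_limit: 2, crawl_limit_by_page: true)
counter.record("image/png")  # ignored: not text/html
counter.record("text/html")  # counted
```

In this sketch the image does not use up any of the limit, so the crawl would continue until two text/html pages have been recorded.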

I like the idea of max_time though; I hadn't thought of that before. I'm thinking it would set a datetime and include that in the within_crawl_limits check to see whether it has passed, so it could also accept a stop_at datetime. max_time would just do the arithmetic for you.

samnissen avatar samnissen commented on July 20, 2024

Yes, I think raising the error, breaking, or returning should stop the crawl as the default.

I wasn't aware of crawl_limit. Will check that out, thank you.

As for max_time, I'm thinking that would probably be an integer, whereas something like stop_at could be a datetime.
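A rough sketch of how the two proposed options might resolve to a single deadline. Neither option exists in cobweb as far as this thread shows; `resolve_stop_at` and `within_time_limit?` are hypothetical helper names, and two choices here are assumptions: `max_time` is read as a number of seconds, and `stop_at` takes precedence when both are given.

```ruby
# Hypothetical helpers; the option names come from the discussion,
# everything else is assumed for illustration.

# Turn the options into one deadline (a Time), or nil if no time limit is set.
# Assumption: :max_time is an integer number of seconds from the crawl start.
def resolve_stop_at(options, now: Time.now)
  return options[:stop_at] if options[:stop_at]
  return now + options[:max_time] if options[:max_time]

  nil
end

# True while the crawl may continue: no deadline, or deadline not yet reached.
def within_time_limit?(stop_at, now: Time.now)
  stop_at.nil? || now < stop_at
end

started = Time.now
deadline = resolve_stop_at({ max_time: 30 }, now: started)
within_time_limit?(deadline, now: started)
```

The crawler would evaluate the deadline once at start and then call the check alongside the existing limit check, so `max_time` is only a convenience for computing `stop_at`.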
