
Comments (2)

Insutanto commented on August 15, 2024

Thank you @whalebot-helmsman

I agree with you. It would be great to support different message queues without implementing a different scheduler for each one. I am tired of those DRY problems. 😫
I have read the issues and PRs you mentioned; they are very valuable. I will try to use DownloaderAwarePriorityQueue and a queue-based implementation. That would make it much easier for me to implement more modules in the future. 😸
Finally, thank you for your contributions to the Scrapy project. 😸

from scrapy-distributed.

Insutanto commented on August 15, 2024

Hi @Insutanto

You are doing nice work in this repo. I have the same goal: Scrapy should support different message queues.

Older implementations of this idea, and the one you have here, share a common disadvantage: for every type of queue you need to implement a separate scheduler. Besides the amount of work required, such implementations can't benefit from the work done to improve scheduling. I am talking mostly about scrapy/scrapy#3520. The reason for going distributed (at least for me) is a large number of domains in a single crawl. Not using DownloaderAwarePriorityQueue makes crawling slower (around 10 times slower) according to the benchmarks in that PR.
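
For reference, Scrapy chooses its priority queue class through the `SCHEDULER_PRIORITY_QUEUE` setting. A minimal `settings.py` sketch that opts into DownloaderAwarePriorityQueue might look like the following; the `CONCURRENT_REQUESTS` value is an arbitrary illustration, not a recommendation from this thread.

```python
# settings.py (sketch)

# Use the downloader-aware priority queue, which is intended for crawls
# that hit many different domains in parallel.
SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.DownloaderAwarePriorityQueue"

# Assumption for illustration only: tune this for your own crawl.
CONCURRENT_REQUESTS = 100
```

Note that the Scrapy documentation says this queue does not work together with `CONCURRENT_REQUESTS_PER_IP`.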

To overcome this, I developed scrapy/scrapy#3884 (now merged), which separates the scheduler logic from the external message queue.

It would be great for your project and the Scrapy community if you changed from a scheduler-based to a queue-based approach.

More details and discussion can be found in scrapy/scrapy#4326. An example of such an implementation for Redis is at https://github.com/whalebot-helmsman/scrapy/blob/redis/scrapy/squeues.py#L101-L173 .

There is also a PR for an external queue protocol: scrapy/scrapy#4783
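
To make the queue-based idea concrete, here is a rough, self-contained sketch of a queue object that hides an external store behind `push`, `pop`, `close`, and `__len__`, so the scheduler itself would not need to change per backend. Everything here is hypothetical: `FakeBroker` is an in-memory stand-in for a real broker client, `ExternalLifoQueue` is an illustrative name, and requests are plain dicts. The actual interface Scrapy expects may differ, so treat the linked `squeues.py` and the PRs above as the reference.

```python
import pickle
from collections import deque


class FakeBroker:
    """Hypothetical in-memory stand-in for an external broker client."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, value):
        # Add a value to the head of the list stored under `key`.
        self._lists.setdefault(key, deque()).appendleft(value)

    def lpop(self, key):
        # Remove and return the head of the list, or None if it is empty.
        items = self._lists.get(key)
        return items.popleft() if items else None

    def llen(self, key):
        return len(self._lists.get(key, ()))


class ExternalLifoQueue:
    """Illustrative LIFO queue that stores serialized requests externally."""

    def __init__(self, broker, key):
        self.broker = broker
        self.key = key

    def push(self, request):
        # Serialize before handing the request to the external store.
        self.broker.lpush(self.key, pickle.dumps(request))

    def pop(self):
        raw = self.broker.lpop(self.key)
        return pickle.loads(raw) if raw is not None else None

    def close(self):
        # A real implementation would release its broker connection here.
        pass

    def __len__(self):
        return self.broker.llen(self.key)


queue = ExternalLifoQueue(FakeBroker(), "crawl:requests")
queue.push({"url": "https://example.com/a"})
queue.push({"url": "https://example.com/b"})
last_in = queue.pop()  # the most recently pushed request comes out first
```

Swapping the backend then means writing another small queue class with the same four methods, rather than another scheduler.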

Thanks for your proposal!

