Comments (2)
Thank you @whalebot-helmsman
I agree with you. It would be great to support different message queues without implementing a different scheduler for each one. I am tired of those DRY problems. 😫
I have read the issues and PRs that you mentioned; they are very valuable. I will try to use DownloaderAwarePriorityQueue and a queue-based implementation. That will make it much easier for me to implement new modules in the future. 😸
Finally, thank you for your contributions to the Scrapy project. 😸
from scrapy-distributed.
Hi @Insutanto
You are doing nice work in this repo. I share the same goal: different message queues should be supported in Scrapy.
Older implementations of this idea and the one you have here share a common disadvantage: for every type of queue you need to implement a separate scheduler. Besides the amount of work required, such implementations can't benefit from the work done on improving scheduling. I am mostly talking about scrapy/scrapy#3520. The reason for going distributed (at least for me) is having a lot of domains in a single crawl. Not using DownloaderAwarePriorityQueue makes crawling slower (about 10 times slower) according to the benchmarks in that PR. To overcome this, I developed a separation between the scheduler logic and the external message queue, which was merged in scrapy/scrapy#3884.
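For reference, in recent Scrapy versions DownloaderAwarePriorityQueue is enabled through the `SCHEDULER_PRIORITY_QUEUE` setting (the default is `scrapy.pqueues.ScrapyPriorityQueue`). The fragment below is a minimal sketch of a project's `settings.py`; the note about `CONCURRENT_REQUESTS_PER_IP` reflects our understanding of the current implementation and is worth checking against the docs for your Scrapy version.

```python
# settings.py (fragment)
# Spread requests across download slots (roughly, domains) instead of
# draining one domain's backlog before moving on to the next.
SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.DownloaderAwarePriorityQueue"

# To our knowledge, DownloaderAwarePriorityQueue does not support per-IP
# concurrency, so keep this at its default of 0.
CONCURRENT_REQUESTS_PER_IP = 0
```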
It would be great for your project and the Scrapy community if you switched from a scheduler-based design to a queue-based one.
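To make the queue-based idea concrete, here is a toy sketch (not Scrapy's code) where the scheduling policy and the storage are separate objects. `SlotAwareQueue` and `MemoryFifo` are hypothetical names; the policy of serving the non-empty slot with the fewest in-flight requests is modeled loosely on the idea behind DownloaderAwarePriorityQueue. Because per-slot storage comes from `queue_factory`, a Redis- or RabbitMQ-backed queue could be plugged in without touching the policy.

```python
from collections import deque
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class MemoryFifo:
    """In-memory FIFO storage backend (hypothetical); could be swapped for an external queue."""

    def __init__(self, slot: Hashable) -> None:
        # `slot` is unused here; an external backend might use it as a key suffix.
        self._items: deque = deque()

    def push(self, obj: Any) -> None:
        self._items.append(obj)

    def pop(self) -> Optional[Any]:
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)


class SlotAwareQueue:
    """Scheduling policy only: one storage queue per download slot (e.g. a domain);
    pop() serves the non-empty slot with the fewest requests currently in flight."""

    def __init__(self, queue_factory: Callable[[Hashable], Any]) -> None:
        self._factory = queue_factory
        self._queues: Dict[Hashable, Any] = {}
        self._active: Dict[Hashable, int] = {}

    def push(self, slot: Hashable, request: Any) -> None:
        if slot not in self._queues:
            self._queues[slot] = self._factory(slot)
        self._queues[slot].push(request)

    def pop(self) -> Optional[Tuple[Hashable, Any]]:
        candidates = [s for s, q in self._queues.items() if len(q) > 0]
        if not candidates:
            return None
        # Ties go to the slot that was created first (dict insertion order).
        slot = min(candidates, key=lambda s: self._active.get(s, 0))
        self._active[slot] = self._active.get(slot, 0) + 1
        return slot, self._queues[slot].pop()

    def finished(self, slot: Hashable) -> None:
        """Call when a download from `slot` completes."""
        self._active[slot] = max(0, self._active.get(slot, 0) - 1)


q = SlotAwareQueue(MemoryFifo)
for url in ["a.com/1", "a.com/2", "a.com/3", "b.com/1"]:
    q.push(url.split("/")[0], url)
print(q.pop())  # ('a.com', 'a.com/1')
print(q.pop())  # ('b.com', 'b.com/1'), because a.com already has one in flight
print(q.pop())  # ('a.com', 'a.com/2'), because b.com is now empty
```

A single FIFO over the same input would have served all three `a.com` requests before `b.com/1`; with many domains, that difference is what keeps the downloader busy.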
More details and discussion can be found in scrapy/scrapy#4326. An example of such an implementation for Redis: https://github.com/whalebot-helmsman/scrapy/blob/redis/scrapy/squeues.py#L101-L173
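As a rough illustration of a Redis-backed queue (independent of the linked implementation, which we have not reproduced here), the sketch below stores pickled objects in a Redis list: LPUSH adds at the head and RPOP takes from the tail, giving FIFO order. It assumes a redis-py style client; `redis.Redis` provides `lpush`, `rpop` and `llen`. `RedisFifoQueue`, `FakeRedis` and the key name are hypothetical, and `FakeRedis` exists only so the sketch runs without a server. A real request queue would also need to turn `Request` objects into something serializable first.

```python
import pickle
from typing import Any, Optional


class RedisFifoQueue:
    """FIFO queue stored in a Redis list (sketch; assumes a redis-py style client)."""

    def __init__(self, client: Any, key: str) -> None:
        self.client = client
        self.key = key

    def push(self, obj: Any) -> None:
        # LPUSH to the head; pop() takes from the tail, so order is first-in, first-out.
        self.client.lpush(self.key, pickle.dumps(obj))

    def pop(self) -> Optional[Any]:
        data = self.client.rpop(self.key)
        return None if data is None else pickle.loads(data)

    def __len__(self) -> int:
        return self.client.llen(self.key)

    def close(self) -> None:
        # Nothing to release: the data stays in Redis so another process can resume.
        pass


class FakeRedis:
    """Tiny in-memory stand-in for the three list commands used above."""

    def __init__(self) -> None:
        self._lists: dict = {}

    def lpush(self, name: str, *values: bytes) -> int:
        lst = self._lists.setdefault(name, [])
        for v in values:
            lst.insert(0, v)
        return len(lst)

    def rpop(self, name: str) -> Optional[bytes]:
        lst = self._lists.get(name)
        return lst.pop() if lst else None

    def llen(self, name: str) -> int:
        return len(self._lists.get(name, []))


q = RedisFifoQueue(FakeRedis(), "myspider:requests")
q.push({"url": "https://example.com/1"})
q.push({"url": "https://example.com/2"})
print(len(q))   # 2
print(q.pop())  # {'url': 'https://example.com/1'}
```

With a running server, `RedisFifoQueue(redis.Redis(), "myspider:requests")` should behave the same way, and several crawler processes can share the key.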
There is also a PR for an external queue protocol: scrapy/scrapy#4783
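The point of a queue protocol is that scheduler-side code depends only on a small interface, so any backend that matches it can be used. The sketch below expresses such an interface with `typing.Protocol`. The method set (`push`, `pop`, `close`, `__len__`) is an assumption modeled on the general shape of Scrapy's queues, not the protocol proposed in scrapy/scrapy#4783; `ExternalQueue`, `ListQueue` and `drain` are hypothetical names.

```python
from typing import Any, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ExternalQueue(Protocol):
    """Hypothetical minimal interface a scheduler could rely on."""

    def push(self, obj: Any) -> None: ...
    def pop(self) -> Optional[Any]: ...
    def close(self) -> None: ...
    def __len__(self) -> int: ...


class ListQueue:
    """Trivial LIFO backend; it matches ExternalQueue structurally, without inheriting from it."""

    def __init__(self) -> None:
        self._items: List[Any] = []

    def push(self, obj: Any) -> None:
        self._items.append(obj)

    def pop(self) -> Optional[Any]:
        return self._items.pop() if self._items else None

    def close(self) -> None:
        self._items.clear()

    def __len__(self) -> int:
        return len(self._items)


def drain(queue: ExternalQueue) -> List[Any]:
    """Scheduler-side code that only knows about the protocol."""
    out = []
    while len(queue):
        out.append(queue.pop())
    return out


q = ListQueue()
q.push("req-1")
q.push("req-2")
print(isinstance(q, ExternalQueue))  # True
print(drain(q))  # ['req-2', 'req-1']
```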
Thanks for your proposal!
Related Issues (13)
- dynamic web crawlers
- Support Delayed Message in RabbitMQ Scheduler
- Custom Interface for DupeFilter
- RocketMQ Scheduler
- RocketMQ Item Pipeline
- Could a feature for adding start URLs, similar to redis_key in scrapy_redis, be added?
- Support Scrapy 2.6+
- Redis Streams Scheduler
- Redis Streams Item Pipeline
- First feedback from users
- SQLAlchemy Pipeline
- Congratulate