Comments (18)
from scrapydweb.
Please provide the latest log from http://127.0.0.1:5000/schedule/history/
################################################## 2019-03-18 18:30:21
['10.8.5.40:6800']
curl http://10.8.5.40:6800/schedule.json -d project=helloworld -d _version=1552895608 -d spider=books -d jobid=2019-03-18T18_30_20
Update task #1 (test_schedule) successfully, next run at 2019-03-18 18:31:02+08:00.
kwargs for execute_task():
{
"task_id": 1
}
task_data for scheduler.add_job():
{
"coalesce": true,
"day": "",
"day_of_week": "",
"end_date": null,
"hour": "",
"id": "1",
"jitter": 0,
"max_instances": 1,
"minute": "",
"misfire_grace_time": 600,
"month": "",
"name": "test_schedule",
"second": "2",
"start_date": null,
"timezone": "Asia/Shanghai",
"trigger": "cron",
"week": "",
"year": "*"
}
################################################## 2019-03-18 17:48:41 <add_fire>
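For reference, the task_data above requests a cron trigger whose only explicit field is second='2' (every other field effectively '*'), so it fires at second 2 of every minute. A minimal stdlib sketch of those fire times (a simulation for illustration, not APScheduler itself):

```python
from datetime import datetime, timedelta

def next_fires(start, n):
    """Next n fire times for a cron trigger equivalent to second='2'
    with every other field '*': second 2 of every minute."""
    t = start.replace(microsecond=0)
    fires = []
    while len(fires) < n:
        t += timedelta(seconds=1)
        if t.second == 2:
            fires.append(t)
    return fires

# Starting from the timestamp in the log above:
start = datetime(2019, 3, 18, 18, 30, 21)
print(next_fires(start, 3))
# first fire is 18:31:02, matching "next run at 2019-03-18 18:31:02" above
```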
Could you provide a screenshot of the Timer Tasks page with the parameters of task #1 displayed, as well as of http://127.0.0.1:5000/1/tasks/1/?
Like this?
The screenshot of the Timer Tasks page with the parameters of task #1 displayed? btw, did you modify the source code of schedule.py before?
Yes, with the parameters of task #1.
And I just used pip to install scrapydweb, without any modification.
The parameters of task #1 indicate that the task would be executed whenever the second is 2, and there is nothing wrong with the execution results!
It's weird that the day and the minute are empty strings in the schedule history. How did you fill in these inputs when adding the task?
"day": "",
"minute": "",
I used the default values of 'day' and 'minute'.
The task will be fired every minute, but I think that if the job fired in the last minute is still running, it should not be fired again. Is that right?
But the values of day and hour should be '*' in the history. Could you try adding another task without modifying the parameters of the timer task, and show me the log again?
The scheduler of Timer Tasks knows nothing about the scraping jobs. Please check out the related links in the HELP section at the top of the Run Spider page.
https://apscheduler.readthedocs.io/en/latest/userguide.html#limiting-the-number-of-concurrently-executing-instances-of-a-job
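The max_instances behavior described in the link above can be sketched with the stdlib: a new firing is skipped while the previous execution of the same task is still running. This is a simulation of the max_instances=1 semantics for illustration, not APScheduler's actual code:

```python
import threading
import time

class SingleInstanceJob:
    """Stdlib sketch of APScheduler's max_instances=1 semantics:
    a firing is skipped while the previous execution is still running."""
    def __init__(self, func):
        self.func = func
        self._lock = threading.Lock()
        self.skipped = 0

    def fire(self):
        # Non-blocking acquire: if the lock is held, an instance is running.
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return None
        def run():
            try:
                self.func()
            finally:
                self._lock.release()
        t = threading.Thread(target=run)
        t.start()
        return t

job = SingleInstanceJob(lambda: time.sleep(0.2))
t = job.fire()      # starts an execution
job.fire()          # skipped: the first execution is still running
t.join()
print(job.skipped)  # 1
```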
Do I misunderstand the meaning of "job"?
I mean, if one spider is fired, it gets a job id like "task_1_2019-03-18T18_53_02"; one minute later, the spider is fired again and gets another job id, "task_1_2019-03-18T18_54_02". So these are two different jobs, not two instances of one job?
In Scrapy and Scrapyd, scheduling a spider run would result in a scraping job.
In APScheduler, firing a task would cause another kind of job instance. And the job instance goes away when the execution of the corresponding task is finished, regardless of whether the scheduled scraping job has finished or not.
I see, these are two different kinds of jobs. Thanks for taking the time to explain these details.
One more question: in my situation, I have a spider which may crawl for several days, but I'm not sure how long it will take to finish. I want to schedule the spider so that once it finishes, it will be fired again automatically after one or several days. Is there any solution for this situation?
There are two solutions:
- Enable the Email Notice feature of ScrapydWeb and get notified when a scraping job is finished, then fire the task manually.
- Catch the spider_closed signal of Scrapy and make a request to http://127.0.0.1:5000/1/tasks/xhr/fire/1/ to fire task #1 automatically.
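A minimal sketch of the second option, assuming the fire endpoint shown above; fire_task and its dry_run flag are hypothetical helpers, and in a real project you would call it from a spider_closed handler rather than at module level:

```python
import urllib.request

def fire_task(base_url, node, task_id, dry_run=False):
    """Build (and optionally send) the XHR request that fires a timer task.
    The URL pattern follows the endpoint mentioned above; adjust it if your
    ScrapydWeb version differs."""
    url = f"{base_url}/{node}/tasks/xhr/fire/{task_id}/"
    if dry_run:
        return url
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# In a Scrapy extension, connect a handler to the spider_closed signal
# and have it call: fire_task("http://127.0.0.1:5000", 1, 1)
print(fire_task("http://127.0.0.1:5000", 1, 1, dry_run=True))
# http://127.0.0.1:5000/1/tasks/xhr/fire/1/
```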
[UPDATE]
- In Scrapy and Scrapyd, scheduling a spider run would result in a scraping job.
- In ScrapydWeb, a job instance of APScheduler (name it apscheduler_job) would be generated when a task is added.
- Every time you fire a task, or when it is time to execute a task, the next_run_time attribute of the corresponding apscheduler_job is set to datetime.now(); it is set to None when a task is paused.
- The apscheduler_job is removed when its next_run_time would never occur in the future, or when you stop a task.
- The settings of misfire_grace_time, coalesce, and max_instances make sense when APScheduler is too busy, or when ScrapydWeb is restarted, or when you add a task to fire every second while it takes more than one second to finish the task execution.
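The combined effect of misfire_grace_time and coalesce can be sketched with the stdlib (a simulation of the documented semantics, not APScheduler's implementation): runs missed by more than misfire_grace_time seconds are discarded, and coalesce=True collapses the remaining backlog into a single catch-up run:

```python
from datetime import datetime, timedelta

def runs_to_fire(missed, now, misfire_grace_time, coalesce):
    """Stdlib sketch of how missed run times are handled: runs older than
    misfire_grace_time seconds are discarded; with coalesce=True the
    remaining backlog collapses into one run instead of a burst."""
    grace = timedelta(seconds=misfire_grace_time)
    due = [t for t in missed if now - t <= grace]
    if coalesce and due:
        return [due[-1]]   # one catch-up run
    return due

now = datetime(2019, 3, 18, 18, 40, 0)
# runs missed 15, 5, and 1 minutes ago (e.g. while ScrapydWeb was down)
missed = [now - timedelta(seconds=s) for s in (900, 300, 60)]
print(runs_to_fire(missed, now, misfire_grace_time=600, coalesce=True))
# only the most recent run within the 600-second grace period fires
```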