Comments (18)
from scrapydweb.
Please provide the latest log from http://127.0.0.1:5000/schedule/history/
################################################## 2019-03-18 18:30:21
['10.8.5.40:6800']
curl http://10.8.5.40:6800/schedule.json -d project=helloworld -d _version=1552895608 -d spider=books -d jobid=2019-03-18T18_30_20
Update task #1 (test_schedule) successfully, next run at 2019-03-18 18:31:02+08:00.
kwargs for execute_task():
{
"task_id": 1
}
task_data for scheduler.add_job():
{
"coalesce": true,
"day": "",
"day_of_week": "",
"end_date": null,
"hour": "",
"id": "1",
"jitter": 0,
"max_instances": 1,
"minute": "",
"misfire_grace_time": 600,
"month": "",
"name": "test_schedule",
"second": "2",
"start_date": null,
"timezone": "Asia/Shanghai",
"trigger": "cron",
"week": "",
"year": "*"
}
################################################## 2019-03-18 17:48:41 <add_fire>
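For reference, the task_data above requests a cron trigger whose only explicit field is second='2' (every other field effectively '*'), so it fires at second 2 of every minute. A minimal stdlib sketch of those fire times (a simulation for illustration, not APScheduler itself):

```python
from datetime import datetime, timedelta

def next_fires(start, n):
    """Next n fire times for a cron trigger equivalent to second='2'
    with every other field '*': second 2 of every minute."""
    t = start.replace(microsecond=0)
    fires = []
    while len(fires) < n:
        t += timedelta(seconds=1)
        if t.second == 2:
            fires.append(t)
    return fires

# Starting from the timestamp in the log above:
start = datetime(2019, 3, 18, 18, 30, 21)
print(next_fires(start, 3))
# first fire is 18:31:02, matching "next run at 2019-03-18 18:31:02" above
```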
Could you provide a screenshot of the Timer Tasks page with the parameters of task #1 displayed, as well as of http://127.0.0.1:5000/1/tasks/1/?
Like this?
The screenshot of the Timer Tasks page with the parameters of task #1 displayed? btw, did you modify the source code of schedule.py before?
Yes, with the parameters of task #1.
And I just used pip to install scrapydweb, without any modification.
The parameters of task #1 indicate that the task would be executed whenever the second is 2, and there is nothing wrong with the execution results!
It's weird that the day and the minute are empty strings in the schedule history. How did you fill in these inputs when adding the task?
"day": "",
"minute": "",
I used the default values of 'day' and 'minute'.
The task will be fired every minute, but I think that if the job fired in the last minute is still running, it should not be fired again. Is that right?
But the values of day and hour should be '*' in the history. Could you try adding another task without modifying the parameters of the timer task, and show me the log again?
The scheduler of Timer Tasks knows nothing about the scraping jobs. Please check out the related links in the HELP section at the top of the Run Spider page.
https://apscheduler.readthedocs.io/en/latest/userguide.html#limiting-the-number-of-concurrently-executing-instances-of-a-job
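The max_instances behavior described in the link above can be sketched with the stdlib: a new firing is skipped while the previous execution of the same task is still running. This is a simulation of the max_instances=1 semantics for illustration, not APScheduler's actual code:

```python
import threading
import time

class SingleInstanceJob:
    """Stdlib sketch of APScheduler's max_instances=1 semantics:
    a firing is skipped while the previous execution is still running."""
    def __init__(self, func):
        self.func = func
        self._lock = threading.Lock()
        self.skipped = 0

    def fire(self):
        # Non-blocking acquire: if the lock is held, an instance is running.
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return None
        def run():
            try:
                self.func()
            finally:
                self._lock.release()
        t = threading.Thread(target=run)
        t.start()
        return t

job = SingleInstanceJob(lambda: time.sleep(0.2))
t = job.fire()      # starts an execution
job.fire()          # skipped: the first execution is still running
t.join()
print(job.skipped)  # 1
```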
Do I misunderstand the meaning of "job"?
I mean, if one spider is fired, it gets a job id like "task_1_2019-03-18T18_53_02"; one minute later, the spider is fired again and gets another job id, "task_1_2019-03-18T18_54_02". So these are two different jobs, not two instances of one job?
In Scrapy and Scrapyd, scheduling a spider run would result in a scraping job.
In APScheduler, firing a task would cause another kind of job instance. And the job instance goes away when the execution of the corresponding task is finished, regardless of whether the scheduled scraping job has finished or not.
I see, these are two different kinds of jobs. Thanks for taking the time to explain these details.
One more question: in my situation, I have a spider which may crawl for several days, but I'm not sure how long it will take to finish. I want to schedule the spider so that once it finishes, it will be fired again automatically after one or several days. Is there any solution for this situation?
There are two solutions:
- Enable the Email Notice feature of ScrapydWeb and get notified when a scraping job is finished, then fire the task manually.
- Catch the spider_closed signal of Scrapy and make a request to http://127.0.0.1:5000/1/tasks/xhr/fire/1/ to fire task #1 automatically.
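A minimal sketch of the second option, assuming the fire endpoint shown above; fire_task and its dry_run flag are hypothetical helpers, and in a real project you would call it from a spider_closed handler rather than at module level:

```python
import urllib.request

def fire_task(base_url, node, task_id, dry_run=False):
    """Build (and optionally send) the XHR request that fires a timer task.
    The URL pattern follows the endpoint mentioned above; adjust it if your
    ScrapydWeb version differs."""
    url = f"{base_url}/{node}/tasks/xhr/fire/{task_id}/"
    if dry_run:
        return url
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# In a Scrapy extension, connect a handler to the spider_closed signal
# and have it call: fire_task("http://127.0.0.1:5000", 1, 1)
print(fire_task("http://127.0.0.1:5000", 1, 1, dry_run=True))
# http://127.0.0.1:5000/1/tasks/xhr/fire/1/
```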
[UPDATE]
- In Scrapy and Scrapyd, scheduling a spider run would result in a scraping job.
- In ScrapydWeb, a job instance of APScheduler (name it apscheduler_job) would be generated when a task is added.
- Every time you fire a task, or when it is time to execute a task, the next_run_time attribute of the corresponding apscheduler_job is set to datetime.now(); it is set to None when a task is paused.
- The apscheduler_job is removed when its next_run_time would never occur in the future, or when you stop a task.
- The settings of misfire_grace_time, coalesce, and max_instances make sense when APScheduler is too busy, or when ScrapydWeb is restarted, or when you add a task to fire every second while it takes more than one second to finish the task execution.
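The combined effect of misfire_grace_time and coalesce can be sketched with the stdlib (a simulation of the documented semantics, not APScheduler's implementation): runs missed by more than misfire_grace_time seconds are discarded, and coalesce=True collapses the remaining backlog into a single catch-up run:

```python
from datetime import datetime, timedelta

def runs_to_fire(missed, now, misfire_grace_time, coalesce):
    """Stdlib sketch of how missed run times are handled: runs older than
    misfire_grace_time seconds are discarded; with coalesce=True the
    remaining backlog collapses into one run instead of a burst."""
    grace = timedelta(seconds=misfire_grace_time)
    due = [t for t in missed if now - t <= grace]
    if coalesce and due:
        return [due[-1]]   # one catch-up run
    return due

now = datetime(2019, 3, 18, 18, 40, 0)
# runs missed 15, 5, and 1 minutes ago (e.g. while ScrapydWeb was down)
missed = [now - timedelta(seconds=s) for s in (900, 300, 60)]
print(runs_to_fire(missed, now, misfire_grace_time=600, coalesce=True))
# only the most recent run within the 600-second grace period fires
```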