
Comments (18)

luckyyezi commented on July 22, 2024

[screenshots]

my8100 commented on July 22, 2024

Please provide the latest log in http://127.0.0.1:5000/schedule/history/

luckyyezi commented on July 22, 2024

################################################## 2019-03-18 18:30:21
['10.8.5.40:6800']
curl http://10.8.5.40:6800/schedule.json -d project=helloworld -d _version=1552895608 -d spider=books -d jobid=2019-03-18T18_30_20
Update task #1 (test_schedule) successfully, next run at 2019-03-18 18:31:02+08:00.
kwargs for execute_task():
{
"task_id": 1
}

task_data for scheduler.add_job():
{
"coalesce": true,
"day": "",
"day_of_week": "
",
"end_date": null,
"hour": "",
"id": "1",
"jitter": 0,
"max_instances": 1,
"minute": "
",
"misfire_grace_time": 600,
"month": "",
"name": "test_schedule",
"second": "2",
"start_date": null,
"timezone": "Asia/Shanghai",
"trigger": "cron",
"week": "
",
"year": "*"
}
################################################## 2019-03-18 17:48:41 <add_fire>


my8100 commented on July 22, 2024

Could you provide a screenshot of the Timer Tasks page with the parameters of task #1 displayed, as well as a screenshot of http://127.0.0.1:5000/1/tasks/1/?

luckyyezi commented on July 22, 2024

Like this?

luckyyezi commented on July 22, 2024

[screenshot]

my8100 commented on July 22, 2024

The screenshot of the Timer Tasks page with the parameters of task #1 displayed? BTW, did you modify the source code of schedule.py before?

luckyyezi commented on July 22, 2024

Yes, with the parameters of task #1.
And I just used pip to install scrapydweb, without any modification.

my8100 commented on July 22, 2024

Please show me something like this:
[example screenshot]

luckyyezi commented on July 22, 2024

[screenshot]

my8100 commented on July 22, 2024

The parameters of task #1 indicate that the task would be executed whenever the second is 2, and there is nothing wrong with the execution results!
It's weird that the day and the minute are empty strings in the schedule history. How did you fill in these inputs when adding the task?

"day": "",
"minute": "",
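For reference, a minimal APScheduler sketch (not ScrapydWeb's own code): a cron trigger whose only non-wildcard field is second='2' fires at second 2 of every minute, which matches the execution results above. The execute_task function here is just a stand-in.

    from apscheduler.schedulers.blocking import BlockingScheduler
    from apscheduler.triggers.cron import CronTrigger

    def execute_task():
        # stand-in for ScrapydWeb's own task execution
        print("task #1 fired")

    scheduler = BlockingScheduler(timezone="Asia/Shanghai")
    # every field other than second defaults to '*'
    trigger = CronTrigger(second="2", timezone="Asia/Shanghai")
    scheduler.add_job(execute_task, trigger=trigger, id="1", name="test_schedule",
                      coalesce=True, misfire_grace_time=600, max_instances=1)
    scheduler.start()  # prints "task #1 fired" once a minute, at second 2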

luckyyezi commented on July 22, 2024

I used the default values of 'day' and 'minute'.
The task will be fired every minute, but I think that if the job fired in the previous minute is still running, it should not be fired again. Is that right?

my8100 commented on July 22, 2024

But the values of day and hour should be '*' in the history. Could you try to add another task without modifying the parameters of the timer task and show me the log again?
The scheduler of Timer Tasks knows nothing about the scraping jobs. Please check out the related links in the HELP section at the top of the Run Spider page.
https://apscheduler.readthedocs.io/en/latest/userguide.html#limiting-the-number-of-concurrently-executing-instances-of-a-job
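As a hedged illustration of the limit described in that link (plain APScheduler, not ScrapydWeb internals): max_instances only caps concurrent runs of the APScheduler job itself; a long-running scraping job on Scrapyd does not hold the timer task back.

    import time
    from apscheduler.schedulers.background import BackgroundScheduler

    def fire_task():
        # pretend a single task execution takes longer than one minute
        time.sleep(90)

    scheduler = BackgroundScheduler()
    scheduler.add_job(fire_task, "cron", second="2", id="1", max_instances=1)
    scheduler.start()
    # While fire_task() is still running, the next due run is skipped and APScheduler
    # logs a "maximum number of running instances reached" warning.
    time.sleep(300)  # keep the main thread alive for the background scheduler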

luckyyezi commented on July 22, 2024

Do I misunderstand the meaning of "job"?
I mean, if one spider is fired, it gets a job id like "task_1_2019-03-18T18_53_02"; then one minute later the spider is fired again and gets another job id, "task_1_2019-03-18T18_54_02". So these two are different jobs, not two instances of one job?

my8100 commented on July 22, 2024

In Scrapy and Scrapyd, scheduling a spider run results in a scraping job.
In APScheduler, firing a task creates another kind of job instance. And that job instance goes away when the execution of the corresponding task is finished, whether or not the scheduled scraping job is finished.

luckyyezi commented on July 22, 2024

I see, these are two different kinds of jobs. You are so kind to explain these details for me.
One more question: in my situation, I have a spider which may crawl for several days, but I am not sure how long it will take to finish. I want to schedule the spider so that once it finishes, it will be fired again automatically after one or several days. Is there any solution for this situation?

my8100 commented on July 22, 2024

There are two solutions:

  1. Enable the Email Notice feature of ScrapydWeb, and get notified when a scraping job is finished. Then fire the task manually.
  2. Catch the spider_closed signal of Scrapy and make a request to http://127.0.0.1:5000/1/tasks/xhr/fire/1/ to fire task #1 automatically, as sketched below.
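A sketch of option 2 (the module path and HTTP method are assumptions; the endpoint and task id are the ones above): a small Scrapy extension that calls the ScrapydWeb fire-task endpoint once the spider closes.

    # myproject/extensions.py (hypothetical module path)
    import requests
    from scrapy import signals

    class FireTimerTaskOnClose:
        """Fire ScrapydWeb task #1 after the scraping job finishes."""

        FIRE_URL = "http://127.0.0.1:5000/1/tasks/xhr/fire/1/"

        @classmethod
        def from_crawler(cls, crawler):
            ext = cls()
            crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
            return ext

        def spider_closed(self, spider, reason):
            if reason == "finished":  # skip manual shutdowns and errors
                # "make a request" -- the HTTP method used here is an assumption
                requests.get(self.FIRE_URL, timeout=10)

    # settings.py:
    # EXTENSIONS = {"myproject.extensions.FireTimerTaskOnClose": 500}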

my8100 commented on July 22, 2024

In Scrapy and Scrapyd, scheduling a spider run results in a scraping job.
In APScheduler, firing a task creates another kind of job instance. And that job instance goes away when the execution of the corresponding task is finished, whether or not the scheduled scraping job is finished.

[UPDATE]

  • In Scrapy and Scrapyd, scheduling a spider run results in a scraping job.
  • In ScrapydWeb, a job instance of APScheduler (call it apscheduler_job) is generated when a task is added.
    • Every time you fire a task, or when it is time to execute a task, the next_run_time attribute of the corresponding apscheduler_job is set to datetime.now(); it is set to None when a task is paused.
    • The apscheduler_job is removed when its next_run_time would never occur in the future, or when you stop a task.
    • The settings misfire_grace_time, coalesce, and max_instances come into play when APScheduler is too busy, when ScrapydWeb is restarted, or when you add a task that fires every second while a single execution takes more than one second to finish (see the sketch below).
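Putting those bullet points in terms of plain APScheduler calls, a rough sketch (assumptions, not ScrapydWeb's actual implementation):

    from datetime import datetime
    from apscheduler.schedulers.background import BackgroundScheduler

    scheduler = BackgroundScheduler(timezone="Asia/Shanghai")
    job = scheduler.add_job(lambda: print("task executed"), "cron", second="2", id="1",
                            coalesce=True, misfire_grace_time=600, max_instances=1)
    scheduler.start()

    job.modify(next_run_time=datetime.now())  # "fire" the task: run at the next scheduler wakeup
    job.pause()    # sets next_run_time to None; the apscheduler_job stays registered
    job.resume()   # recomputes next_run_time from the cron trigger
    job.remove()   # "stop" the task: the apscheduler_job is removed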
