Comments (23)
Actually, it's an issue with Scrapyd. I'll sort it out in the next release.
You can check out the logs of finished jobs on the Logs page for the time being, and there is no need to persist the data folder of ScrapydWeb.
Well, I've just noticed that I can see the graph of a job by going to the Files > Logs section; a nice column lets me see the graph for all log files, which is perfect for me!
With a snapshot of the dashboard, it would be even better!
Yes, it would be awesome to support this feature!
Will it be possible to see the job graph after a scrapydweb restart?
I just implemented a snapshot mechanism for the Dashboard page, so you can still check out its last view in case the Scrapyd service is restarted.
What do you mean by 'job graph'?
I mean the graph that shows the number of items stored per minute and the number of pages crawled per minute, and the other graph showing the progression of the total number of crawled pages / stored items.
Very handy to have after a scrapydweb restart.
The stats and graphs of a job remain available as long as the json file generated by LogParser or the original logfile exists.
You may need to adjust jobs_to_keep and finished_to_keep in the config file of Scrapyd.
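For reference, a minimal sketch of the relevant [scrapyd] options in scrapyd.conf; the values shown are illustrative assumptions, not recommendations:

```ini
[scrapyd]
# How many finished processes Scrapyd keeps in its launcher
# (the entries ScrapydWeb lists as finished jobs).
finished_to_keep = 500
# How many logfiles Scrapyd keeps per spider before deleting the oldest;
# once a logfile (and its LogParser json) is gone, its stats and graph are too.
jobs_to_keep = 100
```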
But why emphasize "after a ScrapydWeb restart"? Is there anything wrong with v1.1.0?
Well, indeed I launched a job. It finished, then I restarted scrapydweb and scrapyd as well. So I guess scrapydweb no longer shows the finished job, and as a result I can no longer get the stats and graph of the job.
I imagine that if scrapydweb persists finished jobs (next release), I'll also be able to see the graph that was built in real time.
Is that right?
I'll be happy to test this new release ;)
As I told you before: "You can check out the logs of finished jobs on the Logs page for the time being."
Also note that the json files generated by LogParser will be removed by Scrapyd when it deletes the original logfiles.
v1.2.0: Persist jobs information in the database
Hi,
scrapyd uses sqlite only as a concurrently accessed queue.
The persistence of scheduled jobs that you see right now was not intentional.
scrapyd should have used https://docs.python.org/3/library/queue.html to implement the spider queue instead of sqlite.
I think what's best is to make scrapyd more modular so that developers like @my8100 can easily plug in custom components, e.g. a persistent job table.
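To illustrate the suggestion, here is a minimal, hypothetical sketch of an in-memory spider queue built on queue.PriorityQueue; the add/pop/count methods only mimic the kind of interface a pluggable queue component would need and are not Scrapyd's actual API:

```python
import queue
from itertools import count


class MemorySpiderQueue:
    """Illustrative thread-safe spider queue backed by queue.PriorityQueue.

    Unlike the sqlite-backed queue, pending jobs are lost when the process
    exits, which is why persistence would stay a separate, pluggable concern.
    """

    def __init__(self):
        self._q = queue.PriorityQueue()
        self._counter = count()  # breaks ties between equal priorities

    def add(self, name, priority=0.0, **spider_args):
        # Negate the priority so that higher-priority jobs pop first.
        self._q.put((-priority, next(self._counter), {"name": name, **spider_args}))

    def pop(self):
        try:
            return self._q.get_nowait()[2]
        except queue.Empty:
            return None

    def count(self):
        return self._q.qsize()
```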
Are there any plans to add this feature in future releases? There are a lot of cases where it's nice to be able to restore a failed parser right from where it stopped, so that already-scheduled requests aren't lost, just like it is implemented in SpiderKeeper.
@goshaQ
What's the meaning of "restore a failed parser right from where it stopped"?
@my8100
The same as in the first comment. SpiderKeeper allows saving the state of the queue that contains scheduled requests, so if a spider stops (because of a user request or anything else), it can resume instead of starting from scratch. But now I think there are some considerations that make it hard to provide such functionality, and it's not that hard to do it yourself, given the specifics of a particular use case.
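One way to do it yourself, independent of ScrapydWeb or SpiderKeeper, is Scrapy's built-in JOBDIR persistence, passed as a setting when scheduling through Scrapyd's schedule.json endpoint. A hedged sketch (project name, spider name, and path are placeholders; the spider must be stopped gracefully for its queue to be flushed to disk):

```python
import requests

# Schedule a run whose pending request queue and dupefilter state are
# persisted to disk via Scrapy's JOBDIR; scheduling again with the same
# JOBDIR resumes from where the previous run stopped.
resp = requests.post(
    "http://127.0.0.1:6800/schedule.json",
    data={
        "project": "myproject",   # placeholder
        "spider": "myspider",     # placeholder
        "setting": "JOBDIR=/data/crawls/myspider-1",  # placeholder path
    },
)
print(resp.json())
```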
Btw, I just noticed that the Items section shows an error if Scrapyd doesn't return the items, which is normal if the result is written to a database. It looks to me like the same reason explains why the Jobs section keeps showing the red tip telling me to install logparser to show the number of parsed items, even after I've installed logparser and launched it. Or am I doing something wrong? Sorry for the unrelated question.
- pip install scrapydweb==1.3.0
- Both the Classic view and the Database view of the Jobs page are provided; that's why I closed this issue in v1.2.0.
- Set SHOW_SCRAPYD_ITEMS to False to hide the Items link in the sidebar (see scrapydweb/scrapydweb/default_settings.py, lines 155 to 159 at a449dbf, and the sketch after this list).
- What's the result of visiting http://127.0.0.1:6800/logs/stats.json?
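A minimal sketch of the SHOW_SCRAPYD_ITEMS override in a ScrapydWeb user settings file (the exact filename, e.g. scrapydweb_settings_vN.py, depends on your version and is assumed here):

```python
# ScrapydWeb user settings file (filename varies by version).
# Hide the Items link in the sidebar when Scrapyd does not store item
# feeds locally, e.g. because an item pipeline writes to a database.
SHOW_SCRAPYD_ITEMS = False
```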
Thanks, that's what I was looking for. But it appears that there is no stats.json on the server: visiting http://127.0.0.1:6800/logs/stats.json replies "No Such Resource".
I've installed logparser and launched it.
Restart logparser and post the full log.
Restart logparser and post the full log.
[2019-08-08 07:23:22,926] INFO in logparser.run: LogParser version: 0.8.2
[2019-08-08 07:23:22,927] INFO in logparser.run: Use 'logparser -h' to get help
[2019-08-08 07:23:22,927] INFO in logparser.run: Main pid: 20297
[2019-08-08 07:23:22,927] INFO in logparser.run: Check out the config file below for more advanced settings.
****************************************************************************************************
Loading settings from /usr/local/lib/python3.6/dist-packages/logparser/settings.py
****************************************************************************************************
[2019-08-08 07:23:22,928] DEBUG in logparser.run: Reading settings from command line: Namespace(delete_json_files=False, disable_telnet=False, main_pid=0, scrapyd_logs_dir='/somepath', scrapyd_server='127.0.0.1:6800', sleep=10, verbose=False)
[2019-08-08 07:23:22,928] DEBUG in logparser.run: Checking config
[2019-08-08 07:23:22,928] INFO in logparser.run: SCRAPYD_SERVER: 127.0.0.1:6800
[2019-08-08 07:23:22,928] INFO in logparser.run: SCRAPYD_LOGS_DIR: /somepath
[2019-08-08 07:23:22,928] INFO in logparser.run: PARSE_ROUND_INTERVAL: 10
[2019-08-08 07:23:22,928] INFO in logparser.run: ENABLE_TELNET: True
[2019-08-08 07:23:22,928] INFO in logparser.run: DELETE_EXISTING_JSON_FILES_AT_STARTUP: False
[2019-08-08 07:23:22,928] INFO in logparser.run: VERBOSE: False
****************************************************************************************************
Visit stats at: http://127.0.0.1:6800/logs/stats.json
****************************************************************************************************
[2019-08-08 07:23:23,294] INFO in logparser.utils: Running the latest version: 0.8.2
[2019-08-08 07:23:26,299] WARNING in logparser.logparser: New logfile found: /somepath/2019-08-08T07_19_39.log (121355 bytes)
[2019-08-08 07:23:26,299] WARNING in logparser.logparser: Json file not found: /somepath/2019-08-08T07_19_39.json
[2019-08-08 07:23:26,299] WARNING in logparser.logparser: New logfile: /somepath/2019-08-08T07_19_39.log (121355 bytes) -> parse
[2019-08-08 07:23:26,331] WARNING in logparser.logparser: Saved to /somepath/2019-08-08T07_19_39.json
[2019-08-08 07:23:26,332] WARNING in logparser.logparser: Saved to http://127.0.0.1:6800/logs/stats.json
[2019-08-08 07:23:26,332] WARNING in logparser.logparser: Sleep 10 seconds
[2019-08-08 07:23:36,343] WARNING in logparser.logparser: Saved to http://127.0.0.1:6800/logs/stats.json
[2019-08-08 07:23:36,343] WARNING in logparser.logparser: Sleep 10 seconds
[2019-08-08 07:23:46,350] WARNING in logparser.logparser: Saved to http://127.0.0.1:6800/logs/stats.json
[2019-08-08 07:23:46,351] WARNING in logparser.logparser: Sleep 10 seconds
Check if SCRAPYD_LOGS_DIR/stats.json exists.
Visit http://127.0.0.1:6800/logs/stats.json again.
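A quick way to run both checks at once (a sketch; /somepath is the SCRAPYD_LOGS_DIR shown in the log above, and Scrapyd is assumed to run on the same host):

```python
import os

import requests

logs_dir = "/somepath"  # SCRAPYD_LOGS_DIR from the logparser output above

# 1. Did logparser write the aggregated stats file into that directory?
stats_path = os.path.join(logs_dir, "stats.json")
print(stats_path, "exists:", os.path.exists(stats_path))

# 2. Does Scrapyd actually serve that directory, i.e. is the URL reachable?
resp = requests.get("http://127.0.0.1:6800/logs/stats.json", timeout=10)
print("HTTP", resp.status_code)
```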
There is a .json file, but it's named the same as the .log file.
The reply is the same: No Such Resource.
Check if SCRAPYD_LOGS_DIR/stats.json exists.
Did you see the comment below?
https://github.com/my8100/logparser/blob/711786042aece827be87acf0286fb68bfe5ebd20/logparser/settings.py#L20-L26
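If I read that hint correctly (an assumption on my part, not a quote of the linked comment): stats.json only shows up at http://127.0.0.1:6800/logs/stats.json when logparser writes it into the very directory that Scrapyd serves as /logs/, i.e. SCRAPYD_LOGS_DIR should point at Scrapyd's logs_dir rather than, say, a project or spider subdirectory. A hedged sketch:

```python
# logparser settings.py (sketch) -- this should be the SAME directory as the
# logs_dir option in scrapyd.conf; otherwise the generated stats.json lands
# somewhere Scrapyd never serves and /logs/stats.json stays "No Such Resource".
SCRAPYD_LOGS_DIR = "/path/to/scrapyd/logs"  # placeholder; match Scrapyd's logs_dir
```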