Comments (81)
@zhangshengchun You have to manually restart the Scrapyd service. (Note that this would also reset the running jobs, since Scrapyd keeps all job data in memory.) That's why ScrapydWeb displays the finished jobs in descending order.
@zhangshengchun As of v1.1.0, you can set the DASHBOARD_FINISHED_JOBS_LIMIT option to control the number of finished jobs displayed.
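For example, a minimal sketch in your scrapydweb_settings file (the value is illustrative):
DASHBOARD_FINISHED_JOBS_LIMIT = 30  # show at most the latest 30 finished jobs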
Why not try it yourself and share with us?
OK. If one exists, I don't have to do it; if not, I will do it and share it.
Check out the .db files in the data path on your localhost to confirm that it’s mounted successfully.
Besides, ScrapydWeb v1.3.0 now supports both MySQL and PostgreSQL backends; simply set the DATABASE_URL option to choose where the data is stored.
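For example, a minimal sketch of the DATABASE_URL option (SQLAlchemy-style URLs; the credentials and hosts below are placeholders):
# In your scrapydweb_settings file:
DATABASE_URL = 'mysql://username:password@127.0.0.1:3306'       # MySQL (needs a driver such as PyMySQL)
# DATABASE_URL = 'postgres://username:password@127.0.0.1:5432'  # PostgreSQL (needs psycopg2)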
@zhangshengchun Try the latest version v1.0.0 (which fixed issue 14) and remember to set ENABLE_CACHE = True.
If the problem remains, please open a new issue with details.
@my8100 The problem of high CPU usage has been solved. Thanks a lot!
There's no need to set up the SCRAPYD_LOGS_DIR option if you are not running any Scrapyd server on the ScrapydWeb host.
How should I configure the config file? Could you share a template for a production environment?
Make sure that Scrapyd has been installed and started.
Then visit the Scrapyd server (in your case: http://192.168.0.24:6801) to check the connectivity.
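For a quick connectivity check from the command line (daemonstatus.json is Scrapyd's status endpoint), the reply should look something like this:
$ curl http://192.168.0.24:6801/daemonstatus.json
{"node_name": "...", "status": "ok", "pending": 0, "running": 0, "finished": 0}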
How should I configure the config file? Could you share a template for a production environment?
FYI
How to efficiently manage your distributed web scraping projects
Leave your comment here for help, in English if possible.
CPU usage is too high when I refresh the statistics, and the web interface hangs for a long time. I am using ScrapydWeb to manage 25 Scrapyd server nodes on five virtual machines in the same LAN. How can I solve this?
@my8100 ScrapydWeb works very well; it's a great project! Now I have a question: the finished job list in the dashboard is getting longer and longer. How can I remove finished job info and logs from the dashboard?
Do you have a Docker deployment version?
Why not try it yourself and share with us?
FYI
$ cat docker-entrypoint.sh
#!/bin/bash
# Create the log directory shared with LogParser (-p: no error if it already exists).
mkdir -p /code/logs
# Run LogParser in the background, parsing /code/logs every 10 seconds.
/usr/bin/nohup /usr/local/bin/logparser -t 10 -dir /code/logs > logparser.log 2>&1 &
# Run Scrapyd in the foreground to keep the container running.
/usr/local/bin/scrapyd > scrapyd.log
Hello, I have some questions. How can I start more than one spider on a single Scrapyd server without using a timer task? In my project, I need to start lots of spiders at the same time (like a batch start), or stop the spiders on one Scrapyd server. Thanks for your reply!
It's not supported yet as I don't think it's a common practice. You can use the Requests library in a Python script instead.
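For example, a minimal sketch against Scrapyd's schedule.json and cancel.json endpoints (the server address and project/spider names are hypothetical):
import requests

SCRAPYD = 'http://127.0.0.1:6800'    # hypothetical Scrapyd server
SPIDERS = ['spider_a', 'spider_b']   # hypothetical spider names

# Batch start: one schedule.json call per spider.
for spider in SPIDERS:
    r = requests.post(SCRAPYD + '/schedule.json',
                      data={'project': 'myproject', 'spider': spider})
    print(spider, r.json())  # e.g. {'status': 'ok', 'jobid': '...'}

# Batch stop: cancel.json takes the project name and a job id.
# requests.post(SCRAPYD + '/cancel.json', data={'project': 'myproject', 'job': jobid})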
It's not supported yet as I don't think it's a common practice. You can use the Requests library in a Python script instead.
OK, thanks!
Is ScrapydWeb a true distributed crawler? Does it use a shared queue like scrapy-redis, or does each crawler run independently?
ScrapydWeb is a web application for Scrapyd cluster management, not a crawler. Have you read the readme and these tutorials?
How to set up the SCRAPYD_LOGS_DIR option? I tried this:
# Check out this link to find out where the Scrapy logs are stored:
# https://scrapyd.readthedocs.io/en/stable/config.html#logs-dir
# e.g. 'C:/Users/username/logs/' or '/home/username/logs/'
SCRAPYD_LOGS_DIR = ''
Thanks! But I still don't know how to fix it. The scrapyd.conf file shows my Scrapyd log dir is "/root/logs":
aha, it's running on the server side.
Hello, it would be better if you could add time filtering to the task page.
Hello, it would be better if you could add time filtering to the task page.
@sun8029554 Could you explain the need for "time filtering" in detail?
{"node_name": "WIN-HPEPRHCCLL4", "status", "error", "message": "Expected one of [b'HEAD', b'object', b'POST']}
The crawler that integrates selenium cannot stop. How can this problem be solved?
from scrapydweb.
How did you get that response? Any more logs?
How did you get that response? Any more logs?
Other crawlers that do not integrate Selenium can be stopped normally.
I need the log of scrapydweb when you click the Stop button.
stop.log (the log output after clicking Stop)
You are not running scrapydweb with the argument -v or --verbose to change the logging level to DEBUG; please try again and post the logs.
I need the log of scrapydweb when you click the Stop button.
It seems that Scrapyd got redundant single quotes for the project name.
Try to post the request with the requests library instead.
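For example, a minimal sketch with the requests library, reusing the project and job values that appear in the logs below (the host is masked and the credentials are placeholders):
import requests

r = requests.post('http://yj.xxxx.com:80/cancel.json',  # masked host from the logs
                  data={'project': '2019_4_20_005650', 'job': '2019-04-21T10_02_38'},
                  auth=('username', 'password'))        # placeholder credentials
print(r.json())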
Scrapyd only returned "message": "", which helps nothing.
It's not an issue of ScrapydWeb.
Visit the web UI of Scrapyd to make sure that the job is still running and under control.
Also, check out the logs of Scrapyd.
[2019-04-21 21:57:04,870] DEBUG in ApiView: POST data: {
"job": "2019-04-21T10_02_38",
"project": "2019_4_20_005650"
}
[2019-04-21 21:57:04,909] ERROR in ApiView: !!!!! (200) error: http://yj.xxxx.com:80/cancel.json
[2019-04-21 21:57:04,909] DEBUG in ApiView: Got json from http://yj.xxxx.com:80/cancel.json: {
"auth": [
"username",
"password"
],
"message": "",
"node_name": "WIN-HPEPRHCCLL4",
"status": "error",
"status_code": 200,
"url": "http://yj.xxxx.com:80/cancel.json",
"when": "2019-04-21 21:57:04"
}
Scrapyd only returned "message": "", which helps nothing.
Okay. Thank you for continuing to find out why.
Hello, it would be better if you could add time filtering to the task page.
@sun8029554 Could you explain the need for "time filtering" in detail?
Query the crawlers' runs by time period, such as the past week or the past few days... I think it's a common practice for most people.
@sun8029554 You mean displaying all task results within a period of time, like today or this week?
Why not try it yourself and share with us?
OK. If one exists, I don't have to do it; if not, I will do it and share it.
@KimFu2 See scrapydwebDockerFile from @AoeSyL (not verified).
I deleted the task, but it is still running. How can I pause or stop it?
I deleted the task, but it is still running. How can I pause or stop it?
Replied in #51.
How to solve "bash: scrapydweb: command not found"? I used pip3 to install scrapydweb and no error was reported. Thanks.
pip3 uninstall scrapydweb
pip3 install scrapydweb
scrapydweb
- Post the full log if scrapydweb still fails to start.
pip3 uninstall scrapydweb
pip3 install scrapydweb
scrapydweb
- Post the full log if scrapydweb still fails to start.
[root@localhost scrapydweb]# pip3 install scrapydweb
Collecting scrapydweb
Using cached https://files.pythonhosted.org/packages/71/69/e60d374d7571a55058176bc0df1ce8bc15a9fe113ecba4c49d7e42eac478/scrapydweb-1.2.0-py3-none-any.whl
Requirement already satisfied: logparser==0.8.1 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (0.8.1)
Requirement already satisfied: six>=1.12.0 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (1.12.0)
Requirement already satisfied: requests>=2.21.0 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (2.22.0)
Requirement already satisfied: SQLAlchemy>=1.2.15 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (1.3.4)
Requirement already satisfied: Flask-SQLAlchemy>=2.3.2 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (2.4.0)
Requirement already satisfied: flask>=1.0.2 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (1.0.3)
Requirement already satisfied: setuptools>=40.6.3 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (41.0.1)
Requirement already satisfied: flask-compress>=1.4.0 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (1.4.0)
Requirement already satisfied: APScheduler>=3.5.3 in /usr/local/python/lib/python3.6/site-packages (from scrapydweb) (3.6.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/python/lib/python3.6/site-packages (from requests>=2.21.0->scrapydweb) (1.25.3)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/python/lib/python3.6/site-packages (from requests>=2.21.0->scrapydweb) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/python/lib/python3.6/site-packages (from requests>=2.21.0->scrapydweb) (2019.3.9)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/python/lib/python3.6/site-packages (from requests>=2.21.0->scrapydweb) (2.8)
Requirement already satisfied: Jinja2>=2.10 in /usr/local/python/lib/python3.6/site-packages (from flask>=1.0.2->scrapydweb) (2.10.1)
Requirement already satisfied: itsdangerous>=0.24 in /usr/local/python/lib/python3.6/site-packages (from flask>=1.0.2->scrapydweb) (0.24)
Requirement already satisfied: click>=5.1 in /usr/local/python/lib/python3.6/site-packages (from flask>=1.0.2->scrapydweb) (6.7)
Requirement already satisfied: Werkzeug>=0.14 in /usr/local/python/lib/python3.6/site-packages (from flask>=1.0.2->scrapydweb) (0.15.4)
Requirement already satisfied: tzlocal>=1.2 in /usr/local/python/lib/python3.6/site-packages (from APScheduler>=3.5.3->scrapydweb) (1.3)
Requirement already satisfied: pytz in /usr/local/python/lib/python3.6/site-packages (from APScheduler>=3.5.3->scrapydweb) (2017.2)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/python/lib/python3.6/site-packages (from Jinja2>=2.10->flask>=1.0.2->scrapydweb) (1.0)
Installing collected packages: scrapydweb
Successfully installed scrapydweb-1.2.0
You are using pip version 10.0.1, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[root@localhost scrapydweb]# scrapydweb
bash: scrapydweb: command not found...
It still doesn't work....
Try non-root user, or use virtualenv.
Try non-root user, or use virtualenv.
Thanks a lot. ^_^
Question: do the timer tasks use UTC? For example, my timer task is set to start at 8:00, but it actually shows that it started at 0:00.
@JQ-K
Please open a new issue with details (logs and screenshots).
Is it possible to set up timer tasks and query their status through an API? Currently this can only be done through the web page.
@1-eness
It may be supported in a future release, via RESTful API.
Is it necessary to add a timer task via API?
What kind of information of a timer task would you like to get from the API?
@1-eness
It may be supported in a future release, via RESTful API.
Is it necessary to add a timer task via API?
What kind of information of a timer task would you like to get from the API?
I hope CRUD operations on timer tasks can be supported; currently everything has to go through the web page, which is a bit unfriendly for code integration.
Can ScrapydWeb support scheduled tasks numbering in the thousands?
@L2016J
Thousands of what? Tasks or nodes?
I would suggest you set up the jitter option to add a random component to the execution time, as SQLite is being used in the background.
I would suggest you set up the jitter option to add a random component to the execution time, as SQLite is being used in the background.
Set it up so that custom tasks run at a random time?
Execute a task with a random delay of [-N, +N] seconds; defaults to 0 (docs). It helps avoid concurrent writes to the database.
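For reference, jitter is the underlying APScheduler option; a minimal sketch (the task function and schedule are hypothetical):
from apscheduler.schedulers.background import BackgroundScheduler

def my_task():  # hypothetical task
    print('running')

scheduler = BackgroundScheduler()
# jitter=120 shifts each run time by a random value in [-120, +120] seconds.
scheduler.add_job(my_task, 'interval', minutes=10, jitter=120)
scheduler.start()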
Thank you, I'll try it.
My SCRAPYD_SERVERS entries have no port, and I get the error "None of your SCRAPYD_SERVERS could be connected." Thank you for your reply.
@kachacha
What do you mean? Can you post the full log?
Fail to decode json from http://scrapyd********.local:80/logs/stats.json: Expecting value: line 2 column 1 (char 1)
@kachacha
What do you mean? Can you post the full log?
This problem has been solved. It was a port problem.
Fail to decode json from http://scrapyd********.local:80/logs/stats.json: Expecting value: line 2 column 1 (char 1)
Post the full log and the content of stats.json.
Or rename stats.json to stats.json.bak and restart logparser.
Run 'pip install logparser' on host 'scrapyd1.****.local:80' and run the 'logparser' command to show crawled_pages and scraped_items.
I am running the service with Docker. For persistence of timer tasks (so that scheduled tasks still exist after a restart), I have mounted the "/scrapydweb/data" directory, but it has no effect. What should I do? Thank you.
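For reference, a sketch of the volume mount (the image name and host path are placeholders; the host path must line up with the directory the app actually writes its .db files to):
docker run -d -p 5000:5000 \
    -v /home/user/scrapydweb_data:/scrapydweb/data \
    your-scrapydweb-image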
[2019-08-30 10:35:00,050] INFO in apscheduler.scheduler: Scheduler started
[2019-08-30 10:35:00,061] INFO in scrapydweb.run: ScrapydWeb version: 1.4.0
[2019-08-30 10:35:00,062] INFO in scrapydweb.run: Use 'scrapydweb -h' to get help
[2019-08-30 10:35:00,062] INFO in scrapydweb.run: Main pid: 5854
[2019-08-30 10:35:00,062] DEBUG in scrapydweb.run: Loading default settings from /Library/Python/2.7/site-packages/scrapydweb/default_settings.py
Traceback (most recent call last):
File "/usr/local/bin/scrapydweb", line 11, in <module>
load_entry_point('scrapydweb==1.4.0', 'console_scripts', 'scrapydweb')()
File "/Library/Python/2.7/site-packages/scrapydweb/run.py", line 37, in main
load_custom_settings(app.config)
File "/Library/Python/2.7/site-packages/scrapydweb/run.py", line 124, in load_custom_settings
print(u"{star}Overriding custom settings from {path}{star}".format(star=STAR, path=handle_slash(path)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 21: ordinal not in range(128)
Hello, I'd like to know where the error comes from. Thank you.
Go to the user home directory (e.g. /home/yourusername) and try again.
It’s recommended to use Python 3 instead.
Thanks. I'm using Python 2.7. After changing the print(u"xxx") on line 124 of run.py to print("xxx"), it runs. Thanks a lot.
@foreverbeyoung
Try to install scrapydweb in virtualenv.
Hello, I am able to set up scrapyd and scrapydweb and get them running as separate Docker containers, but the pages and items are N/A, not a number as shown in your example at http://scrapydweb.herokuapp.com
The crawlers can be run properly without any errors shown.
Here is what I have tried:
- In scrapyd.conf, I set
items_dir=/spider/items
- In scrapydweb_settings_v10.py, I set
SHOW_SCRAPYD_ITEMS = True
- I was using an example of JsonWriterPipeline in https://docs.scrapy.org/en/latest/topics/item-pipeline.html. Now I'm trying to use the Feed Exports from https://docs.scrapy.org/en/latest/topics/feed-exports.html
When run, I expected some items from customspider in /spider/items, but actually there is no file there.
If possible, I would like to know how you set up the items settings for scrapyd and scrapydweb on that site, so that I could try modifying my implementation to match.
@panithan-b
Run scrapyd, scrapydweb, and logparser outside Docker first to figure out how they cooperate,
then update your containers accordingly.
Hello, I've tried running scrapyd, scrapydweb, and logparser outside Docker and got a log file like this; I'm still not sure why most fields are "N/A" or null:
https://pastebin.com/WqRZYcAB
What's the content of the file "/spider/logs/tzspider/cosmenet/7dc32862e10311e9bb640242ac130002.log"?
It was a blank file. But now I know what happened: I had set LOG_LEVEL to ERROR in the Scrapy config for the staging environment, so when a crawl runs properly nothing is printed except ERROR-level messages.
Now I've set it to INFO and am finally able to see the log content, the number of pages crawled, and the scraped items. :)
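For reference, the Scrapy setting involved, in the project's settings.py (crawl stats are logged at INFO level, so ERROR hides them from LogParser):
LOG_LEVEL = 'INFO'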
Hello, how do I make send_text work? I want to use email alerts, but I don't understand what the code on the send_text page does. Thank you.
Hi, when I run the ninth crawler on each server, it shows "waiting to run". How do I increase the number of crawlers each server can run? Looking forward to your answer.
Hi, when I run the ninth crawler on each server, it shows "waiting to run". How do I increase the number of crawlers each server can run? Looking forward to your answer.
Set it in the config of Scrapyd: ...\scrapyd\default_scrapyd.conf
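For reference, the relevant options in Scrapyd's config file (values illustrative; with the default of 4 processes per CPU on a 2-CPU box, the ninth job would queue up):
[scrapyd]
max_proc         = 0    # 0 means the limit is derived from max_proc_per_cpu
max_proc_per_cpu = 8    # default is 4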
Thank you!
Why does my timer always run twice?
How do I display the real IP on the Servers page, instead of '127.0.0.1:6800'?
I configured SCRAPYD_SERVERS with a tuple like ('username', 'password', '127.0.0.1', '6801', 'group'), and the web page displays '127.0.0.1:6800', but I want the real IP. How should I configure SCRAPYD_SERVERS? Help please!
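A sketch of the same tuple format with the LAN IP in the third field instead of the loopback address (credentials and address are placeholders):
SCRAPYD_SERVERS = [
    ('username', 'password', '192.168.0.24', '6801', 'group'),
]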
Hello, could you record a complete ScrapydWeb tutorial video? I followed the tutorials on the Internet and ran into all kinds of errors.
An error occurred while uploading the project: 'str' object has no attribute 'decode'.