k1995 / baiduyunspider



A search engine for Baidu Cloud (Baidu Netdisk) shares, including the crawler and the website

Python 3.04% HTML 0.67% JavaScript 96.29%

python spider

baiduyunspider's Issues

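errno=-55: what causes this? My crawler keeps getting this error code back

errno=-55: what causes this? My crawler now keeps getting this error code back. Could you give me a roughly commented version of the crawler script that I can modify myself? I'd like to avoid some detours, as I'm new to Python. Thanks.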

404 Not Found

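When I send a search request, the request URL shown is:
http://mydomain/s/57un55S15L%2Bd5oqk?from=sf&type=all
However, nginx returns a 404 error every time and the page cannot be found.
This problem has been bothering me for several days and I still haven't solved it.
Could someone please help?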
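
One way to narrow this down is to check what the last path segment of the URL actually contains. The segment looks like a URL-encoded Base64 string, so the sketch below decodes it with the standard library. The function name `decode_search_keyword` is made up for illustration and is not part of the project. If the segment decodes to the keyword that was typed, the URL itself is probably well formed, and the 404 is more likely to come from how nginx routes `/s/...` requests to the site; that last point is a guess, since the nginx configuration is not shown.

```python
import base64
from urllib.parse import unquote, urlsplit


def decode_search_keyword(url):
    """Decode the last path segment of a search URL as URL-encoded Base64 text.

    Assumes the segment is standard Base64 over UTF-8 text (an assumption
    based on the shape of the URL in this issue).
    """
    # Take only the path, so the query string (?from=sf&type=all) is ignored.
    segment = urlsplit(url).path.rsplit("/", 1)[-1]
    # %2B is the URL-encoded form of '+', which is a valid Base64 character.
    raw = unquote(segment)
    return base64.b64decode(raw).decode("utf-8")


if __name__ == "__main__":
    url = "http://mydomain/s/57un55S15L%2Bd5oqk?from=sf&type=all"
    print(decode_search_keyword(url))
```

For the URL in this issue the segment decodes cleanly to a four-character Chinese keyword.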

Running scrapy crawl baidupan raises an error. How should I fix it?

  File "c:\users\administrator.win-a3unjobi233\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
redis.exceptions.ConnectionError: Error 10061 connecting to 127.0.0.1:6379. No connection could be made because the target machine actively refused it.

2021-02-01 10:12:28 [twisted] CRITICAL:
Traceback (most recent call last):
  File "c:\users\administrator.win-a3unjobi233\appdata\local\programs\python\python38\lib\site-packages\redis\connection.py", line 559, in connect
    sock = self._connect()
  File "c:\users\administrator.win-a3unjobi233\appdata\local\programs\python\python38\lib\site-packages\redis\connection.py", line 615, in _connect
    raise err
  File "c:\users\administrator.win-a3unjobi233\appdata\local\programs\python\python38\lib\site-packages\redis\connection.py", line 603, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\administrator.win-a3unjobi233\appdata\local\programs\python\python38\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "c:\users\administrator.win-a3unjobi233\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
redis.exceptions.ConnectionError: Error 10061 connecting to 127.0.0.1:6379. No connection could be made because the target machine actively refused it.
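
WinError 10061 means the TCP connection to 127.0.0.1:6379 was refused, which normally happens when nothing is listening on that port. The traceback shows the spider connecting to Redis while it opens, so a Redis server most likely has to be running before `scrapy crawl baidupan` is started. A minimal sketch for checking this with only the standard library follows; the helper name `is_port_open` is made up for illustration.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open("127.0.0.1", 6379):
        print("Something is listening on 127.0.0.1:6379")
    else:
        print("Nothing is listening on 127.0.0.1:6379; start the Redis server first")
```

This only confirms that a port accepts connections, not that the listener is actually Redis.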

11

bug

Crawler with a proxy IP reports an error

I modified the code to use a proxy IP, but it still reports an error:

uk:2518160999 error to fetch files,try again later

getShareLists errno:-55

The code is as follows:
import sys
import time
import urllib2

def getHtml(url, ref=None, reget=5):
    try:
        # Only plain-http URLs go through this proxy; add an 'https' entry for https URLs
        proxies = {'http': '222.194.14.130:808'}
        proxy_support = urllib2.ProxyHandler(proxies)
        # Build an opener that routes requests through the proxy
        opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
        request = urllib2.Request(url)
        request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36')
        if ref:
            request.add_header('Referer', ref)
        # Use opener.open(); urllib2.urlopen() bypasses the proxy here because
        # install_opener() was never called
        page = opener.open(request, timeout=10)
        html = page.read()
    except Exception:
        if reget >= 1:
            # If getHtml fails, retry up to 5 more times
            print 'getHtml error,reget...%d' % (6 - reget)
            time.sleep(2)
            return getHtml(url, ref, reget - 1)
        else:
            print 'request url:' + url
            print 'failed to fetch html'
            sys.exit(1)
    else:
        return html
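
Note that in the code as posted, the proxy opener is built but the request is sent with `urllib2.urlopen`, and `install_opener` is commented out, so the request does not go through the proxy at all. For readers on Python 3, here is a rough equivalent using `urllib.request`. It is a sketch, not the project's code: the names `build_proxy_opener` and `get_html` are made up for illustration, and whether a proxy avoids errno -55 at all is not something this issue confirms.

```python
import time
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36"
)


def build_proxy_opener(proxy_address):
    """Return an opener that sends http and https requests through one proxy."""
    proxy_url = "http://" + proxy_address
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def get_html(url, opener, referer=None, retries=5, delay=2.0):
    """Fetch url through opener, retrying up to `retries` times after a failure."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    if referer:
        request.add_header("Referer", referer)
    last_error = None
    for attempt in range(retries + 1):
        try:
            # opener.open() is what actually applies the proxy handler.
            with opener.open(request, timeout=10) as page:
                return page.read()
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            last_error = error
            if attempt < retries:
                time.sleep(delay)
    raise RuntimeError("failed to fetch " + url) from last_error
```

A call would look like `get_html(url, build_proxy_opener("222.194.14.130:808"))`. Raising an exception after the last retry, instead of calling `exit()`, lets the caller decide whether to skip the URL or stop the crawl.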

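I also wrote a Baidu Cloud search engine, using this project as a reference: www.81ad.cn

I also wrote a Baidu Cloud search engine, www.81ad.cn. It carries no ads and calls Baidu's internal API; it now holds more than 30 million records.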

Error when seeding the crawler

success to fetched hot users: 24
Traceback (most recent call last):
  File "spider.py", line 475, in <module>
    spider.seedUsers()
  File "spider.py", line 328, in seedUsers
    self.db.commit()
  File "spider.py", line 101, in commit
    self.dbconn.commit()
AttributeError: 'NoneType' object has no attribute 'commit'

Is there any way to fix this?
Operating system: CentOS 7 x64
Python version: 2.7.5
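
The AttributeError says that `self.dbconn` was `None` when `commit()` was called, which usually means the database connection was never opened or opening it failed earlier without stopping the program. The source of `spider.py` is not shown in this issue, so the sketch below is not the project's code: the class `Db` is hypothetical, and `sqlite3` stands in for whatever database driver the spider really uses. It only illustrates failing early with a clear message instead of hitting `None` later.

```python
import sqlite3


class Db:
    """Hypothetical stand-in for the spider's database wrapper."""

    def __init__(self, path):
        self.path = path
        self.dbconn = None  # stays None until connect() succeeds

    def connect(self):
        # With a real server-backed driver, wrong host or credentials would
        # raise here; let that error surface instead of swallowing it.
        self.dbconn = sqlite3.connect(self.path)

    def commit(self):
        if self.dbconn is None:
            raise RuntimeError(
                "database connection was never opened; "
                "check the connection settings and that connect() succeeded"
            )
        self.dbconn.commit()


if __name__ == "__main__":
    db = Db(":memory:")
    db.connect()
    db.commit()
    print("commit succeeded")
```

In practice, the first thing to check would be the database settings the spider is configured with and whether the database server is reachable from the CentOS machine.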

Is there no source code for how the search engine operates?

I would like to understand how this search engine works.

redis.exceptions.ConnectionError: Error 10061 connecting to 127.0.0.1:6379.

I followed your steps. Is something missing?
Running scrapy crawl baidupan keeps raising this error.

baiduyunspider's Issues

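How to crawl for a specified keyword

errno=-55: what causes this? My crawler keeps getting this error code back

Optimizing the site's inner pages

404 Not Found

How can I contact you?

Impressive~

Which version of Python are you using?

1

Running scrapy crawl baidupan raises an error. How should I fix it?

11

Crawler with a proxy IP reports an error

I also wrote a Baidu Cloud search engine, using this project as a reference: www.81ad.cn

Error when seeding the crawler

Is there no source code for how the search engine operates?

redis.exceptions.ConnectionError: Error 10061 connecting to 127.0.0.1:6379.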
