first-spider's Introduction

Kyle Ip

✨ Software development engineer from China.

🔭 Focused on server-side application development, I'm enthusiastic about:

Software Architecture Design
Distributed System
Big Data
DevOps / DevSecOps
Full Stack Engineering
Other cutting-edge technologies and open source projects

🌈 Skill Set:

...

first-spider's Issues

Mysql/Json example

Could you please post the MySQL table structure or the JSON index and mapping for the spider site example? I still have problems sending data to ES (5.3.1).
I'm not sure why my ES deprecation.log shows warnings like:

[WARN ][o.e.d.r.a.a.i.RestAnalyzeAction] analyzer request parameter is deprecated and will be removed in the next major release. Please use the JSON in the request body instead request param

[WARN ][o.e.d.r.a.a.i.RestAnalyzeAction] filter request parameter is deprecated and will be removed in the next major release. Please use the JSON in the request body instead request param

Any help or suggestions would be greatly ...
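
The two warnings above say that the `analyzer` and `filter` values reached the `_analyze` endpoint as URL query parameters. A minimal sketch of moving them into the JSON request body follows. `build_analyze_body` is a hypothetical helper, not part of first-spider, and the index name `jobbole` in the usage comment is an assumption. The sketch also assumes that extra token filters belong with a `tokenizer` rather than with a named analyzer; check that against the Analyze API documentation for your ES version.

```python
def build_analyze_body(text, analyzer=None, tokenizer=None, filters=None):
    """Build a JSON body for the _analyze API (hypothetical helper)."""
    body = {"text": text}
    if analyzer is not None:
        # A named analyzer such as "ik_max_word" goes in the body, not the URL.
        body["analyzer"] = analyzer
    if tokenizer is not None:
        body["tokenizer"] = tokenizer
    if filters:
        # Token filters are sent as a JSON list under the "filter" key.
        body["filter"] = list(filters)
    return body


body = build_analyze_body("MySQL 性能调优技巧", analyzer="ik_max_word")

# With a running cluster and the elasticsearch-py client, the body could be
# sent like this (not executed here; host and index name are assumptions):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(["localhost:9200"])
#   words = es.indices.analyze(index="jobbole", body=body)
```

Because every analysis option travels in the body, the call no longer passes `params={'filter': [...]}` or an `analyzer=` keyword, which is the request shape the deprecation log complains about.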

Error: es connection

Hello, I ran "scrapy crawl jobbole" against elasticsearch-rtf 5.1.1, but the following error keeps appearing....

=====

'虽然即便是最大的公司网站也会因宕机而遭受损失，但这种影响对于处理网上销售的中小型企业尤其关键。根据最近的一份调查报告显示，一分钟的宕机导致企业平均损失约5000美元。不要让你的业务成为那种统计数据（因为宕机造成的损失）的一部分。在假日繁忙之前，主动调优MySQL数据库服务器（S）并收获回报吧！\n'
'\r\n'
' \r\n'
' \r\n'
' \n'
' \n'
' 1 赞\n'
' 1 收藏\n'
'\n'
' 2 评论\n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
'\n'
' \n'
'\n'
'\n'
'\n'
'\r\n'
' \r\n'
'\r\n'
'\r\n'
'\r\n'
'\t',
'created_time': datetime.date(2018, 2, 12),
'fav_count': 1,
'front_image_url': ['http://jbcdn2.b0.upaiyun.com/2015/11/e78e36715813f49e9e62fe0c6050075c.png'],
'tags': 'IT技术,,MySQL,数据库',
'title': 'MySQL 性能调优技巧',
'url': 'http://blog.jobbole.com/113197/',
'url_object_id': '9d1a4f483de8dad0a4da4300eacbcc09',
'voteup_count': 1}
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 83, in create_connection
raise err
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py", line 114, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 649, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 333, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 356, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 166, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7feb72180978>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/apps/03.pip_py36_elasticsearch-dsl/11.yipwinghong_FirstSpider_py36_ERROR_es-rtf/FirstSpider/pipelines.py", line 147, in process_item
item.insert_to_es()
File "/apps/03.pip_py36_elasticsearch-dsl/11.yipwinghong_FirstSpider_py36_ERROR_es-rtf/FirstSpider/items.py", line 158, in insert_to_es
article.suggests = gen_suggests(ArticleType._doc_type.index, ((article.title, 10), (article.tags, 7)))
File "/apps/03.pip_py36_elasticsearch-dsl/11.yipwinghong_FirstSpider_py36_ERROR_es-rtf/FirstSpider/items.py", line 34, in gen_suggests
words = es.indices.analyze(index=index, analyzer="ik_max_word", params={'filter': ["lowercase"]}, body=text)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/client/indices.py", line 32, in analyze
'_analyze'), params=params, body=body)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py", line 123, in perform_request
raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7feb72180978>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7feb72180978>: Failed to establish a new connection: [Errno 111] Connection refused)
2018-02-12 21:32:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://blog.jobbole.com/113200/> (referer: http://blog.jobbole.com/all-posts/page/5/)
2018-02-12 21:32:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://blog.jobbole.com/113203/> (referer: http://blog.jobbole.com/all-posts/page/5/)
2018-02-12 21:32:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://blog.jobbole.com/113205/> (referer: http://blog.jobbole.com/all-posts/page/5/)
2018-02-12 21:32:37 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://blog.jobbole.com/all-posts/page/6/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>, <twisted.python.failure.Failure twisted.web.http._DataLoss: Chunked decoder in 'BODY' state, still expecting more data to get to 'FINISHED' state.>]
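
The innermost error in the traceback above is `ConnectionRefusedError: [Errno 111]`, which means nothing accepted the TCP connection at the host and port the Elasticsearch client was configured with. A minimal sketch of a pre-flight check that could be run before `scrapy crawl jobbole` follows. `es_reachable` is a hypothetical helper, and `localhost:9200` is an assumed address (the Elasticsearch default), not something the log confirms.

```python
import socket


def es_reachable(host="localhost", port=9200, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (host and port are assumptions):
#   if not es_reachable("localhost", 9200):
#       print("Nothing is listening; start Elasticsearch or fix the host setting")
```

If the check returns False, the likely causes are that the Elasticsearch process is not running, or that the host configured in the project differs from the address the node is bound to.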
