first-spider's Introduction

Kyle Ip

✨ Software development engineer from China.

🔭 Focused on server-side application development, I'm enthusiastic about:

  • Software Architecture Design
  • Distributed System
  • Big Data
  • DevOps / DevSecOps
  • Full Stack Engineering
  • Other cutting-edge technologies and open source projects

🌈 Skill Set:

linux java python go c lua js mysql kafka redis zookeeper

hadoop spark HBase ansible airflow docker kubernetes spring jenkins tomcat ...

first-spider's People

Contributors

dependabot[bot], kyle-ip

first-spider's Issues

Test

This issue is used to test the automatic weekly report generator.

MySQL/JSON example

Could you please post the MySQL table structure or the JSON index and mapping for the spider site example? I still have problems sending data to ES (5.3.1).
I'm not sure why my ES deprecation.log shows warnings like:

[WARN ][o.e.d.r.a.a.i.RestAnalyzeAction] analyzer request parameter is deprecated and will be removed in the next major release. Please use the JSON in the request body instead request param

[WARN ][o.e.d.r.a.a.i.RestAnalyzeAction] filter request parameter is deprecated and will be removed in the next major release. Please use the JSON in the request body instead request param

Any help or suggestions would be greatly ...
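
The two warnings above say that the `analyzer` and `filter` request parameters are deprecated and that the JSON request body should be used instead. A minimal sketch of what that could look like from Python follows, assuming an elasticsearch-py client whose `indices.analyze` accepts `index` and `body` keyword arguments (as the 5.x client does). The helper names `build_analyze_body` and `analyze_tokens` are hypothetical and not part of the project. The `lowercase` filter is left out on purpose, since this sketch does not assume it can be combined with a named analyzer in the body.

```python
def build_analyze_body(text, analyzer="ik_max_word"):
    """Build a JSON body for the _analyze API instead of URL parameters."""
    # Both the analyzer name and the text travel in the request body,
    # so no deprecated query-string parameters are needed.
    return {"analyzer": analyzer, "text": text}


def analyze_tokens(es, index, text, analyzer="ik_max_word"):
    """Return the token strings produced by the analyzer.

    `es` is assumed to be an elasticsearch-py client, or any object that
    exposes `indices.analyze(index=..., body=...)`.
    """
    response = es.indices.analyze(
        index=index,
        body=build_analyze_body(text, analyzer),
    )
    # The _analyze response lists tokens under the "tokens" key.
    return [token["token"] for token in response.get("tokens", [])]
```

If the warnings persist after a change like this, some other caller is probably still sending the parameters in the URL.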

Error: es connection

Hello, I'm using elasticsearch-rtf 5.1.1 to run "scrapy crawl jobbole", but the following error keeps appearing....

=====

'虽然即便是最大的公司网站也会因宕机而遭受损失,但这种影响对于处理网上销售的中小型企业尤其关键。根据最近的一份调查报告显示,一分钟的宕机导致企业平均损失约5000美元。不要让你的业务成为那种统计数据(因为宕机造成的损失)的一部分。在假日繁忙之前,主动调优MySQL数据库服务器(S)并收获回报吧!\n'
'\r\n'
' \r\n'
' \r\n'
' \n'
' \n'
' 1 赞\n'
' 1 收藏\n'
'\n'
' 2 评论\n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
' \n'
'\n'
' \n'
'\n'
'\n'
'\n'
'\r\n'
' \r\n'
'\r\n'
'\r\n'
'\r\n'
'\t',
'created_time': datetime.date(2018, 2, 12),
'fav_count': 1,
'front_image_url': ['http://jbcdn2.b0.upaiyun.com/2015/11/e78e36715813f49e9e62fe0c6050075c.png'],
'tags': 'IT技术,,MySQL,数据库',
'title': 'MySQL 性能调优技巧',
'url': 'http://blog.jobbole.com/113197/',
'url_object_id': '9d1a4f483de8dad0a4da4300eacbcc09',
'voteup_count': 1}
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 83, in create_connection
raise err
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py", line 114, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 649, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 333, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 356, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 166, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7feb72180978>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/apps/03.pip_py36_elasticsearch-dsl/11.yipwinghong_FirstSpider_py36_ERROR_es-rtf/FirstSpider/pipelines.py", line 147, in process_item
item.insert_to_es()
File "/apps/03.pip_py36_elasticsearch-dsl/11.yipwinghong_FirstSpider_py36_ERROR_es-rtf/FirstSpider/items.py", line 158, in insert_to_es
article.suggests = gen_suggests(ArticleType._doc_type.index, ((article.title, 10), (article.tags, 7)))
File "/apps/03.pip_py36_elasticsearch-dsl/11.yipwinghong_FirstSpider_py36_ERROR_es-rtf/FirstSpider/items.py", line 34, in gen_suggests
words = es.indices.analyze(index=index, analyzer="ik_max_word", params={'filter': ["lowercase"]}, body=text)

File "/usr/local/lib/python3.6/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/client/indices.py", line 32, in analyze
'_analyze'), params=params, body=body)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py", line 123, in perform_request
raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(<urllib3.connection.HTTPConnection object at 0x7feb72180978>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7feb72180978>: Failed to establish a new connection: [Errno 111] Connection refused)
2018-02-12 21:32:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://blog.jobbole.com/113200/> (referer: http://blog.jobbole.com/all-posts/page/5/)
2018-02-12 21:32:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://blog.jobbole.com/113203/> (referer: http://blog.jobbole.com/all-posts/page/5/)
2018-02-12 21:32:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://blog.jobbole.com/113205/> (referer: http://blog.jobbole.com/all-posts/page/5/)
2018-02-12 21:32:37 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://blog.jobbole.com/all-posts/page/6/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>, <twisted.python.failure.Failure twisted.web.http._DataLoss: Chunked decoder in 'BODY' state, still expecting more data to get to 'FINISHED' state.>]
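
The traceback ends in `[Errno 111] Connection refused`, which means nothing accepted the TCP connection on the host and port the client tried. One way to narrow this down before running the spider is a plain socket check, sketched below. The address `127.0.0.1:9200` is an assumption (9200 is Elasticsearch's default HTTP port); the log does not show which host the pipeline actually uses. The function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises an OSError subclass (for example
        # ConnectionRefusedError or a timeout) when the port is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: Elasticsearch is expected on its default HTTP port.
    host, port = "127.0.0.1", 9200
    if can_connect(host, port):
        print("port reachable")
    else:
        print("connection refused or timed out")
```

If this check fails, the problem lies with the Elasticsearch process or its bind address rather than with the Scrapy pipeline code.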
