jumper2014 / lianjia-beike-spider

A house-price spider for Lianjia (lianjia.com) and Beike (ke.com). It collects housing-price data (residential communities/xiaoqu, second-hand homes, rentals, and new homes) for 21 major Chinese cities including Beijing, Shanghai, Guangzhou, and Shenzhen, and is stable, reliable, and fast. Supports CSV, MySQL, MongoDB, Excel, and JSON storage; runs on both Python 2 and 3; renders the data as charts; richly commented. Star the repo if it helps. For learning and reference only: do not use it commercially, at your own risk.

Python 97.50% TSQL 2.50%
beike crawler house lianjia spider

lianjia-beike-spider's People

Contributors

jumper2014


lianjia-beike-spider's Issues

xiaoqu_to_chart.py fails with UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: unexpected end of data

On macOS, when running xiaoqu_to_chart.py to generate the HTML charts, xiaoqu.html is generated correctly, but district.html fails partway through with the following error:

Traceback (most recent call last):
  File "./xiaoqu_to_chart.py", line 63, in <module>
    bar.render(path="district.html")
  File "/usr/local/lib/python2.7/site-packages/pyecharts/base.py", line 146, in render
    **kwargs
  File "/usr/local/lib/python2.7/site-packages/pyecharts/engine.py", line 220, in render_chart_to_file
    html = tpl.render(**kwargs)
  File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/local/lib/python2.7/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/site-packages/pyecharts/templates/simple_chart.html", line 10, in top-level template code
    {{ echarts_js_content(chart) }}
  File "/usr/local/lib/python2.7/site-packages/pyecharts/engine.py", line 129, in echarts_js_content
    return Markup(EMBED_SCRIPT_FORMATTER.format(generate_js_content(*charts)))
  File "/usr/local/lib/python2.7/site-packages/pyecharts/engine.py", line 101, in generate_js_content
    javascript_snippet = TRANSLATOR.translate(chart.options)
  File "/usr/local/lib/python2.7/site-packages/pyecharts_javascripthon/api.py", line 127, in translate
    option_snippet = json.dumps(options, indent=4, cls=self.json_encoder)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 209, in encode
    chunks = list(chunks)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 332, in _iterencode_list
    for chunk in chunks:
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 443, in _iterencode
    for chunk in _iterencode(o, _current_indent_level):
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 443, in _iterencode
    for chunk in _iterencode(o, _current_indent_level):
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 431, in _iterencode
    for chunk in _iterencode_list(o, _current_indent_level):
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 313, in _iterencode_list
    yield buf + _encoder(value)

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: unexpected end of data
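
If it helps: the bottom frame shows json.dumps choking on a byte string whose multi-byte UTF-8 sequence is incomplete, which typically means a str was truncated mid-character somewhere upstream. A hedged workaround (an assumption, not a confirmed fix) is to make sure only unicode objects reach pyecharts; the helper and sample bytes below are illustrative:

def to_unicode(value, encoding="utf-8"):
    # A bytes value sliced mid-character is exactly what raises
    # "unexpected end of data"; errors="replace" keeps it from raising.
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value

raw_names = [b"\xe6\x9c\x9d\xe9\x98\xb3", b"\xe6\xb5\xb7\xe6\xb7\x80"]  # sample UTF-8 bytes
district_names = [to_unicode(name) for name in raw_names]
print(district_names)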

Urgent: constantly hitting anti-crawler CAPTCHA checks, what can I do?

How did you manage 70,000 records in 300 seconds?
With thread_pool_size = 5 and RANDOM_DELAY = 30, ershou.py triggered the CAPTCHA check after barely 100 records. In my tests only thread_pool_size <= 2 gets through.
Note: I modified the code slightly to match the current Lianjia second-hand listing pages.

A problem related to webbrowser

Hi, and thanks for developing and sharing this.
Running pip install -r requirements.txt gives the following errors:
Could not find a version that satisfies the requirement webbrowser (from -r requirements.txt (line 13)) (from versions: none)
ERROR: No matching distribution found for webbrowser (from -r requirements.txt (line 13))
Is this related to my browser version? Following advice online I upgraded pip to the latest version, with no effect.
How should I fix this?
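
For what it is worth, webbrowser is part of Python's standard library, which is why pip finds no distribution on PyPI; the error has nothing to do with your browser version. Removing the webbrowser line from requirements.txt should let the install proceed, since the module is importable out of the box (the URL below is just an example):

# webbrowser ships with CPython itself; no pip install is needed.
import webbrowser
webbrowser.open("https://bj.lianjia.com/")  # opens the system default browser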

Suggestion: add a crawl delay

As a considerate crawler author, I suggest adding a random-delay option to the config file. If the target server buckles and bans crawlers, everyone loses. I would also suggest writing the scraped data straight to the database instead of going through CSV first. A sketch of the delay option follows.
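
A minimal sketch of such an option (RANDOM_DELAY_MAX and polite_get are illustrative names, not the project's actual config):

# Sleep a random interval before each request to spread the load out.
import random
import time

import requests

RANDOM_DELAY_MAX = 5  # seconds; hypothetical config entry

def polite_get(session, url, **kwargs):
    time.sleep(random.uniform(0, RANDOM_DELAY_MAX))
    return session.get(url, timeout=10, **kwargs)

# Usage: polite_get(requests.Session(), "https://bj.lianjia.com/xiaoqu/")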

list index out of range

Hi. Running zufang.py and selecting Beijing produces the following error:

… (normal output up to this point)

http://bj.lianjia.com/zufang/tianningsi1/
http://bj.lianjia.com/zufang/xuanwumen12/
Warning: only find one page for dongdan
list index out of range
http://bj.lianjia.com/zufang/dongdan/pg1
python(19966,0x700011453000) malloc: *** error for object 0x7ffd472deff8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
[1] 19966 abort ~/anaconda/bin/python zufang.py

On Python 2 support

All the print calls in the code use Python 3 syntax, so it simply cannot run on Python 2 as-is. In my view, the version check added in get_city() can be removed; the standard fix is shown below.
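
For reference, a single code path can serve both interpreters with a __future__ import instead of a runtime version check; a minimal sketch:

# With this import, Python 2 accepts the Python 3 print() call syntax,
# so the version check in get_city() becomes unnecessary.
from __future__ import print_function

print("OK, start to crawl", u"北京")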

Please change the naming logic for the generated xiaoqu files

When the scraped community (xiaoqu) data is written to CSV or the database, the output name should include the date rather than simply overwriting the previous file.
For example, by default xiaoqu_to_db.py writes its CSV as xiaoqu.csv, so an automated daily run overwrites the previous day's file.
When writing to the database, I would also suggest naming the table after the city instead of xiaoqu, which cuts down the data-classification work; a sketch follows.
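
A minimal sketch of the suggested naming (the names and paths are illustrative, not taken from xiaoqu_to_db.py):

# Date-stamp the CSV so a daily automated run never overwrites the previous
# day's file, and derive the table name from the city abbreviation.
import datetime

city = "bj"  # hypothetical city abbreviation
today = datetime.date.today().strftime("%Y%m%d")
csv_name = "xiaoqu_{0}_{1}.csv".format(city, today)  # e.g. xiaoqu_bj_20200428.csv
table_name = "xiaoqu_{0}".format(city)               # one table per city
print(csv_name + " -> " + table_name)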

Running the xiaoqu and ershou spiders produces this error.

http://bj.lianjia.com/xiaoqu/dingfuzhuang/
    raise ReadTimeout(e, request=request)
ReadTimeout: HTTPSConnectionPool(host='bj.lianjia.com', port=443): Read timed out. (read timeout=10)
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/threadpool.py", line 158, in run
    result = request.callable(*request.args, **request.kwds)
  File "xiaoqu.py", line 34, in collect_xiaoqu_data
    xqs = get_xiaoqu_info(city_name, area_name)
  File "/Users/kaiyingwu/Downloads/lianjia-spider-master/lib/url/xiaoqu.py", line 101, in get_xiaoqu_info
    response = requests.get(page, timeout=10, headers=headers)
  File "/Library/Python/2.7/site-packages/requests-2.11.1-py2.7.egg/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests-2.11.1-py2.7.egg/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests-2.11.1-py2.7.egg/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Python/2.7/site-packages/requests-2.11.1-py2.7.egg/requests/sessions.py", line 617, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/Library/Python/2.7/site-packages/requests-2.11.1-py2.7.egg/requests/sessions.py", line 177, in resolve_redirects
    **adapter_kwargs
  File "/Library/Python/2.7/site-packages/requests-2.11.1-py2.7.egg/requests/sessions.py", line 628, in send
    r.content
  File "/Library/Python/2.7/site-packages/requests-2.11.1-py2.7.egg/requests/models.py", line 755, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/Library/Python/2.7/site-packages/requests-2.11.1-py2.7.egg/requests/models.py", line 683, in generate
    raise ConnectionError(e)

Extracting room layout and floor area

Because the data is irregular, extracting fields by index when crawling Beike often produces bad values; I suggest regular expressions instead:

import re

# Layout strings look like "2室1厅1卫" (digits separated by CJK characters),
# sizes like "67㎡". The patterns must be unicode literals so the \u escapes
# are interpreted (a Python 2 raw bytes literal leaves r'\u4e00' as-is).
pattern_layout = re.compile(u'[0-9]{1,2}[\u4e00-\u9fa5][0-9]{1,2}[\u4e00-\u9fa5][0-9]{1,2}[\u4e00-\u9fa5]')
pattern_size = re.compile(u'([0-9]{1,3})㎡')
# desc2 is the parsed description element from the listing page.
descs = desc2.text.strip().replace("\n", "").replace(" ", "").replace("/", "")
m_layout = pattern_layout.search(descs)
m_size = pattern_size.search(descs)
# search() returns None when nothing matches, so guard before calling group().
layout = m_layout.group() if m_layout else ""
size = m_size.group() if m_size else ""

Adding new cities

Is there a way to add a city myself within the framework, for example Shijiazhuang or Harbin, which also have large populations? Which parts of the code need changes?
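
A guess at the shape of the change, assuming the supported cities live in a dict mapping the URL abbreviation to the city name (the dict name and module are assumptions; check the project's city-list code):

# Hypothetical city table; the spider builds URLs like
# https://<abbr>.lianjia.com/ from the abbreviation, so a new entry only
# helps if Lianjia actually serves that subdomain.
cities = {
    "bj": u"北京",
    "sjz": u"石家庄",  # Shijiazhuang, assuming https://sjz.lianjia.com/ exists
    "hrb": u"哈尔滨",  # Harbin, assuming https://hrb.lianjia.com/ exists
}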

The output columns issue

Running via ershou.py produces the data sample below. Notice there is no community (xiaoqu) name, and the last four columns carry no real meaning. I am not sure whether this is intentional or whether the site has been updated so the scraped content changed.

20200428 宝山 月浦 电梯房,双南户型,功能间全明,不沿街 180万 低楼层(共18层) 1995年建 2室1厅 67.2平米 https://ke-image.ljcdn.com/110000-inspection/pc1_CQEejQcfJ.jpg!m_fill w_280 h_210 f_jpg?from=ke.com
20200428 宝山 月浦 满五不唯一,全明格局,看房方便,诚心出售。必看好房 268万 中楼层(共24层) 2011年建 2室2厅 83.32平米 https://ke-image.ljcdn.com/110000-inspection/pc1_ItVNT4rKm.jpg!m_fill w_280 h_210 f_jpg?from=ke.com
20200428 宝山 月浦 月浦六七九村 一房一厅 非顶楼 精装修 112万诚售必看好房 112万 高楼层(共6层) 1994年建 1室1厅 36.94平米 https://ke-image.ljcdn.com/110000-inspection/pc1_AgGtusWON.jpg!m_fill w_280 h_210 f_jpg?from=ke.com

Great project, two small issues

  1. Some community names contain commas, so when rows are written comma-separated to CSV, those records gain an extra column. One fix is tab-separated output, or replacing commas inside field values; see also the sketch after this list.
  2. On Beike (sites of the form xx.ke.com), some cities have no community data at all. How can one tell whether a city has community data?
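
On point 1, the standard library already handles embedded delimiters: csv.writer quotes any field that contains one, so commas inside a community name survive a round trip. A minimal sketch (the filename and row are made up):

# csv.writer wraps fields containing the delimiter in quotes, keeping a
# name with an embedded comma in a single column. (Python 3 shown; on
# Python 2, open the file in binary mode instead of passing newline.)
import csv

with open("xiaoqu_sample.csv", "w", newline="") as f:  # hypothetical file
    writer = csv.writer(f)
    writer.writerow(["20200428", "Baoshan, Yuepu", "268"])  # comma survives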

Errors once it starts writing files

Hi. The code runs fine at first, but things seem to go wrong once it starts writing files:

  1. The Chinese output turns into mojibake;

  2. I do not really understand multithreading; what does the error below mean?
    Thanks.

huyuanzhen   File "C:\ProgramData\Anaconda2\envs\LJ27\lib\site-packages\threadpool.py", line 158, in run
 淇濆瓨鏂囦欢璺緞:淇濆瓨鏂囦欢璺緞:  d:\python\sp\lj20180225/data/lianjia/sh/20180225/sanlin.txtd:\python\sp\lj20180225/data/lianjia/sh/20180225/shuyuanzhen.txt

   result = request.callable(*request.args, **request.kwds)
  File "xiaoqu.py", line 26, in collect_xiaoqu_data
    print "寮€濮嬬埇鍙栨澘鍧?", area_name, "淇濆瓨鏂囦欢璺緞:", csv_file
Traceback (most recent call last):
  File "xiaoqu.py", line 94, in <module>
    pool.poll()
  File "C:\ProgramData\Anaconda2\envs\LJ27\lib\site-packages\threadpool.py", line 315, in poll
    request.exc_callback(request, result)
  File "C:\ProgramData\Anaconda2\envs\LJ27\lib\site-packages\threadpool.py", line 78, in _handle_thread_exception
    traceback.print_exception(*exc_info)
  File "C:\ProgramData\Anaconda2\envs\LJ27\lib\traceback.py", line 125, in print_exception
    print_tb(tb, limit, file)
  File "C:\ProgramData\Anaconda2\envs\LJ27\lib\traceback.py", line 70, in print_tb
    if line: _print(file, '    ' + line.strip())
  File "C:\ProgramData\Anaconda2\envs\LJ27\lib\traceback.py", line 13, in _print
    file.write(str+terminator)
IOError: [Errno 0] Error
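
The garbled characters are a strong hint: '淇濆瓨鏂囦欢璺緞' is what the UTF-8 bytes of '保存文件路径' ('save file path') look like when decoded as GBK, the encoding a Chinese Windows console uses. In Python 2, printing unicode objects instead of UTF-8 byte strings usually avoids this; a minimal sketch, assuming that is the cause (not the project's code):

# Printing a unicode object lets Python 2 encode it for whatever the
# console uses (GBK here); printing UTF-8 bytes on a GBK console produces
# mojibake like the output above.
from __future__ import print_function

message = u"保存文件路径:"  # a unicode literal, not a UTF-8 byte string
print(message, r"d:\python\sp\data\sanlin.txt")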

Is Lianjia detecting the crawler? It keeps getting killed

Districts: ['dongcheng', 'xicheng', 'chaoyang', 'haidian', 'fengtai', 'shijingshan', 'tongzhou', 'changping', 'daxing', 'yizhuangkaifaqu', 'shunyi', 'fangshan', 'mentougou', 'pinggu', 'huairou', 'miyun', 'yanqing', 'yanjiao', 'xianghe']
dongcheng: Area list: []
xicheng: Area list: ['baizhifang1', 'caihuying', 'changchunjie', 'chongwenmen', 'chegongzhuang1', 'dianmen', 'deshengmen', 'fuchengmen', 'guanganmen', 'guanyuan', 'jinrongjie', 'liupukang', 'madian1', 'maliandao1', 'muxidi1', 'niujie', 'taoranting1', 'taipingqiao1', 'tianningsi1', 'xisi1', 'xuanwumen12', 'xizhimen1', 'xinjiekou2', 'xidan', 'yuetan', 'youanmennei11']
chaoyang: Area list: ['andingmen', 'anzhen1', 'aolinpikegongyuan11', 'beiyuan2', 'beigongda', 'baiziwan', 'chengshousi1', 'changying', 'chaoyangmenwai1', 'cbd', 'chaoqing', 'chaoyanggongyuan', 'dongzhimen', 'dongba', 'dawanglu', 'dongdaqiao', 'dashanzi', 'dougezhuang', 'dingfuzhuang', 'fangzhuang1', 'fatou', 'guangqumen', 'gongti', 'gaobeidian', 'guozhan1', 'ganluyuan', 'guanzhuang', 'hepingli', 'huanlegu', 'huixinxijie', 'hongmiao', 'huaweiqiao', 'jianxiangqiao1', 'jiuxianqiao', 'jinsong', 'jianguomenwai', 'lishuiqiao1', 'madian1', 'nongzhanguan', 'nanshatan1', 'panjiayuan1', 'sanyuanqiao', 'shaoyaoju', 'shifoying', 'shilibao', 'shoudoujichang1', 'shuangjing', 'shilihe', 'shibalidian1', 'shuangqiao', 'sanlitun', 'sihui', 'tongzhoubeiyuan', 'tuanjiehu', 'taiyanggong', 'tianshuiyuan', 'wangjing', 'xibahe', 'yayuncun', 'yayuncunxiaoying', 'yansha1', 'zhongyangbieshuqu1', 'zhaoyangqita']
haidian: Area list: ['aolinpikegongyuan11', 'anningzhuang1', 'baishiqiao1', 'beitaipingzhuang', 'changpingqita1', 'changwa', 'dinghuisi', 'erlizhuang', 'gongzhufen', 'ganjiakou', 'haidianqita1', 'haidianbeibuxinqu1', 'junbo1', 'liuliqiao1', 'mudanyuan', 'madian1', 'malianwa', 'qinghe11', 'suzhouqiao', 'shangdi1', 'shijicheng', 'sijiqing', 'shuangyushu', 'tiancun1', 'wudaokou', 'weigongcun', 'wukesong1', 'wanliu', 'wanshoulu1', 'xishan21', 'xisanqi1', 'xibeiwang', 'xueyuanlu1', 'xiaoxitian1', 'xizhimen1', 'xinjiekou2', 'xierqi1', 'yangzhuang1', 'yuquanlu11', 'yuanmingyuan', 'yiheyuan', 'zhichunlu', 'zaojunmiao', 'zhongguancun', 'zizhuqiao']
fengtai: Area list: ['beidadi', 'beijingnanzhan1', 'chengshousi1', 'caoqiao', 'caihuying', 'dahongmen', 'fengtaiqita1', 'fangzhuang1', 'guanganmen', 'heyi', 'huaxiang', 'jiugong1', 'jiaomen', 'kejiyuanqu', 'kandanqiao', 'lize', 'liujiayao', 'lugouqiao1', 'liuliqiao1', 'muxiyuan1', 'majiabao', 'maliandao1', 'puhuangyu', 'qingta1', 'qilizhuang', 'songjiazhuang', 'shilihe', 'taipingqiao1', 'wulidian', 'xihongmen', 'xiluoyuan', 'xingong', 'yuegezhuang', 'yuquanying', 'youanmenwai', 'yangqiao1', 'zhaogongkou']
shijingshan: Area list: ['bajiao1', 'chengzi', 'gucheng', 'laoshan1', 'lugu1', 'pingguoyuan1', 'shijingshanqita1', 'yangzhuang1', 'yuquanlu11']
tongzhou: Area list: ['beiguan', 'daxingqita11', 'guoyuan1', 'jiukeshu12', 'luyuan', 'liyuan', 'linheli', 'majuqiao1', 'qiaozhuang', 'shoudoujichang1', 'tongzhoubeiyuan', 'tongzhouqita11', 'wuyihuayuan', 'xinhuadajie', 'yizhuang1', 'yuqiao']
HTTPSConnectionPool(host='bj.lianjia.com', port=443): Read timed out. (read timeout=10)
changping: Area list: None
Traceback (most recent call last):
  File "ershou.py", line 134, in <module>
    areas.extend(areas_of_district)
TypeError: 'NoneType' object is not iterable
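
Whatever triggers the block, the crash itself can be guarded against; a minimal sketch with illustrative names (not the project's actual code):

def safe_extend_areas(areas, district, areas_of_district):
    # Skip districts whose area list could not be fetched (None after a
    # timeout) instead of letting extend() raise TypeError.
    if not areas_of_district:
        print("Skip district with no area list: " + district)
        return
    areas.extend(areas_of_district)

areas = []
safe_extend_areas(areas, "changping", None)         # timed out -> skipped
safe_extend_areas(areas, "xicheng", ["caihuying"])  # normal case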

Is there historical data?

Is any historical data available?
I would like to see whether prices have risen or fallen.
The market seems to be heading down.

xiaoqu_to_db.py cannot be run automatically from the command line

C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pymysql\cursors.py:170:

Warning: (1366, "Incorrect string value: '\xD6\xD0\xB9\xFA\xB1\xEA...' for column 'VARIABLE_VALUE' at row 519")
result = self._query(query)
Which city data do you want to save ?
bj: 北京, cd: 成都, cq: 重庆, cs: 长沙
dg: 东莞, dl: 大连, fs: 佛山, gz: 广州
hz: 杭州, hf: 合肥, jn: 济南, nj: 南京
qd: 青岛, sh: 上海, sz: 深圳, su: 苏州
sy: 沈阳, tj: 天津, wh: 武汉, xm: 厦门
yt: 烟台,

Python scripts such as xiaoqu.py can be executed directly from the command line with arguments, but this database-import script cannot, which makes automated collection inconvenient; a sketch of a possible fix follows.
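
A minimal sketch of a possible fix (an assumption about how the script could be changed, not its current code): take the city abbreviation from argv and fall back to the interactive prompt, so cron or a shell script can run the import unattended.

# e.g. python xiaoqu_to_db.py bj
import sys

try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input      # Python 3

if len(sys.argv) > 1:
    city = sys.argv[1]
else:
    city = read_line("Which city data do you want to save ? ")
print("Importing data for: " + city)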

Running pip install -r requirements.txt throws a flood of errors

Collecting pillow (from pyecharts-snapshot->-r requirements.txt (line 12))
Using cached https://files.pythonhosted.org/packages/40/50/406ea88c6d3c4fdffd45f2cf7528628586e1651e5c6f95f0193870832175/Pillow-6.2.0-cp35-cp35m-win_amd64.whl
Collecting pyppeteer>=0.0.25 (from pyecharts-snapshot->-r requirements.txt (line 12))
Using cached https://files.pythonhosted.org/packages/b0/16/a5e8d617994cac605f972523bb25f12e3ff9c30baee29b4a9c50467229d9/pyppeteer-0.0.25.tar.gz
ERROR: Command errored out with exit status 1:
command: 'c:\users\sunpeng3\appdata\local\programs\python\python35\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\sunpeng3\AppData\Local\Temp\pip-install-5xao5og2\pyppeteer\setup.py'"'"'; __file__='"'"'C:\Users\sunpeng3\AppData\Local\Temp\pip-install-5xao5og2\pyppeteer\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: C:\Users\sunpeng3\AppData\Local\Temp\pip-install-5xao5og2\pyppeteer
Complete output (32 lines):
Requirement already satisfied: py-backwards in c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages (0.7)
Requirement already satisfied: colorama in c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages (from py-backwards) (0.4.1)
Requirement already satisfied: py-backwards-astunparse in c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages (from py-backwards) (1.5.0.post3)
Requirement already satisfied: typed-ast in c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages (from py-backwards) (1.4.0)
Requirement already satisfied: autopep8 in c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages (from py-backwards) (1.4.4)
Requirement already satisfied: wheel<1.0,>=0.23.0 in c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages (from py-backwards-astunparse->py-backwards) (0.33.6)
Requirement already satisfied: six<2.0,>=1.6.1 in c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages (from py-backwards-astunparse->py-backwards) (1.12.0)
Requirement already satisfied: pycodestyle>=2.4.0 in c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages (from autopep8->py-backwards) (2.5.0)
Traceback (most recent call last):
  File "C:\Users\sunpeng3\AppData\Local\Temp\pip-install-5xao5og2\pyppeteer\setup.py", line 19, in <module>
    from py_backwards.compiler import compile_files
  File "c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages\py_backwards\compiler.py", line 8, in <module>
    from .files import get_input_output_paths, InputOutput
  File "c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages\py_backwards\files.py", line 9, in <module>
    from .exceptions import InvalidInputOutput, InputDoesntExists
  File "c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages\py_backwards\exceptions.py", line 1, in <module>
    from typing import Type, TYPE_CHECKING
ImportError: cannot import name 'Type'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\sunpeng3\AppData\Local\Temp\pip-install-5xao5og2\pyppeteer\setup.py", line 25, in <module>
    from py_backwards.compiler import compile_files
  File "c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages\py_backwards\compiler.py", line 8, in <module>
    from .files import get_input_output_paths, InputOutput
  File "c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages\py_backwards\files.py", line 9, in <module>
    from .exceptions import InvalidInputOutput, InputDoesntExists
  File "c:\users\sunpeng3\appdata\local\programs\python\python35\lib\site-packages\py_backwards\exceptions.py", line 1, in <module>
    from typing import Type, TYPE_CHECKING
ImportError: cannot import name 'Type'
----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Connection timeout

Running ershou.py reports:
HTTPSConnectionPool(host='yt.lianjia.com', port=443): Max retries exceeded with url: /xiaoqu/fushan (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x10a4308d0>, 'Connection to yt.lianjia.com timed out. (connect timeout=10)'))
fushan: Area list: None
Traceback (most recent call last):
  File "ershou.py", line 135, in <module>
    areas.extend(areas_of_district)
TypeError: 'NoneType' object is not iterable

Is it that the requests are too frequent and got blocked?

Crawl failure

Crawling Beike fails with the error requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='nj.ke.com', port=443): Max retries exceeded with url:

No data is scraped

[nonroot@fbox lianjia-beike-spider]$ python ershou.py nj
Today date is: 20181125
Target site is lianjia.com
City is: nj
OK, start to crawl 南京
City: nj
Districts: []
('Area:', [])
('District and areas:', {})
Total crawl 0 areas.
Total cost 5.6205971241 second to crawl 0 data items.

Other cities return nothing either. I also could not install webbrowser successfully; I do not know whether that has any effect.
