I tried running it and got an HTTP error:
python2.7 ./easy_university_selection.py 10010 10035 512 2018 10148
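The problem seems to be that the URL being scraped does not exist at all.
I searched gkcx.eol.cn for a long time but could not work out what it should be changed to: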
http://gkcx.eol.cn/schoolhtm/scores/provinceScores643_10010_10035_10036.xml
python2.7 ./easy_university_selection.py 10010 10035 512 2018 10148
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
年份:2018
地区:山西
分数:512 理科
过滤:专科
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
加载高校库完成,共有2766所高校信息载入
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
抓取高校库中所有高校在[山西]地区[理科]招生分数线
http://gkcx.eol.cn/schoolhtm/scores/provinceScores643_10010_10035_10036.xml
Traceback (most recent call last):
File "./easy_university_selection.py", line 679, in <module>
spider_university_province_score_line('10036', '本一批次')
File "./easy_university_selection.py", line 384, in spider_university_province_score_line
'http://gkcx.eol.cn/schoolhtm/scores/provinceScores', 'provinceScores', tier, info)
File "./easy_university_selection.py", line 406, in spider_score_line
res_data = urllib2.urlopen(req)
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 467, in error
result = self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
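
Reading the traceback above: the request is first redirected (http_error_302), and the redirect target then answers with HTTP 500, which urllib2 raises as HTTPError. Below is a minimal sketch of catching that error so one failing URL is reported instead of aborting the run. It does not fix the dead URL itself. fetch_or_none and its opener parameter are hypothetical names for illustration, not part of easy_university_selection.py.

```python
# Sketch only: fetch_or_none and `opener` are hypothetical, not from the script.
try:  # Python 2.7, which the script uses
    from urllib2 import urlopen, HTTPError
except ImportError:  # Python 3 equivalent
    from urllib.request import urlopen
    from urllib.error import HTTPError


def fetch_or_none(url, opener=urlopen):
    """Return the response body, or None if the server answers with an HTTP error."""
    try:
        response = opener(url)
    except HTTPError as err:
        # err.code is the final status after any redirects have been followed.
        print('skipping %s: HTTP %d' % (url, err.code))
        return None
    try:
        return response.read()
    finally:
        response.close()
```

If something like this wrapped the urllib2.urlopen(req) call in spider_score_line, the caller would still need to handle a None result, and the score-line URL would still have to be updated for any data to come back.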