koreanewscrawler's People

Contributors

geonmo, geun9716, lumyjuwon, minyoung347, slslslrhfem

koreanewscrawler's Issues

I have a question.

def set_category(self, *args):
    for key in args:
        if self.category.get(key) is None:
            raise InvalidCategory(key)
        else:
            self.selected_category = args

In this part, args is assigned to self.selected_category, and I was wondering whether it should be key instead...
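To illustrate the question, here is a minimal sketch of one way the method could validate first and assign once. It is not the library's code: `CategorySelector`, the category table, and this `InvalidCategory` definition are stand-ins invented for the example. Assigning `key` instead of `args` would keep only the last category, so the sketch stores all of `args` after every key has passed validation.

```python
class InvalidCategory(Exception):
    """Stand-in for the library's exception of the same name."""

    def __init__(self, category):
        super().__init__(f"{category} is not a valid category")


class CategorySelector:
    """Hypothetical holder class used only for this sketch."""

    def __init__(self):
        # Hypothetical category table; the real mapping lives in the library.
        self.category = {"정치": 100, "경제": 101, "사회": 102}
        self.selected_category = []

    def set_category(self, *args):
        # Validate every key first, so nothing is stored if any key is invalid.
        for key in args:
            if self.category.get(key) is None:
                raise InvalidCategory(key)
        # Store all requested keys at once; assigning `key` would keep only the last one.
        self.selected_category = list(args)
```

With this shape, a call that contains one invalid category leaves the previous selection untouched.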

README.MD needs updating

Sport Crawler has been added, so instructions for using Sport Crawler need to be added to the existing Example.

SyntaxError: invalid syntax

Python 3.8.6 Windows 10

pip install KoreaNewsCrawler

I typed the line above and pressed Ctrl and F5 at the same time, and got this error message:

cd 'c:\Users\Administrator\Desktop\IDE\Python'; & 'python' 'c:\Users\Administrator.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\launcher' '51484' '--' 'c:\Users\Administrator\Desktop\IDE\Python\KoreaNewsCrawler.txt'
File "c:\Users\Administrator\Desktop\IDE\Python\KoreaNewsCrawler.txt", line 1
pip install KoreaNewsCrawler
^
SyntaxError: invalid syntax

I just installed VS Code on a new computer. Do I need to install any other packages before running KoreaNewsCrawler?

I ran it in Spyder, but I get a date input error.

I entered Crawler.set_date_range(2015, 9, 2016, 9), and got:

File "d:\conda_env\nlp38\lib\calendar.py", line 124, in monthrange
raise IllegalMonthError(month)

IllegalMonthError: bad month number 0; must be 1-12

I believe I entered the arguments correctly, so why did this error occur?
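The message itself comes from the standard library: `calendar.monthrange` rejects any month outside 1-12. The report does not show where the crawler produced a month of 0, so the following is only a sketch of month iteration that never reaches 0 or 13. The function name `iter_months` is made up for this example and is not part of KoreaNewsCrawler.

```python
import calendar


def iter_months(start_year, start_month, end_year, end_month):
    """Yield (year, month, days_in_month) for every month in the inclusive range."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        # monthrange returns (weekday of the 1st, number of days in the month).
        yield year, month, calendar.monthrange(year, month)[1]
        month += 1
        if month > 12:  # roll over to January instead of ever reaching 0 or 13
            year, month = year + 1, 1


# The range from the report: September 2015 through September 2016.
months = list(iter_months(2015, 9, 2016, 9))
```

Comparing `(year, month)` tuples keeps the loop condition correct across the year boundary without any modulo arithmetic, which is a common source of a stray month 0.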

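About updating the README

First of all, thanks to the code you shared, I was able to collect news data easily. Thank you.

I am opening this issue to make a suggestion.

As far as I know, the README currently says nothing about when the crawler terminates.

In my experience, when the crawl range includes today's date, the crawler does not terminate and appears to keep crawling.

For example, with the range (2021, 9, 2021, 10), October had not yet ended as of today (2021.10.12), so the crawler kept running and kept updating the csv.

That would be useful if crawling has to continue, but at first I wondered whether it was safe to stop it midway.

In my case, I opened the csv file being saved, confirmed that a fair amount of data had been saved up to today's date, and then stopped the crawler.

If what I observed is correct, please add an explanation about crawls that include the current date.

A note such as 'You may stop it manually', or better still an update that lets the end date be specified down to the day, would make this even better source code.

In addition, please mention that the output is saved to the desktop. Some people may not know this and have trouble finding the file.

Thank you.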
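Until the README covers this, one workaround on the caller's side is to end the range at the last month that has fully finished. The sketch below assumes the behaviour described in this report (a range containing the current month keeps running), which has not been confirmed here, and `last_complete_month` is a hypothetical helper, not a library function.

```python
import datetime


def last_complete_month(today):
    """Return (year, month) of the most recent month that has fully ended."""
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st lands in the previous month.
    last_day_of_previous = first_of_this_month - datetime.timedelta(days=1)
    return last_day_of_previous.year, last_day_of_previous.month


# With the date from the report, the range would end in September 2021.
end_year, end_month = last_complete_month(datetime.date(2021, 10, 12))
```

The returned pair can be passed as the last two arguments of a year/month style date range.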

Another question. I followed the README, but I get another error.

The code is

Crawler = ArticleCrawler()
Crawler.set_category("경제")
Crawler.set_date_range("2017-01", "2018-04-20")
Crawler.start()

and

the error is

TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1836/1408842911.py in
1 Crawler = ArticleCrawler()
2 Crawler.set_category("경제")
----> 3 Crawler.set_date_range("2017-01", "2018-04-20")
4 Crawler.start()

TypeError: set_date_range() missing 2 required positional arguments: 'end_year' and 'end_month'

What is the problem?

Also, does opening the Excel file that is being saved during a crawl interrupt the crawl?
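The TypeError above suggests that the installed version expects four separate positional values rather than two date strings. A minimal sketch of converting the strings, assuming the first two parameters are a start year and start month (the error message only names `end_year` and `end_month`); `parse_year_month` is a helper invented for this example.

```python
def parse_year_month(text):
    """Split 'YYYY-MM' or 'YYYY-MM-DD' into (year, month) integers."""
    parts = text.split("-")
    year, month = int(parts[0]), int(parts[1])
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return year, month


start_year, start_month = parse_year_month("2017-01")
end_year, end_month = parse_year_month("2018-04-20")  # the day part is dropped

# Assumed call shape for the installed version, based on the error message:
# Crawler.set_date_range(start_year, start_month, end_year, end_month)
```

If the README and the installed package disagree on the signature, checking the installed version against the README's version is a reasonable next step.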

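This looks like a parallel-processing or exception-handling error; please take a look. Crawling does not work.

Hello, I emailed you about this error and was asked to register it as a GitHub issue, so I am filing it here.

I got the error message below. My OS and Python version are pop_os_20.04 and 3.8.10. I have noted my own additional suspicion at the very bottom of the error message, for your reference.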
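In the log that follows, each worker process ends with `TypeError: catching classes that do not inherit from BaseException is not allowed`, raised at the line `except requests.exceptions:`. `requests.exceptions` is a module, and an `except` clause needs an exception class or a tuple of classes. The sketch below reproduces the pattern with the standard library's `urllib.error` so that it runs without third-party packages; in requests, the analogous base class is `requests.exceptions.RequestException`. The function names are made up for this example.

```python
import urllib.error


def fetch_broken():
    """Mimics the failing pattern: the except clause names a module, not a class."""
    try:
        raise urllib.error.URLError("connection reset by peer")
    except urllib.error:  # a module object, so matching raises TypeError at run time
        return None


def fetch_fixed():
    """Names an exception class, so the handler can actually match."""
    try:
        raise urllib.error.URLError("connection reset by peer")
    except urllib.error.URLError:
        return None


def broken_error_name():
    """Run the broken variant and report which exception escapes it."""
    try:
        fetch_broken()
    except TypeError as exc:
        return type(exc).__name__
    return None
```

The mistake stays hidden until a request actually fails, because Python evaluates the `except` expression only when an exception reaches it. That would be consistent with the crash appearing only after a connection reset.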

(base) root@920410e6c84d:/mnthdd/Dropbox/D/project/COVID_NEWS# python crawler.py
{'start_year': 2020, 'start_month': 1, 'start_day': 1, 'end_year': 2020, 'end_month': 5, 'end_day': 31}
정치 PID: 3057177
IT과학 PID: 3057178
economy PID: 3057179
생활문화 PID: 3057180
오피니언 PID: 3057181
사회 PID: 3057182
세계 PID: 3057183
오피니언 Urls are generated
오피니언 is collecting ...
IT과학 Urls are generated
IT과학 is collecting ...
생활문화 Urls are generated
생활문화 is collecting ...
정치 Urls are generated
정치 is collecting ...
세계 Urls are generated
세계 is collecting ...
economy Urls are generated
economy is collecting ...
사회 Urls are generated
사회 is collecting ...
Process Process-7:
Process Process-6:
Traceback (most recent call last):
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
ConnectionResetError: [Errno 104] Connection reset by peer
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)

During handling of the above exception, another exception occurred:

File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(

During handling of the above exception, another exception occurred:

File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(

During handling of the above exception, another exception occurred:

File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-4:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-5:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-2:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-3:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed

This looked like a problem with multiprocessing and requests.exceptions, so I changed lines 228 and 229 of articlecrawler.py to self.crawling(category_name) and print(f"{category_name} crawling start!"), changed line 132 to a bare except:, and ran it again. That produced the error below.

"ResponseTimeout()"

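A note on the `TypeError` at the end of the traceback above: `requests.exceptions` is a module, so `except requests.exceptions:` fails as soon as an exception actually reaches that clause. `requests.exceptions.RequestException` is the base class requests provides for its own errors. The sketch below reproduces the behaviour with the standard-library module `urllib.error` as a stand-in so that it runs without third-party packages. The function names and the `always_reset` helper are hypothetical and are not part of the crawler.

```python
import urllib.error


def fetch_catching_module(fetch):
    """Mirrors the failing pattern: the except clause names a module."""
    try:
        return fetch()
    except urllib.error:  # a module, not an exception class
        return None


def fetch_catching_class(fetch, max_tries=3):
    """Names an exception class and retries a bounded number of times."""
    for _ in range(max_tries):
        try:
            return fetch()
        except urllib.error.URLError:
            # Only this error family is retried; anything else propagates.
            continue
    return None


def always_reset():
    # Hypothetical stand-in for a request whose connection is reset by the peer.
    raise urllib.error.URLError("Connection reset by peer")
```

With requests, the analogous clause would be `except requests.exceptions.RequestException:`. A bare `except:` also removes the `TypeError`, but it catches every exception, including `KeyboardInterrupt`, so naming the class is usually the safer choice.
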
The check on the crawl year/month/day seems to be wrong

if start_month > end_month:

Crawler.set_date_range(2017, 9, 2019, 5)
Crawler.set_date_range(2015, 8, 2019, 5)

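When I set the crawl period as shown above, it keeps raising the OverbalanceMonth(start_month, end_month) error.

Looking into it, I think the cause is that line 39 raises this error unconditionally whenever start_month is greater than end_month.

I think this can be fixed if the check takes the year into account instead of comparing only the months.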
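One way to make the check year-aware is to compare `(year, month)` tuples, since Python compares tuples element by element. This is a minimal sketch under that assumption, not the package's own code: `InvalidDateRange` and `validate_date_range` are hypothetical names, and the returned dictionary only mirrors the key names that the crawler prints in its logs.

```python
class InvalidDateRange(ValueError):
    """Hypothetical exception for this sketch; the package defines its own."""


def validate_date_range(start_year, start_month, end_year, end_month):
    """Return the range as a dict, or raise if it is not a valid range."""
    for month in (start_month, end_month):
        if not 1 <= month <= 12:
            raise InvalidDateRange(f"month out of range: {month}")
    # Tuples compare element by element, so the year decides first and the
    # month only matters when both years are equal.
    if (start_year, start_month) > (end_year, end_month):
        raise InvalidDateRange("start date is later than end date")
    return {
        "start_year": start_year,
        "start_month": start_month,
        "end_year": end_year,
        "end_month": end_month,
    }
```

With this check, both ranges from the report (2017-09 to 2019-05 and 2015-08 to 2019-05) are accepted, while a range such as 2019-09 to 2019-05 is still rejected.
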

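File saving problem

It looks like things were tidied up nicely with writer.py. However, on Linux there seems to be a problem with the close call when the file is saved and closed. Does it work without problems on Windows?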

{'start_year': 2000, 'start_month': 1, 'end_year': 2000, 'end_month': 2}
IT과학 PID: 24333
IT과학 Urls are generated
The crawler starts
Process Process-1:
Traceback (most recent call last):
File "/home/my/anaconda3/envs/env/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/my/anaconda3/envs/env/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "articlecrawler.py", line 158, in crawling
writer_csv.close()
AttributeError: '_csv.writer' object has no attribute 'close'
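The `AttributeError` above follows from how the standard `csv` module works: the object returned by `csv.writer` has no `close()` method, and it is the underlying file object that has to be closed. The sketch below shows that pattern with a `with` block. The helper names, the file name, and the sample row are hypothetical and are not taken from writer.py.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows to a CSV file and close the file, not the writer."""
    # newline="" follows the csv module documentation for files given to csv.writer.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerows(rows)
    # Leaving the with block closes the file; csv.writer itself has no close().
    return handle.closed


def read_rows(path):
    """Read the file back as a list of rows."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    # Hypothetical file name and row, used only to exercise the helpers.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sample.csv")
        print(write_rows(target, [["2000-01-01", "IT과학", "title"]]))
        print(read_rows(target))
```

Because this behaviour comes from the `csv` module itself, it should not depend on the operating system.
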

EmptyDataError: No columns to parse from file

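Hello, and thank you for sharing this handy package.

I ran it in a Colab environment following the README.
Installation with pip succeeded, and I confirmed that the package imports correctly.
After crawling with the code below, opening each file with pandas raises EmptyDataError for every file.
When I download the files and open them in Excel, they are all empty (0 bytes).
Did I do something wrong?

<Code executed>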

# crawl news articles in output/
Crawler = ArticleCrawler()
Crawler.set_category("정치", "IT과학", "economy")
Crawler.set_date_range(2017, 1, 2017, 12)
Crawler.start()

# check result
for file in glob(output_dir + '*.csv'):
  try:
    display(pd.read_csv(file, header=None))
  except Exception as inst:
    print(type(inst))
    print(inst.args)
    print(inst)
    print()

<Output>

<class 'pandas.errors.EmptyDataError'>
('No columns to parse from file',)
No columns to parse from file

<class 'pandas.errors.EmptyDataError'>
('No columns to parse from file',)
No columns to parse from file

<class 'pandas.errors.EmptyDataError'>
('No columns to parse from file',)
No columns to parse from file
