levirve / dcard-spider
A spider on Dcard. Strong and speedy.
License: MIT License
I got an error when running the command "dcard download -f photography -n 100 -likes 100"
"json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)"
Traceback (most recent call last):
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 435, in _update_chunk_length
self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 226, in _error_catcher
yield
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 486, in read_chunked
self._update_chunk_length()
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 439, in _update_chunk_length
raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/models.py", line 660, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 340, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 514, in read_chunked
self._original_response.close()
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 244, in _error_catcher
raise ProtocolError('Connection broken: %r' % e, e)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "spider.py", line 40, in main
poster.get(callback=lambda posts, forum=forum_name: store_to_db(posts, forum))
File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 51, in get
return PostsResult(self.ids, bundle, callback)
File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 73, in __init__
self.results = self.format(bundle, callback)
File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 97, in format
comments = comments.get() if comments else []
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 60, in _serially_get_comments
_comments = client.get(comments_url, params=params)
File "/home/ubuntu/workspace/dcard-spider/dcard/utils.py", line 31, in get
response = self.req_session.get(url, **kwargs)
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/sessions.py", line 480, in get
return self.request('GET', url, **kwargs)
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/sessions.py", line 468, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/sessions.py", line 608, in send
r.content
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/models.py", line 737, in content
self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/models.py", line 663, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
The error occurred several minutes into the second run, with "number of posts" set to 100000000.
(harvesting pictures from "Dcard Sex" forum)
Could this be a file name problem?
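The ChunkedEncodingError in the traceback above means the server closed the connection partway through a response, which tends to happen on very long runs. One possible workaround, not something the package is known to do itself, is to retry the failing call with a short backoff. Below is a minimal sketch using only the standard library. `call_with_retry` and `flaky_fetch` are hypothetical names invented for this example, and the built-in `ConnectionError` stands in for `requests.exceptions.ChunkedEncodingError`.

```python
import time


def call_with_retry(func, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); if it raises an exception listed in retry_on, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


# Hypothetical stand-in for a flaky HTTP call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Connection broken: IncompleteRead(0 bytes read)")
    return {"id": 1}


# sleep is swapped for a no-op here so the demonstration does not actually wait.
result = call_with_retry(flaky_fetch, retry_on=(ConnectionError,), sleep=lambda s: None)
print(result, calls["count"])
```

In real use, `retry_on` would presumably be `(requests.exceptions.ChunkedEncodingError,)` and `func` a small lambda wrapping the session's `get` call. Whether the package exposes a hook for that is not something this thread confirms.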
The API usage example contains
print('成功下載!' if all(status) else '出了點錯下載不完全喔')
which should be changed to
print('成功下載!' if status>0 else '出了點錯下載不完全喔')
because download() in manager.py returns sum(status).
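To illustrate why the original example fails: if download() returns sum(status), the result is an int, and the built-in all() only accepts an iterable. The snippet below is a small self-contained sketch. The `statuses` list is made-up data, and the English messages are stand-ins for the strings in the usage example.

```python
statuses = [True, True, False]  # hypothetical per-file download results
status = sum(statuses)          # an int, as download() is reported to return

try:
    all(status)                 # all() needs an iterable, not an int
    all_error = None
except TypeError as exc:
    all_error = exc

# The proposed check: true as soon as at least one download succeeded.
message = "Downloaded successfully!" if status > 0 else "Something went wrong."

# A stricter check, if the caller still has the expected count at hand.
everything_ok = status == len(statuses)

print(status, type(all_error).__name__, message, everything_ok)
```

Note that `status > 0` only shows that something was downloaded. Comparing the count against the number of requested files is closer to what `all(status)` was meant to express.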
I found that this package does not work.
I tried this code:
forum = dcard.forums('funny')
forum.get_metas()
It returns an empty list: []
The command line fails as well:
dcard download --forum photography --number 100
It returns:
成功下載 0 items!
Finish in 0.20728 sec(s).
I use Python 3.6 on Linux.
My code used to work two months ago.
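Hello, thank you very much for this package, but it seems to have stopped working recently. If you have time, could you please fix it? Thanks!
After sending many requests, where and how can I switch to a different proxy server?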
I'm not sure this is the best change, but Forum is not a public class; it can only be reached through dcard.
ariticle_metas = dcard.forums('funny').get_metas(num=Forum.infinite_page, sort='popular')
fails with an error because Forum is not defined. A temporary workaround is
ariticle_metas = dcard.forums('funny').get_metas(num=dcard.forums.infinite_page, sort='popular')
Maybe the author has a better approach?
from dcard import Dcard
dcard = Dcard()
dcard.forums.get_metas()
I ran the code above, but it returned an empty list instead of the metadata for all forums.
What might be going wrong? Thanks!
Nothing is working.
Is it a problem with my computer, or is it broken for other people too?
Traceback (most recent call last):
File "spider.py", line 40, in main
poster.get(callback=lambda posts, forum=forum_name: store_to_db(posts, forum))
File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 51, in get
return PostsResult(self.ids, bundle, callback)
File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 73, in __init__
self.results = self.format(bundle, callback)
File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 100, in format
post.update(cont.result().json()) if cont else None
File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/models.py", line 808, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
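"Expecting value: line 1 column 1 (char 0)" is what the json module reports when the text it is given is empty or does not start with a JSON value, for example an HTML error page. That suggests the server answered with something other than JSON. A defensive sketch, assuming the caller is happy to skip such responses, could look like this. `parse_json_or_none` is a hypothetical helper, not part of the package or of requests.

```python
import json


def parse_json_or_none(text):
    """Return the decoded JSON value, or None when the body is empty or not JSON."""
    if not text or not text.strip():
        return None
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_json_or_none(""))             # empty body
print(parse_json_or_none("<html></html>"))  # non-JSON body
print(parse_json_or_none('{"id": 1}'))    # valid JSON
```

One limitation of this sketch: a body containing the JSON literal `null` also decodes to None, so a caller that needs to tell the two cases apart would want a different sentinel.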