
dcard-spider's People

Contributors

anemology, levirve, y252328

dcard-spider's Issues

JSONDecodeError?

I got an error when running the command "dcard download -f photography -n 100 -likes 100":
"json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)"

http.client.IncompleteRead: IncompleteRead(0 bytes read)


Traceback (most recent call last):
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 435, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 226, in _error_catcher
    yield
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 486, in read_chunked
    self._update_chunk_length()
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 439, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/models.py", line 660, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 340, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 514, in read_chunked
    self._original_response.close()
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/packages/urllib3/response.py", line 244, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "spider.py", line 40, in main
    poster.get(callback=lambda posts, forum=forum_name: store_to_db(posts, forum))
  File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 51, in get
    return PostsResult(self.ids, bundle, callback)
  File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 73, in __init__
    self.results = self.format(bundle, callback)
  File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 97, in format
    comments = comments.get() if comments else []
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 60, in _serially_get_comments
    _comments = client.get(comments_url, params=params)
  File "/home/ubuntu/workspace/dcard-spider/dcard/utils.py", line 31, in get
    response = self.req_session.get(url, **kwargs)
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/sessions.py", line 480, in get
    return self.request('GET', url, **kwargs)
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/sessions.py", line 608, in send
    r.content
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/models.py", line 737, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/models.py", line 663, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Bug in file naming again

The error occurred several minutes into the second run with "number of posts" set to 100000000
(harvesting pictures from the "Dcard Sex" forum).

2-1
2-2
2-3

dcard.log.txt

Seems to be a file name problem?

README example

The API usage example in the README says:

print('成功下載!' if all(status) else '出了點錯下載不完全喔')

It should be changed to:

print('成功下載!' if status>0 else '出了點錯下載不完全喔')

because download() in manager.py returns sum(status).
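A minimal sketch of why the README line fails, assuming (as this issue reports) that download() returns sum(status), i.e. an integer count. The status list below is hypothetical sample data, not output from the library.

```python
# Hypothetical per-file results; the real list is built inside manager.py.
status = [True, False, True]

# According to the issue, download() returns sum(status), an int.
downloaded = sum(status)

# all() needs an iterable, so calling it on the int raises TypeError.
try:
    all(downloaded)
    all_works = True
except TypeError:
    all_works = False

# The proposed check only says that at least one item was downloaded.
at_least_one = downloaded > 0

# To keep the README's original meaning ("everything succeeded"),
# the count would have to be compared with the number of items requested.
complete = downloaded == len(status)

print(downloaded, all_works, at_least_one, complete)
```

Note that `status>0` reports success even when some downloads failed, so the two messages no longer distinguish a complete download from a partial one.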

methods of forum do not work

I found that this package no longer works.
I tried this code:

forum = dcard.forums('funny')
forum.get_metas()

It returns an empty list: [ ]
The command line fails as well:

dcard download --forum photography --number 100

It returns:

成功下載 0 items!
Finish in 0.20728 sec(s).

I use Python 3.6 on Linux.
The same code worked two months ago.

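Nothing happens

Hello, thank you very much for this package, but it seems to have stopped working recently. If you have time, could you please fix it? Thanks~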

Forum

I'm not sure this is the best fix, but Forum is not a public class; it can only be reached through dcard.

ariticle_metas = dcard.forums('funny').get_metas(num=Forum.infinite_page, sort='popular')

fails with an error saying Forum is unknown. A temporary workaround is:

ariticle_metas = dcard.forums('funny').get_metas(num=dcard.forums.infinite_page, sort='popular')

Perhaps the author has a better approach?
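A small sketch of why the workaround works, using hypothetical stand-in classes rather than the library's real code. It assumes dcard.forums is an instance of a Forum class that defines infinite_page as a class attribute; the value -1 is a placeholder.

```python
class Forum:
    # Placeholder value; the real constant in dcard-spider may differ.
    infinite_page = -1

    def __call__(self, name):
        # Mimics dcard.forums('funny') selecting a forum by name.
        self.name = name
        return self


class Dcard:
    def __init__(self):
        # The Forum instance is only exposed as an attribute of Dcard.
        self.forums = Forum()


dcard = Dcard()

# Attribute lookup on the instance falls back to the class,
# so the constant is reachable without importing Forum itself.
sentinel = dcard.forums.infinite_page
print(sentinel)
```

In user code that imports only Dcard, the bare name Forum is not defined, which would explain the error. Importing the class from its module, or re-exporting it from the package, would be the other way to reach the constant.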

dcard.forums.get_metas() returns an empty list

from dcard import Dcard
dcard = Dcard()
dcard.forums.get_metas()

I ran the code above, but it returned an empty list instead of the metadata of all forums.
What might be going wrong? Thanks!

Is it down?

Nothing is working.
Is it a problem with my computer, or is it failing for others too?

JSONDecodeError needs exception handling and retry

Traceback (most recent call last):
  File "spider.py", line 40, in main
    poster.get(callback=lambda posts, forum=forum_name: store_to_db(posts, forum))
  File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 51, in get
    return PostsResult(self.ids, bundle, callback)
  File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 73, in __init__
    self.results = self.format(bundle, callback)
  File "/home/ubuntu/workspace/dcard-spider/dcard/posts.py", line 100, in format
    post.update(cont.result().json()) if cont else None
  File "/home/ubuntu/workspace/dcard/venv/lib/python3.5/site-packages/requests/models.py", line 808, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
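One way the requested handling could look, as a sketch only: wrap the fetch-and-parse step in a small retry loop. The function name fetch_json_with_retry and its parameters are hypothetical and not part of dcard-spider; fetch_text stands in for whatever call returns the response body.

```python
import json
import time


def fetch_json_with_retry(fetch_text, retries=3, delay=0.0):
    """Call fetch_text() and parse the result as JSON.

    Retries up to `retries` times when the body is not valid JSON
    (for example an empty response), then re-raises the last error.
    """
    last_error = None
    for attempt in range(retries):
        text = fetch_text()
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_error


# Usage with a fake fetcher that returns two empty bodies, then valid JSON.
bodies = iter(["", "", '{"id": 1}'])
print(fetch_json_with_retry(lambda: next(bodies)))
```

An empty body is what produces "Expecting value: line 1 column 1 (char 0)", so retrying covers the case in the traceback above. Whether a retry is appropriate depends on why the server returned a non-JSON body.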


Responded with "invalid_token"

The error occurred with the number of posts set to 100000000 to harvest the whole "Dcard Sex" forum.
64612313

The bug did not appear when the same command was run again.
