csu / pyquora

A Python module for fetching and parsing data from Quora.

Home Page: http://christopher.su/pyquora/

License: Other

Python 100.00%

quora python statistics parsed-data python-library

pyquora's Issues

Unhandled exception in `get_question_stats()`

When get_question_stats() is called with an invalid question, it raises an unhandled exception. Here is the traceback:

question = Quora.get_question_stats('Medicine-and-Healthcare')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/quora/quora.py", line 115, in get_question_stats
    return Quora.scrape_question_stats(soup)
  File "/usr/local/lib/python2.7/dist-packages/quora/quora.py", line 125, in scrape_question_stats
    answer_count = soup.find('div', attrs={'class' : 'answer_count'}).next.split()[0]
AttributeError: 'NoneType' object has no attribute 'next'
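
One possible guard, as a minimal sketch: soup.find() returns None when no matching element exists, so the lookup can be wrapped in a helper that returns None instead of raising. `first_token_or_none` is a hypothetical name, not part of pyquora, and returning None for an invalid question is an assumption about the desired behaviour.

```python
def first_token_or_none(node):
    """Return the first whitespace-separated token of the text after `node`.

    `node` is whatever soup.find('div', attrs={'class': 'answer_count'})
    returned, which is None when the page has no such element.
    """
    if node is None:
        return None
    text = getattr(node, 'next', None)  # same attribute the traceback uses
    if not text:
        return None
    tokens = str(text).split()
    return tokens[0] if tokens else None
```

With this in place, scrape_question_stats could report a missing answer count as None, or raise a descriptive error of its own, rather than failing with an AttributeError.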

Write class/serializer for question statistics

Add topic follows to user activity

Add tests for question statistics

Most static methods in Quora and User really should be class methods

We should fix this, but maintain the legacy API so things that currently use pyquora don't need to be rewritten. We can throw away the legacy support at a certain milestone, like v2.0 or something.
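
A rough sketch of how this could look, with the old static entry point kept as a thin alias that warns about deprecation. The method names `question_url` and `get_question_url`, the `base_url` attribute and the `MirrorQuora` subclass are all hypothetical and only illustrate the pattern.

```python
import warnings


class Quora(object):
    base_url = 'http://www.quora.com/'

    @classmethod
    def question_url(cls, question):
        # Using cls instead of a hard-coded Quora lets subclasses
        # override base_url and still reuse this method.
        return cls.base_url + question

    @staticmethod
    def get_question_url(question):
        """Legacy static entry point, kept as an alias until v2.0."""
        warnings.warn('get_question_url is deprecated; use question_url',
                      DeprecationWarning, stacklevel=2)
        return Quora.question_url(question)


class MirrorQuora(Quora):
    # Hypothetical subclass, e.g. pointing at a local test server.
    base_url = 'http://localhost:8000/'


print(Quora.question_url('6hARL'))
print(MirrorQuora.question_url('6hARL'))
```

The subclass picks up its own base_url through cls, which is the behaviour a static method cannot provide.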

Move "Usage"

The usage section currently lives in readme.md. Wouldn't it be better to move it into a separate folder of code examples?

Add tests for legacy API

So csu/quora-backup#6 doesn't happen again.

Write basic tests for user statistics

Fix activity so that it recognizes "want answers" instead of "followed question"

In light of new Quora UI changes, we need to fix how we detect question follows in user activity.

Standardize the `question` returned by get_one_answer

There are three ways of calling this function, and each one returns a different value for question.

Take a different approach to testing

If I'm not wrong, the tests only check whether data has been scraped. They do not check whether it has been scraped correctly.

How about adding selected HTML pages into the test folder rather than loading the page from Quora every time the test is run? This way, we can check if there is a difference between what was expected and what was received.
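
A minimal sketch of the idea, under stated assumptions: the fixture file is generated on the fly here so the example is self-contained, whereas a real test would load a saved Quora page from the test folder. `scrape_answer_count` and `load_fixture` are hypothetical helpers, and the regex stands in for the BeautifulSoup-based scraper.

```python
import os
import re
import tempfile


def scrape_answer_count(html):
    """Toy scraper: pull the integer out of <div class="answer_count">."""
    match = re.search(r'<div class="answer_count">\s*([\d,]+)', html)
    return int(match.group(1).replace(',', '')) if match else None


def load_fixture(directory, name):
    """Read a saved HTML page from the fixtures directory."""
    with open(os.path.join(directory, name)) as handle:
        return handle.read()


# Write a tiny stand-in fixture, then test against its known content.
fixture_dir = tempfile.mkdtemp()
with open(os.path.join(fixture_dir, 'question.html'), 'w') as handle:
    handle.write('<div class="answer_count">474 Answers</div>')

html = load_fixture(fixture_dir, 'question.html')
assert scrape_answer_count(html) == 474
```

Because the fixture content is fixed, the test can compare against an exact expected value instead of only checking that something came back.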

Write test cases to ensure legacy support

These tests don't need to check the functionality of the methods/API (they would just be aliases to methods that are tested elsewhere in the test suite); they only need to check that the old API/methods exist and can be called with the correct parameters.
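
Such a check could be sketched as follows (Python 3, since it relies on inspect.signature). The `Quora` class below is a stand-in, and the parameter names in `LEGACY_API` are assumptions for illustration; the real table would list the actual legacy methods.

```python
import inspect


class Quora(object):
    """Stand-in for the real class; method bodies are irrelevant here."""

    @staticmethod
    def get_one_answer(question, author=None):
        return {}

    @staticmethod
    def get_question_stats(question):
        return {}


# Legacy method names mapped to their expected parameter names (assumed).
LEGACY_API = {
    'get_one_answer': ['question', 'author'],
    'get_question_stats': ['question'],
}


def legacy_api_problems(cls, expected):
    """Return a list of problems; an empty list means the API is intact."""
    problems = []
    for name, params in sorted(expected.items()):
        method = getattr(cls, name, None)
        if not callable(method):
            problems.append('%s is missing' % name)
            continue
        actual = list(inspect.signature(method).parameters)
        if actual != params:
            problems.append('%s takes %s, expected %s' % (name, actual, params))
    return problems


assert legacy_api_problems(Quora, LEGACY_API) == []
```

Nothing is fetched from Quora here, so the check stays fast and independent of site changes.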

Fix get_one_answer

It's still not working for me. It's also not working at http://quora-api.herokuapp.com/answers/How-can-I-join-Open-Source-Rails-projects/Tobias-Sandelius.

>>> from quora import Quora
>>> Quora.get_one_answer('How-can-I-join-Open-Source-Rails-projects', 'Tobias-Sandelius')
{}

how to get all the answers of a question

Using Quora.get_one_answer('6hARL') only returns one answer for a question.
How can I get all the answers to a question?
Thanks

Make Python 3 compatible

Aiming for Python 2.6+ and Python 3.3+ compatibility.

Split the class `Quora` into `User` and `Question`

Or some other name instead of Question; this class will be responsible for questions and answers.

Add question text/title to question statistics

get_user_stats raises an IndexError

stats = quora.get_user_stats('Christopher-J-Su')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "../quora/quora.py", line 143, in get_user_stats
    return User.get_user_stats(u)
  File "../quora/user.py", line 156, in get_user_stats
    user_dict = {'answers'   : data_stats[1],
IndexError: list index out of range

How to get latest or popular question?

I don't see any endpoint for this. Is there a method to do that?
Thanks

Set up Read the Docs

Add tests for answer statistics

User information hidden when not logged in

Looks like Quora is masking usernames when you view an answer without logging in. "Quora User" is shown in place of the user's actual name. This is also affecting answers fetched by requests.

I haven't checked to see the extent to which this is applied.

PS: It's not an issue with the user being banned or anything, it shows the name properly after logging in.

Come up with a better way to test Activity

Right now, my example tests (written just to get CI working) only check whether any of the activity attributes (answers, questions, etc.) return an empty list. This isn't necessarily correct.

For example, if someone hasn't posted a review in a long time, their activity.review_requests will be empty, even if pyquora is working properly.
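
One option, sketched under assumptions: validate the shape of the activity object (each attribute is a list) rather than requiring it to be non-empty. `activity_shape_errors` and `FakeActivity` are hypothetical; the field names are the ones mentioned in this issue.

```python
def activity_shape_errors(activity,
                          list_fields=('answers', 'questions', 'review_requests')):
    """Return a list of shape problems; empty lists are accepted as valid."""
    errors = []
    for field in list_fields:
        value = getattr(activity, field, None)
        if not isinstance(value, list):
            errors.append('%s should be a list, got %s'
                          % (field, type(value).__name__))
    return errors


class FakeActivity(object):
    """Stand-in for a parsed Activity object."""
    answers = [{'title': 'Example answer'}]
    questions = []
    review_requests = []  # legitimately empty for many users


assert activity_shape_errors(FakeActivity()) == []
```

A shape check like this still cannot tell whether the parsed values are right; comparing against a saved page with known content would cover that.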

nosetest imports `quora` from whatever is installed in the virtualenv and not the current working directory

I ran into something while writing tests and making changes to quora.py.
Those changes have no effect because nosetest imports quora from the venv.
Is this the expected behaviour, or am I doing something wrong?
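
A possible workaround, as a sketch: put the checkout at the front of sys.path before importing quora, so the working copy wins over the installed package. `prefer_local_checkout` is a hypothetical helper, and the layout (tests in a folder directly below the repository root) is an assumption. Installing the package in development mode with `pip install -e .` is another common way to get the same effect.

```python
import os
import sys


def prefer_local_checkout(test_file):
    """Move the repository root to the front of sys.path.

    Assumes the test file lives one directory below the repository
    root, e.g. <root>/tests/test_user.py.
    """
    repo_root = os.path.dirname(os.path.dirname(os.path.abspath(test_file)))
    if repo_root in sys.path:
        sys.path.remove(repo_root)
    sys.path.insert(0, repo_root)
    return repo_root
```

A test module would call prefer_local_checkout(__file__) before its `from quora import Quora` line.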

Write test suite

Fix answer activity

Answers aren't being parsed from the feed properly.

Test code:

from quora import Quora, Activity

quora = Quora()
activity = quora.get_activity('Christopher-J-Su')
print activity.answers

Results:

(env)csu:pyquora (master)$ python debug.py
[]

Also, from quora-api:

{
  "items": []
}

Add docstrings

help(quora) is pretty unhelpful!

Open organization to give contributors push access

@rohithpr and @aaronwinter have contributed enough and are familiar enough with the codebase to directly push to pyquora and quora-api, as well as review and accept pull requests. An org should be created to grant them push access to the repositories.

try_cast_int ignores the 'k' in cases where there are over a thousand upvotes/want answers

print Quora.get_question_stats('What-are-the-best-Cyanide-Happiness-comics')
{'want_answers': 2, 'question_text': u'What are the best Cyanide & Happiness comics?', 'topics': [u'Communication', u'Writing', u'Books', u'Publishing', u'Comics (narrative art form)'], 'question_details': None, 'answer_count': 474, 'answer_wiki': None}

want_answers should've been 2k! 😆
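
A minimal sketch of a suffix-aware conversion. This is not the existing try_cast_int; returning the input unchanged on failure is an assumption about the current behaviour.

```python
def try_cast_int(text):
    """Convert '474', '1,204', '2k' or '2.5k' to an int.

    Returns the input unchanged when it cannot be parsed (assumed to
    match the fallback of the existing helper).
    """
    cleaned = str(text).strip().lower().replace(',', '')
    multiplier = 1
    if cleaned.endswith('k'):
        multiplier = 1000
        cleaned = cleaned[:-1]
    try:
        return int(round(float(cleaned) * multiplier))
    except (ValueError, OverflowError):
        return text
```

Note that Quora rounds the displayed figure, so '2k' can only be recovered as 2000, not as the exact count.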

Correct USAGE instruction in README.md

Currently the USAGE instruction in README.md is like this:

    from quora import Quora, Activity

    quora = new Quora()

    # get user activity
    activity = get_activity('Christopher-J-Su')

But it should be like this:

    from quora import Quora, Activity

    quora = Quora()

    # get user activity
    activity = quora.get_activity('Christopher-J-Su')

Quora is blocking scrapers

As I've stated here, Quora is blocking some (all?) scripts.

from bs4 import BeautifulSoup
import requests

url = 'http://www.quora.com/search?q=flowers'
soup = BeautifulSoup(requests.get(url).text)
print soup

<html>
  <head>
    <title>503 Service Unavailable</title>
  </head>
  <body>
    <h1>503 Service Unavailable</h1>
      The server is currently unavailable. Please try again at a later time.<br/><br/>
      Our automated scripts have detected a possible scraper. If you feel we have made an error, please email [email protected]. Sorry for the inconvenience. Thanks.


  </body>
</html>
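
Until there is a way around the block, pyquora could at least detect it and fail with a clear message. A small sketch, assuming the block page always has status 503 and contains the "possible scraper" wording shown above; `looks_blocked` is a hypothetical helper.

```python
def looks_blocked(status_code, body):
    """Return True when a response matches the scraper block page above."""
    return status_code == 503 and 'possible scraper' in body.lower()


blocked_body = ('<h1>503 Service Unavailable</h1> Our automated scripts '
                'have detected a possible scraper.')
assert looks_blocked(503, blocked_body)
assert not looks_blocked(200, '<html>regular page</html>')
```

With requests, this would be called as looks_blocked(response.status_code, response.text) before handing the text to BeautifulSoup.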

Add question statistics

Fetch the number of views, edits, followers, etc. for a question, but not the content (for now, just to be safe 😄).

Rewrite tests to use soups from local test HTML files

get_latest_answers returns some empty dicts

This happens when the answer's author has a number at the end of their username.
Example: the username is Foo-Bar-23, but we call get_one_answer(question, 'Foo-Bar').

One workaround would be to check for invalid dicts and keep calling get_one_answer(question, 'Foo-Bar-1'), get_one_answer(question, 'Foo-Bar-2') and so on until a valid dict is returned, but that is highly inefficient.

So we need to find another way to get these answers.

get_user_activity does not scrape data anymore because of recent UI changes

Because of Quora's recent UI change, Quora.get_user_activity does not scrape data correctly.

A direct consequence on quora-api can be observed by making a GET request on:
http://quora-api.herokuapp.com/users//activity/answers
where an empty array is returned.

https://github.com/csu/pyquora/edit/master/quora/pyquora.py#L45

Output:

{
  "answer_count": 4, 
  "answer_wiki": "<div class=\"hidden\" id=\"answer_wiki\"><div id=\"ld_ebgwib_28688\"><div id=\"__w2_sHb6iqm_wiki\"></div></div></div>", 
  "question_details": null, 
  "question_text": "Is there a proof of the Four Color Theorem that does not involve substantial computation?", 
  "topics": [
    "Science, Engineering, and Technology", 
    "Science", 
    "Formal Sciences", 
    "Mathematics"
  ], 
  "want_answers": 1
}

question_details is null, but the question has details on Quora.

Rewrite test suite to use new API

get_random_answers breaks

>>> from quora import Quora
>>> Quora.get_random_answers(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "quora/quora.py", line 139, in get_random_answers
    answer = Quora.get_one_answer(question)
  File "quora/quora.py", line 50, in get_one_answer
    return Quora.scrape_one_answer(soup)
  File "quora/quora.py", line 54, in scrape_one_answer
    answer = soup.find('div', id = re.compile('_answer_content$')).find('div', id = re.compile('_container'))
AttributeError: 'NoneType' object has no attribute 'find'
>>>

user = Quora.User('Christopher-J-Su')
activity = user.activity
print activity.activity_type

I.e. we shouldn't have to call a method to get activity, rather, it should be an attribute of the User class like the other statistics (followers, following, edits, etc.).
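
A sketch of how the attribute could be implemented with a lazily evaluated property. The `fetch_activity` parameter is hypothetical and stands in for the real scraping call, so that the example runs without network access.

```python
class User(object):
    def __init__(self, username, fetch_activity=None):
        self.username = username
        # Hypothetical injection point; the real class would scrape Quora.
        self._fetch_activity = fetch_activity or (lambda name: [])
        self._activity = None

    @property
    def activity(self):
        """Fetch activity on first access, then reuse the cached result."""
        if self._activity is None:
            self._activity = self._fetch_activity(self.username)
        return self._activity


user = User('Christopher-J-Su',
            fetch_activity=lambda name: ['activity for ' + name])
print(user.activity)
```

Callers read user.activity like any other statistic, and the page is only fetched when the attribute is first used.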