csu / pyquora

131 stars, 8 watchers, 72 forks, 255 KB

A Python module for fetching and parsing data from Quora.

Home Page: http://christopher.su/pyquora/

License: Other

Python 100.00%
quora python statistics parsed-data python-library

pyquora's People

Contributors

aaronwinter, csu, gitter-badger, rohithpr, svisser, vikas-parashar

pyquora's Issues

Unhandled exception in `get_question_stats()`

Calling get_question_stats() with an invalid question raises an unhandled exception. Here's the traceback:

question = Quora.get_question_stats('Medicine-and-Healthcare')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/quora/quora.py", line 115, in get_question_stats
    return Quora.scrape_question_stats(soup)
  File "/usr/local/lib/python2.7/dist-packages/quora/quora.py", line 125, in scrape_question_stats
    answer_count = soup.find('div', attrs={'class' : 'answer_count'}).next.split()[0]
AttributeError: 'NoneType' object has no attribute 'next'
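BeautifulSoup's `find()` returns `None` when nothing matches, so chaining `.next` onto it is what turns an invalid question into an `AttributeError`. A minimal sketch of a guard that raises a descriptive error instead; the `QuestionNotFound` exception and the `require` helper are hypothetical names, not part of pyquora:

```python
class QuestionNotFound(Exception):
    """Hypothetical error for pages that lack the expected question markup."""


def require(node, description):
    """Return node unchanged, or raise QuestionNotFound if it is None.

    Intended to wrap soup.find(...) calls, which return None when no
    element matches (e.g. for an invalid question URL).
    """
    if node is None:
        raise QuestionNotFound(
            'could not find %s; is the question URL valid?' % description)
    return node
```

Inside `scrape_question_stats`, the failing line could then become `require(soup.find('div', attrs={'class': 'answer_count'}), 'answer count').next.split()[0]`, so callers can catch one specific exception instead of an `AttributeError`.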

Move "Usage"

The usage section currently lives in README.md. Wouldn't it be better to move it into a separate folder with code examples?

Take a different approach to testing

If I'm not mistaken, the tests only check whether data has been scraped, not whether it has been scraped correctly.

How about adding selected HTML pages to the test folder instead of loading the pages from Quora every time the tests run? That way, we can compare what was expected against what was actually parsed.
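A minimal sketch of the fixture idea, written for Python 3 with only the standard library. The `AnswerCountParser` class and `parse_answer_count` function are hypothetical stand-ins for pyquora's BeautifulSoup scraping, and the HTML snippet stands in for a page saved once from Quora; in a real suite that file would be committed under a fixtures folder instead of being written to a temporary directory:

```python
import os
import tempfile
import unittest
from html.parser import HTMLParser


class AnswerCountParser(HTMLParser):
    """Hypothetical stand-in scraper: reads the first <div class="answer_count">."""

    def __init__(self):
        super().__init__()
        self._in_answer_count = False
        self.answer_count = None

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and ('class', 'answer_count') in attrs:
            self._in_answer_count = True

    def handle_data(self, data):
        # Take the leading number of the first non-blank text in the div.
        if self._in_answer_count and self.answer_count is None and data.strip():
            self.answer_count = int(data.split()[0])
            self._in_answer_count = False


def parse_answer_count(html):
    parser = AnswerCountParser()
    parser.feed(html)
    return parser.answer_count


class QuestionStatsFixtureTest(unittest.TestCase):
    def setUp(self):
        # Stand-in for a saved page; no network access during the test.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmpdir.name, 'question.html')
        with open(self.path, 'w', encoding='utf-8') as f:
            f.write('<html><body><div class="answer_count">474 Answers</div>'
                    '</body></html>')

    def tearDown(self):
        self.tmpdir.cleanup()

    def test_answer_count_matches_expected(self):
        with open(self.path, encoding='utf-8') as f:
            html = f.read()
        # Compares parsed output against a known value, not just "non-empty".
        self.assertEqual(parse_answer_count(html), 474)
```

Because the expected value is pinned to a saved page, a failure points at a parsing regression rather than at Quora changing its content; refreshing the fixture becomes a deliberate step when Quora's markup changes.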

Write test cases to ensure legacy support

These tests don't need to check the functionality of the methods/API (they would just be aliases for methods tested elsewhere in the suite); they only need to check that the old API methods exist and can be called with the correct parameters.
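A sketch of such a legacy-support test in Python 3, using `inspect.signature` to check that each old name exists and still accepts the parameters old callers pass. The `Quora` class here is a stand-in, and both method names (`get_user_stats`, `get_stats_for_user`) and the `LEGACY_API` table are hypothetical examples, not pyquora's actual legacy API:

```python
import inspect
import unittest


class Quora(object):
    """Stand-in class; these method names are hypothetical, not pyquora's."""

    @staticmethod
    def get_user_stats(username):
        return {'username': username}

    # Old name kept as an alias so existing callers don't break.
    get_stats_for_user = get_user_stats


# Legacy method name -> parameter names that old callers rely on.
LEGACY_API = {
    'get_stats_for_user': ['username'],
}


class LegacyApiTest(unittest.TestCase):
    def test_legacy_methods_exist_and_accept_old_parameters(self):
        for name, params in LEGACY_API.items():
            method = getattr(Quora, name, None)
            self.assertTrue(callable(method), '%s is missing' % name)
            signature = inspect.signature(method)
            self.assertEqual(list(signature.parameters), params)
            # bind() raises TypeError if the old call shape no longer fits.
            signature.bind(*params)
```

The test never calls the methods, so it stays fast and offline; the behavior itself is covered by the tests for the methods the aliases point to.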

Fix get_one_answer

It's still not working for me. It's also not working at http://quora-api.herokuapp.com/answers/How-can-I-join-Open-Source-Rails-projects/Tobias-Sandelius.

>>> from quora import Quora
>>> Quora.get_one_answer('How-can-I-join-Open-Source-Rails-projects', 'Tobias-Sandelius')
{}

get_user_stats raises an IndexError

stats = quora.get_user_stats('Christopher-J-Su')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "../quora/quora.py", line 143, in get_user_stats
    return User.get_user_stats(u)
  File "../quora/user.py", line 156, in get_user_stats
    user_dict = {'answers'   : data_stats[1],
IndexError: list index out of range
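The traceback shows `get_user_stats` indexing `data_stats[1]` without checking how many values were actually scraped. A sketch of a tolerant builder, assuming the scraped values arrive as a list in a fixed order; apart from `answers` sitting at index 1 (taken from the traceback), the field names, their order, and the `build_user_stats` helper are hypothetical:

```python
# Hypothetical field order; only 'answers' at index 1 comes from the traceback.
STAT_FIELDS = ['questions', 'answers', 'posts', 'followers', 'following']


def build_user_stats(data_stats, fields=STAT_FIELDS):
    """Map scraped values onto field names, using None for missing values.

    Avoids IndexError when the page yields fewer values than expected,
    e.g. because Quora changed its markup.
    """
    return {name: data_stats[i] if i < len(data_stats) else None
            for i, name in enumerate(fields)}
```

Filling gaps with `None` keeps callers running but can hide a markup change; raising a descriptive error when `len(data_stats)` is too small is the stricter alternative.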

User information hidden when not logged in

Looks like Quora is masking usernames when you view an answer without logging in: "Quora User" is shown in place of the author's actual name. This also affects answers fetched with requests.

I haven't checked how widely this applies.

(Screenshot from 2015-11-15 18:59:46)

PS: It isn't because the user is banned or anything; the name shows properly after logging in.

Come up with a better way to test Activity

Right now, my example tests (written just to get CI working) fail if any of the activity attributes (answers, questions, etc.) returns an empty list. That check isn't always correct.

For example, if someone hasn't posted a review in a long time, their activity.review_requests will be empty even when pyquora is working properly.
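One way to avoid false failures is to test the shape of the result rather than its emptiness: every attribute must exist and be a list, and an empty list is allowed. A sketch in Python 3; `Activity` here is a stand-in using the attribute names mentioned in this issue, and `check_activity_shape` is a hypothetical helper (a real test would pass in the object returned by `get_activity`):

```python
import unittest


class Activity(object):
    """Stand-in for pyquora's Activity; attribute names come from the issue."""

    def __init__(self, answers=None, questions=None, review_requests=None):
        self.answers = answers if answers is not None else []
        self.questions = questions if questions is not None else []
        self.review_requests = review_requests if review_requests is not None else []


ACTIVITY_ATTRIBUTES = ['answers', 'questions', 'review_requests']


def check_activity_shape(test, activity):
    """Assert structure only: every attribute exists and is a list.

    Empty lists are accepted, since a quiet user legitimately has
    no recent review requests.
    """
    for name in ACTIVITY_ATTRIBUTES:
        test.assertTrue(hasattr(activity, name), name)
        test.assertIsInstance(getattr(activity, name), list, name)


class ActivityShapeTest(unittest.TestCase):
    def test_empty_feed_is_valid(self):
        check_activity_shape(self, Activity())

    def test_populated_feed_is_valid(self):
        check_activity_shape(self, Activity(answers=['a1'], questions=['q1']))
```

Checking that parsed items are *correct* still needs known input, which is where saved HTML fixtures help.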

Fix answer activity

Answers aren't being parsed from the feed properly.

Test code:

from quora import Quora, Activity

quora = Quora()
activity = quora.get_activity('Christopher-J-Su')
print activity.answers

Results:

(env)csu:pyquora (master)$ python debug.py
[]

Also, from quora-api:

{
  "items": []
}

try_cast_int ignores the 'k' suffix when there are over a thousand upvotes/want answers

print Quora.get_question_stats('What-are-the-best-Cyanide-Happiness-comics')
{'want_answers': 2, 'question_text': u'What are the best Cyanide & Happiness comics?', 'topics': [u'Communication', u'Writing', u'Books', u'Publishing', u'Comics (narrative art form)'], 'question_details': None, 'answer_count': 474, 'answer_wiki': None}

want_answers should've been 2k! 😆
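A sketch of a replacement parser in Python 3 that honors the suffix, so "2k" becomes 2000 instead of 2. The original `try_cast_int` isn't shown here, so two behaviors are assumptions: returning `None` for text that doesn't start with a number, and treating "m" as millions:

```python
import re

# Leading number (optional thousands commas and decimals) plus an optional
# k/m suffix, e.g. "474", "1,234", "2k", "2.5k". The \b stops a word such
# as "Members" from being read as an "m" suffix.
_COUNT_RE = re.compile(r'\s*(\d[\d,]*(?:\.\d+)?)([kKmM])?\b')

_MULTIPLIERS = {'': 1, 'k': 1000, 'm': 1000000}


def try_cast_int(text):
    """Parse a Quora-style count such as '2k' into 2000.

    Returns None when the text does not start with a number (an assumption).
    """
    match = _COUNT_RE.match(text)
    if match is None:
        return None
    number = float(match.group(1).replace(',', ''))
    suffix = (match.group(2) or '').lower()
    # round() absorbs float error, e.g. 1.1 * 1000 == 1100.0000000000002.
    return int(round(number * _MULTIPLIERS[suffix]))
```

For example, `try_cast_int('2k')` returns 2000 and `try_cast_int('474 Answers')` returns 474.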

Correct the USAGE instructions in README.md

The USAGE instructions in README.md currently read:

    from quora import Quora, Activity

    quora = new Quora()

    # get user activity
    activity = get_activity('Christopher-J-Su')

But they should read:

from quora import Quora, Activity

quora = Quora()

# get user activity
activity = quora.get_activity('Christopher-J-Su')


Quora is blocking scrapers

As I've stated here, Quora is blocking some (all?) scripts.

from bs4 import BeautifulSoup
import requests

url = 'http://www.quora.com/search?q=flowers'
soup = BeautifulSoup(requests.get(url).text)
print soup
<html>
  <head>
    <title>503 Service Unavailable</title>
  </head>
  <body>
    <h1>503 Service Unavailable</h1>
      The server is currently unavailable. Please try again at a later time.<br/><br/>
      Our automated scripts have detected a possible scraper. If you feel we have made an error, please email [email protected]. Sorry for the inconvenience. Thanks.


  </body>
</html>
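Until there's a workaround, the library could at least fail loudly when it receives this page instead of handing it to the parsers, where every `soup.find()` lookup comes back as `None`. A sketch that detects the block page by the text shown above; `ScraperBlocked` and `check_not_blocked` are hypothetical names, and this only detects blocking, it does not get around it:

```python
class ScraperBlocked(Exception):
    """Hypothetical error raised when Quora serves its anti-scraper page."""


# Text copied from the block page shown above.
BLOCK_MARKER = 'Our automated scripts have detected a possible scraper'


def check_not_blocked(html):
    """Return html unchanged, or raise ScraperBlocked for the block page."""
    if BLOCK_MARKER in html:
        raise ScraperBlocked('Quora returned its "possible scraper" page')
    return html
```

Usage would look like `soup = BeautifulSoup(check_not_blocked(requests.get(url).text))`. Checking `response.status_code == 503` is another option, though the dump above only shows the page's title, not the actual HTTP status.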

Add question statistics

Fetch the number of views, edits, followers, etc. for a question, but not the content (for now, just to be safe 😄).

get_latest_answers returns some empty dicts

This happens when the answer's author has a number at the end of their username, e.g. the profile is Foo-Bar-23 but we call get_one_answer(question, 'Foo-Bar').

One workaround would be to check for invalid dicts and keep calling get_one_answer(question, 'Foo-Bar-1'), get_one_answer(question, 'Foo-Bar-2'), and so on until a valid dict comes back, but that is highly inefficient.

So we need to find another way to get these answers.
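One possible direction, assuming the answer authors' full profile slugs can be read from the question page itself (an unverified assumption about Quora's markup): match each slug against the base name plus an optional numeric suffix, rather than requesting Foo-Bar-1, Foo-Bar-2, and so on. A Python 3 sketch with a hypothetical helper name:

```python
import re


def matching_slugs(base_name, candidate_slugs):
    """Return slugs equal to base_name or base_name plus a '-<digits>' suffix.

    'Foo-Bar' matches 'Foo-Bar' and 'Foo-Bar-23' but not 'Foo-Barton'.
    """
    pattern = re.compile(re.escape(base_name) + r'(?:-\d+)?')
    return [slug for slug in candidate_slugs if pattern.fullmatch(slug)]
```

Note that this can return more than one slug when different people share a base name (Foo-Bar and Foo-Bar-23), so callers would still need a way to pick the right author.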

Question details don't work

Try Is-there-a-proof-of-the-Four-Color-Theorem-that-does-not-involve-substantial-computation.

GET: http://quora-api.herokuapp.com/questions/Is-there-a-proof-of-the-Four-Color-Theorem-that-does-not-involve-substantial-computation

Output:

{
  "answer_count": 4, 
  "answer_wiki": "<div class=\"hidden\" id=\"answer_wiki\"><div id=\"ld_ebgwib_28688\"><div id=\"__w2_sHb6iqm_wiki\"></div></div></div>", 
  "question_details": null, 
  "question_text": "Is there a proof of the Four Color Theorem that does not involve substantial computation?", 
  "topics": [
    "Science, Engineering, and Technology", 
    "Science", 
    "Formal Sciences", 
    "Mathematics"
  ], 
  "want_answers": 1
}

question_details is null, but the question has details on Quora.

get_random_answers breaks

>>> from quora import Quora
>>> Quora.get_random_answers(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "quora/quora.py", line 139, in get_random_answers
    answer = Quora.get_one_answer(question)
  File "quora/quora.py", line 50, in get_one_answer
    return Quora.scrape_one_answer(soup)
  File "quora/quora.py", line 54, in scrape_one_answer
    answer = soup.find('div', id = re.compile('_answer_content$')).find('div', id = re.compile('_container'))
AttributeError: 'NoneType' object has no attribute 'find'
>>>

Use Python properties to make User Activity an attribute of User

E.g., the end API usage should look like:

user = Quora.User('Christopher-J-Su')
activity = user.activity
print activity.activity_type

That is, we shouldn't have to call a method to get activity; it should be an attribute of the User class, like the other statistics (followers, following, edits, etc.).
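A sketch of how a read-only `property` could expose activity as an attribute while fetching it lazily, so the feed is only downloaded when someone actually reads `user.activity`. The `Activity` class here is a minimal stand-in and `_fetch_activity` is a hypothetical hook, not pyquora's code:

```python
class Activity(object):
    """Minimal stand-in for pyquora's Activity object."""

    def __init__(self, answers=None):
        self.answers = answers if answers is not None else []


class User(object):
    def __init__(self, username):
        self.username = username
        self._activity = None  # filled in on first access

    def _fetch_activity(self):
        # Hypothetical hook: the real version would download and parse
        # the user's activity feed.
        return Activity()

    @property
    def activity(self):
        """Fetch the activity feed on first access, then reuse it."""
        if self._activity is None:
            self._activity = self._fetch_activity()
        return self._activity
```

Usage matches the requested API: `user = User('Christopher-J-Su')` followed by `user.activity.answers`, with no method call. A plain `property` works on Python 2.7 as well as 3; on Python 3.8+ `functools.cached_property` gives the same caching with less code.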
