danieliu / play-scraper
A web scraper to retrieve application data from the Google Play Store.
License: MIT License
Describe the bug
Calling categories() returns an empty result.
To Reproduce
import play_scraper
play_scraper.__version__
'0.5.5'
play_scraper.categories()
{}
Google has changed the class id for the additional info section.
Please change line 310 in utils.py from
soup.select_one('.xyOfqd'))
to
soup.select_one('.IxB2fe'))
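The effect of the selector change can be illustrated offline with a stdlib-only sketch (no network, no BeautifulSoup). The class names 'xyOfqd' and 'IxB2fe' come from this report; the HTML snippet is a made-up stand-in for the real Play Store markup:

```python
from html.parser import HTMLParser

# The "additional info" section is located by its CSS class, so when Google
# renames the class from 'xyOfqd' to 'IxB2fe', the old selector finds nothing.

class ClassFinder(HTMLParser):
    """Collect tag names whose class attribute contains a target class."""
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if self.target_class in classes:
            self.matches.append(tag)

html = '<div class="IxB2fe"><div>Updated</div><div>Size</div></div>'

old = ClassFinder('xyOfqd')   # pre-change class name
old.feed(html)
new = ClassFinder('IxB2fe')   # class name reported in this issue
new.feed(html)

print(len(old.matches), len(new.matches))  # old selector matches nothing
```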
Is your feature request related to a problem? Please describe.
The developer field has no privacy URL, so it would be great if it could be added.
Describe the solution you'd like
Add the privacy URL to the developer details.
I don't know whether the code below works; it's just a suggestion.
Thanks for reading.
developer_privacy = value_div.select('div')[-2].contents[0]
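As an alternative to the positional index above (select('div')[-2] breaks whenever Google reorders the divs), the link could be matched by its anchor text. A hedged stdlib sketch, where the HTML is a hypothetical stand-in for the real developer-details markup:

```python
from html.parser import HTMLParser

# Find the privacy-policy link by matching the anchor text rather than a
# positional index, so reordered siblings don't break the extraction.

class PrivacyLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self._href = None
        self._in_anchor = False
        self.privacy_url = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_anchor = True
            self._href = dict(attrs).get('href')

    def handle_data(self, data):
        if self._in_anchor and 'privacy' in data.lower():
            self.privacy_url = self._href

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_anchor = False

html = (
    '<div><a href="mailto:[email protected]">Email</a>'
    '<a href="https://example.com/privacy">Privacy Policy</a></div>'
)
finder = PrivacyLinkFinder()
finder.feed(html)
print(finder.privacy_url)
```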
File "/home/huma/humascraper/HumaScraper/spiders/store/store.py", line 229, in processGooglePlay data = android.fetchFromGooglePlay(self.fakeName if self.fakeName else response.meta["packageName"]) File "/home/huma/humascraper/HumaScraper/common/util/android.py", line 243, in fetchFromGooglePlay gplay_info = play_scraper.details(package_name) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/api.py", line 22, in details return s.details(app_id) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/scraper.py", line 83, in details app_json = parse_app_details(soup) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/utils.py", line 240, in parse_app_details icon = (soup.select_one('.dQrBL img.ujDFqe') AttributeError: 'NoneType' object has no attribute 'attrs'
I used play-scraper 0.5.4, and I think Google changed the Google Play HTML pages.
How can I list topselling_paid or topselling_free?
Links:
https://play.google.com/store/apps/category/GAME/collection/topselling_paid
https://play.google.com/store/apps/category/GAME/collection/topselling_free
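The library resolves collection names to the URL slugs shown in those links. A hedged sketch of that mapping (the lookup table and function below are an illustration, not play-scraper's actual code):

```python
# Map human-readable collection names to Play Store URL slugs, then build the
# category-scoped collection URL. Names here are assumptions for illustration.
BASE = 'https://play.google.com/store/apps'
COLLECTIONS = {
    'TOP_SELLING_FREE': 'topselling_free',
    'TOP_SELLING_PAID': 'topselling_paid',
}

def collection_url(collection, category=None):
    """Build a Play Store collection URL, optionally scoped to a category."""
    path = f'{BASE}/category/{category}' if category else BASE
    return f'{path}/collection/{COLLECTIONS[collection]}'

print(collection_url('TOP_SELLING_PAID', 'GAME'))
# -> https://play.google.com/store/apps/category/GAME/collection/topselling_paid
```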
Describe the bug
I implemented this module in my service and it worked well.
Recently, the module has raised an 'App not found (404)' error for 70–80% of requests.
The strange thing is that requests sometimes succeed.
The error occurs only on category pages.
To Reproduce
import play_scraper
print(play_scraper.collection(
collection='TOP_GROSSING',
category='GAME',
gl='jp',
results=10,
page=0))
Error message
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/GAME/collection/topgrossing?hl=en&gl=jp
Is your feature request related to a problem? Please describe.
I don't see the code where you extract current_version, size, and updated in https://github.com/danieliu/play-scraper/blob/master/play_scraper/utils.py.
Whereas your README explains that it's possible!
Describe the solution you'd like
I would like the details method to return this information.
Thanks
Description:
If the search query contains special characters, it will be encoded twice: the first time by quote_plus and the second time by Google's servers. It is easily solved by removing quote_plus from the parameters. That is, having:
self.params.update({
'q': query,
'c': 'apps',
})
instead of:
self.params.update({
'q': quote_plus(query),
'c': 'apps',
})
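The double encoding is easy to demonstrate offline with the standard library; urlencode here stands in for the encoding the query string undergoes after quote_plus has already run:

```python
from urllib.parse import quote_plus, urlencode

# The query is percent-encoded once by quote_plus, then the already-encoded
# value is encoded again when the request parameters are serialized:
# each '%' becomes '%25', producing the broken %253A / %252F sequences.
query = 'https://help.instagram.com/'

double = urlencode({'q': quote_plus(query)})  # quote_plus first, then urlencode
single = urlencode({'q': query})              # one round of encoding is enough

print(double)  # q=https%253A%252F%252Fhelp.instagram.com%252F
print(single)  # q=https%3A%2F%2Fhelp.instagram.com%2F
```

The second string matches the URL the Play Store search box produces, which is why dropping quote_plus fixes the search.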
To Reproduce
The input will be the developer web of Instagram: https://help.instagram.com/
When running res = play_scraper.search("https://help.instagram.com/", detailed=True)
Play_scraper does the following query:
/store/search?q=https%253A%252F%252Fhelp.instagram.com%252F&c=apps&gl=us&hl=en
Which is not right. If we look for that url in a browser, a query with encodings is written automatically in the searchbox:
Expected behavior
If we manually put https://help.instagram.com/
in the searchbox, the url will be:
https://play.google.com/store/search?q=https%3A%2F%2Fhelp.instagram.com%2F&c=apps&hl=en&gl=us
If we use the piece of code without quote_plus, the url that is searched for is exactly the same as the desired one.
Hi!
Thanks for this package! I experience a problem with the most recent version:
Describe the bug
When using the code from the Github readme for requesting developer info, a 502 server error is raised.
To Reproduce
import play_scraper
print(play_scraper.developer('Disney', results=5))
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-23-86155d48969c> in <module>()
----> 1 dev = play_scraper.developer('Disney', results = 5)
~\Anaconda3\lib\site-packages\play_scraper\api.py in developer(developer, hl, gl, **kwargs)
53 """
54 s = scraper.PlayScraper(hl, gl)
---> 55 return s.developer(developer, **kwargs)
56
57
~\Anaconda3\lib\site-packages\play_scraper\scraper.py in developer(self, developer, results, page, detailed)
158 url = build_url('developer', developer)
159 data = generate_post_data(results, 0, pagtok)
--> 160 response = send_request('POST', url, data, self.params)
161
162 if detailed:
~\Anaconda3\lib\site-packages\play_scraper\utils.py in send_request(method, url, data, params, headers, timeout, verify, allow_redirects)
119 allow_redirects=allow_redirects)
120 if not response.status_code == requests.codes.ok:
--> 121 response.raise_for_status()
122 except requests.exceptions.RequestException as e:
123 log.error(e)
~\Anaconda3\lib\site-packages\requests\models.py in raise_for_status(self)
938
939 if http_error_msg:
--> 940 raise HTTPError(http_error_msg, response=self)
941
942 def close(self):
HTTPError: 502 Server Error: Bad Gateway for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us
Additional context
I think the problem is due to the gl
parameter, might also be related to #21.
Describe the bug
Getting similar apps is broken: it does not return any apps. This also fails in the unit tests with AssertionError: 0 not greater than 0
To Reproduce
Easiest way is to run the unit tests.
Expected behavior
The tests do not fail and similar apps are returned when using the method.
Does anyone know how to fix it?
Describe the bug
Scraper crashes due to outdated class name. This issue causes an Exception to be raised later in the parsing process. This exception is not caught and crashes the scraper.
Traceback (most recent call last):
File "", line 1, in
File "site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "site-packages/play_scraper/scraper.py", line 83, in details
app_json = parse_app_details(soup)
File "site-packages/play_scraper/utils.py", line 312, in parse_app_details
soup.select_one('.xyOfqd'))
File "site-packages/play_scraper/utils.py", line 138, in parse_additional_info
section_titles_divs = [x for x in soup.select('div.hAyfc div.BgcNfc')]
AttributeError: 'NoneType' object has no attribute 'select'
To Reproduce
Request details of any package name.
Additional context
While obtaining details about an app today (2019-03-26), I noticed that the tool started crashing in utils.py, line 312 (... soup.select_one('.xyOfqd')))
Describe the bug
Attempting to run the example described in the README:
print(play_scraper.details('com.android.chrome'))
raises an AttributeError.
To Reproduce
import play_scraper
print(play_scraper.details('com.android.chrome'))
Additional context
Seems related to Issue #2
Even after changing
icon = (soup.select_one('.dQrBL img.ujDFqe')
to
icon = (soup.select_one('.XSyT2c img.T75of')
as suggested in Issue #42, the error was still not resolved.
play-scraper/play_scraper/api.py
Line 25 in 6059c47
I am getting a RecursionError: maximum recursion depth exceeded error. I think if you change the variable name to something else, that will fix the issue. Thanks.
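The RecursionError is consistent with a name shadowing itself. A hypothetical minimal reproduction (the function name and argument are illustrative, not the actual api.py code):

```python
# If a wrapper function shares its name with the call it means to delegate to,
# the name lookup resolves back to the wrapper itself and the recursion never
# terminates; renaming either side breaks the cycle.

def details(app_id):
    # BUG: this resolves to *this* function, not an underlying implementation.
    return details(app_id)

caught = None
try:
    details('com.example.app')
except RecursionError as e:
    caught = type(e).__name__

print(caught)  # caught once the default recursion limit is exceeded
```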
Hello,
trying to install the package:
pip install play-scraper --install-option="--prefix=/airflow"
is causing this error:
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-3jPoyb/play-scraper/setup.py", line 10, in
with open('README.md', 'r', 'utf-8') as f:
File "/usr/lib64/python2.7/codecs.py", line 881, in open
file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'README.md'
How can we solve this?
P.S. installing without the option works fine
thanks,
Lorenzo
Hi.
When I try the search function the results are always the same:
pprint(p.search('tinder', page=1))
pprint(p.search('tinder', page=12))
Both calls give me the same results.
Like https://github.com/facundoolano/google-play-scraper/blob/dev/lib/permissions.js
Maybe I'll do it in late October 🤷♂️
Describe the bug
Tests are failing. There are 404 and 405 errors which seem to result in assertion errors.
To Reproduce
$ python3 setup.py test
Expected behavior
Passing of all tests.
Additional context
Executing(%check): /bin/sh -e /var/tmp/rpm-tmp.HGuyQ3
+ umask 022
+ cd /home/fab/rpmbuild/BUILD
+ cd play-scraper-0.6.0
+ /usr/bin/python3 setup.py test
running test
Searching for requests-futures>=0.9.7
Reading https://pypi.org/simple/requests-futures/
Downloading https://files.pythonhosted.org/packages/47/c4/fd48d1ac5110a5457c71ac7cc4caa93da10a80b8de71112430e439bdee22/requests-futures-1.0.0.tar.gz#sha256=35547502bf1958044716a03a2f47092a89efe8f9789ab0c4c528d9c9c30bc148
Best match: requests-futures 1.0.0
Processing requests-futures-1.0.0.tar.gz
Writing /tmp/easy_install-87wce9d5/requests-futures-1.0.0/setup.cfg
Running requests-futures-1.0.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-87wce9d5/requests-futures-1.0.0/egg-dist-tmp-kjhu22lj
creating /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs/requests_futures-1.0.0-py3.7.egg
Extracting requests_futures-1.0.0-py3.7.egg to /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs
Installed /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs/requests_futures-1.0.0-py3.7.egg
running egg_info
writing play_scraper.egg-info/PKG-INFO
writing dependency_links to play_scraper.egg-info/dependency_links.txt
writing requirements to play_scraper.egg-info/requires.txt
writing top-level names to play_scraper.egg-info/top_level.txt
reading manifest file 'play_scraper.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'play_scraper.egg-info/SOURCES.txt'
running build_ext
test_categories_ok (tests.test_scraper.CategoryTest) ... ok
test_different_language_and_country (tests.test_scraper.CategoryTest) ... ok
test_default_num_results (tests.test_scraper.CollectionTest) ... ERROR
test_detailed_collection (tests.test_scraper.CollectionTest) ... FAIL
test_detailed_collection_different_language (tests.test_scraper.CollectionTest) ... FAIL
test_family_with_age_collection (tests.test_scraper.CollectionTest) ... ERROR
test_invalid_category_id (tests.test_scraper.CollectionTest) ... ok
test_invalid_collection_id (tests.test_scraper.CollectionTest) ... ok
test_invalid_num_results_over_120 (tests.test_scraper.CollectionTest) ... ok
test_invalid_page_x_results_over_500 (tests.test_scraper.CollectionTest) ... ok
test_non_detailed_collection (tests.test_scraper.CollectionTest) ... FAIL
test_non_detailed_different_language_and_country (tests.test_scraper.CollectionTest) ... ERROR
test_promotion_collection_id (tests.test_scraper.CollectionTest) ... FAIL
test_app_with_no_developer (tests.test_scraper.DetailsTest) ... ERROR
test_fetching_app_in_spanish (tests.test_scraper.DetailsTest) ... ok
test_fetching_app_with_all_details (tests.test_scraper.DetailsTest) ... ok
test_developer_parameter_float_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_int_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_long_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_string_digits_invalid (tests.test_scraper.DeveloperTest) ... ok
test_different_language_and_country (tests.test_scraper.DeveloperTest) ... ERROR
test_fetch_developer_apps_detailed (tests.test_scraper.DeveloperTest) ... ERROR
test_fetching_developer_default_results (tests.test_scraper.DeveloperTest) ... ERROR
test_maximum_results (tests.test_scraper.DeveloperTest) ... ERROR
test_over_max_results_fetches_five (tests.test_scraper.DeveloperTest) ... ERROR
test_page_out_of_range (tests.test_scraper.DeveloperTest) ... ok
test_init_with_defaults (tests.test_scraper.PlayScraperTest) ... ok
test_init_with_language_and_geolocation (tests.test_scraper.PlayScraperTest) ... ok
test_invalid_geolocation_code_raises (tests.test_scraper.PlayScraperTest) ... ok
test_invalid_language_code_raises (tests.test_scraper.PlayScraperTest) ... ok
test_basic_search (tests.test_scraper.SearchTest) ... ok
test_different_language_and_country (tests.test_scraper.SearchTest) ... ok
test_page_out_of_range_not_between_0_and_12 (tests.test_scraper.SearchTest) ... ok
test_search_with_app_detailed (tests.test_scraper.SearchTest) ... /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47640), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=9, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50764), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50760), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47632), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47626), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47628), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50768), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47624), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47630), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=11, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47636), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_different_language_and_country (tests.test_scraper.SimilarTest) ... ok
test_similar_ok (tests.test_scraper.SimilarTest) ... ok
test_similar_with_app_detailed (tests.test_scraper.SimilarTest) ... /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47664), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47680), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47666), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=9, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50796), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50794), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50798), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47662), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=11, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47678), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47674), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50802), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_different_language_and_country (tests.test_scraper.SuggestionTest) ... ok
test_empty_query (tests.test_scraper.SuggestionTest) ... ok
test_query_suggestions (tests.test_scraper.SuggestionTest) ... ok
test_list_url_both_args (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_no_args (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_only_category (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_only_collection (tests.test_utils.TestBuildListUrl) ... ok
test_building_app_url (tests.test_utils.TestBuildUrl) ... ok
test_building_multiple_word_dev_name (tests.test_utils.TestBuildUrl) ... ok
test_building_simple_dev_name (tests.test_utils.TestBuildUrl) ... ok
test_default_post_data (tests.test_utils.TestGeneratePostData) ... ok
test_first_page_data (tests.test_utils.TestGeneratePostData) ... ok
test_only_num_results (tests.test_utils.TestGeneratePostData) ... ok
test_page_token (tests.test_utils.TestGeneratePostData) ... ok
test_request_with_params (tests.test_utils.TestSendRequest) ... ok
test_send_normal_request (tests.test_utils.TestSendRequest) ... ok
======================================================================
ERROR: test_default_num_results (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 223, in test_default_num_results
self.assertTrue(all(key in apps[0] for key in BASIC_KEYS))
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 223, in
self.assertTrue(all(key in apps[0] for key in BASIC_KEYS))
IndexError: list index out of range
======================================================================
ERROR: test_family_with_age_collection (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 231, in test_family_with_age_collection
age='SIX_EIGHT')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 132, in collection
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/FAMILY/collection/topselling_free?hl=en&gl=us&age=AGE_RANGE2
======================================================================
ERROR: test_non_detailed_different_language_and_country (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 191, in test_non_detailed_different_language_and_country
apps = s.collection('TOP_PAID', 'LIFESTYLE', results=5)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 132, in collection
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/LIFESTYLE/collection/topselling_paid?hl=da&gl=dk
======================================================================
ERROR: test_app_with_no_developer (tests.test_scraper.DetailsTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 82, in details
response = send_request('GET', url, params=self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=org.selfie.beauty.camera.pro&hl=en&gl=us
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 156, in test_app_with_no_developer
app_data = self.s.details('org.selfie.beauty.camera.pro')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 86, in details
app=app_id, error=e))
ValueError: Invalid application ID: org.selfie.beauty.camera.pro. 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=org.selfie.beauty.camera.pro&hl=en&gl=us
======================================================================
ERROR: test_different_language_and_country (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 302, in test_different_language_and_country
apps = s.developer('Google LLC', results=5)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=da&gl=dk
======================================================================
ERROR: test_fetch_developer_apps_detailed (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 294, in test_fetch_developer_apps_detailed
apps = self.s.developer('Disney', results=3, detailed=True)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us
======================================================================
ERROR: test_fetching_developer_default_results (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 272, in test_fetching_developer_default_results
apps = self.s.developer('Disney')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us
======================================================================
ERROR: test_maximum_results (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 280, in test_maximum_results
apps = self.s.developer('Google LLC', results=120)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=en&gl=us
======================================================================
ERROR: test_over_max_results_fetches_five (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 287, in test_over_max_results_fetches_five
apps = self.s.developer('Google LLC', results=121)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=en&gl=us
======================================================================
FAIL: test_detailed_collection (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 200, in test_detailed_collection
self.assertEqual(1, len(apps))
AssertionError: 1 != 0
======================================================================
FAIL: test_detailed_collection_different_language (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 210, in test_detailed_collection_different_language
self.assertEqual(1, len(apps))
AssertionError: 1 != 0
======================================================================
FAIL: test_non_detailed_collection (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 178, in test_non_detailed_collection
self.assertEqual(2, len(apps))
AssertionError: 2 != 0
======================================================================
FAIL: test_promotion_collection_id (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 241, in test_promotion_collection_id
self.assertEqual(2, len(apps))
AssertionError: 2 != 0
Ran 53 tests in 37.547s
FAILED (failures=4, errors=9)
Test failed: <unittest.runner.TextTestResult run=53 errors=9 failures=4>
error: Test failed: <unittest.runner.TextTestResult run=53 errors=9 failures=4>
error: Bad exit status from /var/tmp/rpm-tmp.HGuyQ3 (%check)
Describe the bug
details always returns None for description and description_html.
{'title': 'WiFi Map — Free Passwords & Hotspots', 'icon': 'https://lh3.googleusercontent.com/SnhTxeJgxWHS7AlwEl_QNa2lpwCNFsL3Dqrzl-jMqvwFzYMIZW5V3IHUWFU3bKe6N8Kg', 'screenshots': [], 'video': 'https://www.youtube.com/embed/yl95ZgDTYp0', 'category': ['TRAVEL_AND_LOCAL'], 'score': '4.3', 'histogram': {5: None, 4: None, 3: None, 2: None, 1: None}, 'reviews': 700308, 'description': None, 'description_html': None, 'recent_changes': None, 'editors_choice': False, 'price': '0', 'free': True, 'iap': True, 'developer_id': '8565181914239008089', 'updated': 'April 20, 2019', 'size': '38M', 'installs': '50,000,000+', 'current_version': '4.1.17', 'required_android_version': '4.3 and up', 'content_rating': ['Everyone'], 'iap_range': ('$0.99', '$19.99'), 'interactive_elements': ['Users Interact, Digital Purchases'], 'developer': 'WiFi Map LLC', 'developer_email': '[email protected]', 'developer_url': 'https://www.wifimap.io/', 'developer_address': '25 Broadway, 9th Floor\nNew York, NY 10004', 'app_id': 'io.wifimap.wifimap', 'url': 'https://play.google.com/store/apps/details?id=io.wifimap.wifimap'}
To Reproduce
>>> import play_scraper
>>> play_scraper.details('io.wifimap.wifimap')
Expected behavior
description and description_html should be fetched correctly.
Screenshots
N/A
Desktop (please complete the following information):
Hello @danieliu,
First, thank you for that great module.
Then, could you replace the == in requirements.txt with >=? Unless your module really needs those exact versions, I think it can be compatible with newer ones.
Cheers!
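For illustration, the requested change would look like this (the package name and version numbers here are hypothetical, not the actual contents of requirements.txt):

```
# pinned to an exact version (current)
requests==2.21.0
# relaxed to a minimum version (requested)
requests>=2.21.0
```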
The categories "APP_WALLPAPER", "TRANSPORTATION", "MEDIA_AND_VIDEO", and "APP_WIDGETS" have no collections defined for them, as their URLs do not exist in the Google Play Store.
Error: HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/APP_WALLPAPER/collection/movers_shakers
Describe the bug
Google probably made an update yesterday that caused the details method to stop working.
play_scraper/utils.py", line 240, in parse_app_details\n icon = (soup.select_one('.dQrBL img.ujDFqe')\nAttributeError: 'NoneType' object has no attribute 'attrs''
To Reproduce
Call the details method with any package name.
Desktop (please complete the following information):
Tried running a call in the Python shell and got RecursionError: maximum recursion depth exceeded while calling a Python object. See the trace below:
>>> print(play_scraper.details('com.android.chrome'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/scraper.py", line 292, in details
response = send_request('GET', url)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/utils.py", line 120, in send_request
verify=verify)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
conn.connect()
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
cert_reqs=resolve_cert_reqs(self.cert_reqs),
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 269, in create_urllib3_context
context.options |= options
File "/usr/lib/python3.6/ssl.py", line 465, in options
super(SSLContext, SSLContext).options.__set__(self, value)
File "/usr/lib/python3.6/ssl.py", line 465, in options
super(SSLContext, SSLContext).options.__set__(self, value)
File "/usr/lib/python3.6/ssl.py", line 465, in options
super(SSLContext, SSLContext).options.__set__(self, value)
[Previous line repeated 323 more times]
RecursionError: maximum recursion depth exceeded while calling a Python object
Hi, thanks for this cool scraper.
I just want to mention that the output data is not in clean JSON format; when I used Python's json parsing library, it raised an error.
Thanks!
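A minimal workaround, assuming the problem is that some values in the returned dict are not JSON primitives: let json stringify anything it cannot serialize (to_json is an illustrative helper, not part of play-scraper):

```python
import json

def to_json(details):
    # default=str converts any non-serializable value (e.g. a parsed
    # HTML node) to its string representation instead of raising.
    return json.dumps(details, default=str)
```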
PlayScraper.categories() returns an empty result — Google has again changed the class attributes of the anchor elements.
PR submitted
Passing in a different hl changes the alt attributes and the titles in the additional details section.
Selectors should be generalized to use obfuscated classes or other attributes that do not change between languages.
The additional details section HTML isn't specific enough for a clear way to differentiate between subsections and correctly parse the data out, unfortunately.
Traceback (most recent call last):
File "test_desc.py", line 2, in <module>
print (play_scraper.details('com.z2p.devops.mathpuzzel'))
File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 341, in details
app_json = self._parse_app_details(soup)
File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 254, in _parse_app_details
soup.select_one('.xyOfqd'))
File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 168, in _parse_additional_info
'developer_address': developer_address.strip()}
AttributeError: 'NoneType' object has no attribute 'strip'
Describe the bug
In rare situations, an app will be listed as a result in the search function while the app has actually been (temporarily) removed from the Play Store. When using the detailed=True argument, the package throws an error once the missing app is scraped, as it tries to access the actual app page.
To Reproduce
Steps to reproduce the behavior, e.g. the full example code, not just a snippet of where the error occurs!
>>> print(play_scraper.search('CAUTI', gl='nl', detailed='True', page=6))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/api.py", line 79, in search
return s.search(query, page, detailed)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/scraper.py", line 224, in search
apps = self._parse_multiple_apps(response)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/scraper.py", line 71, in _parse_multiple_apps
return multi_futures_app_request(app_ids, params=self.params)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 531, in multi_futures_app_request
result = response.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "(blahblah)/.env/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "(blahblah)/.env/lib/python3.6/site-packages/requests/sessions.py", line 653, in send
r = dispatch_hook('response', hooks, r, **kwargs)
File "(blahblah)/.env/lib/python3.6/site-packages/requests/hooks.py", line 31, in dispatch_hook
_hook_data = hook(hook_data, **kwargs)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 504, in parse_app_details_response_hook
details = parse_app_details(soup)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 239, in parse_app_details
title = soup.select_one('h1[itemprop="name"] span').text
AttributeError: 'NoneType' object has no attribute 'text'
In the original use case (a function that iterated over the pages using Celery), the following error was thrown as well:
[2019-10-28 10:48:41,362: ERROR/ForkPoolWorker-1] Error occurred fetching uk.incrediblesoftware.mpcmachine.demo: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=uk.incrediblesoftware.mpcmachine.demo&hl=en&gl=nl&q=CAUTI&c=apps
From this I tried to check the actual Play Store page for uk.incrediblesoftware.mpcmachine.demo, which, as expected, throws an HTTP 404 error.
Expected behavior
I hoped the package would log the 404 error, skip that app, and still return the remaining results. I can catch errors in my own code to prevent problems, but that way an entire page of apps is still excluded from the results.
Desktop (please complete the following information):
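The desired skip-and-continue behavior can be sketched like this; collect_details and fetch_details are hypothetical names (fetch_details stands in for play-scraper's per-app request), not the library's API:

```python
def collect_details(app_ids, fetch_details):
    """Fetch details per app id; log and skip failures (e.g. HTTP 404
    for apps removed from the store) instead of aborting the page."""
    results = []
    for app_id in app_ids:
        try:
            results.append(fetch_details(app_id))
        except Exception as exc:
            print('Error occurred fetching %s: %s' % (app_id, exc))
    return results
```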
Is your feature request related to a problem? Please describe.
play-scraper does not support proxies, so a program that uses play-scraper from behind a proxy will fail.
Describe the solution you'd like
Proxy support in play-scraper.
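play-scraper builds on requests, and requests accepts a proxies mapping on every call (requests.request(method, url, proxies=...)). A minimal sketch of the mapping the requested feature would need to pass through; build_proxies is an illustrative helper and the proxy address is a placeholder:

```python
def build_proxies(proxy_url):
    """Route both http and https traffic through a single proxy URL,
    in the mapping format that requests expects."""
    return {'http': proxy_url, 'https': proxy_url}
```

The mapping could then be threaded through play-scraper's send_request into the underlying requests call.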
Describe the bug
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/GAME_RACING/collection/movers_shakers?gl=us&hl=en
To Reproduce
import play_scraper
print(play_scraper.collection(
    collection='TRENDING',
    category='GAME_RACING',
    results=5,
    page=1))
Expected behavior
A list of trending games.
Desktop (please complete the following information):
search returns nothing
Items in the resulting dict are BeautifulSoup entities, not the primitives a user may expect after looking at the examples.
I'm using this lib with multiprocessing, and I found out that the results cannot be pickled by multiprocessing's map (RecursionError). I propose converting the values in the resulting dict into true primitives.
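A minimal sketch of the proposed conversion, assuming the offending values are BeautifulSoup nodes (or any other non-primitive objects); to_primitives is an illustrative helper, not part of the library:

```python
PRIMITIVES = (str, int, float, bool, type(None))

def to_primitives(value):
    """Recursively coerce non-primitive values to str so the result
    can be pickled by multiprocessing (and serialized cleanly)."""
    if isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, dict):
        return {k: to_primitives(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_primitives(v) for v in value]
    return str(value)
```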
Hi,
this is already a pretty nice package, but one addition would make it even better: an option for collecting information about users' reviews.
There is a lot that could be done with reviewer information, for instance constructing relations between apps and their users (reviewers), or examining whether a fixed core of users produces a lot of positive or negative reviews for some content.
I would like to know if I could extract all reviews of an app.
Thanks
Hi,
I already had contact with daniel and he told me to put this issue here.
I am using:
The rest seems to work fine, at least for the developer() method.
Code:
app_id = "com.igg.android.lordsmobile"
details = play_scraper.details(app_id)
print(details["developer"])
It keeps returning Google Commerce Ltd for every app_id I put in.
"Google Commerce Ltd" can be found at the bottom of each app page:
Offered By
Google Commerce Ltd
Maybe this has something to do with it?
Hi!
The library is great so far and it is helping me a lot in one of my projects. I just have one question: in the search function there is now a limit of 12 pages. I have noticed that this is related to the PAGE_TOKENS in the settings. My question is: how can it be raised to retrieve an arbitrary number of page results?
Thanks!
I tried running the sample code from the readme:
import play_scraper
print(play_scraper.details('com.android.chrome'))
I get the following output:
Traceback (most recent call last):
File "bug.py", line 2, in <module>
print(play_scraper.details('com.android.chrome'))
File "/usr/local/lib/python3.7/site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "/usr/local/lib/python3.7/site-packages/play_scraper/scraper.py", line 83, in details
app_json = parse_app_details(soup)
File "/usr/local/lib/python3.7/site-packages/play_scraper/utils.py", line 312, in parse_app_details
soup.select_one('.xyOfqd'))
File "/usr/local/lib/python3.7/site-packages/play_scraper/utils.py", line 138, in parse_additional_info
section_titles_divs = [x for x in soup.select('div.hAyfc div.BgcNfc')]
AttributeError: 'NoneType' object has no attribute 'select'
Desktop (please complete the following information):
Hi,
This is what i running:
import play_scraper
import csv
import json
data=( play_scraper.search("Disney", page=1, detailed=True))
print(data.count(data))
The same happens when running:
print play_scraper.details('com.android.chrome')
Getting errors:
data=( play_scraper.search("Disney", page=1, detailed=True))
File "C:\Python27\lib\site-packages\play_scraper\api.py", line 79, in search
return s.search(query, **kwargs)
File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 415, in search
apps = self._parse_multiple_apps(response)
File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 273, in _parse_multiple_apps
apps.append(self._parse_app_details(soup))
File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 155, in _parse_app_details
updated = additional_info.select_one('div[itemprop="datePublished"]')
AttributeError: 'NoneType' object has no attribute 'select_one'
Thanks,
Really, it is a great post. Thank you so much.
As I am very new to Python, I am unable to save the output to a CSV file for filtering based on the number of installations.
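A minimal sketch of saving the output to CSV, sorted by installs; the field names are a subset of the details dicts shown in these issues, and save_to_csv/installs_count are illustrative helpers:

```python
import csv

def installs_count(app):
    # e.g. '50,000,000+' -> 50000000
    raw = app.get('installs') or '0'
    return int(raw.rstrip('+').replace(',', '') or 0)

def save_to_csv(apps, path, fields=('app_id', 'title', 'installs')):
    """Write one row per app, largest install count first."""
    apps = sorted(apps, key=installs_count, reverse=True)
    with open(path, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=list(fields),
                                extrasaction='ignore')
        writer.writeheader()
        writer.writerows(apps)
```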
Edit: not a problem with the library; the webpage itself is wrong when using a non-default gl, so I guess there is nothing you can do about it (by the way, thanks for this very useful library). Workaround I used: going through the list of scraped apps a second time with gl='us'.
When using play_scraper.collection() with gl != 'us' (e.g. gl='fr') and detailed=True, the developer field is always 'Google Commerce Ltd'.
The developer field is correct when using the default gl.
Describe the bug
play_scraper/constants.py — any gl= value must match the GL_COUNTRY_CODES list, but this list does not match Google's own country code list at https://developers.google.com/public-data/docs/canonical/countries_csv, which leads to 404 errors for countries listed in GL_COUNTRY_CODES and to valid country codes being rejected.
To Reproduce
Run any play_scraper call with gl="kp" (valid according to play-scraper, not valid according to Google):
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/developer?id=&gl=kp&hl=en
Running play_scraper with gl='im' results in an error from play-scraper:
ValueError: im is not a valid geolocation country code.
However, https://play.google.com/store/apps/developer?id=&gl=im returns valid content from Google.
Expected behavior
A better-maintained GL_COUNTRY_CODES list, or an option to skip the internal validation.
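The override idea can be sketched as follows; check_gl and VALID_GL are illustrative names (play-scraper's actual validation lives in constants.py via GL_COUNTRY_CODES), not the library's API:

```python
VALID_GL = {'us', 'fr', 'im'}  # hypothetical maintained set of codes

def check_gl(gl, strict=True):
    """Return gl if it is known, or if strict validation is disabled,
    letting Google itself reject genuinely invalid codes."""
    if gl in VALID_GL or not strict:
        return gl
    raise ValueError('%s is not a valid geolocation country code.' % gl)
```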
When I invoke python3 -m pip install play_scraper, it says:
command "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3 -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-install-cj268d40/lxml/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/tmp/pip-record-mzwfg42a/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-install-cj268d40/lxml/
import play_scraper
print play_scraper.details('com.android.chrome')
File "<ipython-input-8-60b6c1359646>", line 2
print play_scraper.details('com.android.chrome')
^
SyntaxError: invalid syntax
Running on macOS with Python:
3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
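This SyntaxError is Python 2 print-statement syntax being run under Python 3, where print is a function and needs parentheses. A quick demonstration using the standard library ast module:

```python
import ast

py2_style = "print play_scraper.details('com.android.chrome')"
py3_style = "print(play_scraper.details('com.android.chrome'))"

def is_valid_py3(source):
    """Return True if the snippet parses as Python 3 syntax."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```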
Describe the bug
While requesting the NEW_FREE collection, I get an IndexError exception:
File ".../python2.7/site-packages/play_scraper/api.py", line 41, in collection
return s.collection(collection, category, **kwargs)
File ".../python2.7/site-packages/play_scraper/scraper.py", line 134, in collection
for app_card in soup.select('div[data-uitype="500"]')]
File ".../python2.7/site-packages/play_scraper/utils.py", line 358, in parse_card_info
developer_id = dev_soup.attrs['href'].split('=')[1]
IndexError: list index out of range
To Reproduce
Run play_scraper.collection(collection='NEW_FREE', results=100, page=0, gl='us')
Expected behavior
Should receive a list of app metadata for the specified chart in Google Play.
Screenshots
Not applicable
Desktop (please complete the following information):
Additional context
It seems like the scraper is getting the wrong response from Google Play and cannot parse it correctly.
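The IndexError comes from dev_soup.attrs['href'].split('=')[1] assuming the href always contains an '='. A hedged sketch of a more defensive parse (parse_developer_id is an illustrative helper, not the library's API):

```python
def parse_developer_id(href):
    """Return the part after the first '=', or None when the href
    format changes and no '=' is present (avoiding the IndexError)."""
    parts = href.split('=', 1)
    return parts[1] if len(parts) == 2 else None
```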
Describe the bug
The screenshots attribute is always an empty array, and description and description_html are always None.
To Reproduce
location = "kr"
lang="ko"
ajson = play_scraper.details("com.dena.a12026418",gl=location,hl=lang)
print(ajson)
Returned:
{'title': 'Pokémon Masters', 'icon': 'https://lh3.googleusercontent.com/Qow956nxep_gy5lWMRXd7hTX-SUE-m8Un4etpm6o1A3AAjFvesAq-YyM1Fy9qjr1uZBe', 'screenshots': [], 'video': 'https://www.youtube.com/embed/FV2ISpwZRck', 'category': ['GAME_ROLE_PLAYING'], 'score': '3.8', 'histogram': {}, 'reviews': 0, 'description': None, 'description_html': None, 'recent_changes': None, 'editors_choice': False, 'price': '0', 'free': True, 'iap': False, 'developer_id': '5614074995304947897', 'updated': None, 'size': None, 'installs': None, 'current_version': None, 'required_android_version': None, 'content_rating': None, 'iap_range': None, 'interactive_elements': None, 'developer': None, 'developer_email': None, 'developer_url': None, 'developer_address': None, 'app_id': 'com.dena.a12026418', 'url': 'https://play.google.com/store/apps/details?id=com.dena.a12026418'}
On this page, the screenshots and description HTML are present:
https://play.google.com/store/apps/details?id=com.dena.a12026418&hl=ko&gl=kr
Expected behavior
The screenshots array, description, and description_html should be populated.
Desktop (please complete the following information):
Describe the bug
The scraper crashes because a CSS selector is not found.
To Reproduce
play_scraper.details('whatever_id')
'NoneType' object has no attribute 'attrs'
Expected behavior
The scraper should not crash when a selector is not found.
Desktop
The issue is in utils.py, line 241:
soup.select_one('.dQrBL img.ujDFqe')
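A hedged sketch of the no-crash behavior requested: guard the select_one() result before touching .attrs (safe_attr is an illustrative helper; the selector and attribute names mirror the report):

```python
def safe_attr(soup, selector, attr, default=None):
    """Return the attribute of the first matching node, or default
    when the selector matches nothing or the attribute is absent."""
    node = soup.select_one(selector)
    if node is None:
        return default
    return node.attrs.get(attr, default)
```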
Describe the bug
What's New ("recent_changes") returns None even though it exists.
To Reproduce
import play_scraper as ps
data = ps.details('com.supercell.clashofclans')
print(data['recent_changes'])
Expected behavior
Should return the What's New value when it exists.
Screenshots
Not required
Desktop (please complete the following information):
Additional context
The issue can be resolved by changing the code to:
recent_changes = changes_soup.text
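Combining the reporter's fix with a guard for apps that have no What's New section (parse_recent_changes is an illustrative wrapper; changes_soup mirrors the name in the report):

```python
def parse_recent_changes(changes_soup):
    # .text only when the section was actually found; None otherwise.
    return changes_soup.text if changes_soup is not None else None
```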
com.Rain.Teslagrad
for egs in gPlayBilgi['screenshots']:
    print("Screen:", egs)
Output:
data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==
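The returned value is a 1x1 base64 GIF data URI, which looks like a lazy-loading placeholder rather than a real screenshot URL. A hedged sketch of filtering such placeholders out (real_screenshots is an illustrative helper):

```python
def real_screenshots(urls):
    """Drop inline data: URIs (lazy-load placeholders), keeping only
    real screenshot URLs."""
    return [u for u in urls if not u.startswith('data:image/')]
```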
I got this error after installing play_scraper and trying to import it.
Please help, thanks.
[root@CT114 ~]# python
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import play_scraper
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/play_scraper/__init__.py", line 13, in <module>
from play_scraper.api import (
File "/usr/lib/python2.7/site-packages/play_scraper/api.py", line 11, in <module>
from play_scraper import scraper
File "/usr/lib/python2.7/site-packages/play_scraper/scraper.py", line 15, in <module>
from bs4 import BeautifulSoup, SoupStrainer
File "/usr/lib/python2.7/site-packages/bs4/__init__.py", line 30, in <module>
from .builder import builder_registry, ParserRejectedMarkup
File "/usr/lib/python2.7/site-packages/bs4/builder/__init__.py", line 314, in <module>
from . import _html5lib
File "/usr/lib/python2.7/site-packages/bs4/builder/_html5lib.py", line 70, in <module>
class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: 'module' object has no attribute '_base'