danieliu / play-scraper
A web scraper to retrieve application data from the Google Play Store.
License: MIT License
Describe the bug
Calling categories() returns an empty result.
To Reproduce
import play_scraper
play_scraper.__version__
'0.5.5'
play_scraper.categories()
{}
Google has changed the class id for the additional info section.
Please change line 310 in utils.py from
soup.select_one('.xyOfqd'))
to
soup.select_one('.IxB2fe'))
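The effect of the selector change can be illustrated offline with a stdlib-only sketch (no network, no BeautifulSoup). The class names 'xyOfqd' and 'IxB2fe' come from this report; the HTML snippet is a made-up stand-in for the real Play Store markup:

```python
from html.parser import HTMLParser

# The "additional info" section is located by its CSS class, so when Google
# renames the class from 'xyOfqd' to 'IxB2fe', the old selector finds nothing.

class ClassFinder(HTMLParser):
    """Collect tag names whose class attribute contains a target class."""
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if self.target_class in classes:
            self.matches.append(tag)

html = '<div class="IxB2fe"><div>Updated</div><div>Size</div></div>'

old = ClassFinder('xyOfqd')   # pre-change class name
old.feed(html)
new = ClassFinder('IxB2fe')   # class name reported in this issue
new.feed(html)

print(len(old.matches), len(new.matches))  # old selector matches nothing
```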
Is your feature request related to a problem? Please describe.
The developer field has no privacy URL, so it would be great if it could be added.
Describe the solution you'd like
Add the privacy URL to the developer details.
I don't know whether the code below works; it's just a suggestion.
Thanks for reading.
developer_privacy = value_div.select('div')[-2].contents[0]
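As an alternative to the positional index above (select('div')[-2] breaks whenever Google reorders the divs), the link could be matched by its anchor text. A hedged stdlib sketch, where the HTML is a hypothetical stand-in for the real developer-details markup:

```python
from html.parser import HTMLParser

# Find the privacy-policy link by matching the anchor text rather than a
# positional index, so reordered siblings don't break the extraction.

class PrivacyLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self._href = None
        self._in_anchor = False
        self.privacy_url = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_anchor = True
            self._href = dict(attrs).get('href')

    def handle_data(self, data):
        if self._in_anchor and 'privacy' in data.lower():
            self.privacy_url = self._href

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_anchor = False

html = (
    '<div><a href="mailto:[email protected]">Email</a>'
    '<a href="https://example.com/privacy">Privacy Policy</a></div>'
)
finder = PrivacyLinkFinder()
finder.feed(html)
print(finder.privacy_url)
```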
File "/home/huma/humascraper/HumaScraper/spiders/store/store.py", line 229, in processGooglePlay data = android.fetchFromGooglePlay(self.fakeName if self.fakeName else response.meta["packageName"]) File "/home/huma/humascraper/HumaScraper/common/util/android.py", line 243, in fetchFromGooglePlay gplay_info = play_scraper.details(package_name) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/api.py", line 22, in details return s.details(app_id) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/scraper.py", line 83, in details app_json = parse_app_details(soup) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/utils.py", line 240, in parse_app_details icon = (soup.select_one('.dQrBL img.ujDFqe') AttributeError: 'NoneType' object has no attribute 'attrs'
I used play-scraper 0.5.4, and I think Google changed the Google Play HTML pages.
How can I list topselling_paid or topselling_free?
Links:
https://play.google.com/store/apps/category/GAME/collection/topselling_paid
https://play.google.com/store/apps/category/GAME/collection/topselling_free
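The library resolves collection names to the URL slugs shown in those links. A hedged sketch of that mapping (the lookup table and function below are an illustration, not play-scraper's actual code):

```python
# Map human-readable collection names to Play Store URL slugs, then build the
# category-scoped collection URL. Names here are assumptions for illustration.
BASE = 'https://play.google.com/store/apps'
COLLECTIONS = {
    'TOP_SELLING_FREE': 'topselling_free',
    'TOP_SELLING_PAID': 'topselling_paid',
}

def collection_url(collection, category=None):
    """Build a Play Store collection URL, optionally scoped to a category."""
    path = f'{BASE}/category/{category}' if category else BASE
    return f'{path}/collection/{COLLECTIONS[collection]}'

print(collection_url('TOP_SELLING_PAID', 'GAME'))
# -> https://play.google.com/store/apps/category/GAME/collection/topselling_paid
```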
Describe the bug
I implemented this module in my service and it worked well.
Recently, the module has raised an 'App not found (404)' error for 70–80% of requests.
The strange thing is that requests sometimes succeed.
The error occurs only on category pages.
To Reproduce
import play_scraper
print(play_scraper.collection(
collection='TOP_GROSSING',
category='GAME',
gl='jp',
results=10,
page=0))
Error message
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/GAME/collection/topgrossing?hl=en&gl=jp
Is your feature request related to a problem? Please describe.
I don't see the code where you extract current_version, size, and updated in https://github.com/danieliu/play-scraper/blob/master/play_scraper/utils.py.
Whereas your README explains that it's possible!
Describe the solution you'd like
I would like the details method to return this information.
Thanks
Description:
If the search query contains special characters, it will be encoded twice: the first time by quote_plus and the second time by Google's servers. It is easily solved by removing quote_plus from the parameters. That is, having:
self.params.update({
'q': query,
'c': 'apps',
})
instead of:
self.params.update({
'q': quote_plus(query),
'c': 'apps',
})
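The double encoding is easy to demonstrate offline with the standard library; urlencode here stands in for the encoding the query string undergoes after quote_plus has already run:

```python
from urllib.parse import quote_plus, urlencode

# The query is percent-encoded once by quote_plus, then the already-encoded
# value is encoded again when the request parameters are serialized:
# each '%' becomes '%25', producing the broken %253A / %252F sequences.
query = 'https://help.instagram.com/'

double = urlencode({'q': quote_plus(query)})  # quote_plus first, then urlencode
single = urlencode({'q': query})              # one round of encoding is enough

print(double)  # q=https%253A%252F%252Fhelp.instagram.com%252F
print(single)  # q=https%3A%2F%2Fhelp.instagram.com%2F
```

The second string matches the URL the Play Store search box produces, which is why dropping quote_plus fixes the search.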
To Reproduce
The input will be the developer web of Instagram: https://help.instagram.com/
When running res = play_scraper.search("https://help.instagram.com/", detailed=True)
Play_scraper does the following query:
/store/search?q=https%253A%252F%252Fhelp.instagram.com%252F&c=apps&gl=us&hl=en
Which is not right. If we look for that url in a browser, a query with encodings is written automatically in the searchbox:
Expected behavior
If we manually put https://help.instagram.com/
in the searchbox, the url will be:
https://play.google.com/store/search?q=https%3A%2F%2Fhelp.instagram.com%2F&c=apps&hl=en&gl=us
If we use the piece of code without quote_plus, the url that is searched for is exactly the same as the desired one.
Hi!
Thanks for this package! I experience a problem with the most recent version:
Describe the bug
When using the code from the Github readme for requesting developer info, a 502 server error is raised.
To Reproduce
import play_scraper
print(play_scraper.developer('Disney', results=5))
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-23-86155d48969c> in <module>()
----> 1 dev = play_scraper.developer('Disney', results = 5)
~\Anaconda3\lib\site-packages\play_scraper\api.py in developer(developer, hl, gl, **kwargs)
53 """
54 s = scraper.PlayScraper(hl, gl)
---> 55 return s.developer(developer, **kwargs)
56
57
~\Anaconda3\lib\site-packages\play_scraper\scraper.py in developer(self, developer, results, page, detailed)
158 url = build_url('developer', developer)
159 data = generate_post_data(results, 0, pagtok)
--> 160 response = send_request('POST', url, data, self.params)
161
162 if detailed:
~\Anaconda3\lib\site-packages\play_scraper\utils.py in send_request(method, url, data, params, headers, timeout, verify, allow_redirects)
119 allow_redirects=allow_redirects)
120 if not response.status_code == requests.codes.ok:
--> 121 response.raise_for_status()
122 except requests.exceptions.RequestException as e:
123 log.error(e)
~\Anaconda3\lib\site-packages\requests\models.py in raise_for_status(self)
938
939 if http_error_msg:
--> 940 raise HTTPError(http_error_msg, response=self)
941
942 def close(self):
HTTPError: 502 Server Error: Bad Gateway for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us
Additional context
I think the problem is due to the gl
parameter, might also be related to #21.
Describe the bug
Getting similar apps is broken: it does not return any apps. This also fails in the unit tests with AssertionError: 0 not greater than 0
To Reproduce
Easiest way is to run the unit tests.
Expected behavior
The tests do not fail and similar apps are returned when using the method.
Does anyone know how to fix it?
Describe the bug
Scraper crashes due to outdated class name. This issue causes an Exception to be raised later in the parsing process. This exception is not caught and crashes the scraper.
Traceback (most recent call last):
File "", line 1, in
File "site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "site-packages/play_scraper/scraper.py", line 83, in details
app_json = parse_app_details(soup)
File "site-packages/play_scraper/utils.py", line 312, in parse_app_details
soup.select_one('.xyOfqd'))
File "site-packages/play_scraper/utils.py", line 138, in parse_additional_info
section_titles_divs = [x for x in soup.select('div.hAyfc div.BgcNfc')]
AttributeError: 'NoneType' object has no attribute 'select'
To Reproduce
Request details of any package name.
Additional context
While obtaining details about an app today (2019-03-26), I noticed that the tool started crashing in utils.py, line 312 (... soup.select_one('.xyOfqd')))
Describe the bug
Attempting to run the example described in the README:
print(play_scraper.details('com.android.chrome'))
raises an AttributeError.
To Reproduce
import play_scraper
print(play_scraper.details('com.android.chrome'))
Additional context
Seems related to Issue #2
Even after changing
icon = (soup.select_one('.dQrBL img.ujDFqe')
to
icon = (soup.select_one('.XSyT2c img.T75of')
as suggested in Issue #42, the error was still not resolved.
play-scraper/play_scraper/api.py
Line 25 in 6059c47
I am getting a RecursionError: maximum recursion depth exceeded error. I think if you change the variable name to something else, that will fix the issue. Thanks.
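The RecursionError is consistent with a name shadowing itself. A hypothetical minimal reproduction (the function name and argument are illustrative, not the actual api.py code):

```python
# If a wrapper function shares its name with the call it means to delegate to,
# the name lookup resolves back to the wrapper itself and the recursion never
# terminates; renaming either side breaks the cycle.

def details(app_id):
    # BUG: this resolves to *this* function, not an underlying implementation.
    return details(app_id)

caught = None
try:
    details('com.example.app')
except RecursionError as e:
    caught = type(e).__name__

print(caught)  # caught once the default recursion limit is exceeded
```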
Hello,
trying to install the package:
pip install play-scraper --install-option="--prefix=/airflow"
is causing this error:
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-3jPoyb/play-scraper/setup.py", line 10, in
with open('README.md', 'r', 'utf-8') as f:
File "/usr/lib64/python2.7/codecs.py", line 881, in open
file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'README.md'
How can we solve this?
P.S. installing without the option works fine
thanks,
Lorenzo
Hi.
When I try the search function the results are always the same:
pprint(p.search('tinder', page=1))
pprint(p.search('tinder', page=12))
Both calls give me the same results.
Like https://github.com/facundoolano/google-play-scraper/blob/dev/lib/permissions.js
Maybe I'll do it in late October 🤷♂️
Describe the bug
Tests are failing. There are 404 and 405 errors which seem to result in assertion errors.
To Reproduce
$ python3 setup.py test
Expected behavior
Passing of all tests.
Additional context
Executing(%check): /bin/sh -e /var/tmp/rpm-tmp.HGuyQ3
+ umask 022
+ cd /home/fab/rpmbuild/BUILD
+ cd play-scraper-0.6.0
+ /usr/bin/python3 setup.py test
running test
Searching for requests-futures>=0.9.7
Reading https://pypi.org/simple/requests-futures/
Downloading https://files.pythonhosted.org/packages/47/c4/fd48d1ac5110a5457c71ac7cc4caa93da10a80b8de71112430e439bdee22/requests-futures-1.0.0.tar.gz#sha256=35547502bf1958044716a03a2f47092a89efe8f9789ab0c4c528d9c9c30bc148
Best match: requests-futures 1.0.0
Processing requests-futures-1.0.0.tar.gz
Writing /tmp/easy_install-87wce9d5/requests-futures-1.0.0/setup.cfg
Running requests-futures-1.0.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-87wce9d5/requests-futures-1.0.0/egg-dist-tmp-kjhu22lj
creating /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs/requests_futures-1.0.0-py3.7.egg
Extracting requests_futures-1.0.0-py3.7.egg to /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs
Installed /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs/requests_futures-1.0.0-py3.7.egg
running egg_info
writing play_scraper.egg-info/PKG-INFO
writing dependency_links to play_scraper.egg-info/dependency_links.txt
writing requirements to play_scraper.egg-info/requires.txt
writing top-level names to play_scraper.egg-info/top_level.txt
reading manifest file 'play_scraper.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'play_scraper.egg-info/SOURCES.txt'
running build_ext
test_categories_ok (tests.test_scraper.CategoryTest) ... ok
test_different_language_and_country (tests.test_scraper.CategoryTest) ... ok
test_default_num_results (tests.test_scraper.CollectionTest) ... ERROR
test_detailed_collection (tests.test_scraper.CollectionTest) ... FAIL
test_detailed_collection_different_language (tests.test_scraper.CollectionTest) ... FAIL
test_family_with_age_collection (tests.test_scraper.CollectionTest) ... ERROR
test_invalid_category_id (tests.test_scraper.CollectionTest) ... ok
test_invalid_collection_id (tests.test_scraper.CollectionTest) ... ok
test_invalid_num_results_over_120 (tests.test_scraper.CollectionTest) ... ok
test_invalid_page_x_results_over_500 (tests.test_scraper.CollectionTest) ... ok
test_non_detailed_collection (tests.test_scraper.CollectionTest) ... FAIL
test_non_detailed_different_language_and_country (tests.test_scraper.CollectionTest) ... ERROR
test_promotion_collection_id (tests.test_scraper.CollectionTest) ... FAIL
test_app_with_no_developer (tests.test_scraper.DetailsTest) ... ERROR
test_fetching_app_in_spanish (tests.test_scraper.DetailsTest) ... ok
test_fetching_app_with_all_details (tests.test_scraper.DetailsTest) ... ok
test_developer_parameter_float_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_int_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_long_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_string_digits_invalid (tests.test_scraper.DeveloperTest) ... ok
test_different_language_and_country (tests.test_scraper.DeveloperTest) ... ERROR
test_fetch_developer_apps_detailed (tests.test_scraper.DeveloperTest) ... ERROR
test_fetching_developer_default_results (tests.test_scraper.DeveloperTest) ... ERROR
test_maximum_results (tests.test_scraper.DeveloperTest) ... ERROR
test_over_max_results_fetches_five (tests.test_scraper.DeveloperTest) ... ERROR
test_page_out_of_range (tests.test_scraper.DeveloperTest) ... ok
test_init_with_defaults (tests.test_scraper.PlayScraperTest) ... ok
test_init_with_language_and_geolocation (tests.test_scraper.PlayScraperTest) ... ok
test_invalid_geolocation_code_raises (tests.test_scraper.PlayScraperTest) ... ok
test_invalid_language_code_raises (tests.test_scraper.PlayScraperTest) ... ok
test_basic_search (tests.test_scraper.SearchTest) ... ok
test_different_language_and_country (tests.test_scraper.SearchTest) ... ok
test_page_out_of_range_not_between_0_and_12 (tests.test_scraper.SearchTest) ... ok
test_search_with_app_detailed (tests.test_scraper.SearchTest) ... /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47640), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=9, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50764), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50760), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47632), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47626), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47628), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50768), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47624), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47630), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=11, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47636), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_different_language_and_country (tests.test_scraper.SimilarTest) ... ok
test_similar_ok (tests.test_scraper.SimilarTest) ... ok
test_similar_with_app_detailed (tests.test_scraper.SimilarTest) ... /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47664), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47680), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47666), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=9, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50796), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50794), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50798), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47662), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=11, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47678), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47674), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50802), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_different_language_and_country (tests.test_scraper.SuggestionTest) ... ok
test_empty_query (tests.test_scraper.SuggestionTest) ... ok
test_query_suggestions (tests.test_scraper.SuggestionTest) ... ok
test_list_url_both_args (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_no_args (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_only_category (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_only_collection (tests.test_utils.TestBuildListUrl) ... ok
test_building_app_url (tests.test_utils.TestBuildUrl) ... ok
test_building_multiple_word_dev_name (tests.test_utils.TestBuildUrl) ... ok
test_building_simple_dev_name (tests.test_utils.TestBuildUrl) ... ok
test_default_post_data (tests.test_utils.TestGeneratePostData) ... ok
test_first_page_data (tests.test_utils.TestGeneratePostData) ... ok
test_only_num_results (tests.test_utils.TestGeneratePostData) ... ok
test_page_token (tests.test_utils.TestGeneratePostData) ... ok
test_request_with_params (tests.test_utils.TestSendRequest) ... ok
test_send_normal_request (tests.test_utils.TestSendRequest) ... ok
======================================================================
ERROR: test_default_num_results (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 223, in test_default_num_results
self.assertTrue(all(key in apps[0] for key in BASIC_KEYS))
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 223, in
self.assertTrue(all(key in apps[0] for key in BASIC_KEYS))
IndexError: list index out of range
======================================================================
ERROR: test_family_with_age_collection (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 231, in test_family_with_age_collection
age='SIX_EIGHT')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 132, in collection
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/FAMILY/collection/topselling_free?hl=en&gl=us&age=AGE_RANGE2
======================================================================
ERROR: test_non_detailed_different_language_and_country (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 191, in test_non_detailed_different_language_and_country
apps = s.collection('TOP_PAID', 'LIFESTYLE', results=5)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 132, in collection
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/LIFESTYLE/collection/topselling_paid?hl=da&gl=dk
======================================================================
ERROR: test_app_with_no_developer (tests.test_scraper.DetailsTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 82, in details
response = send_request('GET', url, params=self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=org.selfie.beauty.camera.pro&hl=en&gl=us
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 156, in test_app_with_no_developer
app_data = self.s.details('org.selfie.beauty.camera.pro')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 86, in details
app=app_id, error=e))
ValueError: Invalid application ID: org.selfie.beauty.camera.pro. 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=org.selfie.beauty.camera.pro&hl=en&gl=us
======================================================================
ERROR: test_different_language_and_country (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 302, in test_different_language_and_country
apps = s.developer('Google LLC', results=5)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=da&gl=dk
======================================================================
ERROR: test_fetch_developer_apps_detailed (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 294, in test_fetch_developer_apps_detailed
apps = self.s.developer('Disney', results=3, detailed=True)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us
======================================================================
ERROR: test_fetching_developer_default_results (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 272, in test_fetching_developer_default_results
apps = self.s.developer('Disney')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us
======================================================================
ERROR: test_maximum_results (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 280, in test_maximum_results
apps = self.s.developer('Google LLC', results=120)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=en&gl=us
======================================================================
ERROR: test_over_max_results_fetches_five (tests.test_scraper.DeveloperTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 287, in test_over_max_results_fetches_five
apps = self.s.developer('Google LLC', results=121)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=en&gl=us
======================================================================
FAIL: test_detailed_collection (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 200, in test_detailed_collection
self.assertEqual(1, len(apps))
AssertionError: 1 != 0
======================================================================
FAIL: test_detailed_collection_different_language (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 210, in test_detailed_collection_different_language
self.assertEqual(1, len(apps))
AssertionError: 1 != 0
======================================================================
FAIL: test_non_detailed_collection (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 178, in test_non_detailed_collection
self.assertEqual(2, len(apps))
AssertionError: 2 != 0
======================================================================
FAIL: test_promotion_collection_id (tests.test_scraper.CollectionTest)
Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 241, in test_promotion_collection_id
self.assertEqual(2, len(apps))
AssertionError: 2 != 0
Ran 53 tests in 37.547s
FAILED (failures=4, errors=9)
Test failed: <unittest.runner.TextTestResult run=53 errors=9 failures=4>
error: Test failed: <unittest.runner.TextTestResult run=53 errors=9 failures=4>
error: Bad exit status from /var/tmp/rpm-tmp.HGuyQ3 (%check)
Describe the bug
details always returns None for description and description_html.
{'title': 'WiFi Map — Free Passwords & Hotspots', 'icon': 'https://lh3.googleusercontent.com/SnhTxeJgxWHS7AlwEl_QNa2lpwCNFsL3Dqrzl-jMqvwFzYMIZW5V3IHUWFU3bKe6N8Kg', 'screenshots': [], 'video': 'https://www.youtube.com/embed/yl95ZgDTYp0', 'category': ['TRAVEL_AND_LOCAL'], 'score': '4.3', 'histogram': {5: None, 4: None, 3: None, 2: None, 1: None}, 'reviews': 700308, 'description': None, 'description_html': None, 'recent_changes': None, 'editors_choice': False, 'price': '0', 'free': True, 'iap': True, 'developer_id': '8565181914239008089', 'updated': 'April 20, 2019', 'size': '38M', 'installs': '50,000,000+', 'current_version': '4.1.17', 'required_android_version': '4.3 and up', 'content_rating': ['Everyone'], 'iap_range': ('$0.99', '$19.99'), 'interactive_elements': ['Users Interact, Digital Purchases'], 'developer': 'WiFi Map LLC', 'developer_email': '[email protected]', 'developer_url': 'https://www.wifimap.io/', 'developer_address': '25 Broadway, 9th Floor\nNew York, NY 10004', 'app_id': 'io.wifimap.wifimap', 'url': 'https://play.google.com/store/apps/details?id=io.wifimap.wifimap'}
To Reproduce
>>> import play_scraper
>>> play_scraper.details('io.wifimap.wifimap')
Expected behavior
description and description_html should be fetched correctly.
Screenshots
N/A
Desktop (please complete the following information):
Hello @danieliu,
First, thank you for that great module.
Then, could you replace the == in requirements.txt with >=? Unless your module really needs those exact versions, I think it can be compatible with newer ones.
Cheers!
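For illustration, the requested change would look like this (the package name and version numbers here are hypothetical, not the actual contents of requirements.txt):

```
# pinned to an exact version (current)
requests==2.21.0
# relaxed to a minimum version (requested)
requests>=2.21.0
```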
The categories "APP_WALLPAPER", "TRANSPORTATION", "MEDIA_AND_VIDEO", and "APP_WIDGETS" have no collections defined for them, as their URLs do not exist in the Google Play Store.
Error: HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/APP_WALLPAPER/collection/movers_shakers
Describe the bug
Google probably made an update yesterday that caused the details method to stop working.
play_scraper/utils.py", line 240, in parse_app_details\n icon = (soup.select_one('.dQrBL img.ujDFqe')\nAttributeError: 'NoneType' object has no attribute 'attrs''
To Reproduce
Call the details method with any package name.
Desktop (please complete the following information):
Tried running a call in the Python shell and got RecursionError: maximum recursion depth exceeded while calling a Python object. See the trace below:
>>> print(play_scraper.details('com.android.chrome'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/scraper.py", line 292, in details
response = send_request('GET', url)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/utils.py", line 120, in send_request
verify=verify)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
conn.connect()
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
cert_reqs=resolve_cert_reqs(self.cert_reqs),
File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 269, in create_urllib3_context
context.options |= options
File "/usr/lib/python3.6/ssl.py", line 465, in options
super(SSLContext, SSLContext).options.__set__(self, value)
File "/usr/lib/python3.6/ssl.py", line 465, in options
super(SSLContext, SSLContext).options.__set__(self, value)
File "/usr/lib/python3.6/ssl.py", line 465, in options
super(SSLContext, SSLContext).options.__set__(self, value)
[Previous line repeated 323 more times]
RecursionError: maximum recursion depth exceeded while calling a Python object
Hi, thanks for this cool scraper.
I just want to mention that the output data is not in clean JSON format; when I used Python's json parsing library, it raised an error.
Thanks!
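A minimal workaround, assuming the problem is that some values in the returned dict are not JSON primitives: let json stringify anything it cannot serialize (to_json is an illustrative helper, not part of play-scraper):

```python
import json

def to_json(details):
    # default=str converts any non-serializable value (e.g. a parsed
    # HTML node) to its string representation instead of raising.
    return json.dumps(details, default=str)
```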
PlayScraper.categories() returns an empty result — Google has again changed the class attributes of the anchor elements.
PR submitted
Passing in a different hl changes the alt attributes and the titles in the additional details section.
Selectors should be generalized to use obfuscated classes or other attributes that do not change between languages.
The additional details section HTML isn't specific enough for a clear way to differentiate between subsections and correctly parse the data out, unfortunately.
Traceback (most recent call last):
File "test_desc.py", line 2, in <module>
print (play_scraper.details('com.z2p.devops.mathpuzzel'))
File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 341, in details
app_json = self._parse_app_details(soup)
File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 254, in _parse_app_details
soup.select_one('.xyOfqd'))
File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 168, in _parse_additional_info
'developer_address': developer_address.strip()}
AttributeError: 'NoneType' object has no attribute 'strip'
Describe the bug
In rare situations, an app will be listed as a result in the search function while the app has actually been (temporarily) removed from the Play Store. When using the detailed=True argument, the package throws an error once the missing app is scraped, as it tries to access the actual app page.
To Reproduce
Steps to reproduce the behavior, e.g. the full example code, not just a snippet of where the error occurs!
>>> print(play_scraper.search('CAUTI', gl='nl', detailed='True', page=6))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/api.py", line 79, in search
return s.search(query, page, detailed)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/scraper.py", line 224, in search
apps = self._parse_multiple_apps(response)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/scraper.py", line 71, in _parse_multiple_apps
return multi_futures_app_request(app_ids, params=self.params)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 531, in multi_futures_app_request
result = response.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "(blahblah)/.env/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "(blahblah)/.env/lib/python3.6/site-packages/requests/sessions.py", line 653, in send
r = dispatch_hook('response', hooks, r, **kwargs)
File "(blahblah)/.env/lib/python3.6/site-packages/requests/hooks.py", line 31, in dispatch_hook
_hook_data = hook(hook_data, **kwargs)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 504, in parse_app_details_response_hook
details = parse_app_details(soup)
File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 239, in parse_app_details
title = soup.select_one('h1[itemprop="name"] span').text
AttributeError: 'NoneType' object has no attribute 'text'
In the original use case (a function that iterated over the pages using Celery), the following error was thrown as well:
[2019-10-28 10:48:41,362: ERROR/ForkPoolWorker-1] Error occurred fetching uk.incrediblesoftware.mpcmachine.demo: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=uk.incrediblesoftware.mpcmachine.demo&hl=en&gl=nl&q=CAUTI&c=apps
From this I tried to check the actual Play Store page for uk.incrediblesoftware.mpcmachine.demo, which, as expected, throws an HTTP 404 error.
Expected behavior
I hoped the package would log the 404 error, skip that app, and still return the remaining results. I can catch errors in my own code to prevent problems, but that way an entire page of apps is still excluded from the results.
Desktop (please complete the following information):
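The desired skip-and-continue behavior can be sketched like this; collect_details and fetch_details are hypothetical names (fetch_details stands in for play-scraper's per-app request), not the library's API:

```python
def collect_details(app_ids, fetch_details):
    """Fetch details per app id; log and skip failures (e.g. HTTP 404
    for apps removed from the store) instead of aborting the page."""
    results = []
    for app_id in app_ids:
        try:
            results.append(fetch_details(app_id))
        except Exception as exc:
            print('Error occurred fetching %s: %s' % (app_id, exc))
    return results
```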
Is your feature request related to a problem? Please describe.
play-scraper does not support proxies, so a program that uses play-scraper from behind a proxy will fail.
Describe the solution you'd like
Proxy support in play-scraper.
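play-scraper builds on requests, and requests accepts a proxies mapping on every call (requests.request(method, url, proxies=...)). A minimal sketch of the mapping the requested feature would need to pass through; build_proxies is an illustrative helper and the proxy address is a placeholder:

```python
def build_proxies(proxy_url):
    """Route both http and https traffic through a single proxy URL,
    in the mapping format that requests expects."""
    return {'http': proxy_url, 'https': proxy_url}
```

The mapping could then be threaded through play-scraper's send_request into the underlying requests call.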
Describe the bug
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/GAME_RACING/collection/movers_shakers?gl=us&hl=en
To Reproduce
import play_scraper
print(play_scraper.collection(
    collection='TRENDING',
    category='GAME_RACING',
    results=5,
    page=1))
Expected behavior
A list of trending games.
Desktop (please complete the following information):
search returns nothing
Items in the resulting dict are BeautifulSoup entities, not the primitives a user may expect after looking at the examples.
I'm using this lib with multiprocessing, and I found out that the results cannot be pickled by multiprocessing's map (RecursionError). I propose converting the values in the resulting dict into true primitives.
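A minimal sketch of the proposed conversion, assuming the offending values are BeautifulSoup nodes (or any other non-primitive objects); to_primitives is an illustrative helper, not part of the library:

```python
PRIMITIVES = (str, int, float, bool, type(None))

def to_primitives(value):
    """Recursively coerce non-primitive values to str so the result
    can be pickled by multiprocessing (and serialized cleanly)."""
    if isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, dict):
        return {k: to_primitives(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_primitives(v) for v in value]
    return str(value)
```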
Hi,
this is already a pretty nice package, but one addition would make it even better: an option for collecting information about users' reviews.
There is a lot that could be done with reviewer information, for instance constructing relations between apps and their users (reviewers), or examining whether a fixed core of users produces a lot of positive or negative reviews for some content.
I would like to know if I could extract all reviews of an app.
Thanks
Hi,
I already had contact with daniel and he told me to put this issue here.
I am using:
The rest seems to work fine, at least for the developer() method.
Code:
app_id = "com.igg.android.lordsmobile"
details = play_scraper.details(app_id)
print(details["developer"])
It keeps returning Google Commerce Ltd for every app_id I put in.
"Google Commerce Ltd" can be found at the bottom of each app page:
Offered By
Google Commerce Ltd
Maybe this has something to do with it?
Hi!
The library is great so far and it is helping me a lot in one of my projects. I just have one question: in the search function there is now a limit of 12 pages. I have noticed that this is related to the PAGE_TOKENS in the settings. My question is: how can it be raised to retrieve an arbitrary number of page results?
Thanks!
I tried running the sample code from the readme:
import play_scraper
print(play_scraper.details('com.android.chrome'))
I get the following output:
Traceback (most recent call last):
File "bug.py", line 2, in <module>
print(play_scraper.details('com.android.chrome'))
File "/usr/local/lib/python3.7/site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "/usr/local/lib/python3.7/site-packages/play_scraper/scraper.py", line 83, in details
app_json = parse_app_details(soup)
File "/usr/local/lib/python3.7/site-packages/play_scraper/utils.py", line 312, in parse_app_details
soup.select_one('.xyOfqd'))
File "/usr/local/lib/python3.7/site-packages/play_scraper/utils.py", line 138, in parse_additional_info
section_titles_divs = [x for x in soup.select('div.hAyfc div.BgcNfc')]
AttributeError: 'NoneType' object has no attribute 'select'
Desktop (please complete the following information):
Hi,
This is what i running:
import play_scraper
import csv
import json
data=( play_scraper.search("Disney", page=1, detailed=True))
print(data.count(data))
The same happens when running:
print play_scraper.details('com.android.chrome')
Getting errors:
data=( play_scraper.search("Disney", page=1, detailed=True))
File "C:\Python27\lib\site-packages\play_scraper\api.py", line 79, in search
return s.search(query, **kwargs)
File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 415, in search
apps = self._parse_multiple_apps(response)
File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 273, in _parse_multiple_apps
apps.append(self._parse_app_details(soup))
File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 155, in _parse_app_details
updated = additional_info.select_one('div[itemprop="datePublished"]')
AttributeError: 'NoneType' object has no attribute 'select_one'
Thanks,
Really, it is a great post. Thank you so much.
As I am very new to Python, I am unable to save the output to a CSV file for filtering based on the number of installations.
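A minimal sketch of saving the output to CSV, sorted by installs; the field names are a subset of the details dicts shown in these issues, and save_to_csv/installs_count are illustrative helpers:

```python
import csv

def installs_count(app):
    # e.g. '50,000,000+' -> 50000000
    raw = app.get('installs') or '0'
    return int(raw.rstrip('+').replace(',', '') or 0)

def save_to_csv(apps, path, fields=('app_id', 'title', 'installs')):
    """Write one row per app, largest install count first."""
    apps = sorted(apps, key=installs_count, reverse=True)
    with open(path, 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=list(fields),
                                extrasaction='ignore')
        writer.writeheader()
        writer.writerows(apps)
```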
Edit: not a problem with the library; the webpage itself is wrong when using a non-default gl, so I guess there is nothing you can do about it (by the way, thanks for this very useful library). Workaround I used: going through the list of scraped apps a second time with gl='us'.
When using play_scraper.collection() with gl != 'us' (e.g. gl='fr') and detailed=True, the developer field is always 'Google Commerce Ltd'.
The developer field is correct when using the default gl.
Describe the bug
play_scraper/constants.py — any gl= value must match the GL_COUNTRY_CODES list, but this list does not match Google's own country code list at https://developers.google.com/public-data/docs/canonical/countries_csv, which leads to 404 errors for countries listed in GL_COUNTRY_CODES and to valid country codes being rejected.
To Reproduce
Run any play_scraper call with gl="kp" (valid according to play-scraper, not valid according to Google):
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/developer?id=&gl=kp&hl=en
Running play_scraper with gl='im' results in an error from play-scraper:
ValueError: im is not a valid geolocation country code.
However, https://play.google.com/store/apps/developer?id=&gl=im returns valid content from Google.
Expected behavior
A better-maintained GL_COUNTRY_CODES list, or an option to skip the internal validation.
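The override idea can be sketched as follows; check_gl and VALID_GL are illustrative names (play-scraper's actual validation lives in constants.py via GL_COUNTRY_CODES), not the library's API:

```python
VALID_GL = {'us', 'fr', 'im'}  # hypothetical maintained set of codes

def check_gl(gl, strict=True):
    """Return gl if it is known, or if strict validation is disabled,
    letting Google itself reject genuinely invalid codes."""
    if gl in VALID_GL or not strict:
        return gl
    raise ValueError('%s is not a valid geolocation country code.' % gl)
```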
When I invoke python3 -m pip install play_scraper, it says:
command "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3 -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-install-cj268d40/lxml/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/tmp/pip-record-mzwfg42a/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-install-cj268d40/lxml/
import play_scraper
print play_scraper.details('com.android.chrome')
File "<ipython-input-8-60b6c1359646>", line 2
print play_scraper.details('com.android.chrome')
^
SyntaxError: invalid syntax
Running on macOS with Python:
3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
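This SyntaxError is Python 2 print-statement syntax being run under Python 3, where print is a function and needs parentheses. A quick demonstration using the standard library ast module:

```python
import ast

py2_style = "print play_scraper.details('com.android.chrome')"
py3_style = "print(play_scraper.details('com.android.chrome'))"

def is_valid_py3(source):
    """Return True if the snippet parses as Python 3 syntax."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```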
Describe the bug
While requesting the NEW_FREE collection, I get an IndexError exception:
File ".../python2.7/site-packages/play_scraper/api.py", line 41, in collection
return s.collection(collection, category, **kwargs)
File ".../python2.7/site-packages/play_scraper/scraper.py", line 134, in collection
for app_card in soup.select('div[data-uitype="500"]')]
File ".../python2.7/site-packages/play_scraper/utils.py", line 358, in parse_card_info
developer_id = dev_soup.attrs['href'].split('=')[1]
IndexError: list index out of range
To Reproduce
Run play_scraper.collection(collection='NEW_FREE', results=100, page=0, gl='us')
Expected behavior
Should receive a list of app metadata for the specified chart in Google Play.
Screenshots
Not applicable
Desktop (please complete the following information):
Additional context
It seems like the scraper is getting the wrong response from Google Play and cannot parse it correctly.
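The IndexError comes from dev_soup.attrs['href'].split('=')[1] assuming the href always contains an '='. A hedged sketch of a more defensive parse (parse_developer_id is an illustrative helper, not the library's API):

```python
def parse_developer_id(href):
    """Return the part after the first '=', or None when the href
    format changes and no '=' is present (avoiding the IndexError)."""
    parts = href.split('=', 1)
    return parts[1] if len(parts) == 2 else None
```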
Describe the bug
The screenshots attribute is always an empty array, and description and description_html are always None.
To Reproduce
location = "kr"
lang="ko"
ajson = play_scraper.details("com.dena.a12026418",gl=location,hl=lang)
print(ajson)
Returned:
{'title': 'Pokémon Masters', 'icon': 'https://lh3.googleusercontent.com/Qow956nxep_gy5lWMRXd7hTX-SUE-m8Un4etpm6o1A3AAjFvesAq-YyM1Fy9qjr1uZBe', 'screenshots': [], 'video': 'https://www.youtube.com/embed/FV2ISpwZRck', 'category': ['GAME_ROLE_PLAYING'], 'score': '3.8', 'histogram': {}, 'reviews': 0, 'description': None, 'description_html': None, 'recent_changes': None, 'editors_choice': False, 'price': '0', 'free': True, 'iap': False, 'developer_id': '5614074995304947897', 'updated': None, 'size': None, 'installs': None, 'current_version': None, 'required_android_version': None, 'content_rating': None, 'iap_range': None, 'interactive_elements': None, 'developer': None, 'developer_email': None, 'developer_url': None, 'developer_address': None, 'app_id': 'com.dena.a12026418', 'url': 'https://play.google.com/store/apps/details?id=com.dena.a12026418'}
On this page, the screenshots and description HTML are present:
https://play.google.com/store/apps/details?id=com.dena.a12026418&hl=ko&gl=kr
Expected behavior
The screenshots array, description, and description_html should be populated.
Desktop (please complete the following information):
Describe the bug
The scraper crashes because a CSS selector is not found.
To Reproduce
play_scraper.details('whatever_id')
'NoneType' object has no attribute 'attrs'
Expected behavior
The scraper should not crash when a selector is not found.
Desktop
The issue is in utils.py, line 241:
soup.select_one('.dQrBL img.ujDFqe')
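A hedged sketch of the no-crash behavior requested: guard the select_one() result before touching .attrs (safe_attr is an illustrative helper; the selector and attribute names mirror the report):

```python
def safe_attr(soup, selector, attr, default=None):
    """Return the attribute of the first matching node, or default
    when the selector matches nothing or the attribute is absent."""
    node = soup.select_one(selector)
    if node is None:
        return default
    return node.attrs.get(attr, default)
```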
Describe the bug
What's New ("recent_changes") returns None even though it exists.
To Reproduce
import play_scraper as ps
data = ps.details('com.supercell.clashofclans')
print(data['recent_changes'])
Expected behavior
Should return the What's New value when it exists.
Screenshots
Not required
Desktop (please complete the following information):
Additional context
The issue can be resolved by changing the code to:
recent_changes = changes_soup.text
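Combining the reporter's fix with a guard for apps that have no What's New section (parse_recent_changes is an illustrative wrapper; changes_soup mirrors the name in the report):

```python
def parse_recent_changes(changes_soup):
    # .text only when the section was actually found; None otherwise.
    return changes_soup.text if changes_soup is not None else None
```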
com.Rain.Teslagrad
for egs in gPlayBilgi['screenshots']:
    print("Screen:", egs)
Output:
data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==
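The returned value is a 1x1 base64 GIF data URI, which looks like a lazy-loading placeholder rather than a real screenshot URL. A hedged sketch of filtering such placeholders out (real_screenshots is an illustrative helper):

```python
def real_screenshots(urls):
    """Drop inline data: URIs (lazy-load placeholders), keeping only
    real screenshot URLs."""
    return [u for u in urls if not u.startswith('data:image/')]
```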
I got this error after installing play_scraper and trying to import it.
Please help, thanks.
[root@CT114 ~]# python
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import play_scraper
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/play_scraper/__init__.py", line 13, in <module>
from play_scraper.api import (
File "/usr/lib/python2.7/site-packages/play_scraper/api.py", line 11, in <module>
from play_scraper import scraper
File "/usr/lib/python2.7/site-packages/play_scraper/scraper.py", line 15, in <module>
from bs4 import BeautifulSoup, SoupStrainer
File "/usr/lib/python2.7/site-packages/bs4/__init__.py", line 30, in <module>
from .builder import builder_registry, ParserRejectedMarkup
File "/usr/lib/python2.7/site-packages/bs4/builder/__init__.py", line 314, in <module>
from . import _html5lib
File "/usr/lib/python2.7/site-packages/bs4/builder/_html5lib.py", line 70, in <module>
class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: 'module' object has no attribute '_base'