My code:
from scrape_amazon import get_reviews
reviews = get_reviews('in','B09RMG1M98')
Error log:
[INFO] Scraping Reviews of Amazon ProductID - B09RMG1M98
[scrape-amazon] - Amazon.in:Customer reviews: realme narzo 50 (Speed Black, 4GB RAM+64GB Storage) Helio G96 Processor | 50MP AI Triple Camera | 120Hz Ultra Smooth Display
[scrape-amazon] Total Pages - 78
[scrape-amazon] Total Reviews - 773
71%|██████████████████████████████████████████████████ | 55/78 [02:21<00:58, 2.56s/it]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/connection.py", line 175, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 710, in urlopen
chunked=chunked,
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
conn.connect()
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/connection.py", line 358, in connect
self.sock = conn = self._new_conn()
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/connection.py", line 187, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f7c8b9c1780>: Failed to establish a new connection: [Errno 101] Network is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/requests/adapters.py", line 450, in send
timeout=timeout
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/connectionpool.py", line 788, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.amazon.in', port=443): Max retries exceeded with url: /dp/product-reviews/B09RMG1M98?pageNumber=56 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f7c8b9c1780>: Failed to establish a new connection: [Errno 101] Network is unreachable',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/scrape_amazon/util/scrape.py", line 27, in extractPage
r = get_URL(url)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/scrape_amazon/util/urlFunctions.py", line 30, in get_URL
content: str = requests.get(url, headers={"User-Agent": user_agent})
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.amazon.in', port=443): Max retries exceeded with url: /dp/product-reviews/B09RMG1M98?pageNumber=56 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f7c8b9c1780>: Failed to establish a new connection: [Errno 101] Network is unreachable',))
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/tops/tp/scrapping.py", line 3, in <module>
reviews = get_reviews('in','B09RMG1M98')
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/scrape_amazon/scraper.py", line 17, in get_reviews
return scrape_reviews(all_reviews_url, domain)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/scrape_amazon/util/scrape.py", line 132, in scrape_reviews
results = p_map(extractPage, urlsToFetch)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/p_tqdm/p_tqdm.py", line 65, in p_map
result = list(generator)
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/p_tqdm/p_tqdm.py", line 54, in _parallel
for item in tqdm_func(map_func(function, *iterables), total=length, **kwargs):
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/tops/environments/tp_env/lib/python3.6/site-packages/multiprocess/pool.py", line 735, in next
raise value
requests.exceptions.ConnectionError: None: Max retries exceeded with url: /dp/product-reviews/B09RMG1M98?pageNumber=56 (Caused by None)
My guess is that the requests are being blocked after a certain number of consecutive attempts (the scrape fails at page 56 of 78). Please let me know if there is a solution.
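One workaround that might help in the meantime is to retry the call with an increasing delay between attempts. The following is only a minimal sketch, not something the library provides: `retry_with_backoff` and its parameters are hypothetical names. It assumes the failure is transient and that the exception reaching the caller is an `OSError` subclass, which holds for `requests.exceptions.ConnectionError`. Note that `[Errno 101] Network is unreachable` is raised at connect time, before any HTTP response arrives, so the cause may also be a local connectivity drop rather than a block.

```python
import time


def retry_with_backoff(func, attempts=5, base_delay=2.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call func(); if it raises one of retry_on, wait and try again.

    The wait doubles after each failed attempt: base_delay, 2*base_delay, ...
    Once all attempts are used up, the last exception is re-raised.
    The sleep parameter exists only so the delay can be stubbed out in tests.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))


# Hypothetical usage with the call from this report (left commented out here):
# from scrape_amazon import get_reviews
# reviews = retry_with_backoff(lambda: get_reviews('in', 'B09RMG1M98'))
```

Because the error propagates out of `get_reviews` itself, each retry in this sketch restarts the whole scrape from page 1 instead of resuming at the failed page. Retrying per page would have to happen inside the library.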