Sorry, I cannot reproduce.
$ scrapy version -v
Scrapy : 2.11.2
lxml : 4.9.3.0
libxml2 : 2.10.3
cssselect : 1.2.0
parsel : 1.8.1
w3lib : 2.1.2
Twisted : 24.3.0
Python : 3.10.10 (main, Feb 16 2023, 02:58:25) [Clang 14.0.0 (clang-1400.0.29.202)]
pyOpenSSL : 23.2.0 (OpenSSL 3.1.2 1 Aug 2023)
cryptography : 41.0.3
Platform : macOS-14.4.1-x86_64-i386-64bit
$ python -c "import scrapy_playwright; print(scrapy_playwright.__version__)"
0.0.35
import scrapy


def should_abort_request(request):
    # Abort image requests, plus any request whose URL contains ".jpg"
    return request.resource_type == "image" or ".jpg" in request.url


class ExampleSpider(scrapy.Spider):
    name = "example"
    custom_settings = {
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "PLAYWRIGHT_BROWSER_TYPE": "webkit",
        # Predicate called for each Playwright request; True means abort
        "PLAYWRIGHT_ABORT_REQUEST": should_abort_request,
    }

    def start_requests(self):
        yield scrapy.Request(
            url="https://books.toscrape.com",
            meta={
                "playwright": True,
                "playwright_page_goto_kwargs": {"wait_until": "networkidle"},
            },
        )

    def parse(self, response):
        yield {"url": response.url}
(...)
2024-06-03 22:17:29 [scrapy.core.engine] INFO: Spider opened
2024-06-03 22:17:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-06-03 22:17:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-06-03 22:17:29 [scrapy-playwright] INFO: Starting download handler
2024-06-03 22:17:34 [scrapy-playwright] INFO: Launching browser webkit
2024-06-03 22:17:34 [scrapy-playwright] INFO: Browser webkit launched
2024-06-03 22:17:35 [scrapy-playwright] DEBUG: Browser context started: 'default' (persistent=False, remote=False)
2024-06-03 22:17:35 [scrapy-playwright] DEBUG: [Context=default] New page created, page count is 1 (1 for all contexts)
2024-06-03 22:17:35 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/> (resource type: document)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/css/styles.css> (resource type: stylesheet, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.css> (resource type: stylesheet, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/css/datetimepicker.css> (resource type: stylesheet, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/58/46/5846057e28022268153beff6d352b06c.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/58/46/5846057e28022268153beff6d352b06c.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/09/a3/09a3aef48557576e1a85ba7efea8ecb7.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg> (resource type: image, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/bootstrap3/bootstrap.min.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/oscar/ui.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/locales/bootstrap-datetimepicker.all.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/css/styles.css>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/css/datetimepicker.css>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.css>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/js/jquery/jquery-1.9.1.min.js> (resource type: script, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/09/a3/09a3aef48557576e1a85ba7efea8ecb7.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Request: <GET https://books.toscrape.com/static/oscar/fonts/fontawesome-webfont.woff%3Fv=3.2.1> (resource type: font, referrer: https://books.toscrape.com/)
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Aborted Playwright request <GET https://books.toscrape.com/media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/bootstrap3/bootstrap.min.js>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.js>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/oscar/ui.js>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/bootstrap-datetimepicker/locales/bootstrap-datetimepicker.all.js>
2024-06-03 22:17:36 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/fonts/fontawesome-webfont.woff%3Fv=3.2.1>
2024-06-03 22:17:37 [scrapy-playwright] DEBUG: [Context=default] Response: <200 https://books.toscrape.com/static/oscar/js/jquery/jquery-1.9.1.min.js>
2024-06-03 22:17:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com> (referer: None) ['playwright']
2024-06-03 22:17:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://books.toscrape.com/>
{'url': 'https://books.toscrape.com/'}
2024-06-03 22:17:37 [scrapy.core.engine] INFO: Closing spider (finished)
2024-06-03 22:17:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 219,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 51287,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 8.153309,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 6, 4, 1, 17, 37, 725425, tzinfo=datetime.timezone.utc),
'item_scraped_count': 1,
'log_count/DEBUG': 67,
'log_count/INFO': 13,
'log_count/WARNING': 1,
'memusage/max': 57114624,
'memusage/startup': 57110528,
'playwright/context_count': 1,
'playwright/context_count/max_concurrent': 1,
'playwright/context_count/persistent/False': 1,
'playwright/context_count/remote/False': 1,
'playwright/page_count': 1,
'playwright/page_count/closed': 1,
'playwright/page_count/max_concurrent': 1,
'playwright/request_count': 30,
'playwright/request_count/aborted': 20,
'playwright/request_count/method/GET': 30,
'playwright/request_count/navigation': 1,
'playwright/request_count/resource_type/document': 1,
'playwright/request_count/resource_type/font': 1,
'playwright/request_count/resource_type/image': 20,
'playwright/request_count/resource_type/script': 5,
'playwright/request_count/resource_type/stylesheet': 3,
'playwright/response_count': 10,
'playwright/response_count/method/GET': 10,
'playwright/response_count/resource_type/document': 1,
'playwright/response_count/resource_type/font': 1,
'playwright/response_count/resource_type/script': 5,
'playwright/response_count/resource_type/stylesheet': 3,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2024, 6, 4, 1, 17, 29, 572116, tzinfo=datetime.timezone.utc)}
2024-06-03 22:17:37 [scrapy.core.engine] INFO: Spider closed (finished)
2024-06-03 22:17:37 [scrapy-playwright] INFO: Closing download handler
2024-06-03 22:17:37 [scrapy-playwright] DEBUG: Browser context closed: 'default' (persistent=False, remote=False)
2024-06-03 22:17:37 [scrapy-playwright] INFO: Closing browser
Notice the "Aborted Playwright request" log lines and the 'playwright/request_count/aborted': 20 entry in the job stats, which matches the 'playwright/request_count/resource_type/image': 20 entry.
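If the abort lines do not show up on your side, one quick check is to call the predicate outside the browser. The following is only a minimal sketch: it reuses the should_abort_request function from the spider above and feeds it hypothetical stand-in objects (types.SimpleNamespace, not real Playwright requests) that carry only the two attributes the predicate reads. The URLs are taken from the log output above.

```python
from types import SimpleNamespace


def should_abort_request(request):
    # Same predicate as in the spider above.
    return request.resource_type == "image" or ".jpg" in request.url


# Hypothetical stand-ins for Playwright requests: they expose only
# `resource_type` and `url`, the two attributes the predicate reads.
image = SimpleNamespace(
    resource_type="image",
    url="https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg",
)
stylesheet = SimpleNamespace(
    resource_type="stylesheet",
    url="https://books.toscrape.com/static/oscar/css/styles.css",
)

print(should_abort_request(image))       # True
print(should_abort_request(stylesheet))  # False
```

If the predicate behaves as expected here but nothing is aborted during the crawl, the difference is more likely in the settings or the installed versions than in the function itself.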