Comments (4)
There is an options.disableProxy parameter. We should either disable the proxy by default or change the integration with Apify Proxy.
from crawlee.
Apify.launchPuppeteer() should not require any proxy when running locally, but it should probably use one by default when running on Apify cloud, while still letting the user disable it. The parameter name options.disableProxy is not great, because it's not obvious that it refers to Apify Proxy. We should rename it to something like avoidApifyProxy.
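The defaulting behavior described in this comment could be sketched roughly as follows. This is only an illustration: `shouldUseApifyProxy` and the `isOnApifyCloud` flag are hypothetical names that are not part of the SDK, and `avoidApifyProxy` is just the rename proposed above.

```javascript
// Hypothetical helper, not part of the Apify SDK: decides whether a browser
// launch should go through Apify Proxy.
// `isOnApifyCloud` is an assumed flag the caller derives from its environment.
function shouldUseApifyProxy(options = {}, isOnApifyCloud = false) {
    // An explicit opt-out by the user always wins.
    if (options.avoidApifyProxy === true) return false;
    // Otherwise: use the proxy by default on Apify cloud, none when running locally.
    return isOnApifyCloud;
}
```

With this shape, a local run never needs a proxy, and a cloud run uses it unless the user opts out.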
from crawlee.
So what about adding options.useApifyProxy and running without a proxy by default? That's the expected behavior for the SDK.
We need this configuration at multiple levels:
- Apify.launchPuppeteer()
- Apify.PuppeteerPool
- Apify.PuppeteerCrawler
So maybe this could be part of launchPuppeteer's options, and we can add a parameter opts.launchPuppeteerOptions of type Object to both Apify.PuppeteerPool and Apify.PuppeteerCrawler.
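A minimal sketch of how this pass-through might look, assuming the option names proposed in this comment. `PoolSketch`, `CrawlerSketch`, and `resolveLaunchOptions` are hypothetical stand-ins for illustration and do not reflect the actual SDK classes.

```javascript
// Assumed default from the proposal above: run without Apify Proxy unless asked.
const DEFAULT_LAUNCH_OPTIONS = { useApifyProxy: false };

// Hypothetical helper: merges user-supplied launch options over the defaults.
function resolveLaunchOptions(launchPuppeteerOptions = {}) {
    return { ...DEFAULT_LAUNCH_OPTIONS, ...launchPuppeteerOptions };
}

// Stand-in for a browser pool that stores the options it would later
// hand to the launch function.
class PoolSketch {
    constructor(opts = {}) {
        this.launchPuppeteerOptions = resolveLaunchOptions(opts.launchPuppeteerOptions);
    }
}

// Stand-in for a crawler that forwards the same object to its pool unchanged.
class CrawlerSketch {
    constructor(opts = {}) {
        this.pool = new PoolSketch({ launchPuppeteerOptions: opts.launchPuppeteerOptions });
    }
}
```

Keeping all launch-related settings in one object means the proxy setting is defined once and reaches every level without each class redeclaring it.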
from crawlee.
Yeah, that sounds good! BTW, this is related to the TODOs in launchPuppeteerFunction (see https://github.com/apifytech/apify-js/blob/master/src/puppeteer_pool.js#L34). We should move all these options into this object.
from crawlee.