Comments (8)
Further updates:
- Locally, I've replaced `net.Socket.prototype.write` with `new net.Socket().write`, and `proxy-chain` wasn't causing errors anymore.
- Next up, there was an error in `node_modules/@crawlee/browser-pool/proxy-server.js` at the line `server.server.unref();`. I looked into it. The `unref` should refer to `http.Server.unref`. For some reason, this isn't defined in Bun, and this seems to be a genuine error on their side (it's not even reported in their docs).
- Out of curiosity, I just commented out that line to see if I could get the crawler to work. It printed the initial log with system info:
  INFO System info {"apifyVersion":"3.1.4","apifyClientVersion":"2.7.1","crawleeVersion":"3.3.1","osType":"Darwin","nodeVersion":"v18.15.0"}
  However, the run still ended in an error. Here, `promises_1.opendir` refers to `fs.promises.opendir` (`node:fs`). Unfortunately, none of the `opendir` functions are currently defined in Bun (`fs.opendirSync`, `fs.opendir`, `fs.promises.opendir`).
  ERROR (0, promises_1.opendir) is not a function. (In '(0, promises_1.opendir)(keyValueStoreDir)', '(0, promises_1.opendir)' is undefined)
  TypeError: (0, promises_1.opendir) is not a function. (In '(0, promises_1.opendir)(keyValueStoreDir)', '(0, promises_1.opendir)' is undefined)
  at <anonymous> (/Users/presenter/repos/apify-actor-facebook/node_modules/@crawlee/memory-storage/cache-helpers.js:110:25)
So to sum up:
- Yes, the ticket can remain closed. It's currently not possible to run Apify crawlers with Bun because of (at least) 2 unsupported features.
- For future reference, once / if this becomes relevant, the first issue can be resolved by replacing `net.Socket.prototype.write` with `new net.Socket().write`.
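For anyone wanting to re-check this later, here is a minimal sketch of a probe for the APIs mentioned above. The helper name `probeNodeApis` is made up for illustration and is not part of Crawlee or proxy-chain. Checking `http.Server.prototype.unref` is only an approximation of the `server.server.unref()` call that failed.

```javascript
const net = require('node:net');
const http = require('node:http');
const fs = require('node:fs');

// Hypothetical helper (not part of Crawlee or proxy-chain): reports which of
// the Node.js APIs discussed in this thread exist in the current runtime.
function probeNodeApis() {
  const isFn = (value) => typeof value === 'function';
  return {
    // Optional chaining avoids a TypeError when `prototype` itself is undefined.
    'net.Socket.prototype.write': isFn(net.Socket.prototype?.write),
    'http.Server.prototype.unref': isFn(http.Server.prototype?.unref),
    'fs.opendir': isFn(fs.opendir),
    'fs.opendirSync': isFn(fs.opendirSync),
    'fs.promises.opendir': isFn(fs.promises?.opendir),
  };
}

// Print one line per API.
for (const [name, available] of Object.entries(probeNodeApis())) {
  console.log(`${name}: ${available ? 'available' : 'missing'}`);
}
```

Saving this as a file and running it with both `node` and `bun` should show which entries differ between the two runtimes.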
from proxy-chain.
Bun is not fully compatible with Node. See https://bun.sh/docs/runtime/nodejs-apis#node-net, where they write:
If you run into any bugs with a particular package, please open an issue. Opening issues for compatibility bugs helps us prioritize what to work on next.
So I'd recommend doing that, we can't fix it here...
from proxy-chain.
@jancurn Please don't judge so fast and have a look at the error I posted.
The error said that `undefined is not an object (evaluating 'net_1.default.Socket.prototype.write')`.
But in my test, the `(new Socket()).write` function was defined in Bun. So it didn't seem to be an issue on the Bun side, implying that the issue is in `proxy-chain`.
What's more, I think I just found the issue. It's here, on line 5:
const asyncWrite = promisify(net.Socket.prototype.write);
which is then called here, on line 14:
await asyncWrite.call(socket, 'HTTP/1.1 200 Connection Established\r\n\r\n');
For some reason, `net.Socket.prototype` is `undefined` in Bun, so `net.Socket.prototype.write` throws the error.
However, `(new net.Socket()).write` is defined, and the following:
new Socket().write('HTTP/1.1 200 Connection Established\r\n\r\n')
returns `true`.
So that's what I think the issue is. However, I haven't worked with sockets before, and I'm not 100% sure what the purpose of that file is, so I don't know if the behaviour of `new net.Socket().write` in Bun is the same as `net.Socket.prototype.write` in Node. But common sense suggests that it should be.
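A small sketch of the comparison described above: it looks up `write` once through the prototype and once through an instance, without throwing if `net.Socket.prototype` is missing. The helper name `compareWriteLookups` is hypothetical. In Node I'd expect both lookups to return a function, since instances inherit `write` through the prototype chain. What Bun returns depends on the version.

```javascript
const net = require('node:net');

// Hypothetical helper: looks up `write` both ways without throwing when
// `net.Socket.prototype` is undefined.
function compareWriteLookups() {
  const viaPrototype = net.Socket.prototype?.write;
  const socket = new net.Socket();
  const viaInstance = socket.write;
  socket.destroy(); // the socket was never connected; release it right away
  return {
    prototypeType: typeof viaPrototype,
    instanceType: typeof viaInstance,
    sameFunction: viaPrototype === viaInstance,
  };
}

console.log(compareWriteLookups());
```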
from proxy-chain.
Sorry, you're right. I think we just need to get rid of the problematic line and change the code of the `customConnect` function in https://github.com/apify/proxy-chain/blob/master/src/custom_connect.ts to something like this:
const asyncWrite = util.promisify(socket.write).bind(socket);
await asyncWrite.call(socket, 'HTTP/1.1 200 Connection Established\r\n\r\n');
Would you care to create a pull request?
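To make the idea concrete, here is a minimal, self-contained sketch of promisifying the instance's own `write` method. It is not the actual proxy-chain code: the helper `sendConnectionEstablished` is hypothetical, and a `PassThrough` stream stands in for a real `net.Socket` so the example runs without a network connection.

```javascript
const { promisify } = require('node:util');
const { PassThrough } = require('node:stream');

const CONNECTION_ESTABLISHED = 'HTTP/1.1 200 Connection Established\r\n\r\n';

// Hypothetical helper (not the actual proxy-chain code): promisify the write
// method found on the instance, then bind it so `this` is the socket.
async function sendConnectionEstablished(socket) {
  const asyncWrite = promisify(socket.write).bind(socket);
  // Resolves once the write callback fires; rejects if it reports an error.
  await asyncWrite(CONNECTION_ESTABLISHED);
}

// Demo with a PassThrough stream standing in for a real net.Socket.
async function demo() {
  const fakeSocket = new PassThrough();
  await sendConnectionEstablished(fakeSocket);
  return fakeSocket.read().toString();
}

demo().then((written) => console.log(JSON.stringify(written)));
```

Because the promisified function is already bound to the socket, it can be called directly, and an extra `.call(socket, ...)` is harmless but not needed.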
from proxy-chain.
I made a PR for the socket one (#522), since I'm already in the flow. Couldn't verify the tests. I leave it up to you to decide whether it should go in or not. Have a nice evening!
from proxy-chain.
I couldn't resist testing further, so just summarizing what I learnt:
1. I managed to start a Playwright crawler in Bun with the following changes to the Apify packages:
   - I commented out `server.server.unref();` in `@crawlee/browser-pool/proxy-server.js`.
   - I replaced `fs.promises.opendir(dirName)` with `fs.promises.readdir(dirName, { withFileTypes: true })` in `@crawlee/memory-storage/cache-helpers.js`.
     - NOTE: The good thing is that both give you `Dirent` entries to iterate over: `opendir` resolves to a `Dir` (an async iterable of `Dirent`), and `readdir` with the `withFileTypes: true` option resolves to an array of `Dirent`. The bad thing is that, from my understanding, `opendir` yields the entries one by one as they are found, whereas `readdir` resolves only once all items have been found. So replacing `opendir` with `readdir` might add extra waiting time.
2. With the changes in step 1, I managed to start a Playwright crawler, to the point where the Playwright command was executed. Afterwards, there is an issue on the Playwright side with `child_process.spawn`. You can find more about that issue here:
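As a rough sketch of the `opendir` to `readdir` swap from step 1, the hypothetical helper `iterateDirents` below prefers `opendir` where it exists and falls back to `readdir` otherwise. This is not the actual Crawlee code. The temporary directory and file names in the demo are made up for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper (not the actual Crawlee code): yields Dirent objects,
// streaming them via opendir when it exists and falling back to readdir.
async function* iterateDirents(dirName) {
  if (typeof fs.promises.opendir === 'function') {
    // Dir is an async iterable; entries arrive as the directory is read,
    // and the handle is closed automatically when iteration finishes.
    const dir = await fs.promises.opendir(dirName);
    yield* dir;
    return;
  }
  // Fallback: readdir resolves only after the whole directory has been read.
  yield* await fs.promises.readdir(dirName, { withFileTypes: true });
}

// Demo: list a throwaway directory containing one file and one subdirectory.
async function demo() {
  const dirName = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'dirents-'));
  await fs.promises.writeFile(path.join(dirName, 'a.json'), '{}');
  await fs.promises.mkdir(path.join(dirName, 'sub'));
  const names = [];
  for await (const entry of iterateDirents(dirName)) {
    names.push(`${entry.name}:${entry.isDirectory() ? 'dir' : 'file'}`);
  }
  await fs.promises.rm(dirName, { recursive: true, force: true });
  return names.sort();
}

demo().then((names) => console.log(names));
```

Callers consume both paths with the same `for await` loop, so the choice between streaming and reading everything up front stays inside the helper.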
from proxy-chain.
Many thanks for the analysis! Please can you post this to https://github.com/apify/crawlee/issues instead? Otherwise the Crawlee team will not look into it...
from proxy-chain.
Closing this issue here for now
from proxy-chain.