
Comments (8)

JuroOravec commented on August 11, 2024

Further updates:

  1. Locally, I replaced net.Socket.prototype.write with new net.Socket().write, and proxy-chain no longer caused errors.

  2. Next, there was an error in node_modules/@crawlee/browser-pool/proxy-server.js on this line:

    server.server.unref();

    I looked into it. The unref call should refer to http.Server.unref. For some reason, it isn't defined in Bun, and this seems to be a genuine bug on their side (it isn't even mentioned in their docs).

  3. Out of curiosity, I commented out that line to see whether I could get the crawler to work. It printed the initial log with system info:

    INFO  System info 
    {"apifyVersion":"3.1.4","apifyClientVersion":"2.7.1","crawleeVersion":"3.3.1","osType":"Darwin","nodeVersion":"v18.15.0"}

    However, the run still ended in an error. Here, promises_1.opendir refers to fs.promises.opendir (node:fs). Unfortunately, none of the opendir functions (fs.opendirSync, fs.opendir, fs.promises.opendir) are currently defined in Bun.

     ERROR (0, promises_1.opendir) is not a function. (In '(0, promises_1.opendir)(keyValueStoreDir)', '(0, promises_1.opendir)' is undefined)
       TypeError: (0, promises_1.opendir) is not a function. (In '(0, promises_1.opendir)(keyValueStoreDir)', '(0, promises_1.opendir)' is undefined)
           at <anonymous> (/Users/presenter/repos/apify-actor-facebook/node_modules/@crawlee/memory-storage/cache-helpers.js:110:25)
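Instead of commenting out the unref() call or hard-replacing opendir, a dependency could feature-detect both APIs at runtime. The TypeScript below is a minimal sketch of that swapped-in approach, assuming Node's fs and http typings; unrefIfSupported, listDir and the preferOpendir parameter are hypothetical names, not part of Crawlee.

```typescript
import { promises as fsp, type Dirent } from "node:fs";
import type { Server } from "node:http";

// Hypothetical helper: call unref() only when the runtime actually provides it,
// instead of assuming http.Server.unref exists. Returns whether it was called.
function unrefIfSupported(server: Server): boolean {
  if (typeof server.unref === "function") {
    server.unref();
    return true;
  }
  return false;
}

// Hypothetical helper: list a directory's entries as Dirent objects.
// Uses fs.promises.opendir when available (entries are yielded as the
// directory is read); otherwise falls back to readdir with withFileTypes,
// which resolves only after the whole listing has been read.
async function* listDir(
  dir: string,
  preferOpendir: boolean = typeof fsp.opendir === "function",
): AsyncGenerator<Dirent> {
  if (preferOpendir) {
    // A Dir is async-iterable and is closed automatically when the loop exits.
    for await (const entry of await fsp.opendir(dir)) yield entry;
  } else {
    for (const entry of await fsp.readdir(dir, { withFileTypes: true })) yield entry;
  }
}
```

Callers iterate listDir(dir) with for await...of regardless of which branch ran, so only the helper needs to know about the runtime difference.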

So to sum up:

  • Yes, the ticket can remain closed: it's currently not possible to run Apify crawlers with Bun because of (at least) 2 unsupported features.
  • For future reference, if and when this becomes relevant, the first issue can be resolved by replacing net.Socket.prototype.write with new net.Socket().write.

from proxy-chain.

jancurn commented on August 11, 2024

Bun is not fully compatible with Node. See https://bun.sh/docs/runtime/nodejs-apis#node-net, where they write:

If you run into any bugs with a particular package, please open an issue. Opening issues for compatibility bugs helps us prioritize what to work on next.

So I'd recommend doing that; we can't fix it here...

JuroOravec commented on August 11, 2024

@jancurn Please don't judge so fast and have a look at the error I posted.

The error said that undefined is not an object (evaluating 'net_1.default.Socket.prototype.write').

But in my test, the (new Socket()).write function was defined in Bun. So it didn't seem to be an issue on the Bun side, implying that the issue is in proxy-chain.


What's more, I think I've found the issue. It's this line (line 5):

const asyncWrite = promisify(net.Socket.prototype.write);

which is then called on line 14:

await asyncWrite.call(socket, 'HTTP/1.1 200 Connection Established\r\n\r\n');

For some reason, net.Socket.prototype is undefined in Bun, so net.Socket.prototype.write throws the error.

However, (new net.Socket()).write is defined, and the following call returns true:

new Socket().write('HTTP/1.1 200 Connection Established\r\n\r\n')


So that's what I think the issue is. However, I haven't worked with sockets before, and I'm not 100% sure what the purpose of that file is, so I don't know whether new net.Socket().write in Bun behaves the same as net.Socket.prototype.write in Node. Common sense suggests that it should, though.
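One way to check that equivalence in a given runtime is to compare the instance method with the prototype method. The TypeScript below is a minimal sketch (inspectSocketWrite is a hypothetical name). In Node, write is not an own property of a socket but is inherited through the prototype chain, so ownWrite should be false and sameAsPrototype true there; that means promisify(socket.write) bound to the socket calls the same function as net.Socket.prototype.write with this set to that socket. In a runtime where net.Socket.prototype is undefined, as reported above for Bun, the second comparison would throw a TypeError rather than return false.

```typescript
import * as net from "node:net";

// Inspect where a socket's `write` comes from in the current runtime.
// An unconnected socket opens no handle, so this does not keep the process alive.
function inspectSocketWrite(): { ownWrite: boolean; sameAsPrototype: boolean } {
  const socket = new net.Socket();
  return {
    // true only if the instance carries its own `write` property
    ownWrite: Object.prototype.hasOwnProperty.call(socket, "write"),
    // true if `socket.write` is the function reachable via net.Socket.prototype
    sameAsPrototype: socket.write === net.Socket.prototype.write,
  };
}

console.log(inspectSocketWrite());
```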

jancurn commented on August 11, 2024

Sorry, you're right. I think we just need to get rid of the problematic line and change the code of the customConnect function in https://github.com/apify/proxy-chain/blob/master/src/custom_connect.ts to something like this:

const asyncWrite = util.promisify(socket.write).bind(socket);
await asyncWrite('HTTP/1.1 200 Connection Established\r\n\r\n');

Would you care to create a pull request?
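The suggested change can be sketched end to end in TypeScript, assuming Node's net and util modules. writeConnectEstablished and the throwaway local server in demo are hypothetical names for illustration, not proxy-chain's actual customConnect; the point is only that the write is promisified from the socket instance and bound to it, so net.Socket.prototype is never referenced.

```typescript
import * as net from "node:net";
import { promisify } from "node:util";

const CONNECT_OK = "HTTP/1.1 200 Connection Established\r\n\r\n";

// Sketch of the suggested change: promisify the write method of this socket
// instance (bound to it) instead of net.Socket.prototype.write, so nothing
// touches the prototype when the module loads.
async function writeConnectEstablished(socket: net.Socket): Promise<void> {
  const asyncWrite = promisify(socket.write).bind(socket);
  await asyncWrite(CONNECT_OK);
}

// Hypothetical demo: a throwaway local server sends the status line to each
// client and ends the connection; the client collects everything it receives.
function demo(): Promise<string> {
  return new Promise((resolve, reject) => {
    const server = net.createServer((socket) => {
      writeConnectEstablished(socket)
        .then(() => socket.end())
        .catch(reject);
    });
    server.on("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as net.AddressInfo;
      const client = net.connect(port, "127.0.0.1");
      const chunks: Buffer[] = [];
      client.on("data", (chunk: Buffer) => chunks.push(chunk));
      client.on("error", reject);
      client.on("end", () => {
        server.close();
        resolve(Buffer.concat(chunks).toString("utf8"));
      });
    });
  });
}

demo().then((reply) => console.log(JSON.stringify(reply)));
```

Because the wrapper returned by promisify forwards its own this to the original function, binding the promisified function (as above) or promisifying socket.write.bind(socket) should behave the same.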

JuroOravec commented on August 11, 2024

I made a PR for the socket fix (#522), since I was already in the flow. I couldn't verify the tests, so I'll leave it up to you to decide whether it should go in. Have a nice evening!

JuroOravec commented on August 11, 2024

I couldn't resist testing further, so here's a summary of what I learnt:

  1. I managed to start a Playwright crawler in Bun with the following changes to the Apify packages:

    • I commented out the server.server.unref(); in @crawlee/browser-pool/proxy-server.js
    • I replaced fs.promises.opendir(dirName) with fs.promises.readdir(dirName, { withFileTypes: true }) in @crawlee/memory-storage/cache-helpers.js
      • NOTE: The good news is that the same for await...of loop works with both: opendir resolves to a Dir, which is an async iterable of Dirent, and readdir with the withFileTypes: true option resolves to an array of Dirent. The bad news is that, from my understanding, opendir yields the entries as they are read, whereas readdir resolves only once all entries have been read. So replacing opendir with readdir might add extra waiting time.
  2. With the changes from step 1, the crawler got as far as executing the Playwright command. After that, there is an issue on the Playwright side with child_process.spawn. You can find more about that issue here:

jancurn commented on August 11, 2024

Many thanks for the analysis! Please can you post this to https://github.com/apify/crawlee/issues instead? Otherwise the Crawlee team will not look into it...

jancurn commented on August 11, 2024

Closing this issue here for now
