passcod commented on June 8, 2024

Replacing tempdir with tempfile sounds straightforward and would be a good first step, but I've had some discussions with Rust people on Discord and there are some subtleties that are not obvious at first. A quick summary:

  • tempdirs that are not strictly restricted to the calling process (which is not generally possible to do), especially with copies and links possibly happening, are a security issue (due to undetectable substitution and various other "racing the filesystem" vectors). Now, binstall is in the business of installing arbitrary code onto your machine, so it's a lesser concern compared to that, but it's still something.
  • temp files on the other hand are much less vulnerable, especially if you refrain from dropping their FD/handle once open
  • not all temporary filesystems behave the same. For example there are known implementations where letting go of a file handle in a temporary location will immediately queue up the file for deletion by the OS.
  • temporary locations might not be all that temporary, being variously stored on disk, wiped on boot or periodically
  • cleanup might fail, sometimes silently
  • these last two combined: binstall may silently and inadvertently eat up user memory and/or disk space, and leave rubbish around
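
To make the "keep the handle open" point concrete, here's a rough std-only sketch, not what binstall does and not the tempfile crate's API. The helper names, the `binstall-sketch-` file name, and the use of the process ID are all made up for illustration; a real implementation would want an unpredictable name, which is one of the things the tempfile crate handles. The idea is to create the file exclusively, do all I/O through the one handle, and surface cleanup failures instead of swallowing them.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;
use std::process;

/// Hypothetical helper: write `data` and read it back through the same
/// open handle, so the path is never reopened in between.
fn round_trip(file: &mut File, data: &[u8]) -> io::Result<Vec<u8>> {
    file.write_all(data)?;
    file.seek(SeekFrom::Start(0))?;
    let mut out = Vec::new();
    file.read_to_end(&mut out)?;
    Ok(out)
}

/// Hypothetical helper: stage bytes in a scratch file under the system
/// temp location, then remove the file and report any cleanup error.
fn stage_and_read_back(data: &[u8]) -> io::Result<Vec<u8>> {
    // Assumed naming scheme, predictable on purpose to keep the sketch short.
    let path: PathBuf =
        std::env::temp_dir().join(format!("binstall-sketch-{}", process::id()));

    // `create_new` fails if the path already exists, so we never adopt a
    // file that something else placed there first.
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create_new(true)
        .open(&path)?;

    let result = round_trip(&mut file, data);

    // Close the handle, then delete. Cleanup can fail, so check it.
    drop(file);
    let cleanup = fs::remove_file(&path);

    let out = result?;
    cleanup?;
    Ok(out)
}
```

Calling `stage_and_read_back(b"hello")` should hand back the same bytes and leave nothing behind in the temp location; if the removal fails, the caller gets the error rather than silent litter.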

So while initially I was a bit standoffish about moving this to memory I'm quite a bit more partial to it now! However, there are more considerations here:

  • massive files and limited memory: we don't want to download a multi-gigabyte archive into memory and then try to extract it, also in memory. That's an extreme example, but I could see e.g. a 100MB tar.xz unpacking to a 2GB "folder" and suddenly binstall crashes or the kernel OOM-kills something else at random, like the very important Zoom call you're on ("whoops!")
  • parity of extraction: the cool thing about extracting to disk is that we kinda let the tar/zip/etc. tooling figure out the details for us, then we just copy things around. Doing it ourselves, we need to make sure we're getting the permissions/attributes/whatever correct.
  • we'll need more automated testing around this to make sure we catch errors, not just now, but also down the line.
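
For the memory concern, one possible guard (just a sketch, with a made-up `copy_bounded` helper and an arbitrary caller-chosen limit) is to stream data from a reader to a writer through a small buffer and bail out once a size cap is exceeded, rather than collecting everything in memory first. It uses only `std::io::copy` and `Read::take`; the reader could be a decompressor and the writer a file, but that wiring is assumed here.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: stream `reader` into `writer`, returning an error
/// if more than `limit` bytes come out of the reader.
fn copy_bounded<R: Read, W: Write>(
    reader: R,
    writer: &mut W,
    limit: u64,
) -> io::Result<u64> {
    // Allow one byte past the limit so an oversized input is detectable.
    let mut limited = reader.take(limit.saturating_add(1));

    // `io::copy` moves data through an internal buffer, so memory use
    // does not grow with the size of the input.
    let copied = io::copy(&mut limited, writer)?;

    if copied > limit {
        // Note: up to `limit + 1` bytes have already reached `writer`;
        // the caller is responsible for discarding that partial output.
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "input exceeds size limit",
        ));
    }
    Ok(copied)
}
```

With a six-byte input and a limit of 10 this returns `Ok(6)`; with a limit of 3 it returns an error after writing a partial prefix, so the caller still needs to clean up whatever was written.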

Something I could see as the holy grail here is full-streaming unpacking, which, from following the Great Npm Optimisation Debacles Of 2015-2017, was a great big step forward in both speed and memory use, but there's even more stuff to consider for this.

So, I think, as a first step, let's blindly replace tempdir with tempfile, which will solve the immediate "let's not depend on unmaintained crates" issue, and then let's have a careful look at a future strategy which could include unpacking things straight from downloaded archives, centering on meaningful improvements like:

  • support for files which don't fit in RAM (rare, but not unknown, e.g. geo packages which include lots of data files)
  • closing some "obvious" security holes
  • not littering our users' computers
  • faster/leaner installs (let's maybe get some data on this, too!)

from cargo-binstall.
