
Comments (10)

BobbyWibowo commented on July 26, 2024

> Just a suggestion: Have you considered using a multi-writer when uploading? So rather than treating uploading, and hashing, as two separate steps, it can instead be done at the same time. This is how kipp handles file uploads.

Frankly, I hadn't given that any thought before this issue. I did think of something like that while this was going on, but I was sure Multer (the lib we currently use to parse multipart data) didn't have a stream-based API to hook into.
I just gave it another look, and indeed the stream-based API is only in RC versions atm. Can probably give that a try as-is, but eh, dunno.
Though there are some solutions that involve writing your own Multer storage engine, such as this one. Probably a better choice for now.

> There's a few Blake3 implementations here https://www.npmjs.com/search?q=blake3

> As @camjac251 suggested, it might also be a good idea to use blake3. md5 should never, ever be used anymore.

Aight, that sounds good to me as well.

from lolisafe.

BobbyWibowo commented on July 26, 2024

The intensive processes I can think of that Node would run on uploads once they are fully uploaded are probably file hashing and virus scanning. Thumbnails are created in the background, so no waiting there.
You didn't mention virus scanning, so it's most likely the file hashing then (FYI it's for duplicate avoidance).

Try to comment out:

const hash = await new Promise((resolve, reject) => {
  const result = crypto.createHash('md5')
  fs.createReadStream(info.path)
    .on('error', error => reject(error))
    .on('end', () => resolve(result.digest('hex')))
    .on('data', data => result.update(data, 'utf8'))
})

// Check if the file exists by checking its hash and size
const dbFile = await db.table('files')
  .where(function () {
    if (user === undefined)
      this.whereNull('userid')
    else
      this.where('userid', user.id)
  })
  .where({
    hash,
    size: info.data.size
  })
  // Select expirydate to display expiration date of existing files as well
  .select('name', 'expirydate')
  .first()

if (dbFile) {
  // Continue even when encountering errors
  await utils.unlinkFile(info.data.filename).catch(logger.error)
  // logger.log(`Unlinked ${info.data.filename} since a duplicate named ${dbFile.name} exists`)

  // If on /nojs route, append original file name reported by client
  if (req.path === '/nojs')
    dbFile.original = info.data.originalname

  exists.push(dbFile)
  return
}

Then add a new line below them: const hash = null.

If that was indeed the bottleneck, I'll add a config option to completely disable the feature I guess. Can't really make that any quicker.
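
For illustration only, a toggle like that could look roughly like the sketch below. The option name `hashUploads` and the helper `hashFile` are made up for this example and are not taken from the project's config or code. The point is just that the hashing step resolves to null without reading the file when the option is off.

```javascript
const crypto = require('crypto')
const fs = require('fs')

// Hypothetical option name, not the project's actual config key.
const config = { hashUploads: false }

// Resolves to the file's hex MD5 digest, or to null when hashing is disabled.
function hashFile (filePath, enabled = config.hashUploads) {
  // Skip reading the file entirely when the feature is turned off.
  if (!enabled) return Promise.resolve(null)

  return new Promise((resolve, reject) => {
    const result = crypto.createHash('md5')
    fs.createReadStream(filePath)
      .on('error', reject)
      .on('end', () => resolve(result.digest('hex')))
      .on('data', data => result.update(data))
  })
}
```

With something like this, the duplicate lookup would presumably also need to be skipped whenever the hash comes back as null.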


BobbyWibowo commented on July 26, 2024

> After that the file started over with reuploading, although in my uploads folder, the old file remained and a new file was being written for this new transfer.

Can't think of why this would happen though.
If the client attempts to reupload, then it has to be something from the client itself. But I don't recall the homepage uploader having a feature that does that.


camjac251 commented on July 26, 2024

Could the internal webserver have a timeout set?

I commented it out like this

self.storeFilesToDb = async (req, res, user, infoMap) => {
  const files = []
  const exists = []
  const albumids = []

  await Promise.all(infoMap.map(async info => {
    // Create hash of the file
/*    const hash = await new Promise((resolve, reject) => {
      const result = crypto.createHash('md5')
      fs.createReadStream(info.path)
        .on('error', error => reject(error))
        .on('end', () => resolve(result.digest('hex')))
        .on('data', data => result.update(data, 'utf8'))
    })

    // Check if the file exists by checking its hash and size
    const dbFile = await db.table('files')
      .where(function () {
        if (user === undefined)
          this.whereNull('userid')
        else
          this.where('userid', user.id)
      })
      .where({
        hash,
        size: info.data.size
      })
      // Select expirydate to display expiration date of existing files as well
      .select('name', 'expirydate')
      .first()

    if (dbFile) {
      // Continue even when encountering errors
      await utils.unlinkFile(info.data.filename).catch(logger.error)
      // logger.log(`Unlinked ${info.data.filename} since a duplicate named ${dbFile.name} exists`)

      // If on /nojs route, append original file name reported by client
      if (req.path === '/nojs')
        dbFile.original = info.data.originalname

      exists.push(dbFile)
      return
    }*/

    const timestamp = Math.floor(Date.now() / 1000)
    const data = {

Running the transfer again now


BobbyWibowo commented on July 26, 2024

You'd have to add the const hash = null line, as it's being referenced somewhere below when inserting to DB.

Other than the timeout between Nginx and the lolisafe service, I'm not aware of any other.
Supposedly, since Express sits on top of the Node.js HTTP API, its default timeout applies as well: https://nodejs.org/docs/latest-v12.x/api/http.html#http_server_timeout. Not sure if it's that kind of timeout, but at least I don't recall it being changed elsewhere, so it should still be the default 2 mins.
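
If that timeout did turn out to matter, it can be changed on the underlying server object. A minimal sketch with plain Node.js `http` follows. The 30 minute value is arbitrary and only for illustration, and this is not how lolisafe itself is configured.

```javascript
const http = require('http')

// Stand-in server for the example; an Express app's listen() call
// returns the same kind of http.Server object.
const server = http.createServer((req, res) => res.end('ok'))

// Raise the socket inactivity timeout to 30 minutes (arbitrary value).
// setTimeout() updates server.timeout, which is in milliseconds.
server.setTimeout(30 * 60 * 1000)
```

With Express, the same call could presumably be made on the server returned by `app.listen()`.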


camjac251 commented on July 26, 2024

That was it. Right after the upload finished, it gave me the link instantaneously. Nothing I've tried before has been this fast.

Could there be a faster checksum method available for duplicates in node?
There's a few Blake3 implementations here https://www.npmjs.com/search?q=blake3


uhthomas commented on July 26, 2024

Just a suggestion: Have you considered using a multi-writer when uploading? So rather than treating uploading, and hashing, as two separate steps, it can instead be done at the same time. This is how kipp handles file uploads.

As @camjac251 suggested, it might also be a good idea to use blake3. md5 should never, ever be used anymore.
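
To make the idea concrete, here is a rough Node.js sketch of hashing in the same pass as writing. It is not how kipp or Multer does it. `writeAndHash` is a made-up helper name, and SHA-256 from the built-in `crypto` module stands in for BLAKE3, which would need a third-party package.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const { Transform, pipeline } = require('stream')
const { promisify } = require('util')

const pipelineAsync = promisify(pipeline)

// Hypothetical helper: writes `source` to `destPath` and hashes the same
// chunks on the way through, so the file is never read back from disk.
async function writeAndHash (source, destPath, algorithm = 'sha256') {
  const hash = crypto.createHash(algorithm)

  // Pass-through stream that feeds every chunk to the hash unchanged.
  const tap = new Transform({
    transform (chunk, encoding, callback) {
      hash.update(chunk)
      callback(null, chunk)
    }
  })

  // Resolves once the destination file has been fully written.
  await pipelineAsync(source, tap, fs.createWriteStream(destPath))
  return hash.digest('hex')
}
```

Here `source` would be the incoming upload stream, so the digest is ready the moment the write finishes.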


BobbyWibowo commented on July 26, 2024

@camjac251 try 62a9775


camjac251 commented on July 26, 2024

Oh wow. I'll try it out right now. Thank you so much


camjac251 commented on July 26, 2024

This is incredible. Thank you for adding this. It was almost instant after the 23GB file was done uploading until it gave me the link. So much faster now

