Comments (4)

JMS55 commented on June 19, 2024

Bypassing WGPU and using a transfer queue that WGPU doesn't see.

That's a fairly feasible option in the interim. For Vulkan, for example, you would:

  • Create your own Vulkan device/queue/transfer queue
  • Use Adapter::create_device_from_hal() to create your regular wgpu Device/Queue
  • Create a background thread, give it the wgpu::Device and transfer queue, and create a Vulkan command pool and fence
  • Background thread allocates a Vulkan buffer for staging (problem: you need access to wgpu's memory allocator, or else must manage allocations yourself), and a second buffer that's device-local.
  • Background thread loads an asset from disk, and decompresses it into the staging buffer
  • Background thread creates a command buffer from the pool and records a vkCmdCopyBuffer from staging to device-local. (It's probably worth loading several assets at once and recording multiple copy commands per command buffer; otherwise you'll have a ton of submits and individual command buffers, which is bad for performance. I'm not sure how engines tend to structure this, though.)
  • Background thread submits the command buffer to the transfer queue, along with the fence
  • Background thread waits for the fence to signal
  • Background thread calls wgpu::hal::vulkan::Device::buffer_from_raw() and wgpu::Device::create_buffer_from_hal() on the device local buffer to create a wgpu::Buffer (Important note: dropping the buffer does not free the GPU memory like it normally does. You'll have to manually free it yourself. Also, buffer can't be mapped, but that's not relevant to you.)
  • Background thread sends the finished buffer to the main thread or whatever needs it via a channel
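The steps above can be sketched with plain std threading and channels; the Vulkan and wgpu-hal calls are stubbed out as comments, and GpuBuffer, decompress, and spawn_loader are illustrative names, not wgpu API:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for the device-local buffer that would eventually be wrapped via
// wgpu::hal::vulkan::Device::buffer_from_raw() + Device::create_buffer_from_hal().
struct GpuBuffer {
    bytes: Vec<u8>,
}

// Hypothetical decompression step (e.g. zstd); here it just copies the bytes.
fn decompress(compressed: &[u8]) -> Vec<u8> {
    compressed.to_vec()
}

// Background thread: in the real flow it owns the transfer queue,
// a vkCommandPool, and a vkFence, all created once up front.
fn spawn_loader(assets: Vec<Vec<u8>>) -> mpsc::Receiver<GpuBuffer> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for compressed in assets {
            // 1. Allocate staging + device-local buffers (stubbed as Vecs).
            // 2. Load the asset from disk and decompress into staging.
            let staging = decompress(&compressed);
            // 3. Record vkCmdCopyBuffer staging -> device-local, submit to
            //    the transfer queue with the fence, and wait on the fence.
            let device_local = GpuBuffer { bytes: staging };
            // 4. Wrap as a wgpu::Buffer and send it to whoever needs it.
            if tx.send(device_local).is_err() {
                break; // receiver gone; stop loading
            }
        }
    });
    rx
}
```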

from wgpu.

John-Nagle commented on June 19, 2024

That's an option, but a desperation one. I'd hate to have to go into the innards of WGPU's allocation system. Also, I'd be giving up macOS support, which is the main point of using WGPU rather than Vulkan directly. I saw a note that the Bevy devs are considering such a bypass. If they do it, I'll have an example to look at.

Because I'm using WGPU via Rend3, I'd need my own version of Rend3, too.

Another alternative is to fork Rend3, rip out the connection to WGPU, and replace that with Vulkano. That would be a relatively clean and safe Rust solution. Rend3 has a well-designed, clean API, and that's worth retaining.

These are all ugly hacks. Better if transfer queues are implemented inside WGPU, where they belong.

John-Nagle commented on June 19, 2024

Here's the proposed Bevy workaround for this problem: a plan to bypass WGPU and go directly from Bevy to Vulkan. Comment from the Bevy issue: "This is a bit hacky, and relying on globals in the form of static OnceLock-ed variables, but may be reasonable until wgpu supports multiple queues."

John-Nagle commented on June 19, 2024

JMS55 commented: Background thread creates a command buffer from the pool and records a vkCmdCopyBuffer from staging to device-local. (It's probably worth loading several assets at once and recording multiple copy commands per command buffer; otherwise you'll have a ton of submits and individual command buffers, which is bad for performance. I'm not sure how engines tend to structure this, though.)

That raises a good question, regardless of where this ends up being implemented: how expensive is Submit? Expensive enough on transfer queues that minimizing submit operations is worth it?

There are at least two ways to approach this:

Simple way:

  • Application makes a request to put a texture into the GPU. Multiple threads may be making such requests concurrently.
  • GPU Buffer is allocated for texture.
  • Buffer is passed to the transfer queue feeding thread (needed because of the one-thread-per-command-queue limit).
  • Application thread is blocked waiting for completion.
  • Transfer queue feeding thread submits everything on its queue to GPU as one Submit.
  • Transfer queue feeding thread fences and waits for completion callbacks.
  • On completion, application thread is unblocked and gets a fully loaded handle to the asset in the GPU.

This depends on Submit being reasonably fast compared to, say, loading a 1MB texture.
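The simple way can be sketched with std channels, with the actual Submit and fence wait stubbed out as comments; Request, spawn_feeder, and load_texture_blocking are hypothetical names, and only the batching/blocking structure is meant to be real:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for a texture resident in GPU memory.
struct GpuTexture {
    bytes: Vec<u8>,
}

// One load request, carrying a reply channel to unblock the caller.
struct Request {
    pixels: Vec<u8>,
    done: mpsc::Sender<GpuTexture>,
}

// Feeding thread: drains everything currently queued, issues one Submit
// for the whole batch, waits on the fence, then unblocks every caller.
fn spawn_feeder() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        while let Ok(first) = rx.recv() {
            // Batch every request already on the queue into one submit.
            let mut batch = vec![first];
            while let Ok(more) = rx.try_recv() {
                batch.push(more);
            }
            // Real code: record all copies, one queue Submit, fence wait.
            for req in batch {
                let _ = req.done.send(GpuTexture { bytes: req.pixels });
            }
        }
    });
    tx
}

// Application-side call: blocks until the texture is fully loaded.
fn load_texture_blocking(feeder: &mpsc::Sender<Request>, pixels: Vec<u8>) -> GpuTexture {
    let (done_tx, done_rx) = mpsc::channel();
    feeder
        .send(Request { pixels, done: done_tx })
        .expect("feeder thread gone");
    done_rx.recv().expect("feeder dropped the request")
}
```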

Complicated way:

  • Application makes a request to put a texture into the GPU. Multiple threads may be making such requests concurrently.
  • GPU Buffer is allocated for texture.
  • Buffer is passed to the transfer queue feeding thread (needed because of the one-thread-per-command-queue limit).
  • Application thread gets control back immediately, with WGPU's version of a future referencing a buffer not yet loaded into the GPU.
  • Application thread is free to use its "handle" in a render request. WGPU interlocking (?) prevents use of the asset before it is loaded.
  • Transfer queue feeding thread submits everything on its queue to GPU as one Submit.
  • Transfer queue feeding thread fences and waits for completion callbacks.
  • Completion callbacks are (somehow) fed to WGPU interlocking system.

This potentially has higher performance, especially for single-thread programs where the asset loading requests come from the same thread that does renders. Unclear if the added complexity is worth it.
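The complicated way's "future referencing a buffer not yet loaded" could look like the following std-only sketch; PendingBuffer and load_texture_async are made-up names, and the spawned thread stands in for the fence-completion callback that would feed WGPU's interlocking:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Stand-in for a device-local GPU buffer.
struct GpuBuffer {
    bytes: Vec<u8>,
}

// Future-like handle to a buffer that may not be loaded yet.
struct PendingBuffer {
    state: Arc<(Mutex<Option<GpuBuffer>>, Condvar)>,
}

impl PendingBuffer {
    // Non-blocking check, usable at render time to skip not-yet-ready assets.
    fn is_ready(&self) -> bool {
        self.state.0.lock().unwrap().is_some()
    }

    // Block until the transfer completes (the interlocking WGPU would
    // otherwise have to do internally before the asset is first used).
    fn wait(self) -> GpuBuffer {
        let (lock, cvar) = &*self.state;
        let mut guard = lock.lock().unwrap();
        while guard.is_none() {
            guard = cvar.wait(guard).unwrap();
        }
        guard.take().unwrap()
    }
}

// Application-side call: returns immediately with a handle; the spawned
// thread plays the role of the transfer-queue completion callback.
fn load_texture_async(pixels: Vec<u8>) -> PendingBuffer {
    let state = Arc::new((Mutex::new(None), Condvar::new()));
    let done = Arc::clone(&state);
    thread::spawn(move || {
        // Real code: record the copy, Submit on the transfer queue,
        // then fill the slot from the fence-completion callback.
        let (lock, cvar) = &*done;
        *lock.lock().unwrap() = Some(GpuBuffer { bytes: pixels });
        cvar.notify_all();
    });
    PendingBuffer { state }
}
```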

I'd be fine with the simple approach, unless Submit is really slow.
