Comments (4)
Bypassing WGPU and using a transfer queue that WGPU doesn't see.
That's a fairly feasible option in the interim. For Vulkan, for example, you would:
- Create your own vulkan device/queue/transfer queue
- Use Adapter::create_device_from_hal() to create your regular wgpu Device/Queue
- Create a background thread, give it the wgpu::Device and transfer queue, and create a vulkan command pool and fence
- Background thread allocates a Vulkan buffer for staging (problem: you need access to wgpu's memory allocator, or else you must manage allocations yourself), and a second buffer that's device-local.
- Background thread loads an asset from disk, and decompresses it into the staging buffer
- Background thread creates a command buffer from the pool and records a vkCmdCopyBuffer from staging to device local (it might be a good idea to load several assets at once, and use multiple copy commands per command buffer, otherwise you'll have a ton of submits and individual command buffers which is bad for performance, but I'm not sure how engines tend to structure this)
- Background thread submits the command buffer to the transfer queue, along with the fence
- Background thread waits for the fence to signal
- Background thread calls wgpu::hal::vulkan::Device::buffer_from_raw() and wgpu::Device::create_buffer_from_hal() on the device-local buffer to create a wgpu::Buffer (important note: dropping the buffer does not free the GPU memory like it normally does, so you'll have to free it manually yourself. Also, the buffer can't be mapped, but that's not relevant to you.)
- Background thread sends the finished buffer to the main thread or whatever needs it via a channel
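The thread-and-channel structure of the steps above can be sketched in plain Rust. This is a simulation only: the actual Vulkan work (buffer allocation, vkCmdCopyBuffer, vkQueueSubmit, fence wait, and the wgpu-hal interop calls) is indicated in comments, and names like `spawn_loader` and `GpuBuffer` are hypothetical stand-ins, not wgpu API.

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for a device-local wgpu::Buffer produced via the hal interop
// path (buffer_from_raw() / create_buffer_from_hal()). In the real flow,
// dropping it does NOT free the GPU memory; that must be done manually.
#[derive(Debug, PartialEq)]
struct GpuBuffer {
    label: String,
    size: usize,
}

// Background loader thread: receives (asset name, size) requests, performs
// the upload, and sends finished buffers back over a channel.
fn spawn_loader(
    asset_rx: mpsc::Receiver<(String, usize)>,
    done_tx: mpsc::Sender<GpuBuffer>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for (name, size) in asset_rx {
            // 1. Allocate a host-visible staging buffer and a device-local
            //    buffer (real code: vkCreateBuffer + your own allocator).
            // 2. Load the asset from disk and decompress it into staging.
            // 3. Record vkCmdCopyBuffer(staging -> device_local) into a
            //    command buffer from this thread's command pool.
            // 4. vkQueueSubmit on the transfer queue with a fence, then
            //    vkWaitForFences.
            // 5. Wrap the device-local buffer as a wgpu::Buffer via
            //    Device::create_buffer_from_hal() and hand it off.
            done_tx.send(GpuBuffer { label: name, size }).unwrap();
        }
    })
}

fn main() {
    let (asset_tx, asset_rx) = mpsc::channel();
    let (done_tx, done_rx) = mpsc::channel();
    let loader = spawn_loader(asset_rx, done_tx);

    asset_tx.send(("grass.ktx2".to_string(), 1 << 20)).unwrap();
    drop(asset_tx); // close the request queue so the loader thread exits

    let buf = done_rx.recv().unwrap();
    println!("loaded {} ({} bytes)", buf.label, buf.size);
    loader.join().unwrap();
}
```

The channels are the important part: the command pool, fence, and allocator all live on the loader thread, so the main thread never touches the transfer queue.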
from wgpu.
That's an option, but a desperation one. I'd hate to have to go into the innards of WGPU's allocation system. Also, I'd be giving up macOS support, which is the main point of using WGPU rather than Vulkan directly. I saw a note that the Bevy devs are considering such a bypass. If they do it, I'll have an example to look at.
Because I'm using WGPU via Rend3, I'd need my own version of Rend3, too.
Another alternative is to fork Rend3, rip out the connection to WGPU, and replace that with Vulkano. That would be a relatively clean and safe Rust solution. Rend3 has a well-designed, clean API, and that's worth retaining.
These are all ugly hacks. Better if transfer queues are implemented inside WGPU, where they belong.
Here's the proposed Bevy workaround for this problem: a plan to bypass WGPU and go directly from Bevy to Vulkan. Comment from the Bevy issue: "This is a bit hacky, and relying on globals in the form of static OnceLock-ed variables, but may be reasonable until wgpu supports multiple queues."
JMS55 commented: "Background thread creates a command buffer from the pool and records a vkCmdCopyBuffer from staging to device local (it might be a good idea to load several assets at once, and use multiple copy commands per command buffer, otherwise you'll have a ton of submits and individual command buffers which is bad for performance, but I'm not sure how engines tend to structure this)"
That raises a good question, regardless of where this is implemented: how expensive is Submit? Is it expensive enough on transfer queues that minimizing Submit operations is worth the effort?
There are at least two ways to approach this:
Simple way:
- Application makes a request to put a texture into the GPU. Multiple threads may be making such requests concurrently.
- A GPU buffer is allocated for the texture.
- The buffer is passed to the transfer-queue feeding thread (needed because of the one-thread-per-command-queue limit).
- The application thread blocks waiting for completion.
- The transfer-queue feeding thread submits everything on its queue to the GPU as one Submit.
- The transfer-queue feeding thread fences and waits for completion callbacks.
- On completion, the application thread is unblocked and gets a fully loaded handle to the asset in the GPU.
This depends on Submit being reasonably fast compared to, say, loading a 1 MB texture.
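The simple way can be sketched with channels: each application thread sends a request carrying a reply channel and blocks on it, while the feeder thread coalesces everything pending into one batch per Submit. This is a minimal sketch under those assumptions; `UploadRequest`, `run_feeder`, and `upload_blocking` are hypothetical names, and the actual Submit plus fence wait is only a comment.

```rust
use std::sync::mpsc;
use std::thread;

// A request carries a reply channel; the caller blocks on it, which models
// "application thread is blocked waiting for completion".
struct UploadRequest {
    name: String,
    reply: mpsc::Sender<String>, // stand-in for a handle to the GPU asset
}

// Feeder thread: drain everything queued, then do ONE Submit for the whole
// batch (real code: one vkQueueSubmit with many copy commands, then wait
// on a fence before replying).
fn run_feeder(rx: mpsc::Receiver<UploadRequest>) {
    while let Ok(first) = rx.recv() {
        let mut batch = vec![first];
        while let Ok(req) = rx.try_recv() {
            batch.push(req); // coalesce all currently pending requests
        }
        // --- one Submit + fence wait for the whole batch goes here ---
        for req in batch {
            let handle = format!("gpu:{}", req.name);
            let _ = req.reply.send(handle); // unblock the application thread
        }
    }
}

// Called from any application thread; returns only once the upload is done.
fn upload_blocking(tx: &mpsc::Sender<UploadRequest>, name: &str) -> String {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(UploadRequest { name: name.to_string(), reply: reply_tx }).unwrap();
    reply_rx.recv().unwrap() // blocks until the feeder's fence has signaled
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let feeder = thread::spawn(move || run_feeder(rx));
    let handle = upload_blocking(&tx, "rock.ktx2");
    println!("{handle}");
    drop(tx); // shut the feeder down
    feeder.join().unwrap();
}
```

The batching falls out naturally: while one batch is in flight, new requests pile up in the channel and all go into the next Submit.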
Complicated way:
- Application makes a request to put a texture into the GPU. Multiple threads may be making such requests concurrently.
- A GPU buffer is allocated for the texture.
- The buffer is passed to the transfer-queue feeding thread (needed because of the one-thread-per-command-queue limit).
- The application thread gets control back immediately, with WGPU's version of a future referencing a buffer not yet loaded into the GPU.
- The application thread is free to use its "handle" in a render request. WGPU interlocking (?) prevents use of the asset before it is loaded.
- The transfer-queue feeding thread submits everything on its queue to the GPU as one Submit.
- The transfer-queue feeding thread fences and waits for completion callbacks.
- Completion callbacks are (somehow) fed into WGPU's interlocking system.
This potentially has higher performance, especially for single-threaded programs where the asset-loading requests come from the same thread that does the renders. It's unclear if the added complexity is worth it.
I'd be fine with the simple approach, unless Submit is really slow.
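The difference between the two approaches is where the wait happens. In the complicated way, the caller gets a future-like handle back immediately and only something that actually needs the buffer waits on it, which is what WGPU's internal interlocking would do. A minimal sketch of that handle, with hypothetical names (`PendingBuffer`, `upload_async`) and the GPU work reduced to comments:

```rust
use std::sync::mpsc;
use std::thread;

// A future-like handle: the application gets it back immediately and may
// hand it to a render request. Whatever actually needs the buffer calls
// resolve(), which stands in for wgpu-internal interlocking on first use.
struct PendingBuffer {
    rx: mpsc::Receiver<String>,
}

impl PendingBuffer {
    fn resolve(self) -> String {
        self.rx.recv().unwrap() // blocks only at the point of first real use
    }
}

// Returns without waiting; the upload proceeds on the feeder thread.
fn upload_async(
    tx: &mpsc::Sender<(String, mpsc::Sender<String>)>,
    name: &str,
) -> PendingBuffer {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send((name.to_string(), reply_tx)).unwrap();
    PendingBuffer { rx: reply_rx }
}

fn run_feeder(rx: mpsc::Receiver<(String, mpsc::Sender<String>)>) {
    for (name, reply) in rx {
        // Real code: batch requests, one Submit, fence; the completion
        // callback would mark the buffer ready in the interlocking system.
        let _ = reply.send(format!("gpu:{name}"));
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let feeder = thread::spawn(move || run_feeder(rx));
    let pending = upload_async(&tx, "tree.ktx2"); // returns immediately
    // ... application is free to record a render referencing `pending` ...
    println!("{}", pending.resolve());
    drop(tx);
    feeder.join().unwrap();
}
```

In real WGPU the resolve step would have to live inside the renderer rather than in application code, which is exactly the "(somehow) fed to WGPU interlocking" part that makes this the complicated way.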