Comments (4)
Bypassing WGPU and using a transfer queue that WGPU doesn't see.
That's a fairly feasible option in the interim. For Vulkan, for example, you would:
- Create your own vulkan device/queue/transfer queue
- Use Adapter::create_device_from_hal() to create your regular wgpu Device/Queue
- Create a background thread, give it the wgpu::Device and transfer queue, and create a vulkan command pool and fence
- Background thread allocates a Vulkan buffer for staging (problem: you need access to wgpu's memory allocator, or else you must manage allocations yourself), and a second buffer that's device-local.
- Background thread loads an asset from disk, and decompresses it into the staging buffer
- Background thread creates a command buffer from the pool and records a vkCmdCopyBuffer from staging to device local (it might be a good idea to load several assets at once, and use multiple copy commands per command buffer, otherwise you'll have a ton of submits and individual command buffers which is bad for performance, but I'm not sure how engines tend to structure this)
- Background thread submits the command buffer to the transfer queue, along with the fence
- Background thread waits for the fence to signal
- Background thread calls wgpu::hal::vulkan::Device::buffer_from_raw() and wgpu::Device::create_buffer_from_hal() on the device-local buffer to create a wgpu::Buffer (important note: dropping the buffer does not free the GPU memory like it normally does, so you'll have to free it manually yourself. Also, the buffer can't be mapped, but that's not relevant to you.)
- Background thread sends the finished buffer to the main thread or whatever needs it via a channel
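The thread-and-channel structure of the steps above can be sketched in plain Rust. This is a simulation only: the actual Vulkan work (buffer allocation, vkCmdCopyBuffer, vkQueueSubmit, fence wait, and the wgpu-hal interop calls) is indicated in comments, and names like `spawn_loader` and `GpuBuffer` are hypothetical stand-ins, not wgpu API.

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for a device-local wgpu::Buffer produced via the hal interop
// path (buffer_from_raw() / create_buffer_from_hal()). In the real flow,
// dropping it does NOT free the GPU memory; that must be done manually.
#[derive(Debug, PartialEq)]
struct GpuBuffer {
    label: String,
    size: usize,
}

// Background loader thread: receives (asset name, size) requests, performs
// the upload, and sends finished buffers back over a channel.
fn spawn_loader(
    asset_rx: mpsc::Receiver<(String, usize)>,
    done_tx: mpsc::Sender<GpuBuffer>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for (name, size) in asset_rx {
            // 1. Allocate a host-visible staging buffer and a device-local
            //    buffer (real code: vkCreateBuffer + your own allocator).
            // 2. Load the asset from disk and decompress it into staging.
            // 3. Record vkCmdCopyBuffer(staging -> device_local) into a
            //    command buffer from this thread's command pool.
            // 4. vkQueueSubmit on the transfer queue with a fence, then
            //    vkWaitForFences.
            // 5. Wrap the device-local buffer as a wgpu::Buffer via
            //    Device::create_buffer_from_hal() and hand it off.
            done_tx.send(GpuBuffer { label: name, size }).unwrap();
        }
    })
}

fn main() {
    let (asset_tx, asset_rx) = mpsc::channel();
    let (done_tx, done_rx) = mpsc::channel();
    let loader = spawn_loader(asset_rx, done_tx);

    asset_tx.send(("grass.ktx2".to_string(), 1 << 20)).unwrap();
    drop(asset_tx); // close the request queue so the loader thread exits

    let buf = done_rx.recv().unwrap();
    println!("loaded {} ({} bytes)", buf.label, buf.size);
    loader.join().unwrap();
}
```

The channels are the important part: the command pool, fence, and allocator all live on the loader thread, so the main thread never touches the transfer queue.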
from wgpu.
That's an option, but a desperation one. I'd hate to have to go into the innards of WGPU's allocation system. Also, I'd be giving up macOS support, which is the main point of using WGPU rather than Vulkan directly. I saw a note that the Bevy devs are considering such a bypass. If they do it, I'll have an example to look at.
Because I'm using WGPU via Rend3, I'd need my own version of Rend3, too.
Another alternative is to fork Rend3, rip out the connection to WGPU, and replace that with Vulkano. That would be a relatively clean and safe Rust solution. Rend3 has a well-designed, clean API, and that's worth retaining.
These are all ugly hacks. Better if transfer queues are implemented inside WGPU, where they belong.
Here's the proposed Bevy workaround for this problem: a plan to bypass WGPU and go directly from Bevy to Vulkan. Comment from the Bevy issue: "This is a bit hacky, and relying on globals in the form of static OnceLock-ed variables, but may be reasonable until wgpu supports multiple queues."
JMS55 commented: "Background thread creates a command buffer from the pool and records a vkCmdCopyBuffer from staging to device local (it might be a good idea to load several assets at once, and use multiple copy commands per command buffer, otherwise you'll have a ton of submits and individual command buffers which is bad for performance, but I'm not sure how engines tend to structure this)"
That raises a good question, regardless of where this is implemented: how expensive is Submit? Is it expensive enough on transfer queues that minimizing Submit operations is worth the effort?
There are at least two ways to approach this:
Simple way:
- Application makes a request to put a texture into the GPU. Multiple threads may be making such requests concurrently.
- A GPU buffer is allocated for the texture.
- The buffer is passed to the transfer-queue feeding thread (needed because of the one-thread-per-command-queue limit).
- The application thread blocks waiting for completion.
- The transfer-queue feeding thread submits everything on its queue to the GPU as one Submit.
- The transfer-queue feeding thread fences and waits for completion callbacks.
- On completion, the application thread is unblocked and gets a fully loaded handle to the asset in the GPU.
This depends on Submit being reasonably fast compared to, say, loading a 1 MB texture.
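The simple way can be sketched with channels: each application thread sends a request carrying a reply channel and blocks on it, while the feeder thread coalesces everything pending into one batch per Submit. This is a minimal sketch under those assumptions; `UploadRequest`, `run_feeder`, and `upload_blocking` are hypothetical names, and the actual Submit plus fence wait is only a comment.

```rust
use std::sync::mpsc;
use std::thread;

// A request carries a reply channel; the caller blocks on it, which models
// "application thread is blocked waiting for completion".
struct UploadRequest {
    name: String,
    reply: mpsc::Sender<String>, // stand-in for a handle to the GPU asset
}

// Feeder thread: drain everything queued, then do ONE Submit for the whole
// batch (real code: one vkQueueSubmit with many copy commands, then wait
// on a fence before replying).
fn run_feeder(rx: mpsc::Receiver<UploadRequest>) {
    while let Ok(first) = rx.recv() {
        let mut batch = vec![first];
        while let Ok(req) = rx.try_recv() {
            batch.push(req); // coalesce all currently pending requests
        }
        // --- one Submit + fence wait for the whole batch goes here ---
        for req in batch {
            let handle = format!("gpu:{}", req.name);
            let _ = req.reply.send(handle); // unblock the application thread
        }
    }
}

// Called from any application thread; returns only once the upload is done.
fn upload_blocking(tx: &mpsc::Sender<UploadRequest>, name: &str) -> String {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(UploadRequest { name: name.to_string(), reply: reply_tx }).unwrap();
    reply_rx.recv().unwrap() // blocks until the feeder's fence has signaled
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let feeder = thread::spawn(move || run_feeder(rx));
    let handle = upload_blocking(&tx, "rock.ktx2");
    println!("{handle}");
    drop(tx); // shut the feeder down
    feeder.join().unwrap();
}
```

The batching falls out naturally: while one batch is in flight, new requests pile up in the channel and all go into the next Submit.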
Complicated way:
- Application makes a request to put a texture into the GPU. Multiple threads may be making such requests concurrently.
- A GPU buffer is allocated for the texture.
- The buffer is passed to the transfer-queue feeding thread (needed because of the one-thread-per-command-queue limit).
- The application thread gets control back immediately, with WGPU's version of a future referencing a buffer not yet loaded into the GPU.
- The application thread is free to use its "handle" in a render request. WGPU interlocking (?) prevents use of the asset before it is loaded.
- The transfer-queue feeding thread submits everything on its queue to the GPU as one Submit.
- The transfer-queue feeding thread fences and waits for completion callbacks.
- Completion callbacks are (somehow) fed into WGPU's interlocking system.
This potentially has higher performance, especially for single-threaded programs where the asset-loading requests come from the same thread that does the renders. It's unclear if the added complexity is worth it.
I'd be fine with the simple approach, unless Submit is really slow.
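The difference between the two approaches is where the wait happens. In the complicated way, the caller gets a future-like handle back immediately and only something that actually needs the buffer waits on it, which is what WGPU's internal interlocking would do. A minimal sketch of that handle, with hypothetical names (`PendingBuffer`, `upload_async`) and the GPU work reduced to comments:

```rust
use std::sync::mpsc;
use std::thread;

// A future-like handle: the application gets it back immediately and may
// hand it to a render request. Whatever actually needs the buffer calls
// resolve(), which stands in for wgpu-internal interlocking on first use.
struct PendingBuffer {
    rx: mpsc::Receiver<String>,
}

impl PendingBuffer {
    fn resolve(self) -> String {
        self.rx.recv().unwrap() // blocks only at the point of first real use
    }
}

// Returns without waiting; the upload proceeds on the feeder thread.
fn upload_async(
    tx: &mpsc::Sender<(String, mpsc::Sender<String>)>,
    name: &str,
) -> PendingBuffer {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send((name.to_string(), reply_tx)).unwrap();
    PendingBuffer { rx: reply_rx }
}

fn run_feeder(rx: mpsc::Receiver<(String, mpsc::Sender<String>)>) {
    for (name, reply) in rx {
        // Real code: batch requests, one Submit, fence; the completion
        // callback would mark the buffer ready in the interlocking system.
        let _ = reply.send(format!("gpu:{name}"));
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let feeder = thread::spawn(move || run_feeder(rx));
    let pending = upload_async(&tx, "tree.ktx2"); // returns immediately
    // ... application is free to record a render referencing `pending` ...
    println!("{}", pending.resolve());
    drop(tx);
    feeder.join().unwrap();
}
```

In real WGPU the resolve step would have to live inside the renderer rather than in application code, which is exactly the "(somehow) fed to WGPU interlocking" part that makes this the complicated way.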