New render modes + winit perfs #8

Open
Speedy37 wants to merge 13 commits into DGriffin91:main from Speedy37:clean

@Speedy37
Contributor

  • Optimize winit integration by removing a full clear and blit per frame

    I found out that most of the time was spent in these 3 steps:

    • clearing the frame
    • copying the canvas to the frame
    • presenting the frame (another copy)

    As far as I understand, the clear and the canvas copy are pointless if we use the framebuffer as the canvas, which is what the winit implementation now does.

    EguiSoftwareRender no longer holds a canvas; a new EguiSoftwareRenderCanvas struct does that instead.

    Another optimisation I found while doing this is to only present the dirty/damaged zone of the framebuffer. So render now returns a DirtyRect representing the damaged zone that needs to be presented.

    This improves winit frame times by a lot!
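To illustrate the dirty-rectangle idea, here is a minimal sketch, not the PR's actual code. The `DirtyRect` fields, the `EMPTY` constant and the `copy_dirty` helper are all assumptions made up for this example; only the name `DirtyRect` comes from the description above.

```rust
/// Hypothetical damage rectangle in pixel coordinates.
/// Half-open: `max_x` and `max_y` are exclusive.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DirtyRect {
    pub min_x: u32,
    pub min_y: u32,
    pub max_x: u32,
    pub max_y: u32,
}

impl DirtyRect {
    /// An empty rectangle: nothing needs to be presented.
    pub const EMPTY: DirtyRect = DirtyRect {
        min_x: u32::MAX,
        min_y: u32::MAX,
        max_x: 0,
        max_y: 0,
    };

    pub fn is_empty(&self) -> bool {
        self.min_x >= self.max_x || self.min_y >= self.max_y
    }

    /// Smallest rectangle that covers both `self` and `other`.
    pub fn union(self, other: DirtyRect) -> DirtyRect {
        if self.is_empty() {
            return other;
        }
        if other.is_empty() {
            return self;
        }
        DirtyRect {
            min_x: self.min_x.min(other.min_x),
            min_y: self.min_y.min(other.min_y),
            max_x: self.max_x.max(other.max_x),
            max_y: self.max_y.max(other.max_y),
        }
    }
}

/// Copies only the damaged pixels from `src` into `dst`.
/// Both buffers are row-major, `width` pixels wide, one `u32` per pixel.
pub fn copy_dirty(src: &[u32], dst: &mut [u32], width: u32, rect: DirtyRect) {
    if rect.is_empty() {
        return;
    }
    let w = width as usize;
    for y in rect.min_y as usize..rect.max_y as usize {
        let start = y * w + rect.min_x as usize;
        let end = y * w + rect.max_x as usize;
        dst[start..end].copy_from_slice(&src[start..end]);
    }
}
```

With a rectangle like this, a frame in which only a small widget changed touches a few rows of the buffer instead of every pixel.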

  • Two new caching modes Mesh and TiledMesh

    With this new DirtyRect logic, I wondered how fast it would be to simply redraw the zone that needs redrawing, without caching the rendered output.

    First I did the Mesh mode, which simply caches meshes to generate the DirtyRect and renders any primitive bounding box. Cache lookup is the same as before: final meshes are prepared for cache lookup.

    Then I wondered if I could optimize it a bit more, with a new TiledMesh mode. This mode computes a set of non-overlapping bounding boxes, extended to tile limits so that there are not too many of them. Primitives are then rendered once for each intersection with this set of bounding boxes. While writing this, I wondered if seams would appear, since this effectively renders primitive meshes in multiple steps, but visually it looks good, on my machine at least.
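One way to get non-overlapping boxes aligned to tile limits is sketched below. This is an assumption about the general technique, not the PR's implementation: the tile size, the `Rect` type and all function names are hypothetical. Dirty rectangles are snapped to a tile grid, the touched tiles are collected in a set (so overlaps disappear by construction), and horizontal runs of tiles are merged to keep the box count low.

```rust
use std::collections::BTreeSet;

/// Hypothetical tile size in pixels; the real value is not stated in the PR.
pub const TILE: u32 = 64;

/// Axis-aligned pixel rectangle, half-open on the max side.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub min_x: u32,
    pub min_y: u32,
    pub max_x: u32,
    pub max_y: u32,
}

/// Returns the (row, column) coordinates of every tile touched by any rect.
pub fn dirty_tiles(rects: &[Rect]) -> BTreeSet<(u32, u32)> {
    let mut tiles = BTreeSet::new();
    for r in rects {
        if r.min_x >= r.max_x || r.min_y >= r.max_y {
            continue; // skip empty rectangles
        }
        for ty in r.min_y / TILE..=(r.max_y - 1) / TILE {
            for tx in r.min_x / TILE..=(r.max_x - 1) / TILE {
                tiles.insert((ty, tx));
            }
        }
    }
    tiles
}

/// Merges horizontally adjacent tiles of the same row into one box.
/// The result is a set of non-overlapping, tile-aligned boxes.
pub fn tiles_to_boxes(tiles: &BTreeSet<(u32, u32)>) -> Vec<Rect> {
    let mut boxes: Vec<Rect> = Vec::new();
    // The set iterates row by row, left to right.
    for &(ty, tx) in tiles {
        let (x0, y0) = (tx * TILE, ty * TILE);
        if let Some(last) = boxes.last_mut() {
            if last.min_y == y0 && last.max_x == x0 {
                last.max_x = x0 + TILE; // extend the current run
                continue;
            }
        }
        boxes.push(Rect { min_x: x0, min_y: y0, max_x: x0 + TILE, max_y: y0 + TILE });
    }
    boxes
}

/// Intersection of two rectangles, or `None` if they do not overlap.
pub fn intersect(a: Rect, b: Rect) -> Option<Rect> {
    let r = Rect {
        min_x: a.min_x.max(b.min_x),
        min_y: a.min_y.max(b.min_y),
        max_x: a.max_x.min(b.max_x),
        max_y: a.max_y.min(b.max_y),
    };
    (r.min_x < r.max_x && r.min_y < r.max_y).then_some(r)
}
```

A renderer built on this would clip each primitive's bounding box against every box with `intersect` and rasterize only the non-empty results, which is why a single primitive may be drawn in several steps.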

  • egui::Mesh::clone() removed

    By changing the render API from &[ClippedPrimitive] to Vec<ClippedPrimitive>, I was able to remove the egui::Mesh::clone() that was required before.

    In most cases render will be called with the output of egui_context.tessellate, which makes this a perfect fit. And if a clone of the whole Vec is required for some reason, it is the same amount of work as before.
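The difference between the borrowing and the owning signature can be shown with simplified stand-in types. `Mesh`, `ClippedPrimitive` and `MeshCache` below are hypothetical placeholders for this sketch, not egui's real definitions or the crate's cache.

```rust
/// Simplified stand-in for a tessellated mesh (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub struct Mesh {
    pub vertices: Vec<f32>,
}

/// Simplified stand-in for a clipped primitive (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub struct ClippedPrimitive {
    pub mesh: Mesh,
}

/// Hypothetical cache that keeps the meshes of the last frame.
#[derive(Default)]
pub struct MeshCache {
    pub meshes: Vec<Mesh>,
}

impl MeshCache {
    /// Borrowing API: to keep a mesh, the cache has to clone it.
    pub fn store_borrowed(&mut self, primitives: &[ClippedPrimitive]) {
        self.meshes = primitives.iter().map(|p| p.mesh.clone()).collect();
    }

    /// Owning API: meshes are moved into the cache, so no vertex data is copied.
    pub fn store_owned(&mut self, primitives: Vec<ClippedPrimitive>) {
        self.meshes = primitives.into_iter().map(|p| p.mesh).collect();
    }
}
```

A caller that still needs the primitives afterwards can pass `primitives.clone()` to the owning version, which costs the same as the clone the borrowing version performed internally.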

  • SoftwareBackend reworked

    I reworked the API exposed by the winit SoftwareBackend.

    • is_capture_frame_time and set_capture_frame_time are removed; frame_time is now always captured, as it only costs 2 Instant::now() calls, so really not much.

    • stats() -> &RenderStats is now exposed.

    • caching and set_caching read and change the caching mode live. The winit example uses them.

    • clear_cache clears the cache and reclaims memory; this causes the next frame to redraw everything.
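For a sense of how little always-on frame timing costs, here is a small sketch with exactly two `Instant::now()` calls per frame. `FrameTimer` and its methods are hypothetical names for this example and are not part of the SoftwareBackend API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper that records how long the last frame took.
#[derive(Default)]
pub struct FrameTimer {
    last_frame_time: Duration,
}

impl FrameTimer {
    /// Runs `render` and records its duration using two clock reads.
    pub fn measure<R>(&mut self, render: impl FnOnce() -> R) -> R {
        let start = Instant::now();
        let out = render();
        self.last_frame_time = Instant::now() - start;
        out
    }

    /// Duration of the most recently measured frame.
    pub fn last_frame_time(&self) -> Duration {
        self.last_frame_time
    }
}
```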

  • RasterStats interior mutability, to allow &self usage when possible

    While doing all this work I mostly left the raster_stat feature as a problem for later me. When I tried to turn it back on, it forced &self to become &mut self in too many places for my taste, and I couldn't find a good way around that.

    So the fix was to use interior mutability, via AtomicU32 for f32 storage and egui::Mutex for rasterisation stats. I split the RasterStats struct in two parts: RenderStats, which contains RasterStats, with a "nice" API for start_raster. I added a few stats for the new render modes.

    Even though start_raster would compile with rayon after these changes, a mutex is involved, so there is no point in trying to collect these stats with the rayon feature. It is still gated behind #[cfg(not(feature = "rayon"))].
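The pattern described here, storing an `f32` in an `AtomicU32` and putting the remaining counters behind a mutex, can be sketched as follows. This is an illustration only: it uses `std::sync::Mutex` to stay self-contained (the PR mentions `egui::Mutex`), and `AtomicF32`, `RasterCounters`, `Stats` and `record_triangle` are hypothetical names, not the crate's types.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

/// An `f32` that can be updated through `&self`, stored as its bit pattern.
#[derive(Default)]
pub struct AtomicF32(AtomicU32);

impl AtomicF32 {
    pub fn load(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }

    pub fn store(&self, value: f32) {
        self.0.store(value.to_bits(), Ordering::Relaxed);
    }
}

/// Hypothetical rasterisation counters.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct RasterCounters {
    pub triangles: u64,
    pub pixels: u64,
}

/// Hypothetical stats container that only needs `&self` to be updated.
#[derive(Default)]
pub struct Stats {
    pub frame_time_ms: AtomicF32,
    pub raster: Mutex<RasterCounters>,
}

impl Stats {
    /// Records one rasterised triangle covering `pixels` pixels.
    pub fn record_triangle(&self, pixels: u64) {
        let mut raster = self.raster.lock().unwrap();
        raster.triangles += 1;
        raster.pixels += pixels;
    }
}
```

Taking a lock per recorded triangle would serialize parallel rasterisation, which is consistent with keeping the stats disabled when the rayon feature is on.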

  • winit example:

    It compiles with the render_stats feature, displays stats, and allows live render mode selection.

@Speedy37
Contributor Author

Speedy37 commented Jan 25, 2026

This one still sits on top of the other 2 pull requests, but it should now be simpler to review.

@Speedy37 Speedy37 force-pushed the clean branch 2 times, most recently from 929e0f8 to 401e766 Compare January 25, 2026 15:58
@DGriffin91
Owner

Now that the SIMD and CI stuff is in, would you be able to merge in or rebase on main?

@Speedy37
Contributor Author

Speedy37 commented Feb 23, 2026

ok should be a good starting point

Note that all non-direct rendering modes now render and blit only the minimum number of dirty pixels.

@DGriffin91
Owner

Running the winit example with default settings (`cargo run --example winit --release`), I'm getting lots of artifacts on the edges of a window after moving it around.

image

Also if I move the window from one monitor to another, everything is garbled:
image

Similar issues initially when I resize the window:
image

@DGriffin91
Owner

It might be worth adding some dynamic use cases to the test_render/compare_software_render_with_gpu() stuff.

@Speedy37
Contributor Author

Speedy37 commented Feb 23, 2026

Can you try again with the latest commit?

Which OS are you testing on, and at which DPI?

I'm on Windows 11, 4K display, 175% DPI.

Update 1: I can reproduce the issue on Ubuntu/Wayland (it looks like the buffer is not the same every frame there, unlike on Windows :( )

Update 2: Looking at the softbuffer implementation for Wayland, it swaps between two buffers, and damage_rect may be ignored if compositor support is not there.

@Speedy37
Contributor Author

OK, I have to rework this so that it respects https://docs.rs/softbuffer/0.4.8/softbuffer/struct.Buffer.html#method.age

For now, this PR only works on win32, web, x11 and orbital.

  • macOS and Android: age is always 0
  • Wayland and KMS: age is 1 or 2 (front/back buffers)

I see a way to handle age = 2 without falling back to the extra full copy, by adding an age concept to the cache.
But age always being 0 is a pain point:

  • the Android implementation uses an internal buffer that is copied when presenting, so age could be 1 quite easily; I don't see any reason why this is not the case.
  • the macOS implementation generates a new buffer every frame.
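A common way to honour a buffer age is to keep a short history of per-frame damage and repaint the union of everything the buffer has missed. The sketch below illustrates that general technique under the assumption that age 0 means unknown contents and age n means the buffer holds the frame presented n frames ago. `DamageHistory`, `Repaint` and `Rect` are hypothetical names; this is not how the PR's cache is actually structured.

```rust
use std::collections::VecDeque;

/// Axis-aligned pixel rectangle, half-open on the max side.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub min_x: u32,
    pub min_y: u32,
    pub max_x: u32,
    pub max_y: u32,
}

/// Union of two optional damage regions (`None` means "nothing changed").
fn union(a: Option<Rect>, b: Option<Rect>) -> Option<Rect> {
    match (a, b) {
        (Some(a), Some(b)) => Some(Rect {
            min_x: a.min_x.min(b.min_x),
            min_y: a.min_y.min(b.min_y),
            max_x: a.max_x.max(b.max_x),
            max_y: a.max_y.max(b.max_y),
        }),
        (r, None) | (None, r) => r,
    }
}

/// What has to be redrawn into the buffer handed out for this frame.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Repaint {
    Full,
    Partial(Option<Rect>),
}

/// Damage of previous frames, most recent first.
pub struct DamageHistory {
    past: VecDeque<Option<Rect>>,
    capacity: usize,
}

impl DamageHistory {
    pub fn new(capacity: usize) -> Self {
        Self { past: VecDeque::new(), capacity }
    }

    /// A buffer of age n misses the n - 1 most recent past frames
    /// in addition to the current frame's damage.
    pub fn repaint_for(&self, age: usize, current: Option<Rect>) -> Repaint {
        if age == 0 || age - 1 > self.past.len() {
            return Repaint::Full; // unknown contents or not enough history
        }
        let missed = self.past.iter().take(age - 1);
        Repaint::Partial(missed.fold(current, |acc, d| union(acc, *d)))
    }

    /// Records the current frame's damage once it has been presented.
    pub fn push(&mut self, current: Option<Rect>) {
        self.past.push_front(current);
        self.past.truncate(self.capacity);
    }
}
```

With double buffering (age 2) this repaints the union of the previous and the current damage, which avoids a full copy. A backend that always reports age 0 ends up in the `Full` branch every frame, which is the pain point described above.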

@DGriffin91
Owner

DGriffin91 commented Feb 25, 2026

> Which OS are you testing, and which DPI ?

I was testing on Linux, Wayland, 125% DPI. I see it still has issues with this configuration. (To be expected since you mentioned "pr only works on win32, web, x11, orbital"). I'll have to try out other platforms when I get a chance.

Speedy37 added 2 commits March 3, 2026 00:59
Previous revisions only worked in single-buffering mode (win32, web, x11, orbital).

Tests are run on simulated canvases with no, single, double and triple buffering; there might still be differences with softbuffer. So far it handles softbuffer frame age and frame resize.
@Speedy37
Contributor Author

Speedy37 commented Mar 3, 2026

OK, I've added some basic "dynamic" tests, and added support for no buffering (Android, macOS), double buffering (Wayland, KMS) and triple buffering (rust-windowing/softbuffer#329).

This also allows cached_primitives to be reused when resizing.

@DGriffin91 Can you confirm this is working as expected? I used WSL2 to test Wayland so I might have missed things, but it was "perfect" once the new tests were passing on my side.

I think I'll add internal single buffering for the BufferState::AlwaysZeroed case (which, now that I'm writing this, I think I will rename to AlwaysBlitCanvas), so that performance on Android and macOS is at least on par with what it was before this pull request.

@DGriffin91
Owner

DGriffin91 commented Mar 3, 2026

Nice! Going through testing now. It seems to be working on Windows so far. One thing I noticed (might be related to #11): if I open the Table window in the winit example, then on BlendTiled and MeshTiled the text on the right can get cut off as I move the window around, in ways that it doesn't with Mesh mode.

Update: All the new modes seem to be working on Windows, Linux/Wayland, and M1/macOS so far!

@Speedy37
Contributor Author

Speedy37 commented Mar 6, 2026

About #11, I'm unable to reproduce it on main or on this branch. Any tips?

@AlexanderSchuetz97
Contributor

AlexanderSchuetz97 commented Mar 6, 2026

I can reproduce it with the 0.0.2 release, using the code provided in the issue. As mentioned: Debian 12, Xfce, X11, NVIDIA drivers.

I have seen this on Windows Server before as well, but not with this exact code.

@Speedy37 Speedy37 changed the title New render modes + winit perfs + compilation fixes New render modes + winit perfs Mar 10, 2026
@Speedy37 Speedy37 mentioned this pull request Mar 11, 2026