|
This one still sits on top of the other 2 pull requests, but it should now be simpler to review. |
929e0f8 to
401e766
|
Now that the SIMD and CI stuff is in, would you be able to merge in or rebase on main? |
|
OK, this should be a good starting point. Note that all non-direct rendering modes now render and blit only the minimum number of dirty pixels. |
|
It might be worth adding some dynamic use cases to the |
|
Can you try again with the latest commit? Which OS are you testing on, and at which DPI? I'm on Windows 11, 4K display, 175% DPI.
Update 1: I can reproduce the issue on Ubuntu/Wayland (it looks like the buffer is not the same every frame, unlike on Windows :( ).
Update 2: Looking at the softbuffer implementation for Wayland, it swaps between two buffers, and damage_rect may be ignored if the compositor does not support it. |
|
OK, I have to rework this so that it respects the buffer age (https://docs.rs/softbuffer/0.4.8/softbuffer/struct.Buffer.html#method.age). For now, this PR only works on win32, web, x11 and orbital. On macOS and Android, age is always 0. I see a way to handle
|
I was testing on Linux, Wayland, 125% DPI. I see it still has issues with this configuration. (To be expected since you mentioned "pr only works on win32, web, x11, orbital"). I'll have to try out other platforms when I get a chance. |
Previous revisions only worked in single buffering mode (win32, web, x11, orbital). Tests now run on simulated no, single, double and triple buffered canvases, though there might still be differences from softbuffer. So far it handles softbuffer frame age and frame resize.
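
For readers unfamiliar with buffer age, here is a minimal sketch of how per-frame damage could be combined with the age of the buffer being drawn into. It is not the code from this PR: `Rect`, `DamageHistory` and `region_to_redraw` are hypothetical names, and `age` is assumed to follow the usual convention (0 means the buffer contents are unknown, 1 means the buffer holds the previous frame, 2 the frame before that, and so on).

```rust
use std::collections::VecDeque;

/// Hypothetical pixel rectangle (x1/y1 exclusive), assumed non-empty.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Rect { x0: u32, y0: u32, x1: u32, y1: u32 }

impl Rect {
    /// Smallest rectangle covering both `self` and `o`.
    fn union(self, o: Rect) -> Rect {
        Rect {
            x0: self.x0.min(o.x0),
            y0: self.y0.min(o.y0),
            x1: self.x1.max(o.x1),
            y1: self.y1.max(o.y1),
        }
    }
}

/// Hypothetical history of the damage produced by recent frames, newest first.
struct DamageHistory {
    past: VecDeque<Rect>,
    max_frames: usize,
}

impl DamageHistory {
    fn new(max_frames: usize) -> Self {
        Self { past: VecDeque::new(), max_frames }
    }

    /// Returns the region to redraw in a buffer of the given age,
    /// or `None` when the whole buffer has to be redrawn.
    fn region_to_redraw(&mut self, age: u8, current: Rect) -> Option<Rect> {
        // A buffer of age n is missing the damage of the n - 1 previous frames.
        let missed = (age as usize).saturating_sub(1);
        let region = if age == 0 || missed > self.past.len() {
            None
        } else {
            Some(self.past.iter().take(missed).fold(current, |acc, r| acc.union(*r)))
        };
        // Remember this frame's damage for buffers that come back later.
        self.past.push_front(current);
        self.past.truncate(self.max_frames);
        region
    }
}

fn main() {
    let mut history = DamageHistory::new(3);
    let a = Rect { x0: 0, y0: 0, x1: 10, y1: 10 };
    let b = Rect { x0: 20, y0: 20, x1: 30, y1: 30 };
    let c = Rect { x0: 5, y0: 5, x1: 8, y1: 8 };
    // Double buffering: both buffers start with unknown contents (age 0).
    println!("{:?}", history.region_to_redraw(0, a));
    println!("{:?}", history.region_to_redraw(0, b));
    // The first buffer comes back with age 2: redraw this frame's damage plus the previous one.
    println!("{:?}", history.region_to_redraw(2, c));
}
```

With single buffering the age is always 1 after the first frame, so only the current damage is redrawn; with double or triple buffering the union grows to cover the frames the buffer missed.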
|
OK, I've added some basic "dynamic" tests and added support for no (android, macos), double (wayland, kms) and triple buffering (rust-windowing/softbuffer#329). This also allows for cached_primitives reuse when resizing. @DGriffin91 Can you confirm this is working as expected? I used wsl2 to test wayland so I might have missed things, but it was "perfect" once the new tests were passing on my side. I think I'll add internal single buffering for the |
|
Nice! Going through testing now. It seems to be working on Windows so far. One thing I noticed (might be related to #11) is that if on the winit example I open up the Table window, on
Update: All the new modes seem to be working on Windows, Linux/Wayland, and M1/macOS so far! |
|
About #11, I'm unable to reproduce on main or this branch, any tip? |
|
I can reproduce it with the 0.0.2 release using the code provided in the issue. As mentioned: Debian 12, Xfce, X11, NVIDIA drivers. I have seen this on Windows Server before as well, but not with this exact code. |



Optimize winit integration by removing a full clear and blit per frame
I found out that most of the time was spent in these 3 steps:
- clearing the frame
- copying the canvas to the frame
- presenting the frame (another copy)

From my understanding, there is no point in doing these 3 steps if we use the framebuffer as the canvas, which is what the winit implementation now does.
`EguiSoftwareRender` no longer holds a canvas; a new `EguiSoftwareRenderCanvas` struct does this now.
Another optimisation I found while doing this is to only present the dirty/damaged zone of the framebuffer, so render now returns a `DirtyRect` representing the damaged zone that needs to be presented.
This improves winit frame times by a lot!
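
As a rough illustration of the dirty-rect idea above, the sketch below merges damaged regions into one bounding rectangle and clamps it to the framebuffer size before presenting. The `DirtyRect` fields and methods shown here are assumptions made for this example, not the crate's actual definition.

```rust
/// Hypothetical dirty rectangle in pixel coordinates (max is exclusive).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DirtyRect {
    min_x: u32,
    min_y: u32,
    max_x: u32,
    max_y: u32,
}

impl DirtyRect {
    /// An empty rect: nothing needs to be presented.
    const EMPTY: DirtyRect = DirtyRect { min_x: u32::MAX, min_y: u32::MAX, max_x: 0, max_y: 0 };

    fn is_empty(&self) -> bool {
        self.min_x >= self.max_x || self.min_y >= self.max_y
    }

    /// Smallest rect covering both `self` and `other`.
    fn union(self, other: DirtyRect) -> DirtyRect {
        if self.is_empty() {
            return other;
        }
        if other.is_empty() {
            return self;
        }
        DirtyRect {
            min_x: self.min_x.min(other.min_x),
            min_y: self.min_y.min(other.min_y),
            max_x: self.max_x.max(other.max_x),
            max_y: self.max_y.max(other.max_y),
        }
    }

    /// Clamp the rect to a framebuffer of `width` x `height` pixels.
    fn clamp_to(self, width: u32, height: u32) -> DirtyRect {
        DirtyRect {
            min_x: self.min_x.min(width),
            min_y: self.min_y.min(height),
            max_x: self.max_x.min(width),
            max_y: self.max_y.min(height),
        }
    }
}

fn main() {
    // Two damaged regions from one frame; the second extends past the window edge.
    let a = DirtyRect { min_x: 10, min_y: 10, max_x: 50, max_y: 20 };
    let b = DirtyRect { min_x: 40, min_y: 5, max_x: 900, max_y: 15 };
    let damage = DirtyRect::EMPTY.union(a).union(b).clamp_to(800, 600);
    if !damage.is_empty() {
        // Only this region would be handed to the presentation step.
        println!("{:?}", damage);
    }
}
```

A single bounding rectangle is the simplest choice; it can over-report damage when two small regions sit far apart, which is the trade-off a tiled approach tries to reduce.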
Two new caching modes `Mesh` and `TiledMesh`
With this new `DirtyRect` logic, I was wondering how fast it would be to simply draw the zone that needs to be redrawn, without caching the render.
First I did the `Mesh` mode, simply caching meshes to generate the `DirtyRect` and rendering any primitive bounding box. Cache lookup is the same as before; final meshes are prepared for cache lookup.
Then I was wondering if I could optimize it a bit more, with a new `TiledMesh` mode. This mode computes a set of non-overlapping bounding boxes extended to tile limits so there aren't too many of them. Primitives are now rendered for each intersection with this set of bounding boxes. When writing this, I was wondering if seams would appear, as this effectively renders primitive meshes in multiple steps, but visually it looks good, on my machine at least.

`egui::Mesh::clone()` removed
By changing the render API from `&[ClippedPrimitive]` to `Vec<ClippedPrimitive>` I was able to remove the `egui::Mesh::clone()` that was required before.
In most cases render will be called with the output of `egui_context.tessellate`, making it a perfect fit. And if a clone of the whole vec is required for some reason, it would be the same amount of work as before.

`SoftwareBackend` reworked
I reworked the API exposed by the winit `SoftwareBackend`.
- `is_capture_frame_time` and `set_capture_frame_time` are now removed; frame_time is now always captured, as it only costs 2 `Instant::now()` calls, so really not much.
- `stats() -> &RenderStats` is now exposed.
- `caching` and `set_caching` read and change the caching mode live. The winit example uses them.
- `clear_cache` clears the cache and reclaims memory; this will cause the next frame to redraw everything.

RasterStats inner mutability, to allow &self usage when possible
While doing all this work I mostly left the raster_stat feature as a problem for later me. Well, when I tried to activate it again, it forced `&self` to become `&mut self` in too many places for my taste, and I couldn't find a good way to fix this.
So the fix was to use inner mutability via AtomicU32 for f32 storage and egui::Mutex for rasterisation stats. I split the `RasterStats` struct in two parts: `RenderStats`, which contains `RasterStats`, with a "nice" API for `start_raster`. I added a few stats for the new render modes.
Even if, with these changes, `start_raster` would compile with rayon, a mutex is involved, so there is no point trying to add these stats with the rayon feature; it's still gated behind `#[cfg(not(feature = "rayon"))]`.

winit example:
It compiles with the render_stats feature, displays stats, and allows live render mode selection.
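
The inner-mutability approach for f32 storage described in the RasterStats section above can be sketched as follows. This is only an illustration under assumptions: `AtomicF32` and `FrameStats` are hypothetical names, not types from the crate, and the mutex-protected rasterisation stats are left out.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: an f32 stored as its bit pattern inside an AtomicU32,
/// so it can be updated through `&self`.
struct AtomicF32(AtomicU32);

impl AtomicF32 {
    fn new(value: f32) -> Self {
        Self(AtomicU32::new(value.to_bits()))
    }

    fn load(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }

    fn store(&self, value: f32) {
        self.0.store(value.to_bits(), Ordering::Relaxed);
    }
}

/// Hypothetical stats holder; the real `RenderStats` has more fields.
struct FrameStats {
    frame_time_ms: AtomicF32,
}

impl FrameStats {
    /// Takes `&self`, so callers do not need `&mut` access just to record a stat.
    fn record_frame_time(&self, ms: f32) {
        self.frame_time_ms.store(ms);
    }
}

fn main() {
    let stats = FrameStats { frame_time_ms: AtomicF32::new(0.0) };
    stats.record_frame_time(4.25);
    println!("frame time: {} ms", stats.frame_time_ms.load());
}
```

`Ordering::Relaxed` is enough here on the assumption that the value is a standalone statistic and no other memory access depends on it.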