Fixes#90017Fixes#90030Fixes#98044
This PR makes the following changes:
# Force processing of GPU commands for frame_count frames
The variable `frames_pending_resources_for_processing` is added to track
this.
The ticket #98044 suggested to use `_flush_and_stall_for_all_frames()`
while minimized.
Technically this works and is a viable solution.
However I noticed that this issue was happening because Logic/Physics
continue to work "business as usual" while minimized(\*). Only Graphics
was being deactivated (which caused commands to accumulate until window
is restored).
To continue this behavior of "business as usual", I decided that GPU
work should also "continue as usual" by buffering commands in a double
or triple buffer scheme until all commands are done processing (if they
ever stop coming). This is specially important if the app specifically
intends to keep processing while minimized.
Calling `_flush_and_stall_for_all_frames()` would fix the leak, but it
would make Godot's behavior different while minimized vs while the
window is presenting.
\* `OS::add_frame_delay` _does_ consider being minimized, but it just
throttles CPU usage. Some platforms such as Android completely disable
processing because the higher level code stops being called when the app
goes into background. But this seems like an implementation-detail that
diverges from the rest of the platforms (e.g. Windows, Linux & macOS
continue to process while minimized).
# Rename p_swap_buffers for p_present
**This is potentially a breaking change** (if it actually breaks
anything, I ignore. But I strongly suspect it doesn't break anything).
"Swap Buffers" is a concept carried from OpenGL, where a frame is "done"
when `glSwapBuffers()` is called, which basically means "present to the
screen".
However it _also_ means that OpenGL internally swaps its internal
buffers in a double/triple buffer scheme (in Vulkan, we do that
ourselves and is tracked by `RenderingDevice::frame`).
Modern APIs like Vulkan differentiate between "submitting GPU work" and
"presenting".
Before this PR, calling `RendererCompositorRD::end_frame(false)` would
literally do nothing. This is often undesired and the cause of the leak.
After this PR, calling `RendererCompositorRD::end_frame(false)` will now
process commands, swap our internal buffers in a double/triple buffer
scheme **but avoid presenting to the screen**.
Hence the rename of the variable from `p_swap_buffers` to `p_present`
(which slightly alters its behavior).
If we want `RendererCompositorRD::end_frame(false)` to do nothing, then
we should not call it at all.
This PR reflects such change: When we're minimized **_and_**
`has_pending_resources_for_processing()` returns false, we don't call
`RendererCompositorRD::end_frame()` at all.
But if `has_pending_resources_for_processing()` returns true, we will
call it, but with `p_present = false` because we're minimized.
There's still the issue that Godot keeps processing work (logic,
scripts, physics) while minimized, which we shouldn't do by default. But
that's work for follow up PR.
Fixes an issue introduced in #96439 (see
https://github.com/godotengine/godot/pull/96439#issuecomment-2447288702)
Godot was relying on Java's
activity.getWindowManager().getDefaultDisplay().getRotation(); to apply
pre-rotation but this is wrong.
First, getRotation() may temporarily return a different value from the
correct one; which is what was causing the splash screen to be upside
down. It would return -90 instead of 90 for the first rendered frame.
But unfortunately, the splash screen is just one frame rendered for a
very long time, so the error lingered for a long time for everyone to
see.
Second, to determine what rotation to use, we should be looking at what
Vulkan told us, which is the value we pass to
VkSurfaceTransformFlagBitsKHR::preTransform.
This commit removes the now-unnecessary
screen_get_internal_current_rotation() function (which was introduced by
#96439) and now saves the preTransform value in the swapchain.
- Adds Swappy for Android for stable frame pacing
- Implements pre-transformed Swapchain so that Godot's compositor is in
charge of rotating the screen instead of Android's compositor
(performance optimization for phones that don't have HW rotator)
============================
The work was performed by collaboration of TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.
Changes from original PR:
- Removed "display/window/frame_pacing/android/target_frame_rate" option
to use Engine::get_max_fps instead.
- Target framerate can be changed at runtime using Engine::set_max_fps.
- Swappy is enabled by default.
- Added documentation.
- enable_auto_swap setting is replaced with swappy_mode.
Also adds a new possible texture layout and API trait to support a particular behavior in D3D12 where only the COMMON layout is supported in copy queues. Fixes#98158.
- Implements asynchronous transfer queues from PR #87590.
- Adds ubershaders that can run with specialization constants specified as push constants.
- Pipelines with specialization constants can compile in the background.
- Added monitoring for pipeline compilations.
- Materials and shaders can now be created asynchronously on background threads.
- Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
PR #90993 added several debugging utilities.
Among them, advanced memory tracking through the use of custom
allocators and VK_EXT_device_memory_report.
However as issue #95967 reveals, it is dangerous to leave it on by
default because drivers (or even the Vulkan loader) can too easily
accidentally break custom allocators by allocating memory through std
malloc but then request us to deallocate it (or viceversa).
This PR fixes the following problems:
- Adds --extra-gpu-memory-tracking cmd line argument
- Adds missing enum entries to
RenderingContextDriverVulkan::VkTrackedObjectType
- Adds RenderingDevice::get_driver_and_device_memory_report
- GDScript users can easily check via print(
RenderingServer.get_rendering_device().get_driver_and_device_memory_report()
)
- Uses get_driver_and_device_memory_report on device lost for appending
further info.
Fixes#95967
Features:
- Debug-only tracking of objects by type. See
get_driver_allocs_by_object_type et al.
- Debug-only Breadcrumb info for debugging GPU crashes and device lost
- Performance report per frame from get_perf_report
- Some VMA calls had to be modified in order to insert the necessary
memory callbacks
Functionality marked as "debug-only" is only available in debug or dev
builds.
Misc fixes:
- Early break optimization in RenderingDevice::uniform_set_create
============================
The work was performed by collaboration of TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.
Enables support for enhanced barriers if available.
Gets rid of the implementation of [CROSS_FAMILY_FALLBACK] in the D3D12 driver. The logic has been reimplemented at a higher level in RenderingDevice itself.
This fallback is only used if the RenderingDeviceDriver reports the API traits and the capability of sharing texture formats correctly. Aliases created in this way can only be used for sampling: never for writing. In most cases, the formats that do not support sharing do not support unordered access/storage writes in the first place.
* Replaces `find(...) != -1` with `contains` for `String`
* Replaces `find(...) == -1` with `!contains` for `String`
* Replaces `find(...) != -1` with `has` for containers
* Replaces `find(...) == -1` with `!has` for containers