Commit graph

29 commits

Author SHA1 Message Date
Aaron Franke
3d1c9fd5de
Move server files into their subfolders 2025-09-30 19:39:39 -07:00
clayjohn
d90332aa0f Avoid attempting to load from shader cache when both the user-dir and res-dir are invalid 2025-09-01 12:25:23 -07:00
Pāvels Nadtočajevs
9c325d0f91
Remove selective shader baking. 2025-07-23 23:02:43 +03:00
Pāvels Nadtočajevs
a8873727ac
[macOS] Selectively bake specific shader variants for MoltenVK. 2025-07-09 20:09:56 +03:00
Dario
5a30a7e7cd Add shader baker to project exporter.
Metal Support contributed by Migeran (https://migeran.com) and Stuart Carnie.

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Gergely Kis <gergely.kis@migeran.com>
2025-05-27 12:45:27 -03:00
Stuart Carnie
09282c316a Renderer: Reduce scope of mutex locks to prevent common deadlocks
Fixes #102877
2025-04-13 06:56:13 +10:00
Thaddeus Crews
324512e11c
Style: Replace header guards with #pragma once 2025-03-07 17:33:47 -06:00
Yufeng Ying
e88e30c273 Remove unused headers in servers.
Co-authored-by: bruvzg <7645683+bruvzg@users.noreply.github.com>
2024-12-20 18:51:01 +08:00
Matias N. Goldberg
c77cbf096b Improvements from TheForge (see description)
The work was performed by collaboration of TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.

This is the most "risky" PR so far because the previous ones have been
miscellaneous stuff aimed at either [improve
debugging](https://github.com/godotengine/godot/pull/90993) (e.g. device
lost), [improve Android
experience](https://github.com/godotengine/godot/pull/96439) (add Swappy
for better Frame Pacing + Pre-Transformed Swapchains for slightly better
performance), or harmless [ASTC
improvements](https://github.com/godotengine/godot/pull/96045) (better
performance by simply toggling a feature when available).

However this PR contains larger modifications aimed at improving
performance or reducing memory fragmentation. With greater
modifications, come greater risks of bugs or breakage.

Changes introduced by this PR:

TBDR GPUs (e.g. most of Android + iOS + M1 Apple) support rendering to
Render Targets that are not backed by actual GPU memory (everything
stays in cache). This works as long as load action isn't `LOAD`, and
store action must be `DONT_CARE`. This saves VRAM (it also makes
painfully obvious when a mistake introduces a performance regression).
Of particular usefulness is when doing MSAA and keeping the raw MSAA
content is not necessary.

Some GPUs get faster when the sampler settings are hard-coded into the
GLSL shaders (instead of being dynamically bound at runtime). This
required changes to the GLSL shaders, PSO creation routines, Descriptor
creation routines, and Descriptor binding routines.

 - `bool immutable_samplers_enabled = true`

Setting it to false enforces the old behavior. Useful for debugging bugs
and regressions.

Immutable samplers requires that the samplers stay... immutable, hence
this boolean is useful if the promise gets broken. We might want to turn
this into a `GLOBAL_DEF` setting.

Instead of creating dozen/hundreds/thousands of `VkDescriptorSet` every
frame that need to be freed individually when they are no longer needed,
they all get freed at once by resetting the whole pool. Once the whole
pool is no longer in use by the GPU, it gets reset and its memory
recycled. Descriptor sets that are created to be kept around for longer
or forever (i.e. not created and freed within the same frame) **must
not** use linear pools. There may be more than one pool per frame. How
many pools per frame Godot ends up with depends on its capacity, and
that is controlled by
`rendering/rendering_device/vulkan/max_descriptors_per_pool`.

- **Possible improvement for later:** It should be possible for Godot
to adapt to how many descriptors per pool are needed on a per-key basis
(i.e. grow their capacity like `std::vector` does) after rendering a few
frames; which would be better than the current solution of having a
single global value for all pools (`max_descriptors_per_pool`) that the
user needs to tweak.

 - `bool linear_descriptor_pools_enabled = true`

Setting it to false enforces the old behavior. Useful for debugging bugs
and regressions.
Setting it to false is required when workarounding driver bugs (e.g.
Adreno 730).

A ridiculous optimization. Ridiculous because the original code
should've done this in the first place. Previously Godot was doing the
following:

  1. Create a command buffer **pool**. One per frame.
  2. Create multiple command buffers from the pool in point 1.
3. Call `vkBeginCommandBuffer` on the cmd buffer in point 2. This
resets the cmd buffer because Godot requests the
`VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT` flag.
  4. Add commands to the cmd buffers from point 2.
  5. Submit those commands.
6. On frame N + 2, recycle the buffer pool and cmd buffers from pt 1 &
2, and repeat from step 3.

The problem here is that step 3 resets each command buffer individually.
Initially Godot used to have 1 cmd buffer per pool, thus the impact is
very low.

But not anymore (specially with Adreno workarounds to force splitting
compute dispatches into a new cmd buffer, more on this later). However
Godot keeps around a very low amount of command buffers per frame.

The recommended method is to reset the whole pool, to reset all cmd
buffers at once. Hence the new steps would be:

  1. Create a command buffer **pool**. One per frame.
  2. Create multiple command buffers from the pool in point 1.
3. Call `vkBeginCommandBuffer` on the cmd buffer in point 2, which is
already reset/empty (see step 6).
  4. Add commands to the cmd buffers from point 2.
  5. Submit those commands.
6. On frame N + 2, recycle the buffer pool and cmd buffers from pt 1 &
2, call `vkResetCommandPool` and repeat from step 3.

**Possible issues:** @dariosamo added `transfer_worker` which creates a
command buffer pool:

```cpp
transfer_worker->command_pool =
driver->command_pool_create(transfer_queue_family,
RDD::COMMAND_BUFFER_TYPE_PRIMARY);
```

As expected, validation was complaining that command buffers were being
reused without being reset (that's good, we now know Validation Layers
will warn us of wrong use).
I fixed it by adding:

```cpp
void RenderingDevice::_wait_for_transfer_worker(TransferWorker
*p_transfer_worker) {
	driver->fence_wait(p_transfer_worker->command_fence);
	driver->command_pool_reset(p_transfer_worker->command_pool); //
! New line !
```

**Secondary cmd buffers are subject to the same issue but I didn't alter
them. I talked this with Dario and he is aware of this.**
Secondary cmd buffers are currently disabled due to other issues (it's
disabled on master).

 - `bool RenderingDeviceCommons::command_pool_reset_enabled`

Setting it to false enforces the old behavior. Useful for debugging bugs
and regressions.

There's no other reason for this boolean. Possibly once it becomes well
tested, the boolean could be removed entirely.

Adds `command_bind_render_uniform_sets` and
`add_draw_list_bind_uniform_sets` (+ compute variants).

It performs the same as `add_draw_list_bind_uniform_set` (notice
singular vs plural), but on multiple consecutive uniform sets, thus
reducing graph and draw call overhead.

 - `bool descriptor_set_batching = true;`

Setting it to false enforces the old behavior. Useful for debugging bugs
and regressions.

There's no other reason for this boolean. Possibly once it becomes well
tested, the boolean could be removed entirely.

Godot currently does the following:

 1. Fill the entire cmd buffer with commands.
 2. `submit()`
    - Wait with a semaphore for the swapchain.
- Trigger a semaphore to indicate when we're done (so the swapchain
can submit).
 3. `present()`

The optimization opportunity here is that 95% of Godot's rendering is
done offscreen.
Then a fullscreen pass copies everything to the swapchain. Godot doesn't
practically render directly to the swapchain.

The problem with this is that the GPU has to wait for the swapchain to
be released **to start anything**, when we could start *much earlier*.
Only the final blit pass must wait for the swapchain.

TheForge changed it to the following (more complicated, I'm simplifying
the idea):

 1. Fill the entire cmd buffer with commands.
 2. In `screen_prepare_for_drawing` do `submit()`
    - There are no semaphore waits for the swapchain.
    - Trigger a semaphore to indicate when we're done.
3. Fill a new cmd buffer that only does the final blit to the
swapchain.
 4. `submit()`
    - Wait with a semaphore for the submit() from step 2.
- Wait with a semaphore for the swapchain (so the swapchain can
submit).
- Trigger a semaphore to indicate when we're done (so the swapchain
can submit).
 5. `present()`

Dario discovered this problem independently while working on a different
platform.

**However TheForge's solution had to be rewritten from scratch:** The
complexity to achieve the solution was high and quite difficult to
maintain with the way Godot works now (after Übershaders PR).
But on the other hand, re-implementing the solution became much simpler
because Dario already had to do something similar: To fix an Adreno 730
driver bug, he had to implement splitting command buffers. **This is
exactly what we need!**. Thus it was re-written using this existing
functionality for a new purpose.

To achieve this, I added a new argument, `bool p_split_cmd_buffer`, to
`RenderingDeviceGraph::add_draw_list_begin`, which is only set to true
by `RenderingDevice::draw_list_begin_for_screen`.

The graph will split the draw list into its own command buffer.

 - `bool split_swapchain_into_its_own_cmd_buffer = true;`

Setting it to false enforces the old behavior. This might be necessary
for consoles which follow an alternate solution to the same problem.
If not, then we should consider removing it.

PR #90993 added `shader_destroy_modules()` but it was not actually in
use.

This PR adds several places where `shader_destroy_modules()` is called
after initialization to free up memory of SPIR-V structures that are no
longer needed.
2024-12-09 11:49:28 -03:00
Dario
e2c6daf7ef Implement asynchronous transfer queues, thread guards on RenderingDevice. Add ubershaders and rework pipeline caches for Forward+ and Mobile.
- Implements asynchronous transfer queues from PR #87590.
- Adds ubershaders that can run with specialization constants specified as push constants.
- Pipelines with specialization constants can compile in the background.
- Added monitoring for pipeline compilations.
- Materials and shaders can now be created asynchronously on background threads.
- Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
2024-10-02 15:11:58 -03:00
Thaddeus Crews
b37fc1014a
Style: Apply new clang-format changes 2024-09-20 08:09:48 -05:00
Pedro J. Estébanez
f3ef83517a Namespace shader cache files by graphics API 2024-01-31 20:19:12 +01:00
A Thousand Ships
fdd3d36c6d [Servers] Replace ERR_FAIL_COND with ERR_FAIL_NULL where applicable 2023-09-25 18:45:30 +02:00
clayjohn
e970f5249c Add Shader compile groups to RD Shader system
This allows us to specify a subset of variants to compile at load time and conditionally other variants later.

This works seamlessly with shader caching.

Needed to ensure that users only pay the cost for variants they use
2023-07-21 16:42:30 +02:00
Rémi Verschelde
d95794ec8a
One Copyright Update to rule them all
As many open source projects have started doing it, we're removing the
current year from the copyright notice, so that we don't need to bump
it every year.

It seems like only the first year of publication is technically
relevant for copyright notices, and even that seems to be something
that many companies stopped listing altogether (in a version controlled
codebase, the commits are a much better source of date of publication
than a hardcoded copyright statement).

We also now list Godot Engine contributors first as we're collectively
the current maintainers of the project, and we clarify that the
"exclusive" copyright of the co-founders covers the timespan before
opensourcing (their further contributions are included as part of Godot
Engine contributors).

Also fixed "cf." Frenchism - it's meant as "refer to / see".
2023-01-05 13:25:55 +01:00
reduz
746dddc067 Replace most uses of Map by HashMap
* Map is unnecessary and inefficient in almost every case.
* Replaced by the new HashMap.
* Renamed Map to RBMap and Set to RBSet for cases that still make sense
  (order matters) but use is discouraged.

There were very few cases where replacing by HashMap was undesired because
keeping the key order was intended.
I tried to keep those (as RBMap) as much as possible, but might have missed
some. Review appreciated!
2022-05-16 10:37:48 +02:00
Rémi Verschelde
ba2bdc478b
Style: Remove inconsistently used @author docstrings
Each file in Godot has had multiple contributors who co-authored it over the
years, and the information of who was the original person to create that file
is not very relevant, especially when used so inconsistently.

`git blame` is a much better way to know who initially authored or later
modified a given chunk of code, and most IDEs now have good integration to
show this information.
2022-01-04 20:42:50 +01:00
Rémi Verschelde
fe52458154
Update copyright statements to 2022
Happy new year to the wonderful Godot community!
2022-01-03 21:27:34 +01:00
Hugo Locurcio
ba65730cbf
Rename RID's getornull() to get_or_null() 2021-09-29 23:58:02 +02:00
Yuri Roubinsky
9de779344c Makes a clear error message if shader compilation failed 2021-08-16 11:25:20 +03:00
Yuri Roubinsky
63c7d5c330 Removes an internal error report if shader fails compile 2021-08-12 10:18:18 +03:00
reduz
cf3f404d31 Implement Binary Shader Compilation
* Added an extra stage before compiling shader, which is generating a binary blob.
* On Vulkan, this allows caching the SPIRV reflection information, which is expensive to parse.
* On other (future) RenderingDevices, it allows caching converted binary data, such as DXIL or MSL.

This PR makes the shader cache include the reflection information, hence editor startup times are significantly improved.
I tested this well and it appears to work, and I added a lot of consistency checks, but because it includes writing and reading binary information, rare bugs may pop up, so be aware.
There was not much of a choice for storing the reflection information, given shaders can be a lot, take a lot of space and take time to parse.
2021-07-26 08:40:39 -03:00
reduz
0d2e02945b Implement shader caching
* Shader compilation is now cached. Subsequent loads take less than a millisecond.
* Improved game, editor and project manager startup time.
* Editor uses .godot/shader_cache to store shaders.
* Game uses user://shader_cache
* Project manager uses $config_dir/shader_cache
* Options to tweak shader caching in project settings.
* Editor path configuration moved from EditorSettings to new class, EditorPaths, so it can be available early on (before shaders are compiled).
* Reworked ShaderCompilerRD to ensure deterministic shader code creation (else shader may change and cache will be invalidated).
* Added shader compression with SMOLV: https://github.com/aras-p/smol-v
2021-05-31 10:13:09 +02:00
reduz
d3b49c416a Refactor GLSL shader compilation
-Used a more consistent set of keywords for the shader
-Remove all harcoded entry points
-Re-wrote the GLSL shader parser, new system is more flexible. Allows any entry point organization.
-Entry point for sky shaders is now sky().
-Entry point for particle shaders is now process().
2021-04-14 11:37:52 -03:00
reduz
f20999f6fe Rewrote how barriers work for faster rendering
-Added more finegrained control in RenderingDevice API
-Optimized barriers (use less ones for thee same)
-General optimizations
-Shadows render all together unbarriered
-GI can render together with shadows.
-SDFGI can render together with depth-preoass.
-General fixes
-Added GPU detection
2021-02-04 09:42:28 -03:00
reduz
cdb216f4e4 Added ability to visualize native shaders 2021-01-06 09:40:09 -03:00
Rémi Verschelde
b5334d14f7
Update copyright statements to 2021
Happy new year to the wonderful Godot community!

2020 has been a tough year for most of us personally, but a good year for
Godot development nonetheless with a huge amount of work done towards Godot
4.0 and great improvements backported to the long-lived 3.2 branch.

We've had close to 400 contributors to engine code this year, authoring near
7,000 commit! (And that's only for the `master` branch and for the engine code,
there's a lot more when counting docs, demos and other first-party repos.)

Here's to a great year 2021 for all Godot users 🎆
2021-01-01 20:19:21 +01:00
reduz
2748b9a10d Add support for low-end 3D rendering.
-Reduce number of uniform sets from 6 to 4.
-Remove features in low end mode, in order to reduce the number of texture units fit to 16.
2020-12-07 20:50:57 -03:00
reduz
2787ad65be RenderingServer reorganization 2020-12-04 18:39:46 -03:00
Renamed from servers/rendering/rasterizer_rd/shader_rd.h (Browse further)