Skip to content

feat: transition from compile-time to runtime backend discovery#1448

Open
wbruna wants to merge 2 commits intoleejet:masterfrom
wbruna:sd_runtime_backend
Open

feat: transition from compile-time to runtime backend discovery#1448
wbruna wants to merge 2 commits intoleejet:masterfrom
wbruna:sd_runtime_backend

Conversation

@wbruna
Copy link
Copy Markdown
Contributor

@wbruna wbruna commented Apr 20, 2026

Applies the following parts from #1184 and #1368 :

  • Introduce ggml_extend_backend.hpp for dynamic backend management.
  • Convert backend-specific SD_USE_* preprocessor tests to runtime tests, propagating the backend handler when needed.

Additionally, to make this work with minimal changes:

  • A new function sd_get_default_backend replaces the default backend selection on src/stable-diffusion.cpp and src/upscaler.cpp, preserving the SD_VK_DEVICE env var support.
  • Clean up SD_USE_* defines from CMakeLists.txt (no other build changes).

This is not just a refactor, because it improves device selection a bit:

  • Previously, Vulkan selected device 0 by default, but this was the wrong choice on my system, which has the iGPU as 0 and the discrete card as 1. The new selection algorithm correctly prioritizes the discrete GPU.
  • The upscaler now follows SD_VK_DEVICE too.

Applies the following parts from leejet#1184 and leejet#1368 :

- Introduce ggml_extend_backend.hpp for dynamic backend management.
- Convert backend-specific SD_USE_* preprocessor tests to runtime tests,
  propagating the backend handler when needed.

Additionally, to make this work with minimal changes:

- A new function sd_get_default_backend replaces the default backend
  selection on src/stable-diffusion.cpp and src/upscaler.cpp,
  preserving the SD_VK_DEVICE env var support.
- Clean up SD_USE_* defines from CMakeLists.txt (no other build changes).

This is not just a refactor, because it improves device selection a bit:

- Previously, Vulkan selected device 0 by default, but this was the
  wrong choice on my system, which has the iGPU as 0 and the discrete
  card as 1. The new selection algorithm correctly prioritizes the
  discrete GPU.
- The upscaler now follows SD_VK_DEVICE too.

Co-authored-by: Stéphane du Hamel <stephduh@live.fr>
Co-authored-by: Cyberhan123 <255542417@qq.com>
Comment thread CMakeLists.txt
set(GGML_SYCL ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-narrowing -fsycl")
add_definitions(-DSD_USE_SYCL)
# disable fast-math on host, see:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part (not changed by the PR) is weird: it could make sense to set it for ggml, but for sd.cpp? If so, what will happen when we add support for GGML_BACKEND_DL?

#define __STATIC_INLINE__ static inline
#endif

inline void ggml_backend_load_all_once() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cyberhan123 , why exactly this needs to be inline?

Also: do we really need the #if defined(GGML_BACKEND_DL) guard? As I understand it, for static-backend mode ggml_backend_device_count would return > 0 anyway?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid conflicts with the ggml API, inlining is required.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this is that in a shared library scenario, it's necessary to call ggml_backend_load_all(); or ggml_backend_load_all_from_path externally. In this case, backend all is already greater than 0, so execution is skipped.

Therefore, a more reasonable solution is to call ggml_init_best in example/xxx, not in the library's init_best.

This approach directly exposes ggml_backend_t in the C API, which is actually better.

It's just a compromise I made to adapt to the previous C API chosen for setting the device via strings.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a concrete example, please refer to this link: https://github.com/Cyberhan123/slab.rs/blob/main/crates/slab-ggml/src/lib.rs#L49

To explain, in DLL mode, the ggml diffusion and the executable are not in the same directory, so ggml_load_from_path is required.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am not asking about the dynamic case: I'm asking why this block is skipped with #ifdef for the static one 🙂 Does anything break if we do this?

diff --git a/src/ggml_extend_backend.hpp b/src/ggml_extend_backend.hpp
index 19cf56f..1b7a083 100644
--- a/src/ggml_extend_backend.hpp
+++ b/src/ggml_extend_backend.hpp
@@ -7,33 +7,31 @@
 #include "ggml-backend.h"
 #include "ggml.h"
 
 #ifndef __STATIC_INLINE__
 #define __STATIC_INLINE__ static inline
 #endif
 
 inline void ggml_backend_load_all_once() {
-#if defined(GGML_BACKEND_DL)
     // If the host process already preloaded backends explicitly
     // (for example via ggml_backend_load / ggml_backend_load_all_from_path),
     // do not rescan the default paths again.
     if (ggml_backend_dev_count() > 0) {
         return;
     }
     // In dynamic-backend mode the backend modules are discovered at runtime,
     // so we must load them before asking for the CPU backend or its proc table.
     static std::once_flag once;
     std::call_once(once, []() {
         if (ggml_backend_dev_count() > 0) {
             return;
         }
         ggml_backend_load_all();
     });
-#endif
 }
 
 #if defined(GGML_BACKEND_DL)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be problems with it, because it is read through share libary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I am asking is: if we ever have a situation where a file includes this with GGML_BACKEND_DL defined, and another one without GGML_BACKEND_DL, we'll hit an ODR violation. IMHO, that could be harder to debug than whatever issue comes from ggml_backend_dev_count() possibly returning 0 for a static-backend build (and looking at the ggml code, that would only happen with no backend enabled anyway).

If one translation unit includes the header with GGML_BACKEND_DL
defined, and another includes it without, we'll hit an ODR
violation, which is undefined behavior.

For the current static-backend build, `ggml_backend_dev_count() > 0`
will skip the `ggml_backend_load_all` call anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants