feat: transition from compile-time to runtime backend discovery#1448
feat: transition from compile-time to runtime backend discovery#1448wbruna wants to merge 2 commits intoleejet:masterfrom
Conversation
Applies the following parts from leejet#1184 and leejet#1368 : - Introduce ggml_extend_backend.hpp for dynamic backend management. - Convert backend-specific SD_USE_* preprocessor tests to runtime tests, propagating the backend handler when needed. Additionally, to make this work with minimal changes: - A new function sd_get_default_backend replaces the default backend selection on src/stable-diffusion.cpp and src/upscaler.cpp, preserving the SD_VK_DEVICE env var support. - Clean up SD_USE_* defines from CMakeLists.txt (no other build changes). This is not just a refactor, because it improves device selection a bit: - Previously, Vulkan selected device 0 by default, but this was the wrong choice on my system, which has the iGPU as 0 and the discrete card as 1. The new selection algorithm correctly prioritizes the discrete GPU. - The upscaler now follows SD_VK_DEVICE too. Co-authored-by: Stéphane du Hamel <stephduh@live.fr> Co-authored-by: Cyberhan123 <255542417@qq.com>
| set(GGML_SYCL ON) | ||
| set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-narrowing -fsycl") | ||
| add_definitions(-DSD_USE_SYCL) | ||
| # disable fast-math on host, see: |
There was a problem hiding this comment.
This part (not changed by the PR) is weird: it could make sense to set it for ggml, but for sd.cpp? If so, what will happen when we add support for GGML_BACKEND_DL?
| #define __STATIC_INLINE__ static inline | ||
| #endif | ||
|
|
||
| inline void ggml_backend_load_all_once() { |
There was a problem hiding this comment.
@Cyberhan123 , why exactly this needs to be inline?
Also: do we really need the #if defined(GGML_BACKEND_DL) guard? As I understand it, for static-backend mode ggml_backend_device_count would return > 0 anyway?
There was a problem hiding this comment.
To avoid conflicts with the ggml API, inlining is required.
There was a problem hiding this comment.
The reason for this is that in a shared library scenario, it's necessary to call ggml_backend_load_all(); or ggml_backend_load_all_from_path externally. In this case, backend all is already greater than 0, so execution is skipped.
Therefore, a more reasonable solution is to call ggml_init_best in example/xxx, not in the library's init_best.
This approach directly exposes ggml_backend_t in the C API, which is actually better.
It's just a compromise I made to adapt to the previous C API chosen for setting the device via strings.
There was a problem hiding this comment.
For a concrete example, please refer to this link: https://github.com/Cyberhan123/slab.rs/blob/main/crates/slab-ggml/src/lib.rs#L49
To explain, in DLL mode, the ggml diffusion and the executable are not in the same directory, so ggml_load_from_path is required.
There was a problem hiding this comment.
Actually, I am not asking about the dynamic case: I'm asking why this block is skipped with #ifdef for the static one 🙂 Does anything break if we do this?
diff --git a/src/ggml_extend_backend.hpp b/src/ggml_extend_backend.hpp
index 19cf56f..1b7a083 100644
--- a/src/ggml_extend_backend.hpp
+++ b/src/ggml_extend_backend.hpp
@@ -7,33 +7,31 @@
#include "ggml-backend.h"
#include "ggml.h"
#ifndef __STATIC_INLINE__
#define __STATIC_INLINE__ static inline
#endif
inline void ggml_backend_load_all_once() {
-#if defined(GGML_BACKEND_DL)
// If the host process already preloaded backends explicitly
// (for example via ggml_backend_load / ggml_backend_load_all_from_path),
// do not rescan the default paths again.
if (ggml_backend_dev_count() > 0) {
return;
}
// In dynamic-backend mode the backend modules are discovered at runtime,
// so we must load them before asking for the CPU backend or its proc table.
static std::once_flag once;
std::call_once(once, []() {
if (ggml_backend_dev_count() > 0) {
return;
}
ggml_backend_load_all();
});
-#endif
}
#if defined(GGML_BACKEND_DL)There was a problem hiding this comment.
There may be problems with it, because it is read through share libary.
There was a problem hiding this comment.
The reason I am asking is: if we ever have a situation where a file includes this with GGML_BACKEND_DL defined, and another one without GGML_BACKEND_DL, we'll hit an ODR violation. IMHO, that could be harder to debug than whatever issue comes from ggml_backend_dev_count() possibly returning 0 for a static-backend build (and looking at the ggml code, that would only happen with no backend enabled anyway).
If one translation unit includes the header with GGML_BACKEND_DL defined, and another includes it without, we'll hit an ODR violation, which is undefined behavior. For the current static-backend build, `ggml_backend_dev_count() > 0` will skip the `ggml_backend_load_all` call anyway.
Applies the following parts from #1184 and #1368 :
Additionally, to make this work with minimal changes:
This is not just a refactor, because it improves device selection a bit: