[GPU] fix limited GV un init problem #819
Open
BI71317 wants to merge 1 commit into
As I am not sure whether this PR is acceptable, I have not added kernel test cases. If the PR is acceptable, I will also work out test cases for the kernel GV examples and add them to this branch.
This PR partially fixes #781 and fixes #818.
What does this PR do
This PR updates the constant propagation pass so that global constants initialized through simple scalar cast/constructor patterns can also be folded.
Previously, CIR only considered literal scalar constants as valid global constants:
With this change, the pass also accepts a narrow CallInstr pattern where the global value is initialized through a single-argument type.__new__ call, such as float32(pi):
Motivation
While investigating the generated LLVM IR and PTX, I noticed that literal global constants were already being inlined into GPU kernels, but cast global constants were not.
For example, in the math module, math.pi is a literal scalar constant, while math.pi32 is initialized as something equivalent to float32(math.pi). Even though both are initialized from constants, only math.pi was folded and inlined into the kernel.
MRE
Result
In the generated PTX, only pi32 remained as an uninitialized global variable:
Looking at the LLVM IR, both pi and pi32 are emitted as globals initialized to zero, and their actual values are assigned inside the math import initialization function:
Folding Pass
However, the folding pass runs earlier at the CIR level. When the folding pass group is disabled:
When this pass is turned off, both pi and pi32 remain as uninitialized globals in PTX:
This is why I changed the constant propagation logic in the CIR folding pass rather than trying to handle this later in LLVM IR or PTX generation.
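To make the effect of folding concrete, the following is a minimal Python sketch of constant folding over a toy expression tree. It is not the CIR pass: the Const, GlobalRef, and Add classes and the fold_globals and remaining_globals helpers are hypothetical names chosen for illustration only. Under those assumptions, it shows that a global whose value is known at fold time disappears from the kernel body, while one that is not known stays behind as a reference that still depends on runtime initialization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A literal value embedded directly in the expression."""
    value: float


@dataclass(frozen=True)
class GlobalRef:
    """A load from a named global variable (hypothetical toy node)."""
    name: str


@dataclass(frozen=True)
class Add:
    """A binary addition of two sub-expressions."""
    lhs: object
    rhs: object


def fold_globals(expr, known):
    """Replace references to globals whose values are in `known` with constants."""
    if isinstance(expr, GlobalRef):
        return Const(known[expr.name]) if expr.name in known else expr
    if isinstance(expr, Add):
        return Add(fold_globals(expr.lhs, known), fold_globals(expr.rhs, known))
    return expr


def remaining_globals(expr):
    """Collect the names of globals the expression still loads from."""
    if isinstance(expr, GlobalRef):
        return {expr.name}
    if isinstance(expr, Add):
        return remaining_globals(expr.lhs) | remaining_globals(expr.rhs)
    return set()


# Toy kernel body that reads both globals.
kernel_body = Add(GlobalRef("pi"), GlobalRef("pi32"))

# Assumption: only `pi` is recognized as a constant at fold time.
folded = fold_globals(kernel_body, {"pi": 3.141592653589793})
```

In this toy setup pi is treated as known and pi32 is not, which mirrors the behavior described above before this change: the folded body no longer reads pi, but it still reads pi32.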
Scope of the Change
The change is intentionally narrow.
It does not attempt to fold arbitrary function-call initializers.
Previously, only direct scalar constants were accepted.
This PR extends the accepted pattern to simple constructor/cast calls of the form:
where the call:
This covers cases such as:
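As a hedged illustration of the acceptance rule described in this section, here is a minimal Python sketch over a toy initializer model. It is not the CIR API and it does not reproduce the PR's actual code: the Literal and Call classes, the SCALAR_TYPES set, and is_foldable_init are hypothetical, and the sketch assumes the cast argument has already been reduced to a literal.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A direct scalar constant initializer (hypothetical toy node)."""
    value: Union[int, float, bool]


@dataclass(frozen=True)
class Call:
    """A call initializer; `func` is a dotted name such as 'float32.__new__'."""
    func: str
    args: Tuple[object, ...]


# Assumption: an illustrative whitelist of scalar types, not the PR's real list.
SCALAR_TYPES = {"int", "float", "float32", "bool"}


def is_foldable_init(init) -> bool:
    """Return True if a global initializer matches the narrow accepted shapes."""
    # Shape 1: a direct scalar literal (the only shape accepted previously).
    if isinstance(init, Literal):
        return True
    # Shape 2: a single-argument T.__new__ call on a whitelisted scalar type
    # whose argument is itself a literal.
    if isinstance(init, Call):
        type_name, _, method = init.func.partition(".")
        return (
            method == "__new__"
            and type_name in SCALAR_TYPES
            and len(init.args) == 1
            and isinstance(init.args[0], Literal)
        )
    return False


literal_init = Literal(3.141592653589793)
cast_init = Call("float32.__new__", (Literal(3.141592653589793),))
general_call_init = Call("sqrt", (Literal(2.0),))
```

With these definitions, literal_init and cast_init are accepted while general_call_init is rejected, which matches the intent that arbitrary function-call initializers are not folded.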
Additional tests
MRE 1: const in math module
Result
The e32, pi32, and tau32 cases are now handled because they match the scalar cast pattern. The inf, nan, inf32, and nan32 cases are still not handled, since they are initialized through function-call patterns outside the current whitelist. (I am actually not sure what the proper way to express these is.)
MRE 2: user-defined cast globals
Result
Outside these patterns, however, GVs are still left uninitialized, as explained above.
MRE 3: general function-call globals
Result
Questions / Limits
This is a deliberately limited fix for scalar cast/constructor patterns. It does not solve the broader global-variable initialization issue in general.
Also, because the change is made at the CIR constant propagation level, it is not GPU-specific and may affect CPU-side optimization behavior as well.
I would appreciate feedback on whether this is an acceptable direction, or whether this should instead be handled in a GPU-specific lowering path or through a more general global-initialization mechanism.