Sometime, we may get a SGPR-PHI with only one meaningful value from
dominator and undefined in another predecessor. The problem with
such pattern is the SGPR might be overwritten in the undefined
predecessor. The pass added here is to make the SGPR also alive in the
originally dead predecessors.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIFixSGPRLiveness.cpp | ||
---|---|---|
94 | Thanks for point out this, I have refined this part to be more robust. |
I think the code is correct, but this needs a MIR test checking the behaviour of the pass.
My understanding is that this pass only fixes specific liveness problems that are created by D132450. It is not a general purpose pass that could handle any possible pattern of SGPR usage.
llvm/lib/Target/AMDGPU/SIFixSGPRLiveness.cpp | ||
---|---|---|
17 | The implementation does not check that there is a divergent branch, so it would do the same thing for uniform control flow. |
Thank you for your comment @foad. But I don't agree the problem was created by D132450. see the attached sgpr-liveness.ll, a hand-made test can also show the problem (The tricky thing is that the existing structurizer still simplify the IR even it is already structurized, thus it helps hiding the issue, but I don't think it is the expected behavior). Yes the current design of the pass only works for this known pattern. That is just because I don't know if there are any other broken patterns.
llvm/lib/Target/AMDGPU/SIFixSGPRLiveness.cpp | ||
---|---|---|
17 | Sounds true, this is doing things conservative, but I think it may not cause any regression in real case. Practically, such kind of pattern in source program should already been optimized into the single dominate value. The only possible cases that I can think of are either "opt-none" or "created by structurizeCFG". If such pattern is created by structurizeCFG, BB1 will be very likely to be divergent because structurizeCFG was designed to try to skip uniform branching as much as possible. I am not sure if checking the terminator of BB1 against s_cbranch_scc0/scc1 for uniform branch sounds good idea? |
llvm/lib/Target/AMDGPU/SIFixSGPRLiveness.cpp | ||
---|---|---|
17 | I will try to use the idea at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp#L170 to detect divergent branch. |
I think the new version should work correctly in all possible shapes of post-structurized CFG I can think of now. Please let me know if you have any further concern.
The implementation does not check that there is a divergent branch, so it would do the same thing for uniform control flow.