Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
Line 85 ↗ (On Diff #497880): The comment at the start of this test says the following:

```llvm
; This module creates a divergent branch in block Flow2. The branch is
; marked as divergent by the divergence analysis but the condition is
; not. This test ensures that the divergence of the branch is tested,
; not its condition, so that branch is correctly emitted as divergent.
```

It seems that the proposed change completely defeats the intention of this test?
I think for this specific case, we should report %8 as uniform, and the branch should also be uniform. But there seems to be something wrong in the uniformity analysis: if you run opt -passes='print<uniformity>' on the test here, both the condition %8 and the conditional branch br i1 %8, ... are reported as divergent, yet the isUniform() query returns true for the branch instruction. Something must be wrong there; if the condition is divergent, the branch should also be divergent.

Another issue I want to point out is that the uniformity analysis differs subtly from the divergence analysis. There is a comment at https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPURewriteUndefForPHI.cpp#L13 explaining the issue. I think we need to close that gap before switching to the uniformity analysis, otherwise we will regress code generation. The last time we discussed this, I think we wanted some target-specific option in the uniformity analysis to match its behavior with the divergence analysis.
I agree with that assessment, and I found pretty much the same thing myself by comparing with the output of "print<divergence>". The uniformity analysis assumes a phi is divergent if it has an undef argument, but the divergence analysis does not make that assumption. The difference is in the call to PHINode::hasConstantValue() for the divergence analysis versus PHINode::hasConstantOrUndefValue() for the uniformity analysis.
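To make that difference concrete, here is a small standalone sketch of how the two queries treat a phi whose incoming values are one real value plus undef. The Phi and Value types below are toys of my own, not the actual LLVM classes, and the logic is simplified (e.g. self-references are not handled); only the undef-handling contrast is the point.

```cpp
// Toy model of the two PHINode queries: hasConstantValue() treats undef as a
// distinct incoming value, while hasConstantOrUndefValue() skips undef operands.
#include <cassert>
#include <string>
#include <vector>

struct Value {
  std::string Name;
  bool IsUndef = false;
};

struct Phi {
  std::vector<const Value *> Incoming;

  // Simplified analogue of PHINode::hasConstantValue(): every incoming value
  // must be identical; undef counts as just another distinct value.
  const Value *hasConstantValue() const {
    const Value *Common = Incoming.front();
    for (const Value *V : Incoming)
      if (V != Common)
        return nullptr;
    return Common;
  }

  // Simplified analogue of PHINode::hasConstantOrUndefValue(): undef operands
  // are ignored, so one real value plus any number of undefs still qualifies.
  bool hasConstantOrUndefValue() const {
    const Value *Common = nullptr;
    for (const Value *V : Incoming) {
      if (V->IsUndef)
        continue;
      if (Common && Common != V)
        return false;
      Common = V;
    }
    return true;
  }
};

int main() {
  Value V{"%v"}, Undef{"undef", /*IsUndef=*/true};
  Phi P{{&V, &Undef}}; // e.g. %p = phi i32 [ %v, %a ], [ undef, %b ]

  assert(P.hasConstantValue() == nullptr); // distinct operands: no single value
  assert(P.hasConstantOrUndefValue());     // undef ignored: effectively just %v
  // Which of the two queries an analysis consults decides whether such a phi
  // is exempted from being marked divergent at a divergent join.
}
```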
I think we should just fix the uniformity analysis to match the existing behaviour and revisit the issue later. I can have the patch ready by Monday EOD IST.
I think it is reasonable to match the divergence analysis behavior regarding undef.

The other problem, that isUniform() returns true for a divergent branch instruction, makes me wonder: is replacing uses of the divergence analysis with the uniformity analysis one by one really the best way? Although I am optimistic about the quality of the uniformity analysis, I think it may be more helpful to replace all the occurrences of the divergence analysis at once and fix all the bugs uncovered. Ideally, we would end up with very few test changes. The reason is that any single pass may not have enough test coverage; fixing the bugs after switching all the uses at once would make us more confident that we are unlikely to cause regressions. Any different ideas?
I tried replacing the analysis in the AMDGPUUnifyDivergentExitNodes pass, and it causes 20 tests to fail, including a few fatal errors. I can imagine many more failures surfacing after replacing it in the 8 or so other middle-end passes. Replacing all the occurrences in a single patch would be massive. I prefer the one-pass-at-a-time approach (like the approach taken with this patch).
I agree with @gandhi21299 ... it's best to work through the passes one at a time. But maybe we should not enable each change as it happens. We could put the changes behind a command-line option to switch between DA and UA. One single option that switches whatever passes have been updated, until all passes are updated and we are ready to flip the switch permanently.
Do you mean a single option that determines which analysis to use, with all transformations selecting the analysis based on the option value? Something like --enable-uniformity-analysis=1?
I think changing them one at a time is fine; this shouldn't be a long drawn-out process.
Another possible way would be to do the replacement one pass at a time, but only land all the changes together once every pass has switched to the uniformity analysis.
Not exactly like that. We can make the UA a wrapper over the DA, and update the passes to use only the UA. Depending on the command line, the UA will either actually compute its own results or simply hand off to a DA stored internally. This is how the legacy DA and the current DA are arranged today.
Once we are ready to shift to UA for good, we can eliminate all this. And we should.
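For illustration, here is a minimal standalone sketch of that arrangement. Every name in it, including the flag variable, is a hypothetical stand-in rather than real LLVM code; the real thing would use a cl::opt and the actual analysis classes, and the isUniform rules below are placeholders chosen only so the two paths give different answers.

```cpp
// Toy version of "UA wraps DA": passes query only the wrapper, and a single
// switch decides which implementation actually answers.
#include <iostream>

static bool UseUniformityAnalysis = false; // stand-in for a cl::opt<bool>

struct DivergenceImpl {   // stand-in for the existing divergence analysis
  bool isUniform(unsigned V) const { return (V & 1) == 0; } // placeholder rule
};

struct UniformityImpl {   // stand-in for the new uniformity analysis
  bool isUniform(unsigned V) const { return V < 8; } // different placeholder rule
};

// The wrapper that the passes would use exclusively.
class UniformityWrapper {
  DivergenceImpl DA;
  UniformityImpl UA;

public:
  bool isUniform(unsigned V) const {
    // One flag flips every updated pass between the two analyses at once.
    return UseUniformityAnalysis ? UA.isUniform(V) : DA.isUniform(V);
  }
};

int main() {
  UniformityWrapper UW;
  std::cout << UW.isUniform(10) << '\n'; // 1: answered by the DA stand-in
  UseUniformityAnalysis = true;
  std::cout << UW.isUniform(10) << '\n'; // 0: now answered by the UA stand-in
}
```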
- Instead of checking the uniformity of a branch instruction in the pass, check whether its parent basic block is not divergent (see the sketch after this list).
- Update test checks.
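A toy illustration of the revised check follows; the type and names below are stand-ins of my own, not the real pass or the UniformityInfo API. It mirrors the Flow2 situation discussed above, where the per-instruction query answered "uniform" even though the block was divergent:

```cpp
// Ask about the parent block rather than the branch instruction itself.
#include <cassert>
#include <set>
#include <string>

struct ToyUniformity {
  std::set<std::string> DivergentValues; // instructions recorded as divergent
  std::set<std::string> DivergentBlocks; // blocks reached divergently
  bool isUniform(const std::string &I) const {
    return DivergentValues.count(I) == 0;
  }
  bool isDivergentBlock(const std::string &B) const {
    return DivergentBlocks.count(B) != 0;
  }
};

int main() {
  ToyUniformity UA;
  UA.DivergentBlocks.insert("Flow2"); // the block is divergent...

  assert(UA.isUniform("br_in_Flow2")); // ...yet the old per-branch check
                                       // still reports "uniform"
  assert(UA.isDivergentBlock("Flow2")); // new check: the parent block is
                                        // divergent, so treat the branch as such
}
```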
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81: I am not sure we should be doing this. Smells like a bug in the UA!

llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81: Did you mean to change this back to isUniform? (I don't really mind either way.)
llvm/test/CodeGen/AMDGPU/branch-uniformity.ll
Line 12 ↗ (On Diff #501047): This looks like it might be the same kind of regression as i1-copy-from-loop.ll.

llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
Line 46 ↗ (On Diff #501047): Same kind of regression here, I think (the s_cselect_b64 -1, 0 is a bad smell).

llvm/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
Line 28 ↗ (On Diff #501047): This is a regression.

llvm/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
Line 28 ↗ (On Diff #501047): Normally you would not commit a patch that causes regressions, unless you can explain them and get agreement that they are acceptable.