This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Replace LegacyDA with Uniformity Analysis in AnnotateUniformValues
ClosedPublic

Authored by gandhi21299 on Feb 15 2023, 9:07 PM.


Event Timeline

gandhi21299 created this revision. Feb 15 2023, 9:07 PM
Herald added a project: Restricted Project. Feb 15 2023, 9:07 PM
gandhi21299 requested review of this revision. Feb 15 2023, 9:07 PM
gandhi21299 added a project: Restricted Project.
gandhi21299 retitled this revision from [AnnotateUniformValues] Replace LegacyDA with Uniformity Analysis to [AMDGPU] Replace LegacyDA with Uniformity Analysis in AnnotateUniformValues. Feb 15 2023, 9:13 PM
sameerds added inline comments. Feb 16 2023, 4:40 AM
llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
Line 85 (On Diff #497880)

The comment at the start of this test says the following:

; This module creates a divergent branch in block Flow2. The branch is
; marked as divergent by the divergence analysis but the condition is
; not. This test ensures that the divergence of the branch is tested,
; not its condition, so that branch is correctly emitted as divergent.

It seems that the proposed change completely defeats the intent of this test?

I think for this specific case, we should report %8 as uniform, and the branch should also be uniform. But there seems to be something wrong in the uniformity analysis: if you try opt -passes='print<uniformity>' on the test here, both the condition %8 and the conditional branch br i1 %8, ... are reported as divergent, yet the isUniform() query returns true for the branch instruction. There must be something wrong in the uniformity analysis; if the condition is divergent, the branch should also be divergent.

Another issue I want to point out is that the uniformity analysis has a subtle difference from the divergence analysis. There is a comment in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPURewriteUndefForPHI.cpp#L13 that explains the issue. I think we need to close that gap before switching to the uniformity analysis, otherwise we will regress code generation. The last time we discussed this, I think we wanted some target-specific option in the uniformity analysis to match its behavior with the divergence analysis.
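For reference, the discrepancy described above can be reproduced by running both analysis printers on the test and comparing their reports (print<divergence> is the printer for the divergence analysis mentioned below; -disable-output suppresses the module dump):

    opt -passes='print<uniformity>' -disable-output llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
    opt -passes='print<divergence>' -disable-output llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll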


I agree with that assessment, and pretty much found the same thing myself by comparing with the output of "print<divergence>". The uniformity analysis assumes a phi is divergent if it has an undef argument, but the divergence analysis does not make that assumption. The difference is in the call to PHINode::hasConstantOrUndefValue() for the divergence analysis and PHINode::hasConstantValue() for the uniformity analysis.
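To make the difference concrete, here is a minimal IR sketch of the phi shape in question (hypothetical, not taken from the patch or its tests):

    define i32 @phi_with_undef(i1 %c, i32 %v) {
    entry:
      br i1 %c, label %a, label %b
    a:
      br label %join
    b:
      br label %join
    join:
      ; PHINode::hasConstantValue() returns nullptr for %p (the incoming
      ; values %v and undef differ), so if the branch on %c is divergent
      ; the uniformity analysis marks %p divergent.
      ; PHINode::hasConstantOrUndefValue() returns true (%v is the only
      ; non-undef incoming value), so the divergence analysis does not.
      %p = phi i32 [ %v, %a ], [ undef, %b ]
      ret i32 %p
    }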

I think we should just fix the uniformity analysis to match the existing behaviour and revisit the issue later. I can have the patch ready by Monday EOD IST.

I think it is reasonable to match the divergence analysis behavior with regard to undef. The other problem, that isUniform() returns true for a divergent branch instruction, makes me wonder: is replacing the uses of the divergence analysis with the uniformity analysis one by one really the best way? Although I am optimistic about the quality of the uniformity analysis, I think it may be more helpful to replace all the occurrences of the divergence analysis at once and fix all the bugs that uncovers. Ideally, we would have very few test changes. The reason is that one specific pass may not have enough test coverage. Fixing the bugs after switching all uses of the divergence analysis to the uniformity analysis will make us more confident that we are less likely to cause regressions. Any different idea?


I tried replacing the analysis in the AMDGPUUnifyDivergentExitNodes pass; it causes 20 tests to fail, including a few fatal errors. I can imagine many more failures occurring after replacing it in the 8 or so other middle-end passes. Replacing all the occurrences in a single patch would be massive. I prefer the one-pass-at-a-time approach (as taken in this patch).

I agree with @gandhi21299 ... it's best to work through the passes one at a time. But maybe we should not enable each change as it happens. We could put the changes behind a command-line option to switch between DA and UA. One single option that switches whatever passes have been updated, until all passes are updated and we are ready to flip the switch permanently.


Do you mean a single option that determines which analysis to use, with all transformations picking the analysis based on the option's value? Something like --enable-uniformity-analysis=1?

> Fixing the bugs after switching all uses of the divergence analysis to the uniformity analysis will make us more confident that we are less likely to cause regressions. Any different idea?

I think changing them one at a time is fine; this shouldn't be a long, drawn-out process.

> I agree with @gandhi21299 ... it's best to work through the passes one at a time. But maybe we should not enable each change as it happens. We could put the changes behind a command-line option to switch between DA and UA. One single option that switches whatever passes have been updated, until all passes are updated and we are ready to flip the switch permanently.

Another possible way would be to do the replacement one pass at a time, but only submit the changes once all the passes have switched to the uniformity analysis.

>> I agree with @gandhi21299 ... it's best to work through the passes one at a time. But maybe we should not enable each change as it happens. We could put the changes behind a command-line option to switch between DA and UA. One single option that switches whatever passes have been updated, until all passes are updated and we are ready to flip the switch permanently.

> Do you mean a single option that determines which analysis to use, with all transformations picking the analysis based on the option's value? Something like --enable-uniformity-analysis=1?

Not exactly like that. We can make the UA a wrapper over the DA and update the passes to use only the UA. Depending on the command line, the UA would either compute its own results or simply hand off to a DA stored internally. This is how the legacy DA and the DA are currently arranged.

Once we are ready to shift to UA for good, we can eliminate all this. And we should.
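To illustrate the arrangement being proposed, here is a hypothetical C++ sketch; every name in it is invented for this discussion and none of it is actual LLVM code:

    #include "llvm/Support/CommandLine.h"
    #include <memory>

    namespace llvm { class Value; }

    // A single switch that flips every updated pass at once.
    static llvm::cl::opt<bool> UseUniformityAnalysis(
        "use-uniformity-analysis", llvm::cl::init(false),
        llvm::cl::desc("Compute results with the uniformity analysis "
                       "instead of delegating to the divergence analysis"));

    struct DivergenceResult {   // stand-in for the internally stored DA result
      bool isDivergent(const llvm::Value *V) const;
    };
    struct UniformityResult {   // stand-in for the UA's own result
      bool isUniform(const llvm::Value *V) const;
    };

    // Facade that the passes would query exclusively: depending on the
    // command line it computes its own answer or hands off to the DA,
    // mirroring how the legacy DA and the DA are arranged today.
    class UniformityFacade {
      std::unique_ptr<UniformityResult> UA;
      std::unique_ptr<DivergenceResult> DA;
    public:
      bool isUniform(const llvm::Value *V) const {
        return UseUniformityAnalysis ? UA->isUniform(V)
                                     : !DA->isDivergent(V);
      }
    };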

  • Instead of checking uniformity of a branch instruction in the pass, check whether its parent basic block is not divergent (see the sketch after this list).
  • Update test checks.
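A hedged sketch of the check this update describes, next to the original form it later reverts to; the exact code in the diff may differ, and this assumes UniformityInfo's hasDivergentTerminator() and isUniform() queries:

    #include "llvm/Analysis/UniformityAnalysis.h"
    #include "llvm/IR/Instructions.h"

    // Sketch only, not the actual diff.
    bool isUniformBranch(const llvm::UniformityInfo &UA,
                         const llvm::BranchInst &BI) {
      // Form used by this update: query the parent block's terminator.
      return !UA.hasDivergentTerminator(*BI.getParent());
      // Original form, restored later once D144699 made the direct
      // query work as expected on a BranchInst:
      //   return UA.isUniform(&BI);
    }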
gandhi21299 marked an inline comment as done. Feb 23 2023, 1:30 AM
sameerds added inline comments. Feb 23 2023, 7:59 AM
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81

I am not sure we should be doing this. Smells like a bug in the UA!

sameerds added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81

D144699 allows the original "isUniform()" call to work as expected on a BranchInst.

Rebased against main

sameerds accepted this revision. Feb 27 2023, 11:45 PM

LGTM! But please ensure this passes internal builds before submitting upstream.

This revision is now accepted and ready to land. Feb 27 2023, 11:45 PM
  • Corrected branch-uniformity.ll and i1-copy-from-loop.ll
foad added inline comments. Feb 28 2023, 2:33 AM
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81

Did you mean to change this back to isUniform? (I don't really mind either way.)

llvm/test/CodeGen/AMDGPU/branch-uniformity.ll
Line 12 (On Diff #501047)

This looks like it might be the same kind of regression as i1-copy-from-loop.ll

llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
Line 46 (On Diff #501047)

Same kind of regression here I think (the s_cselect_b64 -1, 0 is a bad smell).

llvm/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
Line 28 (On Diff #501047)

This is a regression.

gandhi21299 added inline comments. Feb 28 2023, 8:28 AM
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81

Yeah, I meant to change this back. Great catch!

llvm/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
Line 28 (On Diff #501047)

Should I handle all the regressions in a separate patch?

foad added inline comments. Feb 28 2023, 9:04 AM
llvm/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
Line 28 (On Diff #501047)

Normally you would not commit a patch that causes regressions, unless you can explain them and get agreement that they are acceptable.

Reverted back to isUniform; this fixes all the regressions.

gandhi21299 marked 3 inline comments as done. Feb 28 2023, 9:33 AM