This is an archive of the discontinued LLVM Phabricator instance.

[GVN] Avoid replacing uniforms with non-uniforms in propagateEquality
Needs ReviewPublic

Authored by bogner on Mar 24 2023, 9:58 AM.

Details

Reviewers
foad
arsenm
sameerds
Pierre-vh
Group Reviewers
Restricted Project
Summary

When propagating equality, avoid replacing values that are known to be
uniform when that will lead to information loss, since that's very
likely to pessimize performance.

Diff Detail

Event Timeline

bogner created this revision.Mar 24 2023, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 9:58 AM
bogner requested review of this revision.Mar 24 2023, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 9:58 AM

A couple of things I'm not 100% sure about:

  • Are we okay with adding the TTI dependency on GVN? We already depend on TLI so I think it's probably okay.
  • Is this testable at all without the AMDGPU backend? I don't think there's a way to imply uniformity generically
bogner updated this revision to Diff 508163.Mar 24 2023, 10:46 AM

Fix legacy pass manager deps

foad added reviewers: Restricted Project, sameerds.Mar 27 2023, 2:27 AM

If we're going to do this I think we ought to do it properly, by querying UniformityAnalysis. TTI->isAlwaysUniform will only catch a small fraction of the values that are provably uniform.

llvm/test/Transforms/GVN/uniform-values.ll
2

Maybe this should go in test/Transforms/GVN/AMDGPU/? That's what we've done for lots of other test/Transforms/*/AMDGPU/ tests.

If we're going to do this I think we ought to do it properly, by querying UniformityAnalysis. TTI->isAlwaysUniform will only catch a small fraction of the values that are provably uniform.

I agree, but most targets don't care about uniformity so I don't think we can make GVN depend on UA unless we find a way to avoid running the analysis for those targets. No?
Does UA bail out early for non-GPU targets? (It'd be nice if it did that and just returned an empty UniformityInfo instance for those targets, we could then use UA in more places in the middle-end without worrying about performance)

If we're going to do this I think we ought to do it properly, by querying UniformityAnalysis. TTI->isAlwaysUniform will only catch a small fraction of the values that are provably uniform.

I agree, but most targets don't care about uniformity so I don't think we can make GVN depend on UA unless we find a way to avoid running the analysis for those targets. No?
Does UA bail out early for non-GPU targets? (It'd be nice if it did that and just returned an empty UniformityInfo instance for those targets, we could then use UA in more places in the middle-end without worrying about performance)

When UA or DA initialize, they query TTI for sources of divergence. On a target with no divergence, they won't find any and hence the analysis should be trivial anyway. The only cost is one O(N) pass over the entire function. We could in principle eliminate that by just querying for TTI.hasBranchDivergence(). But there have been philosophical objections to taking that approach in the past. In general, one day, we would like to move to an implementation where source of divergence are inferred from IR semantics rather than TTI.

A couple of things I'm not 100% sure about:

  • Are we okay with adding the TTI dependency on GVN? We already depend on TLI so I think it's probably okay.
  • Is this testable at all without the AMDGPU backend? I don't think there's a way to imply uniformity generically

In general divergence exists only on targets that have branch divergence, which is why this change matters on those targets such as AMDGPU. On a CPU target, everything is always uniform. So any test needs a GPU target.

A couple of things I'm not 100% sure about:

  • Are we okay with adding the TTI dependency on GVN? We already depend on TLI so I think it's probably okay.
  • Is this testable at all without the AMDGPU backend? I don't think there's a way to imply uniformity generically

In general divergence exists only on targets that have branch divergence, which is why this change matters on those targets such as AMDGPU. On a CPU target, everything is always uniform. So any test needs a GPU target.

Yep, that's all clear. I was thinking more along the lines of if there's some attribute or something that the base handles generically for testing purposes. If there's nothing like that today that's fine.

If we're going to do this I think we ought to do it properly, by querying UniformityAnalysis. TTI->isAlwaysUniform will only catch a small fraction of the values that are provably uniform.

I agree, but most targets don't care about uniformity so I don't think we can make GVN depend on UA unless we find a way to avoid running the analysis for those targets. No?
Does UA bail out early for non-GPU targets? (It'd be nice if it did that and just returned an empty UniformityInfo instance for those targets, we could then use UA in more places in the middle-end without worrying about performance)

When UA or DA initialize, they query TTI for sources of divergence. On a target with no divergence, they won't find any and hence the analysis should be trivial anyway. The only cost is one O(N) pass over the entire function. We could in principle eliminate that by just querying for TTI.hasBranchDivergence(). But there have been philosophical objections to taking that approach in the past. In general, one day, we would like to move to an implementation where source of divergence are inferred from IR semantics rather than TTI.

The cost of UA will also include pulling in the cost of dependencies, so using UA here would also introduce the cost of at least CycleAnalysis. I don't think we can just dismiss the compile time concern for non-gpu targets without measuring it unfortunately.

When UA or DA initialize, they query TTI for sources of divergence. On a target with no divergence, they won't find any and hence the analysis should be trivial anyway. The only cost is one O(N) pass over the entire function. We could in principle eliminate that by just querying for TTI.hasBranchDivergence(). But there have been philosophical objections to taking that approach in the past. In general, one day, we would like to move to an implementation where source of divergence are inferred from IR semantics rather than TTI.

The cost of UA will also include pulling in the cost of dependencies, so using UA here would also introduce the cost of at least CycleAnalysis. I don't think we can just dismiss the compile time concern for non-gpu targets without measuring it unfortunately.

Right. I had not thought of that. But then again, on the new pass manager, we can eliminate this cost by ensuring that CycleAnalysis is retrieved only if any divergence is found.

bogner updated this revision to Diff 509504.Mar 29 2023, 5:04 PM

Use UniformityAnalysis instead of TTI.

TODO:

  • Still need to measure compile time impact, especially on non-gpu/non-divergent targets. This might lead to making some changes to make uniformity analysis as cheap as possible on those targets.
  • I'll also add a couple more tests for slightly more complicated cases.
arsenm added inline comments.Mar 31 2023, 2:35 PM
llvm/test/Transforms/GVN/uniform-values.ll
27

no typed pointers in tests

bogner updated this revision to Diff 510952.Apr 4 2023, 3:39 PM

Add new test case and incorporate review feedback

nikic added a subscriber: nikic.Apr 5 2023, 2:03 AM
llvm/test/Transforms/GVN/AMDGPU/uniform-values.ll
1 ↗(On Diff #510952)

Use update_test_checks.py.

foad added a comment.Apr 5 2023, 2:15 AM

Patch looks good to me modulo the compile time concerns.

arsenm added a comment.Apr 8 2023, 5:19 AM

Could another pass undo this instead?

Could another pass undo this instead?

I assume you mean "undo" in a good way? That is, instead of depending on uniformity here, have another pass that replaces divergent values with uniform values, specific to targets that have divergence? Would that be like a "uniformity aware" GVN in MIR?

Is this running-time impact a serious enough concern to fix? UniformityAnalysis is special in the sense that it has to be recomputed every time the code changes, at least for now. And it already depends on TTI. So it's not too wrong to just query TTI and immediately return if the target does not have divergence.