When propagating equality, avoid replacing values that are known to be
uniform when that will lead to information loss, since that's very
likely to pessimize performance.
Details
Diff Detail
Unit Tests
Event Timeline
A couple of things I'm not 100% sure about:
- Are we okay with adding the TTI dependency on GVN? We already depend on TLI so I think it's probably okay.
- Is this testable at all without the AMDGPU backend? I don't think there's a way to imply uniformity generically
If we're going to do this I think we ought to do it properly, by querying UniformityAnalysis. TTI->isAlwaysUniform will only catch a small fraction of the values that are provably uniform.
llvm/test/Transforms/GVN/uniform-values.ll | ||
---|---|---|
2 | Maybe this should go in test/Transforms/GVN/AMDGPU/? That's what we've done for lots of other test/Transforms/*/AMDGPU/ tests. |
I agree, but most targets don't care about uniformity so I don't think we can make GVN depend on UA unless we find a way to avoid running the analysis for those targets. No?
Does UA bail out early for non-GPU targets? (It'd be nice if it did that and just returned an empty UniformityInfo instance for those targets, we could then use UA in more places in the middle-end without worrying about performance)
When UA or DA initialize, they query TTI for sources of divergence. On a target with no divergence, they won't find any and hence the analysis should be trivial anyway. The only cost is one O(N) pass over the entire function. We could in principle eliminate that by just querying for TTI.hasBranchDivergence(). But there have been philosophical objections to taking that approach in the past. In general, one day, we would like to move to an implementation where source of divergence are inferred from IR semantics rather than TTI.
In general divergence exists only on targets that have branch divergence, which is why this change matters on those targets such as AMDGPU. On a CPU target, everything is always uniform. So any test needs a GPU target.
Yep, that's all clear. I was thinking more along the lines of if there's some attribute or something that the base handles generically for testing purposes. If there's nothing like that today that's fine.
The cost of UA will also include pulling in the cost of dependencies, so using UA here would also introduce the cost of at least CycleAnalysis. I don't think we can just dismiss the compile time concern for non-gpu targets without measuring it unfortunately.
Right. I had not thought of that. But then again, on the new pass manager, we can eliminate this cost by ensuring that CycleAnalysis is retrieved only if any divergence is found.
Use UniformityAnalysis instead of TTI.
TODO:
- Still need to measure compile time impact, especially on non-gpu/non-divergent targets. This might lead to making some changes to make uniformity analysis as cheap as possible on those targets.
- I'll also add a couple more tests for slightly more complicated cases.
llvm/test/Transforms/GVN/uniform-values.ll | ||
---|---|---|
27 | no typed pointers in tests |
Compile-time: http://llvm-compile-time-tracker.com/compare.php?from=712dfec1781db8aa92782b98cac5517db548b7f9&to=406639396dd168f459ffb5efcac9eaa1db881f56&stat=instructions:u Would certainly be good if UniformityAnalysis were free for non-GPU targets.
llvm/test/Transforms/GVN/AMDGPU/uniform-values.ll | ||
---|---|---|
1 ↗ | (On Diff #510952) | Use update_test_checks.py. |
I assume you mean "undo" in a good way? That is, instead of depending on uniformity here, have another pass that replaces divergent values with uniform values, specific to targets that have divergence? Would that be like a "uniformity aware" GVN in MIR?
Is this running-time impact a serious enough concern to fix? UniformityAnalysis is special in the sense that it has to be recomputed every time the code changes, at least for now. And it already depends on TTI. So it's not too wrong to just query TTI and immediately return if the target does not have divergence.
Maybe this should go in test/Transforms/GVN/AMDGPU/? That's what we've done for lots of other test/Transforms/*/AMDGPU/ tests.