This is an attempt to fix a regression we saw in benchmarks of FFT like code after computeKnownBitsFromShiftOperator was made more strict with respect to possibly shifting past the bitwidth. It adds exact flags to AShr/LShr flags where we can statically prove it is valid using the range of induction variables. This allows further optimisations to remove extra loads.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Ping. Anyone fancy taking a look at this, or suggesting a better place for it?
The definition of shifts past the bitwidth has recently been changed (or clarified?) to poison, not undef. I'm unsure how poison plays together with computeKnownBits, and whether the original cause of the regressions (back in https://reviews.llvm.org/rL297724) is now correct or not. It can still produce end to end compilation errors if it's reverted, which I guess is proof enough that it's needed! There may well be other ways to fix this though.
lib/Transforms/Utils/SimplifyIndVar.cpp | ||
---|---|---|
529 ↗ | (On Diff #102558) | Please clang-format the change. |
532 ↗ | (On Diff #102558) | LLVM style is auto *User or auto *U (I prefer the latter since there is also a llvm::User). |
535 ↗ | (On Diff #102558) | Use llvm::PatternMatch here. You can then fold in the dyn_cast<ConstantInt>(AShr->getOperand(1)) check into the same patternmatch expression. If you do the dyn_cast<ConstantInt>(AShr->getOperand(1)) check, then the AShr->getOperand(0) != BO check should not be necessary. |
542 ↗ | (On Diff #102558) | Do you really mean getUpper or do you want getUnsignedMax? Secondly, BO will always be an integer (as opposed to a vector of integers); since the induction variable is an integer. Thirdly, if I'm not missing something, the IVRange.getUpper().uge(BO->getType()->getPrimitiveSizeInBits()) check should be unnecessary. If the IV is larger than the bitwidth then BO is poison, and so is AShr (and making it exact does not make it any more poison). |
545 ↗ | (On Diff #102558) | In cases like this, I think a more indented version is easier to read: if (IVOperand == BO->getOperand(1) && IVMax.ult(BitWidth)) if (auto *AShrCI = dyn_cast<ConstantInt>(AShr->getOperand(1)) if (IVMin.uge(AShrOp2->getValue()) { AShr->setIsExact(true); Changed = true; } |
546 ↗ | (On Diff #102558) | Perhaps you need IVRange.getUnsignedMin() instead of IVRange.getLower()? |
test/Transforms/IndVarSimplify/strengthen-overflow.ll | ||
116 ↗ | (On Diff #102558) | Add a few more tests here, check that
|
123 ↗ | (On Diff #102558) | I think most of the load/store stuff here can be pruned. |
Hello, Thanks for the review. It sounds like you are happy with where this is, so I will upload a cleaned up version. Thanks for the pointers.
Unfortunately it looks like something else I need for this has broken over in r305481 :( I'll have to try and investigate that to see exactly why it's stopping the full case from optimising.
To this, I've added LShr's, as well as AShr's and the extra testing. Let me know if anything else needs changing, but no need to hurry as I need to look into that other thing now too.
lib/Transforms/Utils/SimplifyIndVar.cpp | ||
---|---|---|
542 ↗ | (On Diff #102558) | This is my understanding, yes. But I thought the same for shifts in computeKnownBits, so I thought it best to be careful. In the case for the original test it was known what the upper bound for the loop is. Removing the need for this will only make this fire for more loops, which should be good. |
test/Transforms/IndVarSimplify/strengthen-overflow.ll | ||
123 ↗ | (On Diff #102558) | Sounds good. This was a bit strategic, in case I had to explain where the benefits come in ;) |
lib/Transforms/Utils/SimplifyIndVar.cpp | ||
---|---|---|
33 ↗ | (On Diff #103376) | Don't open this namespace globally -- just open it in the function you're using it in. |
539 ↗ | (On Diff #103376) | The name and documentation of this function is now out of sync. I'd instead just split out a separate strengthenRightShift, and put that logic in there. |