This patch allows optimizeLoopExitWithUnknownExitCount to deal with
branches by conditions that are not immediately ICmp's, but aggregates
of ICmp's joined by arithmetic or logical AND/OR. Each ICmp is optimized
independently.
Details
Diff Detail
Event Timeline
I'm planning to land some more tests where branch exits loop by TRUE condition to make sure this works fine with it, but it's ready for review as is.
Rebased on top of swapped tests; for some reason we don't optimize them. At least not miscompile (but need to analyze why).
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1434 | Why do we have to reconstruct the whole and chain? Can't we replace a single condition only? This looks quite fragile, especially as we probably want to generalize this to logical and as well, and at that point we'd have to replicate different types of ands in the correct position. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1434 | It's only possible if no one else outside our cluster is using it. If they are, we can't simply replace operands of AND's. One more (indirect, but still useful) effect of this algorithm is that it'll also eliminate duplicates, should they be in the AND tree. As for logical AND -- what's the complexity? We keep conditions in the original order in Conditions, and replace them in the very same places. All it takes to support logical AND -- is to call another Builder function (multi-operand logical AND). If it doesn't exist, it's only a reason to write it. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1434 | "using them" I mean "using ANDs" |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1434 |
That probably means that we need some one-use checks, otherwise we might unprofitable replicate a large condition tree.
I don't think it's the job of this transform to optimize ands with redundant operands...
There could be a mix of bitwise and logical and. Of course, it is safe to replace everything with logical and, it's just non-optimal. In any case, I believe logical and handling should be part of the initial patch (or at least part of the patch stack). We should avoid differences in support, they will come back to bite us at some point :) | |
1451 | I think the structure here isn't quite right. The L->contains(FalseSucc) check from createReplacement() should be in this function and just pass in an inversion flag. Then here, if we invert, we need to handle ORs, otherwise we need to handle ANDs. Again, I think parity between and/or needs to be either in the initial patch, or in the patch stack. |
I agree that OR should be supported on the same grounds as AND. Need to check in more tests for it, and maybe this will be in this patch.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1434 |
Yes, this is also a legit option.
But it just happens if we track duplicates, with no specific purpose... I mean, in ( (A & B)& (B & A) we will only consider A and B once and therefore remove duplicates. Though, if we check one-use, this effect also disappers.
Let's leave logical operations for follow-up. I agree we should support them at some point, but they seem rare. |
Addressed Nikita's comments:
- OR is supported same as AND
- Logical AND and OR are also supported
- Got rid of code duplication, now the conditions are overwritten in existing instructions
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1373–1374 | nit: Missing static. | |
1451 | LogicalOr also matches Or, no need to check both (same for LogicalAnd). | |
1466 | To check that I understand the purpose of the MayNeedUpdate set: We can't simply RAUW because we only check one-use on the and/or ops, but not on the icmps, so the icmp may still have other users that shouldn't be replaced? If so, would it be possible to add the one-use check on the icmp as well, which would allow us to use simple RAUW, and also avoid duplicating the icmp? | |
1479 | nit: I would prefer passing in Inverted as a flag here rather than re-computing it in createReplacement. it took me a moment to understand that this code is not missing the inner condition inversion. | |
llvm/test/Transforms/IndVarSimplify/turn-to-invariant.ll | ||
299 | nit: Drop TODO |
Looks like I've uploaded wrong last diff here (pre-last looks OK), need to fix that and also address the comments.
Addressed comments:
- Removed duplicating AND/logical AND matchers
- Simplified replacement logic, shrinking it down to one-use + RAUW mechanism
- Removed Fixed TODO
No-recompute of condition inversion is planned for follow-up.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1373–1374 | ||
1451 | Didn't know that, done. | |
1466 | Yes, you are right about the intention here. We can do it I guess, I don't really have a motivating counter-example where it would really matter. | |
1479 | Do you mind if it goes as a follow-up? It will require some non-trivial restructuring there. |
I realized that passing inverted doesn't requrie changing logic of SCEV queries, so not a big deal. Done in frames of this one.
This looks basically good to me, only concern is about the instruction move.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1446 | This comment is out of place now (I think it can just be omitted, you have another below). | |
1464 | I don't think we need to check isa<Instruction>(Curr) && here (I guess you could have a constant expression icmp or something, but the whole transform doesn't make sense for that case). | |
1480 | Should we be setting the insert point on Rewriter instead? Mainly wondering if we might generate multiple insts and this then only moves one of them. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1464 | We're only interested in and/or/icmp here, if one of the operands is a multi-use argument we won't be doing anything with it anyway (that is, it will not get transformed, but it also won't block any transforms on e.g. a different operand). |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1480 | in short: createInvariantCond uses BI as expansion point for icmp and not rewriter's insertion point. I think this can be changed, but takes more than flick of fingers. Now much are you concerned? Can it go as follow-up? We never generate more than 1 non-invariant instruction here as far as I can tell. |
Fixed astray comment.
Rewrote one-use check that now we only collect single use ICMPs as leaves and don't filter out what we don't need to.
As for insertion point, proposing follow-up. Needs some underlying refactoring I don't want to be blocked on.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1480 | i.e. in fact it's also invariant, just placed in the loop. I guess we could instead use rewriter's insertion point and even inject it in preheader. |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1464 | Formally we can, but they're so semantically different... |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1461 | nit: leaf ICMPs? |
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp | ||
---|---|---|
1461 | Thanks for catch! |
nit: Missing static.