Currently, isDynamicConstant tries (incorrectly) to check whether we could only have reached the current block from a switch on the value in the previous block. If the block is determined to only be reachable from one case, the value is being used as if it were a dynamic constant (which it may not be), instead of being replaced by the constant. This patch fixes the check, and also does the replacement if the check succeeds. Also, disable a test that relied on the buggy behavior.
Diff Detail
Event Timeline
Are you sure that actually fixes the issue? On trunk, the following also crashes:
define i32 @f() local_unnamed_addr { entry: %call = call i32 @g() switch i32 %call, label %sw.default [ i32 1, label %cleanup ] sw.default: %call1 = call i32 @f() %add = add nsw i32 %call1, 1 br label %cleanup cleanup: %retval.0 = phi i32 [ %add, %sw.default ], [ %call, %entry ] ret i32 %retval.0 } declare i32 @g()
Also, missing testcase.
This now fixes both repos of the crash. I removed a test that relied on the bug and actually didn't produce valid output. Added the case with a single branch as a test case. However, all this switch handling really does is propagate a constant. Maybe we should remove it entirely and expect that the constant propagation passes do this for us.
llvm/test/Transforms/TailCallElim/accum_recursion.ll | ||
---|---|---|
48 | Its not valid for us to use %n here as our initial value, since %n would change with each iteration of the recursion. | |
52–53 | If only one of these cases resulted in this branch we could replace the %n (see above comment) with the constant. However, we don't whether to choose the 1 or the 0. |
However, all this switch handling really does is propagate a constant. Maybe we should remove it entirely and expect that the constant propagation passes do this for us.
This sort of depends on how the interaction with other passes works out. A switch with one case will be turned into a branch by SimplifyCFG, so it's not worth optimizing. If it's realistic to have a switch where only one of the cases goes to a return instruction, maybe worth keeping the switch handling around; SimplifyCFG will often prefer a unified "ret" instruction. Probably want to check what the input to -tailcallelim looks like, realistically.
In terms of whether we could actually transform my original example... well, it sort of goes back to what I was saying on D78259. Currently, we set the initial value of the "accumulator" to the value returned in the base case. So if the base case isn't a value we can materialize in the entry block, we can't do the tail call transform. But I'm not sure that's a fundamental limitation. Suppose, instead, the accumulator always starts at zero. Then, just before we return, we add the value of the base case. It's still basically the same transform, but it's more flexible: you can drop the whole isDynamicConstant() check, I think. Does that make sense?
To write this out:
Input
define i32 @f() local_unnamed_addr { entry: %call = call i32 @g() switch i32 %call, label %sw.default [ i32 1, label %cleanup i32 2, label %cleanup ] sw.default: %call1 = call i32 @f() %add = add nsw i32 %call1, 1 ret i32 %add cleanup: ret i32 %call } declare i32 @g()
Current, invalid -tailcallelim output on trunk:
define i32 @f() local_unnamed_addr { entry: br label %tailrecurse tailrecurse: ; preds = %sw.default, %entry %accumulator.tr = phi i32 [ %call, %entry ], [ %add, %sw.default ] %call = tail call i32 @g() switch i32 %call, label %sw.default [ i32 1, label %cleanup i32 2, label %cleanup ] sw.default: ; preds = %tailrecurse %add = add nsw i32 %accumulator.tr, 1 br label %tailrecurse cleanup: ; preds = %tailrecurse, %tailrecurse ret i32 %accumulator.tr } declare i32 @g()
Alternative transform:
define i32 @f() local_unnamed_addr { entry: br label %tailrecurse tailrecurse: ; preds = %sw.default, %entry %accumulator.tr = phi i32 [ 0, %entry ], [ %add, %sw.default ] %call = tail call i32 @g() switch i32 %call, label %sw.default [ i32 1, label %cleanup i32 2, label %cleanup ] sw.default: ; preds = %tailrecurse %add = add nsw i32 %accumulator.tr, 1 br label %tailrecurse cleanup: ; preds = %tailrecurse, %tailrecurse %accumulator.ret = add i32 %accumulator.tr, %call ret i32 %accumulator.ret } declare i32 @g()
llvm/test/Transforms/TailCallElim/accum_recursion.ll | ||
---|---|---|
44 | Please keep this function as a testcase, even if we're just keeping it an example of something we currently can't transform. |
Suppose, instead, the accumulator always starts at zero.
The accumulator has to start with a value that won't effect the computation. Zero works in the case where we are doing addition, but would not work if we were doing multiplication. We could possibly implement a version that looks at the operation and then decides a starting value. Alternatively, and what I am working on is, we could basically inline one iteration though the loop and start the accumulator with the value you get from the first iteration. I have a version of this working, that does transform your original example. I still wanted to keep the current implementation where it applies though, because there is no need to do the inlining if we can already find the value.
The accumulator has to start with a value that won't effect the computation. Zero works in the case where we are doing addition, but would not work if we were doing multiplication. We could possibly implement a version that looks at the operation and then decides a starting value.
tailcallelim currently only recognizes a very restrictive set of accumulator operations: ones where BinaryOperator::isAssociative returns true. ConstantExpr::getBinOpIdentity() can compute an appropriate identity value for all those operations.
Alternatively, and what I am working on is, we could basically inline one iteration though the loop and start the accumulator with the value you get from the first iteration.
Compiler jargon for making a copy of the loop for the first iteration is "loop peeling".
In general, it's more restrictive to require an identity value, yes. Peeling the first iteration of the loop provides an alternative transform in those cases. But I can't think of any realistic case where we could prove an operation is associative without being able to compute an identity value. And we don't really want to peel loops unless the peeled loop is significantly faster.
If we're not peeling the loop, I can't think of any other reason to keep the isDynamicConstant path around.
I'm trying this approach now, but I'm having a hard time seeing a good way to transform this case:
define i32 @func(i32 %index) local_unnamed_addr { entry: %0 = icmp eq i32 %index, 0 br i1 %0, label %then, label %else then: ret i32 12 else: %1 = call i32 @func(i32 0) ret i32 3 }
Current transform:
define i32 @func(i32 %index) local_unnamed_addr { entry: br label %tailrecurse tailrecurse: ; preds = %else, %entry %accumulator.tr = phi i32 [ 12, %entry ], [ 3, %else ] %index.tr = phi i32 [ %index, %entry ], [ 0, %else ] %0 = icmp eq i32 %index.tr, 0 br i1 %0, label %then, label %else then: ; preds = %tailrecurse ret i32 %accumulator.tr else: ; preds = %tailrecurse br label %tailrecurse }
We could do something like this:
define i32 @func(i32 %index) local_unnamed_addr { entry: br label %tailrecurse tailrecurse: ; preds = %else, %entry %selector.tr = phi i1 [ true, %entry ], [ false, %else ] %accumulator.tr = phi i32 [ 0, %entry ], [ %selection.tr1, %else ] %index.tr = phi i32 [ %index, %entry ], [ 0, %else ] %0 = icmp eq i32 %index.tr, 0 br i1 %0, label %then, label %else then: ; preds = %tailrecurse %selection.tr = select i1 %selector.tr, i32 12, i32 %accumulator.tr ret i32 %selection.tr else: ; preds = %tailrecurse %selection.tr1 = select i1 %selector.tr, i32 3, i32 %accumulator.tr br label %tailrecurse }
But that really doesn't seem all that clean to me. Is there a better way to do this that I am missing?
In general, I think it has to look something like that, yes. You have to keep the value from the first iteration in a PHI node like %accumulator.tr. You need a select in the loop to consistently take the value from the first iteration. And you need a select before the return to check if there's a valid value in %accumulator.tr. Otherwise, the dominance doesn't work out in general, and you're back in isDynamicConstant() territory. Later loop optimizations should be able to simplify it to something sane in most cases.
I have gotten a version of this that does the accumulation at the end mostly working. I will finish cleaning it up and try to split it into a few different patches as I reworked a fair bit, and found a few addition improvements I think we can make. I went ahead and updated this diff in case we want to consider it a first step of incremental improvement.
Please keep this function as a testcase, even if we're just keeping it an example of something we currently can't transform.