The URem instruction should be uninteresting and be treated as the user.
There is no URem SCEV expression, and the URem operation is analyzed as
Add SCEV. So, we need to check the opcode of URem instruction in the
instruction interesting check function.
Details
Diff Detail
Event Timeline
Sure.
Actually, this bug should exist long time ago. The bug is exposed when we enable opaque pointer and the IR generated becomes different. It seems the old IR with typed pointer cannot trigger this scenario.
In this scenario, there are two phi nodes with i64 and i32 type, and the use of phi node with i32 type is URem instruction. In LSR, the two phi nodes are reduced into one with i64 type, and the urem can be replaced with udiv/add instructions in Scalar Evolution. In the above test case (urem-use-type-conversion.ll), the phi node matters is %dec.137. When converting to i64, the negative value will be the problem. Check the following:
// The wrong transformation: (think about what if %dec.137 is -1) %dec.137 = ... ; i64 type %rem = urem i64 %dec.137, 53 %newrem = trunc i64 %rem to i32 %call = ... %rem // The correct transformation: %dec.137 = ... ; i64 type %dec.137.2 = trunc i64 %dec.137 to i32 %rem = urem i32 %dec.137.2, 53 %call = ... %rem
Previously, when getting the IVuses, isInteresting thinks URem instruction is not one user of Phi node (%dec.137) since the SCEV expression of URem is Add expr. As a result, the user of Phi node (%dec.137) is analyzed as %call. Then, in function LSRInstance::GenerateAllReuseFormulae in LSR, the wrong transformation SCEV expression is generated.
@nikic Please let me know if something is not clear to you.
For example, there is UDiv SCEV, so the bug does exist when udiv is the user of Phi node (%dec.137).
I think what I still don't really understand is why LSR thinks it is safe to replace the IV here when it isn't. It seems like marking the urem as interesting here may be non-profitable, but also shouldn't be incorrect. I think there has to be some bug on the LSR side to enable this replacement.
Note that urem is not the only non-add instruction that can end up producing an add SCEV node, so I'm not sure why we need to treat urem in particular specially.
This truly seems to be like one work around instead of fixing the problem for now. I tested some internal benchmarks and there is no performance regression. I tried to analyze this case in LSR, but everywhere looks like reasonable. I tested the test case with typed pointer, and the bug exists in LLVM 12. I may need more time to dig when and which design missed this scenario in LSR.
urem is special since the i32-i64 conversion may cause the value difference and the variable is negative since it converts the signed to unsigned. Another one is udiv, but there is SCEV Div Expr.
signed: (i32 -1) == (i64 -1) unsigned: (i32 -1) != (i64 -1)
I think the underlying issue is that normalized expressions are extended and then used in a post-inc context outside the loop. I put up D153004