Motivation for the LoopIntWrapPredication pass
Consider the following example:
    for (unsigned i = 0; i < N; ++i)
      for (unsigned j = 0; j < N; ++j)
        C[i*N+j] = foo();
Assume that pointers are 64 bits wide and that i and j are 32-bit variables. According to the C standard, unsigned overflow is defined behavior (the result wraps), so the i*N+j calculation is done in 32-bit arithmetic, zero-extended to 64 bits, and then used as the offset in a GEP instruction.

However, if we change the induction variable types to signed integers, this calculation gets the nsw flag, and the IndVarSimplify pass is able to promote both induction variables to 64-bit types and get rid of the sext inside the hot loop.

We can't do the same thing with unsigned variables, but this is quite a common pattern in code, so we try to version the loop instead: generate a runtime check that ensures overflow never occurs, and set NUW flags on this chain of address calculations. We use scalar evolution to get the possible range of the expression and insert a loop-invariant condition that checks that this range does not overflow.

To keep the pass simple, we don't do loop versioning directly. Instead, we insert branching code for this calculation chain inside the loop, relying on a subsequent pass to unswitch the branch (which is possible because the branch condition is loop-invariant). This is how it will look in pseudo-code:
    for (unsigned i = 0; i < N; ++i)
      for (unsigned j = 0; j < N; ++j)
        if (overflow(N*N))
          *(C + zext(i*N+j)) = foo();
        else
          *(C + zext(i*N+j /*nuw*/)) = foo();
After unswitching:
    if (overflow(N*N))
      for (unsigned i = 0; i < N; ++i)
        for (unsigned j = 0; j < N; ++j)
          *(C + zext(i*N+j)) = foo();
    else
      for (unsigned i = 0; i < N; ++i)
        for (unsigned j = 0; j < N; ++j)
          *(C + zext(i*N+j /*nuw*/)) = foo();
After IndVarSimplify:
    if (overflow(N*N))
      for (unsigned i = 0; i < N; ++i)
        for (unsigned j = 0; j < N; ++j)
          *(C + zext(i*N+j)) = foo();
    else
      for (uint64_t i = 0; i < N; ++i)
        for (uint64_t j = 0; j < N; ++j)
          *(C + (i*N+j)) = foo();
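To make the condition concrete, here is a small hand-written C sketch of the reasoning behind the guard. It is not the code the pass emits (the pass derives its check from scalar evolution on the IR), and the helper names index_narrow, index_wide and index_may_wrap are hypothetical. It assumes N is also a 32-bit unsigned value. The sketch checks the exact bound on i*N+j, whereas the overflow(N*N) condition in the pseudo-code may be more conservative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Index as the original source computes it: 32-bit arithmetic that wraps
   modulo 2^32, then a zero extension to 64 bits. */
uint64_t index_narrow(uint32_t i, uint32_t j, uint32_t n) {
    uint32_t idx = i * n + j;
    return (uint64_t)idx;
}

/* Index as computed once i and j have been widened to 64 bits. */
uint64_t index_wide(uint32_t i, uint32_t j, uint32_t n) {
    return (uint64_t)i * n + j;
}

/* Hypothetical loop-invariant guard: true if i*n + j can wrap in 32 bits
   for some 0 <= i, j < n. The largest index is (n-1)*n + (n-1), and every
   intermediate value is no larger than that. */
bool index_may_wrap(uint32_t n) {
    if (n == 0)
        return false;                 /* the loops never execute */
    uint64_t m = (uint64_t)n - 1;     /* largest value of i and j */
    uint64_t max_index = m * n + m;   /* cannot overflow 64 bits */
    return max_index > UINT32_MAX;
}
```

When index_may_wrap(N) is false, index_narrow and index_wide agree for every i, j < N, which is what makes the widened loop in the else branch equivalent to the original. When it is true, the two can differ (for example i = 65536, j = 0, N = 65537), so the original 32-bit loop has to be kept.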
Results
This pass shows good results on the CoreMark benchmark on our RISC-V hardware, increasing the score by 18%. CoreMark has exactly the pattern described above: it uses unsigned types for induction variables that never overflow at run time (this issue was also mentioned here: https://github.com/eembc/coremark/issues/22). Analysis of the matrix functions (especially matrix_mul_matrix) shows that a significant number of instructions in the innermost loop perform this zero extension, even though unsigned overflow of the array index never occurs.
The description on line 32 mentions shl, but Shl isn't here