This change removes a trunc that sits between an induction variable and its associated compare, which allows other optimizations (e.g., instcombine, LSR) to take effect. A sext may be added to the compare's other operand, but it can often be hoisted out of the loop.
For example,
int *ptr;
int e, idx;
int foo() {
  int i;
  idx = -1;
  for (i = 0; i <= e; i++)
    if (!ptr[i]) {
      idx = i;
      break;
    }
  return idx;
}
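As a rough source-level picture of the transformation (a sketch only; the pass itself works on LLVM IR, and the function names, the explicit 64-bit induction variable, and the `wide_e` temporary below are hypothetical illustrations, not part of the patch), the two loop shapes can be written as follows. The equivalence assumes the original 32-bit increment of `i` does not overflow.

```c
#include <stdint.h>

/* Before (illustrative): the 64-bit induction variable is truncated
   to 32 bits on every iteration so it can be compared with the
   32-bit bound e. */
int find_zero_trunc(const int *p, int e) {
    for (int64_t i = 0; (int32_t)i <= e; i++)
        if (!p[i])
            return (int)i;
    return -1;
}

/* After (illustrative): the bound is sign-extended once, outside the
   loop, and the compare uses the 64-bit induction variable directly. */
int find_zero_sext(const int *p, int e) {
    int64_t wide_e = (int64_t)e; /* loop-invariant sext, hoisted */
    for (int64_t i = 0; i <= wide_e; i++)
        if (!p[i])
            return (int)i;
    return -1;
}
```

Both functions return the index of the first zero element in `p[0..e]`, or -1 if there is none; only the form of the loop compare differs.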
Before this change, on AArch64 we generate the following loop:
.LBB0_2: // %for.body
  ldr w11, [x10, x0, lsl #2]
  cbz w11, .LBB0_5
  add x0, x0, #1 // ++i / pre-increment
  sub w11, w0, #1 // rematerialize i
  cmp w11, w9 // compare i
  b.lt .LBB0_2
After:
.LBB0_2: // %for.body
  ldr w11, [x10] // remove shift as we can now increment base by 4
  add x0, x0, #1 // i++ / post-increment i
  cbz w11, .LBB0_5
  add x10, x10, #4 // base += 4
  cmp x0, x9
  b.lt .LBB0_2
With a little more work (my next patch), we should be able to use a post-increment load and generate the following loop:
.LBB0_2: // %for.body
  ldr w11, [x10], 4
  add x0, x0, #1
  cbz w11, .LBB0_5
  cmp x0, x9
  b.lt .LBB0_2
This change in isolation has a minimal effect on performance (i.e., nothing outside of noise). However, it enables better use of post-increment loads/stores, which is why I deemed it important. Please have a look.
Chad
Does your comment about single use still apply? I don't see a check for it.