I am working on a downstream custom backend that uses the loop guard intrinsic.
If a loop count was initially represented by a 32b unsigned int in C then the hardware-loop pass can recognise the loop guard and insert the llvm.test.set.loop.iterations intrinsic. However if this was instead a unsigned short/char then clang inserts a zext instruction to expand the loop count to an i32.
This means the loop guard intrinsic would not be used as shown here because %x was being compared with 0 in the loop guard and the hardware loop pass was expecting %0 to be compared.
entry: %tobool.not5 = icmp eq i8 %x, 0 %0 = zext i8 %x to i32 br i1 %tobool.not5, label %while.end, label %while.body.preheader while.body.preheader: ; preds = %entry %1 = zext i8 %x to i32 %scevgep = getelementptr i32, i32* %arr, i32 %1 call void @llvm.set.loop.iterations.i32(i32 %0) br label %while.body
This patch would benefit other backends as well that make use of the loop guard intrinsic. For example, if we have a loop:
void loop(unsigned *a, unsigned *b, unsigned length) { for(int i = 0; i < length; i++) { a[i] = b[i]; } }
and compile it to IR we get int_loop.ll. After changing unsigned length to unsigned short length and compiling to IR we have short_loop.ll.
Using ARM as an example (since it makes use of the hardware loop guard intrinsic), compiling as llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+lob,+mve.fp int_loop.ll -o int_loop.asm we get int_loop.asm and short_loop.asm respectively.
The int_loop.ll example compiles to a hardware loop with ARM's WLS instruction, while short_loop.ll does not and includes a loop guard. With this patch the short_loop.ll example also compiles to a WLS loop since the loop guard intrinsic is used.
Can we use SCEV isLoopEntryGuardedByCond instead of doing the pattern matching? It feels like we should be able to replace IsCompareZero with it which would remove the need to do an explicit zext check.