This is an archive of the discontinued LLVM Phabricator instance.

[HardwareLoops] Loop guard intrinsic to recognise zext
ClosedPublic

Authored by sherwin-dc on Sep 10 2021, 2:03 PM.

Details

Summary

I am working on a downstream custom backend that uses the loop guard intrinsic.

If a loop count was initially represented by a 32b unsigned int in C then the hardware-loop pass can recognise the loop guard and insert the llvm.test.set.loop.iterations intrinsic. However if this was instead a unsigned short/char then clang inserts a zext instruction to expand the loop count to an i32.

This means the loop guard intrinsic would not be used as shown here because %x was being compared with 0 in the loop guard and the hardware loop pass was expecting %0 to be compared.

entry:
  %tobool.not5 = icmp eq i8 %x, 0
  %0 = zext i8 %x to i32
  br i1 %tobool.not5, label %while.end, label %while.body.preheader

while.body.preheader:                             ; preds = %entry
  %1 = zext i8 %x to i32
  %scevgep = getelementptr i32, i32* %arr, i32 %1
  call void @llvm.set.loop.iterations.i32(i32 %0)
  br label %while.body

This patch would benefit other backends as well that make use of the loop guard intrinsic. For example, if we have a loop:

void loop(unsigned *a, unsigned *b, unsigned length)
{
    for(int i = 0; i < length; i++)
    {
        a[i] = b[i];
    }
}

and compile it to IR we get int_loop.ll. After changing unsigned length to unsigned short length and compiling to IR we have short_loop.ll.


Using ARM as an example (since it makes use of the hardware loop guard intrinsic), compiling as llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+lob,+mve.fp int_loop.ll -o int_loop.asm we get int_loop.asm and short_loop.asm respectively.


The int_loop.ll example compiles to a hardware loop with ARM's WLS instruction, while short_loop.ll does not and includes a loop guard. With this patch the short_loop.ll example also compiles to a WLS loop since the loop guard intrinsic is used.

Diff Detail

Event Timeline

sherwin-dc created this revision.Sep 10 2021, 2:03 PM
sherwin-dc requested review of this revision.Sep 10 2021, 2:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 10 2021, 2:03 PM
dmgreen added a comment.EditedSep 11 2021, 11:37 AM

The int_loop.ll example compiles to a hardware loop with ARM's WLS instruction, while short_loop.ll does not and includes a loop guard. With this patch the short_loop.ll example also compiles to a WLS loop since the loop guard intrinsic is used.

Sounds good, Can you add a test then?

samparker added inline comments.Sep 13 2021, 12:19 AM
llvm/lib/CodeGen/HardwareLoops.cpp
369

Can we use SCEV isLoopEntryGuardedByCond instead of doing the pattern matching? It feels like we should be able to replace IsCompareZero with it which would remove the need to do an explicit zext check.

sherwin-dc added inline comments.Sep 13 2021, 8:09 AM
llvm/lib/CodeGen/HardwareLoops.cpp
369

It seems that isLoopEntryGuardedByCond is already used just before this (see below comment). This was added at the same time as IsCompareZero in D63809, so I'm not sure if IsCompareZero was needed in some way to explicitly check the instructions.

395–396

isLoopEntryGuardedByCond used here

  • Add loop guard test
samparker added inline comments.Sep 14 2021, 12:55 AM
llvm/lib/CodeGen/HardwareLoops.cpp
369

Ah, yes... those unfortunate cases where pattern matching is more useful than SCEV.

@dmgreen would this test case suffice or should I add a ARM test that compiles to a WLS loop?

Yeah thanks. The test looks good. @samparker - are you happy with this?

samparker accepted this revision.Sep 15 2021, 9:02 AM

are you happy with this?

Yep!
Thanks @sherwin-dc

This revision is now accepted and ready to land.Sep 15 2021, 9:02 AM

Thanks for reviewing. I dont have commit access, so I'll need help with landing the change.

This revision was automatically updated to reflect the committed changes.