This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Implement TTI::isHardwareLoopProfitable

Authored by samparker on Jun 5 2019, 7:47 AM.



Implement the backend target hook to drive the HardwareLoops pass. The low-overhead branch extension for Arm M-class cores is flexible enough that we don't have to ensure correctness at this point, except checking that the loop counter variable can be stored in LR - a 32-bit register. For it to be profitable, we want to avoid loops that contain function calls, or any other instruction that alters the PC.
This implementation uses TargetLoweringInfo, to query type and operation actions, looks at intrinsic calls and also performs some manual checks for remainder/division and FP operations.
I think this should be a good base to start and extra details can be filled out later.

Diff Detail


Event Timeline

samparker created this revision.Jun 5 2019, 7:47 AM
samparker updated this revision to Diff 203541.Jun 7 2019, 5:50 AM
samparker retitled this revision from [WIP][ARM] Implement TTI::isHardwareLoopProfitable to [ARM] Implement TTI::isHardwareLoopProfitable.
samparker edited the summary of this revision. (Show Details)

Now implemented isLoweredToCall to look at intrinsics.

dmgreen added inline comments.Jun 8 2019, 7:53 AM
646 ↗(On Diff #203541)

There is a floating point sqrt instruction, at least for scalar fp. MVE does not have sqrt or div, but will expand into multiple scalar instructions.

688 ↗(On Diff #203541)

I think most intrinsics like these will be expanded, so not end up as functions (even if they are not very efficient).

698 ↗(On Diff #203541)

This is an extension to 8.1-m? Not just a base feature?

704 ↗(On Diff #203541)

Will getBackedgeTakenCount ever be nullptr (as opposed to something like SCEVCouldNotCompute below)

samparker marked 4 inline comments as done.Jun 9 2019, 11:47 PM
samparker added inline comments.
646 ↗(On Diff #203541)


688 ↗(On Diff #203541)

That would make sense, I'll double check.

698 ↗(On Diff #203541)

Section A1.4 'Armv8-M' variants reads to me that the LOB extension is optional on 8.1-M. The main extension doesn't appear to include LOB, but MVE-I does. I've raised this with Simon for the instruction support too.

704 ↗(On Diff #203541)

I'll check.

samparker updated this revision to Diff 203781.Jun 10 2019, 1:06 AM
  • Folded saturation into the overflow case.
  • Allow scalar sqrt.

Looks like a good start for these loloops. I'm guessing we can always extend MaybeCall if we find more cases that end up as calls. (Along with the difficulties of tail predication!)

656 ↗(On Diff #203781)

I think that even a vector type should become scalar sqrts. Same for abs and friends. So they wouldn't remain as vector operations (which may effect tail prediction), but wouldn't become calls, unless the scalar version will be calls.

723 ↗(On Diff #203781)

I would remove this line :)

775 ↗(On Diff #203781)

Should this one be to return true?

704 ↗(On Diff #203541)

I also see the blurb for getBackedgeTakenCount talking about calling hasLoopInvariantBackedgeTakenCount, but many passes don't seem to check that. Maybe worth adding a check in place of !SE.getBackedgeTakenCount(L).

samparker marked 3 inline comments as done.Jun 10 2019, 3:51 AM
samparker added inline comments.
656 ↗(On Diff #203781)

Okay, I'll query TLI here instead and check for Expand or LibCall.

775 ↗(On Diff #203781)

bah! yep, probably should add some double tests...

704 ↗(On Diff #203541)

I saw that getBackedgeTakenCount appears to always return some sort of value, but I had certainly missed the LoopInvariant restriction - which seems arbitrary and unnecessary to me! But I'll follow the good practise.

samparker updated this revision to Diff 204252.Jun 12 2019, 3:57 AM
  • Now checking for loop invariant trip count.
  • Added tests for double and half precision floats.
  • Now checking the new hasLOB target feature.

I didn't use TLI in isLoweredToCall because it requires an Instruction and it doesn't seem to give any meaningful results when an intrinsic call is passed either. Instead, I'm just checking the precision against the subtarget supported features.

dmgreen accepted this revision.Jun 12 2019, 4:07 AM

LGTM. With one minor point

706 ↗(On Diff #204252)

Can we remove this !SE.getBackedgeTakenCount(L) check now?

This revision is now accepted and ready to land.Jun 12 2019, 4:07 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 12 2019, 4:58 AM
samparker marked an inline comment as done.Jun 12 2019, 4:59 AM
thakis added a subscriber: thakis.Jun 12 2019, 5:36 AM

All these tests seem to fail when the ARM target isn't enabled, with errors like this:

FAIL: LLVM :: Transforms/HardwareLoops/ARM/counter.ll (25696 of 31777)
******************** TEST 'LLVM :: Transforms/HardwareLoops/ARM/counter.ll' FAILED ********************
: 'RUN: at line 1';   /Users/thakis/src/llvm-project/out/gn/bin/opt -mtriple=thumbv8.1m.main-arm-none-eabi -hardware-loops -disable-arm-loloops=false /Users/thakis/src/llvm-project/llvm/test/Transforms/HardwareLoops/ARM/counter.ll -o - | /Users/thakis/src/llvm-project/out/gn/bin/FileCheck /Users/thakis/src/llvm-project/llvm/test/Transforms/HardwareLoops/ARM/counter.ll
Exit Code: 2

Command Output (stderr):
opt: Unknown command line argument '-disable-arm-loloops=false'.  Try: '/Users/thakis/src/llvm-project/out/gn/bin/opt --help'
opt: Did you mean '  --disable-x86-lea-opt=false'?
FileCheck error: '-' is empty.
FileCheck command line:  /Users/thakis/src/llvm-project/out/gn/bin/FileCheck /Users/thakis/src/llvm-project/llvm/test/Transforms/HardwareLoops/ARM/counter.ll


Testing Time: 266.30s
Failing Tests (6):
    LLVM :: Transforms/HardwareLoops/ARM/calls.ll
    LLVM :: Transforms/HardwareLoops/ARM/counter.ll
    LLVM :: Transforms/HardwareLoops/ARM/do-rem.ll
    LLVM :: Transforms/HardwareLoops/ARM/fp-emulation.ll
    LLVM :: Transforms/HardwareLoops/ARM/simple-do.ll
    LLVM :: Transforms/HardwareLoops/ARM/structure.ll

They probably all need a REQUIRES: arm-registered-target?

Sam: you can do the same thing as test/Transforms/LoopVectorize/ARM/lit.local.cfg

Thanks both. I went for the REQUIRES option, though slightly different, and I'll also add that config file.

Thanks! Having both is a bit belt-and-suspenders. obsoletes , so I think you could revert again.