Implement the backend target hook to drive the HardwareLoops pass. The low-overhead branch extension for Arm M-class cores is flexible enough that we don't have to ensure correctness at this point, except checking that the loop counter variable can be stored in LR - a 32-bit register. For it to be profitable, we want to avoid loops that contain function calls, or any other instruction that alters the PC.
This implementation uses TargetLoweringInfo, to query type and operation actions, looks at intrinsic calls and also performs some manual checks for remainder/division and FP operations.
I think this should be a good base to start and extra details can be filled out later.
Details
Diff Detail
Event Timeline
lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
646 | There is a floating point sqrt instruction, at least for scalar fp. MVE does not have sqrt or div, but will expand into multiple scalar instructions. | |
688 | I think most intrinsics like these will be expanded, so not end up as functions (even if they are not very efficient). | |
698 | This is an extension to 8.1-m? Not just a base feature? | |
704 | Will getBackedgeTakenCount ever be nullptr (as opposed to something like SCEVCouldNotCompute below) |
lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
646 | cheers! | |
688 | That would make sense, I'll double check. | |
698 | Section A1.4 'Armv8-M' variants reads to me that the LOB extension is optional on 8.1-M. The main extension doesn't appear to include LOB, but MVE-I does. I've raised this with Simon for the instruction support too. | |
704 | I'll check. |
Looks like a good start for these loloops. I'm guessing we can always extend MaybeCall if we find more cases that end up as calls. (Along with the difficulties of tail predication!)
lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
656 | I think that even a vector type should become scalar sqrts. Same for abs and friends. So they wouldn't remain as vector operations (which may effect tail prediction), but wouldn't become calls, unless the scalar version will be calls. | |
704 | I also see the blurb for getBackedgeTakenCount talking about calling hasLoopInvariantBackedgeTakenCount, but many passes don't seem to check that. Maybe worth adding a check in place of !SE.getBackedgeTakenCount(L). | |
723 | I would remove this line :) | |
775 | Should this one be to return true? |
lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
656 | Okay, I'll query TLI here instead and check for Expand or LibCall. | |
704 | I saw that getBackedgeTakenCount appears to always return some sort of value, but I had certainly missed the LoopInvariant restriction - which seems arbitrary and unnecessary to me! But I'll follow the good practise. | |
775 | bah! yep, probably should add some double tests... |
- Now checking for loop invariant trip count.
- Added tests for double and half precision floats.
- Now checking the new hasLOB target feature.
I didn't use TLI in isLoweredToCall because it requires an Instruction and it doesn't seem to give any meaningful results when an intrinsic call is passed either. Instead, I'm just checking the precision against the subtarget supported features.
LGTM. With one minor point
lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
706 | Can we remove this !SE.getBackedgeTakenCount(L) check now? |
All these tests seem to fail when the ARM target isn't enabled, with errors like this:
******************** FAIL: LLVM :: Transforms/HardwareLoops/ARM/counter.ll (25696 of 31777) ******************** TEST 'LLVM :: Transforms/HardwareLoops/ARM/counter.ll' FAILED ******************** Script: -- : 'RUN: at line 1'; /Users/thakis/src/llvm-project/out/gn/bin/opt -mtriple=thumbv8.1m.main-arm-none-eabi -hardware-loops -disable-arm-loloops=false /Users/thakis/src/llvm-project/llvm/test/Transforms/HardwareLoops/ARM/counter.ll -o - | /Users/thakis/src/llvm-project/out/gn/bin/FileCheck /Users/thakis/src/llvm-project/llvm/test/Transforms/HardwareLoops/ARM/counter.ll -- Exit Code: 2 Command Output (stderr): -- opt: Unknown command line argument '-disable-arm-loloops=false'. Try: '/Users/thakis/src/llvm-project/out/gn/bin/opt --help' opt: Did you mean ' --disable-x86-lea-opt=false'? FileCheck error: '-' is empty. FileCheck command line: /Users/thakis/src/llvm-project/out/gn/bin/FileCheck /Users/thakis/src/llvm-project/llvm/test/Transforms/HardwareLoops/ARM/counter.ll -- ******************** Testing Time: 266.30s ******************** Failing Tests (6): LLVM :: Transforms/HardwareLoops/ARM/calls.ll LLVM :: Transforms/HardwareLoops/ARM/counter.ll LLVM :: Transforms/HardwareLoops/ARM/do-rem.ll LLVM :: Transforms/HardwareLoops/ARM/fp-emulation.ll LLVM :: Transforms/HardwareLoops/ARM/simple-do.ll LLVM :: Transforms/HardwareLoops/ARM/structure.ll
They probably all need a REQUIRES: arm-registered-target?
Thanks both. I went for the REQUIRES option, though slightly different, and I'll also add that config file.
Thanks! Having both is a bit belt-and-suspenders. https://reviews.llvm.org/rL363157 obsoletes https://reviews.llvm.org/rL363156 , so I think you could revert https://reviews.llvm.org/rL363156 again.
There is a floating point sqrt instruction, at least for scalar fp. MVE does not have sqrt or div, but will expand into multiple scalar instructions.