Commit 312409e

committed Sep 6, 2019
[ARM] MVE Tail Predication
The MVE and LOB extensions of Armv8.1-M can be combined to enable 'tail predication', which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication are described in Section B5.6.3 of the Armv8.1-M Architecture Reference Manual, the key points being:
- For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not.
- For non-load instructions, the predicate flags determine whether the destination register byte is updated with the new value or the previous value is preserved.
- For vector store instructions, whether the store occurs or not.
- For vector load instructions, whether the value is loaded or zeros are written to that element of the destination register.

This patch implements a pass that takes a hardware loop containing masked vector instructions and converts it into something that resembles an MVE tail-predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then set up the value of VPR.P0. The loads and stores would be placed in VPT blocks, so this is not tail predication but normal VPT predication, with the predicate based upon an element-counting induction variable. Further work needs to be done to finally produce a true tail-predicated loop.

Because only the loads and stores are predicated, at both the LLVM IR and MIR levels, we restrict support to lane-wise operations only (no horizontal reductions). We will perform a final check on MIR during loop finalisation too.

Another restriction, specific to MVE, is that all the vector instructions need to operate on the same number of elements. This is because predication is performed at the byte level, and this is set on entry to the loop or by the VCTP instead.

Differential Revision: https://reviews.llvm.org/D65884
llvm-svn: 371179
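As a rough illustration of the predication rules listed in the commit message, the following scalar C++ model emulates a loop over four 32-bit lanes in which a VCTP-style predicate is derived from an element-counting induction variable and only the loads and stores are predicated. The names `vctp32`, `maskedLoad`, `maskedStore` and `addArrays` are hypothetical helpers written for this sketch; they are not part of the patch or of any LLVM API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar stand-ins for a 128-bit vector of four 32-bit lanes and its predicate.
using Vec4 = std::array<uint32_t, 4>;
using Pred4 = std::array<bool, 4>;

// Model of VCTP32 (assumption: lane i is active while i < remaining).
Pred4 vctp32(uint32_t remaining) {
  Pred4 p{};
  for (uint32_t i = 0; i < 4; ++i)
    p[i] = i < remaining;
  return p;
}

// Predicated load: inactive lanes become zero and memory is not read.
Vec4 maskedLoad(const uint32_t *src, const Pred4 &p) {
  Vec4 v{};
  for (std::size_t i = 0; i < 4; ++i)
    if (p[i])
      v[i] = src[i];
  return v;
}

// Predicated store: inactive lanes leave memory untouched.
void maskedStore(uint32_t *dst, const Vec4 &v, const Pred4 &p) {
  for (std::size_t i = 0; i < 4; ++i)
    if (p[i])
      dst[i] = v[i];
}

// c[i] = a[i] + b[i] for n elements, without a scalar remainder loop.
void addArrays(const uint32_t *a, const uint32_t *b, uint32_t *c, uint32_t n) {
  uint32_t remaining = n; // element-counting induction variable
  for (uint32_t base = 0; base < n; base += 4) {
    const Pred4 p = vctp32(remaining);
    const Vec4 va = maskedLoad(a + base, p);
    const Vec4 vb = maskedLoad(b + base, p);
    Vec4 vc{};
    for (std::size_t i = 0; i < 4; ++i)
      vc[i] = va[i] + vb[i]; // lane-wise operation, left unpredicated
    maskedStore(c + base, vc, p);
    remaining = remaining > 4 ? remaining - 4 : 0;
  }
}
```

With `n = 6`, the second iteration runs with only two active lanes, so elements past the end of the arrays are neither read nor written. A horizontal reduction would not be safe in this scheme because the unpredicated add also processes the zeroed inactive lanes, which is in line with the lane-wise restriction described above.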
1 parent f879c68 commit 312409e

12 files changed: +1985 -1 lines changed


‎llvm/include/llvm/IR/IntrinsicsARM.td

@@ -777,6 +777,10 @@ class Neon_Dot_Intrinsic
 def int_arm_neon_udot : Neon_Dot_Intrinsic;
 def int_arm_neon_sdot : Neon_Dot_Intrinsic;
 
+def int_arm_vctp8 : Intrinsic<[llvm_v16i1_ty], [llvm_i32_ty], [IntrNoMem]>;
+def int_arm_vctp16 : Intrinsic<[llvm_v8i1_ty], [llvm_i32_ty], [IntrNoMem]>;
+def int_arm_vctp32 : Intrinsic<[llvm_v4i1_ty], [llvm_i32_ty], [IntrNoMem]>;
+def int_arm_vctp64 : Intrinsic<[llvm_v2i1_ty], [llvm_i32_ty], [IntrNoMem]>;
 
 // GNU eabi mcount
 def int_arm_gnu_eabi_mcount : Intrinsic<[],
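
The four VCTP intrinsics return predicates of 16, 8, 4 and 2 lanes for a 128-bit vector. The sketch below is a hedged model of how such a predicate could map onto a byte-granular 16-bit mask (one bit per vector byte), which is one way to see why the commit message requires all vector instructions in the loop to operate on the same number of elements. `vctpByteMask` is a hypothetical helper for illustration only, and the one-bit-per-byte layout is an assumption of this sketch.

```cpp
#include <cstdint>

// Hypothetical model: byte-level predicate for VCTP<elemBits> when
// `remaining` elements are left. Bit k set means byte k of the vector is active.
// elemBits is expected to be 8, 16, 32 or 64.
uint16_t vctpByteMask(unsigned elemBits, uint32_t remaining) {
  const unsigned bytesPerElem = elemBits / 8; // 1, 2, 4 or 8
  const unsigned lanes = 16 / bytesPerElem;   // 16, 8, 4 or 2
  const uint32_t active = remaining < lanes ? remaining : lanes;
  const unsigned activeBytes = active * bytesPerElem;
  if (activeBytes >= 16)
    return 0xFFFF; // every byte of the vector is active
  return static_cast<uint16_t>((1u << activeBytes) - 1u);
}
```

Under this model, three remaining 32-bit elements give the mask `0x0FFF`. An instruction working on 16-bit elements would read that same mask as six active lanes rather than three, so mixing element counts within one predicated loop would give the wrong result.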

‎llvm/lib/Target/ARM/ARM.h

@@ -35,6 +35,7 @@ class MachineInstr;
 class MCInst;
 class PassRegistry;
 
+Pass *createMVETailPredicationPass();
 FunctionPass *createARMLowOverheadLoopsPass();
 Pass *createARMParallelDSPPass();
 FunctionPass *createARMISelDag(ARMBaseTargetMachine &TM,
@@ -67,6 +68,7 @@ void initializeThumb2SizeReducePass(PassRegistry &);
 void initializeThumb2ITBlockPass(PassRegistry &);
 void initializeMVEVPTBlockPass(PassRegistry &);
 void initializeARMLowOverheadLoopsPass(PassRegistry &);
+void initializeMVETailPredicationPass(PassRegistry &);
 
 } // end namespace llvm
 

‎llvm/lib/Target/ARM/ARMTargetMachine.cpp

@@ -96,6 +96,7 @@ extern "C" void LLVMInitializeARMTarget() {
   initializeARMExpandPseudoPass(Registry);
   initializeThumb2SizeReducePass(Registry);
   initializeMVEVPTBlockPass(Registry);
+  initializeMVETailPredicationPass(Registry);
   initializeARMLowOverheadLoopsPass(Registry);
 }
 
@@ -447,8 +448,10 @@ bool ARMPassConfig::addPreISel() {
         MergeExternalByDefault));
   }
 
-  if (TM->getOptLevel() != CodeGenOpt::None)
+  if (TM->getOptLevel() != CodeGenOpt::None) {
     addPass(createHardwareLoopsPass());
+    addPass(createMVETailPredicationPass());
+  }
 
   return false;
 }

‎llvm/lib/Target/ARM/CMakeLists.txt

@@ -52,6 +52,7 @@ add_llvm_target(ARMCodeGen
   ARMTargetObjectFile.cpp
   ARMTargetTransformInfo.cpp
   MLxExpansionPass.cpp
+  MVETailPredication.cpp
   MVEVPTBlockPass.cpp
   Thumb1FrameLowering.cpp
   Thumb1InstrInfo.cpp
