This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][MachineCombiner] Update isAssociativeAndCommutative
AbandonedPublic

Authored by kawashima-fj on Nov 16 2022, 2:23 AM.

Details

Summary

This commit adds opcodes for ADD, MUL, AND, ORR, and EOR Base/SIMD/SVE instructions and missing opcodes for FADD and FMUL FP/SIMD/SVE instructions to the isAssociativeAndCommutative function. Also, it removes opcodes for the FMULX instruction, which is not associative (bug fix).

This helps increasing instruction-level parallelism by the existing Machine InstCombiner pass. This supersedes D132828, which implements tree height reduction in a new LLVM IR pass. Advantages of using the existing Machine InstCombiner pass are (1) more precise cost estimation, (2) no redundant process, and (3) less compile-time impact. Disadvantages are (4) per-target isAssociativeAndCommutative implementation and (4) constraints by the instruction set (see comment for MULWrr in AArch64InstrInfo::isAssociativeAndCommutative). In addition, (5) the sequence of instructions may not be optimal in some cases in terms of ILP because the algorithm in TargetInstrInfo::getMachineCombinerPatterns in the Machine InstCombiner pass is simpler than that of D132828. Nonetheless, it generates a fairly good sequence of instructions.

I run C/C++ benchmarks in SPECrate 2017 on Fujitsu A64FX processor, which has two pipelines for integer operations and SIMD/FP operations each. 511.povray_r had 4% improvement. Other benchmarks (int: 500, 502, 505, 520, 523, 525, 531, 541, 557; fp: 508, 510, 519, 538, 544) were within 1% up/down. For a synthetic benchmark, it doubled the performance.

Diff Detail

Event Timeline

kawashima-fj created this revision.Nov 16 2022, 2:23 AM
kawashima-fj requested review of this revision.Nov 16 2022, 2:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2022, 2:23 AM

@melver
As shown in the change of llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll, the Machine InstCombiner pass removes pcsections metadata when replacing MachineInstr. I'm not sure it should nor not.

Essentially, the Machine InstCombiner pass changes MIR

$5 = ADD $1, $2, pcsections !0
$6 = ADD $5, $3, pcsections !0
$7 = ADD $6, $4, pcsections !0

to

$5 = ADD $1, $2, pcsections !0
$6 = ADD $3, $4
$7 = ADD $5, $6

Should pcsections !0 in $6 and $7 be preserved? If so, probablly code to create new MachineInstr should be updated.

@melver
As shown in the change of llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll, the Machine InstCombiner pass removes pcsections metadata when replacing MachineInstr. I'm not sure it should nor not.

Essentially, the Machine InstCombiner pass changes MIR

$5 = ADD $1, $2, pcsections !0
$6 = ADD $5, $3, pcsections !0
$7 = ADD $6, $4, pcsections !0

to

$5 = ADD $1, $2, pcsections !0
$6 = ADD $3, $4
$7 = ADD $5, $6

Should pcsections !0 in $6 and $7 be preserved? If so, probablly code to create new MachineInstr should be updated.

Yes, because the add instructions in LLVM IR have !pcsections attached to them.

It should be enough to use BuildMI(*MF, MIMetadata(Prev), TII->get(Opcode), NewVR) ... for Prev and BuildMI(*MF, MIMetadata(Root), TII->get(Opcode), RegC)... for Root.

Thanks for pointing it out.

I think it would be best to split this up into some instruction groups. The scalar were already being handled by https://reviews.llvm.org/D134260 (ANDS can also be added). I hadn't committed that because it ran into the same pcsections test failure. I have a patch to fix it up though, I can put that up for review and get D134260 committed.

The others could be split into scalar fp/neon/sve/etc.

I think it would be best to split this up into some instruction groups. The scalar were already being handled by https://reviews.llvm.org/D134260 (ANDS can also be added). I hadn't committed that because it ran into the same pcsections test failure. I have a patch to fix it up though, I can put that up for review and get D134260 committed.

The others could be split into scalar fp/neon/sve/etc.

Thanks. I didn't notice your D134260.
Do you mean splitting one patch to some patches? Or, splitting case ... return statement in the code?

Thanks. I didn't notice your D134260.
Do you mean splitting one patch to some patches? Or, splitting case ... return statement in the code?

I mean a few different patches. It can be good to commit things in smaller chunks in case some part of it needs to be reverted. The patch looks sensible from what I can tell, I just feel it is doing a little too much all at once.

I mean a few different patches. It can be good to commit things in smaller chunks in case some part of it needs to be reverted. The patch looks sensible from what I can tell, I just feel it is doing a little too much all at once.

Ok. Do you have a plan to add FP/NEON/SVE patches after D134260? If not, I'll post splitted patches after D134260 is landed.

I have no plans past D134260, I will leave all the rest to you!

Matt added a subscriber: Matt.Nov 22 2022, 11:03 AM
kawashima-fj abandoned this revision.Dec 8 2022, 12:21 AM

Abandon. I'll post splitted (and improved) patches.