This is an archive of the discontinued LLVM Phabricator instance.

[MachineCombiner][RISCV] Enable MachineCombiner for RISCV
ClosedPublic

Authored by asi-sc on Oct 5 2022, 5:48 AM.

Diff Detail

Event Timeline

asi-sc created this revision.Oct 5 2022, 5:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2022, 5:48 AM
asi-sc requested review of this revision.Oct 5 2022, 5:48 AM
asi-sc added a comment.Oct 5 2022, 5:57 AM

This patch does not contain the full test to make changes more readable during the review.

Whetstone double-precision perf results for sifive-u74 (-O3 -funroll-loops -finline-functions -ffast-math -mtune=sifive-u74)
N1 +8.9%
N2 +1%

Machine combiner is off

Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12398255667391944       261.534              0.743
N2 floating point     -1.12187079889293351       222.128              6.123
N3 if then else        1.00000000000000000               14291.202    0.073
N4 fixed point        12.00000000000000000               289771357.091    0.000
N5 sin,cos etc.        0.49902937281518078                  20.891   40.300
N6 floating point      0.99999987890802811       170.563             32.001
N7 assignments         3.00000000000000000               28564.747    0.065
N8 exp,sqrt etc.       0.75100163018453681                  21.163   17.787

MWIPS                                           1042.205             97.092

Machine combiner is on

Loop content                  Result              MFLOPS      MOPS   Seconds

N1 floating point     -1.12398255667391900       284.892              0.692
N2 floating point     -1.12187079889295083       224.735              6.144
N3 if then else        1.00000000000000000               14281.518    0.074
N4 fixed point        12.00000000000000000               269692501.333    0.000
N5 sin,cos etc.        0.49902937281518078                  20.787   41.122
N6 floating point      0.99999987890802811       170.559             32.492
N7 assignments         3.00000000000000000               28605.105    0.066
N8 exp,sqrt etc.       0.75100163018453681                  21.534   17.748

MWIPS                                           1044.744             98.340
craig.topper added inline comments.Oct 5 2022, 9:29 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1150

Probably want to make sure that the FrmOpIdx is the same as the current number of operands. If it's not then adding to the end is wrong.

1154

I'm not sure I like this assumption that FRM physical register is the next operand. Can we just create a FRM implicit use operand instead of copying it from the old?

1211

Why not use fast math flags?

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
271

Doesn't every other target put this in addILPOpts?

asi-sc updated this revision to Diff 465715.Oct 6 2022, 6:34 AM

Addressed review comments

asi-sc added inline comments.Oct 6 2022, 6:38 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1211

I changed current implementation to use fast math flags, but I have a question related to that.
Clang distributes fast math flags to instructions if -ffast-math is passed, whereas llc does not. So, theoretically we can have UnsafeFPMath option enabled and don't have fast math flags on instructions. What do you think, should we combine instructions in this case?

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
271

Thanks for the question, it's definitely something we should discuss. You are right, addILPOpts is the common insertion point for this pass. However, my experiments show that it's more profitable to run machine combiner after machine LICM and machine sinking which are inserted right after ILPOpts customization point. I also need to say that my local machine combiner version has many more patterns to combine, e.g. it can reassociate integer additions.
One example I remember is that on Coremark machine sinking gathers a chain of additions in the loop:

bb.291.for.body3.us.us.i66:
; predecessors: %bb.289, %bb.290 
  successors: %bb.63(0x04000000), %bb.62(0x7c000000); %bb.63(3.12%), %bb.62(96.88%)

  %76:gpr = PHI %1287:gpr, %bb.289, %1285:gpr, %bb.290 
  %1288:gpr = PHI %1210:gpr, %bb.289, %1286:gpr, %bb.290 
  %1212:gpr = ADD %1211:gpr, %72:gpr 
  %1223:gpr = ADD %1222:gpr, %1212:gpr
  %1234:gpr = ADD %1233:gpr, %1223:gpr
  %1245:gpr = ADD %1244:gpr, %1234:gpr
  %1256:gpr = ADD %1255:gpr, %1245:gpr
  %1267:gpr = ADD %1266:gpr, %1256:gpr
  %1278:gpr = ADD %1277:gpr, %1267:gpr
  %77:gpr = ADD killed %1288:gpr, killed %1278:gpr
  %78:gpr = nuw ADDI %71:gpr, 8
  BNE %3114:gpr, %78:gpr, %bb.62
  PseudoBR %bb.63

Reassociation allows using multiple ALUs for this code.

At the same time I wasn't able to find any examples against inserting machine combiner a little later than other targets do.

asi-sc added inline comments.Oct 7 2022, 10:31 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1186

I'd like to comment why I don't follow the default path here (TargetInstrInfo::getMachineCombinerPatterns -> TargetInstrInfo::isReassociationCandidate -> ...). There are two main reasons:

  • Generic implementation relies on isAssociativeAndCommutative function that takes MachineInstruction. Having it, we can check the opcode, fast math flags, etc. However, instructions are checked independently, so it's not possible to test if two instructions have equal rounding modes. Adding one more interface method required only by RISCV target doesn't seem to be a good idea.
  • If we decide to introduce machine combiner for RISCV, I'd like to add patterns for FSUB instruction as well, e.g. (X + A) - Y => (X - Y) + A. These FSUB patterns need approximately the same checks as the patterns from this patch but cannot follow default (associative and commutative) path. It means that this code will be required anyway.
craig.topper added inline comments.Oct 7 2022, 10:35 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1211

Here's the patch where X86 removed the UnsafeFPMath check https://reviews.llvm.org/D74851

asi-sc updated this revision to Diff 466480.Oct 10 2022, 6:21 AM

Small code refactoring and tests update (change 'fast' flag to 'reassoc')

asi-sc added inline comments.Oct 10 2022, 6:49 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1205

I looked into how quite similar transformations are implemented in InstCombine. Usually it's enough to check that the root instruction allows reassociation. So, we can think of relaxing this condition.

One more thing to mention is that other targets in addititon to FmReassoc checks FmNsz. I don't see any need to do this for fadd and fmul cases. Please, correct me if I'm wrong.

1211

Thanks for the link. Then I think we also should use only instructions flags.

craig.topper added inline comments.Oct 10 2022, 9:39 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1165

Should we intersect the MIFlags too like X86 and PowerPC do? Not sure where AArch64 doesn't.

1205

I don't think Nsz is needed, but I'd rather us be consistent with the other 3 targets that do this.

asi-sc updated this revision to Diff 466776.Oct 11 2022, 5:09 AM

Make flags intersection for new instructions, add Nsz flag check.

asi-sc added inline comments.Oct 11 2022, 5:18 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1165

Thanks for drawing my attention to this. I think we should. As an improvement we can think of propagating some fast-math flags from the root instruction, but not in this patch.
I added flags intersection and two test cases.

asi-sc updated this revision to Diff 466824.Oct 11 2022, 8:13 AM

Move FRM operands creation from setSpecialOperandAttr to finalizeInsInstrs

craig.topper added inline comments.Oct 12 2022, 11:05 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
1156

auto *NewMi

1221

Use Added = true. No need for |

asi-sc updated this revision to Diff 467393.Oct 13 2022, 1:32 AM

Addressed review comments

asi-sc updated this revision to Diff 467801.Oct 14 2022, 9:28 AM

Rebase after D135776

This revision is now accepted and ready to land.Oct 14 2022, 3:46 PM
asi-sc updated this revision to Diff 468483.Oct 18 2022, 4:50 AM

Rebase to fetch precommitted tests

This revision was automatically updated to reflect the committed changes.

We have to revert it because tests were not correctly updated by 'update_mir_test_checks.py' script. Fix in D136170. I'll regenerate tests and update the review soon.

asi-sc reopened this revision.Oct 18 2022, 8:39 AM
This revision is now accepted and ready to land.Oct 18 2022, 8:39 AM
asi-sc updated this revision to Diff 468572.Oct 18 2022, 8:42 AM

Re-generate machine-combiner-mir.ll test checks

This revision was landed with ongoing or failed builds.Oct 18 2022, 8:58 AM
This revision was automatically updated to reflect the committed changes.