Initial implementation to match basic FP reassociation patterns.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This patch does not contain the full test to make changes more readable during the review.
Whetstone double-precision perf results for sifive-u74 (-O3 -funroll-loops -finline-functions -ffast-math -mtune=sifive-u74)
N1 +8.9%
N2 +1%
Machine combiner is off
Loop content Result MFLOPS MOPS Seconds N1 floating point -1.12398255667391944 261.534 0.743 N2 floating point -1.12187079889293351 222.128 6.123 N3 if then else 1.00000000000000000 14291.202 0.073 N4 fixed point 12.00000000000000000 289771357.091 0.000 N5 sin,cos etc. 0.49902937281518078 20.891 40.300 N6 floating point 0.99999987890802811 170.563 32.001 N7 assignments 3.00000000000000000 28564.747 0.065 N8 exp,sqrt etc. 0.75100163018453681 21.163 17.787 MWIPS 1042.205 97.092
Machine combiner is on
Loop content Result MFLOPS MOPS Seconds N1 floating point -1.12398255667391900 284.892 0.692 N2 floating point -1.12187079889295083 224.735 6.144 N3 if then else 1.00000000000000000 14281.518 0.074 N4 fixed point 12.00000000000000000 269692501.333 0.000 N5 sin,cos etc. 0.49902937281518078 20.787 41.122 N6 floating point 0.99999987890802811 170.559 32.492 N7 assignments 3.00000000000000000 28605.105 0.066 N8 exp,sqrt etc. 0.75100163018453681 21.534 17.748 MWIPS 1044.744 98.340
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
1150 | Probably want to make sure that the FrmOpIdx is the same as the current number of operands. If it's not then adding to the end is wrong. | |
1154 | I'm not sure I like this assumption that FRM physical register is the next operand. Can we just create a FRM implicit use operand instead of copying it from the old? | |
1211 | Why not use fast math flags? | |
llvm/lib/Target/RISCV/RISCVTargetMachine.cpp | ||
271 | Doesn't every other target put this in addILPOpts? |
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
1211 | I changed current implementation to use fast math flags, but I have a question related to that. | |
llvm/lib/Target/RISCV/RISCVTargetMachine.cpp | ||
271 | Thanks for the question, it's definitely something we should discuss. You are right, addILPOpts is the common insertion point for this pass. However, my experiments show that it's more profitable to run machine combiner after machine LICM and machine sinking which are inserted right after ILPOpts customization point. I also need to say that my local machine combiner version has many more patterns to combine, e.g. it can reassociate integer additions. bb.291.for.body3.us.us.i66: ; predecessors: %bb.289, %bb.290 successors: %bb.63(0x04000000), %bb.62(0x7c000000); %bb.63(3.12%), %bb.62(96.88%) %76:gpr = PHI %1287:gpr, %bb.289, %1285:gpr, %bb.290 %1288:gpr = PHI %1210:gpr, %bb.289, %1286:gpr, %bb.290 %1212:gpr = ADD %1211:gpr, %72:gpr %1223:gpr = ADD %1222:gpr, %1212:gpr %1234:gpr = ADD %1233:gpr, %1223:gpr %1245:gpr = ADD %1244:gpr, %1234:gpr %1256:gpr = ADD %1255:gpr, %1245:gpr %1267:gpr = ADD %1266:gpr, %1256:gpr %1278:gpr = ADD %1277:gpr, %1267:gpr %77:gpr = ADD killed %1288:gpr, killed %1278:gpr %78:gpr = nuw ADDI %71:gpr, 8 BNE %3114:gpr, %78:gpr, %bb.62 PseudoBR %bb.63 Reassociation allows using multiple ALUs for this code. At the same time I wasn't able to find any examples against inserting machine combiner a little later than other targets do. |
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
1186 | I'd like to comment why I don't follow the default path here (TargetInstrInfo::getMachineCombinerPatterns -> TargetInstrInfo::isReassociationCandidate -> ...). There are two main reasons:
|
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
1211 | Here's the patch where X86 removed the UnsafeFPMath check https://reviews.llvm.org/D74851 |
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
1205 | I looked into how quite similar transformations are implemented in InstCombine. Usually it's enough to check that the root instruction allows reassociation. So, we can think of relaxing this condition. One more thing to mention is that other targets in addititon to FmReassoc checks FmNsz. I don't see any need to do this for fadd and fmul cases. Please, correct me if I'm wrong. | |
1211 | Thanks for the link. Then I think we also should use only instructions flags. |
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp | ||
---|---|---|
1165 | Thanks for drawing my attention to this. I think we should. As an improvement we can think of propagating some fast-math flags from the root instruction, but not in this patch. |
We have to revert it because tests were not correctly updated by 'update_mir_test_checks.py' script. Fix in D136170. I'll regenerate tests and update the review soon.
Probably want to make sure that the FrmOpIdx is the same as the current number of operands. If it's not then adding to the end is wrong.