Emit FNMADD instead of FNEG(FMADD) for optimization levels
above Oz when fast-math flags (nsz+contract) permit it.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This combine requires the "no signed zeroes" fast math flag in addition to the contract fast math flag. I assume the existing combines only checked for contract?
@craig.topper That was correct, it was only checking for contract. I've added a check for both contract and nsz being present and added some more tests.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
5437 | Shouldn't we also be doing FMSUB here too, to ensure we are also requiring the nsz flag for the existing fnmsub combine? | |
llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll | ||
87 | nit: This is just a minor thing, but at first I though @neg referred to negation here, given the combine involves a fneg. Perhaps it's less confusing to just remove it and add a comment above the negative test cases explaining why the combine doesn't happen? |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
5437 | I don't think FNEG(FMSUB) and FNMSUB are equivalent, so we can't assume FNMSUB from a FNEG perspective like we can with FNMAD: FNMSUB = (a * b) - c e.g. (12 * 4) - 8 = 32 | |
llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll | ||
87 | Done, I agree it was pretty confusing |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
5437 | Furthering on from this FNMSUB and FNEG(FMSUB) are equal (oops, calcuator misuse) but there's already patterns that exist for emitting FNMSUBs. Adding tests similar to the ones added in this patch don't generate FNEG(FMSUB) but instead correctly emit FNMSUB already. |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
5437 | They may not be equal for cases where combinations where a*b==0.0 or -0.0 and c==0.0 or -0.0. fneg(fadd A, B) != fadd(fneg(A), fneg(B)) if (A==0.0 and B==-0.0) or (A==-0.0 and B=0.0) fneg(fadd 0.0, -0.0) -> fneg(0.0) -> -0.0 fadd(fneg(0.0), fneg(-0.0)) -> fadd(-0.0, 0.0) -> 0.0 This is why I asked for the no signed zeros check. |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
5437 | I guess its's really fneg(fadd A,B) != fadd(fneg(A), fneg(B)) if (A==-B) |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
5437 | Hi @craig.topper in this case I'm really surprised that we don't check for nsz for fnmsub combines: ; Don't combine: Missing nsz define void @fnmsubs_contract(ptr %a, ptr %b, ptr %c) { entry: %0 = load float, ptr %a, align 4 %1 = load float, ptr %b, align 4 %mul = fmul contract float %1, %0 %2 = load float, ptr %c, align 4 %add = fsub contract float %mul, %2 %fneg = fneg contract float %add store float %fneg, ptr %a, align 4 ret void } I tested this without @MattDevereau's patch and it combines fine to fnmsub so it clearly doesn't need nsz. This suggests to me either:
Do you have any idea which of the above this is? There is no real harm in @MattDevereau adding the nsz check in this patch, but it would be good to know if it's actually necessary or whether there is an existing bug with fnmsub. | |
llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll | ||
5 | nit: Can you remove the dso_local markers on all these functions please? I don't think we need them. |
LGTM with the nit addressed! Thanks @MattDevereau.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
5437 | Hi @craig.topper @MattDevereau, I think I understand now. FNMSUB does the opposite of what I expected, which was -((a * b) - c) - it actually does (a * b) - c. I missed the fact that in the fnmsubs_contract example I gave above we actually generate fnmsub followed by fneg. When you add the extra nsz flag this contracts to fmsub!! |
I'm seeing a failure in my LLVM_ENABLE_EXPENSIVE_CHECKS build:
FAIL: LLVM :: CodeGen/AArch64/aarch64_fnmadd.ll (1 of 47620) ******************** TEST 'LLVM :: CodeGen/AArch64/aarch64_fnmadd.ll' FAILED ******************** Script: -- : 'RUN: at line 2'; /home/jayfoad2/llvm-expensive/bin/llc < /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll -mtriple=aarch64-linux-gnu -O3 | /home/jayfoad2/llvm-expensive/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll -- Exit Code: 2 Command Output (stderr): -- # After Machine InstCombiner # Machine code for function fnmaddd: IsSSA, TracksLiveness Function Live Ins: $x0 in %0, $x1 in %1, $x2 in %2 bb.0.entry: liveins: $x0, $x1, $x2 %2:gpr64common = COPY $x2 %1:gpr64common = COPY $x1 %0:gpr64common = COPY $x0 %3:fpr64 = LDRDui %0:gpr64common, 0 :: (load (s64) from %ir.a) %4:fpr64 = LDRDui %1:gpr64common, 0 :: (load (s64) from %ir.b) %6:fpr64 = LDRDui %2:gpr64common, 0 :: (load (s64) from %ir.c) %7:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr STRDui killed %8:fpr64, %0:gpr64common, 0 :: (store (s64) into %ir.a) RET_ReallyLR # End machine code for function fnmaddd. *** Bad machine code: Using a killed virtual register *** - function: fnmaddd - basic block: %bb.0 entry (0x8785268) - instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr - operand 1: killed %4:fpr64 *** Bad machine code: Using a killed virtual register *** - function: fnmaddd - basic block: %bb.0 entry (0x8785268) - instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr - operand 2: killed %3:fpr64 *** Bad machine code: Using a killed virtual register *** - function: fnmaddd - basic block: %bb.0 entry (0x8785268) - instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr - operand 3: killed %6:fpr64 LLVM ERROR: Found 3 machine code errors. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: /home/jayfoad2/llvm-expensive/bin/llc -mtriple=aarch64-linux-gnu -O3 1. Running pass 'Function Pass Manager' on module '<stdin>'. 2. Running pass 'Verify generated machine code' on function '@fnmaddd' #0 0x0000000006072777 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/jayfoad2/llvm-expensive/bin/llc+0x6072777) #1 0x000000000607062e llvm::sys::RunSignalHandlers() (/home/jayfoad2/llvm-expensive/bin/llc+0x607062e) #2 0x0000000006072e1a SignalHandler(int) Signals.cpp:0:0 #3 0x00007f138d442520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520) #4 0x00007f138d496a7c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 #5 0x00007f138d496a7c __pthread_kill_internal ./nptl/pthread_kill.c:78:10 #6 0x00007f138d496a7c pthread_kill ./nptl/pthread_kill.c:89:10 #7 0x00007f138d442476 gsignal ./signal/../sysdeps/posix/raise.c:27:6 #8 0x00007f138d4287f3 abort ./stdlib/abort.c:81:7 #9 0x0000000005fe8e4c llvm::report_fatal_error(llvm::Twine const&, bool) (/home/jayfoad2/llvm-expensive/bin/llc+0x5fe8e4c) #10 0x00000000055889eb (/home/jayfoad2/llvm-expensive/bin/llc+0x55889eb) #11 0x00000000054a0eec llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/jayfoad2/llvm-expensive/bin/llc+0x54a0eec) #12 0x0000000005964754 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/jayfoad2/llvm-expensive/bin/llc+0x5964754) #13 0x000000000596c961 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/jayfoad2/llvm-expensive/bin/llc+0x596c961) #14 0x00000000059651f9 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/jayfoad2/llvm-expensive/bin/llc+0x59651f9) #15 0x0000000003ac8759 main (/home/jayfoad2/llvm-expensive/bin/llc+0x3ac8759) #16 0x00007f138d429d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #17 0x00007f138d429e40 call_init ./csu/../csu/libc-start.c:128:20 #18 0x00007f138d429e40 __libc_start_main ./csu/../csu/libc-start.c:379:5 #19 0x0000000003ac3165 _start (/home/jayfoad2/llvm-expensive/bin/llc+0x3ac3165) FileCheck error: '<stdin>' is empty. FileCheck command line: /home/jayfoad2/llvm-expensive/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
@foad apologies, I pushed another commit a9919db65a1afa71ac62631d51711383c17d43fc straight afterwards which only enables this test for aarch64. It's possible that you pulled this commit but not the one immediately afterwards. If it still fails with both commits can you let me know? Thanks.
Thanks. You might be able to reproduce the failure even in a non-expensive-checks build just by adding -verify-machineinstrs to the RUN line.
When I pushed the previous revision it produced bad machine code which triggered failures.
Previously I created a new variable MAD for capturing old FMADD instructions and used this variable
to merge the FMADD flags with the FNEG flags, however I did not mark it for deletion like the MUL did, i.e.
} // end switch (Pattern) // Record MUL and ADD/SUB for deletion if (MUL) DelInstrs.push_back(MUL); DelInstrs.push_back(&Root); // Set the flags on the inserted instructions to be the merged flags of the // instructions that we have combined. uint16_t Flags = Root.getFlags(); if (MUL) Flags = Root.mergeFlagsWith(*MUL); if (MAD) Flags = Root.mergeFlagsWith(*MAD); for (auto *MI : InsInstrs) MI->setFlags(Flags); }
Instead I have used the MUL variable to capture the MAD, and that emits clean machine code.
I have also added -verify-machineinstrs to the test to verify the machine code does not regress.
LGTM. I'm happy with the fix @MattDevereau - by reusing MUL you're now correctly adding it to the delete list (DelInstrs).
This patch is causing a crash in our builds. here is a repro:
clang -cc1 -triple aarch64-cros-linux-gnu -emit-obj -ffp-contract=fast -ffast-math -O2 -x c crash.txt double a(); int b(char *d) { _Bool e; double exponent; char *c = d; e = 1; do exponent = 10.0 * exponent - '0'; while (c); a((e ? -1.0 : 1.0) * exponent); }
Just took the opportunity to have a quick look at this patch, given that it recently got reverted.
llvm/include/llvm/CodeGen/MachineCombinerPattern.h | ||
---|---|---|
182–183 | Having two patterns, one for 32-bit values, and one for 64-bit values doesn't match what was done for FMSUB/FNMSUB. Can these be merged into 1 and use the register class used for the operands to determine which instruction to use? | |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
5421 | You'll need to add a check that MI != nullptr From the doxygen comment: /// getUniqueVRegDef - Return the unique machine instr that defines the /// specified virtual register or null if none is found. If there are /// multiple definitions or no definition, return null. | |
5437 | nit: if you just return Match(..) here, then you can avoid the variable Found |
llvm/include/llvm/CodeGen/MachineCombinerPattern.h | ||
---|---|---|
182–183 | I'm not sure what you mean? MachineCombinerPattern::FNMSUB isn't used in AArch64InstrInfo::genAlternativeCodeSequence at all. The way FNMSUB is combined in this function has MachineCombinerPattern::FMULSUBH_OP1 MachineCombinerPattern::FMULSUBS_OP1 and MachineCombinerPattern::FMULSUBD_OP1: which all describe the number of bits. I'd need to do something like auto RC = MRI.getRegClass(MAD->getOperand(0).getReg()); if RC == 64bit opc = FNMADDDrrr else if RC == 32bit opc = FNMADDSrrr which I can't see a clear example of |
@manojgupta Thank you for reverting the patch and the reproducer. I've added a check to bail on the combine if there is more than one use of an FMADD which it what was causing your reproducer to fail and i've added a test to assert this behaviour now.
llvm/include/llvm/CodeGen/MachineCombinerPattern.h | ||
---|---|---|
182–183 | I managed to achieve this with Arch64::FPR32RegClass.hasSubClassEq(RC) in the end |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
5713 | I think we should probably be doing this check in getFNEGPatterns - this is similar to how it's done in canCombineWithFMUL, which is called from getFMAPatterns. Also, I think this should be hasOneNonDBGUse to avoid debug info blocking your wonderful optimisation. :) | |
5723 | nit: Sorry I didn't pick this up before, but I expect this can just be else llvm_unreachable("Unexpected opcode"); because we only ever matched the double and single variants in getFNEGPatterns? |
LGTM! I think you've addressed all the review comments. I had one nit on line 5717 about adding an unreachable instead of returning nullptr, but I won't hold the patch up for it.
llvm/include/llvm/CodeGen/MachineCombinerPattern.h | ||
---|---|---|
182–183 | Thanks! | |
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | ||
5441 | nit: I don't know if some compilers could warn about the function not returning a value, but maybe you could break in the default case instead and return false here. Either that, or add a llvm_unreachable() here to make it clear that the function should have returned at this point. |
Having two patterns, one for 32-bit values, and one for 64-bit values doesn't match what was done for FMSUB/FNMSUB. Can these be merged into 1 and use the register class used for the operands to determine which instruction to use?