This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Emit FNMADD instead of FNEG(FMADD)
ClosedPublic

Authored by MattDevereau on Apr 26 2023, 7:54 AM.

Details

Summary

Emit FNMADD instead of FNEG(FMADD) for optimization levels
above Oz when fast-math flags (nsz+contract) permit it.

Diff Detail

Event Timeline

MattDevereau created this revision.Apr 26 2023, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2023, 7:54 AM
MattDevereau requested review of this revision.Apr 26 2023, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2023, 7:54 AM

This combine requires the "no signed zeroes" fast math flag in addition to the contract fast math flag. I assume the existing combines only checked for contract?

MattDevereau edited the summary of this revision. (Show Details)

@craig.topper That was correct, it was only checking for contract. I've added a check for both contract and nsz being present and added some more tests.

david-arm added inline comments.Apr 28 2023, 5:21 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437

Shouldn't we also be doing FMSUB here too, to ensure we are also requiring the nsz flag for the existing fnmsub combine?

llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
87

nit: This is just a minor thing, but at first I though @neg referred to negation here, given the combine involves a fneg. Perhaps it's less confusing to just remove it and add a comment above the negative test cases explaining why the combine doesn't happen?

MattDevereau added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437

I don't think FNEG(FMSUB) and FNMSUB are equivalent, so we can't assume FNMSUB from a FNEG perspective like we can with FNMAD:

FNMSUB = (a * b) - c e.g. (12 * 4) - 8 = 32
FNEG(FMSUB) = -(-(a*b) + c)
e.g. -(12 * 4) + 8 = 40

llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
87

Done, I agree it was pretty confusing

MattDevereau added inline comments.May 2 2023, 9:16 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437

Furthering on from this FNMSUB and FNEG(FMSUB) are equal (oops, calcuator misuse) but there's already patterns that exist for emitting FNMSUBs. Adding tests similar to the ones added in this patch don't generate FNEG(FMSUB) but instead correctly emit FNMSUB already.

craig.topper added inline comments.May 2 2023, 10:03 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437

They may not be equal for cases where combinations where a*b==0.0 or -0.0 and c==0.0 or -0.0.

fneg(fadd A, B) != fadd(fneg(A), fneg(B)) if (A==0.0 and B==-0.0) or (A==-0.0 and B=0.0)

fneg(fadd 0.0, -0.0) -> fneg(0.0) -> -0.0
fadd(fneg(0.0), fneg(-0.0)) -> fadd(-0.0, 0.0) -> 0.0

This is why I asked for the no signed zeros check.

craig.topper added inline comments.May 2 2023, 10:31 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437

I guess its's really

fneg(fadd A,B) != fadd(fneg(A), fneg(B)) if (A==-B)

david-arm added inline comments.May 3 2023, 1:53 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437

Hi @craig.topper in this case I'm really surprised that we don't check for nsz for fnmsub combines:

; Don't combine: Missing nsz
define void @fnmsubs_contract(ptr %a, ptr %b, ptr %c) {
entry:
  %0 = load float, ptr %a, align 4
  %1 = load float, ptr %b, align 4
  %mul = fmul contract float %1, %0
  %2 = load float, ptr %c, align 4
  %add = fsub contract float %mul, %2
  %fneg = fneg contract float %add
  store float %fneg, ptr %a, align 4
  ret void
}

I tested this without @MattDevereau's patch and it combines fine to fnmsub so it clearly doesn't need nsz. This suggests to me either:

  1. We don't need the nsz flag for the fnmadd combine, or
  2. For some reason the fnmsub combine doesn't need the nsz flag, or
  3. There is also a bug with the fnmsub combine that needs fixing.

Do you have any idea which of the above this is? There is no real harm in @MattDevereau adding the nsz check in this patch, but it would be good to know if it's actually necessary or whether there is an existing bug with fnmsub.

llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
5

nit: Can you remove the dso_local markers on all these functions please? I don't think we need them.

david-arm accepted this revision.May 3 2023, 3:45 AM

LGTM with the nit addressed! Thanks @MattDevereau.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437

Hi @craig.topper @MattDevereau, I think I understand now. FNMSUB does the opposite of what I expected, which was -((a * b) - c) - it actually does (a * b) - c. I missed the fact that in the fnmsubs_contract example I gave above we actually generate fnmsub followed by fneg. When you add the extra nsz flag this contracts to fmsub!!

This revision is now accepted and ready to land.May 3 2023, 3:45 AM
This revision was landed with ongoing or failed builds.May 5 2023, 1:14 AM
This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.May 5 2023, 3:13 AM

I'm seeing a failure in my LLVM_ENABLE_EXPENSIVE_CHECKS build:

FAIL: LLVM :: CodeGen/AArch64/aarch64_fnmadd.ll (1 of 47620)
******************** TEST 'LLVM :: CodeGen/AArch64/aarch64_fnmadd.ll' FAILED ********************
Script:
--
: 'RUN: at line 2';   /home/jayfoad2/llvm-expensive/bin/llc < /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll -mtriple=aarch64-linux-gnu -O3 | /home/jayfoad2/llvm-expensive/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
--
Exit Code: 2

Command Output (stderr):
--

# After Machine InstCombiner
# Machine code for function fnmaddd: IsSSA, TracksLiveness
Function Live Ins: $x0 in %0, $x1 in %1, $x2 in %2

bb.0.entry:
  liveins: $x0, $x1, $x2
  %2:gpr64common = COPY $x2
  %1:gpr64common = COPY $x1
  %0:gpr64common = COPY $x0
  %3:fpr64 = LDRDui %0:gpr64common, 0 :: (load (s64) from %ir.a)
  %4:fpr64 = LDRDui %1:gpr64common, 0 :: (load (s64) from %ir.b)
  %6:fpr64 = LDRDui %2:gpr64common, 0 :: (load (s64) from %ir.c)
  %7:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
  %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
  STRDui killed %8:fpr64, %0:gpr64common, 0 :: (store (s64) into %ir.a)
  RET_ReallyLR

# End machine code for function fnmaddd.

*** Bad machine code: Using a killed virtual register ***
- function:    fnmaddd
- basic block: %bb.0 entry (0x8785268)
- instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
- operand 1:   killed %4:fpr64

*** Bad machine code: Using a killed virtual register ***
- function:    fnmaddd
- basic block: %bb.0 entry (0x8785268)
- instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
- operand 2:   killed %3:fpr64

*** Bad machine code: Using a killed virtual register ***
- function:    fnmaddd
- basic block: %bb.0 entry (0x8785268)
- instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
- operand 3:   killed %6:fpr64
LLVM ERROR: Found 3 machine code errors.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/jayfoad2/llvm-expensive/bin/llc -mtriple=aarch64-linux-gnu -O3
1.	Running pass 'Function Pass Manager' on module '<stdin>'.
2.	Running pass 'Verify generated machine code' on function '@fnmaddd'
 #0 0x0000000006072777 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/jayfoad2/llvm-expensive/bin/llc+0x6072777)
 #1 0x000000000607062e llvm::sys::RunSignalHandlers() (/home/jayfoad2/llvm-expensive/bin/llc+0x607062e)
 #2 0x0000000006072e1a SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f138d442520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007f138d496a7c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f138d496a7c __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007f138d496a7c pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007f138d442476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007f138d4287f3 abort ./stdlib/abort.c:81:7
 #9 0x0000000005fe8e4c llvm::report_fatal_error(llvm::Twine const&, bool) (/home/jayfoad2/llvm-expensive/bin/llc+0x5fe8e4c)
#10 0x00000000055889eb (/home/jayfoad2/llvm-expensive/bin/llc+0x55889eb)
#11 0x00000000054a0eec llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/jayfoad2/llvm-expensive/bin/llc+0x54a0eec)
#12 0x0000000005964754 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/jayfoad2/llvm-expensive/bin/llc+0x5964754)
#13 0x000000000596c961 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/jayfoad2/llvm-expensive/bin/llc+0x596c961)
#14 0x00000000059651f9 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/jayfoad2/llvm-expensive/bin/llc+0x59651f9)
#15 0x0000000003ac8759 main (/home/jayfoad2/llvm-expensive/bin/llc+0x3ac8759)
#16 0x00007f138d429d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#17 0x00007f138d429e40 call_init ./csu/../csu/libc-start.c:128:20
#18 0x00007f138d429e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#19 0x0000000003ac3165 _start (/home/jayfoad2/llvm-expensive/bin/llc+0x3ac3165)
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/jayfoad2/llvm-expensive/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll

@foad apologies, I pushed another commit a9919db65a1afa71ac62631d51711383c17d43fc straight afterwards which only enables this test for aarch64. It's possible that you pulled this commit but not the one immediately afterwards. If it still fails with both commits can you let me know? Thanks.

foad added a comment.May 5 2023, 3:18 AM

@foad apologies, I pushed another commit a9919db65a1afa71ac62631d51711383c17d43fc straight afterwards which only enables this test for aarch64. It's possible that you pulled this commit but not the one immediately afterwards. If it still fails with both commits can you let me know? Thanks.

I'm pretty sure that won't help since I build with all targets enabled.

@foad Very well, I've reverted it

foad added a comment.May 5 2023, 3:57 AM

@foad Very well, I've reverted it

Thanks. You might be able to reproduce the failure even in a non-expensive-checks build just by adding -verify-machineinstrs to the RUN line.

MattDevereau edited the summary of this revision. (Show Details)

When I pushed the previous revision it produced bad machine code which triggered failures.
Previously I created a new variable MAD for capturing old FMADD instructions and used this variable
to merge the FMADD flags with the FNEG flags, however I did not mark it for deletion like the MUL did, i.e.

  } // end switch (Pattern)
  // Record MUL and ADD/SUB for deletion
  if (MUL)
    DelInstrs.push_back(MUL);
  DelInstrs.push_back(&Root);

  // Set the flags on the inserted instructions to be the merged flags of the
  // instructions that we have combined.
  uint16_t Flags = Root.getFlags();
  if (MUL)
    Flags = Root.mergeFlagsWith(*MUL);
  if (MAD)
    Flags = Root.mergeFlagsWith(*MAD);
  for (auto *MI : InsInstrs)
    MI->setFlags(Flags);
}

Instead I have used the MUL variable to capture the MAD, and that emits clean machine code.
I have also added -verify-machineinstrs to the test to verify the machine code does not regress.

MattDevereau reopened this revision.May 5 2023, 5:35 AM
This revision is now accepted and ready to land.May 5 2023, 5:35 AM
david-arm accepted this revision.May 5 2023, 6:31 AM

LGTM. I'm happy with the fix @MattDevereau - by reusing MUL you're now correctly adding it to the delete list (DelInstrs).

This revision was automatically updated to reflect the committed changes.

This patch is causing a crash in our builds. here is a repro:

clang -cc1 -triple aarch64-cros-linux-gnu -emit-obj -ffp-contract=fast -ffast-math -O2 -x c crash.txt

double a();
int b(char *d) {
  _Bool e;
  double exponent;
  char *c = d;
  e = 1;
  do
    exponent = 10.0 * exponent - '0';
  while (c);
  a((e ? -1.0 : 1.0) * exponent);
}

Just took the opportunity to have a quick look at this patch, given that it recently got reverted.

llvm/include/llvm/CodeGen/MachineCombinerPattern.h
179–180

Having two patterns, one for 32-bit values, and one for 64-bit values doesn't match what was done for FMSUB/FNMSUB. Can these be merged into 1 and use the register class used for the operands to determine which instruction to use?

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5421

You'll need to add a check that MI != nullptr

From the doxygen comment:

/// getUniqueVRegDef - Return the unique machine instr that defines the
/// specified virtual register or null if none is found.  If there are
/// multiple definitions or no definition, return null.
5437

nit: if you just return Match(..) here, then you can avoid the variable Found

MattDevereau added inline comments.May 9 2023, 6:38 AM
llvm/include/llvm/CodeGen/MachineCombinerPattern.h
179–180

I'm not sure what you mean? MachineCombinerPattern::FNMSUB isn't used in AArch64InstrInfo::genAlternativeCodeSequence at all. The way FNMSUB is combined in this function has MachineCombinerPattern::FMULSUBH_OP1 MachineCombinerPattern::FMULSUBS_OP1 and MachineCombinerPattern::FMULSUBD_OP1: which all describe the number of bits. I'd need to do something like

auto RC = MRI.getRegClass(MAD->getOperand(0).getReg());
if RC == 64bit
  opc = FNMADDDrrr
else if RC == 32bit
  opc = FNMADDSrrr

which I can't see a clear example of

MattDevereau reopened this revision.May 10 2023, 2:01 AM

@manojgupta Thank you for reverting the patch and the reproducer. I've added a check to bail on the combine if there is more than one use of an FMADD which it what was causing your reproducer to fail and i've added a test to assert this behaviour now.

This revision is now accepted and ready to land.May 10 2023, 2:01 AM
MattDevereau marked 3 inline comments as done.May 10 2023, 2:02 AM
MattDevereau added inline comments.
llvm/include/llvm/CodeGen/MachineCombinerPattern.h
179–180

I managed to achieve this with Arch64::FPR32RegClass.hasSubClassEq(RC) in the end

david-arm added inline comments.May 10 2023, 3:56 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5709

I think we should probably be doing this check in getFNEGPatterns - this is similar to how it's done in canCombineWithFMUL, which is called from getFMAPatterns.

Also, I think this should be hasOneNonDBGUse to avoid debug info blocking your wonderful optimisation. :)

5719

nit: Sorry I didn't pick this up before, but I expect this can just be

else
  llvm_unreachable("Unexpected opcode");

because we only ever matched the double and single variants in getFNEGPatterns?

Move OneUseCheck to getFNEGPatterns and use hasOneNonDBGUse instead

MattDevereau marked an inline comment as done.May 10 2023, 4:07 AM
david-arm accepted this revision.May 10 2023, 5:16 AM

LGTM! I think you've addressed all the review comments. I had one nit on line 5717 about adding an unreachable instead of returning nullptr, but I won't hold the patch up for it.

sdesmalen added inline comments.May 10 2023, 5:32 AM
llvm/include/llvm/CodeGen/MachineCombinerPattern.h
179–180

Thanks!

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5441

nit: I don't know if some compilers could warn about the function not returning a value, but maybe you could break in the default case instead and return false here. Either that, or add a llvm_unreachable() here to make it clear that the function should have returned at this point.

This revision was landed with ongoing or failed builds.May 10 2023, 5:47 AM
This revision was automatically updated to reflect the committed changes.