This is an archive of the discontinued LLVM Phabricator instance.

Differential D149260

[AArch64] Emit FNMADD instead of FNEG(FMADD)
ClosedPublic

Authored by MattDevereau on Apr 26 2023, 7:54 AM.

Details

Reviewers

sdesmalen
david-arm
kmclaughlin

Commits

rG004bf170c6cb: [AArch64] Emit FNMADD instead of FNEG(FMADD)
rGea228bd0bd01: [AArch64] Emit FNMADD instead of FNEG(FMADD)
rGcaa95c240867: [AArch64] Emit FNMADD instead of FNEG(FMADD)

Summary

Emit FNMADD instead of FNEG(FMADD) for optimization levels
above Oz when fast-math flags (nsz+contract) permit it.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MattDevereau created this revision. Apr 26 2023, 7:54 AM

Herald added a project: Restricted Project. Apr 26 2023, 7:54 AM

Herald added subscribers: hiraditya, kristof.beyls.

MattDevereau requested review of this revision. Apr 26 2023, 7:54 AM

Herald added a project: Restricted Project. Apr 26 2023, 7:54 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B228297: Diff 517177. Apr 26 2023, 8:31 AM

This combine requires the "no signed zeroes" fast math flag in addition to the contract fast math flag. I assume the existing combines only checked for contract?

@craig.topper That was correct, it was only checking for contract. I've added a check for both contract and nsz being present and added some more tests.

Harbormaster completed remote builds in B228785: Diff 517866. Apr 28 2023, 4:59 AM

david-arm added inline comments. Apr 28 2023, 5:21 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437	Shouldn't we also be doing FMSUB here too, to ensure we are also requiring the nsz flag for the existing fnmsub combine?
llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
87	nit: This is just a minor thing, but at first I thought `@neg` referred to negation here, given the combine involves a `fneg`. Perhaps it's less confusing to just remove it and add a comment above the negative test cases explaining why the combine doesn't happen?

MattDevereau updated this revision to Diff 518756. May 2 2023, 8:39 AM

MattDevereau added inline comments.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437	I don't think FNEG(FMSUB) and FNMSUB are equivalent, so we can't assume FNMSUB from a FNEG perspective like we can with FNMADD:

  FNMSUB = (a * b) - c, e.g. (12 * 4) - 8 = 32
  FNEG(FMSUB) = -(-(a * b) + c), e.g. -(12 * 4) + 8 = 40

llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
87	Done, I agree it was pretty confusing

MattDevereau updated this revision to Diff 518764. May 2 2023, 9:13 AM

MattDevereau added inline comments. May 2 2023, 9:16 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437	Following on from this, FNMSUB and FNEG(FMSUB) are equal (oops, calculator misuse), but there are already patterns that exist for emitting FNMSUBs. Adding tests similar to the ones added in this patch doesn't generate FNEG(FMSUB); they correctly emit FNMSUB already.

craig.topper added inline comments. May 2 2023, 10:03 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437	They may not be equal in cases where a*b is 0.0 or -0.0 and c is 0.0 or -0.0. fneg(fadd A, B) != fadd(fneg(A), fneg(B)) if (A==0.0 and B==-0.0) or (A==-0.0 and B==0.0):

  fneg(fadd 0.0, -0.0) -> fneg(0.0) -> -0.0
  fadd(fneg(0.0), fneg(-0.0)) -> fadd(-0.0, 0.0) -> 0.0

This is why I asked for the no signed zeros check.

Harbormaster completed remote builds in B229458: Diff 518764. May 2 2023, 10:25 AM

craig.topper added inline comments. May 2 2023, 10:31 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437	I guess it's really fneg(fadd A,B) != fadd(fneg(A), fneg(B)) if (A==-B)
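
The signed-zero point above can be checked with a small standalone sketch. It is not part of the patch: the function names are hypothetical, and it assumes default IEEE-754 round-to-nearest arithmetic with no fast-math options enabled when compiling.

```cpp
#include <cmath>

// fneg(fadd A, B): add first, then flip the sign of the sum.
double negOfSum(double a, double b) { return -(a + b); }

// fadd(fneg A, fneg B): flip the sign of each operand, then add.
double sumOfNegs(double a, double b) { return (-a) + (-b); }

// True when the two forms agree in value AND in sign bit, i.e. when the
// rewrite is exact even with respect to signed zeros.
bool sameIncludingSign(double a, double b) {
  double x = negOfSum(a, b);
  double y = sumOfNegs(a, b);
  return x == y && std::signbit(x) == std::signbit(y);
}
```

For ordinary operands such as 48.0 and 8.0 the two forms agree. When A == -B (for example 1.0 and -1.0, or 0.0 and -0.0) the first form yields -0.0 and the second yields 0.0, which is the difference the `nsz` flag permits the compiler to ignore.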

david-arm added inline comments. May 3 2023, 1:53 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437	Hi @craig.topper in this case I'm really surprised that we don't check for nsz for fnmsub combines:

  ; Don't combine: Missing nsz
  define void @fnmsubs_contract(ptr %a, ptr %b, ptr %c) {
  entry:
    %0 = load float, ptr %a, align 4
    %1 = load float, ptr %b, align 4
    %mul = fmul contract float %1, %0
    %2 = load float, ptr %c, align 4
    %add = fsub contract float %mul, %2
    %fneg = fneg contract float %add
    store float %fneg, ptr %a, align 4
    ret void
  }

I tested this without @MattDevereau's patch and it combines fine to `fnmsub` so it clearly doesn't need `nsz`. This suggests to me either:

1. We don't need the `nsz` flag for the fnmadd combine, or
2. For some reason the fnmsub combine doesn't need the `nsz` flag, or
3. There is also a bug with the fnmsub combine that needs fixing.

Do you have any idea which of the above this is? There is no real harm in @MattDevereau adding the `nsz` check in this patch, but it would be good to know if it's actually necessary or whether there is an existing bug with fnmsub.
llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
5	nit: Can you remove the `dso_local` markers on all these functions please? I don't think we need them.

LGTM with the nit addressed! Thanks @MattDevereau.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5437	Hi @craig.topper @MattDevereau, I think I understand now. FNMSUB does the opposite of what I expected, which was `-((a * b) - c)` - it actually does `(a * b) - c`. I missed the fact that in the `fnmsubs_contract` example I gave above we actually generate `fnmsub` followed by `fneg`. When you add the extra `nsz` flag this contracts to `fmsub`!!
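
To keep the four instruction forms straight, here is a minimal scalar model written with `std::fma`. It is only a sketch: the function names are hypothetical, the operand order (a and b multiplied, c added or subtracted) follows the notation used in the comments above, and it reflects the architectural definitions as understood here rather than anything in the patch.

```cpp
#include <cmath>

// Each model rounds once, like the fused hardware instructions.
double fmadd(double a, double b, double c)  { return std::fma(a, b, c); }    //  (a * b) + c
double fmsub(double a, double b, double c)  { return std::fma(-a, b, c); }   // -(a * b) + c
double fnmadd(double a, double b, double c) { return std::fma(-a, b, -c); }  // -(a * b) - c
double fnmsub(double a, double b, double c) { return std::fma(a, b, -c); }   //  (a * b) - c
```

With the operand values 12, 4 and 8 from the discussion, `fnmsub` gives 40 and `-fmsub` also gives 40, while `fnmadd` gives -56, the same as `-fmadd`. The negated forms only diverge from their fused counterparts in the sign of a zero result.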

This revision is now accepted and ready to land. May 3 2023, 3:45 AM

This revision was landed with ongoing or failed builds. May 5 2023, 1:14 AM

Closed by commit rGcaa95c240867: [AArch64] Emit FNMADD instead of FNEG(FMADD) (authored by MattDevereau).

This revision was automatically updated to reflect the committed changes.

MattDevereau added a commit: rGcaa95c240867: [AArch64] Emit FNMADD instead of FNEG(FMADD).

I'm seeing a failure in my LLVM_ENABLE_EXPENSIVE_CHECKS build:

FAIL: LLVM :: CodeGen/AArch64/aarch64_fnmadd.ll (1 of 47620)
******************** TEST 'LLVM :: CodeGen/AArch64/aarch64_fnmadd.ll' FAILED ********************
Script:
--
: 'RUN: at line 2';   /home/jayfoad2/llvm-expensive/bin/llc < /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll -mtriple=aarch64-linux-gnu -O3 | /home/jayfoad2/llvm-expensive/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll
--
Exit Code: 2

Command Output (stderr):
--

# After Machine InstCombiner
# Machine code for function fnmaddd: IsSSA, TracksLiveness
Function Live Ins: $x0 in %0, $x1 in %1, $x2 in %2

bb.0.entry:
  liveins: $x0, $x1, $x2
  %2:gpr64common = COPY $x2
  %1:gpr64common = COPY $x1
  %0:gpr64common = COPY $x0
  %3:fpr64 = LDRDui %0:gpr64common, 0 :: (load (s64) from %ir.a)
  %4:fpr64 = LDRDui %1:gpr64common, 0 :: (load (s64) from %ir.b)
  %6:fpr64 = LDRDui %2:gpr64common, 0 :: (load (s64) from %ir.c)
  %7:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
  %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
  STRDui killed %8:fpr64, %0:gpr64common, 0 :: (store (s64) into %ir.a)
  RET_ReallyLR

# End machine code for function fnmaddd.

*** Bad machine code: Using a killed virtual register ***
- function:    fnmaddd
- basic block: %bb.0 entry (0x8785268)
- instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
- operand 1:   killed %4:fpr64

*** Bad machine code: Using a killed virtual register ***
- function:    fnmaddd
- basic block: %bb.0 entry (0x8785268)
- instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
- operand 2:   killed %3:fpr64

*** Bad machine code: Using a killed virtual register ***
- function:    fnmaddd
- basic block: %bb.0 entry (0x8785268)
- instruction: %8:fpr64 = nnan ninf nsz arcp contract afn reassoc nofpexcept FNMADDDrrr killed %4:fpr64, killed %3:fpr64, killed %6:fpr64, implicit $fpcr
- operand 3:   killed %6:fpr64
LLVM ERROR: Found 3 machine code errors.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/jayfoad2/llvm-expensive/bin/llc -mtriple=aarch64-linux-gnu -O3
1.	Running pass 'Function Pass Manager' on module '<stdin>'.
2.	Running pass 'Verify generated machine code' on function '@fnmaddd'
 #0 0x0000000006072777 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/jayfoad2/llvm-expensive/bin/llc+0x6072777)
 #1 0x000000000607062e llvm::sys::RunSignalHandlers() (/home/jayfoad2/llvm-expensive/bin/llc+0x607062e)
 #2 0x0000000006072e1a SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f138d442520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007f138d496a7c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f138d496a7c __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007f138d496a7c pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007f138d442476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007f138d4287f3 abort ./stdlib/abort.c:81:7
 #9 0x0000000005fe8e4c llvm::report_fatal_error(llvm::Twine const&, bool) (/home/jayfoad2/llvm-expensive/bin/llc+0x5fe8e4c)
#10 0x00000000055889eb (/home/jayfoad2/llvm-expensive/bin/llc+0x55889eb)
#11 0x00000000054a0eec llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/jayfoad2/llvm-expensive/bin/llc+0x54a0eec)
#12 0x0000000005964754 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/jayfoad2/llvm-expensive/bin/llc+0x5964754)
#13 0x000000000596c961 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/jayfoad2/llvm-expensive/bin/llc+0x596c961)
#14 0x00000000059651f9 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/jayfoad2/llvm-expensive/bin/llc+0x59651f9)
#15 0x0000000003ac8759 main (/home/jayfoad2/llvm-expensive/bin/llc+0x3ac8759)
#16 0x00007f138d429d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#17 0x00007f138d429e40 call_init ./csu/../csu/libc-start.c:128:20
#18 0x00007f138d429e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#19 0x0000000003ac3165 _start (/home/jayfoad2/llvm-expensive/bin/llc+0x3ac3165)
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/jayfoad2/llvm-expensive/bin/FileCheck /home/jayfoad2/git/llvm-project/llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll

@foad apologies, I pushed another commit a9919db65a1afa71ac62631d51711383c17d43fc straight afterwards which only enables this test for aarch64. It's possible that you pulled this commit but not the one immediately afterwards. If it still fails with both commits can you let me know? Thanks.

In D149260#4321553, @MattDevereau wrote:

@foad apologies, I pushed another commit a9919db65a1afa71ac62631d51711383c17d43fc straight afterwards which only enables this test for aarch64. It's possible that you pulled this commit but not the one immediately afterwards. If it still fails with both commits can you let me know? Thanks.

I'm pretty sure that won't help since I build with all targets enabled.

MattDevereau added a reverting change: rGf9ff2468af9b: Revert "[AArch64] Emit FNMADD instead of FNEG(FMADD)". May 5 2023, 3:50 AM

@foad Very well, I've reverted it

In D149260#4321595, @MattDevereau wrote:

@foad Very well, I've reverted it

Thanks. You might be able to reproduce the failure even in a non-expensive-checks build just by adding -verify-machineinstrs to the RUN line.

The previous revision produced bad machine code, which triggered the verifier failures.
I had created a new variable, MAD, to capture the old FMADD instruction and used it
to merge the FMADD flags with the FNEG flags; however, I did not mark it for deletion as is done for MUL, i.e.

  } // end switch (Pattern)
  // Record MUL and ADD/SUB for deletion
  if (MUL)
    DelInstrs.push_back(MUL);
  DelInstrs.push_back(&Root);

  // Set the flags on the inserted instructions to be the merged flags of the
  // instructions that we have combined.
  uint16_t Flags = Root.getFlags();
  if (MUL)
    Flags = Root.mergeFlagsWith(*MUL);
  if (MAD)
    Flags = Root.mergeFlagsWith(*MAD);
  for (auto *MI : InsInstrs)
    MI->setFlags(Flags);
}

Instead I have used the MUL variable to capture the MAD, and that emits clean machine code.
I have also added -verify-machineinstrs to the test to verify the machine code does not regress.
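
The effect of leaving the old FMADD in the block can be illustrated with a toy model of the "using a killed virtual register" check. This is a sketch only: `Use`, `Instr`, `countUsesAfterKill` and `buildBlock` are hypothetical names, not LLVM types, and the real machine verifier is considerably more involved.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for machine operands and instructions.
struct Use {
  int vreg;     // virtual register number
  bool killed;  // true if this use is marked as the last use
};

struct Instr {
  std::string name;
  std::vector<Use> uses;
};

// Count uses of a virtual register that appear after an earlier instruction
// has already marked that register as killed.
int countUsesAfterKill(const std::vector<Instr> &block) {
  std::set<int> dead;
  int errors = 0;
  for (const Instr &mi : block) {
    for (const Use &u : mi.uses)
      if (dead.count(u.vreg))
        ++errors;
    // Kill flags take effect once the instruction has read its operands.
    for (const Use &u : mi.uses)
      if (u.killed)
        dead.insert(u.vreg);
  }
  return errors;
}

// Models the block from the failure log: the new FNMADD reuses the operands
// (and kill flags) of the old FMADD, which is either left behind or deleted.
std::vector<Instr> buildBlock(bool deleteOldFMADD) {
  std::vector<Use> operands = {{4, true}, {3, true}, {6, true}};
  std::vector<Instr> block;
  if (!deleteOldFMADD)
    block.push_back({"FMADDDrrr", operands});
  block.push_back({"FNMADDDrrr", operands});
  return block;
}
```

With the old FMADD left in place, the model reports three offending operands, matching the three errors in the log; once the FMADD is deleted it reports none.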

MattDevereau reopened this revision. May 5 2023, 5:35 AM

This revision is now accepted and ready to land. May 5 2023, 5:35 AM

Harbormaster completed remote builds in B230214: Diff 519817. May 5 2023, 6:30 AM

LGTM. I'm happy with the fix @MattDevereau - by reusing MUL you're now correctly adding it to the delete list (DelInstrs).

Closed by commit rGea228bd0bd01: [AArch64] Emit FNMADD instead of FNEG(FMADD) (authored by MattDevereau). May 5 2023, 6:36 AM

This revision was automatically updated to reflect the committed changes.

MattDevereau added a commit: rGea228bd0bd01: [AArch64] Emit FNMADD instead of FNEG(FMADD).

This patch is causing a crash in our builds. Here is a repro:

clang -cc1 -triple aarch64-cros-linux-gnu -emit-obj -ffp-contract=fast -ffast-math -O2 -x c crash.txt

double a();
int b(char *d) {
  _Bool e;
  double exponent;
  char *c = d;
  e = 1;
  do
    exponent = 10.0 * exponent - '0';
  while (c);
  a((e ? -1.0 : 1.0) * exponent);
}

manojgupta added a reverting change: rG4157625cea4a: Revert "[AArch64] Emit FNMADD instead of FNEG(FMADD)". May 7 2023, 4:38 PM

Just took the opportunity to have a quick look at this patch, given that it recently got reverted.

llvm/include/llvm/CodeGen/MachineCombinerPattern.h
182–183	Having two patterns, one for 32-bit values, and one for 64-bit values doesn't match what was done for FMSUB/FNMSUB. Can these be merged into 1 and use the register class used for the operands to determine which instruction to use?
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5421	You'll need to add a check that `MI != nullptr`. From the doxygen comment:

  /// getUniqueVRegDef - Return the unique machine instr that defines the
  /// specified virtual register or null if none is found. If there are
  /// multiple definitions or no definition, return null.

5437	nit: if you just `return Match(..)` here, then you can avoid the variable `Found`

MattDevereau added inline comments. May 9 2023, 6:38 AM

llvm/include/llvm/CodeGen/MachineCombinerPattern.h
182–183	I'm not sure what you mean? MachineCombinerPattern::FNMSUB isn't used in `AArch64InstrInfo::genAlternativeCodeSequence` at all. The way FNMSUB is combined in this function has `MachineCombinerPattern::FMULSUBH_OP1`, `MachineCombinerPattern::FMULSUBS_OP1` and `MachineCombinerPattern::FMULSUBD_OP1`, which all describe the number of bits. I'd need to do something like

  auto RC = MRI.getRegClass(MAD->getOperand(0).getReg());
  if RC == 64bit
    opc = FNMADDDrrr
  else if RC == 32bit
    opc = FNMADDSrrr

which I can't see a clear example of.

MattDevereau updated this revision to Diff 520937. May 10 2023, 1:58 AM

@manojgupta Thank you for reverting the patch and for the reproducer. I've added a check to bail on the combine if there is more than one use of an FMADD, which is what was causing your reproducer to fail, and I've added a test to assert this behaviour now.

This revision is now accepted and ready to land. May 10 2023, 2:01 AM

MattDevereau marked 3 inline comments as done. May 10 2023, 2:02 AM

MattDevereau added inline comments.

llvm/include/llvm/CodeGen/MachineCombinerPattern.h
182–183	I managed to achieve this with `AArch64::FPR32RegClass.hasSubClassEq(RC)` in the end

Harbormaster completed remote builds in B231057: Diff 520937. May 10 2023, 2:39 AM

david-arm added inline comments. May 10 2023, 3:56 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5713	I think we should probably be doing this check in `getFNEGPatterns` - this is similar to how it's done in `canCombineWithFMUL`, which is called from `getFMAPatterns`. Also, I think this should be `hasOneNonDBGUse` to avoid debug info blocking your wonderful optimisation. :)
5723	nit: Sorry I didn't pick this up before, but I expect this can just be `else llvm_unreachable("Unexpected opcode");` because we only ever matched the double and single variants in getFNEGPatterns?

Move OneUseCheck to getFNEGPatterns and use hasOneNonDBGUse instead
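
Taken together, the review comments describe the conditions under which the combine should fire. The following is a rough sketch of those conditions on toy data, not the code that landed: `Opcode`, `Inst` and `canFoldToFNMADD` are hypothetical names, and the `nonDebugUses` field stands in for a query such as `hasOneNonDBGUse` on the FMADD result register.

```cpp
// Hypothetical stand-ins for the LLVM types involved; names are illustrative.
enum class Opcode { FNEG, FMADD, Other };

struct Inst {
  Opcode opcode;
  bool contract;     // "contract" fast-math flag
  bool nsz;          // "no signed zeros" fast-math flag
  int nonDebugUses;  // uses of the result, ignoring debug instructions
};

// Returns true when FNEG(def) may be rewritten as a single FNMADD.
// def models the unique definition of the FNEG operand, or nullptr if
// there is no unique definition.
bool canFoldToFNMADD(const Inst &root, const Inst *def) {
  if (root.opcode != Opcode::FNEG)
    return false;
  if (def == nullptr)  // no unique defining instruction
    return false;
  if (def->opcode != Opcode::FMADD)
    return false;
  if (def->nonDebugUses != 1)  // another user still needs the FMADD result
    return false;
  // Both instructions must carry contract and nsz.
  return root.contract && root.nsz && def->contract && def->nsz;
}
```

The one-use check matters because the combine deletes the FMADD; if a second instruction still read its result, that use would be left without a definition.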

MattDevereau marked an inline comment as done. May 10 2023, 4:07 AM

Harbormaster completed remote builds in B231068: Diff 520955. May 10 2023, 4:59 AM

LGTM! I think you've addressed all the review comments. I had one nit on line 5717 about adding an unreachable instead of returning nullptr, but I won't hold the patch up for it.

sdesmalen added inline comments. May 10 2023, 5:32 AM

llvm/include/llvm/CodeGen/MachineCombinerPattern.h
182–183	Thanks!
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5441	nit: I don't know if some compilers could warn about the function not returning a value, but maybe you could `break` in the default case instead and `return false` here. Either that, or add a `llvm_unreachable()` here to make it clear that the function should have returned at this point.

This revision was landed with ongoing or failed builds. May 10 2023, 5:47 AM

Closed by commit rG004bf170c6cb: [AArch64] Emit FNMADD instead of FNEG(FMADD) (authored by MattDevereau).

This revision was automatically updated to reflect the committed changes.

MattDevereau added a commit: rG004bf170c6cb: [AArch64] Emit FNMADD instead of FNEG(FMADD).

Revision Contents

Path                                                  Size
llvm/include/llvm/CodeGen/MachineCombinerPattern.h    3 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp          87 lines
llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll           130 lines

Diff 519763

llvm/include/llvm/CodeGen/MachineCombinerPattern.h

@@ ... @@ enum class MachineCombinerPattern {
   // RISCV FMADD, FMSUB, FNMSUB patterns
   FMADD_AX,
   FMADD_XA,
   FMSUB,
   FNMSUB,

   // X86 VNNI
   DPWSSD,
+
+  FNMADDS,
+  FNMADDD,
 };

 } // end namespace llvm

 #endif

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

@@ ... @@ case AArch64::FMULv8f16:
     Found = Match(AArch64::DUPv8i16lane, 1, MCP::FMULv8i16_indexed_OP1);
     Found |= Match(AArch64::DUPv8i16lane, 2, MCP::FMULv8i16_indexed_OP2);
     break;
   }

   return Found;
 }

+static bool getFNEGPatterns(MachineInstr &Root,
+                            SmallVectorImpl<MachineCombinerPattern> &Patterns) {
+  unsigned Opc = Root.getOpcode();
+  MachineBasicBlock &MBB = *Root.getParent();
+  MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
+  bool Found = false;
+
+  auto Match = [&](unsigned Opcode, MachineCombinerPattern Pattern) -> bool {
+    MachineOperand &MO = Root.getOperand(1);
+    MachineInstr *MI = MRI.getUniqueVRegDef(MO.getReg());
+    if ((MI->getOpcode() == Opcode) &&
+        Root.getFlag(MachineInstr::MIFlag::FmContract) &&
+        Root.getFlag(MachineInstr::MIFlag::FmNsz) &&
+        MI->getFlag(MachineInstr::MIFlag::FmContract) &&
+        MI->getFlag(MachineInstr::MIFlag::FmNsz)) {
+      Patterns.push_back(Pattern);
+      return true;
+    }
+    return false;
+  };
+
+  switch (Opc) {
+  default:
+    return false;
+  case AArch64::FNEGDr:
+    Found |= Match(AArch64::FMADDDrrr, MachineCombinerPattern::FNMADDD);
+    break;
+  case AArch64::FNEGSr:
+    Found |= Match(AArch64::FMADDSrrr, MachineCombinerPattern::FNMADDS);
+    break;
+  }
+
+  return Found;
+}
+
 /// Return true when a code sequence can improve throughput. It
 /// should be called only for instructions in loops.
 /// \param Pattern - combiner pattern
 bool AArch64InstrInfo::isThroughputPattern(
     MachineCombinerPattern Pattern) const {
   switch (Pattern) {
   default:
     break;
@@ ... @@ bool AArch64InstrInfo::getMachineCombinerPatterns(
   // Integer patterns
   if (getMaddPatterns(Root, Patterns))
     return true;
   // Floating point patterns
   if (getFMULPatterns(Root, Patterns))
     return true;
   if (getFMAPatterns(Root, Patterns))
     return true;
+  if (getFNEGPatterns(Root, Patterns))
+    return true;

   // Other patterns
   if (getMiscPatterns(Root, Patterns))
     return true;

   return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns,
                                                      DoRegPressureReduce);
 }
@@ ... @@ MIB = BuildMI(MF, MIMetadata(Root), TII->get(MaddOpc), ResultReg)
               .addReg(SrcReg1, getKillRegState(Src1IsKill));
   else
     assert(false && "Invalid FMA instruction kind \n");
   // Insert the MADD (MADD, FMA, FMS, FMLA, FMSL)
   InsInstrs.push_back(MIB);
   return MUL;
 }

+static MachineInstr *
+genFNegatedMAD(MachineFunction &MF, MachineRegisterInfo &MRI,
+               const TargetInstrInfo *TII, MachineInstr &Root,
+               SmallVectorImpl<MachineInstr *> &InsInstrs, unsigned Opc,
+               const TargetRegisterClass *RC) {
+  MachineInstr *MAD = MRI.getUniqueVRegDef(Root.getOperand(1).getReg());
+  Register ResultReg = Root.getOperand(0).getReg();
+  Register SrcReg0 = MAD->getOperand(1).getReg();
+  Register SrcReg1 = MAD->getOperand(2).getReg();
+  Register SrcReg2 = MAD->getOperand(3).getReg();
+  bool Src0IsKill = MAD->getOperand(1).isKill();
+  bool Src1IsKill = MAD->getOperand(2).isKill();
+  bool Src2IsKill = MAD->getOperand(3).isKill();
+
+  if (ResultReg.isVirtual())
+    MRI.constrainRegClass(ResultReg, RC);
+  if (SrcReg0.isVirtual())
+    MRI.constrainRegClass(SrcReg0, RC);
+  if (SrcReg1.isVirtual())
+    MRI.constrainRegClass(SrcReg1, RC);
+  if (SrcReg2.isVirtual())
+    MRI.constrainRegClass(SrcReg2, RC);
+
+  MachineInstrBuilder MIB =
+      BuildMI(MF, MIMetadata(Root), TII->get(Opc), ResultReg)
+          .addReg(SrcReg0, getKillRegState(Src0IsKill))
+          .addReg(SrcReg1, getKillRegState(Src1IsKill))
+          .addReg(SrcReg2, getKillRegState(Src2IsKill));
+  InsInstrs.push_back(MIB);
+
+  return MAD;
+}
+
 /// Fold (FMUL x (DUP y lane)) into (FMUL_indexed x y lane)
 static MachineInstr *
 genIndexedMultiply(MachineInstr &Root,
                    SmallVectorImpl<MachineInstr *> &InsInstrs,
                    unsigned IdxDupOp, unsigned MulOpc,
                    const TargetRegisterClass *RC, MachineRegisterInfo &MRI) {
   assert(((IdxDupOp == 1) || (IdxDupOp == 2)) &&
          "Invalid index of FMUL operand");
@@ ... @@ void AArch64InstrInfo::genAlternativeCodeSequence(
     SmallVectorImpl<MachineInstr *> &DelInstrs,
     DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
   MachineBasicBlock &MBB = *Root.getParent();
   MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
   MachineFunction &MF = *MBB.getParent();
   const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();

   MachineInstr *MUL = nullptr;
+  MachineInstr *MAD = nullptr;
   const TargetRegisterClass *RC;
   unsigned Opc;
   switch (Pattern) {
   default:
     // Reassociate instructions.
     TargetInstrInfo::genAlternativeCodeSequence(Root, Pattern, InsInstrs,
                                                 DelInstrs, InstrIdxForVirtReg);
     return;
▲ Show 20 Lines • Show All 890 Lines • ▼ Show 20 Lines	void AArch64InstrInfo::genAlternativeCodeSequence(
case MachineCombinerPattern::FMULv8i16_indexed_OP1:		case MachineCombinerPattern::FMULv8i16_indexed_OP1:
case MachineCombinerPattern::FMULv8i16_indexed_OP2: {		case MachineCombinerPattern::FMULv8i16_indexed_OP2: {
unsigned IdxDupOp =		unsigned IdxDupOp =
(Pattern == MachineCombinerPattern::FMULv8i16_indexed_OP1) ? 1 : 2;		(Pattern == MachineCombinerPattern::FMULv8i16_indexed_OP1) ? 1 : 2;
genIndexedMultiply(Root, InsInstrs, IdxDupOp, AArch64::FMULv8i16_indexed,		genIndexedMultiply(Root, InsInstrs, IdxDupOp, AArch64::FMULv8i16_indexed,
&AArch64::FPR128_loRegClass, MRI);		&AArch64::FPR128_loRegClass, MRI);
break;		break;
}		}

		case MachineCombinerPattern::FNMADDS: {
		Opc = AArch64::FNMADDSrrr;
		RC = &AArch64::FPR32RegClass;
		MAD = genFNegatedMAD(MF, MRI, TII, Root, InsInstrs, Opc, RC);
		break;
		}
		case MachineCombinerPattern::FNMADDD: {
		Opc = AArch64::FNMADDDrrr;
		RC = &AArch64::FPR64RegClass;
		MAD = genFNegatedMAD(MF, MRI, TII, Root, InsInstrs, Opc, RC);
		break;
		}

} // end switch (Pattern)		} // end switch (Pattern)
// Record MUL and ADD/SUB for deletion		// Record MUL and ADD/SUB for deletion
if (MUL)		if (MUL)
DelInstrs.push_back(MUL);		DelInstrs.push_back(MUL);
DelInstrs.push_back(&Root);		DelInstrs.push_back(&Root);

// Set the flags on the inserted instructions to be the merged flags of the		// Set the flags on the inserted instructions to be the merged flags of the
// instructions that we have combined.		// instructions that we have combined.
uint16_t Flags = Root.getFlags();		uint16_t Flags = Root.getFlags();
if (MUL)		if (MUL)
Flags = Root.mergeFlagsWith(*MUL);		Flags = Root.mergeFlagsWith(*MUL);
		if (MAD)
		Flags = Root.mergeFlagsWith(*MAD);
for (auto *MI : InsInstrs)		for (auto *MI : InsInstrs)
MI->setFlags(Flags);		MI->setFlags(Flags);
}		}

/// Replace csincr-branch sequence by simple conditional branch		/// Replace csincr-branch sequence by simple conditional branch
///		///
/// Examples:		/// Examples:
/// 1. \code		/// 1. \code
[...]

llvm/test/CodeGen/AArch64/aarch64_fnmadd.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
				; RUN: llc < %s -mtriple=aarch64-linux-gnu -O3 | FileCheck %s

				define void @fnmaddd(ptr %a, ptr %b, ptr %c) {
				; CHECK-LABEL: fnmaddd:
				david-arm: nit: Can you remove the `dso_local` markers on all these functions please? I don't think we need them.
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr d0, [x1]
				; CHECK-NEXT: ldr d1, [x0]
				; CHECK-NEXT: ldr d2, [x2]
				; CHECK-NEXT: fnmadd d0, d0, d1, d2
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				entry:
				%0 = load double, ptr %a, align 8
				%1 = load double, ptr %b, align 8
				%mul = fmul fast double %1, %0
				%2 = load double, ptr %c, align 8
				%add = fadd fast double %mul, %2
				%fneg = fneg fast double %add
				store double %fneg, ptr %a, align 8
				ret void
				}

				; Don't combine: No flags
				define void @fnmaddd_no_fast(ptr %a, ptr %b, ptr %c) {
				; CHECK-LABEL: fnmaddd_no_fast:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr d0, [x0]
				; CHECK-NEXT: ldr d1, [x1]
				; CHECK-NEXT: fmul d0, d1, d0
				; CHECK-NEXT: ldr d1, [x2]
				; CHECK-NEXT: fadd d0, d0, d1
				; CHECK-NEXT: fneg d0, d0
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				entry:
				%0 = load double, ptr %a, align 8
				%1 = load double, ptr %b, align 8
				%mul = fmul double %1, %0
				%2 = load double, ptr %c, align 8
				%add = fadd double %mul, %2
				%fneg = fneg double %add
				store double %fneg, ptr %a, align 8
				ret void
				}

				define void @fnmadds(ptr %a, ptr %b, ptr %c) {
				; CHECK-LABEL: fnmadds:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr s0, [x1]
				; CHECK-NEXT: ldr s1, [x0]
				; CHECK-NEXT: ldr s2, [x2]
				; CHECK-NEXT: fnmadd s0, s0, s1, s2
				; CHECK-NEXT: str s0, [x0]
				; CHECK-NEXT: ret
				entry:
				%0 = load float, ptr %a, align 4
				%1 = load float, ptr %b, align 4
				%mul = fmul fast float %1, %0
				%2 = load float, ptr %c, align 4
				%add = fadd fast float %mul, %2
				%fneg = fneg fast float %add
				store float %fneg, ptr %a, align 4
				ret void
				}

				define void @fnmadds_nsz_contract(ptr %a, ptr %b, ptr %c) {
				; CHECK-LABEL: fnmadds_nsz_contract:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr s0, [x1]
				; CHECK-NEXT: ldr s1, [x0]
				; CHECK-NEXT: ldr s2, [x2]
				; CHECK-NEXT: fnmadd s0, s0, s1, s2
				; CHECK-NEXT: str s0, [x0]
				; CHECK-NEXT: ret
				entry:
				%0 = load float, ptr %a, align 4
				%1 = load float, ptr %b, align 4
				%mul = fmul contract nsz float %1, %0
				%2 = load float, ptr %c, align 4
				%add = fadd contract nsz float %mul, %2
				%fneg = fneg contract nsz float %add
				store float %fneg, ptr %a, align 4
				ret void
				}

				; Don't combine: Missing nsz
				david-arm: nit: This is just a minor thing, but at first I thought `@neg` referred to negation here, given the combine involves a `fneg`. Perhaps it's less confusing to just remove it and add a comment above the negative test cases explaining why the combine doesn't happen?
				MattDevereau: Done, I agree it was pretty confusing.
				define void @fnmadds_contract(ptr %a, ptr %b, ptr %c) {
				; CHECK-LABEL: fnmadds_contract:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr s0, [x1]
				; CHECK-NEXT: ldr s1, [x0]
				; CHECK-NEXT: ldr s2, [x2]
				; CHECK-NEXT: fmadd s0, s0, s1, s2
				; CHECK-NEXT: fneg s0, s0
				; CHECK-NEXT: str s0, [x0]
				; CHECK-NEXT: ret
				entry:
				%0 = load float, ptr %a, align 4
				%1 = load float, ptr %b, align 4
				%mul = fmul contract float %1, %0
				%2 = load float, ptr %c, align 4
				%add = fadd contract float %mul, %2
				%fneg = fneg contract float %add
				store float %fneg, ptr %a, align 4
				ret void
				}

				; Don't combine: Missing contract
				define void @fnmadds_nsz(ptr %a, ptr %b, ptr %c) {
				; CHECK-LABEL: fnmadds_nsz:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr s0, [x0]
				; CHECK-NEXT: ldr s1, [x1]
				; CHECK-NEXT: fmul s0, s1, s0
				; CHECK-NEXT: ldr s1, [x2]
				; CHECK-NEXT: fadd s0, s0, s1
				; CHECK-NEXT: fneg s0, s0
				; CHECK-NEXT: str s0, [x0]
				; CHECK-NEXT: ret
				entry:
				%0 = load float, ptr %a, align 4
				%1 = load float, ptr %b, align 4
				%mul = fmul nsz float %1, %0
				%2 = load float, ptr %c, align 4
				%add = fadd nsz float %mul, %2
				%fneg = fneg nsz float %add
				store float %fneg, ptr %a, align 4
				ret void
				}
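
The negative tests above only combine when both `contract` and `nsz` are present. As a rough illustration of why `nsz` matters, here is a minimal C++ sketch, not code from this patch. It assumes FNMADD behaves as the fused operation `(-n)*m + (-a)`, which is my reading of the Arm architecture description, and it models both forms with `std::fma` under the default round-to-nearest mode. The helper names `negOfFma`, `fusedNegMulAdd` and `sameIncludingSign` are hypothetical.

```cpp
#include <cmath>

// Models FNEG(FMADD n, m, a): negate the fused result n*m + a.
inline double negOfFma(double n, double m, double a) {
  return -std::fma(n, m, a);
}

// Models FNMADD n, m, a under the assumption stated above:
// a single fused (-n)*m + (-a).
inline double fusedNegMulAdd(double n, double m, double a) {
  return std::fma(-n, m, -a);
}

// True when both values compare equal and also carry the same sign bit,
// so +0.0 and -0.0 are treated as different.
inline bool sameIncludingSign(double x, double y) {
  return x == y && std::signbit(x) == std::signbit(y);
}
```

For ordinary inputs such as `(2.0, 3.0, 4.0)` the two helpers agree. For `n = 0.0, m = 1.0, a = -0.0` the first yields `-0.0` and the second yields `+0.0`, so the results differ only in the sign of zero. That difference is what the `nsz` flag allows the compiler to ignore.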


Diff 519763
