Details

Reviewers

craig.topper
reames
frasercrmck

Commits

rGb6c790736e77: [MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns

Summary

This patch adds a transformation of fmul+fadd/fsub chains into fused multiply
instructions:

fmul+fadd->fmadd
fmul+fsub->fmsub/fnmsub
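To make the three mappings concrete, here is a small scalar sketch of what each fused instruction computes. The helper names (`fmadd_d`, `fmsub_d`, `fnmsub_d`, `example`) are hypothetical. `std::fma` is used because, like the hardware instructions, it rounds only once.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar models of the RISC-V D-extension fused instructions.
// std::fma computes x*y+z with a single rounding, as the hardware does.
//   fmadd.d  rd, rs1, rs2, rs3 : rd =  (rs1 * rs2) + rs3
//   fmsub.d  rd, rs1, rs2, rs3 : rd =  (rs1 * rs2) - rs3
//   fnmsub.d rd, rs1, rs2, rs3 : rd = -(rs1 * rs2) + rs3
double fmadd_d(double a, double b, double c) { return std::fma(a, b, c); }
double fmsub_d(double a, double b, double c) { return std::fma(a, b, -c); }
double fnmsub_d(double a, double b, double c) { return std::fma(-a, b, c); }

// fmul+fadd -> fmadd; fsub with the product first -> fmsub;
// fsub with the product as the subtrahend -> fnmsub.
// Small integers keep every intermediate exact, so fused and unfused agree.
void example() {
  const double a = 3.0, b = 4.0, c = 5.0;
  assert(fmadd_d(a, b, c) == a * b + c);   // 17
  assert(fmsub_d(a, b, c) == a * b - c);   // 7
  assert(fnmsub_d(a, b, c) == c - a * b);  // -7
}
```

With inexact inputs the fused and unfused results can differ in the last bit, which is why the patch only fires when both the fmul and the fadd/fsub carry the `contract` flag.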

We also try to combine these instructions when the fmul has more than one use
and therefore cannot be deleted: removing the dependence between the fmul and the
fadd can still be profitable, and we rely on the machine combiner's scheduling
approximations to decide.
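The profitability argument for a multi-use fmul can be illustrated with a back-of-the-envelope critical-path comparison. The latencies below are hypothetical placeholders, not values from the sifive-u74 scheduling model, and the sketch assumes the addend is ready no later than the product. The real decision is made by the machine combiner (these patterns use the `MustReduceDepth` objective) with the target's scheduling model.

```cpp
#include <cassert>

// Hypothetical latencies in cycles, chosen only to illustrate the comparison.
struct Latencies {
  int fmul;
  int fadd;
  int fmadd;
};

// Depth of the add's result, counted from when the multiply operands are ready.
int depthBefore(const Latencies &L) { return L.fmul + L.fadd; } // fmul -> fadd
int depthAfter(const Latencies &L) { return L.fmadd; }          // one fused op

// If the fmul result has other users, the fmul stays, so the block grows
// from 2 to 2 instructions (fmul + fmadd) instead of shrinking to 1.
int instrCountAfter(bool fmulHasOtherUses) { return fmulHasOtherUses ? 2 : 1; }

// MustReduceDepth-style acceptance: only take the rewrite if the path shrinks.
bool profitable(const Latencies &L) { return depthAfter(L) < depthBefore(L); }

void example() {
  const Latencies L{5, 5, 5};         // hypothetical numbers
  assert(depthBefore(L) == 10);
  assert(depthAfter(L) == 5);
  assert(profitable(L));
  assert(instrCountAfter(true) == 2); // fmul kept for its other user
}
```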

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

asi-sc created this revision. Oct 26 2022, 6:14 AM

asi-sc requested review of this revision. Oct 26 2022, 6:14 AM

Harbormaster completed remote builds in B194405: Diff 470799. Oct 26 2022, 6:15 AM

Performance impact on Whetstone (double-precision) for sifive-u74 -march=rv64imafdc -O3 -funroll-loops -finline-functions -ffast-math -DDP -mtune=sifive-u74:
N1 +67%
N2 +45%
MWIPS +18%

Baseline

Loop content       Result                 MFLOPS     MOPS           Seconds
N1 floating point  -1.12398255667391900   285.015    -              0.700
N2 floating point  -1.12187079889295083   224.672    -              6.220
N3 if then else     1.00000000000000000   -          5725.464       0.188
N4 fixed point     12.00000000000000000   -          327505510.400  0.000
N5 sin,cos etc.     0.49902937281518078   -          20.516         42.163
N6 floating point   0.99999987890802811   169.612    -              33.064
N7 assignments      3.00000000000000000   -          7123.768       0.270
N8 exp,sqrt etc.    0.75100163018453681   -          21.097         18.333

MWIPS                                     1030.036                  100.938

This patch

Loop content       Result                 MFLOPS     MOPS           Seconds
N1 floating point  -1.12398255667393077   476.923    -              0.498
N2 floating point  -1.12187079889296992   325.987    -              5.098
N3 if then else     1.00000000000000000   -          7110.547       0.180
N4 fixed point     12.00000000000000000   -          299613459.692  0.000
N5 sin,cos etc.     0.49902937281518367   -          19.847         51.836
N6 floating point   0.99999987890802855   307.010    -              21.725
N7 assignments      3.00000000000000000   -          28396.673      0.080
N8 exp,sqrt etc.    0.75100163018453681   -          21.007         21.897

MWIPS                                     1220.474                  101.313
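The quoted speedups can be re-derived from the Whetstone tables (MFLOPS for N1 and N2, the MWIPS score overall). The helper `percentChange` is illustrative arithmetic only.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent between a baseline and a patched measurement.
double percentChange(double baseline, double patched) {
  return (patched / baseline - 1.0) * 100.0;
}

void example() {
  // Values copied from the Baseline and "This patch" tables.
  assert(std::lround(percentChange(285.015, 476.923)) == 67);   // N1 MFLOPS
  assert(std::lround(percentChange(224.672, 325.987)) == 45);   // N2 MFLOPS
  assert(std::lround(percentChange(1030.036, 1220.474)) == 18); // MWIPS
}
```

The same arithmetic on the Seconds column puts the N5 and N8 loops at roughly 23% and 19% more time than the baseline, which is the slowdown raised later in the review.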

Merge the debug locations of the original instructions when creating the new fused instruction.

Harbormaster completed remote builds in B194905: Diff 471500. Oct 28 2022, 5:12 AM

Ping!

A gentle ping

In D136764#3885520, @asi-sc wrote:

Performance impact on Whetstone (double-precision) for sifive-u74 -march=rv64imafdc -O3 -funroll-loops -finline-functions -ffast-math -DDP -mtune=sifive-u74:
N1 +67%
N2 +45%
MWIPS +18%

Do you know why the sin/cos and exp/sqrt loops seem to take longer?

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
Line 1279: Cache the result of isFADD when you call it the first time?

In D136764#3928934, @craig.topper wrote:

Do you know why the sin/cos and exp/sqrt loops seem to take longer?

It's a good question, thanks. The math functions we call are unchanged, so they are definitely not the reason. The code around the exp/sqrt (N8) call site is also identical in both measured builds, so the regression is not caused by any change in the instructions. One idea I have is that the unrolled loop that executes exp/sqrt is better aligned in the baseline version (0x1d64 vs 0x1d86). Do you think this could be the reason?
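For the alignment hypothesis, one quick comparison of the two loop addresses is their natural alignment and their offset inside a fetch block. The 64-byte block size is an assumption chosen only for illustration; the sketch does not model the sifive-u74 front end, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two that divides an address, i.e. its natural alignment.
std::uint64_t alignmentOf(std::uint64_t addr) {
  assert(addr != 0);
  return addr & (~addr + 1); // isolate the lowest set bit
}

// Offset of an address inside a block of blockSize bytes (a power of two).
std::uint64_t offsetIn(std::uint64_t addr, std::uint64_t blockSize) {
  return addr & (blockSize - 1);
}
```

Here 0x1d64 comes out 4-byte aligned (offset 36 in an assumed 64-byte block) while 0x1d86 is only 2-byte aligned (offset 6), which is possible because the build uses the C (compressed) extension.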

As for sin/cos (N5), the machine combiner generates an fmadd (keeping the fmul because it has another use), which results in one additional fmv per loop iteration. This is caused by the extended live range of the atan return value.
Baseline

call    atan@plt
fmul.d  fs1, fa0, fs6
...
fmv.d   fa0, fs0
call    sincos@plt
...
fadd.d  fa0, fs1, fs0
...

This patch

call    atan@plt
fmv.d   fs1, fa0           <--- the combined fmadd uses fa0 but is placed after the sincos call, so a move is required to extend the fa0 live range
fmul.d  fs2, fa0, fs8
...
fmv.d   fa0, fs0
call    sincos@plt
...
fmadd.d fa0, fs1, fs8, fs0
...

I have a draft patch that improves the machine combiner logic to deal with this problem. In my opinion, combining instructions that are separated by a call is questionable from a performance point of view, and such cases need an additional check. However, I don't see how to fix this within this patch. Can we agree that additional moves are sometimes inserted, that their origin is clear, and that the problem can be addressed in follow-up patches? I don't expect major performance issues from this particular behavior and haven't observed any. Or do you have suggestions on how to address it in this patch?
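One possible shape for the additional check is to refuse the fusion when a call lies between the fmul and the fadd/fsub. This is a hypothetical sketch over a toy instruction list (`ToyInstr`, `callBetween` and `shouldCombineAcrossCalls` are invented names); it is not the draft patch and not LLVM's MachineInstr API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one basic block: only the property we need is kept.
struct ToyInstr {
  bool isCall;
};

// True if any call lies strictly between the fmul at mulIdx and the
// fadd/fsub at rootIdx in the same block.
bool callBetween(const std::vector<ToyInstr> &block, std::size_t mulIdx,
                 std::size_t rootIdx) {
  assert(mulIdx < rootIdx && rootIdx < block.size());
  for (std::size_t i = mulIdx + 1; i < rootIdx; ++i)
    if (block[i].isCall)
      return true;
  return false;
}

// Hypothetical extra legality check: skip the fusion across a call, because
// the fused op keeps the multiply operands live over the call, which can
// force copies such as the extra fmv.d in the N5 listing.
bool shouldCombineAcrossCalls(const std::vector<ToyInstr> &block,
                              std::size_t mulIdx, std::size_t rootIdx) {
  return !callBetween(block, mulIdx, rootIdx);
}
```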

Address review comments

Harbormaster completed remote builds in B197970: Diff 475777. Nov 16 2022, 4:40 AM

I updated my draft patch to handle the situation described above. With it applied locally, the assembly for both Whets-N5 (sin/cos) and Whets-N8 (exp/sqrt) is identical to the baseline version, yet the execution-time difference remains the same. So this looks to me like a microarchitectural effect rather than an instruction-combining issue.

Also, according to the llvm-mca report (llvm-mca -march=riscv64 -mcpu=sifive-u74) for the unrolled loop body of N5 (sin/cos), there should not be such a dramatic difference (the total cycle counts differ by less than 1%):
baseline

Iterations:        100
Instructions:      15300
Total Cycles:      380004
Total uOps:        18500

Dispatch Width:    2        
uOps Per Cycle:    0.05     
IPC:               0.04     
Block RThroughput: 560.0

this patch

Iterations:        100
Instructions:      16000  
Total Cycles:      383504 
Total uOps:        19200  

Dispatch Width:    2
uOps Per Cycle:    0.05
IPC:               0.04
Block RThroughput: 567.0
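The "less than 1%" figure follows directly from the two llvm-mca summaries. The sketch below (hypothetical `McaSummary` struct) redoes the arithmetic and also shows that the patched loop body has 7 more instructions per iteration (160 vs 153).

```cpp
#include <cassert>

// Only the llvm-mca summary fields needed for the comparison.
struct McaSummary {
  long iterations;
  long instructions;
  long totalCycles;
};

double cycleIncreasePercent(const McaSummary &base, const McaSummary &patched) {
  return 100.0 * (patched.totalCycles - base.totalCycles) / base.totalCycles;
}

long instrsPerIteration(const McaSummary &s) {
  return s.instructions / s.iterations;
}

void example() {
  const McaSummary base{100, 15300, 380004};
  const McaSummary patch{100, 16000, 383504};
  assert(cycleIncreasePercent(base, patch) < 1.0); // about 0.92%
  assert(instrsPerIteration(base) == 153);
  assert(instrsPerIteration(patch) == 160);
}
```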

LGTM

This revision is now accepted and ready to land. Nov 16 2022, 8:51 PM

asi-sc mentioned this in rG374d07656357: [MachineCombiner][RISCV] Precommit tests for D136764. Nov 17 2022, 1:13 AM

Rebase to fetch precommitted tests

Harbormaster completed remote builds in B198156: Diff 476043. Nov 17 2022, 1:56 AM

This revision was landed with ongoing or failed builds. Nov 17 2022, 2:28 AM

Closed by commit rGb6c790736e77: [MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns (authored by asi-sc).

This revision was automatically updated to reflect the committed changes.

asi-sc added a commit: rGb6c790736e77: [MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns.

Diff 476061

llvm/include/llvm/CodeGen/MachineCombinerPattern.h

@@ ... @@ enum class MachineCombinerPattern {
   FMULv2i64_indexed_OP1,
   FMULv2i64_indexed_OP2,
   FMULv4i16_indexed_OP1,
   FMULv4i16_indexed_OP2,
   FMULv4i32_indexed_OP1,
   FMULv4i32_indexed_OP2,
   FMULv8i16_indexed_OP1,
   FMULv8i16_indexed_OP2,
+
+  // RISCV FMADD, FMSUB, FNMSUB patterns
+  FMADD_AX,
+  FMADD_XA,
+  FMSUB,
+  FNMSUB,
 };
 
 } // end namespace llvm
 
 #endif

llvm/lib/CodeGen/MachineCombiner.cpp

@@ ... @@ static CombinerObjective getCombinerObjective(MachineCombinerPattern P) {
   case MachineCombinerPattern::REASSOC_AX_BY:
   case MachineCombinerPattern::REASSOC_AX_YB:
   case MachineCombinerPattern::REASSOC_XA_BY:
   case MachineCombinerPattern::REASSOC_XA_YB:
   case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
   case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:
   case MachineCombinerPattern::SUBADD_OP1:
   case MachineCombinerPattern::SUBADD_OP2:
+  case MachineCombinerPattern::FMADD_AX:
+  case MachineCombinerPattern::FMADD_XA:
+  case MachineCombinerPattern::FMSUB:
+  case MachineCombinerPattern::FNMSUB:
     return CombinerObjective::MustReduceDepth;
   case MachineCombinerPattern::REASSOC_XY_BCA:
   case MachineCombinerPattern::REASSOC_XY_BAC:
     return CombinerObjective::MustReduceRegisterPressure;
   default:
     return CombinerObjective::Default;
   }
 }

llvm/lib/Target/RISCV/RISCVInstrInfo.h

@@ ... @@ public:
   getMachineCombinerPatterns(MachineInstr &Root,
                              SmallVectorImpl<MachineCombinerPattern> &Patterns,
                              bool DoRegPressureReduce) const override;
 
   void
   finalizeInsInstrs(MachineInstr &Root, MachineCombinerPattern &P,
                     SmallVectorImpl<MachineInstr *> &InsInstrs) const override;
 
+  void genAlternativeCodeSequence(
+      MachineInstr &Root, MachineCombinerPattern Pattern,
+      SmallVectorImpl<MachineInstr *> &InsInstrs,
+      SmallVectorImpl<MachineInstr *> &DelInstrs,
+      DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const override;
+
 protected:
   const RISCVSubtarget &STI;
 };
 
 namespace RISCV {
 
 // Returns true if this is the sext.w pattern, addiw rd, rs1, 0.
 bool isSEXT_W(const MachineInstr &MI);

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp

@@ ... @@
 #include "llvm/Analysis/MemoryLocation.h"
 #include "llvm/CodeGen/LiveIntervals.h"
 #include "llvm/CodeGen/LiveVariables.h"
 #include "llvm/CodeGen/MachineCombinerPattern.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegisterScavenging.h"
+#include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/MC/MCInstBuilder.h"
 #include "llvm/MC/TargetRegistry.h"
 #include "llvm/Support/ErrorHandling.h"
 
 using namespace llvm;
 
 #define GEN_CHECK_COMPRESS_INSTR
 #include "RISCVGenCompressInstEmitter.inc"
@@ ... @@ default:
     return false;
   case RISCV::FADD_H:
   case RISCV::FADD_S:
   case RISCV::FADD_D:
     return true;
   }
 }
 
+static bool isFSUB(unsigned Opc) {
+  switch (Opc) {
+  default:
+    return false;
+  case RISCV::FSUB_H:
+  case RISCV::FSUB_S:
+  case RISCV::FSUB_D:
+    return true;
+  }
+}
+
 static bool isFMUL(unsigned Opc) {
   switch (Opc) {
   default:
     return false;
   case RISCV::FMUL_H:
   case RISCV::FMUL_S:
   case RISCV::FMUL_D:
     return true;
@@ ... @@   if (!Root.getFlag(MachineInstr::MIFlag::FmReassoc) ||
       !Root.getFlag(MachineInstr::MIFlag::FmNsz) ||
       !MI->getFlag(MachineInstr::MIFlag::FmReassoc) ||
       !MI->getFlag(MachineInstr::MIFlag::FmNsz))
     return false;
 
   return RISCV::hasEqualFRM(Root, *MI);
 }
 
+static bool canCombineFPFusedMultiply(const MachineInstr &Root,
+                                      const MachineOperand &MO,
+                                      bool DoRegPressureReduce) {
+  if (!MO.isReg() || !Register::isVirtualRegister(MO.getReg()))
+    return false;
+  const MachineRegisterInfo &MRI = Root.getMF()->getRegInfo();
+  MachineInstr *MI = MRI.getVRegDef(MO.getReg());
+  if (!MI || !isFMUL(MI->getOpcode()))
+    return false;
+
+  if (!Root.getFlag(MachineInstr::MIFlag::FmContract) ||
+      !MI->getFlag(MachineInstr::MIFlag::FmContract))
+    return false;
+
+  // Try combining even if fmul has more than one use as it eliminates
+  // dependency between fadd(fsub) and fmul. However, it can extend liveranges
+  // for fmul operands, so reject the transformation in register pressure
+  // reduction mode.
+  if (DoRegPressureReduce && !MRI.hasOneNonDBGUse(MI->getOperand(0).getReg()))
+    return false;
+
+  // Do not combine instructions from different basic blocks.
+  if (Root.getParent() != MI->getParent())
+    return false;
+  return RISCV::hasEqualFRM(Root, *MI);
+}
+
 static bool
 getFPReassocPatterns(MachineInstr &Root,
                      SmallVectorImpl<MachineCombinerPattern> &Patterns) {
   bool Added = false;
   if (canReassociate(Root, Root.getOperand(1))) {
     Patterns.push_back(MachineCombinerPattern::REASSOC_AX_BY);
     Patterns.push_back(MachineCombinerPattern::REASSOC_XA_BY);
     Added = true;
   }
   if (canReassociate(Root, Root.getOperand(2))) {
     Patterns.push_back(MachineCombinerPattern::REASSOC_AX_YB);
     Patterns.push_back(MachineCombinerPattern::REASSOC_XA_YB);
     Added = true;
   }
   return Added;
 }
 
-static bool getFPPatterns(MachineInstr &Root,
-                          SmallVectorImpl<MachineCombinerPattern> &Patterns) {
+static bool
+getFPFusedMultiplyPatterns(MachineInstr &Root,
+                           SmallVectorImpl<MachineCombinerPattern> &Patterns,
+                           bool DoRegPressureReduce) {
   unsigned Opc = Root.getOpcode();
-  if (isAssociativeAndCommutativeFPOpcode(Opc))
-    return getFPReassocPatterns(Root, Patterns);
-  return false;
+  bool IsFAdd = isFADD(Opc);
+  if (!IsFAdd && !isFSUB(Opc))
+    return false;
+  bool Added = false;
+  if (canCombineFPFusedMultiply(Root, Root.getOperand(1),
+                                DoRegPressureReduce)) {
+    Patterns.push_back(IsFAdd ? MachineCombinerPattern::FMADD_AX
+                              : MachineCombinerPattern::FMSUB);
+    Added = true;
+  }
+  if (canCombineFPFusedMultiply(Root, Root.getOperand(2),
+                                DoRegPressureReduce)) {
+    Patterns.push_back(IsFAdd ? MachineCombinerPattern::FMADD_XA
+                              : MachineCombinerPattern::FNMSUB);
+    Added = true;
+  }
+  return Added;
+}
+
+static bool getFPPatterns(MachineInstr &Root,
+                          SmallVectorImpl<MachineCombinerPattern> &Patterns,
+                          bool DoRegPressureReduce) {
+  bool Added = getFPFusedMultiplyPatterns(Root, Patterns, DoRegPressureReduce);
+  if (isAssociativeAndCommutativeFPOpcode(Root.getOpcode()))
+    Added |= getFPReassocPatterns(Root, Patterns);
+  return Added;
 }
 
 bool RISCVInstrInfo::getMachineCombinerPatterns(
     MachineInstr &Root, SmallVectorImpl<MachineCombinerPattern> &Patterns,
     bool DoRegPressureReduce) const {
 
-  if (getFPPatterns(Root, Patterns))
+  if (getFPPatterns(Root, Patterns, DoRegPressureReduce))
     return true;
 
   return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns,
                                                      DoRegPressureReduce);
 }
 
+static unsigned getFPFusedMultiplyOpcode(unsigned RootOpc,
+                                         MachineCombinerPattern Pattern) {
+  switch (RootOpc) {
+  default:
+    llvm_unreachable("Unexpected opcode");
+  case RISCV::FADD_H:
+    return RISCV::FMADD_H;
+  case RISCV::FADD_S:
+    return RISCV::FMADD_S;
+  case RISCV::FADD_D:
+    return RISCV::FMADD_D;
+  case RISCV::FSUB_H:
+    return Pattern == MachineCombinerPattern::FMSUB ? RISCV::FMSUB_H
+                                                    : RISCV::FNMSUB_H;
+  case RISCV::FSUB_S:
+    return Pattern == MachineCombinerPattern::FMSUB ? RISCV::FMSUB_S
+                                                    : RISCV::FNMSUB_S;
+  case RISCV::FSUB_D:
+    return Pattern == MachineCombinerPattern::FMSUB ? RISCV::FMSUB_D
+                                                    : RISCV::FNMSUB_D;
+  }
+}
+
+static unsigned getAddendOperandIdx(MachineCombinerPattern Pattern) {
+  switch (Pattern) {
+  default:
+    llvm_unreachable("Unexpected pattern");
+  case MachineCombinerPattern::FMADD_AX:
+  case MachineCombinerPattern::FMSUB:
+    return 2;
+  case MachineCombinerPattern::FMADD_XA:
+  case MachineCombinerPattern::FNMSUB:
+    return 1;
+  }
+}
+
+static void combineFPFusedMultiply(MachineInstr &Root, MachineInstr &Prev,
+                                   MachineCombinerPattern Pattern,
+                                   SmallVectorImpl<MachineInstr *> &InsInstrs,
+                                   SmallVectorImpl<MachineInstr *> &DelInstrs) {
+  MachineFunction *MF = Root.getMF();
+  MachineRegisterInfo &MRI = MF->getRegInfo();
+  const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
+
+  MachineOperand &Mul1 = Prev.getOperand(1);
+  MachineOperand &Mul2 = Prev.getOperand(2);
+  MachineOperand &Dst = Root.getOperand(0);
+  MachineOperand &Addend = Root.getOperand(getAddendOperandIdx(Pattern));
+
+  Register DstReg = Dst.getReg();
+  unsigned FusedOpc = getFPFusedMultiplyOpcode(Root.getOpcode(), Pattern);
+  auto IntersectedFlags = Root.getFlags() & Prev.getFlags();
+  DebugLoc MergedLoc =
+      DILocation::getMergedLocation(Root.getDebugLoc(), Prev.getDebugLoc());
+
+  MachineInstrBuilder MIB =
+      BuildMI(*MF, MergedLoc, TII->get(FusedOpc), DstReg)
+          .addReg(Mul1.getReg(), getKillRegState(Mul1.isKill()))
+          .addReg(Mul2.getReg(), getKillRegState(Mul2.isKill()))
+          .addReg(Addend.getReg(), getKillRegState(Addend.isKill()))
+          .setMIFlags(IntersectedFlags);
+
+  // Mul operands are not killed anymore.
+  Mul1.setIsKill(false);
+  Mul2.setIsKill(false);
+
+  InsInstrs.push_back(MIB);
+  if (MRI.hasOneNonDBGUse(Prev.getOperand(0).getReg()))
+    DelInstrs.push_back(&Prev);
+  DelInstrs.push_back(&Root);
+}
+
+void RISCVInstrInfo::genAlternativeCodeSequence(
+    MachineInstr &Root, MachineCombinerPattern Pattern,
+    SmallVectorImpl<MachineInstr *> &InsInstrs,
+    SmallVectorImpl<MachineInstr *> &DelInstrs,
+    DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
+  MachineRegisterInfo &MRI = Root.getMF()->getRegInfo();
+  switch (Pattern) {
+  default:
+    TargetInstrInfo::genAlternativeCodeSequence(Root, Pattern, InsInstrs,
+                                                DelInstrs, InstrIdxForVirtReg);
+    return;
+  case MachineCombinerPattern::FMADD_AX:
+  case MachineCombinerPattern::FMSUB: {
+    MachineInstr &Prev = *MRI.getVRegDef(Root.getOperand(1).getReg());
+    combineFPFusedMultiply(Root, Prev, Pattern, InsInstrs, DelInstrs);
+    return;
+  }
+  case MachineCombinerPattern::FMADD_XA:
+  case MachineCombinerPattern::FNMSUB: {
+    MachineInstr &Prev = *MRI.getVRegDef(Root.getOperand(2).getReg());
+    combineFPFusedMultiply(Root, Prev, Pattern, InsInstrs, DelInstrs);
+    return;
+  }
+  }
+}
+
 bool RISCVInstrInfo::verifyInstruction(const MachineInstr &MI,
                                        StringRef &ErrInfo) const {
   MCInstrDesc const &Desc = MI.getDesc();
 
   for (auto &OI : enumerate(Desc.operands())) {
     unsigned OpType = OI.value().OperandType;
     if (OpType >= RISCVOp::OPERAND_FIRST_RISCV_IMM &&
         OpType <= RISCVOp::OPERAND_LAST_RISCV_IMM) {
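The pattern selection in `getFPFusedMultiplyPatterns`, the operand choice in `getAddendOperandIdx` and the deletion rule in `combineFPFusedMultiply` can be summarized by a standalone model. This is a hypothetical sketch that mirrors the diff's logic with plain C++ types and mnemonics instead of opcodes; it is not LLVM code.

```cpp
#include <cassert>
#include <string>

enum class Pattern { FMADD_AX, FMADD_XA, FMSUB, FNMSUB };

// Root is an fadd/fsub; mulOperandIdx (1 or 2) is the root operand that is
// defined by the fmul.
Pattern selectPattern(bool rootIsFAdd, unsigned mulOperandIdx) {
  assert(mulOperandIdx == 1 || mulOperandIdx == 2);
  if (rootIsFAdd)
    return mulOperandIdx == 1 ? Pattern::FMADD_AX : Pattern::FMADD_XA;
  // fsub: (a*b) - c -> fmsub, c - (a*b) -> fnmsub
  return mulOperandIdx == 1 ? Pattern::FMSUB : Pattern::FNMSUB;
}

// The addend is the root operand that is not the fmul result.
unsigned addendOperandIdx(Pattern p) {
  switch (p) {
  case Pattern::FMADD_AX:
  case Pattern::FMSUB:
    return 2;
  case Pattern::FMADD_XA:
  case Pattern::FNMSUB:
    return 1;
  }
  return 0; // unreachable for valid patterns
}

// Double-precision mnemonic of the fused instruction.
std::string fusedMnemonic(Pattern p) {
  switch (p) {
  case Pattern::FMADD_AX:
  case Pattern::FMADD_XA:
    return "fmadd.d";
  case Pattern::FMSUB:
    return "fmsub.d";
  case Pattern::FNMSUB:
    return "fnmsub.d";
  }
  return "";
}

// The fmul is deleted only if the root was its single non-debug user.
bool deletesFMul(unsigned fmulNonDebugUses) { return fmulNonDebugUses == 1; }
```

For example, `test_fnmsub` computes `%a2 - %t0`, so the fmul is operand 2 of the fsub, the pattern is FNMSUB, the addend is operand 1, and because `%t0` is also used by the fdiv the fmul survives.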

llvm/test/CodeGen/RISCV/machine-combiner-mir.ll

@@ ... @@ define double @test_fmadd(double %a0, double %a1, double %a2) {
 ; CHECK-LABEL: name: test_fmadd
 ; CHECK: bb.0 (%ir-block.0):
 ; CHECK-NEXT: liveins: $f10_d, $f11_d, $f12_d
 ; CHECK-NEXT: {{ $}}
 ; CHECK-NEXT: [[COPY:%[0-9]+]]:fpr64 = COPY $f12_d
 ; CHECK-NEXT: [[COPY1:%[0-9]+]]:fpr64 = COPY $f11_d
 ; CHECK-NEXT: [[COPY2:%[0-9]+]]:fpr64 = COPY $f10_d
 ; CHECK-NEXT: [[FMUL_D:%[0-9]+]]:fpr64 = contract nofpexcept FMUL_D [[COPY2]], [[COPY1]], 7, implicit $frm
-; CHECK-NEXT: [[FADD_D:%[0-9]+]]:fpr64 = contract nofpexcept FADD_D [[FMUL_D]], [[COPY]], 7, implicit $frm
-; CHECK-NEXT: [[FDIV_D:%[0-9]+]]:fpr64 = nofpexcept FDIV_D killed [[FADD_D]], [[FMUL_D]], 7, implicit $frm
+; CHECK-NEXT: [[FMADD_D:%[0-9]+]]:fpr64 = contract nofpexcept FMADD_D [[COPY2]], [[COPY1]], [[COPY]], 7, implicit $frm
+; CHECK-NEXT: [[FDIV_D:%[0-9]+]]:fpr64 = nofpexcept FDIV_D killed [[FMADD_D]], [[FMUL_D]], 7, implicit $frm
 ; CHECK-NEXT: $f10_d = COPY [[FDIV_D]]
 ; CHECK-NEXT: PseudoRET implicit $f10_d
   %t0 = fmul contract double %a0, %a1
   %t1 = fadd contract double %t0, %a2
   %t2 = fdiv double %t1, %t0
   ret double %t2
 }

llvm/test/CodeGen/RISCV/machine-combiner.ll

@@ ... @@ ; CHECK-NEXT: ret
   %t1 = fadd nsz reassoc double %t0, %a2
   %t2 = fadd double %t1, %a3
   ret double %t2
 }
 
 define double @test_fmadd1(double %a0, double %a1, double %a2, double %a3) {
 ; CHECK-LABEL: test_fmadd1:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: fmul.d ft0, fa0, fa1
-; CHECK-NEXT: fadd.d ft1, ft0, fa2
-; CHECK-NEXT: fadd.d ft0, fa3, ft0
-; CHECK-NEXT: fadd.d fa0, ft1, ft0
+; CHECK-NEXT: fmadd.d ft0, fa0, fa1, fa2
+; CHECK-NEXT: fmadd.d ft1, fa0, fa1, fa3
+; CHECK-NEXT: fadd.d fa0, ft0, ft1
 ; CHECK-NEXT: ret
   %t0 = fmul contract double %a0, %a1
   %t1 = fadd contract double %t0, %a2
   %t2 = fadd contract double %a3, %t0
   %t3 = fadd double %t1, %t2
   ret double %t3
 }
 
 define double @test_fmadd2(double %a0, double %a1, double %a2) {
 ; CHECK-LABEL: test_fmadd2:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: fmul.d ft0, fa0, fa1
-; CHECK-NEXT: fadd.d ft1, ft0, fa2
+; CHECK-NEXT: fmadd.d ft1, fa0, fa1, fa2
 ; CHECK-NEXT: fdiv.d fa0, ft1, ft0
 ; CHECK-NEXT: ret
   %t0 = fmul contract double %a0, %a1
   %t1 = fadd contract double %t0, %a2
   %t2 = fdiv double %t1, %t0
   ret double %t2
 }
 
 define double @test_fmsub(double %a0, double %a1, double %a2) {
 ; CHECK-LABEL: test_fmsub:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: fmul.d ft0, fa0, fa1
-; CHECK-NEXT: fsub.d ft1, ft0, fa2
+; CHECK-NEXT: fmsub.d ft1, fa0, fa1, fa2
 ; CHECK-NEXT: fdiv.d fa0, ft1, ft0
 ; CHECK-NEXT: ret
   %t0 = fmul contract double %a0, %a1
   %t1 = fsub contract double %t0, %a2
   %t2 = fdiv double %t1, %t0
   ret double %t2
 }
 
 define double @test_fnmsub(double %a0, double %a1, double %a2) {
 ; CHECK-LABEL: test_fnmsub:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: fmul.d ft0, fa0, fa1
-; CHECK-NEXT: fsub.d ft1, fa2, ft0
+; CHECK-NEXT: fnmsub.d ft1, fa0, fa1, fa2
 ; CHECK-NEXT: fdiv.d fa0, ft1, ft0
 ; CHECK-NEXT: ret
   %t0 = fmul contract double %a0, %a1
   %t1 = fsub contract double %a2, %t0
   %t2 = fdiv double %t1, %t0
   ret double %t2
 }

This is an archive of the discontinued LLVM Phabricator instance.

[MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 476061

llvm/include/llvm/CodeGen/MachineCombinerPattern.h

llvm/lib/CodeGen/MachineCombiner.cpp

llvm/lib/Target/RISCV/RISCVInstrInfo.h

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp

llvm/test/CodeGen/RISCV/machine-combiner-mir.ll

llvm/test/CodeGen/RISCV/machine-combiner.ll
