This is an archive of the discontinued LLVM Phabricator instance.

Differential D66725

[DAGCombiner][TargetLowering] Target hook for FCOPYSIGN arg cast folding
Abandoned · Public

Authored by luismarques on Aug 25 2019, 7:11 PM.

Details

Reviewers

eli.friedman
asb
lenary

Summary

The FCOPYSIGN SelectionDAG node takes magnitude and sign arguments that can have different floating-point types. That node is typically generated by calls to copysign functions or intrinsics (copysign, copysignf, copysignl, llvm.copysign.f32, etc.) that take two arguments with the same type. Thus, if you call copysign with differently-typed values you'll get an FCOPYSIGN argument that is cast to the right type, using either fp_extend/fpext or fp_round/fptrunc. For example:

float copysignf(float x, float y);

float foo(float x, double y) {
    return copysignf(x, y);
}

Initial selection DAG: %bb.0 'foo:entry'
SelectionDAG has 11 nodes:
  t0: ch = EntryToken
      t2: f32,ch = CopyFromReg t0, Register:f32 %0
        t4: f64,ch = CopyFromReg t0, Register:f64 %1
      t6: f32 = fp_round t4, TargetConstant:i32<0>
    t7: f32 = fcopysign t2, t6
  t9: ch,glue = CopyToReg t0, Register:f32 $f10_32, t7
  t10: ch = RISCVISD::RET_FLAG t9, Register:f32 $f10_32, t9:1

The DAGCombiner currently folds away that cast, unless the sign type is an fp128:

Optimized lowered selection DAG: %bb.0 'foo:entry'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
      t2: f32,ch = CopyFromReg t0, Register:f32 %0
      t4: f64,ch = CopyFromReg t0, Register:f64 %1
    t11: f32 = fcopysign t2, t4
  t9: ch,glue = CopyToReg t0, Register:f32 $f10_32, t11
  t10: ch = RISCVISD::RET_FLAG t9, Register:f32 $f10_32, t9:1

One case where the folding is desirable is when FCOPYSIGN is expanded (in LegalizeDAG). In that case the conversion is pointless, since the expanded code extracts the sign bit using integer bitwise operations and that can easily be done whatever the original floating-point type is; you thus save the cost of doing the extension or round-down. Another case where you want to elide the cast is when your ISA handles both floats and doubles using the same internal format and FP hardware (e.g. PowerPC converts everything to double).
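
To make the expansion argument concrete, here is a minimal sketch (not LLVM's actual expansion code) of copying the sign of a double onto a float using integer operations only. The helper name copysignMixed is hypothetical, and the code assumes that float and double use the IEEE-754 binary32 and binary64 layouts. The sign operand is never rounded to float, which is why folding the cast away costs nothing in the expanded case.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper, for illustration only: returns Mag with the sign of
// Sign, without first rounding Sign from double to float.
// Assumes float is IEEE-754 binary32 and double is IEEE-754 binary64.
float copysignMixed(float Mag, double Sign) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expected 32-bit float");
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expected 64-bit double");

  std::uint32_t MagBits;
  std::uint64_t SignBits;
  std::memcpy(&MagBits, &Mag, sizeof(MagBits));
  std::memcpy(&SignBits, &Sign, sizeof(SignBits));

  // Move the f64 sign bit (bit 63) down to the f32 sign position (bit 31).
  const std::uint32_t SignBit =
      static_cast<std::uint32_t>(SignBits >> 32) & 0x80000000u;

  // Clear the magnitude's own sign bit and insert the extracted one.
  const std::uint32_t ResultBits = (MagBits & 0x7fffffffu) | SignBit;

  float Result;
  std::memcpy(&Result, &ResultBits, sizeof(Result));
  return Result;
}

// Small self-check showing the intended behaviour.
void checkCopysignMixed() {
  assert(copysignMixed(1.5f, -2.0) == -1.5f);
  assert(copysignMixed(-1.5f, 2.0) == 1.5f);
  // -1e300 is out of range for float, but only its sign bit is inspected.
  assert(copysignMixed(3.0f, -1e300) == -3.0f);
}
```

The promote direction (f64 magnitude, f32 sign) works the same way, with the sign bit shifted up by 32 instead of down.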

On the other hand, when you have native copysign instructions (and don't convert everything to the same internal format) doing the fold is typically undesirable and just creates implementation busywork. The copysign machine instructions probably don't accept mixed precision types, so now you have to create selection patterns for the various FP type combinations, using the appropriate rounding/extending instructions to recreate the cast that was folded away. See for instance the WebAssembly target.

In general, I would say that whether you want to fold the cast away or not depends on the target lowering. Currently the DAGCombiner does not use any target hook to check whether the fold should be done or not. Instead, it only checks for f128, with a vague comment about it being problematic on some targets like x86_64 (not clear exactly which...). This hard coded logic doesn't allow the folding to be done for f128 when it actually would be OK, and folds away when the target doesn't actually handle the mixed types (e.g. RISC-V, at the moment). You'll probably agree that having random x86 target limitations infect the DAGCombiner is not ideal. One possible solution is to introduce a target hook, which is what this patch does.
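
As a rough illustration of the difference between the hard-coded f128 check and a target hook, the sketch below models both decisions with stand-in types. FPType, TargetModel, LegacyF128Target and ExpandsCopySign are hypothetical names, not LLVM classes or members; only the two predicates mirror expressions from this patch (the default canCopySign and the `SignTy != MVT::f128` overrides).

```cpp
#include <cassert>

// Stand-in for llvm::EVT, reduced to three scalar FP types (assumption).
enum class FPType { F32, F64, F128 };

// Hypothetical model of a target. ExpandsCopySign stands in for
// isOperationExpand(ISD::FCOPYSIGN, ValueTy) in the real hook.
struct TargetModel {
  bool ExpandsCopySign = false;
  virtual ~TargetModel() = default;

  // Default policy from the patch: fold the cast only if FCOPYSIGN will be
  // expanded anyway, or if no type mixing results from the fold.
  virtual bool canCopySign(FPType ValueTy, FPType SignTy) const {
    return ExpandsCopySign || ValueTy == SignTy;
  }
};

// Models the overrides that preserve the old DAGCombiner behaviour:
// any sign type is accepted except a mismatched f128.
struct LegacyF128Target : TargetModel {
  bool canCopySign(FPType ValueTy, FPType SignTy) const override {
    return ValueTy == SignTy || SignTy != FPType::F128;
  }
};

// Small self-check showing how the two policies differ.
void checkHookModels() {
  TargetModel NativeFP; // native copysign, no mixed-type support
  assert(!NativeFP.canCopySign(FPType::F32, FPType::F64));

  TargetModel SoftFloat; // copysign is expanded to integer operations
  SoftFloat.ExpandsCopySign = true;
  assert(SoftFloat.canCopySign(FPType::F32, FPType::F64));

  LegacyF128Target Legacy;
  assert(Legacy.canCopySign(FPType::F32, FPType::F64));
  assert(!Legacy.canCopySign(FPType::F64, FPType::F128));
}
```

In this model a RISC-V-like target keeps the default and so stops receiving mixed-type nodes, while targets that relied on the old behaviour opt back in through the override.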

Feedback would be appreciated about the following:

Is there a way to introduce a better check in the DAGCombiner that does not require introducing a target hook? I explored a couple of options, but I could only make it work with a hook.

Nitpick: naming suggestions for the hook. I tried to be consistent with the other hooks but all of the names I could come up with were slightly flawed in some way, including the one in this patch.

Semantics for the hook. In this patch we do the fold if the target indicates it can handle the given types. Maybe that's not the best way to express the problem.

Implementation for the hook. By default we consider that we can handle any types if the copysign would be expanded, since generally the expansion code should be able to handle any type unless the target messes that up with the legalization steps (that was the case for a couple of targets). Also, I was initially checking for the expansion outside the hook (in the DAGCombiner) but that failed for one x86 test, so I had to move that to the hook.

Arguably the comment that was originally on DAGCombiner should be moved to the X86 TargetLowering and adjusted for that new context. But given its vagueness I just deleted it.

Not all targets have tests for this, covering all of the relevant types. Not sure for which targets and which target options it's worth adding. For instance, for ARM there's a lack of tests for the f128 case, and it needs to be tested with more than -mtriple=arm.

The alternative to all of this would be to just exhaustively implement FCOPYSIGN in all targets, despite the busywork. Is it realistic that we can encounter FCOPYSIGN with all of the FP types without a cast? Doesn't seem so to me.

Isn't the root problem of this that we don't have the notion of an instruction being legal depending on the types of the various operands? I checked GlobalISel and it seems like it doesn't solve that either. It seems like it can set legalization actions based on the type of individual operands but not based on combinations of operands. So, more flexible than SelectionDAG but still not flexible enough to solve this without a target hook.
- It seems like it's not the first time people have run into that limitation. For instance, in CodeGenPrepare it says FIXME: always querying the result type is just an approximation; some nodes' legality is determined by the operand or other means. There's no good way to find out though.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

luismarques created this revision. Aug 25 2019, 7:11 PM

Herald added a project: Restricted Project. Aug 25 2019, 7:11 PM

Herald added subscribers: llvm-commits, pzheng, steven.zhang and 33 others.

lenary added inline comments. Aug 27 2019, 8:48 AM

llvm/include/llvm/CodeGen/TargetLowering.h
2609	Which targets use this default and not `SignTy != f128`, beyond RISC-V and WebAssembly?
llvm/test/CodeGen/RISCV/copysign-casts.ll
19	NIT: Can you add `nounwind` to these tests?

Herald added a subscriber: wuzish. Aug 27 2019, 8:48 AM

Adds nounwind attribute to the RISC-V tests.

luismarques marked an inline comment as done. Aug 27 2019, 4:45 PM

luismarques added inline comments.

llvm/include/llvm/CodeGen/TargetLowering.h
2609	AArch64 overrides the hook but accepts all types, including `f128`. In-tree targets that keep the default implementation in this patch are ARC, AVR, BPF, Lanai, MSP430, NVPTX, Sparc, RISCV and WebAssembly.

luismarques marked an inline comment as done. Aug 27 2019, 4:54 PM

luismarques added inline comments.

llvm/include/llvm/CodeGen/TargetLowering.h
2609	...and XCore.

I am happy with this. I think this hook is the correct way to go about choosing how to do this optimisation, and accurately conveys what's going on.

I would like to see a review by someone who works on cross-target parts of DAGCombiner, before this is landed.

llvm/include/llvm/CodeGen/TargetLowering.h
2609	Ok, nice, I just wanted to make sure this was a sensible default, rather than being overridden everywhere.

In D66725#1663174, @lenary wrote:

I would like to see a review by someone who works on cross-target parts of DAGCombiner, before this is landed.

@efriedma would you be willing to review this?

emaste added a subscriber: emaste. Oct 18 2019, 11:57 AM

FYI, this change allowed me to compile and run a hard-float FreeBSD RISC-V userland. (With stock LLVM the build trips over this assertion)

For FCOPYSIGN, specifically, LegalizeDAG is going to query the target to ask, "is FCOPYSIGN legal for result type X", using the getOperationAction() API? The target has a few different ways to respond: here, the relevant possibilities are "Legal", "Expand", or "Custom". I guess the issue here is that on RISCV, this query is returning "Legal", when it actually isn't legal for all combinations of legal result/input types?

If this is a problem for a bunch of targets, maybe we need a different API for expressing whether an FCOPYSIGN is legal. I'd prefer to follow existing convention for other operations which involve multiple types, though; for example, see TargetLoweringBase::getLoadExtAction.
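
For comparison, a per-type-pair action table in the spirit of getLoadExtAction could look roughly like the sketch below. Everything here (CopySignActionTable, setCopySignAction, getCopySignAction, the reduced FPType enum, and the choice of Expand as the initial action) is hypothetical and only illustrates the shape of such an API; no such FCOPYSIGN table is part of this patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Reduced stand-ins for MVT and the legalize actions (assumptions).
enum FPType : std::size_t { F32 = 0, F64, F128, NumFPTypes };
enum class LegalizeAction { Legal, Expand, Custom };

// Hypothetical table keyed on (magnitude type, sign type), analogous to how
// extending loads are keyed on (value type, memory type).
class CopySignActionTable {
  std::array<std::array<LegalizeAction, NumFPTypes>, NumFPTypes> Actions;

public:
  // Assumption of this sketch: every pair starts out as Expand and the
  // target marks the combinations it supports natively.
  CopySignActionTable() {
    for (auto &Row : Actions)
      Row.fill(LegalizeAction::Expand);
  }

  void setCopySignAction(FPType ValueTy, FPType SignTy, LegalizeAction A) {
    assert(ValueTy < NumFPTypes && SignTy < NumFPTypes && "not an FP type");
    Actions[ValueTy][SignTy] = A;
  }

  LegalizeAction getCopySignAction(FPType ValueTy, FPType SignTy) const {
    assert(ValueTy < NumFPTypes && SignTy < NumFPTypes && "not an FP type");
    return Actions[ValueTy][SignTy];
  }
};

// Example setup for a target whose copysign instructions require both
// operands to have the same type (f32 or f64).
CopySignActionTable makeSameTypeOnlyTarget() {
  CopySignActionTable T;
  T.setCopySignAction(F32, F32, LegalizeAction::Legal);
  T.setCopySignAction(F64, F64, LegalizeAction::Legal);
  return T;
}
```

With such a table the legalizer could distinguish (f32, f32) from (f32, f64); how the combiner should interpret each action for a mixed pair is a separate question that the table alone does not answer.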

brooks added a subscriber: brooks. Nov 6 2019, 11:34 AM

Herald added a subscriber: sameer.abuasal. Nov 6 2019, 11:34 AM

Rebased and tweaked patch.

Herald added a subscriber: jholewinski. Nov 10 2019, 3:51 AM

luismarques edited the summary of this revision. Nov 10 2019, 3:52 AM

In D66725#1717427, @efriedma wrote:

For FCOPYSIGN, specifically, LegalizeDAG is going to query the target to ask, "is FCOPYSIGN legal for result type X", using the getOperationAction() API? The target has a few different ways to respond: here, the relevant possibilities are "Legal", "Expand", or "Custom". I guess the issue here is that on RISCV, this query is returning "Legal", when it actually isn't legal for all combinations of legal result/input types?
If this is a problem for a bunch of targets, maybe we need a different API for expressing whether an FCOPYSIGN is legal. I'd prefer to follow existing convention for other operations which involve multiple types, though; for example, see TargetLoweringBase::getLoadExtAction.

@efriedma: one problem is that whether we want to do the combine or not doesn't map well to whether FCOPYSIGN is legal (for return type X). For instance, if you don't have an FPU and you expand the copysign (to integer bit manipulations) you actually still want to do the combine, since it's pointless to emit a libcall for promoting/truncating the floating-point sign value -- it's just as easy to extract the sign bit from the original value. In any case, I experimented with the proposed approach and I will submit the patch for review for comparison. But I think it ends up addressing this problem in an even kludgier way (I'll add some details in that patch's summary).

Add nounwind to tests.

luismarques mentioned this in D70064: [DAGCombiner][TargetLowering] FCOPYSIGN mixed types legality. Nov 10 2019, 2:41 PM

dmz added a subscriber: dmz. Nov 12 2019, 9:39 AM

mhorne added a subscriber: mhorne. Nov 16 2019, 2:50 PM

luismarques mentioned this in D70426: [DAGCombiner][RISCV] Avoid FCOPYSIGN folding of legalizing operand casts. Nov 19 2019, 2:52 AM

Abandoned in favour of D70679 (and possibly other future patches).

luismarques mentioned this in D96037: [DAGCombiner] Don't fold FCOPYSIGN vector sign operand casts. Feb 4 2021, 7:49 AM

Revision Contents

Path                                                    Size
llvm/include/llvm/CodeGen/TargetLowering.h              5 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp           13 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h           4 lines
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h             4 lines
llvm/lib/Target/ARM/ARMISelLowering.h                   4 lines
llvm/lib/Target/Mips/MipsISelLowering.h                 4 lines
llvm/lib/Target/NVPTX/NVPTXISelLowering.h               4 lines
llvm/lib/Target/PowerPC/PPCISelLowering.h               4 lines
llvm/lib/Target/SystemZ/SystemZISelLowering.h           4 lines
llvm/lib/Target/WebAssembly/WebAssemblyInstrFloat.td    6 lines
llvm/lib/Target/X86/X86ISelLowering.h                   4 lines
llvm/test/CodeGen/RISCV/copysign-casts.ll               118 lines
llvm/test/CodeGen/WebAssembly/copysign-casts.ll         3 lines

Diff 228604

llvm/include/llvm/CodeGen/TargetLowering.h

Show First 20 Lines • Show All 2,600 Lines • ▼ Show 20 Lines	public:

// Return true if CodeGenPrepare should consider splitting large offset of a		// Return true if CodeGenPrepare should consider splitting large offset of a
// GEP to make the GEP fit into the addressing mode and can be sunk into the		// GEP to make the GEP fit into the addressing mode and can be sunk into the
// same blocks of its users.		// same blocks of its users.
virtual bool shouldConsiderGEPOffsetSplit() const { return false; }		virtual bool shouldConsiderGEPOffsetSplit() const { return false; }

// Return the shift amount threshold for profitable transforms into shifts.		// Return the shift amount threshold for profitable transforms into shifts.
// Transforms creating shifts above the returned value will be avoided.		// Transforms creating shifts above the returned value will be avoided.
virtual unsigned getShiftAmountThreshold(EVT VT) const {		virtual unsigned getShiftAmountThreshold(EVT VT) const {
return VT.getScalarSizeInBits();		return VT.getScalarSizeInBits();
}		}

		// Return true if FCOPYSIGN is properly handled with the given types.
		virtual bool canCopySign(EVT ValueTy, EVT SignTy) const {
		return isOperationExpand(ISD::FCOPYSIGN, ValueTy) || ValueTy == SignTy;
		}

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Runtime Library hooks		// Runtime Library hooks
//		//

/// Rename the default libcall routine name for the specified libcall.		/// Rename the default libcall routine name for the specified libcall.
void setLibcallName(RTLIB::Libcall Call, const char *Name) {		void setLibcallName(RTLIB::Libcall Call, const char *Name) {
LibcallRoutineNames[Call] = Name;		LibcallRoutineNames[Call] = Name;
}		}

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 12,660 Lines • ▼ Show 20 Lines	if (TLI.isFsqrtCheap(N0, DAG))
return SDValue();		return SDValue();

// FSQRT nodes have flags that propagate to the created nodes.		// FSQRT nodes have flags that propagate to the created nodes.
return buildSqrtEstimate(N0, Flags);		return buildSqrtEstimate(N0, Flags);
}		}

/// copysign(x, fp_extend(y)) -> copysign(x, y)		/// copysign(x, fp_extend(y)) -> copysign(x, y)
/// copysign(x, fp_round(y)) -> copysign(x, y)		/// copysign(x, fp_round(y)) -> copysign(x, y)
static inline bool CanCombineFCOPYSIGN_EXTEND_ROUND(SDNode *N) {		static inline bool CanCombineFCOPYSIGN_EXTEND_ROUND(const TargetLowering &TLI,
		SDNode *N) {
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
if ((N1.getOpcode() == ISD::FP_EXTEND ||		if ((N1.getOpcode() == ISD::FP_EXTEND ||
N1.getOpcode() == ISD::FP_ROUND)) {		N1.getOpcode() == ISD::FP_ROUND)) {
// Do not optimize out type conversion of f128 type yet.		SDValue N0 = N->getOperand(0);
// For some targets like x86_64, configuration is changed to keep one f128		EVT N0VT = N0->getValueType(0);
// value in one SSE register, but instruction selection cannot handle
// FCOPYSIGN on SSE registers yet.
EVT N1VT = N1->getValueType(0);		EVT N1VT = N1->getValueType(0);
EVT N1Op0VT = N1->getOperand(0).getValueType();		EVT N1Op0VT = N1->getOperand(0).getValueType();
return (N1VT == N1Op0VT || N1Op0VT != MVT::f128);		return (N1VT == N1Op0VT || TLI.canCopySign(N0VT, N1Op0VT));
}		}
return false;		return false;
}		}

SDValue DAGCombiner::visitFCOPYSIGN(SDNode *N) {		SDValue DAGCombiner::visitFCOPYSIGN(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
bool N0CFP = isConstantFPBuildVectorOrConstantFP(N0);		bool N0CFP = isConstantFPBuildVectorOrConstantFP(N0);
Show All 29 Lines	if (N1.getOpcode() == ISD::FABS)
return DAG.getNode(ISD::FABS, SDLoc(N), VT, N0);		return DAG.getNode(ISD::FABS, SDLoc(N), VT, N0);

// copysign(x, copysign(y,z)) -> copysign(x, z)		// copysign(x, copysign(y,z)) -> copysign(x, z)
if (N1.getOpcode() == ISD::FCOPYSIGN)		if (N1.getOpcode() == ISD::FCOPYSIGN)
return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT, N0, N1.getOperand(1));		return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT, N0, N1.getOperand(1));

// copysign(x, fp_extend(y)) -> copysign(x, y)		// copysign(x, fp_extend(y)) -> copysign(x, y)
// copysign(x, fp_round(y)) -> copysign(x, y)		// copysign(x, fp_round(y)) -> copysign(x, y)
if (CanCombineFCOPYSIGN_EXTEND_ROUND(N))		if (CanCombineFCOPYSIGN_EXTEND_ROUND(TLI, N))
return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT, N0, N1.getOperand(0));		return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT, N0, N1.getOperand(0));

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitFPOW(SDNode *N) {		SDValue DAGCombiner::visitFPOW(SDNode *N) {
ConstantFPSDNode *ExponentC = isConstOrConstSplatFP(N->getOperand(1));		ConstantFPSDNode *ExponentC = isConstOrConstSplatFP(N->getOperand(1));
if (!ExponentC)		if (!ExponentC)

llvm/lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 754 Lines • ▼ Show 20 Lines	bool getPostIndexedAddressParts(SDNode *N, SDNode *Op, SDValue &Base,
SelectionDAG &DAG) const override;		SelectionDAG &DAG) const override;

void ReplaceNodeResults(SDNode *N, SmallVectorImpl<SDValue> &Results,		void ReplaceNodeResults(SDNode *N, SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const override;		SelectionDAG &DAG) const override;

bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;		bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;

void finalizeLowering(MachineFunction &MF) const override;		void finalizeLowering(MachineFunction &MF) const override;

		bool canCopySign(EVT ValueTy, EVT SignTy) const override {
		return (ValueTy == SignTy || SignTy != MVT::f128);
		}
};		};

namespace AArch64 {		namespace AArch64 {
FastISel *createFastISel(FunctionLoweringInfo &funcInfo,		FastISel *createFastISel(FunctionLoweringInfo &funcInfo,
const TargetLibraryInfo *libInfo);		const TargetLibraryInfo *libInfo);
} // end namespace AArch64		} // end namespace AArch64

} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h

Show First 20 Lines • Show All 319 Lines • ▼ Show 20 Lines	public:
/// type of implicit parameter.		/// type of implicit parameter.
uint32_t getImplicitParameterOffset(const MachineFunction &MF,		uint32_t getImplicitParameterOffset(const MachineFunction &MF,
const ImplicitParameter Param) const;		const ImplicitParameter Param) const;

MVT getFenceOperandTy(const DataLayout &DL) const override {		MVT getFenceOperandTy(const DataLayout &DL) const override {
return MVT::i32;		return MVT::i32;
}		}

		bool canCopySign(EVT ValueTy, EVT SignTy) const override {
		return (ValueTy == SignTy || SignTy != MVT::f128);
		}

AtomicExpansionKind shouldExpandAtomicRMWInIR(AtomicRMWInst *) const override;		AtomicExpansionKind shouldExpandAtomicRMWInIR(AtomicRMWInst *) const override;
};		};

namespace AMDGPUISD {		namespace AMDGPUISD {

enum NodeType : unsigned {		enum NodeType : unsigned {
// AMDIL ISD Opcodes		// AMDIL ISD Opcodes
FIRST_NUMBER = ISD::BUILTIN_OP_END,		FIRST_NUMBER = ISD::BUILTIN_OP_END,

llvm/lib/Target/ARM/ARMISelLowering.h

Show First 20 Lines • Show All 802 Lines • ▼ Show 20 Lines	SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
const SDLoc &dl, SelectionDAG &DAG) const override;		const SDLoc &dl, SelectionDAG &DAG) const override;

bool isUsedByReturnOnly(SDNode *N, SDValue &Chain) const override;		bool isUsedByReturnOnly(SDNode *N, SDValue &Chain) const override;

bool mayBeEmittedAsTailCall(const CallInst *CI) const override;		bool mayBeEmittedAsTailCall(const CallInst *CI) const override;

bool shouldConsiderGEPOffsetSplit() const override { return true; }		bool shouldConsiderGEPOffsetSplit() const override { return true; }

		bool canCopySign(EVT ValueTy, EVT SignTy) const override {
		return (ValueTy == SignTy || SignTy != MVT::f128);
		}

bool isUnsupportedFloatingType(EVT VT) const;		bool isUnsupportedFloatingType(EVT VT) const;

SDValue getCMOV(const SDLoc &dl, EVT VT, SDValue FalseVal, SDValue TrueVal,		SDValue getCMOV(const SDLoc &dl, EVT VT, SDValue FalseVal, SDValue TrueVal,
SDValue ARMcc, SDValue CCR, SDValue Cmp,		SDValue ARMcc, SDValue CCR, SDValue Cmp,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;
SDValue getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,		SDValue getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
SDValue &ARMcc, SelectionDAG &DAG, const SDLoc &dl) const;		SDValue &ARMcc, SelectionDAG &DAG, const SDLoc &dl) const;
SDValue getVFPCmp(SDValue LHS, SDValue RHS, SelectionDAG &DAG,		SDValue getVFPCmp(SDValue LHS, SDValue RHS, SelectionDAG &DAG,

llvm/lib/Target/Mips/MipsISelLowering.h

Show First 20 Lines • Show All 682 Lines • ▼ Show 20 Lines	private:

unsigned getJumpTableEncoding() const override;		unsigned getJumpTableEncoding() const override;
bool useSoftFloat() const override;		bool useSoftFloat() const override;

bool shouldInsertFencesForAtomic(const Instruction *I) const override {		bool shouldInsertFencesForAtomic(const Instruction *I) const override {
return true;		return true;
}		}

		bool canCopySign(EVT ValueTy, EVT SignTy) const override {
		return (ValueTy == SignTy || SignTy != MVT::f128);
		}

/// Emit a sign-extension using sll/sra, seb, or seh appropriately.		/// Emit a sign-extension using sll/sra, seb, or seh appropriately.
MachineBasicBlock *emitSignExtendToI32InReg(MachineInstr &MI,		MachineBasicBlock *emitSignExtendToI32InReg(MachineInstr &MI,
MachineBasicBlock *BB,		MachineBasicBlock *BB,
unsigned Size, unsigned DstReg,		unsigned Size, unsigned DstReg,
unsigned SrcRec) const;		unsigned SrcRec) const;

MachineBasicBlock *emitAtomicBinary(MachineInstr &MI,		MachineBasicBlock *emitAtomicBinary(MachineInstr &MI,
MachineBasicBlock *BB) const;		MachineBasicBlock *BB) const;

llvm/lib/Target/NVPTX/NVPTXISelLowering.h

Show First 20 Lines • Show All 542 Lines • ▼ Show 20 Lines	public:
bool enableAggressiveFMAFusion(EVT VT) const override { return true; }		bool enableAggressiveFMAFusion(EVT VT) const override { return true; }

// The default is to transform llvm.ctlz(x, false) (where false indicates that		// The default is to transform llvm.ctlz(x, false) (where false indicates that
// x == 0 is not undefined behavior) into a branch that checks whether x is 0		// x == 0 is not undefined behavior) into a branch that checks whether x is 0
// and avoids calling ctlz in that case. We have a dedicated ctlz		// and avoids calling ctlz in that case. We have a dedicated ctlz
// instruction, so we say that ctlz is cheap to speculate.		// instruction, so we say that ctlz is cheap to speculate.
bool isCheapToSpeculateCtlz() const override { return true; }		bool isCheapToSpeculateCtlz() const override { return true; }

		bool canCopySign(EVT ValueTy, EVT SignTy) const override {
		return (ValueTy == SignTy || SignTy != MVT::f128);
		}

private:		private:
const NVPTXSubtarget &STI; // cache the subtarget here		const NVPTXSubtarget &STI; // cache the subtarget here
SDValue getParamSymbol(SelectionDAG &DAG, int idx, EVT) const;		SDValue getParamSymbol(SelectionDAG &DAG, int idx, EVT) const;

SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;


llvm/lib/Target/PowerPC/PPCISelLowering.h

Show First 20 Lines • Show All 950 Lines • ▼ Show 20 Lines	public:
unsigned getJumpTableEncoding() const override;		unsigned getJumpTableEncoding() const override;
bool isJumpTableRelative() const override;		bool isJumpTableRelative() const override;
SDValue getPICJumpTableRelocBase(SDValue Table,		SDValue getPICJumpTableRelocBase(SDValue Table,
SelectionDAG &DAG) const override;		SelectionDAG &DAG) const override;
const MCExpr *getPICJumpTableRelocBaseExpr(const MachineFunction *MF,		const MCExpr *getPICJumpTableRelocBaseExpr(const MachineFunction *MF,
unsigned JTI,		unsigned JTI,
MCContext &Ctx) const override;		MCContext &Ctx) const override;

		bool canCopySign(EVT ValueTy, EVT SignTy) const override {
		return (ValueTy == SignTy || SignTy != MVT::f128);
		}

private:		private:
struct ReuseLoadInfo {		struct ReuseLoadInfo {
SDValue Ptr;		SDValue Ptr;
SDValue Chain;		SDValue Chain;
SDValue ResChain;		SDValue ResChain;
MachinePointerInfo MPI;		MachinePointerInfo MPI;
bool IsDereferenceable = false;		bool IsDereferenceable = false;
bool IsInvariant = false;		bool IsInvariant = false;

llvm/lib/Target/SystemZ/SystemZISelLowering.h

Show First 20 Lines • Show All 518 Lines • ▼ Show 20 Lines	public:
ISD::NodeType getExtendForAtomicOps() const override {		ISD::NodeType getExtendForAtomicOps() const override {
return ISD::ANY_EXTEND;		return ISD::ANY_EXTEND;
}		}

bool supportSwiftError() const override {		bool supportSwiftError() const override {
return true;		return true;
}		}

		bool canCopySign(EVT ValueTy, EVT SignTy) const override {
		return (ValueTy == SignTy || SignTy != MVT::f128);
		}

private:		private:
const SystemZSubtarget &Subtarget;		const SystemZSubtarget &Subtarget;

// Implement LowerOperation for individual opcodes.		// Implement LowerOperation for individual opcodes.
SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,		SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,
const SDLoc &DL, EVT VT,		const SDLoc &DL, EVT VT,
SDValue CmpOp0, SDValue CmpOp1) const;		SDValue CmpOp0, SDValue CmpOp1) const;
SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,		SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,

llvm/lib/Target/WebAssembly/WebAssemblyInstrFloat.td

	Show First 20 Lines • Show All 60 Lines • ▼ Show 20 Lines
	defm MAX : BinaryFP<fmaximum, "max ", 0x97, 0xa5>;			defm MAX : BinaryFP<fmaximum, "max ", 0x97, 0xa5>;
	} // isCommutable = 1			} // isCommutable = 1

	defm CEIL : UnaryFP<fceil, "ceil", 0x8d, 0x9b>;			defm CEIL : UnaryFP<fceil, "ceil", 0x8d, 0x9b>;
	defm FLOOR : UnaryFP<ffloor, "floor", 0x8e, 0x9c>;			defm FLOOR : UnaryFP<ffloor, "floor", 0x8e, 0x9c>;
	defm TRUNC : UnaryFP<ftrunc, "trunc", 0x8f, 0x9d>;			defm TRUNC : UnaryFP<ftrunc, "trunc", 0x8f, 0x9d>;
	defm NEAREST : UnaryFP<fnearbyint, "nearest", 0x90, 0x9e>;			defm NEAREST : UnaryFP<fnearbyint, "nearest", 0x90, 0x9e>;

	// DAGCombine oddly folds casts into the rhs of copysign. Unfold them.
	def : Pat<(fcopysign F64:$lhs, F32:$rhs),
	(COPYSIGN_F64 F64:$lhs, (F64_PROMOTE_F32 F32:$rhs))>;
	def : Pat<(fcopysign F32:$lhs, F64:$rhs),
	(COPYSIGN_F32 F32:$lhs, (F32_DEMOTE_F64 F64:$rhs))>;

	// WebAssembly doesn't expose inexact exceptions, so map frint to fnearbyint.			// WebAssembly doesn't expose inexact exceptions, so map frint to fnearbyint.
	def : Pat<(frint f32:$src), (NEAREST_F32 f32:$src)>;			def : Pat<(frint f32:$src), (NEAREST_F32 f32:$src)>;
	def : Pat<(frint f64:$src), (NEAREST_F64 f64:$src)>;			def : Pat<(frint f64:$src), (NEAREST_F64 f64:$src)>;

	let isCommutable = 1 in {			let isCommutable = 1 in {
	defm EQ : ComparisonFP<SETOEQ, "eq ", 0x5b, 0x61>;			defm EQ : ComparisonFP<SETOEQ, "eq ", 0x5b, 0x61>;
	defm NE : ComparisonFP<SETUNE, "ne ", 0x5c, 0x62>;			defm NE : ComparisonFP<SETUNE, "ne ", 0x5c, 0x62>;
	} // isCommutable = 1			} // isCommutable = 1

llvm/lib/Target/X86/X86ISelLowering.h

Show First 20 Lines • Show All 1,143 Lines • ▼ Show 20 Lines	bool isExtractVecEltCheap(EVT VT, unsigned Index) const override {
return (EltVT == MVT::f32 || EltVT == MVT::f64) && Index == 0;		return (EltVT == MVT::f32 || EltVT == MVT::f64) && Index == 0;
}		}

/// Overflow nodes should get combined/lowered to optimal instructions		/// Overflow nodes should get combined/lowered to optimal instructions
/// (they should allow eliminating explicit compares by getting flags from		/// (they should allow eliminating explicit compares by getting flags from
/// math ops).		/// math ops).
bool shouldFormOverflowOp(unsigned Opcode, EVT VT) const override;		bool shouldFormOverflowOp(unsigned Opcode, EVT VT) const override;

		bool canCopySign(EVT ValueTy, EVT SignTy) const override {
		return (ValueTy == SignTy || SignTy != MVT::f128);
		}

bool storeOfVectorConstantIsCheap(EVT MemVT, unsigned NumElem,		bool storeOfVectorConstantIsCheap(EVT MemVT, unsigned NumElem,
unsigned AddrSpace) const override {		unsigned AddrSpace) const override {
// If we can replace more than 2 scalar stores, there will be a reduction		// If we can replace more than 2 scalar stores, there will be a reduction
// in instructions even after we add a vector constant load.		// in instructions even after we add a vector constant load.
return NumElem > 2;		return NumElem > 2;
}		}

bool isLoadBitCastBeneficial(EVT LoadVT, EVT BitcastVT,		bool isLoadBitCastBeneficial(EVT LoadVT, EVT BitcastVT,

llvm/test/CodeGen/RISCV/copysign-casts.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
				; RUN: | FileCheck %s -check-prefix=RV32I
				; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
				; RUN: | FileCheck %s -check-prefix=RV64I
				; RUN: llc -mtriple=riscv32 -verify-machineinstrs -mattr=+f \
				; RUN: -target-abi ilp32f < %s | FileCheck %s -check-prefix=RV32IF
				; RUN: llc -mtriple=riscv32 -verify-machineinstrs -mattr=+f -mattr=+d \
				; RUN: -target-abi ilp32d < %s | FileCheck %s -check-prefix=RV32IFD
				; RUN: llc -mtriple=riscv64 -verify-machineinstrs -mattr=+f -mattr=+d \
				; RUN: -target-abi lp64d < %s | FileCheck %s -check-prefix=RV64IFD

				; Check that DAGCombiner only folds casts into the sign argument of copysign
				; when appropriate (i.e. when it would be expanded because we don't handle mixed
				; precision magnitude and sign arguments).

				declare double @llvm.copysign.f64(double, double)
				declare float @llvm.copysign.f32(float, float)

				define double @fold_promote(double %a, float %b) nounwind {
				; RV32I-LABEL: fold_promote:
				; RV32I: # %bb.0:
				; RV32I-NEXT: lui a3, 524288
				; RV32I-NEXT: and a2, a2, a3
				; RV32I-NEXT: addi a3, a3, -1
				; RV32I-NEXT: and a1, a1, a3
				; RV32I-NEXT: or a1, a1, a2
				; RV32I-NEXT: ret
				;
				; RV64I-LABEL: fold_promote:
				; RV64I: # %bb.0:
				; RV64I-NEXT: addi a2, zero, -1
				; RV64I-NEXT: slli a2, a2, 63
				; RV64I-NEXT: addi a2, a2, -1
				; RV64I-NEXT: and a0, a0, a2
				; RV64I-NEXT: addi a2, zero, 1
				; RV64I-NEXT: slli a2, a2, 31
				; RV64I-NEXT: and a1, a1, a2
				; RV64I-NEXT: slli a1, a1, 32
				; RV64I-NEXT: or a0, a0, a1
				; RV64I-NEXT: ret
				;
				; RV32IF-LABEL: fold_promote:
				; RV32IF: # %bb.0:
				; RV32IF-NEXT: fmv.x.w a2, fa0
				; RV32IF-NEXT: lui a3, 524288
				; RV32IF-NEXT: and a2, a2, a3
				; RV32IF-NEXT: addi a3, a3, -1
				; RV32IF-NEXT: and a1, a1, a3
				; RV32IF-NEXT: or a1, a1, a2
				; RV32IF-NEXT: ret
				;
				; RV32IFD-LABEL: fold_promote:
				; RV32IFD: # %bb.0:
				; RV32IFD-NEXT: fcvt.d.s ft0, fa1
				; RV32IFD-NEXT: fsgnj.d fa0, fa0, ft0
				; RV32IFD-NEXT: ret
				;
				; RV64IFD-LABEL: fold_promote:
				; RV64IFD: # %bb.0:
				; RV64IFD-NEXT: fcvt.d.s ft0, fa1
				; RV64IFD-NEXT: fsgnj.d fa0, fa0, ft0
				; RV64IFD-NEXT: ret
				%c = fpext float %b to double
				%t = call double @llvm.copysign.f64(double %a, double %c)
				ret double %t
				}

				define float @fold_demote(float %a, double %b) nounwind {
				; RV32I-LABEL: fold_demote:
				; RV32I: # %bb.0:
				; RV32I-NEXT: lui a1, 524288
				; RV32I-NEXT: and a2, a2, a1
				; RV32I-NEXT: addi a1, a1, -1
				; RV32I-NEXT: and a0, a0, a1
				; RV32I-NEXT: or a0, a0, a2
				; RV32I-NEXT: ret
				;
				; RV64I-LABEL: fold_demote:
				; RV64I: # %bb.0:
				; RV64I-NEXT: lui a2, 524288
				; RV64I-NEXT: addiw a2, a2, -1
				; RV64I-NEXT: and a0, a0, a2
				; RV64I-NEXT: addi a2, zero, -1
				; RV64I-NEXT: slli a2, a2, 63
				; RV64I-NEXT: and a1, a1, a2
				; RV64I-NEXT: srli a1, a1, 32
				; RV64I-NEXT: or a0, a0, a1
				; RV64I-NEXT: ret
				;
				; RV32IF-LABEL: fold_demote:
				; RV32IF: # %bb.0:
				; RV32IF-NEXT: addi sp, sp, -16
				; RV32IF-NEXT: sw ra, 12(sp)
				; RV32IF-NEXT: fsw fs0, 8(sp)
				; RV32IF-NEXT: fmv.s fs0, fa0
				; RV32IF-NEXT: call __truncdfsf2
				; RV32IF-NEXT: fsgnj.s fa0, fs0, fa0
				; RV32IF-NEXT: flw fs0, 8(sp)
				; RV32IF-NEXT: lw ra, 12(sp)
				; RV32IF-NEXT: addi sp, sp, 16
				; RV32IF-NEXT: ret
				;
				; RV32IFD-LABEL: fold_demote:
				; RV32IFD: # %bb.0:
				; RV32IFD-NEXT: fcvt.s.d ft0, fa1
				; RV32IFD-NEXT: fsgnj.s fa0, fa0, ft0
				; RV32IFD-NEXT: ret
				;
				; RV64IFD-LABEL: fold_demote:
				; RV64IFD: # %bb.0:
				; RV64IFD-NEXT: fcvt.s.d ft0, fa1
				; RV64IFD-NEXT: fsgnj.s fa0, fa0, ft0
				; RV64IFD-NEXT: ret
				%c = fptrunc double %b to float
				%t = call float @llvm.copysign.f32(float %a, float %c)
				ret float %t
				}
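
The RV64I check lines above expand the mixed-precision copysign using integer operations only: mask off the magnitude's sign bit, isolate the sign operand's top bit, shift it into position, and OR the two together. As a rough illustration (not part of the patch; the helper and function names are hypothetical, and IEEE 754 binary32/binary64 layouts are assumed), the same bit manipulation can be sketched in C++:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for this sketch: reinterpret floating-point values as
// their raw bit patterns and back, without violating aliasing rules.
static std::uint64_t toBits(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof(u));
  return u;
}
static std::uint32_t toBits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}
static double toDouble(std::uint64_t u) {
  double d;
  std::memcpy(&d, &u, sizeof(d));
  return d;
}
static float toFloat(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Mirrors the shape of the RV64I fold_promote sequence: double magnitude,
// float sign. The float's sign bit (bit 31) is moved up to bit 63.
double copysignPromoteSketch(double mag, float sign) {
  const std::uint64_t magBits = toBits(mag) & 0x7FFFFFFFFFFFFFFFULL;
  const std::uint64_t signBit =
      static_cast<std::uint64_t>(toBits(sign) & 0x80000000U) << 32;
  return toDouble(magBits | signBit);
}

// Mirrors the shape of the RV64I fold_demote sequence: float magnitude,
// double sign. The double's sign bit (bit 63) is moved down to bit 31.
float copysignDemoteSketch(float mag, double sign) {
  const std::uint32_t magBits = toBits(mag) & 0x7FFFFFFFU;
  const std::uint32_t signBit = static_cast<std::uint32_t>(
      (toBits(sign) & 0x8000000000000000ULL) >> 32);
  return toFloat(magBits | signBit);
}
```

In both directions only the top bit of the sign operand is read, which is why folding the cast away is harmless when copysign is expanded to integer code like this.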

llvm/test/CodeGen/WebAssembly/copysign-casts.ll

 ; RUN: llc < %s -asm-verbose=false -wasm-keep-registers | FileCheck %s

-; DAGCombiner oddly folds casts into the rhs of copysign. Test that they get
-; unfolded.
+; Check that DAGCombiner does not fold casts into the sign argument of copysign.

 target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
 target triple = "wasm32-unknown-unknown"

 declare double @copysign(double, double) nounwind readnone
 declare float @copysignf(float, float) nounwind readnone

 ; CHECK-LABEL: fold_promote:
Revision Contents

Diff 228604