This is an archive of the discontinued LLVM Phabricator instance.

Differential D142746

AMDGPU: Fold fneg into bitcast of build_vector
ClosedPublic

Authored by arsenm on Jan 27 2023, 9:24 AM.

Details

Reviewers

foad
rampitec
Pierre-vh

Group Reviewers

Restricted Project

Summary

The math libraries contain a lot of code that performs
manual sign-bit operations by bitcasting doubles to int2
and bit-hacking on the result. This is a bad canonical form;
we should rewrite it to use high-level sign operations directly
on double. To avoid codegen regressions, we need to do a better
job of moving fnegs so that they operate only on the high 32 bits.

This is only halfway to fixing the real case.
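
As an illustration of the pattern the summary describes, the sketch below negates a double by flipping only the top bit of its high 32-bit word. It assumes `double` is IEEE-754 binary64; `negateViaHighWord` and `checkNegateViaHighWord` are hypothetical names, not code from the math libraries or from this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: negate a double the "bit-hacking" way, by touching
// only the high 32-bit word. Assumes IEEE-754 binary64.
double negateViaHighWord(double X) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));        // bitcast double -> 64 bits
  std::uint32_t Lo = static_cast<std::uint32_t>(Bits);
  std::uint32_t Hi = static_cast<std::uint32_t>(Bits >> 32);
  Hi ^= 0x80000000u;                           // the sign bit lives in the high word
  Bits = (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  double Result;
  std::memcpy(&Result, &Bits, sizeof(Result)); // bitcast 64 bits -> double
  return Result;
}

// Usage sketch: the low word is never modified, which is why a 32-bit
// operation on the high half is enough.
void checkNegateViaHighWord() {
  assert(negateViaHighWord(1.5) == -1.5);
  assert(negateViaHighWord(-0.25) == 0.25);
}
```

The same constant, 0x80000000, is what the `v_xor_b32` and `s_xor_b32` instructions in the test diffs apply to the high register of each 64-bit pair.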

Diff Detail

Event Timeline

arsenm created this revision. Jan 27 2023, 9:24 AM

Herald added a project: Restricted Project. Jan 27 2023, 9:24 AM

Herald added subscribers: bzcheeseman, kosarev, StephenFan and 7 others.

arsenm requested review of this revision. Jan 27 2023, 9:24 AM

Herald added a project: Restricted Project. Jan 27 2023, 9:24 AM

Herald added a subscriber: wdng.

arsenm added parent revisions: D142682: AMDGPU: Combine down fcopysign f64 magnitude, D142641: AMDGPU: Force sign operand of f64 fcopysign to f32, D142585: AMDGPU: Try to push fneg as integer into select. Jan 27 2023, 9:24 AM

Harbormaster completed remote builds in B210394: Diff 492801. Jan 27 2023, 9:26 AM

arsenm added a child revision: D142749: AMDGPU: Push fneg into bitcast of integer select. Jan 27 2023, 9:26 AM

Pierre-vh added inline comments. Jan 31 2023, 1:56 AM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
576–588
4161	Do an early return instead to reduce indentation?
4167–4181	Can you add a comment to show what this is doing? (`A -> B` style comment like we usually do for DAG combines)

Address comments

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4161	That would just get re-indented in the next patch

Harbormaster completed remote builds in B211689: Diff 494580. Feb 3 2023, 3:21 AM

Pierre-vh added inline comments. Feb 8 2023, 3:00 AM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4173–4174	Wouldn't it make more sense to bitcast the fneg back to i32 to get 2xi32 -> f64? Would that change codegen? 2xf32 -> f64 looks wrong to me (even though it's correct in that context)
llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll
517	There's a small regression here (1 extra instruction), is that addressed in a next patch?
917	ditto

Pierre-vh mentioned this in D142585: AMDGPU: Try to push fneg as integer into select. Feb 8 2023, 3:05 AM

arsenm added inline comments. Mar 15 2023, 3:28 PM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4167–4181	That's already what's above?
4173–4174	That would just introduce new bitcasts, which would be equally foldable back. Minimal bitcasts is better. They get in the way enough as is

Rebase

Rebase had a few regressions due to the revert of 11c3cead23783e65fb30e673d62771352078ff05

llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll
517	Yes, there's additional source modifier usage after D142749

Harbormaster completed remote builds in B219765: Diff 505669. Mar 15 2023, 6:35 PM

ping

rampitec accepted this revision. Apr 10 2023, 10:22 AM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4176	This is really a hack, half of the f64 is not an f32. It's unfortunate we have to do it.
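
The reinterpretation discussed in these comments can be modelled outside the compiler. The sketch below treats the high word of a double as an f32, negates it as a float, and reassembles the double, which mirrors the shape `f64 (bitcast (build_vector lo, (fneg (bitcast hi to f32))))`. It is only an illustration under the assumption of IEEE-754 `float` and `double`; `negateHighWordAsF32` and its check function are hypothetical names, and the sample values are chosen so that the high word is an ordinary (non-NaN) float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of the rewritten DAG: only the high word is touched,
// and it is negated as if it were an f32.
double negateHighWordAsF32(double X) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  std::uint32_t Lo = static_cast<std::uint32_t>(Bits);
  std::uint32_t Hi = static_cast<std::uint32_t>(Bits >> 32);

  float HiAsF32;
  std::memcpy(&HiAsF32, &Hi, sizeof(HiAsF32)); // bitcast i32 high word to f32
  float NegHi = -HiAsF32;                      // f32 fneg
  std::memcpy(&Hi, &NegHi, sizeof(Hi));        // cast back to the element type

  Bits = (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  double Result;
  std::memcpy(&Result, &Bits, sizeof(Result));
  return Result;
}

// Usage sketch: the result matches an ordinary f64 negation for these inputs.
void checkNegateHighWordAsF32() {
  assert(negateHighWordAsF32(1.5) == -1.5);
  assert(negateHighWordAsF32(-3.0) == 3.0);
}
```

The high word of 1.5 is 0x3FF80000, which reads as 1.9375 when viewed as a float. That value has no meaning for the double, which is the sense in which the f32 view is a hack: only the sign bit position is shared.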

This revision is now accepted and ready to land. Apr 10 2023, 10:22 AM

0f59720e1c3416030a91fdfb6f016fd7fdf21e85

Revision Contents

Path                                                Size
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp       61 lines
llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll       7 lines
llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll   60 lines

Diff 505669

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

(550 lines not shown; context: bool AMDGPUTargetLowering::mayIgnoreSignedZero(SDValue Op) const {)
    return false;
  }

  //===----------------------------------------------------------------------===//
  // Target Information
  //===----------------------------------------------------------------------===//

  LLVM_READNONE
- static bool fnegFoldsIntoOp(unsigned Opc) {
+ static bool fnegFoldsIntoOpcode(unsigned Opc) {
    switch (Opc) {
    case ISD::FADD:
    case ISD::FSUB:
    case ISD::FMUL:
    case ISD::FMA:
    case ISD::FMAD:
    case ISD::FMINNUM:
    case ISD::FMAXNUM:
    case ISD::FMINNUM_IEEE:
    case ISD::FMAXNUM_IEEE:
    case ISD::SELECT:
    case ISD::FSIN:
    case ISD::FTRUNC:
    case ISD::FRINT:
    case ISD::FNEARBYINT:
    case ISD::FCANONICALIZE:
    case AMDGPUISD::RCP:
    case AMDGPUISD::RCP_LEGACY:
    case AMDGPUISD::RCP_IFLAG:
    case AMDGPUISD::SIN_HW:
    case AMDGPUISD::FMUL_LEGACY:
    case AMDGPUISD::FMIN_LEGACY:
    case AMDGPUISD::FMAX_LEGACY:
    case AMDGPUISD::FMED3:
      // TODO: handle llvm.amdgcn.fma.legacy
      return true;
+   case ISD::BITCAST:
+     llvm_unreachable("bitcast is special cased");
    default:

Pierre-vh (suggested edit, Done):

  case ISD::FCANONICALIZE:
- return true;
- case ISD::BITCAST:
- llvm_unreachable("bitcast is special cased");
  case AMDGPUISD::RCP:
  case AMDGPUISD::RCP_LEGACY:
  case AMDGPUISD::RCP_IFLAG:
  case AMDGPUISD::SIN_HW:
  case AMDGPUISD::FMUL_LEGACY:
  case AMDGPUISD::FMIN_LEGACY:
  case AMDGPUISD::FMAX_LEGACY:
  case AMDGPUISD::FMED3:
  // TODO: handle llvm.amdgcn.fma.legacy
  return true;
- default:
+ case ISD::BITCAST:
+ llvm_unreachable("bitcast is special cased");
+ default:

      return false;
    }
  }

+ static bool fnegFoldsIntoOp(const SDNode *N) {
+   unsigned Opc = N->getOpcode();
+   if (Opc == ISD::BITCAST) {
+     // TODO: Is there a benefit to checking the conditions performFNegCombine
+     // does? We don't for the other cases.
+     SDValue BCSrc = N->getOperand(0);
+     return BCSrc.getOpcode() == ISD::BUILD_VECTOR &&
+            BCSrc.getNumOperands() == 2 &&
+            BCSrc.getOperand(1).getValueSizeInBits() == 32;
+   }
+
+   return fnegFoldsIntoOpcode(Opc);
+ }
+
  /// \p returns true if the operation will definitely need to use a 64-bit
  /// encoding, and thus will use a VOP3 encoding regardless of the source
  /// modifiers.
  LLVM_READONLY
  static bool opMustUseVOP3Encoding(const SDNode *N, MVT VT) {
    return (N->getNumOperands() > 2 && N->getOpcode() != ISD::SELECT) ||
           VT == MVT::f64;
  }
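
The split between an opcode-only predicate and a node-level predicate in the hunk above can be sketched without any LLVM headers. Everything below is a hypothetical, heavily simplified stand-in: `Opcode`, `Node`, `foldsIntoOpcode`, and `foldsIntoNode` are illustrative names, and the allow list contains only two opcodes rather than the full list in `fnegFoldsIntoOpcode`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for SelectionDAG opcodes and nodes.
enum class Opcode { FAdd, FMul, Bitcast, BuildVector, Load };

struct Node {
  Opcode Op;
  unsigned SizeInBits;                 // size of the value this node produces
  std::vector<const Node *> Operands;
};

// Same shape as fnegFoldsIntoOpcode: a plain per-opcode allow list.
bool foldsIntoOpcode(Opcode Op) {
  return Op == Opcode::FAdd || Op == Opcode::FMul;
}

// Same shape as fnegFoldsIntoOp(const SDNode *): a bitcast is decided by
// inspecting its source; every other opcode goes through the allow list.
bool foldsIntoNode(const Node &N) {
  if (N.Op == Opcode::Bitcast) {
    const Node &Src = *N.Operands[0];
    return Src.Op == Opcode::BuildVector && Src.Operands.size() == 2 &&
           Src.Operands[1]->SizeInBits == 32;
  }
  return foldsIntoOpcode(N.Op);
}

// Usage sketch: a bitcast of a two-element build_vector with a 32-bit high
// element qualifies, while a bitcast of anything else does not.
void checkFoldsIntoNode() {
  Node Lo{Opcode::Load, 32, {}};
  Node Hi{Opcode::FAdd, 32, {}};
  Node Vec{Opcode::BuildVector, 64, {&Lo, &Hi}};
  Node CastOfVec{Opcode::Bitcast, 64, {&Vec}};
  Node CastOfLoad{Opcode::Bitcast, 32, {&Lo}};
  assert(foldsIntoNode(CastOfVec));
  assert(!foldsIntoNode(CastOfLoad));
}
```

The reason for taking a node rather than an opcode is visible in the bitcast branch: the answer depends on the operand, which an opcode alone cannot express.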

(3,177 lines not shown; context: if ((LHS.getOpcode() == ISD::FNEG || LHS.getOpcode() == ISD::FABS) && CRHS &&)
      SDValue NewLHS = LHS.getOperand(0);
      SDValue NewRHS = RHS;

      // Careful: if the neg can be folded up, don't try to pull it back down.
      bool ShouldFoldNeg = true;

      if (NewLHS.hasOneUse()) {
        unsigned Opc = NewLHS.getOpcode();
-       if (LHS.getOpcode() == ISD::FNEG && fnegFoldsIntoOp(Opc))
+       if (LHS.getOpcode() == ISD::FNEG && fnegFoldsIntoOp(NewLHS.getNode()))
          ShouldFoldNeg = false;
        if (LHS.getOpcode() == ISD::FABS && Opc == ISD::FMUL)
          ShouldFoldNeg = false;
      }

      if (ShouldFoldNeg) {
        if (LHS.getOpcode() == ISD::FABS && CRHS->isNegative())
          return SDValue();

(125 lines not shown; context: static unsigned inverseMinMax(unsigned Opc) {)
    default:
      llvm_unreachable("invalid min/max opcode");
    }
  }

  /// \return true if it's profitable to try to push an fneg into its source
  /// instruction.
  bool AMDGPUTargetLowering::shouldFoldFNegIntoSrc(SDNode *N, SDValue N0) {
-   unsigned Opc = N0.getOpcode();
    // If the input has multiple uses and we can either fold the negate down, or
    // the other uses cannot, give up. This both prevents unprofitable
    // transformations and infinite loops: we won't repeatedly try to fold around
    // a negate that has no 'good' form.
    if (N0.hasOneUse()) {
      // This may be able to fold into the source, but at a code size cost. Don't
      // fold if the fold into the user is free.
      if (allUsesHaveSourceMods(N, 0))
        return false;
    } else {
-     if (fnegFoldsIntoOp(Opc) &&
+     if (fnegFoldsIntoOp(N0.getNode()) &&
          (allUsesHaveSourceMods(N) || !allUsesHaveSourceMods(N0.getNode())))
        return false;
    }

    return true;
  }

  SDValue AMDGPUTargetLowering::performFNegCombine(SDNode *N,

(189 lines not shown; context: SDValue IntFNeg = DAG.getNode(ISD::XOR, SL, SrcVT, Src,)
                                    DAG.getConstant(0x8000, SL, SrcVT));
      return DAG.getNode(ISD::FP16_TO_FP, SL, N->getValueType(0), IntFNeg);
    }
    case ISD::SELECT: {
      // fneg (select c, a, b) -> select c, (fneg a), (fneg b)
      // TODO: Invert conditions of foldFreeOpFromSelect
      return SDValue();
    }
+   case ISD::BITCAST: {
+     SDLoc SL(N);
+     SDValue BCSrc = N0.getOperand(0);
+     if (BCSrc.getOpcode() == ISD::BUILD_VECTOR) {

Pierre-vh (Not Done): Do an early return instead to reduce indentation?
arsenm (Done): That would just get re-indented in the next patch

+       SDValue HighBits = BCSrc.getOperand(BCSrc.getNumOperands() - 1);
+       if (HighBits.getValueType().getSizeInBits() != 32 ||
+           !fnegFoldsIntoOp(HighBits.getNode()))
+         return SDValue();
+
+       // f64 fneg only really needs to operate on the high half of the
+       // register, so try to force it to an f32 operation to help make use of
+       // source modifiers.
+
+       // fneg (f64 (bitcast (build_vector x, y))) ->
+       // f64 (bitcast (build_vector (bitcast i32:x to f32),
+       //                            (fneg (bitcast i32:y to f32)))

Pierre-vh (Not Done): Wouldn't it make more sense to bitcast the fneg back to i32 to get 2xi32 -> f64? Would that change codegen? 2xf32 -> f64 looks wrong to me (even though it's correct in that context)
arsenm (Done): That would just introduce new bitcasts, which would be equally foldable back. Minimal bitcasts is better. They get in the way enough as is

+       SDValue CastHi = DAG.getNode(ISD::BITCAST, SL, MVT::f32, HighBits);

rampitec (Not Done): This is really a hack, half of the f64 is not an f32. It's unfortunate we have to do it.

+       SDValue NegHi = DAG.getNode(ISD::FNEG, SL, MVT::f32, CastHi);
+       SDValue CastBack =
+           DAG.getNode(ISD::BITCAST, SL, HighBits.getValueType(), NegHi);
+
+       SmallVector<SDValue, 8> Ops(BCSrc->op_begin(), BCSrc->op_end());

Pierre-vh (Not Done): Can you add a comment to show what this is doing? (`A -> B` style comment like we usually do for DAG combines)
arsenm (Done): That's already what's above?

+       Ops.back() = CastBack;
+       DCI.AddToWorklist(NegHi.getNode());
+       SDValue Build =
+           DAG.getNode(ISD::BUILD_VECTOR, SL, BCSrc.getValueType(), Ops);
+       SDValue Result = DAG.getNode(ISD::BITCAST, SL, VT, Build);
+
+       if (!N0.hasOneUse())
+         DAG.ReplaceAllUsesWith(N0, DAG.getNode(ISD::FNEG, SL, VT, Result));
+       return Result;
+     }
+
+     return SDValue();
+   }
    default:
      return SDValue();
    }
  }

  SDValue AMDGPUTargetLowering::performFAbsCombine(SDNode *N,
                                                   DAGCombinerInfo &DCI) const {
    SelectionDAG &DAG = DCI.DAG;
(925 lines not shown)

llvm/test/CodeGen/AMDGPU/fneg-combines.new.ll

	(3,020 lines not shown)
	; SI-LABEL: s_fneg_select_infloop_regression_f64:			; SI-LABEL: s_fneg_select_infloop_regression_f64:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_load_dword s4, s[0:1], 0xb			; SI-NEXT: s_load_dword s4, s[0:1], 0xb
	; SI-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x9			; SI-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_and_b32 s4, 1, s4			; SI-NEXT: s_and_b32 s4, 1, s4
	; SI-NEXT: s_cselect_b32 s3, 0, s3			; SI-NEXT: s_cselect_b32 s3, 0, s3
	; SI-NEXT: s_cselect_b32 s2, 0, s2
	; SI-NEXT: s_xor_b32 s3, s3, 0x80000000			; SI-NEXT: s_xor_b32 s3, s3, 0x80000000
	; SI-NEXT: s_cmp_eq_u32 s4, 1			; SI-NEXT: s_cmp_eq_u32 s4, 1
	; SI-NEXT: s_cselect_b32 s3, 0, s3
	; SI-NEXT: s_cselect_b32 s2, 0, s2			; SI-NEXT: s_cselect_b32 s2, 0, s2
				; SI-NEXT: s_cselect_b32 s3, 0, s3
	; SI-NEXT: v_mov_b32_e32 v3, s1			; SI-NEXT: v_mov_b32_e32 v3, s1
	; SI-NEXT: v_mov_b32_e32 v0, s2			; SI-NEXT: v_mov_b32_e32 v0, s2
	; SI-NEXT: v_mov_b32_e32 v1, s3			; SI-NEXT: v_mov_b32_e32 v1, s3
	; SI-NEXT: v_mov_b32_e32 v2, s0			; SI-NEXT: v_mov_b32_e32 v2, s0
	; SI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; SI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: s_fneg_select_infloop_regression_f64:			; VI-LABEL: s_fneg_select_infloop_regression_f64:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dword s4, s[0:1], 0x2c			; VI-NEXT: s_load_dword s4, s[0:1], 0x2c
	; VI-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24			; VI-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_and_b32 s4, 1, s4			; VI-NEXT: s_and_b32 s4, 1, s4
	; VI-NEXT: s_cselect_b32 s3, 0, s3			; VI-NEXT: s_cselect_b32 s3, 0, s3
	; VI-NEXT: s_cselect_b32 s2, 0, s2
	; VI-NEXT: s_xor_b32 s3, s3, 0x80000000			; VI-NEXT: s_xor_b32 s3, s3, 0x80000000
	; VI-NEXT: s_cmp_eq_u32 s4, 1			; VI-NEXT: s_cmp_eq_u32 s4, 1
	; VI-NEXT: s_cselect_b32 s3, 0, s3
	; VI-NEXT: s_cselect_b32 s2, 0, s2			; VI-NEXT: s_cselect_b32 s2, 0, s2
				; VI-NEXT: s_cselect_b32 s3, 0, s3
	; VI-NEXT: v_mov_b32_e32 v3, s1			; VI-NEXT: v_mov_b32_e32 v3, s1
	; VI-NEXT: v_mov_b32_e32 v0, s2			; VI-NEXT: v_mov_b32_e32 v0, s2
	; VI-NEXT: v_mov_b32_e32 v1, s3			; VI-NEXT: v_mov_b32_e32 v1, s3
	; VI-NEXT: v_mov_b32_e32 v2, s0			; VI-NEXT: v_mov_b32_e32 v2, s0
	; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%i = select i1 %arg1, double 0.0, double %arg			%i = select i1 %arg1, double 0.0, double %arg
	%i2 = fneg double %i			%i2 = fneg double %i
	%i3 = select i1 %arg1, double 0.0, double %i2			%i3 = select i1 %arg1, double 0.0, double %i2
	store double %i3, ptr addrspace(1) %ptr, align 4			store double %i3, ptr addrspace(1) %ptr, align 4
	ret void			ret void
	}			}

	define double @v_fneg_select_infloop_regression_f64(double %arg, i1 %arg1) {			define double @v_fneg_select_infloop_regression_f64(double %arg, i1 %arg1) {
	; GCN-LABEL: v_fneg_select_infloop_regression_f64:			; GCN-LABEL: v_fneg_select_infloop_regression_f64:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_and_b32_e32 v2, 1, v2			; GCN-NEXT: v_and_b32_e32 v2, 1, v2
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v2			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v2
	; GCN-NEXT: v_cndmask_b32_e64 v1, v1, 0, vcc			; GCN-NEXT: v_cndmask_b32_e64 v1, v1, 0, vcc
	; GCN-NEXT: v_cndmask_b32_e64 v0, v0, 0, vcc
	; GCN-NEXT: v_xor_b32_e32 v1, 0x80000000, v1			; GCN-NEXT: v_xor_b32_e32 v1, 0x80000000, v1
	; GCN-NEXT: v_cndmask_b32_e64 v0, v0, 0, vcc			; GCN-NEXT: v_cndmask_b32_e64 v0, v0, 0, vcc
	; GCN-NEXT: v_cndmask_b32_e64 v1, v1, 0, vcc			; GCN-NEXT: v_cndmask_b32_e64 v1, v1, 0, vcc
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%i = select i1 %arg1, double 0.0, double %arg			%i = select i1 %arg1, double 0.0, double %arg
	%i2 = fneg double %i			%i2 = fneg double %i
	%i3 = select i1 %arg1, double 0.0, double %i2			%i3 = select i1 %arg1, double 0.0, double %i2
	ret double %i3			ret double %i3
	(375 lines not shown)

llvm/test/CodeGen/AMDGPU/fneg-modifier-casting.ll

(391 lines not shown)
}		}

define double @fneg_xor_select_f64(i1 %cond, double %arg0, double %arg1) {		define double @fneg_xor_select_f64(i1 %cond, double %arg0, double %arg1) {
; GCN-LABEL: fneg_xor_select_f64:		; GCN-LABEL: fneg_xor_select_f64:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_and_b32_e32 v0, 1, v0		; GCN-NEXT: v_and_b32_e32 v0, 1, v0
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0		; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
; GCN-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc
; GCN-NEXT: v_cndmask_b32_e32 v0, v3, v1, vcc		; GCN-NEXT: v_cndmask_b32_e32 v0, v3, v1, vcc
; GCN-NEXT: v_xor_b32_e32 v1, 0x80000000, v2		; GCN-NEXT: v_cndmask_b32_e32 v1, v4, v2, vcc
		; GCN-NEXT: v_xor_b32_e32 v1, 0x80000000, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: fneg_xor_select_f64:		; GFX11-LABEL: fneg_xor_select_f64:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_and_b32_e32 v0, 1, v0		; GFX11-NEXT: v_and_b32_e32 v0, 1, v0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_2) \| instid1(VALU_DEP_2)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_1) \| instid1(VALU_DEP_1)
; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 1, v0		; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 1, v0
; GFX11-NEXT: v_cndmask_b32_e32 v2, v4, v2, vcc_lo		; GFX11-NEXT: v_dual_cndmask_b32 v0, v3, v1 :: v_dual_cndmask_b32 v1, v4, v2
; GFX11-NEXT: v_cndmask_b32_e32 v0, v3, v1, vcc_lo		; GFX11-NEXT: v_xor_b32_e32 v1, 0x80000000, v1
; GFX11-NEXT: v_xor_b32_e32 v1, 0x80000000, v2
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%select = select i1 %cond, double %arg0, double %arg1		%select = select i1 %cond, double %arg0, double %arg1
%fneg = fneg double %select		%fneg = fneg double %select
ret double %fneg		ret double %fneg
}		}

define double @fneg_xor_select_f64_multi_user(i1 %cond, double %arg0, double %arg1, ptr addrspace(1) %ptr) {		define double @fneg_xor_select_f64_multi_user(i1 %cond, double %arg0, double %arg1, ptr addrspace(1) %ptr) {
; GFX7-LABEL: fneg_xor_select_f64_multi_user:		; GFX7-LABEL: fneg_xor_select_f64_multi_user:
(73 lines not shown)
define double @select_fneg_select_fneg_f64(i1 %cond0, i1 %cond1, double %arg0, double %arg1) {		define double @select_fneg_select_fneg_f64(i1 %cond0, i1 %cond1, double %arg0, double %arg1) {
; GCN-LABEL: select_fneg_select_fneg_f64:		; GCN-LABEL: select_fneg_select_fneg_f64:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_and_b32_e32 v0, 1, v0		; GCN-NEXT: v_and_b32_e32 v0, 1, v0
; GCN-NEXT: v_xor_b32_e32 v3, 0x80000000, v3		; GCN-NEXT: v_xor_b32_e32 v3, 0x80000000, v3
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0		; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
; GCN-NEXT: v_and_b32_e32 v1, 1, v1		; GCN-NEXT: v_and_b32_e32 v1, 1, v1
; GCN-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc
; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v4, vcc		; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v4, vcc
; GCN-NEXT: v_xor_b32_e32 v2, 0x80000000, v3		; GCN-NEXT: v_cndmask_b32_e32 v2, v3, v5, vcc
		; GCN-NEXT: v_xor_b32_e32 v3, 0x80000000, v2
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v1		; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v1
; GCN-NEXT: v_cndmask_b32_e32 v1, v3, v2, vcc		; GCN-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: select_fneg_select_fneg_f64:		; GFX11-LABEL: select_fneg_select_fneg_f64:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_xor_b32_e32 v3, 0x80000000, v3
; GFX11-NEXT: v_and_b32_e32 v0, 1, v0		; GFX11-NEXT: v_and_b32_e32 v0, 1, v0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_1) \| instid1(VALU_DEP_4)		; GFX11-NEXT: v_xor_b32_e32 v3, 0x80000000, v3
		; GFX11-NEXT: v_and_b32_e32 v1, 1, v1
		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) \| instskip(SKIP_1) \| instid1(VALU_DEP_4)
		Pierre-vh (Not Done): There's a small regression here (1 extra instruction), is that addressed in a next patch?
		arsenm (Done): Yes, there's additional source modifier usage after D142749
; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 1, v0		; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 1, v0
; GFX11-NEXT: v_dual_cndmask_b32 v0, v2, v4 :: v_dual_and_b32 v1, 1, v1		; GFX11-NEXT: v_cndmask_b32_e32 v0, v2, v4, vcc_lo
; GFX11-NEXT: v_cndmask_b32_e32 v3, v3, v5, vcc_lo		; GFX11-NEXT: v_cndmask_b32_e32 v2, v3, v5, vcc_lo
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) \| instskip(NEXT) \| instid1(VALU_DEP_2)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) \| instskip(NEXT) \| instid1(VALU_DEP_2)
; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 1, v1		; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 1, v1
; GFX11-NEXT: v_xor_b32_e32 v5, 0x80000000, v3		; GFX11-NEXT: v_xor_b32_e32 v3, 0x80000000, v2
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_cndmask_b32_e32 v1, v3, v5, vcc_lo		; GFX11-NEXT: v_cndmask_b32_e32 v1, v2, v3, vcc_lo
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fneg0 = fneg double %arg0		%fneg0 = fneg double %arg0
%select0 = select i1 %cond0, double %arg1, double %fneg0		%select0 = select i1 %cond0, double %arg1, double %fneg0
%fneg1 = fneg double %select0		%fneg1 = fneg double %select0
%select1 = select i1 %cond1, double %fneg1, double %select0		%select1 = select i1 %cond1, double %fneg1, double %select0
ret double %select1		ret double %select1
}		}

(354 lines not shown)
}		}

define double @cospiD_pattern1(i32 %arg, double %arg1, double %arg2) {		define double @cospiD_pattern1(i32 %arg, double %arg1, double %arg2) {
; GCN-LABEL: cospiD_pattern1:		; GCN-LABEL: cospiD_pattern1:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_and_b32_e32 v5, 1, v0		; GCN-NEXT: v_and_b32_e32 v5, 1, v0
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5		; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v5
; GCN-NEXT: v_cndmask_b32_e32 v4, v2, v4, vcc		; GCN-NEXT: v_cndmask_b32_e32 v3, v1, v3, vcc
; GCN-NEXT: v_cndmask_b32_e32 v2, v1, v3, vcc		; GCN-NEXT: v_cndmask_b32_e32 v1, v2, v4, vcc
; GCN-NEXT: v_xor_b32_e32 v1, 0x80000000, v4		; GCN-NEXT: v_xor_b32_e32 v2, 0x80000000, v1
; GCN-NEXT: v_cmp_lt_i32_e32 vcc, 1, v0		; GCN-NEXT: v_cmp_lt_i32_e32 vcc, 1, v0
; GCN-NEXT: v_cndmask_b32_e32 v1, v4, v1, vcc		; GCN-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc
; GCN-NEXT: v_mov_b32_e32 v0, v2		; GCN-NEXT: v_mov_b32_e32 v0, v3
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: cospiD_pattern1:		; GFX11-LABEL: cospiD_pattern1:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_and_b32_e32 v5, 1, v0		; GFX11-NEXT: v_and_b32_e32 v5, 1, v0
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_3) \| instid1(VALU_DEP_3)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) \| instskip(SKIP_3) \| instid1(VALU_DEP_3)
; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v5		; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v5
; GFX11-NEXT: v_cndmask_b32_e32 v4, v2, v4, vcc_lo		; GFX11-NEXT: v_cndmask_b32_e32 v3, v1, v3, vcc_lo
; GFX11-NEXT: v_cndmask_b32_e32 v2, v1, v3, vcc_lo		; GFX11-NEXT: v_cndmask_b32_e32 v1, v2, v4, vcc_lo
; GFX11-NEXT: v_cmp_lt_i32_e32 vcc_lo, 1, v0		; GFX11-NEXT: v_cmp_lt_i32_e32 vcc_lo, 1, v0
; GFX11-NEXT: v_xor_b32_e32 v5, 0x80000000, v4		; GFX11-NEXT: v_mov_b32_e32 v0, v3
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)		; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) \| instskip(NEXT) \| instid1(VALU_DEP_1)
; GFX11-NEXT: v_dual_mov_b32 v0, v2 :: v_dual_cndmask_b32 v1, v4, v5		; GFX11-NEXT: v_xor_b32_e32 v2, 0x80000000, v1
		; GFX11-NEXT: v_cndmask_b32_e32 v1, v1, v2, vcc_lo
		Pierre-vh (Not Done): ditto
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%i = and i32 %arg, 1		%i = and i32 %arg, 1
%i3 = icmp eq i32 %i, 0		%i3 = icmp eq i32 %i, 0
%i4 = select i1 %i3, double %arg2, double %arg1		%i4 = select i1 %i3, double %arg2, double %arg1
%i5 = icmp sgt i32 %arg, 1		%i5 = icmp sgt i32 %arg, 1
%i6 = fneg double %i4		%i6 = fneg double %i4
%i7 = select i1 %i5, double %i6, double %i4		%i7 = select i1 %i5, double %i6, double %i4
ret double %i7		ret double %i7
(460 lines not shown; context: ; GFX11-NEXT: s_setpc_b64 s[30:31])
%fmul = fmul double %fneg, %fp.val		%fmul = fmul double %fneg, %fp.val
ret double %fmul		ret double %fmul
}		}

define double @fneg_f64_bitcast_build_vector_v2f32_foldable_sources_to_f64(float %elt0, float %elt1) {		define double @fneg_f64_bitcast_build_vector_v2f32_foldable_sources_to_f64(float %elt0, float %elt1) {
; GCN-LABEL: fneg_f64_bitcast_build_vector_v2f32_foldable_sources_to_f64:		; GCN-LABEL: fneg_f64_bitcast_build_vector_v2f32_foldable_sources_to_f64:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_add_f32_e32 v1, 2.0, v1		; GCN-NEXT: v_sub_f32_e32 v1, -2.0, v1
; GCN-NEXT: v_xor_b32_e32 v1, 0x80000000, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX11-LABEL: fneg_f64_bitcast_build_vector_v2f32_foldable_sources_to_f64:		; GFX11-LABEL: fneg_f64_bitcast_build_vector_v2f32_foldable_sources_to_f64:
; GFX11: ; %bb.0:		; GFX11: ; %bb.0:
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX11-NEXT: s_waitcnt_vscnt null, 0x0		; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
; GFX11-NEXT: v_add_f32_e32 v1, 2.0, v1		; GFX11-NEXT: v_sub_f32_e32 v1, -2.0, v1
; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
; GFX11-NEXT: v_xor_b32_e32 v1, 0x80000000, v1
; GFX11-NEXT: s_setpc_b64 s[30:31]		; GFX11-NEXT: s_setpc_b64 s[30:31]
%fadd = fadd nsz nnan float %elt1, 2.0		%fadd = fadd nsz nnan float %elt1, 2.0
%insert.0 = insertelement <2 x float> poison, float %elt0, i32 0		%insert.0 = insertelement <2 x float> poison, float %elt0, i32 0
%insert.1 = insertelement <2 x float> %insert.0, float %fadd, i32 1		%insert.1 = insertelement <2 x float> %insert.0, float %fadd, i32 1
%bitcast = bitcast <2 x float> %insert.1 to double		%bitcast = bitcast <2 x float> %insert.1 to double
%fneg = fneg double %bitcast		%fneg = fneg double %bitcast
ret double %fneg		ret double %fneg
}		}
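
The codegen change in this last test replaces `v_add_f32` followed by `v_xor_b32` with a single `v_sub_f32 v1, -2.0, v1`. The arithmetic identity behind that, and its one exception, can be checked with ordinary floats. This is only a sketch assuming default IEEE-754 round-to-nearest and no fast-math compiler flags; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Old codegen shape: add, then flip the sign bit.
float negAfterAdd(float X) { return -(X + 2.0f); }

// New codegen shape: the negation is folded into the arithmetic,
// computing -2.0 - x in one operation.
float foldedNeg(float X) { return -2.0f - X; }

// True when both forms give zero but with different signs. This is the one
// case where they disagree, which is presumably why the test IR marks the
// fadd with the nsz flag.
bool onlySignOfZeroDiffers(float X) {
  float A = negAfterAdd(X);
  float B = foldedNeg(X);
  return A == 0.0f && B == 0.0f && std::signbit(A) != std::signbit(B);
}

// Usage sketch.
void checkFoldedNeg() {
  assert(negAfterAdd(1.0f) == foldedNeg(1.0f));
  assert(negAfterAdd(0.5f) == foldedNeg(0.5f));
  assert(onlySignOfZeroDiffers(-2.0f)); // -(+0) is -0, but -2 - (-2) is +0
}
```

For every input other than -2.0 the two forms round to the same value, so under `nsz` the fold saves the separate sign-flip instruction without changing observable results.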
(113 lines not shown)
