This is an archive of the discontinued LLVM Phabricator instance.

GlobalISel: Don't lose fneg flags when lowering to fsub
Closed, Public

Authored by arsenm on Jun 17 2019, 4:32 AM.

Details

Reviewers

mcberg2017
cameron.mcinally
spatel
aemerson
aditya_nandakumar
paquette
volkan
hfinkel
efriedma

Summary

This was ignoring the flags on the fneg and using the source instruction's
flags instead, which did not look obviously correct to me. I think that is
OK, but the lowering should also keep any flags present on the original fneg.

Also fixes tests missing from r358702.

Event Timeline

arsenm created this revision. Jun 17 2019, 4:32 AM

Herald added subscribers: Petar.Avramovic, kristof.beyls, rovka and 3 others. Jun 17 2019, 4:32 AM

Seems reasonable to me.

This revision is now accepted and ready to land. Jun 17 2019, 8:25 AM

I'm not familiar with how flags are being used in this layer, but union of flags doesn't match the way we deal with FMF in IR. If the fneg is strict (disregard the irrelevant 'arcp' in that last test), then it's invalid to assume that it inherits the 'nsz' property from its operand.

In D63405#1546342, @spatel wrote:

I'm not familiar with how flags are being used in this layer, but union of flags doesn't match the way we deal with FMF in IR. If the fneg is strict (disregard the irrelevant 'arcp' in that last test), then it's invalid to assume that it inherits the 'nsz' property from its operand.

I would expect the lowering here to just preserve the flag on the original instruction, and not propagate the source (as was done here previously). The corresponding expansion in the DAG has a TODO to preserve the flags, so there isn't precedent for this

In D63405#1546348, @arsenm wrote:

I would expect the lowering here to just preserve the flag on the original instruction, and not propagate the source (as was done here previously). The corresponding expansion in the DAG has a TODO to preserve the flags, so there isn't precedent for this

Let me know if I'm not seeing this correctly (haven't looked at MIR very much - it might help if the test was committed with the baseline output first, so we just have a diff in this patch).
Did we start with 'fneg arcp X' and end up with 'fsub arcp nsz -0.0, X'? Preserving (rather than unioning) the flag should end up with only 'fsub arcp -0.0, X'?
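
The three behaviours being compared in this thread (take the source instruction's flags, union both sets, or keep only the fneg's own flags) can be modelled with a small standalone sketch. Everything here is an assumption for illustration: the `Flag` enum, its bit values, and the helper names are hypothetical and are not LLVM's `MachineInstr::MIFlag` API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for fast-math flags. The bit values are arbitrary
// and do not match LLVM's MachineInstr::MIFlag encoding.
enum Flag : std::uint8_t {
  NoFlags = 0,
  Nsz = 1u << 0,
  Arcp = 1u << 1,
  Ninf = 1u << 2,
};

// Pre-patch behaviour: the new fsub takes the flags of the instruction that
// defines the fneg's operand and ignores the fneg's own flags.
inline std::uint8_t flagsFromSource(std::uint8_t FNegFlags,
                                    std::uint8_t SrcFlags) {
  (void)FNegFlags;
  return SrcFlags;
}

// Union of both flag sets, the option questioned in the review.
inline std::uint8_t flagsUnion(std::uint8_t FNegFlags, std::uint8_t SrcFlags) {
  return static_cast<std::uint8_t>(FNegFlags | SrcFlags);
}

// Preserve only the flags that were on the fneg itself.
inline std::uint8_t flagsPreserved(std::uint8_t FNegFlags,
                                   std::uint8_t SrcFlags) {
  (void)SrcFlags;
  return FNegFlags;
}

// The case asked about above: fneg arcp (fadd nsz x, y).
inline void checkReviewCase() {
  assert(flagsFromSource(Arcp, Nsz) == Nsz);      // fsub nsz
  assert(flagsUnion(Arcp, Nsz) == (Arcp | Nsz));  // fsub arcp nsz
  assert(flagsPreserved(Arcp, Nsz) == Arcp);      // fsub arcp
}
```

Only the last helper avoids attaching 'nsz' to an fsub whose originating fneg never carried it.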

In D63405#1546410, @spatel wrote:

Let me know if I'm not seeing this correctly (haven't looked at MIR very much - it might help if the test was committed with the baseline output first, so we just have a diff in this patch).
Did we start with 'fneg arcp X' and end up with 'fsub arcp nsz -0.0, X'? Preserving (rather than unioning) the flag should end up with only 'fsub arcp -0.0, X'?

The pre-patch code would fold fneg (fadd flags x, y) -> fsub flags -0.0, (fadd flags x, y), and entirely ignore any flags on fneg. I'm not entirely sure this is correct on its own

In D63405#1546424, @arsenm wrote:

The pre-patch code would fold fneg (fadd flags x, y) -> fsub flags -0.0, (fadd flags x, y), and entirely ignore any flags on fneg. I'm not entirely sure this is correct on its own

The original code is wrong then (assuming we're using FMF on a value with the same reasoning that we use in IR/DAG). I'm not seeing how 'union' of flags is the correct fix though.

Just preserve the original flags

Ok, with the constraint like in the DAG:

as Unsafe or nsz on the Op

 SDAG is preserving the Op flags or in this case the MI flags in place of SrcMI.

IIRC, isn't preserving the original flags the outcome we want here? If so, I think the updated patch is fine.

@mcberg2017 is that correct?

... ah wait he already got to it while I was typing, never mind. :)

Should we not also guard the GlobalISel translation in this case, and avoid it if the constraint is not met?

SDAG fails isNegatibleForFree if that constraint is not met and we do not translate.

In D63405#1546507, @mcberg2017 wrote:
Ok, with the constraint like in the DAG:
as Unsafe or nsz on the Op

 SDAG is preserving the Op flags or in this case the MI flags in place of SrcMI.

I don't follow. I don't see why the source should matter. The DAG currently does not preserve the flags here

I am saying you have it right Matt for using MI flags, but we should guard folding based on Unsafe or nsz in the Flags.

In D63405#1546558, @mcberg2017 wrote:

I am saying you have it right Matt for using MI flags, but we should guard folding based on Unsafe or nsz in the Flags.

There's no folding going on here? This is just a 1-op to 1-op legalization

We do this as a combine earlier in target specific code under this constraint. Matt, perhaps you are just missing that combine in GlobalISel?

For your target. But as a catch all for legalization we should do this.

In D63405#1546641, @mcberg2017 wrote:

For your target. But as a catch all for legalization we should do this.

AMDGPU doesn't use this legalization. I'm just trying to fix a correctness issue in the generic legalizer

So I agree, but I think there should be a target specific combine with guards for Unsafe and nsz and this legalization case (for those who do not have it).

Ok.

I think we should preserve the existing flags and be consistent with what we do in SDAGISel. LGTM.

This matches my mental model for FMF propagation, so LGTM.

But there's a separate question that is raised here: why is it legal to convert fneg to fsub -0.0? That loosens the IEEE requirement when dealing with a NAN. I'd think this should be legalized by converting to integer and flipping the sign bit (xor).
ping @cameron.mcinally

In D63405#1546701, @spatel wrote:

But there's a separate question that is raised here: why is it legal to convert fneg to fsub -0.0? That loosens the IEEE requirement when dealing with a NAN. I'd think this should be legalized by converting to integer and flipping the sign bit (xor).
ping @cameron.mcinally

Sanjay is correct. It’s not safe to convert fneg->fsub without nnan.
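
The difference between the two lowerings can be sketched in standalone C++. This is an illustration under assumptions, not LLVM code: the helper names are hypothetical, `double` is assumed to be a 64-bit IEEE-754 type, and the build is assumed not to use fast-math options. Flipping the sign bit is a pure bit operation, so a NaN keeps its payload; `-0.0 - X` is an arithmetic operation, so for a NaN input the sketch relies only on the result being some NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Mask for the sign bit of a 64-bit IEEE-754 double.
constexpr std::uint64_t SignBit = std::uint64_t{1} << 63;

// Return the raw bit pattern of a double.
inline std::uint64_t bitsOf(double X) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch assumes a 64-bit double");
  std::uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return Bits;
}

// Negate by flipping the sign bit (integer xor). No arithmetic is performed,
// so every other bit, including a NaN payload, is left untouched.
inline double fnegViaXor(double X) {
  std::uint64_t Bits = bitsOf(X) ^ SignBit;
  std::memcpy(&X, &Bits, sizeof(X));
  return X;
}

// Negate the way the lowering under review does: -0.0 - X.
// -0.0 is used because +0.0 - (+0.0) would give +0.0 instead of -0.0.
inline double fnegViaFSub(double X) { return -0.0 - X; }

inline void checkNegationSketch() {
  const double QNaN = std::numeric_limits<double>::quiet_NaN();
  // The xor form flips exactly the sign bit, even for a NaN.
  assert(bitsOf(fnegViaXor(QNaN)) == (bitsOf(QNaN) ^ SignBit));
  // The fsub form yields a NaN; nothing is asserted about its sign.
  assert(std::isnan(fnegViaFSub(QNaN)));
  // Both forms turn +0.0 into -0.0.
  assert(std::signbit(fnegViaXor(0.0)));
  assert(std::signbit(fnegViaFSub(0.0)));
}
```

For ordinary finite values and zeros the two helpers agree; the concern raised above is limited to NaN inputs, which is why 'nnan' comes up as the condition for the fsub form.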

Should this expansion just be ripped out then? This is also broken in SelectionDAG. I don't like the idea of a legalization that relies on checking the flags, and this could be an optimization fold somewhere else

mcberg2017 mentioned this in D63458: Propagate fmf in IRTranslate for fneg. Jun 17 2019, 1:51 PM

mcberg2017 mentioned this in rL363631: Propagate fmf in IRTranslate for fneg. Jun 17 2019, 4:17 PM

mcberg2017 mentioned this in rGf9bff2a55e74: Propagate fmf in IRTranslate for fneg.

r363637

Revision Contents

Path                                                             Size
lib/CodeGen/GlobalISel/LegalizerHelper.cpp                       3 lines
test/CodeGen/AMDGPU/GlobalISel/irtranslator-fast-math-flags.ll   31 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-fsub.mir                 30 lines
unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp             46 lines

Diff 205108

lib/CodeGen/GlobalISel/LegalizerHelper.cpp

@@ case TargetOpcode::G_FNEG: {
     default:
       llvm_unreachable("unexpected floating-point type");
     }
     ConstantFP &ZeroForNegation =
         *cast<ConstantFP>(ConstantFP::getZeroValueForNegation(ZeroTy));
     auto Zero = MIRBuilder.buildFConstant(Ty, ZeroForNegation);
     unsigned SubByReg = MI.getOperand(1).getReg();
     unsigned ZeroReg = Zero->getOperand(0).getReg();
-    MachineInstr *SrcMI = MRI.getVRegDef(SubByReg);
     MIRBuilder.buildInstr(TargetOpcode::G_FSUB, {Res}, {ZeroReg, SubByReg},
-                          SrcMI->getFlags());
+                          MI.getFlags());
     MI.eraseFromParent();
     return Legalized;
   }
   case TargetOpcode::G_FSUB: {
     // Lower (G_FSUB LHS, RHS) to (G_FADD LHS, (G_FNEG RHS)).
     // First, check if G_FNEG is marked as Lower. If so, we may
     // end up with an infinite loop as G_FSUB is used to legalize G_FNEG.
     if (LI.getAction({G_FNEG, {Ty}}).Action == Lower)

test/CodeGen/AMDGPU/GlobalISel/irtranslator-fast-math-flags.ll

This file was added.

; RUN: llc -march=amdgcn -mcpu=fiji -O0 -stop-after=irtranslator -global-isel %s -o - | FileCheck %s

; Check flags are preserved for a regular instruction.
; CHECK-LABEL: name: fadd_nnan
; CHECK: nnan G_FADD
define amdgpu_kernel void @fadd_nnan(float %arg0, float %arg1) {
  %res = fadd nnan float %arg0, %arg1
  store float %res, float addrspace(1)* undef
  ret void
}

; Check flags are preserved for a specially handled intrinsic
; CHECK-LABEL: name: fma_fast
; CHECK: nnan ninf nsz arcp contract afn reassoc G_FMA
define amdgpu_kernel void @fma_fast(float %arg0, float %arg1, float %arg2) {
  %res = call fast float @llvm.fma.f32(float %arg0, float %arg1, float %arg2)
  store float %res, float addrspace(1)* undef
  ret void
}

; Check flags are preserved for an arbitrary target intrinsic
; CHECK-LABEL: name: rcp_nsz
; CHECK: = nsz G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), %8(s32)
define amdgpu_kernel void @rcp_nsz(float %arg0) {
  %res = call nsz float @llvm.amdgcn.rcp.f32 (float %arg0)
  store float %res, float addrspace(1)* undef
  ret void
}

declare float @llvm.fma.f32(float, float, float)
declare float @llvm.amdgcn.rcp.f32(float)

test/CodeGen/AMDGPU/GlobalISel/legalize-fsub.mir

@@ bb.0:
     ; GFX9: $vgpr0_vgpr1 = COPY [[FADD]](s64)
     %0:_(s64) = COPY $vgpr0_vgpr1
     %1:_(s64) = COPY $vgpr2_vgpr3
     %2:_(s64) = G_FSUB %0, %1
     $vgpr0_vgpr1 = COPY %2
 ...

 ---
+name: test_fsub_s64_fmf
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1, $vgpr2_vgpr3
+
+    ; SI-LABEL: name: test_fsub_s64_fmf
+    ; SI: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
+    ; SI: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
+    ; SI: [[FNEG:%[0-9]+]]:_(s64) = G_FNEG [[COPY1]]
+    ; SI: %2:_(s64) = nnan nsz G_FADD [[COPY]], [[FNEG]]
+    ; SI: $vgpr0_vgpr1 = COPY %2(s64)
+    ; VI-LABEL: name: test_fsub_s64_fmf
+    ; VI: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
+    ; VI: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
+    ; VI: [[FNEG:%[0-9]+]]:_(s64) = G_FNEG [[COPY1]]
+    ; VI: %2:_(s64) = nnan nsz G_FADD [[COPY]], [[FNEG]]
+    ; VI: $vgpr0_vgpr1 = COPY %2(s64)
+    ; GFX9-LABEL: name: test_fsub_s64_fmf
+    ; GFX9: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
+    ; GFX9: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
+    ; GFX9: [[FNEG:%[0-9]+]]:_(s64) = G_FNEG [[COPY1]]
+    ; GFX9: %2:_(s64) = nnan nsz G_FADD [[COPY]], [[FNEG]]
+    ; GFX9: $vgpr0_vgpr1 = COPY %2(s64)
+    %0:_(s64) = COPY $vgpr0_vgpr1
+    %1:_(s64) = COPY $vgpr2_vgpr3
+    %2:_(s64) = nnan nsz G_FSUB %0, %1
+    $vgpr0_vgpr1 = COPY %2
+...
+
+---
 name: test_fsub_s16
 body: |
   bb.0:
     liveins: $vgpr0, $vgpr1

     ; SI-LABEL: name: test_fsub_s16
     ; SI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
     ; SI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1

unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp

@@ TEST_F(GISelMITest, FewerElementsPhi) {
   CHECK: [[INSERT0:%[0-9]+]]:_(<5 x s32>) = G_INSERT [[REBUILD_VAL_IMPDEF]]:_, [[PHI0]]:_(<2 x s32>), 0
   CHECK: [[INSERT1:%[0-9]+]]:_(<5 x s32>) = G_INSERT [[INSERT0]]:_, [[PHI1]]:_(<2 x s32>), 64
   CHECK: [[INSERT2:%[0-9]+]]:_(<5 x s32>) = G_INSERT [[INSERT1]]:_, [[PHI2]]:_(s32), 128
   CHECK: [[USE_OP:%[0-9]+]]:_(<5 x s32>) = G_AND [[INSERT2]]:_, [[INSERT2]]:_
   )";

   EXPECT_TRUE(CheckMachineFunction(MF, CheckStr)) << MF;
 }

+// FNEG expansion in terms of FSUB
+TEST_F(GISelMITest, LowerFNEG) {
+  if (!TM)
+    return;
+
+  // Declare your legalization info
+  DefineLegalizerInfo(A, {
+    getActionDefinitionsBuilder(G_FSUB).legalFor({s64});
+  });
+
+  // Build Instr. Make sure FMF are preserved.
+  auto FAdd =
+    B.buildInstr(TargetOpcode::G_FADD, {LLT::scalar(64)}, {Copies[0], Copies[1]},
+                 MachineInstr::MIFlag::FmNsz);
+
+  // Should not propagate the flags of src instruction.
+  auto FNeg0 =
+    B.buildInstr(TargetOpcode::G_FNEG, {LLT::scalar(64)}, {FAdd.getReg(0)},
+                 {MachineInstr::MIFlag::FmArcp});
+
+  // Preserve the one flag.
+  auto FNeg1 =
+    B.buildInstr(TargetOpcode::G_FNEG, {LLT::scalar(64)}, {Copies[0]},
+                 MachineInstr::MIFlag::FmNoInfs);
+
+  AInfo Info(MF->getSubtarget());
+  DummyGISelObserver Observer;
+  LegalizerHelper Helper(*MF, Info, Observer, B);
+  // Perform Legalization
+  EXPECT_EQ(LegalizerHelper::LegalizeResult::Legalized,
+            Helper.lower(*FNeg0, 0, LLT::scalar(64)));
+  EXPECT_EQ(LegalizerHelper::LegalizeResult::Legalized,
+            Helper.lower(*FNeg1, 0, LLT::scalar(64)));
+
+  auto CheckStr = R"(
+  CHECK: [[FADD:%[0-9]+]]:_(s64) = nsz G_FADD %0:_, %1:_
+  CHECK: [[CONST0:%[0-9]+]]:_(s64) = G_FCONSTANT double -0.000000e+00
+  CHECK: [[FSUB0:%[0-9]+]]:_(s64) = arcp G_FSUB [[CONST0]]:_, [[FADD]]:_
+  CHECK: [[CONST1:%[0-9]+]]:_(s64) = G_FCONSTANT double -0.000000e+00
+  CHECK: [[FSUB1:%[0-9]+]]:_(s64) = ninf G_FSUB [[CONST1]]:_, %0:_
+  )";
+
+  // Check
+  EXPECT_TRUE(CheckMachineFunction(MF, CheckStr)) << MF;
+}
 } // namespace
