This is an archive of the discontinued LLVM Phabricator instance.

Differential D91716

AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum
Closed, Public

Authored by Petar.Avramovic on Nov 18 2020, 8:29 AM.

Details

Reviewers

arsenm
foad

Commits

rGf0d65f40968d: AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum

Summary

Implements the same logic as in SelectionDAG.
G_FMINNUM_IEEE and G_FMAXNUM_IEEE are never SNaN by definition, and
never NaN when one operand is known non-NaN and the other is known non-SNaN.
G_FMINNUM and G_FMAXNUM are never NaN/SNaN when one of the
operands is known non-NaN/SNaN.
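
The two rules in the summary can be sketched outside LLVM as a small recursive predicate. This is a toy model, not the code in Utils.cpp: `Node`, `Opcode`, and `KnownNumber` are hypothetical stand-ins for a defining instruction, the generic opcodes, and a constant that is known not to be NaN.

```cpp
#include <cassert>

// Toy stand-ins for the defining instruction of a value (not LLVM types).
enum class Opcode { Unknown, KnownNumber, Canonicalize, FMinNum, FMinNumIEEE };

struct Node {
  Opcode Op;
  const Node *LHS; // first source operand, or nullptr
  const Node *RHS; // second source operand, or nullptr
};

// SNaN == false: "N is never any NaN".
// SNaN == true:  "N is never a signaling NaN".
bool isKnownNeverNaN(const Node &N, bool SNaN = false) {
  switch (N.Op) {
  case Opcode::KnownNumber:
    return true;
  case Opcode::Canonicalize:
    // Quiets signaling NaNs, but the result may still be a quiet NaN.
    return SNaN;
  case Opcode::FMinNumIEEE:
    assert(N.LHS && N.RHS && "binary node needs two operands");
    if (SNaN)
      return true;
    // NaN result only if an operand is sNaN or both operands are NaN.
    return (isKnownNeverNaN(*N.LHS) && isKnownNeverNaN(*N.RHS, true)) ||
           (isKnownNeverNaN(*N.LHS, true) && isKnownNeverNaN(*N.RHS));
  case Opcode::FMinNum:
    assert(N.LHS && N.RHS && "binary node needs two operands");
    // One non-NaN operand is enough, since it is the one returned.
    return isKnownNeverNaN(*N.LHS, SNaN) || isKnownNeverNaN(*N.RHS, SNaN);
  case Opcode::Unknown:
    return false;
  }
  return false;
}
```

In this model, an `FMinNum` of an unknown value and a known number is never NaN, while the `FMinNumIEEE` of the same operands is only known to be never SNaN until the unknown operand is canonicalized.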

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Petar.Avramovic created this revision. Nov 18 2020, 8:29 AM

Herald added a project: Restricted Project. Nov 18 2020, 8:29 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 8 others.

Petar.Avramovic requested review of this revision. Nov 18 2020, 8:29 AM

Herald added a subscriber: wdng. Nov 18 2020, 8:29 AM

arsenm added inline comments. Nov 18 2020, 8:32 AM

llvm/lib/CodeGen/GlobalISel/Utils.cpp
533–539	Generic opcodes should not have target dependent semantics like this, especially since these already have semantics inherited from the IR/C fmin/fmax definition. They return the non-nan argument regardless of snan or not

Petar.Avramovic added inline comments. Nov 18 2020, 8:34 AM

llvm/lib/CodeGen/GlobalISel/Utils.cpp
537–538	Can this then be `return true`?

arsenm added inline comments. Nov 18 2020, 8:37 AM

llvm/lib/CodeGen/GlobalISel/Utils.cpp

537–538

No, it depends on the inputs. You can refer to the existing SelectionDAG::isKnownNeverNaN:

// Only one needs to be known not-nan, since it will be returned if the
// other ends up being one.
return isKnownNeverNaN(Op.getOperand(0), SNaN, Depth + 1) ||
       isKnownNeverNaN(Op.getOperand(1), SNaN, Depth + 1);
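
For reference, the C/C++ library functions that G_FMINNUM and G_FMAXNUM inherit their semantics from can be checked directly. A minimal sketch, restricted to quiet NaNs and assuming the program is not built with fast-math options (the helper name `minMaxIgnoreQuietNaN` is made up for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns true if fmin and fmax both return X when the other operand is a
// quiet NaN, in either operand order.
bool minMaxIgnoreQuietNaN(double X) {
  assert(!std::isnan(X) && "X must be a number");
  const double QNaN = std::numeric_limits<double>::quiet_NaN();
  return std::fmin(QNaN, X) == X && std::fmin(X, QNaN) == X &&
         std::fmax(QNaN, X) == X && std::fmax(X, QNaN) == X;
}
```

This is why a single operand known not to be NaN is enough for the result: it is the value that gets returned when the other operand is NaN.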

Petar.Avramovic added inline comments. Nov 18 2020, 8:49 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823	`return isKnownNeverNaN(Op.getOperand(0), SNaN, Depth + 1) || isKnownNeverNaN(Op.getOperand(1), SNaN, Depth + 1); }` This is legalized first. `%2` is isKnownNeverSNaN with IEEE = true, but since the inputs of `%2:_(s32) = G_FMAXNUM %0, %1` are not yet canonicalized, isKnownNeverSNaN fails and a canonicalize is inserted. However, since we know what will happen with `%2:_(s32) = G_FMAXNUM %0, %1`, we declare it isKnownNeverSNaN so that the result does not depend on the order of legalization.

arsenm added inline comments. Nov 18 2020, 8:50 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823	The legalization order doesn't matter. These operations have their own independent semantics. isKnownNeverNaN needs to understand both pairs

Petar.Avramovic added inline comments. Nov 18 2020, 9:05 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
818–820	Using only that formula, this will not change, and we are left with an unnecessary G_FCANONICALIZE(G_FMAXNUM_IEEE), which stops us from performing further combines, both from the td file and from the combiner. Do you have another suggestion for how to avoid creating the G_FCANONICALIZE?
823	How can this be done? Regarding "The legalization order doesn't matter": the formula above gives a different result depending on whether the input was already legalized or not. It would work if we legalized uses first.

arsenm added inline comments. Nov 18 2020, 9:08 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir

823

The same way SelectionDAG handles this:

case ISD::FMINNUM_IEEE:
case ISD::FMAXNUM_IEEE: {
  if (SNaN)
    return true;
  // This can return a NaN if either operand is an sNaN, or if both operands
  // are NaN.
  return (isKnownNeverNaN(Op.getOperand(0), false, Depth + 1) &&
          isKnownNeverSNaN(Op.getOperand(1), Depth + 1)) ||
         (isKnownNeverNaN(Op.getOperand(1), false, Depth + 1) &&
          isKnownNeverSNaN(Op.getOperand(0), Depth + 1));
}
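
The comment in that case can be checked exhaustively on a three-value abstraction of the operands. This is an illustrative model under the assumption that the _IEEE opcodes follow IEEE-754 2008 minNum/maxNum NaN handling (a signaling NaN operand gives a quiet NaN result, and a single quiet NaN operand is ignored); `FPClass`, `minNumIEEE`, `ruleSaysNeverNaN`, and `checkRule` are hypothetical names.

```cpp
#include <cassert>

enum class FPClass { Number, QNaN, SNaN };

// Abstract result class of an IEEE-754 2008 style minNum/maxNum.
FPClass minNumIEEE(FPClass A, FPClass B) {
  if (A == FPClass::SNaN || B == FPClass::SNaN)
    return FPClass::QNaN; // invalid operation: the sNaN is quieted
  if (A == FPClass::QNaN && B == FPClass::QNaN)
    return FPClass::QNaN; // nothing but NaN to return
  return FPClass::Number; // a lone quiet NaN is ignored
}

// The predicate from the quoted case, evaluated with exact knowledge of the
// operand classes.
bool ruleSaysNeverNaN(FPClass A, FPClass B) {
  const bool ANeverNaN = A == FPClass::Number;
  const bool BNeverNaN = B == FPClass::Number;
  const bool ANeverSNaN = A != FPClass::SNaN;
  const bool BNeverSNaN = B != FPClass::SNaN;
  return (ANeverNaN && BNeverSNaN) || (BNeverNaN && ANeverSNaN);
}

// Checks all nine operand combinations: the result is never signaling, and
// whenever the predicate holds the result is a number.
void checkRule() {
  const FPClass All[] = {FPClass::Number, FPClass::QNaN, FPClass::SNaN};
  for (FPClass A : All)
    for (FPClass B : All) {
      const FPClass R = minNumIEEE(A, B);
      assert(R != FPClass::SNaN);
      if (ruleSaysNeverNaN(A, B))
        assert(R == FPClass::Number);
    }
}
```

The first assertion corresponds to the early `if (SNaN) return true;`, and the second to the two-sided condition that follows it.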

Petar.Avramovic added inline comments. Nov 18 2020, 9:26 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823	I meant for `FMINNUM`: writing the same code as in SDAG doesn't deal with the unnecessary canonicalize. `FMINNUM_IEEE` is the same (there is only the `if (SNaN)` part atm).

arsenm added inline comments. Nov 18 2020, 9:31 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823	The lowering for fminnum to fminnum_ieee tries to avoid the canonicalize depending on the instruction/inputs. This is how SelectionDAG also does it. For this patch, it just needs to fully handle the correct semantics for the two pairs.

Petar.Avramovic added inline comments. Nov 18 2020, 11:27 AM

llvm/lib/CodeGen/GlobalISel/Utils.cpp
533–539	Regarding "Generic opcodes should not have target dependent semantics like this, especially since these already have semantics inherited from the IR/C fmin/fmax definition": I would say that only "is known never any NaN" is given. This patch does not change that (isKnownNeverNaN(SNaN = false)). "They return the non-nan argument regardless of snan or not" still stands. We are just aware of the way we lower the instruction and use that in isKnownNeverSNaN only. This is only used for efficiency, to avoid having to quiet a "known non-SNaN" input to an instruction that is about to be lowered.
537–538	I don't think `return isKnownNeverNaN(DefMI->getOperand(1).getReg(), MRI, SNaN) || isKnownNeverNaN(DefMI->getOperand(2).getReg(), MRI, SNaN);` is enough for GlobalISel, since it does not look at the whole picture. The formula above works when the target natively has an FMINNUM instruction, where knowing the inputs gives a conclusion about the output (if one input is not NaN, the result is not NaN). This would be the generic formula (FIXME: use this formula in isKnownNeverNaNForMI instead of return false). But a subtarget with IEEE=true does not have FMINNUM and has to lower it to FMINNUM_IEEE. This gives an additional critical piece of information: FMINNUM is never SNaN with IEEE=true. Are the test changes correct?
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823	About examining the input: we are likely examining inputs in the middle of legalization. With IEEE=true, the input will never be fminnum but fminnum_ieee. Recursively asking for fminnum's inputs is not correct in this context, since the (maybe quieted) inputs will go to fminnum_ieee (whose result is never SNaN), not fminnum, and this is target dependent. I don't see how this breaks correct semantics.

arsenm added inline comments. Nov 18 2020, 11:45 AM

llvm/lib/CodeGen/GlobalISel/Utils.cpp
537–538	The single instruction is the whole picture and does not depend on the target. These have defined, universal NaN handling semantics regardless of how they are lowered. This patch should only aim to faithfully implement these semantics. Ensuring we don't have redundant quieting canonicalizes is an optimization beyond this patch. As an optimization, the expansion to the IEEE version tries to avoid inserting a quiet, which mostly works well enough. We are missing some optimization hints. I have long wished that we had separate no-qnan and no-snan fast math flags (as well as renaming the IR intrinsics to match the real fmin/fmax semantics, and introducing a new pair for the IEEE-754 2008 semantics).
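
As background for "quieting": in the usual IEEE-754 binary32 encoding, a NaN is signaling when the most significant mantissa bit is clear, and quieting sets that bit. The sketch below models only this NaN part of canonicalization on raw bit patterns. It ignores denormal handling, and the constant and function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t ExponentMask = 0x7F800000u;
constexpr std::uint32_t MantissaMask = 0x007FFFFFu;
constexpr std::uint32_t QuietBit = 0x00400000u; // top mantissa bit

// NaN: all exponent bits set and a non-zero mantissa.
constexpr bool isNaNBits(std::uint32_t Bits) {
  return (Bits & ExponentMask) == ExponentMask && (Bits & MantissaMask) != 0;
}

// Signaling NaN: a NaN whose quiet bit is clear.
constexpr bool isSignalingNaNBits(std::uint32_t Bits) {
  return isNaNBits(Bits) && (Bits & QuietBit) == 0;
}

// Quiets a NaN by setting the quiet bit; other values pass through.
constexpr std::uint32_t quietNaNBits(std::uint32_t Bits) {
  return isNaNBits(Bits) ? (Bits | QuietBit) : Bits;
}

static_assert(!isSignalingNaNBits(quietNaNBits(0x7F800001u)),
              "a quieted value is never a signaling NaN");
```

A value produced this way is "never SNaN" but can still be a quiet NaN, which is the distinction the `SNaN` parameter of isKnownNeverNaN tracks.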

reverse ping

Removing target dependent semantics.

Petar.Avramovic added inline comments. Dec 18 2020, 8:18 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fmaxnum.mir
721	Thoughts on adding a post-legalizer combine that rewrites `%0:_(s32) = G_FMINNUM_IEEE/G_FMAXNUM_IEEE ...` followed by `%1:_(s32) = G_FCANONICALIZE %0` into `%0:_(s32) = G_FMINNUM_IEEE/G_FMAXNUM_IEEE ...` followed by `%1:_(s32) = COPY %0`?

arsenm added inline comments. Dec 18 2020, 9:36 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fmaxnum.mir
721	Yes, redundant canonicalization elimination belongs post legalizer
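
The combine proposed above could look roughly like the following on a toy instruction list. This is only a sketch of the pattern from the comment, not an LLVM combiner rule: `Inst`, `Opc`, and `foldRedundantCanonicalize` are hypothetical, and the sketch does not consider whether the fold is valid in every floating-point mode.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy opcodes, loosely named after the generic opcodes in the comment.
enum class Opc { Copy, FCanonicalize, FMinNumIEEE, FMaxNumIEEE, Other };

// One toy instruction "Def = Op Src"; further operands are not modeled.
struct Inst {
  Opc Op;
  int Def; // virtual register defined by this instruction
  int Src; // first source register, or -1 if there is none
};

// Rewrites "Def = FCanonicalize Src" into "Def = Copy Src" whenever Src is
// defined by FMinNumIEEE or FMaxNumIEEE. Returns the number of rewrites.
int foldRedundantCanonicalize(std::vector<Inst> &Insts) {
  // Record which opcode defines each register before changing anything.
  std::unordered_map<int, Opc> DefOpcode;
  for (const Inst &I : Insts)
    DefOpcode[I.Def] = I.Op;

  int NumFolded = 0;
  for (Inst &I : Insts) {
    if (I.Op != Opc::FCanonicalize)
      continue;
    auto It = DefOpcode.find(I.Src);
    if (It == DefOpcode.end())
      continue;
    if (It->second == Opc::FMinNumIEEE || It->second == Opc::FMaxNumIEEE) {
      I.Op = Opc::Copy;
      ++NumFolded;
    }
  }
  return NumFolded;
}
```

A canonicalize whose source comes from any other instruction is left alone, since nothing is known about it in this model.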

Ping.

arsenm accepted this revision. Feb 12 2021, 7:40 AM

This revision is now accepted and ready to land. Feb 12 2021, 7:40 AM

Closed by commit rGf0d65f40968d: AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum (authored by Petar.Avramovic). Feb 12 2021, 8:15 AM

This revision was automatically updated to reflect the committed changes.

Petar.Avramovic added a commit: rGf0d65f40968d: AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum.

Revision Contents

Path	Size
llvm/lib/CodeGen/GlobalISel/Utils.cpp	23 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/fmed3.ll	24 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fmaxnum.mir	30 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir	30 lines

Diff 323339

llvm/lib/CodeGen/GlobalISel/Utils.cpp

@@ 493 lines not shown @@ bool llvm::isKnownNeverNaN(Register Val, const MachineRegisterInfo &MRI,

   if (DefMI->getOpcode() == TargetOpcode::G_BUILD_VECTOR) {
     for (const auto &Op : DefMI->uses())
       if (!isKnownNeverNaN(Op.getReg(), MRI, SNaN))
         return false;
     return true;
   }

+  switch (DefMI->getOpcode()) {
+  default:
+    break;
+  case TargetOpcode::G_FMINNUM_IEEE:
+  case TargetOpcode::G_FMAXNUM_IEEE: {
+    if (SNaN)
+      return true;
+    // This can return a NaN if either operand is an sNaN, or if both operands
+    // are NaN.
+    return (isKnownNeverNaN(DefMI->getOperand(1).getReg(), MRI) &&
+            isKnownNeverSNaN(DefMI->getOperand(2).getReg(), MRI)) ||
+           (isKnownNeverSNaN(DefMI->getOperand(1).getReg(), MRI) &&
+            isKnownNeverNaN(DefMI->getOperand(2).getReg(), MRI));
+  }
+  case TargetOpcode::G_FMINNUM:
+  case TargetOpcode::G_FMAXNUM: {
+    // Only one needs to be known not-nan, since it will be returned if the
+    // other ends up being one.
+    return isKnownNeverNaN(DefMI->getOperand(1).getReg(), MRI, SNaN) ||
+           isKnownNeverNaN(DefMI->getOperand(2).getReg(), MRI, SNaN);
+  }
+  }
+
   if (SNaN) {
     // FP operations quiet. For now, just handle the ones inserted during
     // legalization.
     switch (DefMI->getOpcode()) {
     case TargetOpcode::G_FPEXT:
     case TargetOpcode::G_FPTRUNC:
     case TargetOpcode::G_FCANONICALIZE:
       return true;
     default:
       return false;
     }
   }

   return false;
 }

 Align llvm::inferAlignFromPtrInfo(MachineFunction &MF,
                                   const MachinePointerInfo &MPO) {
   auto PSV = MPO.V.dyn_cast<const PseudoSourceValue *>();
   if (auto FSPV = dyn_cast_or_null<FixedStackPseudoSourceValue>(PSV)) {
     MachineFrameInfo &MFI = MF.getFrameInfo();
     return commonAlignment(MFI.getObjectAlign(FSPV->getFrameIndex()),
                            MPO.Offset);
@@ 306 lines not shown @@

llvm/test/CodeGen/AMDGPU/GlobalISel/fmed3.ll

@@ 426 lines not shown @@
 ; SI-NEXT: buffer_load_dword v3, v[0:1], s[8:11], 0 addr64 glc
 ; SI-NEXT: s_waitcnt vmcnt(0)
 ; SI-NEXT: s_mov_b64 s[8:9], s[6:7]
 ; SI-NEXT: buffer_load_dword v4, v[0:1], s[8:11], 0 addr64 glc
 ; SI-NEXT: s_waitcnt vmcnt(0)
 ; SI-NEXT: v_add_f32_e32 v2, 1.0, v2
 ; SI-NEXT: v_add_f32_e32 v3, 2.0, v3
 ; SI-NEXT: v_add_f32_e32 v4, 4.0, v4
-; SI-NEXT: v_min_f32_e32 v5, v2, v3
-; SI-NEXT: v_max_f32_e32 v2, v2, v3
-; SI-NEXT: v_mul_f32_e32 v2, 1.0, v2
-; SI-NEXT: v_min_f32_e32 v2, v2, v4
-; SI-NEXT: v_mul_f32_e32 v3, 1.0, v5
-; SI-NEXT: v_mul_f32_e32 v2, 1.0, v2
-; SI-NEXT: v_max_f32_e32 v2, v3, v2
+; SI-NEXT: v_med3_f32 v2, v2, v3, v4
 ; SI-NEXT: s_mov_b64 s[2:3], s[10:11]
 ; SI-NEXT: buffer_store_dword v2, v[0:1], s[0:3], 0 addr64
 ; SI-NEXT: s_endpgm
 ;
 ; VI-LABEL: v_nnan_inputs_med3_f32_pat0:
 ; VI: ; %bb.0:
 ; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
 ; VI-NEXT: v_lshlrev_b32_e32 v6, 2, v0
@@ 17 lines not shown @@
 ; VI-NEXT: flat_load_dword v3, v[4:5] glc
 ; VI-NEXT: s_waitcnt vmcnt(0)
 ; VI-NEXT: v_mov_b32_e32 v0, s0
 ; VI-NEXT: v_mov_b32_e32 v1, s1
 ; VI-NEXT: v_add_u32_e32 v0, vcc, v0, v6
 ; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; VI-NEXT: v_add_f32_e32 v4, 1.0, v7
 ; VI-NEXT: v_add_f32_e32 v2, 2.0, v2
-; VI-NEXT: v_min_f32_e32 v5, v4, v2
-; VI-NEXT: v_max_f32_e32 v2, v4, v2
 ; VI-NEXT: v_add_f32_e32 v3, 4.0, v3
-; VI-NEXT: v_mul_f32_e32 v2, 1.0, v2
-; VI-NEXT: v_min_f32_e32 v2, v2, v3
-; VI-NEXT: v_mul_f32_e32 v3, 1.0, v5
-; VI-NEXT: v_mul_f32_e32 v2, 1.0, v2
-; VI-NEXT: v_max_f32_e32 v2, v3, v2
+; VI-NEXT: v_med3_f32 v2, v4, v2, v3
 ; VI-NEXT: flat_store_dword v[0:1], v2
 ; VI-NEXT: s_endpgm
 ;
 ; GFX9-LABEL: v_nnan_inputs_med3_f32_pat0:
 ; GFX9: ; %bb.0:
 ; GFX9-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
 ; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: global_load_dword v1, v0, s[2:3] glc
 ; GFX9-NEXT: s_waitcnt vmcnt(0)
 ; GFX9-NEXT: global_load_dword v2, v0, s[4:5] glc
 ; GFX9-NEXT: s_waitcnt vmcnt(0)
 ; GFX9-NEXT: global_load_dword v3, v0, s[6:7] glc
 ; GFX9-NEXT: s_waitcnt vmcnt(0)
 ; GFX9-NEXT: v_add_f32_e32 v1, 1.0, v1
 ; GFX9-NEXT: v_add_f32_e32 v2, 2.0, v2
-; GFX9-NEXT: v_min_f32_e32 v4, v1, v2
-; GFX9-NEXT: v_max_f32_e32 v1, v1, v2
 ; GFX9-NEXT: v_add_f32_e32 v3, 4.0, v3
-; GFX9-NEXT: v_max_f32_e32 v1, v1, v1
-; GFX9-NEXT: v_min_f32_e32 v1, v1, v3
-; GFX9-NEXT: v_max_f32_e32 v2, v4, v4
-; GFX9-NEXT: v_max_f32_e32 v1, v1, v1
-; GFX9-NEXT: v_max_f32_e32 v1, v2, v1
+; GFX9-NEXT: v_med3_f32 v1, v1, v2, v3
 ; GFX9-NEXT: global_store_dword v0, v1, s[0:1]
 ; GFX9-NEXT: s_endpgm
 %tid = call i32 @llvm.amdgcn.workitem.id.x()
 %gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
 %gep1 = getelementptr float, float addrspace(1)* %bptr, i32 %tid
 %gep2 = getelementptr float, float addrspace(1)* %cptr, i32 %tid
 %outgep = getelementptr float, float addrspace(1)* %out, i32 %tid
 %a = load volatile float, float addrspace(1)* %gep0
@@ 153 lines not shown @@

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fmaxnum.mir

@@ 712 lines not shown @@ bb.0:
 ; VI: $vgpr0 = COPY [[FMAXNUM_IEEE1]](s32)
 ; GFX9-LABEL: name: test_fmaxnum_with_fmaxnum_argument_s32_ieee_mode_on
 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
 ; GFX9: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
 ; GFX9: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[FCANONICALIZE1]]
 ; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
 ; GFX9: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]
 ; GFX9: [[FCANONICALIZE3:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY2]]
 ; GFX9: [[FMAXNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE2]], [[FCANONICALIZE3]]
 ; GFX9: $vgpr0 = COPY [[FMAXNUM_IEEE1]](s32)
 %0:_(s32) = COPY $vgpr0
 %1:_(s32) = COPY $vgpr1
 %2:_(s32) = G_FMAXNUM %0, %1
 %3:_(s32) = COPY $vgpr2
 %4:_(s32) = G_FMAXNUM %2, %3
@@ 10 lines not shown @@ bb.0:
 liveins: $vgpr0, $vgpr1

 ; SI-LABEL: name: test_fmaxnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on
 ; SI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; SI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; SI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; SI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]
 ; SI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-; SI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]
-; SI: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
-; SI: [[FMAXNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
+; SI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
+; SI: [[FMAXNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FMAXNUM_IEEE]], [[FCANONICALIZE1]]
 ; SI: $vgpr0 = COPY [[FMAXNUM_IEEE1]](s32)
 ; VI-LABEL: name: test_fmaxnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on
 ; VI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; VI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; VI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; VI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]
 ; VI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-; VI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]
-; VI: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
-; VI: [[FMAXNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
+; VI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
+; VI: [[FMAXNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FMAXNUM_IEEE]], [[FCANONICALIZE1]]
 ; VI: $vgpr0 = COPY [[FMAXNUM_IEEE1]](s32)
 ; GFX9-LABEL: name: test_fmaxnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on
 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; GFX9: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; GFX9: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]
 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]
-; GFX9: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
-; GFX9: [[FMAXNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
+; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
+; GFX9: [[FMAXNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FMAXNUM_IEEE]], [[FCANONICALIZE1]]
 ; GFX9: $vgpr0 = COPY [[FMAXNUM_IEEE1]](s32)
 %0:_(s32) = COPY $vgpr0
 %1:_(s32) = G_FCONSTANT float 0.000000e+00
 %2:_(s32) = G_FMAXNUM %0, %1
 %3:_(s32) = COPY $vgpr1
 %4:_(s32) = G_FMAXNUM %2, %3
 $vgpr0 = COPY %4
 ...
@@ 58 lines not shown @@ bb.0:
 liveins: $vgpr0, $vgpr1

 ; SI-LABEL: name: test_fmaxnum_with_nonNaN_fminnum_argument_s32_ieee_mode_on
 ; SI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; SI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; SI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; SI: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[C]]
 ; SI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-; SI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMINNUM_IEEE]]
-; SI: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
-; SI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
+; SI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
+; SI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FMINNUM_IEEE]], [[FCANONICALIZE1]]
 ; SI: $vgpr0 = COPY [[FMAXNUM_IEEE]](s32)
 ; VI-LABEL: name: test_fmaxnum_with_nonNaN_fminnum_argument_s32_ieee_mode_on
 ; VI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; VI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; VI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; VI: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[C]]
 ; VI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-; VI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMINNUM_IEEE]]
-; VI: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
-; VI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
+; VI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
+; VI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FMINNUM_IEEE]], [[FCANONICALIZE1]]
 ; VI: $vgpr0 = COPY [[FMAXNUM_IEEE]](s32)
 ; GFX9-LABEL: name: test_fmaxnum_with_nonNaN_fminnum_argument_s32_ieee_mode_on
 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; GFX9: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; GFX9: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[C]]
 ; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMINNUM_IEEE]]
-; GFX9: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
-; GFX9: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
+; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
+; GFX9: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FMINNUM_IEEE]], [[FCANONICALIZE1]]
 ; GFX9: $vgpr0 = COPY [[FMAXNUM_IEEE]](s32)
 %0:_(s32) = COPY $vgpr0
 %1:_(s32) = G_FCONSTANT float 0.000000e+00
 %2:_(s32) = G_FMINNUM %0, %1
 %3:_(s32) = COPY $vgpr1
 %4:_(s32) = G_FMAXNUM %2, %3
 $vgpr0 = COPY %4
 ...
@@ 99 lines not shown @@

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir

@@ 739 lines not shown @@ bb.0:
 liveins: $vgpr0, $vgpr1

 ; SI-LABEL: name: test_fminnum_with_nonNaN_fminnum_argument_s32_ieee_mode_on
 ; SI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; SI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; SI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; SI: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[C]]
 ; SI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-; SI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMINNUM_IEEE]]
-; SI: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
-; SI: [[FMINNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
+; SI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
+; SI: [[FMINNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FMINNUM_IEEE]], [[FCANONICALIZE1]]
 ; SI: $vgpr0 = COPY [[FMINNUM_IEEE1]](s32)
 ; VI-LABEL: name: test_fminnum_with_nonNaN_fminnum_argument_s32_ieee_mode_on
 ; VI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; VI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; VI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; VI: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[C]]
 ; VI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-; VI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMINNUM_IEEE]]
-; VI: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
-; VI: [[FMINNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
+; VI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
+; VI: [[FMINNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FMINNUM_IEEE]], [[FCANONICALIZE1]]
 ; VI: $vgpr0 = COPY [[FMINNUM_IEEE1]](s32)
 ; GFX9-LABEL: name: test_fminnum_with_nonNaN_fminnum_argument_s32_ieee_mode_on
 ; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
 ; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
 ; GFX9: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
 ; GFX9: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE]], [[C]]
; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMINNUM_IEEE]]		; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
; GFX9: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]		; GFX9: [[FMINNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FMINNUM_IEEE]], [[FCANONICALIZE1]]
; GFX9: [[FMINNUM_IEEE1:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
; GFX9: $vgpr0 = COPY [[FMINNUM_IEEE1]](s32)		; GFX9: $vgpr0 = COPY [[FMINNUM_IEEE1]](s32)
%0:_(s32) = COPY $vgpr0		%0:_(s32) = COPY $vgpr0
%1:_(s32) = G_FCONSTANT float 0.000000e+00		%1:_(s32) = G_FCONSTANT float 0.000000e+00
%2:_(s32) = G_FMINNUM %0, %1		%2:_(s32) = G_FMINNUM %0, %1
%3:_(s32) = COPY $vgpr1		%3:_(s32) = COPY $vgpr1
%4:_(s32) = G_FMINNUM %2, %3		%4:_(s32) = G_FMINNUM %2, %3
$vgpr0 = COPY %4		$vgpr0 = COPY %4
...		...
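
The test above relies on `%2 = G_FMINNUM %0, %1` being known never NaN because `%1` is the constant 0.0. As a minimal sketch of that reasoning, the snippet below uses the C `fmin` semantics that the review says these opcodes inherit (return the non-NaN argument). The helper names `minnumWithZero` and `minnumWithZeroIsNeverNaN` are hypothetical and exist only for illustration; signaling NaNs are deliberately left out because their handling by `std::fmin` is implementation-dependent.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: C fmin semantics with the second operand fixed to the
// constant 0.0, mirroring %2 = G_FMINNUM %0, %1 in the test above.
inline float minnumWithZero(float X) { return std::fmin(X, 0.0f); }

// Returns true if the result is non-NaN for a quiet NaN input and for
// ordinary negative and positive inputs.
inline bool minnumWithZeroIsNeverNaN() {
  const float QNaN = std::numeric_limits<float>::quiet_NaN();
  const float Inputs[] = {QNaN, -1.5f, 2.0f};
  for (float X : Inputs)
    if (std::isnan(minnumWithZero(X)))
      return false;
  return true;
}

// Small self-check that exercises the helper above.
inline void checkMinnumWithZero() { assert(minnumWithZeroIsNeverNaN()); }
```

Since the constant operand is never NaN, one non-NaN operand is enough to rule out a NaN result, which is why the canonicalize on `[[FMINNUM_IEEE]]` disappears in the right-hand column.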
bb.0:
; VI: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)		; VI: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)
; GFX9-LABEL: name: test_fminnum_with_fmaxnum_argument_s32_ieee_mode_on		; GFX9-LABEL: name: test_fminnum_with_fmaxnum_argument_s32_ieee_mode_on
; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]		; GFX9: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]		; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
; GFX9: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[FCANONICALIZE1]]		; GFX9: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[FCANONICALIZE1]]
; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2		; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
; GFX9: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]		; GFX9: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]
; GFX9: [[FCANONICALIZE3:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY2]]		; GFX9: [[FCANONICALIZE3:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY2]]
; GFX9: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE2]], [[FCANONICALIZE3]]		; GFX9: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE2]], [[FCANONICALIZE3]]
Petar.Avramovic: Using only that formula, this will not change, and we are left with an unnecessary G_FCANONICALIZE(G_FMAXNUM_IEEE) that stops us from performing further combines, both from the td file and from the combiner. Do you have another suggestion for how to avoid creating the G_FCANONICALIZE?
; GFX9: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)		; GFX9: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)
%0:_(s32) = COPY $vgpr0		%0:_(s32) = COPY $vgpr0
%1:_(s32) = COPY $vgpr1		%1:_(s32) = COPY $vgpr1
%2:_(s32) = G_FMAXNUM %0, %1		%2:_(s32) = G_FMAXNUM %0, %1
%3:_(s32) = COPY $vgpr2		%3:_(s32) = COPY $vgpr2
%4:_(s32) = G_FMINNUM %2, %3		%4:_(s32) = G_FMINNUM %2, %3
Petar.Avramovic:
```
return isKnownNeverNaN(Op.getOperand(0), SNaN, Depth + 1) ||
       isKnownNeverNaN(Op.getOperand(1), SNaN, Depth + 1);
}
```
This is legalized first. `%2` is isKnownNeverSNaN with IEEE = true, but since the inputs of `%2:_(s32) = G_FMAXNUM %0, %1` are not yet canonicalized, isKnownNeverSNaN fails and a canonicalize is inserted. However, since we know what will happen with `%2:_(s32) = G_FMAXNUM %0, %1`, we declare it isKnownNeverSNaN so that the result does not depend on the order of legalization.

arsenm: The legalization order doesn't matter. These operations have their own independent semantics. isKnownNeverNaN needs to understand both pairs.

Petar.Avramovic: How can this be done?
> The legalization order doesn't matter.
The formula above gives a different result depending on whether the input was already legalized. It would work if we legalized uses first.

arsenm: The same way SelectionDAG handles this:
```
case ISD::FMINNUM_IEEE:
case ISD::FMAXNUM_IEEE: {
  if (SNaN)
    return true;
  // This can return a NaN if either operand is an sNaN, or if both operands
  // are NaN.
  return (isKnownNeverNaN(Op.getOperand(0), false, Depth + 1) &&
          isKnownNeverSNaN(Op.getOperand(1), Depth + 1)) ||
         (isKnownNeverNaN(Op.getOperand(1), false, Depth + 1) &&
          isKnownNeverSNaN(Op.getOperand(0), Depth + 1));
}
```

Petar.Avramovic: I meant for `FMINNUM`; writing the same code as in SDAG doesn't deal with the unnecessary canonicalize. `FMINNUM_IEEE` is the same (there is only the `if (SNaN)` part atm).

arsenm: The lowering for fminnum to fminnum_ieee tries to avoid the canonicalize depending on the instruction/inputs. This is how SelectionDAG also does it. For this patch, it just needs to fully handle the correct semantics for the two pairs.

Petar.Avramovic: About examining the input: we are likely examining inputs in the middle of legalization. With IEEE=true, the input will never be fminnum but fminnum_ieee. Recursively asking about fminnum's inputs is not correct in this context, since the (maybe quieted) inputs will go to fminnum_ieee (whose result is never SNaN), not fminnum, and this is target dependent. I don't see how this breaks correct semantics.
$vgpr0 = COPY %4		$vgpr0 = COPY %4
...		...
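
The inline discussion in this test contrasts two rules: the generic `FMINNUM`/`FMAXNUM` rule (one known non-NaN operand is enough) and the `FMINNUM_IEEE`/`FMAXNUM_IEEE` rule quoted from SelectionDAG. The sketch below models both on a toy expression tree so the difference can be checked in isolation. It is not the LLVM implementation: `Kind`, `Node`, the depth limit, the handling of `Canonicalize`, and the example functions are assumptions made for illustration, and `Constant` is assumed to be a non-NaN constant.

```cpp
#include <cassert>

// Toy value kinds; these names are illustrative, not LLVM opcodes.
enum class Kind { Unknown, Constant, Canonicalize, MinNum, MinNumIEEE };

struct Node {
  Kind K;
  const Node *LHS; // first operand, or nullptr
  const Node *RHS; // second operand, or nullptr
};

// Returns true if N is known never to be NaN (SNaN == false) or known never
// to be a signaling NaN (SNaN == true).
inline bool isKnownNeverNaN(const Node &N, bool SNaN, unsigned Depth = 0) {
  if (Depth > 6) // arbitrary recursion limit for this sketch
    return false;
  switch (N.K) {
  case Kind::Unknown:
    return false;
  case Kind::Constant:
    return true; // assumption: the constant is not a NaN
  case Kind::Canonicalize:
    assert(N.LHS && "canonicalize needs one operand");
    // Canonicalizing quiets signaling NaNs but keeps quiet NaNs.
    return SNaN || isKnownNeverNaN(*N.LHS, false, Depth + 1);
  case Kind::MinNum:
    assert(N.LHS && N.RHS && "minnum needs two operands");
    // Generic rule quoted in the thread: one qualifying operand is enough.
    return isKnownNeverNaN(*N.LHS, SNaN, Depth + 1) ||
           isKnownNeverNaN(*N.RHS, SNaN, Depth + 1);
  case Kind::MinNumIEEE:
    assert(N.LHS && N.RHS && "minnum_ieee needs two operands");
    if (SNaN)
      return true; // never a signaling NaN by definition
    // NaN is possible if either operand is an sNaN or both are NaN.
    return (isKnownNeverNaN(*N.LHS, false, Depth + 1) &&
            isKnownNeverNaN(*N.RHS, true, Depth + 1)) ||
           (isKnownNeverNaN(*N.RHS, false, Depth + 1) &&
            isKnownNeverNaN(*N.LHS, true, Depth + 1));
  }
  return false;
}

// minnum(%0, 0.0): known never NaN through the constant operand.
inline bool minnumWithConstantIsNeverNaN() {
  const Node X{Kind::Unknown, nullptr, nullptr};
  const Node C{Kind::Constant, nullptr, nullptr};
  const Node Min{Kind::MinNum, &X, &C};
  return isKnownNeverNaN(Min, false);
}

// minnum(%0, %1) with two unknown inputs: the generic rule cannot even
// conclude "never sNaN", which is the case debated in the thread.
inline bool minnumOfUnknownsIsNeverSNaN() {
  const Node X{Kind::Unknown, nullptr, nullptr};
  const Node Y{Kind::Unknown, nullptr, nullptr};
  const Node Min{Kind::MinNum, &X, &Y};
  return isKnownNeverNaN(Min, true);
}

// minnum_ieee(canonicalize(%0), 0.0): known never NaN.
inline bool ieeeOfQuietedAndConstantIsNeverNaN() {
  const Node X{Kind::Unknown, nullptr, nullptr};
  const Node C{Kind::Constant, nullptr, nullptr};
  const Node Canon{Kind::Canonicalize, &X, nullptr};
  const Node Min{Kind::MinNumIEEE, &Canon, &C};
  return isKnownNeverNaN(Min, false);
}
```

Under these assumptions, a generic min of two unknown inputs stays "possibly sNaN", which matches the unchanged `G_FCANONICALIZE [[FMAXNUM_IEEE]]` in the test above, while a constant operand is enough to drop the canonicalize in the `nonNaN` variants.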

---		---
name: test_fminnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on		name: test_fminnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on
machineFunctionInfo:		machineFunctionInfo:
mode:		mode:
ieee: true		ieee: true
body: |		body: |
bb.0:		bb.0:
liveins: $vgpr0, $vgpr1		liveins: $vgpr0, $vgpr1

; SI-LABEL: name: test_fminnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on		; SI-LABEL: name: test_fminnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on
; SI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0		; SI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
; SI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00		; SI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
; SI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]		; SI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
; SI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]		; SI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]
; SI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1		; SI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; SI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]		; SI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
; SI: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]		; SI: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FMAXNUM_IEEE]], [[FCANONICALIZE1]]
; SI: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
; SI: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)		; SI: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)
; VI-LABEL: name: test_fminnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on		; VI-LABEL: name: test_fminnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on
; VI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0		; VI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
; VI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00		; VI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
; VI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]		; VI: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
; VI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]		; VI: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]
; VI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1		; VI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; VI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]		; VI: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
; VI: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]		; VI: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FMAXNUM_IEEE]], [[FCANONICALIZE1]]
; VI: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
; VI: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)		; VI: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)
; GFX9-LABEL: name: test_fminnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on		; GFX9-LABEL: name: test_fminnum_with_nonNaN_fmaxnum_argument_s32_ieee_mode_on
; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0		; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00		; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 0.000000e+00
; GFX9: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]		; GFX9: [[FCANONICALIZE:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY]]
; GFX9: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]		; GFX9: [[FMAXNUM_IEEE:%[0-9]+]]:_(s32) = G_FMAXNUM_IEEE [[FCANONICALIZE]], [[C]]
; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1		; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[FMAXNUM_IEEE]]		; GFX9: [[FCANONICALIZE1:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]
; GFX9: [[FCANONICALIZE2:%[0-9]+]]:_(s32) = G_FCANONICALIZE [[COPY1]]		; GFX9: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FMAXNUM_IEEE]], [[FCANONICALIZE1]]
; GFX9: [[FMINNUM_IEEE:%[0-9]+]]:_(s32) = G_FMINNUM_IEEE [[FCANONICALIZE1]], [[FCANONICALIZE2]]
; GFX9: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)		; GFX9: $vgpr0 = COPY [[FMINNUM_IEEE]](s32)
%0:_(s32) = COPY $vgpr0		%0:_(s32) = COPY $vgpr0
%1:_(s32) = G_FCONSTANT float 0.000000e+00		%1:_(s32) = G_FCONSTANT float 0.000000e+00
%2:_(s32) = G_FMAXNUM %0, %1		%2:_(s32) = G_FMAXNUM %0, %1
%3:_(s32) = COPY $vgpr1		%3:_(s32) = COPY $vgpr1
%4:_(s32) = G_FMINNUM %2, %3		%4:_(s32) = G_FMINNUM %2, %3
$vgpr0 = COPY %4		$vgpr0 = COPY %4
...		...
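
In the review, arsenm notes that the lowering from fminnum to fminnum_ieee tries to avoid the canonicalize depending on the inputs. The sketch below illustrates that decision with plain strings instead of real MIR: an operand is wrapped in `G_FCANONICALIZE` only when it is not already known never to be a signaling NaN. `Operand`, `lowerMinNumToIEEE`, and the register names are hypothetical and are not part of the LLVM API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical operand description: a virtual register name plus whether
// the value is already known never to be a signaling NaN.
struct Operand {
  std::string Name;
  bool KnownNeverSNaN;
};

// Emits pseudo-MIR for lowering G_FMINNUM to G_FMINNUM_IEEE. An operand is
// quieted with G_FCANONICALIZE only when it might be a signaling NaN.
inline std::vector<std::string> lowerMinNumToIEEE(const Operand &A,
                                                  const Operand &B) {
  assert(!A.Name.empty() && !B.Name.empty() && "operands need names");
  std::vector<std::string> Out;
  auto Quiet = [&Out](const Operand &Op) -> std::string {
    if (Op.KnownNeverSNaN)
      return Op.Name; // use the value directly, no canonicalize needed
    std::string Quieted = Op.Name + ".quiet";
    Out.push_back(Quieted + " = G_FCANONICALIZE " + Op.Name);
    return Quieted;
  };
  const std::string LHS = Quiet(A);
  const std::string RHS = Quiet(B);
  Out.push_back("%res = G_FMINNUM_IEEE " + LHS + ", " + RHS);
  return Out;
}
```

For example, `lowerMinNumToIEEE({"%fmax", true}, {"%copy1", false})` yields two instructions, one canonicalize for `%copy1` and the final `G_FMINNUM_IEEE`, which corresponds to the shape of the right-hand column in the test above.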


Revision Contents

Diff 323339
