This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Lower G_FREM
Closed, Public

Authored by Petar.Avramovic on Jul 22 2020, 7:07 AM.

Details

Reviewers

foad
arsenm

Commits

rG0d58d9e8fb93: AMDGPU/GlobalISel: Lower G_FREM

Summary

Add custom lower for G_FREM.

Diff Detail

Event Timeline

Petar.Avramovic created this revision. Jul 22 2020, 7:07 AM

Herald added a project: Restricted Project. Jul 22 2020, 7:07 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 9 others.

arsenm added inline comments. Jul 22 2020, 7:12 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2737	buildFDiv? These are all dropping the flags too
2738	buildFFloor?
2738	Is this a correct handling of frem? The AMDGPU dag expansion uses ISD::FTRUNC, but I'm not sure that was ever correct

I tried fixing the existing one to use ffloor instead of ftrunc; OpenCL conformance still fails when I plug frem into fmod

Petar.Avramovic marked an inline comment as done. Jul 22 2020, 8:22 AM

Petar.Avramovic added inline comments.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2738	G_FPTRUNC complains about src and dst being the same size: I hit assert(DstTy.getSizeInBits() < SrcTy.getSizeInBits() && "invalid widening trunc"). From the variable name I thought that FFloor could work, but I guess it works only when the operands have the same sign (btw, the Vulkan CTS tests where I saw this passed). The DAG expansion seems correct from the description of fmod/frem. The generic instruction here should discard the digits after the decimal point; do we have such an instruction?

arsenm added inline comments. Jul 22 2020, 8:27 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2738	ISD::FTRUNC is G_INTRINSIC_TRUNC. I'm not really clear on what frem really is, or if it's really supposed to be the same as OpenCL fmod

Preserve flags, and use G_INTRINSIC_TRUNC.

I don't think copying the DAG path was necessarily the right choice. The correct thing to do might be to make the DAG path use floor? Is either even correct if this fails conformance?

I don't think this should go in generic code unless we're more sure this is the correct operation

In D84324#2168127, @arsenm wrote:

I don't think copying the DAG path was necessarily the right choice. The correct thing to do might be to make the DAG path use floor? Is either even correct if this fails conformance?

I don't think this should go in generic code unless we're more sure this is the correct operation

I am convinced that trunc (not floor) is what you need here to implement IR's frem instruction, where the result has the same sign as the dividend (same as the C library fmod).

See also the OpenCL fmod spec which is pretty clear on this: http://man.opencl.org/fmod.html
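The trunc-versus-floor point can be checked on the host with a minimal sketch. It assumes plain double arithmetic rather than the legalizer's MIR, and `rem_trunc`, `rem_floor` and `check_trunc_vs_floor` are hypothetical helper names, not LLVM APIs. The inputs are chosen so that every intermediate value is exactly representable, which means rounding plays no part in the comparison.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: remainder with the quotient rounded toward zero.
// The result takes the sign of the dividend x, like C's fmod.
inline double rem_trunc(double x, double y) {
  return x - std::trunc(x / y) * y;
}

// Hypothetical helper: remainder with the quotient rounded toward -infinity.
// The result takes the sign of the divisor y instead.
inline double rem_floor(double x, double y) {
  return x - std::floor(x / y) * y;
}

// Compare both variants against std::fmod on mixed-sign operands.
inline void check_trunc_vs_floor() {
  assert(rem_trunc(-5.5, 2.0) == std::fmod(-5.5, 2.0)); // both -1.5
  assert(rem_floor(-5.5, 2.0) == 0.5);                  // sign follows y
  assert(rem_trunc(5.5, -2.0) == std::fmod(5.5, -2.0)); // both 1.5
  assert(rem_floor(5.5, -2.0) == -0.5);                 // sign follows y
}
```

When both operands have the same sign the two variants agree, which would be consistent with a floor-based version passing tests that only use same-sign inputs.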

LGTM if Matt has no further comments.

This revision is now accepted and ready to land. Jul 23 2020, 7:00 AM

I'd still like to understand why this is failing conformance if I use frem for opencl fmod. My current suspicion is the fsub + fmul really needs to be an FMA
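For reference, the FMA idea could look like the sketch below. It assumes host float arithmetic, and `frem_fma` and `check_fma_variant` are hypothetical names rather than anything in the patch. Fusing the multiply and subtract removes the rounding of the product only; the quotient is still computed and rounded (or overflowed) beforehand.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical variant of the expansion: x - trunc(x / y) * y with the
// final multiply and subtract fused into a single rounding step.
inline float frem_fma(float x, float y) {
  float q = std::trunc(x / y); // quotient is rounded before truncation
  return std::fma(-q, y, x);   // x - q * y with one rounding
}

inline void check_fma_variant() {
  // Exactly representable cases match fmod.
  assert(frem_fma(5.5f, 2.0f) == 1.5f);
  assert(frem_fma(-5.5f, 2.0f) == -1.5f);
  // If x / y overflows to infinity, the fused step still yields infinity.
  assert(std::isinf(frem_fma(1e38f, 1e-30f)));
}
```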

This revision now requires changes to proceed. Jul 23 2020, 7:03 AM

I'd still like to understand why this is failing conformance if I use frem for opencl fmod.

What is the alternative to using frem, that passes conformance?

In D84324#2169289, @foad wrote:

What is the alternative to using frem, that passes conformance?

A huge expansion that involves loops:
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/ocml/src/remainderF_base.h#L38

In D84324#2169293, @arsenm wrote:

A huge expansion that involves loops:
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/ocml/src/remainderF_base.h#L38

Then it needs debugging. Perhaps there are cases where the simple expansion gives fmod(x,y)==y, even though the result is supposed to have magnitude strictly less than y. Or perhaps it doesn't handle nans or infinities correctly.

In D84324#2169348, @foad wrote:

Then it needs debugging. Perhaps there are cases where the simple expansion gives fmod(x,y)==y, even though the result is supposed to have magnitude strictly less than y. Or perhaps it doesn't handle nans or infinities correctly.

The errors aren't small, and aren't just edge cases:
ERROR: fmod: inf ulp error at {-0x1.7a1ba8p+111 (0xf73d0dd4), -0x1.5b9526p-97 (0x8f2dca93)}: *-0x1.c5f348p-98 vs. inf (0x7f800000) at index: 3

ERROR: fmod: -inf ulp error at {0x1.80bb0ep+70 (0x62c05d87), 0x1.08e51ap-82 (0x1684728d)}: *0x1.9d1d8cp-83 vs. -inf (0xff800000) at index: 0

ERROR: fmod: -134961856.000000 ulp error at {-0x1.f47464p-69 (0x9d7a3a32), -0x1.9bdef4p-97 (0x8f4def7a)}: *-0x1.682a78p-98 vs. -0x1.17eep-94 (0x908bf700) at index: 0

ERROR: fmod: inf ulp error at {-0x1.50c0b6p+67 (0xe128605b), 0x1.80b7ep-90 (0x12c05bf0)}: *-0x1.fa594p-91 vs. inf (0x7f800000) at index: 3

ERROR: fmod: -672311475662299076755456.000000 ulp error at {0x1.f8807ep+111 (0x777c403f), -0x1.7711aap+32 (0xcfbb88d5)}: *0x1.b2ab38p+30 vs. -0x1.1cbc28p+86 (0xea8e5e14) at index: 1

ERROR: fmod: 20258841443692227182914624580021649408.000000 ulp error at {-0x1.0107a4p+41 (0xd40083d2), -0x1.144c6ap-86 (0x948a2635)}: *-0x1.85d4acp-87 vs. 0x1.e7b6cp+13 (0x4673db60) at index: 2

ERROR: fmod: -29506071830531670016.000000 ulp error at {0x1.804b2cp+9 (0x44402596), -0x1.477f18p-57 (0xa323bf8c)}: *0x1.41dc88p-57 vs. -0x1.997aap-16 (0xb7ccbd50) at index: 4

ERROR: fmod: 211623838063271919251058575015936.000000 ulp error at {0x1.4d90a4p+41 (0x5426c852), 0x1.96784ap-65 (0x1f4b3c25)}: *0x1.61a4fp-68 vs. 0x1.4de23p+16 (0x47a6f118) at index: 3

ERROR: fmod: -inf ulp error at {0x1.9aedb8p+83 (0x694d76dc), 0x1.4c23f2p-119 (0x42611f9)}: *0x1.bba984p-120 vs. -inf (0xff800000) at index: 0

ERROR: fmod: 100959080964579999364158242467109404672.000000 ulp error at {0x1.aa412p+26 (0x4cd52090), -0x1.92be06p-101 (0x8d495f03)}: *0x1.59267p-101 vs. 0x1.2fd00cp+2 (0x4097e806) at index: 0

ERROR: fmod: inf ulp error at {-0x1.6ebdap+99 (0xf1375ed0), -0x1.fe72bp-36 (0xadff3958)}: *-0x1.39174p-38 vs. inf (0x7f800000) at index: 0

ERROR: fmod: -101767765295104.000000 ulp error at {-0x1.7f0c94p+21 (0xca3f864a), 0x1.8df2eap-29 (0x3146f975)}: *-0x1.191792p-29 vs. -0x1.723aap-6 (0xbcb91d50) at index: 4

ERROR: fmod: 5685162310369280.000000 ulp error at {-0x1.8ccd58p-14 (0xb8c666ac), 0x1.b15ed6p-71 (0x1c58af6b)}: *-0x1.b6d34p-74 vs. 0x1.432ap-45 (0x29219500) at index: 2

ERROR: fmod: inf ulp error at {-0x1.5e45dep+81 (0xe82f22ef), -0x1.4bae4cp-112 (0x87a5d726)}: *-0x1.56f99p-114 vs. inf (0x7f800000) at index: 1

ERROR: fmod: -inf ulp error at {0x1.cd2da6p+126 (0x7ee696d3), -0x1.647a26p-40 (0xabb23d13)}: *0x1.03ad2ap-40 vs. -inf (0xff800000) at index: 3

ERROR: fmod: -50753958115442425856.000000 ulp error at {-0x1.abb154p+118 (0xfad5d8aa), 0x1.8eb61ap+61 (0x5e475b0d)}: *-0x1.eeecp+52 vs. -0x1.602d24p+94 (0xeeb01692) at index: 0

ERROR: fmod: 65524330144892614344704.000000 ulp error at {0x1.ce604ap+11 (0x45673025), 0x1.0bbbaap-65 (0x1f05ddd5)}: *0x1.e25edp-66 vs. 0x1.bc0298p-14 (0x38de014c) at index: 1

ERROR: fmod: -inf ulp error at {0x1.c98fd6p+122 (0x7ce4c7eb), 0x1.fe4a44p-116 (0x5ff2522)}: *0x1.98780cp-116 vs. -inf (0xff800000) at index: 0

ERROR: fmod: inf ulp error at {-0x1.3132dep+84 (0xe998996f), -0x1.a91dcep-110 (0x88d48ee7)}: *-0x1.9270eap-110 vs. inf (0x7f800000) at index: 4

ERROR: fmod: -inf ulp error at {0x1.24c0e6p+36 (0x51926073), -0x1.a5528ap-122 (0x82d2a945)}: *0x1.86bd68p-122 vs. -inf (0xff800000) at index: 1

ERROR: fmod: 328137422309548672426878384939712118784.000000 ulp error at {-0x1.f16f6cp+68 (0xe1f8b7b6), 0x1.34d296p-58 (0x229a694b)}: *-0x1.5fa6ap-61 vs. 0x1.edb9fp+43 (0x5576dcf8) at index: 2

ERROR: fmod: -355087763374080.000000 ulp error at {0x1.b2ab98p+95 (0x6f5955cc), -0x1.ffc7fcp+45 (0xd67fe3fe)}: *0x1.fe7558p+45 vs. -0x1.42f35p+70 (0xe2a179a8) at index: 7

ERROR: fmod: -22169001879878959779282944.000000 ulp error at {-0x1.92d1bap-15 (0xb84968dd), -0x1.a4b14cp-100 (0x8dd258a6)}: *-0x1.38fb4p-103 vs. -0x1.25678p-42 (0xaa92b3c0) at index: 1

ERROR: fmod: inf ulp error at {-0x1.080adep+92 (0xed84056f), -0x1.c2f17cp-77 (0x996178be)}: *-0x1.cd9d5p-79 vs. inf (0x7f800000) at index: 0

ERROR: fmod: inf ulp error at {-0x1.17f76ap+95 (0xef0bfbb5), -0x1.4e0fe4p-108 (0x89a707f2)}: *-0x1.a2f8c8p-109 vs. inf (0x7f800000) at index: 2

ERROR: fmod: -6874084436590301837445300224.000000 ulp error at {0x1.697b38p+54 (0x5ab4bd9c), 0x1.f3988cp-39 (0x2c79cc46)}: *0x1.b4c1ap-40 vs. -0x1.6361cp+29 (0xce31b0e0) at index: 1

ERROR: fmod: -8738995383377592320.000000 ulp error at {0x1.496b64p+85 (0x6a24b5b2), 0x1.bae466p+24 (0x4bdd7233)}: *0x1.7a58cp+20 vs. -0x1.e51c98p+59 (0xdd728e4c) at index: 1

ERROR: fmod: 343065079314668060672.000000 ulp error at {-0x1.21b74ap-47 (0xa810dba5), 0x1.fdb82ep-117 (0x57edc17)}: *-0x1.606e5ap-117 vs. 0x1.298fcp-72 (0x1b94c7e0) at index: 4

ERROR: fmod: -inf ulp error at {0x1.8cdda2p+60 (0x5dc66ed1), 0x1.aeea96p-109 (0x957754b)}: *0x1.6ac0ep-112 vs. -inf (0xff800000) at index: 2

ERROR: fmod: -2991471016484578263040000.000000 ulp error at {-0x1.6a71bep+46 (0xd6b538df), -0x1.4091d2p-34 (0xaea048e9)}: *-0x1.7398e8p-36 vs. -0x1.3cbbfcp+22 (0xca9e5dfe) at index: 3

ERROR: fmod: -inf ulp error at {0x1.ec3bf8p+71 (0x63761dfc), -0x1.36870ep-68 (0x9d9b4387)}: *0x1.06b884p-68 vs. -inf (0xff800000) at index: 0

ERROR: fmod: -1561864113370783809536.000000 ulp error at {0x1.c99b3ap+127 (0x7f64cd9d), -0x1.8f9088p+58 (0xdcc7c844)}: *0x1.aed24p+55 vs. -0x1.52acep+102 (0xf2a95670) at index: 0

In D84324#2169378, @arsenm wrote:

The errors aren't small, and aren't just edge cases:

The ones that return +/- inf are because the division overflows. The others look like rounding error in the division when the result of x/y is large but doesn't overflow - this can easily lead to a result with the wrong sign, or with magnitude larger than y. I don't think it's realistic to try to fix these problems with an inline expansion. It really needs a library function.

I suppose the question is: is this patch still a useful default implementation of frem?
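The overflow case can be reproduced on the host with a minimal sketch. It takes its operands from the bit patterns in the first error line of the log (0xf73d0dd4 and 0x8f2dca93) and assumes IEEE single precision; `from_bits`, `frem_naive` and `check_overflow_case` are hypothetical names. The `volatile` temporaries are only there to force each step to be rounded to float. The rounding-error failures are not reproduced here, since they depend on how the target computes the division.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (assumes IEEE-754 binary32).
inline float from_bits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical host model of the inline expansion x - trunc(x / y) * y.
inline float frem_naive(float x, float y) {
  volatile float q = x / y;         // about 2^208 for the inputs below: +inf
  volatile float t = std::trunc(q); // trunc(+inf) is +inf
  volatile float p = t * y;         // +inf times a negative y is -inf
  return x - p;                     // x - (-inf) is +inf
}

inline void check_overflow_case() {
  const float x = from_bits(0xf73d0dd4u); // -0x1.7a1ba8p+111
  const float y = from_bits(0x8f2dca93u); // -0x1.5b9526p-97
  const float naive = frem_naive(x, y);
  assert(std::isinf(naive) && naive > 0.0f); // matches "inf (0x7f800000)"
  const float ref = std::fmod(x, y);         // library fmod stays finite
  assert(std::isfinite(ref) && std::fabs(ref) < std::fabs(y));
}
```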

In D84324#2169535, @foad wrote:

The ones that return +/- inf are because the division overflows. The others look like rounding error in the division when the result of x/y is large but doesn't overflow - this can easily lead to a result with the wrong sign, or with magnitude larger than y. I don't think it's realistic to try to fix these problems with an inline expansion. It really needs a library function.

I suppose the question is: is this patch still a useful default implementation of frem?

For something that doesn't work perfectly, I don't think it belongs in the generic code. It would be more palatable to keep this in AMDGPU to match the DAG behavior

In D84324#2170042, @arsenm wrote:

For something that doesn't work perfectly, I don't think it belongs in the generic code. It would be more palatable to keep this in AMDGPU to match the DAG behavior

Maybe it should also fail to legalize if it's not afn?

In D84324#2170042, @arsenm wrote:

For something that doesn't work perfectly, I don't think it belongs in the generic code. It would be more palatable to keep this in AMDGPU to match the DAG behavior

Sounds reasonable.

Switch to custom lowering and update to match changes in dag custom lowering for frem (and use same lowering for s16 also).

Looks OK to me but please wait to hear from @arsenm too.

llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
1643–1649	Maybe put this declaration next to buildFAdd / buildFSub ?
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
623–625	Does this need to be conditional on ST.has16BitInsts ?
1775–1776	Use buildIntrinsicTrunc?

arsenm added inline comments. Aug 6 2020, 7:09 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
623–625	It doesn't strictly have to be, but it would produce a better result to force promotion to 32-bit first

Addressed review comments.

foad added inline comments. Aug 7 2020, 6:15 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
623–625	I assume Matt meant to force promotion to 32-bit first if the subtarget doesn't have 16-bit instructions. Compared to the previous version of your patch, the code for fast_frem_f16 has got better for CI but worse for VI.

Force promotion to 32-bit first when the subtarget doesn't have 16-bit instructions.

foad accepted this revision. Aug 7 2020, 6:35 AM

This revision was not accepted when it landed; it landed in state Needs Review. Aug 10 2020, 1:18 AM

Closed by commit rG0d58d9e8fb93: AMDGPU/GlobalISel: Lower G_FREM (authored by Petar.Avramovic).

This revision was automatically updated to reflect the committed changes.

Petar.Avramovic added a commit: rG0d58d9e8fb93: AMDGPU/GlobalISel: Lower G_FREM.

Revision Contents

Path                                                      Size
llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h   7 lines
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp           14 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp            4 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-frem.mir     446 lines

Diff 279840

llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h

  [...]
    }

    /// Build and insert \p Res = G_UMAX \p Op0, \p Op1
    MachineInstrBuilder buildUMax(const DstOp &Dst, const SrcOp &Src0,
                                  const SrcOp &Src1) {
      return buildInstr(TargetOpcode::G_UMAX, {Dst}, {Src0, Src1});
    }

+   /// Build and insert \p Res = G_FDIV \p Op0, \p Op1
+   MachineInstrBuilder buildFDiv(const DstOp &Dst, const SrcOp &Src0,
+                                 const SrcOp &Src1,
+                                 Optional<unsigned> Flags = None) {
+     return buildInstr(TargetOpcode::G_FDIV, {Dst}, {Src0, Src1}, Flags);
+   }
+
    /// Build and insert \p Res = G_JUMP_TABLE \p JTI
    ///
    /// G_JUMP_TABLE sets \p Res to the address of the jump table specified by
    /// the jump table index \p JTI.
    ///
    /// \return a MachineInstrBuilder for the newly created instruction.
    MachineInstrBuilder buildJumpTable(const LLT PtrTy, unsigned JTI);

    virtual MachineInstrBuilder buildInstr(unsigned Opc, ArrayRef<DstOp> DstOps,
                                           ArrayRef<SrcOp> SrcOps,
                                           Optional<unsigned> Flags = None);
  };

  } // End namespace llvm.
  #endif // LLVM_CODEGEN_GLOBALISEL_MACHINEIRBUILDER_H

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

  [...]
    case G_INSERT:
      return lowerInsert(MI);
    case G_BSWAP:
      return lowerBswap(MI);
    case G_BITREVERSE:
      return lowerBitreverse(MI);
    case G_READ_REGISTER:
    case G_WRITE_REGISTER:
      return lowerReadWriteRegister(MI);
+   case G_FREM: {
+     Register DstReg = MI.getOperand(0).getReg();
+     Register Src0Reg = MI.getOperand(1).getReg();
+     Register Src1Reg = MI.getOperand(2).getReg();
+     auto Flags = MI.getFlags();
+     LLT Ty = MRI.getType(DstReg);
+     auto Div = MIRBuilder.buildFDiv(Ty, Src0Reg, Src1Reg, Flags);
+     auto FpTrunc = MIRBuilder.buildInstr(TargetOpcode::G_INTRINSIC_TRUNC, {Ty},
+                                          {Div}, Flags);
+     auto Mul = MIRBuilder.buildFMul(Ty, FpTrunc, Src1Reg, Flags);
+     MIRBuilder.buildFSub(DstReg, Src0Reg, Mul, Flags);
+     MI.eraseFromParent();
+     return Legalized;
+   }
    }
  }

  LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorImplicitDef(
      MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) {
    SmallVector<Register, 2> DstRegs;

    unsigned NarrowSize = NarrowTy.getSizeInBits();
  [...]

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

  [...]
    if (ST.hasMadF16() && ST.hasMadMacF32Insts())
      FMad.customFor({S32, S16});
    else if (ST.hasMadMacF32Insts())
      FMad.customFor({S32});
    else if (ST.hasMadF16())
      FMad.customFor({S16});
    FMad.scalarize(0)
        .lower();

    // TODO: Do we need to clamp maximum bitwidth?
    getActionDefinitionsBuilder(G_TRUNC)
      .legalIf(isScalar(0))
      .legalFor({{V2S16, V2S32}})
      .clampMaxNumElements(0, S16, 2)
      // Avoid scalarizing in cases that should be truly illegal. In unresolvable
      // situations (like an invalid implicit use), we don't want to infinite loop
      // in the legalizer.
      .fewerElementsIf(elementTypeIsLegal(0), LegalizeMutations::scalarize(0))
      .alwaysLegal();

  [...] AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
    } else {
      getActionDefinitionsBuilder({G_INTRINSIC_TRUNC, G_FCEIL, G_FRINT})
        .legalFor({S32})
        .customFor({S64})
        .clampScalar(0, S32, S64)
        .scalarize(0);
    }

+   getActionDefinitionsBuilder(G_FREM)
+     .lowerFor({S32, S64})
+     .scalarize(0);
+
    // FIXME: Clamp offset operand.
    getActionDefinitionsBuilder(G_PTR_ADD)
      .legalIf(isPointer(0))
      .scalarize(0);

    getActionDefinitionsBuilder(G_PTRMASK)
      .legalIf(typeInSet(1, {S64, S32}))
      .minScalar(1, S32)
  [...] static MachineInstrBuilder extractF64Exponent(Register Hi,
    const unsigned FractBits = 52;
    const unsigned ExpBits = 11;
    LLT S32 = LLT::scalar(32);

    auto Const0 = B.buildConstant(S32, FractBits - 32);
    auto Const1 = B.buildConstant(S32, ExpBits);

    auto ExpPart = B.buildIntrinsic(Intrinsic::amdgcn_ubfe, {S32}, false)
      .addUse(Hi)
      .addUse(Const0.getReg(0))
      .addUse(Const1.getReg(0));

    return B.buildSub(S32, ExpPart, B.buildConstant(S32, 1023));
  }

  bool AMDGPULegalizerInfo::legalizeIntrinsicTrunc(
      MachineInstr &MI, MachineRegisterInfo &MRI,
      MachineIRBuilder &B) const {
  [...]

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-frem.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -run-pass=legalizer %s -o - | FileCheck -check-prefix=SI %s
				# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -run-pass=legalizer %s -o - | FileCheck -check-prefix=VI %s
				# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -run-pass=legalizer %s -o - | FileCheck -check-prefix=GFX9 %s

				---
				name: test_frem_s32
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				; SI-LABEL: name: test_frem_s32
				; SI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; SI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; SI: [[INT:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[COPY1]](s32)
				; SI: [[FMUL:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FMUL [[COPY]], [[INT]]
				; SI: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_INTRINSIC_TRUNC [[FMUL]]
				; SI: [[FMUL1:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FMUL [[INTRINSIC_TRUNC]], [[COPY1]]
				; SI: [[FSUB:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FSUB [[COPY]], [[FMUL1]]
				; SI: $vgpr0 = COPY [[FSUB]](s32)
				; VI-LABEL: name: test_frem_s32
				; VI: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; VI: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; VI: [[INT:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[COPY1]](s32)
				; VI: [[FMUL:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FMUL [[COPY]], [[INT]]
				; VI: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_INTRINSIC_TRUNC [[FMUL]]
				; VI: [[FMUL1:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FMUL [[INTRINSIC_TRUNC]], [[COPY1]]
				; VI: [[FSUB:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FSUB [[COPY]], [[FMUL1]]
				; VI: $vgpr0 = COPY [[FSUB]](s32)
				; GFX9-LABEL: name: test_frem_s32
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[INT:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[COPY1]](s32)
				; GFX9: [[FMUL:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FMUL [[COPY]], [[INT]]
				; GFX9: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_INTRINSIC_TRUNC [[FMUL]]
				; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FMUL [[INTRINSIC_TRUNC]], [[COPY1]]
				; GFX9: [[FSUB:%[0-9]+]]:_(s32) = nnan nsz arcp contract afn reassoc G_FSUB [[COPY]], [[FMUL1]]
				; GFX9: $vgpr0 = COPY [[FSUB]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s32) = nnan nsz arcp contract afn reassoc G_FREM %0, %1
				$vgpr0 = COPY %2
				...

				---
				name: test_frem_s64
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3

				; SI-LABEL: name: test_frem_s64
				; SI: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
				; SI: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
				; SI: [[C:%[0-9]+]]:_(s64) = G_FCONSTANT double 1.000000e+00
				; SI: [[INT:%[0-9]+]]:_(s64), [[INT1:%[0-9]+]]:_(s1) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s64), [[COPY1]](s64), 0
				; SI: [[FNEG:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FNEG [[INT]]
				; SI: [[INT2:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s64)
				; SI: [[FMA:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[INT2]], [[C]]
				; SI: [[FMA1:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[INT2]], [[FMA]], [[INT2]]
				; SI: [[FMA2:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[FMA1]], [[C]]
				; SI: [[INT3:%[0-9]+]]:_(s64), [[INT4:%[0-9]+]]:_(s1) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s64), [[COPY1]](s64), 1
				; SI: [[FMA3:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FMA1]], [[FMA2]], [[FMA1]]
				; SI: [[FMUL:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMUL [[INT3]], [[FMA3]]
				; SI: [[FMA4:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[FMUL]], [[INT3]]
				; SI: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](s64)
				; SI: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
				; SI: [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](s64)
				; SI: [[UV6:%[0-9]+]]:_(s32), [[UV7:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT3]](s64)
				; SI: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UV1]](s32), [[UV7]]
				; SI: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UV3]](s32), [[UV5]]
				; SI: [[XOR:%[0-9]+]]:_(s1) = G_XOR [[ICMP]], [[ICMP1]]
				; SI: [[INT5:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s64), [[FMA3]](s64), [[FMUL]](s64), [[XOR]](s1)
				; SI: [[INT6:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s64), [[COPY1]](s64), [[COPY]](s64)
				; SI: [[UV8:%[0-9]+]]:_(s32), [[UV9:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT6]](s64)
				; SI: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 20
				; SI: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 11
				; SI: [[INT7:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.ubfe), [[UV9]](s32), [[C1]](s32), [[C2]](s32)
				; SI: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1023
				; SI: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[INT7]], [[C3]]
				; SI: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 -2147483648
				; SI: [[AND:%[0-9]+]]:_(s32) = G_AND [[UV9]], [[C4]]
				; SI: [[C5:%[0-9]+]]:_(s64) = G_CONSTANT i64 4503599627370495
				; SI: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
				; SI: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[C6]](s32), [[AND]](s32)
				; SI: [[ASHR:%[0-9]+]]:_(s64) = G_ASHR [[C5]], [[SUB]](s32)
				; SI: [[C7:%[0-9]+]]:_(s64) = G_CONSTANT i64 -1
				; SI: [[XOR1:%[0-9]+]]:_(s64) = G_XOR [[ASHR]], [[C7]]
				; SI: [[AND1:%[0-9]+]]:_(s64) = G_AND [[INT6]], [[XOR1]]
				; SI: [[C8:%[0-9]+]]:_(s32) = G_CONSTANT i32 51
				; SI: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SUB]](s32), [[C6]]
				; SI: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(sgt), [[SUB]](s32), [[C8]]
				; SI: [[SELECT:%[0-9]+]]:_(s64) = G_SELECT [[ICMP2]](s1), [[MV]], [[AND1]]
				; SI: [[SELECT1:%[0-9]+]]:_(s64) = G_SELECT [[ICMP3]](s1), [[INT6]], [[SELECT]]
				; SI: [[FMUL1:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMUL [[SELECT1]], [[COPY1]]
				; SI: [[FNEG1:%[0-9]+]]:_(s64) = G_FNEG [[FMUL1]]
				; SI: [[FADD:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FADD [[COPY]], [[FNEG1]]
				; SI: $vgpr0_vgpr1 = COPY [[FADD]](s64)
				; VI-LABEL: name: test_frem_s64
				; VI: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
				; VI: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
				; VI: [[C:%[0-9]+]]:_(s64) = G_FCONSTANT double 1.000000e+00
				; VI: [[INT:%[0-9]+]]:_(s64), [[INT1:%[0-9]+]]:_(s1) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s64), [[COPY1]](s64), 0
				; VI: [[FNEG:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FNEG [[INT]]
				; VI: [[INT2:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s64)
				; VI: [[FMA:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[INT2]], [[C]]
				; VI: [[FMA1:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[INT2]], [[FMA]], [[INT2]]
				; VI: [[FMA2:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[FMA1]], [[C]]
				; VI: [[INT3:%[0-9]+]]:_(s64), [[INT4:%[0-9]+]]:_(s1) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s64), [[COPY1]](s64), 1
				; VI: [[FMA3:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FMA1]], [[FMA2]], [[FMA1]]
				; VI: [[FMUL:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMUL [[INT3]], [[FMA3]]
				; VI: [[FMA4:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[FMUL]], [[INT3]]
				; VI: [[INT5:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s64), [[FMA3]](s64), [[FMUL]](s64), [[INT4]](s1)
				; VI: [[INT6:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s64), [[COPY1]](s64), [[COPY]](s64)
				; VI: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC_TRUNC [[INT6]]
				; VI: [[FMUL1:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMUL [[INTRINSIC_TRUNC]], [[COPY1]]
				; VI: [[FNEG1:%[0-9]+]]:_(s64) = G_FNEG [[FMUL1]]
				; VI: [[FADD:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FADD [[COPY]], [[FNEG1]]
				; VI: $vgpr0_vgpr1 = COPY [[FADD]](s64)
				; GFX9-LABEL: name: test_frem_s64
				; GFX9: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
				; GFX9: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
				; GFX9: [[C:%[0-9]+]]:_(s64) = G_FCONSTANT double 1.000000e+00
				; GFX9: [[INT:%[0-9]+]]:_(s64), [[INT1:%[0-9]+]]:_(s1) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s64), [[COPY1]](s64), 0
				; GFX9: [[FNEG:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FNEG [[INT]]
				; GFX9: [[INT2:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s64)
				; GFX9: [[FMA:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[INT2]], [[C]]
				; GFX9: [[FMA1:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[INT2]], [[FMA]], [[INT2]]
				; GFX9: [[FMA2:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[FMA1]], [[C]]
				; GFX9: [[INT3:%[0-9]+]]:_(s64), [[INT4:%[0-9]+]]:_(s1) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[COPY]](s64), [[COPY1]](s64), 1
				; GFX9: [[FMA3:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FMA1]], [[FMA2]], [[FMA1]]
				; GFX9: [[FMUL:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMUL [[INT3]], [[FMA3]]
				; GFX9: [[FMA4:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMA [[FNEG]], [[FMUL]], [[INT3]]
				; GFX9: [[INT5:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s64), [[FMA3]](s64), [[FMUL]](s64), [[INT4]](s1)
				; GFX9: [[INT6:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s64), [[COPY1]](s64), [[COPY]](s64)
				; GFX9: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_INTRINSIC_TRUNC [[INT6]]
				; GFX9: [[FMUL1:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FMUL [[INTRINSIC_TRUNC]], [[COPY1]]
				; GFX9: [[FNEG1:%[0-9]+]]:_(s64) = G_FNEG [[FMUL1]]
				; GFX9: [[FADD:%[0-9]+]]:_(s64) = nnan nsz arcp contract reassoc G_FADD [[COPY]], [[FNEG1]]
				; GFX9: $vgpr0_vgpr1 = COPY [[FADD]](s64)
				%0:_(s64) = COPY $vgpr0_vgpr1
				%1:_(s64) = COPY $vgpr2_vgpr3
				%2:_(s64) = nnan nsz arcp contract reassoc G_FREM %0, %1
				$vgpr0_vgpr1 = COPY %2
				...

				---
				name: test_frem_v2s32
				body: |
				bb.0.entry:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3

				; SI-LABEL: name: test_frem_v2s32
				; SI: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; SI: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; SI: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; SI: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; SI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 1.000000e+00
				; SI: [[INT:%[0-9]+]]:_(s32), [[INT1:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s32), [[UV2]](s32), 0
				; SI: [[INT2:%[0-9]+]]:_(s32), [[INT3:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s32), [[UV2]](s32), 1
				; SI: [[INT4:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s32)
				; SI: [[FNEG:%[0-9]+]]:_(s32) = G_FNEG [[INT]]
				; SI: [[FMA:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[INT4]], [[C]]
				; SI: [[FMA1:%[0-9]+]]:_(s32) = G_FMA [[FMA]], [[INT4]], [[INT4]]
				; SI: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[INT2]], [[FMA1]]
				; SI: [[FMA2:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[FMUL]], [[INT2]]
				; SI: [[FMA3:%[0-9]+]]:_(s32) = G_FMA [[FMA2]], [[FMA1]], [[FMUL]]
				; SI: [[FMA4:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[FMA3]], [[INT2]]
				; SI: [[INT5:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s32), [[FMA1]](s32), [[FMA3]](s32), [[INT3]](s1)
				; SI: [[INT6:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s32), [[UV2]](s32), [[UV]](s32)
				; SI: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s32) = G_INTRINSIC_TRUNC [[INT6]]
				; SI: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[INTRINSIC_TRUNC]], [[UV2]]
				; SI: [[FSUB:%[0-9]+]]:_(s32) = G_FSUB [[UV]], [[FMUL1]]
				; SI: [[INT7:%[0-9]+]]:_(s32), [[INT8:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s32), [[UV3]](s32), 0
				; SI: [[INT9:%[0-9]+]]:_(s32), [[INT10:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s32), [[UV3]](s32), 1
				; SI: [[INT11:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT7]](s32)
				; SI: [[FNEG1:%[0-9]+]]:_(s32) = G_FNEG [[INT7]]
				; SI: [[FMA5:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[INT11]], [[C]]
				; SI: [[FMA6:%[0-9]+]]:_(s32) = G_FMA [[FMA5]], [[INT11]], [[INT11]]
				; SI: [[FMUL2:%[0-9]+]]:_(s32) = G_FMUL [[INT9]], [[FMA6]]
				; SI: [[FMA7:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[FMUL2]], [[INT9]]
				; SI: [[FMA8:%[0-9]+]]:_(s32) = G_FMA [[FMA7]], [[FMA6]], [[FMUL2]]
				; SI: [[FMA9:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[FMA8]], [[INT9]]
				; SI: [[INT12:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA9]](s32), [[FMA6]](s32), [[FMA8]](s32), [[INT10]](s1)
				; SI: [[INT13:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT12]](s32), [[UV3]](s32), [[UV1]](s32)
				; SI: [[INTRINSIC_TRUNC1:%[0-9]+]]:_(s32) = G_INTRINSIC_TRUNC [[INT13]]
				; SI: [[FMUL3:%[0-9]+]]:_(s32) = G_FMUL [[INTRINSIC_TRUNC1]], [[UV3]]
				; SI: [[FSUB1:%[0-9]+]]:_(s32) = G_FSUB [[UV1]], [[FMUL3]]
				; SI: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[FSUB]](s32), [[FSUB1]](s32)
				; SI: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
				; VI-LABEL: name: test_frem_v2s32
				; VI: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; VI: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; VI: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; VI: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; VI: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 1.000000e+00
				; VI: [[INT:%[0-9]+]]:_(s32), [[INT1:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s32), [[UV2]](s32), 0
				; VI: [[INT2:%[0-9]+]]:_(s32), [[INT3:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s32), [[UV2]](s32), 1
				; VI: [[INT4:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s32)
				; VI: [[FNEG:%[0-9]+]]:_(s32) = G_FNEG [[INT]]
				; VI: [[FMA:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[INT4]], [[C]]
				; VI: [[FMA1:%[0-9]+]]:_(s32) = G_FMA [[FMA]], [[INT4]], [[INT4]]
				; VI: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[INT2]], [[FMA1]]
				; VI: [[FMA2:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[FMUL]], [[INT2]]
				; VI: [[FMA3:%[0-9]+]]:_(s32) = G_FMA [[FMA2]], [[FMA1]], [[FMUL]]
				; VI: [[FMA4:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[FMA3]], [[INT2]]
				; VI: [[INT5:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s32), [[FMA1]](s32), [[FMA3]](s32), [[INT3]](s1)
				; VI: [[INT6:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s32), [[UV2]](s32), [[UV]](s32)
				; VI: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s32) = G_INTRINSIC_TRUNC [[INT6]]
				; VI: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[INTRINSIC_TRUNC]], [[UV2]]
				; VI: [[FSUB:%[0-9]+]]:_(s32) = G_FSUB [[UV]], [[FMUL1]]
				; VI: [[INT7:%[0-9]+]]:_(s32), [[INT8:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s32), [[UV3]](s32), 0
				; VI: [[INT9:%[0-9]+]]:_(s32), [[INT10:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s32), [[UV3]](s32), 1
				; VI: [[INT11:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT7]](s32)
				; VI: [[FNEG1:%[0-9]+]]:_(s32) = G_FNEG [[INT7]]
				; VI: [[FMA5:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[INT11]], [[C]]
				; VI: [[FMA6:%[0-9]+]]:_(s32) = G_FMA [[FMA5]], [[INT11]], [[INT11]]
				; VI: [[FMUL2:%[0-9]+]]:_(s32) = G_FMUL [[INT9]], [[FMA6]]
				; VI: [[FMA7:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[FMUL2]], [[INT9]]
				; VI: [[FMA8:%[0-9]+]]:_(s32) = G_FMA [[FMA7]], [[FMA6]], [[FMUL2]]
				; VI: [[FMA9:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[FMA8]], [[INT9]]
				; VI: [[INT12:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA9]](s32), [[FMA6]](s32), [[FMA8]](s32), [[INT10]](s1)
				; VI: [[INT13:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT12]](s32), [[UV3]](s32), [[UV1]](s32)
				; VI: [[INTRINSIC_TRUNC1:%[0-9]+]]:_(s32) = G_INTRINSIC_TRUNC [[INT13]]
				; VI: [[FMUL3:%[0-9]+]]:_(s32) = G_FMUL [[INTRINSIC_TRUNC1]], [[UV3]]
				; VI: [[FSUB1:%[0-9]+]]:_(s32) = G_FSUB [[UV1]], [[FMUL3]]
				; VI: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[FSUB]](s32), [[FSUB1]](s32)
				; VI: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
				; GFX9-LABEL: name: test_frem_v2s32
				; GFX9: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX9: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_FCONSTANT float 1.000000e+00
				; GFX9: [[INT:%[0-9]+]]:_(s32), [[INT1:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s32), [[UV2]](s32), 0
				; GFX9: [[INT2:%[0-9]+]]:_(s32), [[INT3:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s32), [[UV2]](s32), 1
				; GFX9: [[INT4:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s32)
				; GFX9: [[FNEG:%[0-9]+]]:_(s32) = G_FNEG [[INT]]
				; GFX9: [[FMA:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[INT4]], [[C]]
				; GFX9: [[FMA1:%[0-9]+]]:_(s32) = G_FMA [[FMA]], [[INT4]], [[INT4]]
				; GFX9: [[FMUL:%[0-9]+]]:_(s32) = G_FMUL [[INT2]], [[FMA1]]
				; GFX9: [[FMA2:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[FMUL]], [[INT2]]
				; GFX9: [[FMA3:%[0-9]+]]:_(s32) = G_FMA [[FMA2]], [[FMA1]], [[FMUL]]
				; GFX9: [[FMA4:%[0-9]+]]:_(s32) = G_FMA [[FNEG]], [[FMA3]], [[INT2]]
				; GFX9: [[INT5:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s32), [[FMA1]](s32), [[FMA3]](s32), [[INT3]](s1)
				; GFX9: [[INT6:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s32), [[UV2]](s32), [[UV]](s32)
				; GFX9: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s32) = G_INTRINSIC_TRUNC [[INT6]]
				; GFX9: [[FMUL1:%[0-9]+]]:_(s32) = G_FMUL [[INTRINSIC_TRUNC]], [[UV2]]
				; GFX9: [[FSUB:%[0-9]+]]:_(s32) = G_FSUB [[UV]], [[FMUL1]]
				; GFX9: [[INT7:%[0-9]+]]:_(s32), [[INT8:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s32), [[UV3]](s32), 0
				; GFX9: [[INT9:%[0-9]+]]:_(s32), [[INT10:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s32), [[UV3]](s32), 1
				; GFX9: [[INT11:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT7]](s32)
				; GFX9: [[FNEG1:%[0-9]+]]:_(s32) = G_FNEG [[INT7]]
				; GFX9: [[FMA5:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[INT11]], [[C]]
				; GFX9: [[FMA6:%[0-9]+]]:_(s32) = G_FMA [[FMA5]], [[INT11]], [[INT11]]
				; GFX9: [[FMUL2:%[0-9]+]]:_(s32) = G_FMUL [[INT9]], [[FMA6]]
				; GFX9: [[FMA7:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[FMUL2]], [[INT9]]
				; GFX9: [[FMA8:%[0-9]+]]:_(s32) = G_FMA [[FMA7]], [[FMA6]], [[FMUL2]]
				; GFX9: [[FMA9:%[0-9]+]]:_(s32) = G_FMA [[FNEG1]], [[FMA8]], [[INT9]]
				; GFX9: [[INT12:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA9]](s32), [[FMA6]](s32), [[FMA8]](s32), [[INT10]](s1)
				; GFX9: [[INT13:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT12]](s32), [[UV3]](s32), [[UV1]](s32)
				; GFX9: [[INTRINSIC_TRUNC1:%[0-9]+]]:_(s32) = G_INTRINSIC_TRUNC [[INT13]]
				; GFX9: [[FMUL3:%[0-9]+]]:_(s32) = G_FMUL [[INTRINSIC_TRUNC1]], [[UV3]]
				; GFX9: [[FSUB1:%[0-9]+]]:_(s32) = G_FSUB [[UV1]], [[FMUL3]]
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[FSUB]](s32), [[FSUB1]](s32)
				; GFX9: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
				%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
				%1:_(<2 x s32>) = COPY $vgpr2_vgpr3
				%2:_(<2 x s32>) = G_FREM %0, %1
				$vgpr0_vgpr1 = COPY %2
				...

				---
				name: test_frem_v2s64
				body: |
				bb.0.entry:
				liveins: $vgpr0_vgpr1_vgpr2_vgpr3, $vgpr4_vgpr5_vgpr6_vgpr7

				; SI-LABEL: name: test_frem_v2s64
				; SI: [[COPY:%[0-9]+]]:_(<2 x s64>) = COPY $vgpr0_vgpr1_vgpr2_vgpr3
				; SI: [[COPY1:%[0-9]+]]:_(<2 x s64>) = COPY $vgpr4_vgpr5_vgpr6_vgpr7
				; SI: [[UV:%[0-9]+]]:_(s64), [[UV1:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES [[COPY]](<2 x s64>)
				; SI: [[UV2:%[0-9]+]]:_(s64), [[UV3:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES [[COPY1]](<2 x s64>)
				; SI: [[C:%[0-9]+]]:_(s64) = G_FCONSTANT double 1.000000e+00
				; SI: [[INT:%[0-9]+]]:_(s64), [[INT1:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s64), [[UV2]](s64), 0
				; SI: [[FNEG:%[0-9]+]]:_(s64) = G_FNEG [[INT]]
				; SI: [[INT2:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s64)
				; SI: [[FMA:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[INT2]], [[C]]
				; SI: [[FMA1:%[0-9]+]]:_(s64) = G_FMA [[INT2]], [[FMA]], [[INT2]]
				; SI: [[FMA2:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[FMA1]], [[C]]
				; SI: [[INT3:%[0-9]+]]:_(s64), [[INT4:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s64), [[UV2]](s64), 1
				; SI: [[FMA3:%[0-9]+]]:_(s64) = G_FMA [[FMA1]], [[FMA2]], [[FMA1]]
				; SI: [[FMUL:%[0-9]+]]:_(s64) = G_FMUL [[INT3]], [[FMA3]]
				; SI: [[FMA4:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[FMUL]], [[INT3]]
				; SI: [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[UV]](s64)
				; SI: [[UV6:%[0-9]+]]:_(s32), [[UV7:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[UV2]](s64)
				; SI: [[UV8:%[0-9]+]]:_(s32), [[UV9:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT]](s64)
				; SI: [[UV10:%[0-9]+]]:_(s32), [[UV11:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT3]](s64)
				; SI: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UV5]](s32), [[UV11]]
				; SI: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UV7]](s32), [[UV9]]
				; SI: [[XOR:%[0-9]+]]:_(s1) = G_XOR [[ICMP]], [[ICMP1]]
				; SI: [[INT5:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s64), [[FMA3]](s64), [[FMUL]](s64), [[XOR]](s1)
				; SI: [[INT6:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s64), [[UV2]](s64), [[UV]](s64)
				; SI: [[UV12:%[0-9]+]]:_(s32), [[UV13:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT6]](s64)
				; SI: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 20
				; SI: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 11
				; SI: [[INT7:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.ubfe), [[UV13]](s32), [[C1]](s32), [[C2]](s32)
				; SI: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1023
				; SI: [[SUB:%[0-9]+]]:_(s32) = G_SUB [[INT7]], [[C3]]
				; SI: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 -2147483648
				; SI: [[AND:%[0-9]+]]:_(s32) = G_AND [[UV13]], [[C4]]
				; SI: [[C5:%[0-9]+]]:_(s64) = G_CONSTANT i64 4503599627370495
				; SI: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
				; SI: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[C6]](s32), [[AND]](s32)
				; SI: [[ASHR:%[0-9]+]]:_(s64) = G_ASHR [[C5]], [[SUB]](s32)
				; SI: [[C7:%[0-9]+]]:_(s64) = G_CONSTANT i64 -1
				; SI: [[XOR1:%[0-9]+]]:_(s64) = G_XOR [[ASHR]], [[C7]]
				; SI: [[AND1:%[0-9]+]]:_(s64) = G_AND [[INT6]], [[XOR1]]
				; SI: [[C8:%[0-9]+]]:_(s32) = G_CONSTANT i32 51
				; SI: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SUB]](s32), [[C6]]
				; SI: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(sgt), [[SUB]](s32), [[C8]]
				; SI: [[SELECT:%[0-9]+]]:_(s64) = G_SELECT [[ICMP2]](s1), [[MV]], [[AND1]]
				; SI: [[SELECT1:%[0-9]+]]:_(s64) = G_SELECT [[ICMP3]](s1), [[INT6]], [[SELECT]]
				; SI: [[FMUL1:%[0-9]+]]:_(s64) = G_FMUL [[SELECT1]], [[UV2]]
				; SI: [[FNEG1:%[0-9]+]]:_(s64) = G_FNEG [[FMUL1]]
				; SI: [[FADD:%[0-9]+]]:_(s64) = G_FADD [[UV]], [[FNEG1]]
				; SI: [[INT8:%[0-9]+]]:_(s64), [[INT9:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s64), [[UV3]](s64), 0
				; SI: [[FNEG2:%[0-9]+]]:_(s64) = G_FNEG [[INT8]]
				; SI: [[INT10:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT8]](s64)
				; SI: [[FMA5:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[INT10]], [[C]]
				; SI: [[FMA6:%[0-9]+]]:_(s64) = G_FMA [[INT10]], [[FMA5]], [[INT10]]
				; SI: [[FMA7:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[FMA6]], [[C]]
				; SI: [[INT11:%[0-9]+]]:_(s64), [[INT12:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s64), [[UV3]](s64), 1
				; SI: [[FMA8:%[0-9]+]]:_(s64) = G_FMA [[FMA6]], [[FMA7]], [[FMA6]]
				; SI: [[FMUL2:%[0-9]+]]:_(s64) = G_FMUL [[INT11]], [[FMA8]]
				; SI: [[FMA9:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[FMUL2]], [[INT11]]
				; SI: [[UV14:%[0-9]+]]:_(s32), [[UV15:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[UV1]](s64)
				; SI: [[UV16:%[0-9]+]]:_(s32), [[UV17:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[UV3]](s64)
				; SI: [[UV18:%[0-9]+]]:_(s32), [[UV19:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT8]](s64)
				; SI: [[UV20:%[0-9]+]]:_(s32), [[UV21:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT11]](s64)
				; SI: [[ICMP4:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UV15]](s32), [[UV21]]
				; SI: [[ICMP5:%[0-9]+]]:_(s1) = G_ICMP intpred(eq), [[UV17]](s32), [[UV19]]
				; SI: [[XOR2:%[0-9]+]]:_(s1) = G_XOR [[ICMP4]], [[ICMP5]]
				; SI: [[INT13:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA9]](s64), [[FMA8]](s64), [[FMUL2]](s64), [[XOR2]](s1)
				; SI: [[INT14:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT13]](s64), [[UV3]](s64), [[UV1]](s64)
				; SI: [[UV22:%[0-9]+]]:_(s32), [[UV23:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[INT14]](s64)
				; SI: [[INT15:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.ubfe), [[UV23]](s32), [[C1]](s32), [[C2]](s32)
				; SI: [[SUB1:%[0-9]+]]:_(s32) = G_SUB [[INT15]], [[C3]]
				; SI: [[AND2:%[0-9]+]]:_(s32) = G_AND [[UV23]], [[C4]]
				; SI: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[C6]](s32), [[AND2]](s32)
				; SI: [[ASHR1:%[0-9]+]]:_(s64) = G_ASHR [[C5]], [[SUB1]](s32)
				; SI: [[XOR3:%[0-9]+]]:_(s64) = G_XOR [[ASHR1]], [[C7]]
				; SI: [[AND3:%[0-9]+]]:_(s64) = G_AND [[INT14]], [[XOR3]]
				; SI: [[ICMP6:%[0-9]+]]:_(s1) = G_ICMP intpred(slt), [[SUB1]](s32), [[C6]]
				; SI: [[ICMP7:%[0-9]+]]:_(s1) = G_ICMP intpred(sgt), [[SUB1]](s32), [[C8]]
				; SI: [[SELECT2:%[0-9]+]]:_(s64) = G_SELECT [[ICMP6]](s1), [[MV1]], [[AND3]]
				; SI: [[SELECT3:%[0-9]+]]:_(s64) = G_SELECT [[ICMP7]](s1), [[INT14]], [[SELECT2]]
				; SI: [[FMUL3:%[0-9]+]]:_(s64) = G_FMUL [[SELECT3]], [[UV3]]
				; SI: [[FNEG3:%[0-9]+]]:_(s64) = G_FNEG [[FMUL3]]
				; SI: [[FADD1:%[0-9]+]]:_(s64) = G_FADD [[UV1]], [[FNEG3]]
				; SI: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[FADD]](s64), [[FADD1]](s64)
				; SI: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUILD_VECTOR]](<2 x s64>)
				; VI-LABEL: name: test_frem_v2s64
				; VI: [[COPY:%[0-9]+]]:_(<2 x s64>) = COPY $vgpr0_vgpr1_vgpr2_vgpr3
				; VI: [[COPY1:%[0-9]+]]:_(<2 x s64>) = COPY $vgpr4_vgpr5_vgpr6_vgpr7
				; VI: [[UV:%[0-9]+]]:_(s64), [[UV1:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES [[COPY]](<2 x s64>)
				; VI: [[UV2:%[0-9]+]]:_(s64), [[UV3:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES [[COPY1]](<2 x s64>)
				; VI: [[C:%[0-9]+]]:_(s64) = G_FCONSTANT double 1.000000e+00
				; VI: [[INT:%[0-9]+]]:_(s64), [[INT1:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s64), [[UV2]](s64), 0
				; VI: [[FNEG:%[0-9]+]]:_(s64) = G_FNEG [[INT]]
				; VI: [[INT2:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s64)
				; VI: [[FMA:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[INT2]], [[C]]
				; VI: [[FMA1:%[0-9]+]]:_(s64) = G_FMA [[INT2]], [[FMA]], [[INT2]]
				; VI: [[FMA2:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[FMA1]], [[C]]
				; VI: [[INT3:%[0-9]+]]:_(s64), [[INT4:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s64), [[UV2]](s64), 1
				; VI: [[FMA3:%[0-9]+]]:_(s64) = G_FMA [[FMA1]], [[FMA2]], [[FMA1]]
				; VI: [[FMUL:%[0-9]+]]:_(s64) = G_FMUL [[INT3]], [[FMA3]]
				; VI: [[FMA4:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[FMUL]], [[INT3]]
				; VI: [[INT5:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s64), [[FMA3]](s64), [[FMUL]](s64), [[INT4]](s1)
				; VI: [[INT6:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s64), [[UV2]](s64), [[UV]](s64)
				; VI: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s64) = G_INTRINSIC_TRUNC [[INT6]]
				; VI: [[FMUL1:%[0-9]+]]:_(s64) = G_FMUL [[INTRINSIC_TRUNC]], [[UV2]]
				; VI: [[FNEG1:%[0-9]+]]:_(s64) = G_FNEG [[FMUL1]]
				; VI: [[FADD:%[0-9]+]]:_(s64) = G_FADD [[UV]], [[FNEG1]]
				; VI: [[INT7:%[0-9]+]]:_(s64), [[INT8:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s64), [[UV3]](s64), 0
				; VI: [[FNEG2:%[0-9]+]]:_(s64) = G_FNEG [[INT7]]
				; VI: [[INT9:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT7]](s64)
				; VI: [[FMA5:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[INT9]], [[C]]
				; VI: [[FMA6:%[0-9]+]]:_(s64) = G_FMA [[INT9]], [[FMA5]], [[INT9]]
				; VI: [[FMA7:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[FMA6]], [[C]]
				; VI: [[INT10:%[0-9]+]]:_(s64), [[INT11:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s64), [[UV3]](s64), 1
				; VI: [[FMA8:%[0-9]+]]:_(s64) = G_FMA [[FMA6]], [[FMA7]], [[FMA6]]
				; VI: [[FMUL2:%[0-9]+]]:_(s64) = G_FMUL [[INT10]], [[FMA8]]
				; VI: [[FMA9:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[FMUL2]], [[INT10]]
				; VI: [[INT12:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA9]](s64), [[FMA8]](s64), [[FMUL2]](s64), [[INT11]](s1)
				; VI: [[INT13:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT12]](s64), [[UV3]](s64), [[UV1]](s64)
				; VI: [[INTRINSIC_TRUNC1:%[0-9]+]]:_(s64) = G_INTRINSIC_TRUNC [[INT13]]
				; VI: [[FMUL3:%[0-9]+]]:_(s64) = G_FMUL [[INTRINSIC_TRUNC1]], [[UV3]]
				; VI: [[FNEG3:%[0-9]+]]:_(s64) = G_FNEG [[FMUL3]]
				; VI: [[FADD1:%[0-9]+]]:_(s64) = G_FADD [[UV1]], [[FNEG3]]
				; VI: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[FADD]](s64), [[FADD1]](s64)
				; VI: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUILD_VECTOR]](<2 x s64>)
				; GFX9-LABEL: name: test_frem_v2s64
				; GFX9: [[COPY:%[0-9]+]]:_(<2 x s64>) = COPY $vgpr0_vgpr1_vgpr2_vgpr3
				; GFX9: [[COPY1:%[0-9]+]]:_(<2 x s64>) = COPY $vgpr4_vgpr5_vgpr6_vgpr7
				; GFX9: [[UV:%[0-9]+]]:_(s64), [[UV1:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES [[COPY]](<2 x s64>)
				; GFX9: [[UV2:%[0-9]+]]:_(s64), [[UV3:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES [[COPY1]](<2 x s64>)
				; GFX9: [[C:%[0-9]+]]:_(s64) = G_FCONSTANT double 1.000000e+00
				; GFX9: [[INT:%[0-9]+]]:_(s64), [[INT1:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s64), [[UV2]](s64), 0
				; GFX9: [[FNEG:%[0-9]+]]:_(s64) = G_FNEG [[INT]]
				; GFX9: [[INT2:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT]](s64)
				; GFX9: [[FMA:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[INT2]], [[C]]
				; GFX9: [[FMA1:%[0-9]+]]:_(s64) = G_FMA [[INT2]], [[FMA]], [[INT2]]
				; GFX9: [[FMA2:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[FMA1]], [[C]]
				; GFX9: [[INT3:%[0-9]+]]:_(s64), [[INT4:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV]](s64), [[UV2]](s64), 1
				; GFX9: [[FMA3:%[0-9]+]]:_(s64) = G_FMA [[FMA1]], [[FMA2]], [[FMA1]]
				; GFX9: [[FMUL:%[0-9]+]]:_(s64) = G_FMUL [[INT3]], [[FMA3]]
				; GFX9: [[FMA4:%[0-9]+]]:_(s64) = G_FMA [[FNEG]], [[FMUL]], [[INT3]]
				; GFX9: [[INT5:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA4]](s64), [[FMA3]](s64), [[FMUL]](s64), [[INT4]](s1)
				; GFX9: [[INT6:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT5]](s64), [[UV2]](s64), [[UV]](s64)
				; GFX9: [[INTRINSIC_TRUNC:%[0-9]+]]:_(s64) = G_INTRINSIC_TRUNC [[INT6]]
				; GFX9: [[FMUL1:%[0-9]+]]:_(s64) = G_FMUL [[INTRINSIC_TRUNC]], [[UV2]]
				; GFX9: [[FNEG1:%[0-9]+]]:_(s64) = G_FNEG [[FMUL1]]
				; GFX9: [[FADD:%[0-9]+]]:_(s64) = G_FADD [[UV]], [[FNEG1]]
				; GFX9: [[INT7:%[0-9]+]]:_(s64), [[INT8:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s64), [[UV3]](s64), 0
				; GFX9: [[FNEG2:%[0-9]+]]:_(s64) = G_FNEG [[INT7]]
				; GFX9: [[INT9:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), [[INT7]](s64)
				; GFX9: [[FMA5:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[INT9]], [[C]]
				; GFX9: [[FMA6:%[0-9]+]]:_(s64) = G_FMA [[INT9]], [[FMA5]], [[INT9]]
				; GFX9: [[FMA7:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[FMA6]], [[C]]
				; GFX9: [[INT10:%[0-9]+]]:_(s64), [[INT11:%[0-9]+]]:_(s1) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.scale), [[UV1]](s64), [[UV3]](s64), 1
				; GFX9: [[FMA8:%[0-9]+]]:_(s64) = G_FMA [[FMA6]], [[FMA7]], [[FMA6]]
				; GFX9: [[FMUL2:%[0-9]+]]:_(s64) = G_FMUL [[INT10]], [[FMA8]]
				; GFX9: [[FMA9:%[0-9]+]]:_(s64) = G_FMA [[FNEG2]], [[FMUL2]], [[INT10]]
				; GFX9: [[INT12:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fmas), [[FMA9]](s64), [[FMA8]](s64), [[FMUL2]](s64), [[INT11]](s1)
				; GFX9: [[INT13:%[0-9]+]]:_(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.div.fixup), [[INT12]](s64), [[UV3]](s64), [[UV1]](s64)
				; GFX9: [[INTRINSIC_TRUNC1:%[0-9]+]]:_(s64) = G_INTRINSIC_TRUNC [[INT13]]
				; GFX9: [[FMUL3:%[0-9]+]]:_(s64) = G_FMUL [[INTRINSIC_TRUNC1]], [[UV3]]
				; GFX9: [[FNEG3:%[0-9]+]]:_(s64) = G_FNEG [[FMUL3]]
				; GFX9: [[FADD1:%[0-9]+]]:_(s64) = G_FADD [[UV1]], [[FNEG3]]
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[FADD]](s64), [[FADD1]](s64)
				; GFX9: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUILD_VECTOR]](<2 x s64>)
				%0:_(<2 x s64>) = COPY $vgpr0_vgpr1_vgpr2_vgpr3
				%1:_(<2 x s64>) = COPY $vgpr4_vgpr5_vgpr6_vgpr7
				%2:_(<2 x s64>) = G_FREM %0, %1
				$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %2
				...
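
In the SI check lines for the s64 cases there is no G_INTRINSIC_TRUNC. The truncation appears instead as integer operations on the bits of the double (G_UNMERGE_VALUES, llvm.amdgcn.ubfe, G_ASHR, G_XOR, G_AND, and two G_SELECTs). The following is a minimal Python sketch of that bit manipulation as read from the check lines, not the legalizer's actual code. The function name `trunc_f64_bits` and the use of `struct` to reinterpret bits are illustrative assumptions; the constants are taken from the G_CONSTANT lines.

```python
import math
import struct

FRACT_MASK = (1 << 52) - 1    # G_CONSTANT i64 4503599627370495
SIGN_BIT_HI = 0x80000000      # G_CONSTANT i32 -2147483648, viewed as unsigned
MASK64 = (1 << 64) - 1


def trunc_f64_bits(x: float) -> float:
    """Round a double toward zero using only integer operations (hypothetical helper)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    hi = bits >> 32                           # high half from G_UNMERGE_VALUES
    exp = ((hi >> 20) & 0x7FF) - 1023         # ubfe(hi, 20, 11), then G_SUB 1023
    sign_only = (hi & SIGN_BIT_HI) << 32      # G_MERGE_VALUES(0, hi & sign bit)

    # The MIR computes the masked value unconditionally and then picks a
    # result with two G_SELECTs. Python rejects negative shift counts, so
    # this sketch branches first; the selected value is the same.
    if exp < 0:                               # |x| < 1: result is a signed zero
        out = sign_only
    elif exp > 51:                            # no fractional bits (also inf/NaN)
        out = bits
    else:                                     # clear the fractional mantissa bits
        out = bits & ~(FRACT_MASK >> exp) & MASK64
    return struct.unpack("<d", struct.pack("<Q", out))[0]
```

For example, `trunc_f64_bits(-2.75)` clears every mantissa bit below the units position and keeps the sign, giving `-2.0`.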

Revision Contents

Diff 279840

llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-frem.mir
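
Read from the end of each expected sequence in legalize-frem.mir, the lowering divides, truncates the quotient with G_INTRINSIC_TRUNC, multiplies back, and subtracts (G_FSUB for s32, G_FNEG plus G_FADD for s64). The sketch below models only that arithmetic in Python for finite inputs and a nonzero divisor; it is an illustration under those assumptions, not the code in LegalizerHelper.cpp, and the names `frem_lowered` and `frem_with_floor` are hypothetical. The floor variant is included only to show the sign difference raised in the review discussion.

```python
import math


def frem_lowered(x: float, y: float) -> float:
    """x - trunc(x / y) * y, the shape of the expansion in the check lines."""
    q = x / y                      # stands in for the expanded fdiv sequence
    t = float(math.trunc(q))       # G_INTRINSIC_TRUNC: round toward zero
    return x - t * y               # G_FMUL followed by G_FSUB (or FNEG + FADD)


def frem_with_floor(x: float, y: float) -> float:
    """Same shape with floor; differs when the quotient is negative."""
    q = x / y
    return x - float(math.floor(q)) * y
```

With `x = -5.5` and `y = 2.0`, the trunc form gives `-1.5`, which has the sign of the dividend and agrees with `math.fmod`, while the floor form gives `0.5`. This sketch makes no claim that the trunc form matches `fmod` for all inputs; the quotient is rounded before truncation, so results for large quotients can differ.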
