This is an archive of the discontinued LLVM Phabricator instance.

Differential D135447

[AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU
Closed · Public

Authored by JanekvO on Oct 7 2022, 7:24 AM.

Details

Reviewers

arsenm

Summary

Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic for llvm.is.fpclass

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

JanekvO created this revision. Oct 7 2022, 7:24 AM

Herald added a project: Restricted Project. Oct 7 2022, 7:24 AM

Herald added subscribers: kosarev, foad, kerbowa and 7 others.

JanekvO requested review of this revision. Oct 7 2022, 7:24 AM

Herald added a project: Restricted Project. Oct 7 2022, 7:24 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B190943: Diff 466071. Oct 7 2022, 8:20 AM

Can do it as a follow up commit, but the existing combines we have for AMDGPU::FP_CLASS should be ported to use the generic intrinsic. Also, llvm.amdgcn.class should get bitcode upgraded to the generic

llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
182	Should add some vector cases too

arsenm added inline comments. Oct 7 2022, 9:15 AM

llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
3	Should also test/handle globalisel

In D135447#3843046, @arsenm wrote:

Can do it as a follow up commit, but the existing combines we have for AMDGPU::FP_CLASS should be ported to use the generic intrinsic. Also, llvm.amdgcn.class should get bitcode upgraded to the generic

I just realized the amdgpu intrinsic allows non-immediate arguments, but is_fpclass does not so these are not equivalent

JanekvO planned changes to this revision. Oct 10 2022, 7:14 AM

SelectionDAG fpclass vector support
GlobalISel llvm.is.fpclass support for AMDGPU
rebase

Harbormaster completed remote builds in B195929: Diff 472930. Nov 3 2022, 8:09 AM

JanekvO retitled this revision from [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support for AMDGPU to [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU. Nov 3 2022, 10:20 AM

arsenm added inline comments. Nov 3 2022, 11:49 AM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2320 ↗	(On Diff #472930)	This should get an IRTranslator test to make sure the flags are passed through
2324–2325 ↗	(On Diff #472930)	getUniqueInteger is unnecessarily fancy, can just cast to ConstantInt directly
2332 ↗	(On Diff #472930)	Do you really need the float type operand? I know bfloat16 isn't going to work without it, but I thought the plan was to introduce FP types to LLT
llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
133	Should avoid defining an AMDGPU node for this and move this to generic code
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
870 ↗	(On Diff #472930)	I don't see why you need to manually select this (maybe sharing the pattern between the existing intrinsic is annoying because the new intrinsic uses immarg?)
880–883 ↗	(On Diff #472930)	Should be no reason to check this here
894–902 ↗	(On Diff #472930)	You can just unconditionally materialize the constant into a register and let SIFoldOperands sort out the constant bus restriction
905–906 ↗	(On Diff #472930)	You shouldn't need to special case the result constraint
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
3945 ↗	(On Diff #472930)	Pretty sure this default constructs to null
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
3	Should use some shared prefixes, a lot of these functions are the same. Also needs gfx7 and gfx8 run lines for the half promotion
1923	v3f16 and v4f16 are also potentially interesting

SelectionDAG fpclass vector support
GlobalISel llvm.is.fpclass support for AMDGPU
Address comments, add custom half promotion for gfx7

JanekvO added inline comments. Nov 8 2022, 7:48 AM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2320 ↗	(On Diff #472930)	Not sure if I completely hit the mark with my added test, but it seemed to me that not all flags were possible (e.g., the `nnan` flag didn't work because it requires an fp return type). For now I've added flag-related tests that explicitly test the addition of `nofpexcept`. Do let me know if there's something missing or whether this `copyFlagsFromInstruction` is better omitted.
2332 ↗	(On Diff #472930)	I believe it's not necessary for amdgpu but is required for the `G_IS_FPCLASS` target opcode. Leaving it out results in verifier errors (I'm also unaware of the plan to introduce FP types to LLT).
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
870 ↗	(On Diff #472930)	I did look at whether I could re-use some of the existing tablegen, but I couldn't get it quite into the right shape for it to match. `llvm.is.fpclass` requires the mask to be an immarg, as you mentioned, so materializing the immediate into a register anywhere before this function results in a verifier error.
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
3	I'm not that well versed in how gfx7 should do half promotion. I feel like either the gfx7 selectiondag or the gfx7 globalisel half promotion tests are incorrect (and if not, the selectiondag version does seem suboptimal).

arsenm added inline comments. Nov 8 2022, 8:05 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450 ↗	(On Diff #474002)	This will do the wrong thing for sNaNs, and denormal inputs are also flushed

Harbormaster completed remote builds in B196711: Diff 474002. Nov 8 2022, 8:15 AM

arsenm added a child revision: D137811: InstCombine: Perform basic isnan combines on llvm.is.fpclass. Nov 10 2022, 7:13 PM

arsenm added inline comments. Nov 10 2022, 7:17 PM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450 ↗	(On Diff #474002)	I also don't see the corresponding DAG legalization. It's such a special case I think this should be split into a separate patch anyway.

arsenm added inline comments. Nov 10 2022, 7:19 PM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2320 ↗	(On Diff #472930)	I'd consider that a pre-existing bug in intrinsics. The IR is annoyingly strict about what things are allowed to have flags
2332 ↗	(On Diff #472930)	What do you mean verifier errors?

foad added inline comments. Nov 10 2022, 11:27 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6492–6493 ↗	(On Diff #474002)	Why did this change?
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
318–320	It seems annoying to have such a long list of types here - it'll need updating whenever we introduce a new one. Can you use something like FloatVectorTypes instead?

JanekvO marked 2 inline comments as done. Nov 11 2022, 6:15 AM

JanekvO added inline comments.

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332 ↗	(On Diff #472930)	Sorry, I meant that the MachineVerifier will fail for the `G_IS_FPCLASS` instruction.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450 ↗	(On Diff #474002)	"I also don't see the corresponding DAG legalization." I put the corresponding SelectionDAG type-widening code for `IS_FPCLASS` in the target custom function `LowerIS_FPCLASS`, because I couldn't bypass the expansion in SelectionDAGBuilder.cpp when marking the action for the instruction with f16 as `promote` (i.e., it would call the `IS_FPCLASS` expansion code even when trying to promote). "It's such a special case I think this should be split into a separate patch anyway." As in, the widening code, or `IS_FPCLASS` support for amdgpu gfx7?
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6492–6493 ↗	(On Diff #474002)	`isOperationLegalOrCustom` will return false if the type is considered illegal regardless of whether the instruction's type is marked legal or custom whereas `isOperationCustom` won't explicitly check for type legality and returns whether the action was set to custom. I basically just wanted it to go through to target custom code. (May revert this in favor of using the expand code for f16 in case there is no f16 fp class instruction for amdgpu)
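
The distinction described in the comment above can be modelled in a few lines. This is a toy sketch, not LLVM code: the function names mirror the `TargetLoweringBase` queries for readability, while the two parameters are hypothetical stand-ins for `isTypeLegal(VT)` and `getOperationAction(Op, VT)`.

```cpp
// Toy model of the two legality queries, assuming the behaviour described in
// the review comment. Nothing here is taken from the LLVM sources.
enum class LegalizeAction { Legal, Promote, Expand, Custom };

// Rejects the operation whenever the type itself is illegal, regardless of
// the action recorded for it.
inline bool isOperationLegalOrCustom(bool TypeIsLegal, LegalizeAction Action) {
  return TypeIsLegal && (Action == LegalizeAction::Legal ||
                         Action == LegalizeAction::Custom);
}

// Only looks at the recorded action and ignores type legality.
inline bool isOperationCustom(LegalizeAction Action) {
  return Action == LegalizeAction::Custom;
}
```

With an illegal type whose action is set to Custom (the f16 case discussed here on targets without f16 instructions), the first query is false and the second is true, which is why switching the check lets the node reach target custom lowering.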

For this patch I'd like to drop all the attempts to handle legalizing the f16 case and move that to a separate patch. It's a much more complicated edge case that doesn't have much in common with the base handling

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332 ↗	(On Diff #472930)	Right, it's there in the operand list. I mean more abstractly, why is it there?
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450 ↗	(On Diff #474002)	OK, there are several issues here. None of this should be done in target code. I also don't approve of doing this expansion in the DAG builder, but see that's a pre-existing issue. GlobalISel does need to do the same expansion.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6492–6493 ↗	(On Diff #474002)	For the no f16 case, I think we need to do software expansion to get correct results for denormal values
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
318–320	This should be unnecessary, we have no vector class instructions. These should just expand into scalars
2731	This doesn't work correctly for denormals. The f16 denormal value won't be denormal after casting to f32 (if it wasn't flushed to zero under DAZ or FTZ modes)
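
To illustrate the denormal concern raised above, here is a minimal host-side sketch. The helper names are made up for this example, and it assumes the host does not flush f32 denormals. It decodes a binary16 bit pattern exactly into a float and shows that a value which is subnormal as f16 is an ordinary normal number once promoted to f32.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Exactly decode an IEEE-754 binary16 bit pattern into a float. Every finite
// binary16 value is representable in binary32, so no rounding occurs.
// NaN payloads are not preserved: any NaN input becomes a quiet NaN.
inline float halfBitsToFloat(uint16_t Bits) {
  const bool Negative = ((Bits >> 15) & 1) != 0;
  const int Exponent = (Bits >> 10) & 0x1f;
  const int Mantissa = Bits & 0x3ff;

  float Magnitude;
  if (Exponent == 0) {
    // Zero or subnormal: Mantissa * 2^-24.
    Magnitude = std::ldexp(static_cast<float>(Mantissa), -24);
  } else if (Exponent == 0x1f) {
    Magnitude = Mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + Mantissa) * 2^(Exponent - 15 - 10).
    Magnitude =
        std::ldexp(static_cast<float>(Mantissa | 0x400), Exponent - 25);
  }
  return Negative ? -Magnitude : Magnitude;
}

// Subnormal test performed on the original 16-bit pattern.
inline bool isSubnormalHalfBits(uint16_t Bits) {
  return ((Bits >> 10) & 0x1f) == 0 && (Bits & 0x3ff) != 0;
}

// Subnormal test performed after promoting the value to f32.
inline bool isSubnormalAfterPromotion(uint16_t Bits) {
  return std::fpclassify(halfBitsToFloat(Bits)) == FP_SUBNORMAL;
}
```

For the smallest f16 subnormal, bit pattern `0x0001`, the first test is true and the second is false, so a class test run on the promoted value would report "normal" instead of "subnormal".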

JanekvO added inline comments. Nov 11 2022, 11:06 AM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332 ↗	(On Diff #472930)	I can see the semantics being used in the target independent expansion of `IS_FPCLASS` in SelectionDAG (e.g., for retrieving `inf` of a particular fp semantic). I'm inferring that the rationale could be: GlobalISel will require a similar implementation and therefore requires the semantics. I haven't looked into whether any alternatives exist that don't require passing of the semantics through the operand, though.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450 ↗	(On Diff #474002)	I was looking at implementing the SelectionDAG target-independent expansion for GlobalISel `lower()`. For this diff I'll first remove the f16 legalization for cases where there are no f16 instructions available for amdgpu, and move GlobalISel's expansion/lower to another diff.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
318–320	If not set as custom (or legal), these will get expanded through the target-independent expansion. Bypassing that target-independent expansion does result in the desired scalarization.
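
For readers unfamiliar with what a software expansion of the class test looks like, the following is a rough sketch for f16. It works purely on the integer bit pattern, so it is unaffected by promotion or denormal flushing. It is not the LLVM expansion code; the function names are hypothetical, and the class bit positions follow the `llvm.is.fpclass` mask layout in the LangRef.

```cpp
#include <cstdint>

// Return the single class bit (llvm.is.fpclass mask layout) for a binary16
// bit pattern, using integer operations only.
inline uint32_t classifyHalfBits(uint16_t Bits) {
  const bool Negative = (Bits & 0x8000) != 0;
  const uint16_t Abs = Bits & 0x7fff;
  const uint16_t ExpMask = 0x7c00;  // all exponent bits set
  const uint16_t QuietBit = 0x0200; // most significant mantissa bit

  if (Abs > ExpMask) // NaN: exponent all ones, mantissa non-zero
    return (Abs & QuietBit) ? (1u << 1) : (1u << 0);
  if (Abs == ExpMask) // infinity
    return Negative ? (1u << 2) : (1u << 9);
  if (Abs == 0) // zero
    return Negative ? (1u << 5) : (1u << 6);
  if ((Abs & ExpMask) == 0) // subnormal: zero exponent, non-zero mantissa
    return Negative ? (1u << 4) : (1u << 7);
  return Negative ? (1u << 3) : (1u << 8); // normal
}

// The class test itself: true if the value's class is selected by Mask.
inline bool isFPClassHalfBits(uint16_t Bits, uint32_t Mask) {
  return (classifyHalfBits(Bits) & Mask) != 0;
}
```

A vector version would simply apply the scalar test to each element, which matches the scalarization discussed in this thread.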

arsenm added inline comments. Nov 11 2022, 11:14 AM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332 ↗	(On Diff #472930)	Currently the LLT directly implies the semantics for every operation
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
318–320	This is one of the problems with doing this kind of expansion in SelectionDAGBuilder. This should go through the usual legalization paths

Remove IS_FPCLASS amdgpu f16 legalization, split tests into f16 and not f16 cases, temporarily disable gfx7 glisel tests
Rebase

JanekvO added a subscriber: sepavloff. Nov 14 2022, 5:37 AM

JanekvO added inline comments.

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332 ↗	(On Diff #472930)	@sepavloff Do you happen to recall the rationale for the fp semantics operand of `G_IS_FPCLASS`? My knowledge of it is a bit shallow, but perhaps it can be removed

Harbormaster completed remote builds in B197510: Diff 475117. Nov 14 2022, 6:18 AM

sepavloff added inline comments. Nov 14 2022, 7:45 AM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332 ↗	(On Diff #472930)	It is used to work around a limitation of GlobalISel: the lack of floating-point types. Without this operand it is impossible to distinguish between `half` and `bfloat16`, and also between different flavors of 8-bit floats. If LLT supported floating-point types, this operand could be removed.
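
The ambiguity described above can be seen with a single 16-bit pattern. The sketch below uses hypothetical helper names and relies only on the standard binary16 and bfloat16 layouts. It shows that "is infinity" gives different answers for the same bits depending on which format is assumed, which is information a size-only 16-bit scalar type does not carry.

```cpp
#include <cstdint>

// IEEE binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits.
// Infinity has all exponent bits set and a zero mantissa.
inline bool isInfAsHalf(uint16_t Bits) {
  return (Bits & 0x7fff) == 0x7c00;
}

// bfloat16: 1 sign bit, 8 exponent bits, 7 mantissa bits.
// Infinity has all exponent bits set and a zero mantissa.
inline bool isInfAsBFloat16(uint16_t Bits) {
  return (Bits & 0x7fff) == 0x7f80;
}
```

The pattern `0x7c00` is +infinity when read as `half` but a finite normal number when read as `bfloat16`, and `0x7f80` is +infinity as `bfloat16` but a NaN as `half`.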

arsenm added inline comments. Nov 14 2022, 9:11 AM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332 ↗	(On Diff #472930)	But this is a problem for every single operation, not just this one. We don't have a decided upon strategy for dealing with this, so it doesn't make sense to me to try to deal with it here

sepavloff added inline comments. Nov 14 2022, 10:43 PM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332 ↗	(On Diff #472930)	Sounds reasonable. Let's remove it in a separate commit.
2332 ↗	(On Diff #472930)	See https://reviews.llvm.org/D138004.

Remove fpsem operand construction in irtranslator for G_IS_FPCLASS

Harbormaster completed remote builds in B197732: Diff 475421. Nov 15 2022, 4:55 AM

arsenm added inline comments. Nov 16 2022, 2:54 PM

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
314	Can you add a fixme that we just want scalarization?
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
870 ↗	(On Diff #472930)	You might need to split it into a different pattern instantiation, but you would just need the S_MOV_B32 from the mask to the constant (although I actually would expect it to work if you directly folded the constant anyway, since the operand should have been copied to VGPR anyway). Something like:
class ClassPat<Instruction inst, ValueType vt> : GCNPat <
  (fp_class (VOP3Mods vt:$src0, i32:$src0_mods), (i32 timm:mask))
  (inst $src0_mods, VSrc_b32:$src0, $src0_mods, (S_MOV_B32 $mask))
>;
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
983 ↗	(On Diff #475421)	I think this clampScalar isn't doing anything and can be dropped
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
12	Can you also add some cases where the input will be an SGPR?
107	s/float/f32 in these function names

arsenm requested changes to this revision. Nov 16 2022, 2:56 PM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fpclass-flags.ll
1 ↗	(On Diff #475421)	-global-isel to front, also generate these checks
16 ↗	(On Diff #475421)	Needs additional checks with other flags besides the one just set

This revision now requires changes to proceed. Nov 16 2022, 2:56 PM

JanekvO added inline comments. Nov 18 2022, 6:57 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fpclass-flags.ll
16 ↗	(On Diff #475421)	I've been wondering whether the flag copy from the IR intrinsic to G_IS_FPCLASS in IRTranslator should be removed altogether. I'd have to weaken the flags' constraints, as they all require scalar or vector fp return types. Additionally, any use of fast math flags outside of existing uses will most likely require amending the LangRef. E.g., the current descriptions of some fast math flags describe how an input can result in a poison value, but this wouldn't be possible for G_IS_FPCLASS as it returns a bool. Let me know what you think; I can see some of the flags being useful for folding into constant bool values (e.g., a no-NaNs flag plus a G_IS_FPCLASS test for NaNs), but I may be a bit naïve about useful cases beyond said folding.
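
The folding mentioned in the comment above could look roughly like the sketch below. It is a hypothetical helper, not an existing LLVM combine, and it deliberately sidesteps the LangRef and poison questions raised in the comment: it only assumes the operand is known not to be a NaN and uses the documented mask layout (bits 0 and 1 are the NaN classes, bits 0 through 9 are all classes).

```cpp
#include <cstdint>

enum class FoldResult { ConstantFalse, ConstantTrue, NotFoldable };

// Hypothetical fold: given a class mask and the assumption that the operand
// is never a NaN, decide whether the test has a constant answer.
inline FoldResult foldIsFPClassAssumingNoNaNs(uint32_t Mask) {
  constexpr uint32_t NanBits = 0x3;                // signaling | quiet NaN
  constexpr uint32_t AllNonNan = 0x3ff & ~NanBits; // every other class
  const uint32_t Remaining = Mask & AllNonNan;
  if (Remaining == 0)
    return FoldResult::ConstantFalse; // only NaN classes were requested
  if (Remaining == AllNonNan)
    return FoldResult::ConstantTrue; // every possible class is requested
  return FoldResult::NotFoldable;
}
```

Under that assumption a pure NaN test (mask 3) folds to false, while an infinity test (mask 516) still has to be evaluated at run time.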

arsenm added inline comments. Nov 18 2022, 3:53 PM

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2320 ↗	(On Diff #472930)	You can remove the flag copy if you want, although the flags may be introduced in the future
2333–2334 ↗	(On Diff #475421)	I just realized there's no point in doing this. G_IS_FPCLASS is not marked as mayRaiseFPException, so the flag is implied
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fpclass-flags.ll
16 ↗	(On Diff #475421)	OK, might as well drop this test if we have end to end tests and there's nothing unique to test in the IRTranslator

Remove patfrag dependency for is_fpclass
Add dedicated patterns
Remove globalisel manual selection and depend on selectiondag tablegen
Address comments

JanekvO marked 4 inline comments as done. Nov 25 2022, 8:07 AM

Harbormaster completed remote builds in B199558: Diff 477969. Nov 25 2022, 8:52 AM

LGTM with nits. You have some dead checks and dead code.

llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
132–134	Whole file is now whitespace only changes which can be dropped
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
913–922 ↗	(On Diff #477969)	Dead code
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll
94–95 ↗	(On Diff #477969)	This is broken for signaling nans. You dropped this from the patch but left these dead checks around

This revision is now accepted and ready to land. Nov 28 2022, 9:34 AM

Remove dead code and tests

arsenm accepted this revision. Nov 28 2022, 12:17 PM

Sorry, haven't gotten github access yet: could you (or somebody in AMDGPU group) land this for me? 😅

322966f8f8aa2ee1146c40eabe52c9ebeb91dab7

Harbormaster completed remote builds in B199830: Diff 478322. Nov 28 2022, 2:15 PM

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp	2 lines
llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td	4 lines
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll	182 lines

Diff 466071

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

[300 lines not shown] AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::FROUND, {MVT::f32, MVT::f64}, Custom);

   setOperationAction({ISD::FLOG, ISD::FLOG10, ISD::FEXP}, MVT::f32, Custom);

   setOperationAction(ISD::FNEARBYINT, {MVT::f32, MVT::f64}, Custom);

   setOperationAction(ISD::FREM, {MVT::f16, MVT::f32, MVT::f64}, Custom);

+  setOperationAction(ISD::IS_FPCLASS, {MVT::f16, MVT::f32, MVT::f64}, Legal);
+
   // Expand to fneg + fadd.
   setOperationAction(ISD::FSUB, MVT::f64, Expand);

   setOperationAction(ISD::CONCAT_VECTORS,
                      {MVT::v3i32, MVT::v3f32, MVT::v4i32, MVT::v4f32,
                       MVT::v5i32, MVT::v5f32, MVT::v6i32, MVT::v6f32,
                       MVT::v7i32, MVT::v7f32, MVT::v8i32, MVT::v8f32},
                      Custom);
   setOperationAction(
       ISD::EXTRACT_SUBVECTOR,
       {MVT::v2f16, MVT::v2i16, MVT::v4f16, MVT::v4i16, MVT::v2f32,
        MVT::v2i32, MVT::v3f32, MVT::v3i32, MVT::v4f32, MVT::v4i32,
        MVT::v5f32, MVT::v5i32, MVT::v6f32, MVT::v6i32, MVT::v7f32,
        MVT::v7i32, MVT::v8f32, MVT::v8i32, MVT::v16f16, MVT::v16i16,
        MVT::v16f32, MVT::v16i32, MVT::v32f32, MVT::v32i32, MVT::v2f64,
        MVT::v2i64, MVT::v3f64, MVT::v3i64, MVT::v4f64, MVT::v4i64,
        MVT::v8f64, MVT::v8i64, MVT::v16f64, MVT::v16i64},
       Custom);
[2,394 lines not shown] SDValue AMDGPUTargetLowering::LowerFP_TO_INT(SDValue Op,
   return SDValue();
 }

 SDValue AMDGPUTargetLowering::LowerSIGN_EXTEND_INREG(SDValue Op,
                                                      SelectionDAG &DAG) const {
   EVT ExtraVT = cast<VTSDNode>(Op.getOperand(1))->getVT();
   MVT VT = Op.getSimpleValueType();
   MVT ScalarVT = VT.getScalarType();

   assert(VT.isVector());

   SDValue Src = Op.getOperand(0);
   SDLoc DL(Op);

   // TODO: Don't scalarize on Evergreen?
   unsigned NElts = VT.getVectorNumElements();
   SmallVector<SDValue, 8> Args;
[2,099 lines not shown]

llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td

[123 lines not shown]
 def AMDGPUldexp_impl : SDNode<"AMDGPUISD::LDEXP", AMDGPULdExpOp>;

 def AMDGPUpkrtz_f16_f32_impl : SDNode<"AMDGPUISD::CVT_PKRTZ_F16_F32", AMDGPUFPPackOp>;
 def AMDGPUpknorm_i16_f32_impl : SDNode<"AMDGPUISD::CVT_PKNORM_I16_F32", AMDGPUFPPackOp>;
 def AMDGPUpknorm_u16_f32_impl : SDNode<"AMDGPUISD::CVT_PKNORM_U16_F32", AMDGPUFPPackOp>;
 def AMDGPUpk_i16_i32_impl : SDNode<"AMDGPUISD::CVT_PK_I16_I32", AMDGPUIntPackOp>;
 def AMDGPUpk_u16_u32_impl : SDNode<"AMDGPUISD::CVT_PK_U16_U32", AMDGPUIntPackOp>;
 def AMDGPUfp_to_f16 : SDNode<"AMDGPUISD::FP_TO_FP16" , SDTFPToIntOp>;

+def AMDGPUis_fpclass_impl : SDNode<"ISD::IS_FPCLASS", AMDGPUFPClassOp>;
+
 def AMDGPUfp_class_impl : SDNode<"AMDGPUISD::FP_CLASS", AMDGPUFPClassOp>;

 // out = max(a, b) a and b are floats, where a nan comparison fails.
 // This is not commutative because this gives the second operand:
 // x < nan ? x : nan -> nan
 // nan < x ? nan : x -> x
 def AMDGPUfmax_legacy : SDNode<"AMDGPUISD::FMAX_LEGACY", SDTFPBinOp,
   []
[241 lines not shown] def AMDGPUfract : PatFrags<(ops node:$src), [(int_amdgcn_fract node:$src),
   (AMDGPUfract_impl node:$src)]>;

 def AMDGPUldexp : PatFrags<(ops node:$src0, node:$src1),
   [(int_amdgcn_ldexp node:$src0, node:$src1),
    (AMDGPUldexp_impl node:$src0, node:$src1)]>;

 def AMDGPUfp_class : PatFrags<(ops node:$src0, node:$src1),
   [(int_amdgcn_class node:$src0, node:$src1),
-   (AMDGPUfp_class_impl node:$src0, node:$src1)]>;
+   (AMDGPUfp_class_impl node:$src0, node:$src1),
+   (AMDGPUis_fpclass_impl node:$src0, node:$src1)]>;

 def AMDGPUfmed3 : PatFrags<(ops node:$src0, node:$src1, node:$src2),
   [(int_amdgcn_fmed3 node:$src0, node:$src1, node:$src2),
    (AMDGPUfmed3_impl node:$src0, node:$src1, node:$src2)]>;

 def AMDGPUdiv_fixup : PatFrags<(ops node:$src0, node:$src1, node:$src2),
   [(int_amdgcn_div_fixup node:$src0, node:$src1, node:$src2),
    (AMDGPUdiv_fixup_impl node:$src0, node:$src1, node:$src2)]>;
[76 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -global-isel=0 -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck --check-prefix=SELDAG %s

				define i1 @isnan_half(half %x) nounwind {
				; SELDAG-LABEL: isnan_half:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_cmp_class_f16_e64 s[4:5], v0, 3
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[4:5]
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f16(half %x, i32 3) ; nan
				ret i1 %1
				}

				define i1 @isnan_float(float %x) nounwind {
				; SELDAG-LABEL: isnan_float:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_cmp_class_f32_e64 s[4:5], v0, 3
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[4:5]
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f32(float %x, i32 3) ; nan
				ret i1 %1
				}

				define i1 @isnan_double(double %x) nounwind {
				; SELDAG-LABEL: isnan_double:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_cmp_class_f64_e64 s[4:5], v[0:1], 3
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[4:5]
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f64(double %x, i32 3) ; nan
				ret i1 %1
				}

				define i1 @isnan_half_strictfp(half %x) strictfp nounwind {
				; SELDAG-LABEL: isnan_half_strictfp:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_cmp_class_f16_e64 s[4:5], v0, 3
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[4:5]
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f16(half %x, i32 3) ; nan
				ret i1 %1
				}

				define i1 @isnan_float_strictfp(float %x) strictfp nounwind {
				; SELDAG-LABEL: isnan_float_strictfp:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_cmp_class_f32_e64 s[4:5], v0, 3
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[4:5]
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f32(float %x, i32 3) ; nan
				ret i1 %1
				}

				define i1 @isnan_double_strictfp(double %x) strictfp nounwind {
				; SELDAG-LABEL: isnan_double_strictfp:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_cmp_class_f64_e64 s[4:5], v[0:1], 3
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[4:5]
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f64(double %x, i32 3) ; nan
				ret i1 %1
				}

				define i1 @isinf_half(half %x) nounwind {
				; SELDAG-LABEL: isinf_half:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v1, 0x204
				; SELDAG-NEXT: v_cmp_class_f16_e32 vcc, v0, v1
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f16(half %x, i32 516) ; 0x204 = "inf"
				ret i1 %1
				}

				define i1 @isinf_float(float %x) nounwind {
				; SELDAG-LABEL: isinf_float:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v1, 0x204
				; SELDAG-NEXT: v_cmp_class_f32_e32 vcc, v0, v1
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f32(float %x, i32 516) ; 0x204 = "inf"
				ret i1 %1
				}

				define i1 @isinf_double(double %x) nounwind {
				; SELDAG-LABEL: isinf_double:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v2, 0x204
				; SELDAG-NEXT: v_cmp_class_f64_e32 vcc, v[0:1], v2
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f64(double %x, i32 516) ; 0x204 = "inf"
				ret i1 %1
				}

				define i1 @isfinite_half(half %x) nounwind {
				; SELDAG-LABEL: isfinite_half:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v1, 0x1f8
				; SELDAG-NEXT: v_cmp_class_f16_e32 vcc, v0, v1
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f16(half %x, i32 504) ; 0x1f8 = "finite"
				ret i1 %1
				}

				define i1 @isfinite_float(float %x) nounwind {
				; SELDAG-LABEL: isfinite_float:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v1, 0x1f8
				; SELDAG-NEXT: v_cmp_class_f32_e32 vcc, v0, v1
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f32(float %x, i32 504) ; 0x1f8 = "finite"
				ret i1 %1
				}

				define i1 @isfinite_double(double %x) nounwind {
				; SELDAG-LABEL: isfinite_double:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v2, 0x1f8
				; SELDAG-NEXT: v_cmp_class_f64_e32 vcc, v[0:1], v2
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f64(double %x, i32 504) ; 0x1f8 = "finite"
				ret i1 %1
				}

				define i1 @isnormal_float(float %x) nounwind {
				; SELDAG-LABEL: isnormal_float:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v1, 0x108
				; SELDAG-NEXT: v_cmp_class_f32_e32 vcc, v0, v1
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f32(float %x, i32 264) ; 0x108 = "normal"
				ret i1 %1
				}

				define i1 @issubnormal_float(float %x) nounwind {
				; SELDAG-LABEL: issubnormal_float:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v1, 0x90
				; SELDAG-NEXT: v_cmp_class_f32_e32 vcc, v0, v1
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f32(float %x, i32 144) ; 0x90 = "subnormal"
				ret i1 %1
				}

				define i1 @iszero_float(float %x) nounwind {
				; SELDAG-LABEL: iszero_float:
				; SELDAG: ; %bb.0:
				; SELDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; SELDAG-NEXT: v_mov_b32_e32 v1, 0x60
				; SELDAG-NEXT: v_cmp_class_f32_e32 vcc, v0, v1
				; SELDAG-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; SELDAG-NEXT: s_setpc_b64 s[30:31]
				%1 = call i1 @llvm.is.fpclass.f32(float %x, i32 96) ; 0x60 = "zero"
				ret i1 %1
				}

				declare i1 @llvm.is.fpclass.f32(float, i32)
				declare i1 @llvm.is.fpclass.f16(half, i32)
				declare i1 @llvm.is.fpclass.f64(double, i32)
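
The mask immediates used in the tests above (3, 516, 504, 264, 144, 96) are combinations of the ten class bits that the LangRef defines for `llvm.is.fpclass`. The sketch below spells them out. The enumerator and constant names are illustrative and not taken from LLVM headers.

```cpp
#include <cstdint>

// Bit positions of the llvm.is.fpclass test mask, as listed in the LangRef.
// Enumerator names are made up for this example.
enum FPClassBit : uint32_t {
  SNan         = 1u << 0,
  QNan         = 1u << 1,
  NegInf       = 1u << 2,
  NegNormal    = 1u << 3,
  NegSubnormal = 1u << 4,
  NegZero      = 1u << 5,
  PosZero      = 1u << 6,
  PosSubnormal = 1u << 7,
  PosNormal    = 1u << 8,
  PosInf       = 1u << 9,
};

constexpr uint32_t MaskNan = SNan | QNan;
constexpr uint32_t MaskInf = NegInf | PosInf;
constexpr uint32_t MaskNormal = NegNormal | PosNormal;
constexpr uint32_t MaskSubnormal = NegSubnormal | PosSubnormal;
constexpr uint32_t MaskZero = NegZero | PosZero;
constexpr uint32_t MaskFinite = MaskNormal | MaskSubnormal | MaskZero;

// These match the immediates in the test functions above.
static_assert(MaskNan == 3, "isnan_* tests");
static_assert(MaskInf == 0x204, "isinf_* tests (516)");
static_assert(MaskFinite == 0x1f8, "isfinite_* tests (504)");
static_assert(MaskNormal == 0x108, "isnormal_float test (264)");
static_assert(MaskSubnormal == 0x90, "issubnormal_float test (144)");
static_assert(MaskZero == 0x60, "iszero_float test (96)");
```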
