This is an archive of the discontinued LLVM Phabricator instance.

Differential D110579

[AMDGPU] Add a new intrinsic to control fp_trunc rounding mode
Closed · Public

Authored by jpages on Sep 27 2021, 1:17 PM.

Details

Reviewers

foad
andrew.w.kaylor
spatel
sepavloff
kpn
mibintc

Commits

rGdcb2da13f16e: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode

Summary

Add a new intrinsic to precisely control the rounding mode when converting
from f32 to f16.

Diff Detail

Unit Tests: Failed

	Time	Test
	60,070 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
	60,080 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlseg.c
	60,050 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlsegff.c
	60,080 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
	60,070 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vsoxseg.c
		View Full Test Results (13 Failed)

Event Timeline

jpages created this revision. Sep 27 2021, 1:17 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others. Sep 27 2021, 1:17 PM

jpages requested review of this revision. Sep 27 2021, 1:17 PM

Herald added a project: Restricted Project. Sep 27 2021, 1:17 PM

Herald added subscribers: llvm-commits, jdoerfert, wdng.

Harbormaster completed remote builds in B125957: Diff 375382. Sep 27 2021, 2:18 PM

dstuttard added inline comments. Sep 28 2021, 1:29 AM

llvm/test/CodeGen/AMDGPU/llvm.experimental.fptrunc.round.ll
38 ↗	(On Diff #375382)	Is there a reason why we couldn't make this a single s_round_mode 0x8 rather than going via s_round_mode 0x0?

I'm not sure what level of approval is needed for new experimental intrinsics. It might be worth emailing llvmdev to explain the requirement. In any case they need to be documented in docs/LangRef.rst.

You're using EmitInstrWithCustomInserter to emit s_round_mode instructions, and then adding a new phase to SIModeRegister to remove them. This seems wrong. You should be able to teach SIModeRegister to insert the required s_round_mode instructions.

llvm/include/llvm/IR/Intrinsics.td
905	Needs a comment.
908	Needs a comment.
llvm/lib/Target/AMDGPU/SIInstructions.td
182–183	Can you implement instruction selection for the intrinsics by adding patterns here, instead of writing C++ code?

Rebased based on previous comments.

The pseudo-instructions are now selected in the pass SIModeRegister.

jpages marked 4 inline comments as done. Oct 4 2021, 1:17 PM

Harbormaster completed remote builds in B126918: Diff 377017. Oct 4 2021, 2:48 PM

foad added inline comments. Oct 6 2021, 6:00 AM

llvm/lib/Target/AMDGPU/SIModeRegister.cpp
428	You shouldn't need to add a new pass here to insert s_round_mode instructions. The SIModeRegister pass already knows how to do that. You should just be able to add your new PSEUDOs to the switch in getInstructionMode.

Rebased based on Jay's comments.

The consequence of this change is the use of setreg instead of s_round_mode in the codegen. But it's probably better to not reinvent the wheel as this pass is already optimized to not insert too many setreg.

jpages marked an inline comment as done. Oct 8 2021, 9:49 AM

Harbormaster completed remote builds in B127814: Diff 378283. Oct 8 2021, 10:19 AM

In D110579#3051790, @jpages wrote:

The consequence of this change is the use of setreg instead of s_round_mode in the codegen. But it's probably better to not reinvent the wheel as this pass is already optimized to not insert too many setreg.

Good point. s_round_mode/s_denorm_mode are new in GFX10, so they did not exist when this pass was written. Do you think the pass could be improved to emit s_round_mode/s_denorm_mode instead of s_setreg whenever it only needs to change the rounding/denormal bits of the mode register? That could be a separate patch.

foad added inline comments. Oct 11 2021, 1:47 AM

llvm/lib/Target/AMDGPU/SIModeRegister.cpp
439	If you arrange for the pseudo to have exactly the same operands as the real instruction, then you don't need to build or delete instructions, all you need to do is change the opcode, which you can do with MI.setDesc().
442	I'm not sure this needs another pass over all instructions in the function. Would it be cleaner to do it in phase 1? Maybe even inside getInstructionMode???

arsenm added inline comments. Oct 19 2021, 3:04 PM

llvm/include/llvm/IR/Intrinsics.td
914	Would it be better to have a metadata argument for the rounding mode like the constrained intrinsics? The verifier could disallow the unknown mode type
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4493–4495	The backend should not directly consume generic intrinsics. These need to be routed through a generic opcode which is subject to normal legalization

jpages added inline comments. Oct 20 2021, 12:12 PM

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4493–4495	What is the reasoning behind this requirement? I guess I would need something like experimental_fptrunc_round_upward -> amdgpu_experimental_fptrunc_round_upward -> codegen?
llvm/lib/Target/AMDGPU/SIModeRegister.cpp
439	I think this could work. I tried to "replace" my pseudo-instructions in Phase1 but it's not a good idea to add/delete instructions in the middle of an iteration on the basic block.
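
To illustrate the in-place opcode swap discussed for line 439, here is a minimal sketch using toy types. `ToyInstr`, `Opcode`, `kCvtOperandCount`, and `lowerPseudosInPlace` are hypothetical names invented for this example; they are not LLVM's `MachineInstr` API. The point is only that when the pseudo carries the same operands as the real instruction, the rewrite changes one field and never inserts or erases elements, so iterating over the block stays safe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT LLVM classes.
enum class Opcode { FPTruncUpwardPseudo, FPTruncDownwardPseudo, CvtF16F32, Other };

struct ToyInstr {
  Opcode op;
  std::vector<std::string> operands;
};

// Assumed operand layout shared by the pseudos and the real convert: (dst, src).
constexpr std::size_t kCvtOperandCount = 2;

// Rewrites every pseudo into the real convert by changing only its opcode,
// which is the toy analogue of calling MI.setDesc() on a MachineInstr.
// Returns the number of instructions rewritten.
int lowerPseudosInPlace(std::vector<ToyInstr> &block) {
  int rewritten = 0;
  for (ToyInstr &instr : block) {
    const bool isPseudo = instr.op == Opcode::FPTruncUpwardPseudo ||
                          instr.op == Opcode::FPTruncDownwardPseudo;
    if (!isPseudo)
      continue;
    // The swap is only valid if the pseudo mirrors the real operand list.
    assert(instr.operands.size() == kCvtOperandCount);
    instr.op = Opcode::CvtF16F32;
    ++rewritten;
  }
  return rewritten;
}
```

Because the container is never resized, this sidesteps the concern about adding or deleting instructions in the middle of an iteration over the basic block.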

In D110579#3054424, @foad wrote:

In D110579#3051790, @jpages wrote:

The consequence of this change is the use of setreg instead of s_round_mode in the codegen. But it's probably better to not reinvent the wheel as this pass is already optimized to not insert too many setreg.

Good point. s_round_mode/s_denorm_mode are new in GFX10, so they did not exist when this pass was written. Do you think the pass could be improved to emit s_round_mode/s_denorm_mode instead of s_setreg whenever it only needs to change the rounding/denormal bits of the mode register? That could be a separate patch.

Sure, that's a good suggestion. I'll try to add this in a different patch.

arsenm added inline comments. Oct 20 2021, 12:38 PM

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4493–4495	No, you would just need G_FPTRUNC_ROUND_UPWARD etc. opcodes. We don't have a way to define legalization rules on intrinsics, so if you need type legalization the backend would have to take care of it manually. e.g. we would have to manually scalarize the vector overloads of these intrinsics. With the opcode you can just define legalization rules like normal

Rebased.

Added G_FPTRUNC_ROUND_UPWARD/G_FPTRUNC_ROUND_DOWNWARD opcodes between the intrinsics and the pseudo instructions. Added a custom ISD node for the DAG version.

It may be possible to have a simpler implementation for intrinsics -> G_FPTRUNC_ROUND_UPWARD -> pseudos, I'm open to suggestions.

Arranged the pseudos to have exactly the same operands as the v_cvt instruction like Jay suggested.

Harbormaster completed remote builds in B137011: Diff 391139. Dec 1 2021, 2:56 PM

arsenm added reviewers: andrew.w.kaylor, spatel, sepavloff, kpn, mibintc. Dec 1 2021, 3:15 PM

sepavloff added inline comments. Dec 1 2021, 9:27 PM

llvm/include/llvm/IR/Intrinsics.td
910	What is the difference between this function and `llvm.ceil`? Can the latter be used instead of this one?
914	What is the difference between this function and `llvm.floor`? There are also functions `llvm.nearbyint` and `llvm.rint` that can be used to make rounding with any mode.

sepavloff added inline comments. Dec 1 2021, 9:32 PM

llvm/include/llvm/IR/Intrinsics.td
914	Oh, I see, this is conversion between floats. There is `llvm.experimental.constrained.fptrunc`, which seems to do the same thing?

jpages added inline comments. Dec 2 2021, 9:35 AM

llvm/include/llvm/IR/Intrinsics.td
914	Yes, this is indeed a conversion between floats. The idea of this change is to introduce something simpler than constrained intrinsics in a special case. According to https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics "If any FP operation in a function is constrained then they all must be constrained". We wanted to do a simpler conversion from f32 -> f16 with a special rounding mode, but without constraining the rest of the function after this operation. Introducing an intrinsic that would be lowered to 3 instructions seemed to be the best solution. These 3 instructions are: (1) setting a special rounding mode, (2) the f32 -> f16 conversion, and (3) restoring the rounding mode to the default value.
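
The set/convert/restore sequence described above can be sketched on the host with the C++ `<cfenv>` facilities. This is only an analogy under stated assumptions: it narrows `double` to `float` because standard C++ has no portable f16 type, `narrowWithRounding` is a hypothetical helper name, and it has nothing to do with how the AMDGPU backend emits `s_setreg` or `s_round_mode`. The `volatile` temporaries are there to discourage the compiler from folding the conversion at compile time.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: narrow `value` to float under rounding mode `mode`
// (e.g. FE_UPWARD or FE_DOWNWARD), then put the previous mode back.
float narrowWithRounding(double value, int mode) {
  const int saved = std::fegetround();

  // Step 1: set the special rounding mode.
  const int setResult = std::fesetround(mode);
  assert(setResult == 0);
  (void)setResult;

  // Step 2: the narrowing conversion, performed at run time.
  volatile double input = value;
  volatile float narrowed = static_cast<float>(input);

  // Step 3: restore the rounding mode that was active before the call.
  const int restoreResult = std::fesetround(saved);
  assert(restoreResult == 0);
  (void)restoreResult;

  return narrowed;
}
```

Calling `narrowWithRounding(0.1, FE_UPWARD)` and `narrowWithRounding(0.1, FE_DOWNWARD)` should give the two `float` neighbours of 0.1, and the caller's rounding mode is unchanged afterwards, which is the property the intrinsic is meant to provide without constraining the rest of the function.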

sepavloff added inline comments. Dec 2 2021, 7:02 PM

llvm/include/llvm/IR/Intrinsics.td
914	This would be a useful function that solves a widespread task. On some platforms, like RISC-V, it can even be lowered into a single instruction. As this is a common interface, I would propose making the rounding mode an argument. That would allow a target to implement the conversion with any supported mode. Besides, it would allow future extensions without changing the interface, for example, to allow other control modes: llvm.fptrunc.round(%a, !"round.towardzero,denorm.ftz"). What prevents it from being a regular intrinsic rather than an experimental one?

Rebased.

I changed the patch to have only one intrinsic with a rounding mode parameter as suggested. This required many modifications in the frontend parts of LLVM.

Thanks for the comments!

Harbormaster completed remote builds in B142777: Diff 399125. Jan 11 2022, 4:45 PM

craig.topper added a subscriber: craig.topper. Jan 11 2022, 4:57 PM

craig.topper added inline comments.

llvm/include/llvm/IR/Intrinsics.td
907	Align the brackets with the previous line.
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2261	Use cast instead of dyn_cast if it can't fail. Or check that the dyn_cast didn't return null if it can fail.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6337	Why would this intrinsic ever have a chain?
6349	Use cast instead of dyn_cast if it can't fail. Or check that the dyn_cast didn't return null if it can fail.
6384	The code has optionally added a chain, but INTRINSIC_W_CHAIN must always have a chain. Whether there is a chain or not is a property of the ISD opcode.

craig.topper added inline comments. Jan 11 2022, 5:11 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6337	Err. I guess the intrinsic attributes are defined to always have a Chain. Should it be IntrNoMem instead?

foad added inline comments. Jan 12 2022, 1:19 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6337	IIUC, if the rounding mode is allowed to be "dynamic" then the intrinsic needs to be IntrInaccessibleMemOnly (just like the constrained fp intrinsics). If we don't allow "dynamic" then it could be IntrNoMem. Personally I have no interest in supporting "dynamic" so I would vote for disallowing it and making the intrinsic IntrNoMem.

Rebased.

The intrinsic has the IntrNoMem attribute, and is represented by INTRINSIC_WO_CHAIN in the DAG.

jpages marked 4 inline comments as done. Jan 13 2022, 1:42 PM

jpages added inline comments. Jan 13 2022, 1:46 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6337	Yes, the original version was defined to have a chain; I changed it to IntrNoMem/INTRINSIC_WO_CHAIN as suggested. Thank you for the suggestions. As a consequence this code is a bit simpler, and the lowering in the backend was modified to reflect that.

Harbormaster completed remote builds in B143235: Diff 399768. Jan 13 2022, 2:48 PM

It seems like people are mostly happy with the design now, so I am being a bit more picky with my review comments!

llvm/docs/LangRef.rst
23858	Add "... with a specified rounding mode". Apart from that I think most of the Overview/Arguments/Semantics text could be copied verbatim from the definition of the fptrunc-to instruction?
23868	Need to disallow "dynamic" if that's what we decided.
23874	Probably need to add the text from fptrunc-to: "This instruction is assumed to execute in the default floating-point environment" (because of the stuff about exceptions) and then say except for the rounding mode.
llvm/include/llvm/IR/Intrinsics.td
905	"Truncate"
908	Add IntrWillReturn.
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2257	VRegs isn't used for anything so remove it.
2269	We know it's not void, and has exactly one result, so simplify this.
2273	We know it does not access memory.
llvm/lib/IR/Verifier.cpp
4748	MD could be nullptr here, so you need to handle that.
4750	Check that the rounding mode is not "dynamic"?
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
698 ↗	(On Diff #399125)	Can this be done with patterns in the tablegen files instead of C++ code? Then they should work for both SelectionDAG and GlobalISel.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3252 ↗	(On Diff #399125)	Can this be done with patterns in the tablegen files instead of C++ code?
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4939	Should this be "return false" to indicate a legalization failure in the normal way?
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4600	sgpr seems wrong. This is specifying the mapping for the result value and the input value (not the rounding mode which is no longer an operand). I think this should just use the same mapping as G_FPTRUNC (line 3640).
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
6844	I'm not quite sure but I think you should return SDValue() here to indicate a lowering/legalization failure?
llvm/lib/Target/AMDGPU/SIISelLowering.h
134 ↗	(On Diff #399768)	No longer needed.

I think I'd like to see a target independent ISD opcode added for this. We don't use ISD::INTRINSIC_WO_CHAIN for target independent intrinsics.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6356	This is always false. And if it isn't you would only push one argument.
6361	I'd probably just use TLI.getPointerTy instead of MVT::i8.
6377	I know this was taken from visitTargetIntrinsic, but I don't really understand it.
6381	I don't think this is needed. AssertZExt isn't useful for an FP result.
6383	I don't think this alignment stuff is needed. The return type is FP so it probably doesn't do anything.

jpages updated this revision to Diff 404732. Jan 31 2022, 2:21 PM

jpages retitled this revision from [AMDGPU] Add two new intrinsics to control fp_trunc rounding mode to [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Herald added a subscriber: dexonsmith. Jan 31 2022, 2:21 PM

Rebased. As suggested, I added a target-independent ISD opcode.

Thanks again for all the comments!

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6377	I removed this line, I'm not sure of what it was doing either.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3252 ↗	(On Diff #399125)	Maybe it's possible, but I could not find a way to match both the G_FPTRUNC_ROUND_UPWARD and the SelectionDAG version, so I left this one in C++.
llvm/lib/Target/AMDGPU/SIInstructions.td
182–183	As I said previously, I tried to use pure tablegen for both SelectionDAG and GlobalISel and could not find a way to make it work for both.

Harbormaster completed remote builds in B146773: Diff 404732. Jan 31 2022, 4:58 PM

foad added inline comments. Feb 1 2022, 3:13 AM

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

3252 ↗	(On Diff #399125)	You can get some clues about GlobalISel pattern problems by running tablegen with -warn-on-skipped-patterns like this:

$ llvm-tblgen -gen-global-isel -I lib/Target/AMDGPU -I build/include -I include -I lib/Target lib/Target/AMDGPU/AMDGPUGISel.td -warn-on-skipped-patterns
...
lib/Target/AMDGPU/SIInstructions.td:186:1: warning: Skipped pattern: Pattern operator lacks an equivalent Instruction (AMDGPUISD::FPTRUNC_ROUND_UPWARD)
def FPTRUNC_UPWARD_PSEUDO : VPseudoInstSI <(outs VGPR_32:$vdst),
^
...
lib/Target/AMDGPU/SIInstructions.td:190:1: warning: Skipped pattern: Pattern operator lacks an equivalent Instruction (AMDGPUISD::FPTRUNC_ROUND_DOWNWARD)
def FPTRUNC_DOWNWARD_PSEUDO : VPseudoInstSI <(outs VGPR_32:$vdst),
^

This means you need to add some GINodeEquiv lines in AMDGPUGISel.td.

llvm/lib/Target/AMDGPU/SIInstructions.td

188	Just end the line with ; if there is nothing to go inside the {}.

Here's what I came up with to get the globalisel pattern matching working, plus a few other suggested tweaks: https://reviews.llvm.org/differential/diff/404879/

Rebased following Jay's comments.

jpages marked 2 inline comments as done. Feb 1 2022, 8:33 AM

jpages added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3252 ↗	(On Diff #399125)	Thanks again, I didn't know about this command

Harbormaster completed remote builds in B146914: Diff 404944. Feb 1 2022, 10:13 AM

Updated a failing test on AArch64.

craig.topper added inline comments. Feb 3 2022, 3:47 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6355	You probably don't need ComputeValueVTs. There shouldn't be any structs or arrays here that need to be broken down. I think you can do `EVT VT = TLI.getValueType(I.getType)` and then use `VT` in place of `VTs`.

Harbormaster completed remote builds in B147522: Diff 405819. Feb 3 2022, 5:35 PM

jpages updated this revision to Diff 405846. Feb 3 2022, 6:11 PM

jpages marked 3 inline comments as done.

jpages added inline comments.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6355	Indeed it works, thanks!

Harbormaster completed remote builds in B147543: Diff 405846. Feb 3 2022, 7:45 PM

This is looking pretty good, just another round of minor comments.

llvm/docs/LangRef.rst
23870	This seems to be repeating the "must be larger" text from a few lines above.
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2270	`return true`.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6337	You don't really need a SmallVector here, since you know you have exactly two operands, you could just pass them as two separate arguments to the DAG.getNode call. But it's up to you.
6341	Shouldn't this be getArgOperand(1)? How does it work?
6345	This isn't used.
6346	TLI is already available in this function.
6359	All intrinsics returning floating point type are classed as FPMathOperators, so you can do this part unconditionally.
6364	Use sdl here too.
llvm/lib/IR/Verifier.cpp
4748	I am not an expert on verifying metadata but I think you probably need to check that it is an MDString here, otherwise the cast<MDString> below could fail? And really we ought to have some tests in test/Verifier/llvm.fptrunc.round.ll.
4752	I am not sure but I think `RoundMode != RoundingMode::Dynamic` needs to be `RoundMode.getValue() != ...` (or `*RoundMode != ...`).
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
4662	Do you also need to check that the operand is MVT::f32? Maybe add a test where the operand is f64, and check that it fails to legalize?
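
The verifier points raised for lines 4748 and 4752 (the operand may not be a metadata string, the string may not name a rounding mode, and "dynamic" must be rejected) can be sketched as a standalone check. This is a hedged illustration, not the code in Verifier.cpp: `ToyRoundingMode`, `parseRoundingMode`, `isAcceptableFPTruncRoundMode`, and `exampleChecks` are hypothetical names, and a missing or non-string operand is modelled as an empty `std::optional`. The mode strings follow the spellings LangRef uses for the constrained intrinsics.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Toy enum for illustration; it only mirrors the idea of a rounding-mode enum.
enum class ToyRoundingMode {
  TowardZero,
  NearestTiesToEven,
  TowardPositive,
  TowardNegative,
  NearestTiesToAway,
  Dynamic
};

// Maps a rounding-mode string to a mode, or nothing if it is unrecognised.
std::optional<ToyRoundingMode> parseRoundingMode(std::string_view text) {
  if (text == "round.dynamic") return ToyRoundingMode::Dynamic;
  if (text == "round.tonearest") return ToyRoundingMode::NearestTiesToEven;
  if (text == "round.tonearestaway") return ToyRoundingMode::NearestTiesToAway;
  if (text == "round.downward") return ToyRoundingMode::TowardNegative;
  if (text == "round.upward") return ToyRoundingMode::TowardPositive;
  if (text == "round.towardzero") return ToyRoundingMode::TowardZero;
  return std::nullopt;
}

// An empty optional models an operand that is absent or not a metadata string.
bool isAcceptableFPTruncRoundMode(std::optional<std::string_view> metadataString) {
  if (!metadataString)
    return false;
  const std::optional<ToyRoundingMode> mode = parseRoundingMode(*metadataString);
  // Unwrap the optional before comparing, as suggested in the review.
  return mode.has_value() && *mode != ToyRoundingMode::Dynamic;
}

// Small usage example.
void exampleChecks() {
  assert(isAcceptableFPTruncRoundMode(std::string_view("round.upward")));
  assert(!isAcceptableFPTruncRoundMode(std::string_view("round.dynamic")));
}
```

Checking "is it a string at all" before parsing is what keeps the later unwrapping safe, which is the ordering the comment on line 4748 asks for.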

Rebased following Jay's comments.

Added some tests for the verifier/legalizer.

jpages marked 12 inline comments as done. Feb 7 2022, 9:00 AM

Harbormaster completed remote builds in B148003: Diff 406482. Feb 7 2022, 10:26 AM

I think this looks good now. Are you happy with it @craig.topper @sepavloff ?

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2258	Should probably use getArgOperand here, though it doesn't make any difference except for bounds checking the index that you pass in.
llvm/lib/IR/Verifier.cpp
4750	`isa` is a bit more natural than `dyn_cast` here.
llvm/test/CodeGen/AMDGPU/fail.llvm.fptrunc.round.ll
2	Does this test work in a Release build? If not, you can add a "REQUIRES: asserts" line just after the RUN lines.

It looks good to me.

I had a look at generic changes only, someone with expertise in AMDGPU should also approve this patch.

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2268	Flags could be set in the call to `MIRBuilder.buildInstr`.

This revision is now accepted and ready to land. Feb 8 2022, 8:24 AM

craig.topper added inline comments. Feb 8 2022, 9:06 AM

llvm/docs/LangRef.rst
23879	Should we document that this isn't supported on all targets, and that a particular target may not support all rounding modes?

jpages updated this revision to Diff 406953. Feb 8 2022, 1:35 PM

jpages marked 5 inline comments as done. Feb 8 2022, 1:42 PM

jpages added inline comments.

llvm/docs/LangRef.rst
23879	Good idea, thanks! I added a comment about that
llvm/test/CodeGen/AMDGPU/fail.llvm.fptrunc.round.ll
2	I just checked and it's working as well in Release without that.

Harbormaster completed remote builds in B148346: Diff 406953. Feb 8 2022, 5:27 PM

foad accepted this revision. Feb 9 2022, 4:35 AM

foad added inline comments.

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2267	You can also do this on the end of the buildInstr line if you prefer, no need for MIB at all.

jpages updated this revision to Diff 407722. Feb 10 2022, 4:30 PM

jpages marked 2 inline comments as done.

Harbormaster completed remote builds in B148876: Diff 407722. Feb 10 2022, 5:30 PM

jpages updated this revision to Diff 407733. Feb 10 2022, 5:33 PM

Harbormaster completed remote builds in B148885: Diff 407733. Feb 10 2022, 6:15 PM

foad accepted this revision. Feb 11 2022, 12:11 AM

This revision was landed with ongoing or failed builds. Feb 11 2022, 9:09 AM

Closed by commit rGdcb2da13f16e: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode (authored by jpages).

This revision was automatically updated to reflect the committed changes.

jpages added a commit: rGdcb2da13f16e: [AMDGPU] Add a new intrinsic to control fp_trunc rounding mode.

Revision Contents

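Path	Size
llvm/docs/LangRef.rst	40 lines
llvm/include/llvm/CodeGen/ISDOpcodes.h	3 lines
llvm/include/llvm/IR/Intrinsics.td	6 lines
llvm/include/llvm/Support/TargetOpcodes.def	3 lines
llvm/include/llvm/Target/GenericOpcodes.td	6 lines
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp	17 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	23 lines
llvm/lib/IR/Verifier.cpp	21 lines
llvm/lib/Target/AMDGPU/AMDGPUGISel.td	3 lines
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h	3 lines
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp	2 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h	2 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp	28 lines
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp	3 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp	20 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.td	8 lines
llvm/lib/Target/AMDGPU/SIInstructions.td	28 lines
llvm/lib/Target/AMDGPU/SIModeRegister.cpp	16 lines
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir	4 lines
llvm/test/CodeGen/AMDGPU/fail.llvm.fptrunc.round.ll	11 lines
llvm/test/CodeGen/AMDGPU/llvm.fptrunc.round.ll	52 lines
llvm/test/Verifier/llvm.fptrunc.round.ll	13 lines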

Diff 407733

llvm/docs/LangRef.rst

 <attr_elementtype>` attribute at the call-site. This attribute specifies the
 getelementptr element type.

 Semantics:
 """"""""""

 The '``llvm.preserve.struct.access.index``' intrinsic produces the same result
 as a getelementptr with base ``base`` and access operands ``{0, gep_index}``.

+'``llvm.fptrunc.round``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare <ty2>
+      @llvm.fptrunc.round(<type> <value>, metadata <rounding mode>)
+
+Overview:
+"""""""""
+
+The '``llvm.fptrunc.round``' intrinsic truncates
+:ref:`floating-point <t_floating>` ``value`` to type ``ty2``
+with a specified rounding mode.
+
+Arguments:
+""""""""""
+
+The '``llvm.fptrunc.round``' intrinsic takes a :ref:`floating-point
+<t_floating>` value to cast and a :ref:`floating-point <t_floating>` type
+to cast it to. This argument must be larger in size than the result.
+
+The second argument specifies the rounding mode as described in the constrained
+intrinsics section.
+For this intrinsic, the "round.dynamic" mode is not supported.
+
+Semantics:
+""""""""""
+
+The '``llvm.fptrunc.round``' intrinsic casts a ``value`` from a larger
+:ref:`floating-point <t_floating>` type to a smaller :ref:`floating-point
+<t_floating>` type.
+This intrinsic is assumed to execute in the default :ref:`floating-point
+environment <floatenv>` except for the rounding mode.
+This intrinsic is not supported on all targets. Some targets may not support
+all rounding modes.

llvm/include/llvm/CodeGen/ISDOpcodes.h

@@ enum NodeType {

   /// STRICT_FSETCC/STRICT_FSETCCS - Constrained versions of SETCC, used
   /// for floating-point operands only. STRICT_FSETCC performs a quiet
   /// comparison operation, while STRICT_FSETCCS performs a signaling
   /// comparison operation.
   STRICT_FSETCC,
   STRICT_FSETCCS,

+  // FPTRUNC_ROUND - This corresponds to the fptrunc_round intrinsic.
+  FPTRUNC_ROUND,
+
   /// FMA - Perform a * b + c with no intermediate rounding step.
   FMA,

   /// FMAD - Perform a * b + c, while getting the same result as the
   /// separately rounded operations.
   FMAD,

   /// FCOPYSIGN(X, Y) - Return the value of X with the sign of Y. NOTE: This

llvm/include/llvm/IR/Intrinsics.td

Show First 20 Lines • Show All 895 Lines • ▼ Show 20 Lines	def int_experimental_constrained_fcmp
llvm_metadata_ty, llvm_metadata_ty ]>;		llvm_metadata_ty, llvm_metadata_ty ]>;
def int_experimental_constrained_fcmps		def int_experimental_constrained_fcmps
: DefaultAttrsIntrinsic<[ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty> ],		: DefaultAttrsIntrinsic<[ LLVMScalarOrSameVectorWidth<0, llvm_i1_ty> ],
[ llvm_anyfloat_ty, LLVMMatchType<0>,		[ llvm_anyfloat_ty, LLVMMatchType<0>,
llvm_metadata_ty, llvm_metadata_ty ]>;		llvm_metadata_ty, llvm_metadata_ty ]>;
}		}
// FIXME: Consider maybe adding intrinsics for sitofp, uitofp.		// FIXME: Consider maybe adding intrinsics for sitofp, uitofp.


		// Truncate a floating point number with a specific rounding mode
		foadUnsubmitted Done Reply Inline Actions Needs a comment. foad: Needs a comment.
		foadUnsubmitted Not Done Reply Inline Actions "Truncate" foad: "Truncate"
		def int_fptrunc_round : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ],
		[ llvm_anyfloat_ty, llvm_metadata_ty ],
		craig.topperUnsubmitted Done Reply Inline Actions Align the brackets with the previous line. craig.topper: Align the brackets with the previous line.
		[ IntrNoMem, IntrWillReturn ]>;
		foadUnsubmitted Done Reply Inline Actions Needs a comment. foad: Needs a comment.
		foadUnsubmitted Not Done Reply Inline Actions Add IntrWillReturn. foad: Add IntrWillReturn.

//===------------------------- Expect Intrinsics --------------------------===//		//===------------------------- Expect Intrinsics --------------------------===//
		sepavloffUnsubmitted Not Done Reply Inline Actions What is the difference between this function and `llvm.ceil`? Can the latter be used instead of this one? sepavloff: What is the difference between this function and `llvm.ceil`? Can the latter be used instead of…
//		//
def int_expect : DefaultAttrsIntrinsic<[llvm_anyint_ty],		def int_expect : DefaultAttrsIntrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, LLVMMatchType<0>], [IntrNoMem, IntrWillReturn]>;		[LLVMMatchType<0>, LLVMMatchType<0>], [IntrNoMem, IntrWillReturn]>;

		arsenmUnsubmitted Not Done Reply Inline Actions Would it be better to have a metadata argument for the rounding mode like the constrained intrinsics? The verifier could disallow the unknown mode type arsenm: Would it be better to have a metadata argument for the rounding mode like the constrained…
		sepavloffUnsubmitted Not Done Reply Inline Actions What is the difference between this function and `llvm.floor`? There are also functions `llvm.nearbyint` and `llvm.rint` that can be used to make rounding with any mode. sepavloff: What is the difference between this function and `llvm.floor`? There are also functions `llvm.
		sepavloffUnsubmitted Not Done Reply Inline Actions Oh, I see, this is conversion between floats. There is `llvm.experimental.constrained.fptrunc`, which seems to do the same thing? sepavloff: Oh, I see, this is conversion between floats. There is `llvm.experimental.constrained.fptrunc`…
		jpagesAuthorUnsubmitted Not Done Reply Inline Actions Yes, this is indeed a conversion between floats. The idea of this change is to introduce something simpler than constrained intrinsics in a special case. According to https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics "If any FP operation in a function is constrained then they all must be constrained". We wanted to do a simpler conversion from f32 -> f16 with a special rounding mode, but without constraining the rest of the function after this operation. Introducing an Intrinsic that would be lowered to 3 instructions seemed to be the best solution. These 3 instructions are: Setting a special rounding mode Conversion f32 -> f16 Restoring the rounding mode to the default value jpages: Yes, this is indeed a conversion between floats. The idea of this change is to introduce…
		sepavloffUnsubmitted Done Reply Inline Actions This would be a useful function, which solves a widespread task. On some platforms, like RISC-V, it even can be lowered into a single instruction. As this is common interface, I would propose you to make rounding mode an argument. It would allows a target to implement the conversion with any supported mode. Besides, it would allow future extensions without changing the interface, for example, to allow other control modes: lvm.fptrunc.round(%a, !"round.towardzero,denorm.ftz") What prevents from making it a regular intrinsic, not experimental? sepavloff: This would be a useful function, which solves a widespread task. On some platforms, like RISC-V…
def int_expect_with_probability : DefaultAttrsIntrinsic<[llvm_anyint_ty],		def int_expect_with_probability : DefaultAttrsIntrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_double_ty],		[LLVMMatchType<0>, LLVMMatchType<0>, llvm_double_ty],
[IntrNoMem, IntrWillReturn]>;		[IntrNoMem, IntrWillReturn]>;

//===-------------------- Bit Manipulation Intrinsics ---------------------===//		//===-------------------- Bit Manipulation Intrinsics ---------------------===//
//		//

// None of these intrinsics accesses memory at all.		// None of these intrinsics accesses memory at all.

llvm/include/llvm/Support/TargetOpcodes.def


	/// Generic bitcast. The source and destination types must be different, or a			/// Generic bitcast. The source and destination types must be different, or a
	/// COPY is the relevant instruction.			/// COPY is the relevant instruction.
	HANDLE_TARGET_OPCODE(G_BITCAST)			HANDLE_TARGET_OPCODE(G_BITCAST)

	/// Generic freeze.			/// Generic freeze.
	HANDLE_TARGET_OPCODE(G_FREEZE)			HANDLE_TARGET_OPCODE(G_FREEZE)

				// INTRINSIC fptrunc_round intrinsic.
				HANDLE_TARGET_OPCODE(G_INTRINSIC_FPTRUNC_ROUND)

	/// INTRINSIC trunc intrinsic.			/// INTRINSIC trunc intrinsic.
	HANDLE_TARGET_OPCODE(G_INTRINSIC_TRUNC)			HANDLE_TARGET_OPCODE(G_INTRINSIC_TRUNC)

	/// INTRINSIC round intrinsic.			/// INTRINSIC round intrinsic.
	HANDLE_TARGET_OPCODE(G_INTRINSIC_ROUND)			HANDLE_TARGET_OPCODE(G_INTRINSIC_ROUND)

	/// INTRINSIC round to integer intrinsic.			/// INTRINSIC round to integer intrinsic.
	HANDLE_TARGET_OPCODE(G_INTRINSIC_LRINT)			HANDLE_TARGET_OPCODE(G_INTRINSIC_LRINT)

llvm/include/llvm/Target/GenericOpcodes.td

Show First 20 Lines • Show All 959 Lines • ▼ Show 20 Lines	def G_FNEARBYINT : GenericInstruction {
let OutOperandList = (outs type0:$dst);		let OutOperandList = (outs type0:$dst);
let InOperandList = (ins type0:$src1);		let InOperandList = (ins type0:$src1);
let hasSideEffects = false;		let hasSideEffects = false;
}		}

//------------------------------------------------------------------------------		//------------------------------------------------------------------------------
// Opcodes for LLVM Intrinsics		// Opcodes for LLVM Intrinsics
//------------------------------------------------------------------------------		//------------------------------------------------------------------------------
		def G_INTRINSIC_FPTRUNC_ROUND : GenericInstruction {
		let OutOperandList = (outs type0:$dst);
		let InOperandList = (ins type1:$src1, i32imm:$round_mode);
		let hasSideEffects = false;
		}

def G_INTRINSIC_TRUNC : GenericInstruction {		def G_INTRINSIC_TRUNC : GenericInstruction {
let OutOperandList = (outs type0:$dst);		let OutOperandList = (outs type0:$dst);
let InOperandList = (ins type0:$src1);		let InOperandList = (ins type0:$src1);
let hasSideEffects = false;		let hasSideEffects = false;
}		}

def G_INTRINSIC_ROUND : GenericInstruction {		def G_INTRINSIC_ROUND : GenericInstruction {
let OutOperandList = (outs type0:$dst);		let OutOperandList = (outs type0:$dst);

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp

Show First 20 Lines • Show All 2,245 Lines • ▼ Show 20 Lines	if (ID == Intrinsic::ubsantrap) {
Info.OrigArgs.push_back({getOrCreateVRegs(*CI.getArgOperand(0)),		Info.OrigArgs.push_back({getOrCreateVRegs(*CI.getArgOperand(0)),
CI.getArgOperand(0)->getType(), 0});		CI.getArgOperand(0)->getType(), 0});
}		}
Info.Callee = MachineOperand::CreateES(TrapFuncName.data());		Info.Callee = MachineOperand::CreateES(TrapFuncName.data());
Info.CB = &CI;		Info.CB = &CI;
Info.OrigRet = {Register(), Type::getVoidTy(CI.getContext()), 0};		Info.OrigRet = {Register(), Type::getVoidTy(CI.getContext()), 0};
return CLI->lowerCall(MIRBuilder, Info);		return CLI->lowerCall(MIRBuilder, Info);
}		}
		case Intrinsic::fptrunc_round: {
		unsigned Flags = MachineInstr::copyFlagsFromInstruction(CI);

		// Convert the metadata argument to a constant integer
		foad (Done): VRegs isn't used for anything so remove it.
		Metadata *MD = cast<MetadataAsValue>(CI.getArgOperand(1))->getMetadata();
		foad (Done): Should probably use getArgOperand here, though it doesn't make any difference except for bounds checking the index that you pass in.
		Optional<RoundingMode> RoundMode =
		convertStrToRoundingMode(cast<MDString>(MD)->getString());

		craig.topper (Done): Use cast instead of dyn_cast if it can't fail. Or check that the dyn_cast didn't return null if it can fail.
		// Add the Rounding mode as an integer
		MIRBuilder
		.buildInstr(TargetOpcode::G_INTRINSIC_FPTRUNC_ROUND,
		{getOrCreateVReg(CI)},
		{getOrCreateVReg(*CI.getArgOperand(0))}, Flags)
		.addImm((int)RoundMode.getValue());
		foad (Not Done): You can also do this on the end of the buildInstr line if you prefer, no need for MIB at all.

		sepavloff (Done): Flags could be set in the call to `MIRBuilder.buildInstr`.
		return true;
		foad (Done): We know it's not void, and has exactly one result, so simplify this.
		}
		foad (Done): `return true`.
#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC) \		#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC) \
case Intrinsic::INTRINSIC:		case Intrinsic::INTRINSIC:
#include "llvm/IR/ConstrainedOps.def"		#include "llvm/IR/ConstrainedOps.def"
		foad (Done): We know it does not access memory.
return translateConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(CI),		return translateConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(CI),
MIRBuilder);		MIRBuilder);

}		}
return false;		return false;
}		}

bool IRTranslator::translateInlineAsm(const CallBase &CB,		bool IRTranslator::translateInlineAsm(const CallBase &CB,

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


Show First 20 Lines • Show All 6,326 Lines • ▼ Show 20 Lines	#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC) \
case Intrinsic::INTRINSIC:		case Intrinsic::INTRINSIC:
#include "llvm/IR/ConstrainedOps.def"		#include "llvm/IR/ConstrainedOps.def"
visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(I));		visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(I));
return;		return;
#define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) case Intrinsic::VPID:		#define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) case Intrinsic::VPID:
#include "llvm/IR/VPIntrinsics.def"		#include "llvm/IR/VPIntrinsics.def"
visitVectorPredicationIntrinsic(cast<VPIntrinsic>(I));		visitVectorPredicationIntrinsic(cast<VPIntrinsic>(I));
return;		return;
		case Intrinsic::fptrunc_round: {
		// Get the last argument, the metadata and convert it to an integer in the
		// call
		craig.topper (Not Done): Why would this intrinsic ever have a chain?
		craig.topper (Not Done): Err. I guess the intrinsic attributes are defined to always have a Chain. Should it be IntrNoMem instead?
		foad (Not Done): IIUC, if the rounding mode is allowed to be "dynamic" then the intrinsic needs to be IntrInaccessibleMemOnly (just like the constrained fp intrinsics). If we don't allow "dynamic" then it could be IntrNoMem. Personally I have no interest in supporting "dynamic" so I would vote for disallowing it and making the intrinsic IntrNoMem.
		jpages (Author, Done): Yes, the original version was defined to have a chain; I changed it to IntrNoMem/INTRINSIC_WO_CHAIN as suggested. Thank you for the suggestions. As a consequence this code is a bit simpler, and the lowering in the backend was modified to reflect that.
		foad (Done): You don't really need a SmallVector here, since you know you have exactly two operands, you could just pass them as two separate arguments to the DAG.getNode call. But it's up to you.
		Metadata *MD = cast<MetadataAsValue>(I.getArgOperand(1))->getMetadata();
		Optional<RoundingMode> RoundMode =
		convertStrToRoundingMode(cast<MDString>(MD)->getString());

		foad (Done): Shouldn't this be getArgOperand(1)? How does it work?
		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());

		// Propagate fast-math-flags from IR to node(s).
		SDNodeFlags Flags;
		foad (Done): This isn't used.
		Flags.copyFMF(*cast<FPMathOperator>(&I));
		foad (Done): TLI is already available in this function.
		SelectionDAG::FlagInserter FlagsInserter(DAG, Flags);

		SDValue Result;
		craig.topper (Done): Use cast instead of dyn_cast if it can't fail. Or check that the dyn_cast didn't return null if it can fail.
		Result = DAG.getNode(
		ISD::FPTRUNC_ROUND, sdl, VT, getValue(I.getArgOperand(0)),
		DAG.getTargetConstant((int)RoundMode.getValue(), sdl,
		TLI.getPointerTy(DAG.getDataLayout())));
		setValue(&I, Result);

		craig.topper (Done): You probably don't need ComputeValueVTs. There shouldn't be any structs or arrays here that need to be broken down. I think you can do EVT VT = TLI.getValueType(I.getType) and then use `VT` in place of `VTs`.
		jpages (Author, Done): Indeed it works, thanks!
		return;
		craig.topper (Done): This is always false. And if it isn't you would only push one argument.
		}
case Intrinsic::fmuladd: {		case Intrinsic::fmuladd: {
EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
		foad (Done): All intrinsics returning floating point type are classed as FPMathOperators, so you can do this part unconditionally.
if (TM.Options.AllowFPOpFusion != FPOpFusion::Strict &&		if (TM.Options.AllowFPOpFusion != FPOpFusion::Strict &&
TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT)) {		TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT)) {
		craig.topper (Done): I'd probably just use TLI.getPointerTy instead of MVT::i8.
setValue(&I, DAG.getNode(ISD::FMA, sdl,		setValue(&I, DAG.getNode(ISD::FMA, sdl,
getValue(I.getArgOperand(0)).getValueType(),		getValue(I.getArgOperand(0)).getValueType(),
getValue(I.getArgOperand(0)),		getValue(I.getArgOperand(0)),
		foad (Done): Use sdl here too.
getValue(I.getArgOperand(1)),		getValue(I.getArgOperand(1)),
getValue(I.getArgOperand(2)), Flags));		getValue(I.getArgOperand(2)), Flags));
} else {		} else {
// TODO: Intrinsic calls should have fast-math-flags.		// TODO: Intrinsic calls should have fast-math-flags.
SDValue Mul = DAG.getNode(		SDValue Mul = DAG.getNode(
ISD::FMUL, sdl, getValue(I.getArgOperand(0)).getValueType(),		ISD::FMUL, sdl, getValue(I.getArgOperand(0)).getValueType(),
getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), Flags);		getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), Flags);
SDValue Add = DAG.getNode(ISD::FADD, sdl,		SDValue Add = DAG.getNode(ISD::FADD, sdl,
getValue(I.getArgOperand(0)).getValueType(),		getValue(I.getArgOperand(0)).getValueType(),
Mul, getValue(I.getArgOperand(2)), Flags);		Mul, getValue(I.getArgOperand(2)), Flags);
setValue(&I, Add);		setValue(&I, Add);
}		}
return;		return;
		craig.topper (Not Done): I know this was taken from visitTargetIntrinsic, but I don't really understand it.
		jpages (Author, Done): I removed this line, I'm not sure of what it was doing either.
}		}
case Intrinsic::convert_to_fp16:		case Intrinsic::convert_to_fp16:
setValue(&I, DAG.getNode(ISD::BITCAST, sdl, MVT::i16,		setValue(&I, DAG.getNode(ISD::BITCAST, sdl, MVT::i16,
DAG.getNode(ISD::FP_ROUND, sdl, MVT::f16,		DAG.getNode(ISD::FP_ROUND, sdl, MVT::f16,
		craig.topper (Done): I don't think this is needed. AssertZExt isn't useful for an FP result.
getValue(I.getArgOperand(0)),		getValue(I.getArgOperand(0)),
DAG.getTargetConstant(0, sdl,		DAG.getTargetConstant(0, sdl,
		craig.topper (Done): I don't think this alignment stuff is needed. The return type is FP so it probably doesn't do anything.
MVT::i32))));		MVT::i32))));
		craig.topper (Not Done): The code has optionally added a chain, but INTRINSIC_W_CHAIN must always have a chain. Whether there is a chain or not is a property of the ISD opcode.
return;		return;
case Intrinsic::convert_from_fp16:		case Intrinsic::convert_from_fp16:
setValue(&I, DAG.getNode(ISD::FP_EXTEND, sdl,		setValue(&I, DAG.getNode(ISD::FP_EXTEND, sdl,
TLI.getValueType(DAG.getDataLayout(), I.getType()),		TLI.getValueType(DAG.getDataLayout(), I.getType()),
DAG.getNode(ISD::BITCAST, sdl, MVT::f16,		DAG.getNode(ISD::BITCAST, sdl, MVT::f16,
getValue(I.getArgOperand(0)))));		getValue(I.getArgOperand(0)))));
return;		return;
case Intrinsic::fptosi_sat: {		case Intrinsic::fptosi_sat: {
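foad's point in the SelectionDAGBuilder.cpp thread, that a "dynamic" rounding mode forces the intrinsic to depend on state outside its arguments, can be shown with a small host-side model. The sketch below is an assumption-laden illustration, not LLVM code: the function names are hypothetical, `double` to `float` stands in for f32 to f16, and the rounding mode is read from the C++ floating-point environment.

```cpp
#include <cfenv>

// Hypothetical model of a "dynamic" conversion: the result depends on the
// current floating-point environment, which is not one of the arguments.
float narrow_dynamic(double value) {
  volatile double in = value;                   // volatile keeps the conversion at run time
  volatile float out = static_cast<float>(in);  // rounded with whatever mode is active
  return out;
}

// Returns true if the same argument yields different results under two modes,
// i.e. if the conversion cannot be treated as a pure function of its input.
bool dynamic_result_depends_on_environment(double value) {
  const int saved = std::fegetround();
  std::fesetround(FE_UPWARD);
  const float a = narrow_dynamic(value);
  std::fesetround(FE_DOWNWARD);
  const float b = narrow_dynamic(value);
  std::fesetround(saved);                       // leave the caller's mode untouched
  return a != b;
}
```

With the mode fixed by a metadata argument, as in the final patch, the result is a function of the operands alone, which is what makes an attribute like IntrNoMem appropriate.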

llvm/lib/IR/Verifier.cpp

Show First 20 Lines • Show All 4,732 Lines • ▼ Show 20 Lines	Assert(GV && GV->isConstant() && GV->hasDefinitiveInitializer(),
"info argument of llvm.coro.id must refer to an initialized "		"info argument of llvm.coro.id must refer to an initialized "
"constant");		"constant");
Constant *Init = GV->getInitializer();		Constant *Init = GV->getInitializer();
Assert(isa<ConstantStruct>(Init) || isa<ConstantArray>(Init),		Assert(isa<ConstantStruct>(Init) || isa<ConstantArray>(Init),
"info argument of llvm.coro.id must refer to either a struct or "		"info argument of llvm.coro.id must refer to either a struct or "
"an array");		"an array");
break;		break;
}		}
		case Intrinsic::fptrunc_round: {
		// Check the rounding mode
		Metadata *MD = nullptr;
		auto *MAV = dyn_cast<MetadataAsValue>(Call.getOperand(1));
		if (MAV)
		MD = MAV->getMetadata();

		Assert(MD != nullptr, "missing rounding mode argument", Call);
		foad (Done): MD could be nullptr here, so you need to handle that.
		foad (Done): I am not an expert on verifying metadata but I think you probably need to check that it is an MDString here, otherwise the cast<MDString> below could fail? And really we ought to have some tests in test/Verifier/llvm.fptrunc.round.ll.

		Assert(isa<MDString>(MD),
		foad (Done): Check that the rounding mode is not "dynamic"?
		foad (Done): `isa` is a bit more natural than `dyn_cast` here.
		("invalid value for llvm.fptrunc.round metadata operand"
		" (the operand should be a string)"),
		foad (Done): I am not sure but I think `RoundMode != RoundingMode::Dynamic` needs to be `RoundMode.getValue() != ...` (or `RoundMode != ...`).
		MD);

		Optional<RoundingMode> RoundMode =
		convertStrToRoundingMode(cast<MDString>(MD)->getString());
		Assert(RoundMode.hasValue() &&
		RoundMode.getValue() != RoundingMode::Dynamic,
		"unsupported rounding mode argument", Call);
		break;
		}
#define INSTRUCTION(NAME, NARGS, ROUND_MODE, INTRINSIC) \		#define INSTRUCTION(NAME, NARGS, ROUND_MODE, INTRINSIC) \
case Intrinsic::INTRINSIC:		case Intrinsic::INTRINSIC:
#include "llvm/IR/ConstrainedOps.def"		#include "llvm/IR/ConstrainedOps.def"
visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(Call));		visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(Call));
break;		break;
case Intrinsic::dbg_declare: // llvm.dbg.declare		case Intrinsic::dbg_declare: // llvm.dbg.declare
Assert(isa<MetadataAsValue>(Call.getArgOperand(0)),		Assert(isa<MetadataAsValue>(Call.getArgOperand(0)),
"invalid llvm.dbg.declare intrinsic call 1", Call);		"invalid llvm.dbg.declare intrinsic call 1", Call);
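The acceptance rule the verifier applies above (the operand must be a string, the string must name a known rounding mode, and that mode must not be dynamic) can be sketched as a standalone program. Everything here is a hypothetical stand-in: `Mode` mimics `llvm::RoundingMode`, `parse_rounding_mode` mimics `convertStrToRoundingMode`, and the accepted strings are assumed to follow the `round.*` spellings LangRef uses for the constrained intrinsics.

```cpp
#include <optional>
#include <string_view>

// Stand-in for llvm::RoundingMode; enumerator values are illustrative only.
enum class Mode {
  TowardZero,
  NearestTiesToEven,
  TowardPositive,
  TowardNegative,
  NearestTiesToAway,
  Dynamic
};

// Hypothetical stand-in for convertStrToRoundingMode: returns no value for
// strings that do not name a rounding mode.
std::optional<Mode> parse_rounding_mode(std::string_view s) {
  if (s == "round.dynamic") return Mode::Dynamic;
  if (s == "round.tonearest") return Mode::NearestTiesToEven;
  if (s == "round.tonearestaway") return Mode::NearestTiesToAway;
  if (s == "round.downward") return Mode::TowardNegative;
  if (s == "round.upward") return Mode::TowardPositive;
  if (s == "round.towardzero") return Mode::TowardZero;
  return std::nullopt;
}

// Mirrors the verifier's rule: a known mode that is not "dynamic".
bool is_valid_fptrunc_round_mode(std::string_view s) {
  const std::optional<Mode> mode = parse_rounding_mode(s);
  return mode.has_value() && *mode != Mode::Dynamic;
}
```

Checking `has_value()` before comparing against `Mode::Dynamic` matches the order of the two conditions in the `Assert` above, so an unrecognized string is rejected before its value is read.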

llvm/lib/Target/AMDGPU/AMDGPUGISel.td

	def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_INC, SIbuffer_atomic_inc>;			def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_INC, SIbuffer_atomic_inc>;
	def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_DEC, SIbuffer_atomic_dec>;			def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_DEC, SIbuffer_atomic_dec>;
	def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_FADD, SIbuffer_atomic_fadd>;			def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_FADD, SIbuffer_atomic_fadd>;
	def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_FMIN, SIbuffer_atomic_fmin>;			def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_FMIN, SIbuffer_atomic_fmin>;
	def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_FMAX, SIbuffer_atomic_fmax>;			def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_FMAX, SIbuffer_atomic_fmax>;
	def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_CMPSWAP, SIbuffer_atomic_cmpswap>;			def : GINodeEquiv<G_AMDGPU_BUFFER_ATOMIC_CMPSWAP, SIbuffer_atomic_cmpswap>;
	def : GINodeEquiv<G_AMDGPU_S_BUFFER_LOAD, SIsbuffer_load>;			def : GINodeEquiv<G_AMDGPU_S_BUFFER_LOAD, SIsbuffer_load>;

				def : GINodeEquiv<G_FPTRUNC_ROUND_UPWARD, SIfptrunc_round_upward>;
				def : GINodeEquiv<G_FPTRUNC_ROUND_DOWNWARD, SIfptrunc_round_downward>;

	class GISelSop2Pat <			class GISelSop2Pat <
	SDPatternOperator node,			SDPatternOperator node,
	Instruction inst,			Instruction inst,
	ValueType dst_vt,			ValueType dst_vt,
	ValueType src0_vt = dst_vt, ValueType src1_vt = src0_vt> : GCNPat <			ValueType src0_vt = dst_vt, ValueType src1_vt = src0_vt> : GCNPat <

	(dst_vt (node (src0_vt SReg_32:$src0), (src1_vt SReg_32:$src1))),			(dst_vt (node (src0_vt SReg_32:$src0), (src1_vt SReg_32:$src1))),
	(inst src0_vt:$src0, src1_vt:$src1)			(inst src0_vt:$src0, src1_vt:$src1)

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h

Show First 20 Lines • Show All 477 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
/// T1|v.y| | | |		/// T1|v.y| | | |
/// T2|v.z| | | |		/// T2|v.z| | | |
/// T3|v.w| | | |		/// T3|v.w| | | |
BUILD_VERTICAL_VECTOR,		BUILD_VERTICAL_VECTOR,
/// Pointer to the start of the shader's constant data.		/// Pointer to the start of the shader's constant data.
CONST_DATA_PTR,		CONST_DATA_PTR,
PC_ADD_REL_OFFSET,		PC_ADD_REL_OFFSET,
LDS,		LDS,
		FPTRUNC_ROUND_UPWARD,
		FPTRUNC_ROUND_DOWNWARD,

DUMMY_CHAIN,		DUMMY_CHAIN,
FIRST_MEM_OPCODE_NUMBER = ISD::FIRST_TARGET_MEMORY_OPCODE,		FIRST_MEM_OPCODE_NUMBER = ISD::FIRST_TARGET_MEMORY_OPCODE,
LOAD_D16_HI,		LOAD_D16_HI,
LOAD_D16_LO,		LOAD_D16_LO,
LOAD_D16_HI_I8,		LOAD_D16_HI_I8,
LOAD_D16_HI_U8,		LOAD_D16_HI_U8,
LOAD_D16_LO_I8,		LOAD_D16_LO_I8,
LOAD_D16_LO_U8,		LOAD_D16_LO_U8,

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Show First 20 Lines • Show All 4,439 Lines • ▼ Show 20 Lines	const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(CVT_PKNORM_U16_F32)		NODE_NAME_CASE(CVT_PKNORM_U16_F32)
NODE_NAME_CASE(CVT_PK_I16_I32)		NODE_NAME_CASE(CVT_PK_I16_I32)
NODE_NAME_CASE(CVT_PK_U16_U32)		NODE_NAME_CASE(CVT_PK_U16_U32)
NODE_NAME_CASE(FP_TO_FP16)		NODE_NAME_CASE(FP_TO_FP16)
NODE_NAME_CASE(BUILD_VERTICAL_VECTOR)		NODE_NAME_CASE(BUILD_VERTICAL_VECTOR)
NODE_NAME_CASE(CONST_DATA_PTR)		NODE_NAME_CASE(CONST_DATA_PTR)
NODE_NAME_CASE(PC_ADD_REL_OFFSET)		NODE_NAME_CASE(PC_ADD_REL_OFFSET)
NODE_NAME_CASE(LDS)		NODE_NAME_CASE(LDS)
		NODE_NAME_CASE(FPTRUNC_ROUND_UPWARD)
		NODE_NAME_CASE(FPTRUNC_ROUND_DOWNWARD)
NODE_NAME_CASE(DUMMY_CHAIN)		NODE_NAME_CASE(DUMMY_CHAIN)
case AMDGPUISD::FIRST_MEM_OPCODE_NUMBER: break;		case AMDGPUISD::FIRST_MEM_OPCODE_NUMBER: break;
NODE_NAME_CASE(LOAD_D16_HI)		NODE_NAME_CASE(LOAD_D16_HI)
NODE_NAME_CASE(LOAD_D16_LO)		NODE_NAME_CASE(LOAD_D16_LO)
NODE_NAME_CASE(LOAD_D16_HI_I8)		NODE_NAME_CASE(LOAD_D16_HI_I8)
NODE_NAME_CASE(LOAD_D16_HI_U8)		NODE_NAME_CASE(LOAD_D16_HI_U8)
NODE_NAME_CASE(LOAD_D16_LO_I8)		NODE_NAME_CASE(LOAD_D16_LO_I8)
NODE_NAME_CASE(LOAD_D16_LO_U8)		NODE_NAME_CASE(LOAD_D16_LO_U8)

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h

Show First 20 Lines • Show All 164 Lines • ▼ Show 20 Lines	public:
bool legalizeBufferLoad(MachineInstr &MI, MachineRegisterInfo &MRI,		bool legalizeBufferLoad(MachineInstr &MI, MachineRegisterInfo &MRI,
MachineIRBuilder &B, bool IsFormat,		MachineIRBuilder &B, bool IsFormat,
bool IsTyped) const;		bool IsTyped) const;
bool legalizeBufferAtomic(MachineInstr &MI, MachineIRBuilder &B,		bool legalizeBufferAtomic(MachineInstr &MI, MachineIRBuilder &B,
Intrinsic::ID IID) const;		Intrinsic::ID IID) const;

bool legalizeBVHIntrinsic(MachineInstr &MI, MachineIRBuilder &B) const;		bool legalizeBVHIntrinsic(MachineInstr &MI, MachineIRBuilder &B) const;

		bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;

bool legalizeImageIntrinsic(		bool legalizeImageIntrinsic(
MachineInstr &MI, MachineIRBuilder &B,		MachineInstr &MI, MachineIRBuilder &B,
GISelChangeObserver &Observer,		GISelChangeObserver &Observer,
const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr) const;		const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr) const;

bool legalizeSBufferLoad(LegalizerHelper &Helper, MachineInstr &MI) const;		bool legalizeSBufferLoad(LegalizerHelper &Helper, MachineInstr &MI) const;

bool legalizeAtomicIncDec(MachineInstr &MI, MachineIRBuilder &B,		bool legalizeAtomicIncDec(MachineInstr &MI, MachineIRBuilder &B,

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

Show First 20 Lines • Show All 833 Lines • ▼ Show 20 Lines	AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
else		else
FPToI.minScalar(1, S32);		FPToI.minScalar(1, S32);

FPToI.minScalar(0, S32)		FPToI.minScalar(0, S32)
.widenScalarToNextPow2(0, 32)		.widenScalarToNextPow2(0, 32)
.scalarize(0)		.scalarize(0)
.lower();		.lower();

		getActionDefinitionsBuilder(G_INTRINSIC_FPTRUNC_ROUND)
		.customFor({S16, S32})
		.scalarize(0)
		.lower();

// Lower roundeven into G_FRINT		// Lower roundeven into G_FRINT
getActionDefinitionsBuilder({G_INTRINSIC_ROUND, G_INTRINSIC_ROUNDEVEN})		getActionDefinitionsBuilder({G_INTRINSIC_ROUND, G_INTRINSIC_ROUNDEVEN})
.scalarize(0)		.scalarize(0)
.lower();		.lower();

if (ST.has16BitInsts()) {		if (ST.has16BitInsts()) {
getActionDefinitionsBuilder({G_INTRINSIC_TRUNC, G_FCEIL, G_FRINT})		getActionDefinitionsBuilder({G_INTRINSIC_TRUNC, G_FCEIL, G_FRINT})
.legalFor({S16, S32, S64})		.legalFor({S16, S32, S64})
▲ Show 20 Lines • Show All 916 Lines • ▼ Show 20 Lines	case TargetOpcode::G_FPOW:
return legalizeFPow(MI, B);		return legalizeFPow(MI, B);
case TargetOpcode::G_FFLOOR:		case TargetOpcode::G_FFLOOR:
return legalizeFFloor(MI, MRI, B);		return legalizeFFloor(MI, MRI, B);
case TargetOpcode::G_BUILD_VECTOR:		case TargetOpcode::G_BUILD_VECTOR:
return legalizeBuildVector(MI, MRI, B);		return legalizeBuildVector(MI, MRI, B);
case TargetOpcode::G_CTLZ:		case TargetOpcode::G_CTLZ:
case TargetOpcode::G_CTTZ:		case TargetOpcode::G_CTTZ:
return legalizeCTLZ_CTTZ(MI, MRI, B);		return legalizeCTLZ_CTTZ(MI, MRI, B);
		case TargetOpcode::G_INTRINSIC_FPTRUNC_ROUND:
		return legalizeFPTruncRound(MI, B);
default:		default:
return false;		return false;
}		}

llvm_unreachable("expected switch to return");		llvm_unreachable("expected switch to return");
}		}

Register AMDGPULegalizerInfo::getSegmentAperture(		Register AMDGPULegalizerInfo::getSegmentAperture(
▲ Show 20 Lines • Show All 3,131 Lines • ▼ Show 20 Lines	bool AMDGPULegalizerInfo::legalizeBVHIntrinsic(MachineInstr &MI,
MIB.addUse(TDescr)		MIB.addUse(TDescr)
.addImm(IsA16 ? 1 : 0)		.addImm(IsA16 ? 1 : 0)
.cloneMemRefs(MI);		.cloneMemRefs(MI);

MI.eraseFromParent();		MI.eraseFromParent();
return true;		return true;
}		}

		bool AMDGPULegalizerInfo::legalizeFPTruncRound(MachineInstr &MI,
		MachineIRBuilder &B) const {
		unsigned Opc;
		int RoundMode = MI.getOperand(2).getImm();

		if (RoundMode == (int)RoundingMode::TowardPositive)
		Opc = AMDGPU::G_FPTRUNC_ROUND_UPWARD;
		else if (RoundMode == (int)RoundingMode::TowardNegative)
		Opc = AMDGPU::G_FPTRUNC_ROUND_DOWNWARD;
		else
		return false;

		foad (Done): Should this be "return false" to indicate a legalization failure in the normal way?
		B.buildInstr(Opc)
		.addDef(MI.getOperand(0).getReg())
		.addUse(MI.getOperand(1).getReg());

		MI.eraseFromParent();

		return true;
		}

bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,		bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
MachineInstr &MI) const {		MachineInstr &MI) const {
MachineIRBuilder &B = Helper.MIRBuilder;		MachineIRBuilder &B = Helper.MIRBuilder;
MachineRegisterInfo &MRI = *B.getMRI();		MachineRegisterInfo &MRI = *B.getMRI();

// Replace the use G_BRCOND with the exec manipulate and branch pseudos.		// Replace the use G_BRCOND with the exec manipulate and branch pseudos.
auto IntrID = MI.getIntrinsicID();		auto IntrID = MI.getIntrinsicID();
switch (IntrID) {		switch (IntrID) {

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

Show First 20 Lines • Show All 4,484 Lines • ▼ Show 20 Lines	case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS: {
case Intrinsic::amdgcn_ds_gws_sema_p:		case Intrinsic::amdgcn_ds_gws_sema_p:
case Intrinsic::amdgcn_ds_gws_sema_release_all: {		case Intrinsic::amdgcn_ds_gws_sema_release_all: {
// This must be an SGPR, but accept a VGPR.		// This must be an SGPR, but accept a VGPR.
unsigned Bank = getRegBankID(MI.getOperand(1).getReg(), MRI,		unsigned Bank = getRegBankID(MI.getOperand(1).getReg(), MRI,
AMDGPU::SGPRRegBankID);		AMDGPU::SGPRRegBankID);
OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);		OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
break;		break;
}		}
default:		default:
return getInvalidInstructionMapping();		return getInvalidInstructionMapping();
}		}
		arsenm (Not Done): The backend should not directly consume generic intrinsics. These need to be routed through a generic opcode which is subject to normal legalization.
		jpages (Author, Not Done): What is the reasoning behind this requirement? I guess I would need something like experimental_fptrunc_round_upward -> amdgpu_experimental_fptrunc_round_upward -> codegen?
		arsenm (Not Done): No, you would just need G_FPTRUNC_ROUND_UPWARD etc. opcodes. We don't have a way to define legalization rules on intrinsics, so if you need type legalization the backend would have to take care of it manually, e.g. we would have to manually scalarize the vector overloads of these intrinsics. With the opcode you can just define legalization rules like normal.
break;		break;
}		}
case AMDGPU::G_SELECT: {		case AMDGPU::G_SELECT: {
unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();		unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,		unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
AMDGPU::SGPRRegBankID);		AMDGPU::SGPRRegBankID);
unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI,		unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI,
AMDGPU::SGPRRegBankID);		AMDGPU::SGPRRegBankID);
▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines	unsigned Bank = getRegBankID(MI.getOperand(0).getReg(), MRI,
AMDGPU::SGPRRegBankID);		AMDGPU::SGPRRegBankID);
assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);		assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
if (Bank != AMDGPU::SGPRRegBankID)		if (Bank != AMDGPU::SGPRRegBankID)
Bank = AMDGPU::VCCRegBankID;		Bank = AMDGPU::VCCRegBankID;

OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);		OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);
break;		break;
}		}
		case AMDGPU::G_FPTRUNC_ROUND_UPWARD:
		case AMDGPU::G_FPTRUNC_ROUND_DOWNWARD:
		return getDefaultMappingVOP(MI);
}		}
		foad (Done): sgpr seems wrong. This is specifying the mapping for the result value and the input value (not the rounding mode which is no longer an operand). I think this should just use the same mapping as G_FPTRUNC (line 3640).

return getInstructionMapping(/*ID*/1, /*Cost*/1,		return getInstructionMapping(/*ID*/1, /*Cost*/1,
getOperandsMapping(OpdsMapping),		getOperandsMapping(OpdsMapping),
MI.getNumOperands());		MI.getNumOperands());
}		}

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "SIISelLowering.h"		#include "SIISelLowering.h"
#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUInstrInfo.h"		#include "AMDGPUInstrInfo.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "SIRegisterInfo.h"		#include "SIRegisterInfo.h"
		#include "llvm/ADT/FloatingPointMode.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/LegacyDivergenceAnalysis.h"		#include "llvm/Analysis/LegacyDivergenceAnalysis.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/Analysis.h"		#include "llvm/CodeGen/Analysis.h"
#include "llvm/CodeGen/FunctionLoweringInfo.h"		#include "llvm/CodeGen/FunctionLoweringInfo.h"
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"		#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
▲ Show 20 Lines • Show All 569 Lines • ▼ Show 20 Lines	if (Subtarget->has16BitInsts()) {
setOperationAction(ISD::SINT_TO_FP, MVT::i16, Custom);		setOperationAction(ISD::SINT_TO_FP, MVT::i16, Custom);
setOperationAction(ISD::UINT_TO_FP, MVT::i16, Custom);		setOperationAction(ISD::UINT_TO_FP, MVT::i16, Custom);

setOperationAction(ISD::FP_TO_SINT, MVT::f16, Promote);		setOperationAction(ISD::FP_TO_SINT, MVT::f16, Promote);
setOperationAction(ISD::FP_TO_UINT, MVT::f16, Promote);		setOperationAction(ISD::FP_TO_UINT, MVT::f16, Promote);
setOperationAction(ISD::SINT_TO_FP, MVT::f16, Promote);		setOperationAction(ISD::SINT_TO_FP, MVT::f16, Promote);
setOperationAction(ISD::UINT_TO_FP, MVT::f16, Promote);		setOperationAction(ISD::UINT_TO_FP, MVT::f16, Promote);
setOperationAction(ISD::FROUND, MVT::f16, Custom);		setOperationAction(ISD::FROUND, MVT::f16, Custom);
		setOperationAction(ISD::FPTRUNC_ROUND, MVT::f16, Custom);

// F16 - VOP2 Actions.		// F16 - VOP2 Actions.
setOperationAction(ISD::BR_CC, MVT::f16, Expand);		setOperationAction(ISD::BR_CC, MVT::f16, Expand);
setOperationAction(ISD::SELECT_CC, MVT::f16, Expand);		setOperationAction(ISD::SELECT_CC, MVT::f16, Expand);

setOperationAction(ISD::FDIV, MVT::f16, Custom);		setOperationAction(ISD::FDIV, MVT::f16, Custom);

// F16 - VOP3 Actions.		// F16 - VOP3 Actions.
▲ Show 20 Lines • Show All 4,035 Lines • ▼ Show 20 Lines	SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::EXTRACT_VECTOR_ELT:		case ISD::EXTRACT_VECTOR_ELT:
return lowerEXTRACT_VECTOR_ELT(Op, DAG);		return lowerEXTRACT_VECTOR_ELT(Op, DAG);
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
return lowerVECTOR_SHUFFLE(Op, DAG);		return lowerVECTOR_SHUFFLE(Op, DAG);
case ISD::BUILD_VECTOR:		case ISD::BUILD_VECTOR:
return lowerBUILD_VECTOR(Op, DAG);		return lowerBUILD_VECTOR(Op, DAG);
case ISD::FP_ROUND:		case ISD::FP_ROUND:
return lowerFP_ROUND(Op, DAG);		return lowerFP_ROUND(Op, DAG);
		case ISD::FPTRUNC_ROUND: {
		unsigned Opc;
		SDLoc DL(Op);

		if (Op.getOperand(0)->getValueType(0) != MVT::f32)
		foad [Done]: Do you also need to check that the operand is MVT::f32? Maybe add a test where the operand is f64, and check that it fails to legalize?
		return SDValue();

		// Get the rounding mode from the last operand
		int RoundMode = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
		if (RoundMode == (int)RoundingMode::TowardPositive)
		Opc = AMDGPUISD::FPTRUNC_ROUND_UPWARD;
		else if (RoundMode == (int)RoundingMode::TowardNegative)
		Opc = AMDGPUISD::FPTRUNC_ROUND_DOWNWARD;
		else
		return SDValue();

		return DAG.getNode(Opc, DL, Op.getNode()->getVTList(), Op->getOperand(0));
		}
case ISD::TRAP:		case ISD::TRAP:
return lowerTRAP(Op, DAG);		return lowerTRAP(Op, DAG);
case ISD::DEBUGTRAP:		case ISD::DEBUGTRAP:
return lowerDEBUGTRAP(Op, DAG);		return lowerDEBUGTRAP(Op, DAG);
case ISD::FABS:		case ISD::FABS:
case ISD::FNEG:		case ISD::FNEG:
case ISD::FCANONICALIZE:		case ISD::FCANONICALIZE:
case ISD::BSWAP:		case ISD::BSWAP:
▲ Show 20 Lines • Show All 2,152 Lines • ▼ Show 20 Lines	return DAG.getNode(AMDGPUISD::FMAD_FTZ, DL, VT, Op.getOperand(1),
Op.getOperand(2), Op.getOperand(3));		Op.getOperand(2), Op.getOperand(3));

case Intrinsic::amdgcn_if_break:		case Intrinsic::amdgcn_if_break:
return SDValue(DAG.getMachineNode(AMDGPU::SI_IF_BREAK, DL, VT,		return SDValue(DAG.getMachineNode(AMDGPU::SI_IF_BREAK, DL, VT,
Op->getOperand(1), Op->getOperand(2)), 0);		Op->getOperand(1), Op->getOperand(2)), 0);

case Intrinsic::amdgcn_groupstaticsize: {		case Intrinsic::amdgcn_groupstaticsize: {
Triple::OSType OS = getTargetMachine().getTargetTriple().getOS();		Triple::OSType OS = getTargetMachine().getTargetTriple().getOS();
if (OS == Triple::AMDHSA || OS == Triple::AMDPAL)		if (OS == Triple::AMDHSA || OS == Triple::AMDPAL)
		foad [Not Done]: I'm not quite sure but I think you should return SDValue() here to indicate a lowering/legalization failure?
return Op;		return Op;

const Module *M = MF.getFunction().getParent();		const Module *M = MF.getFunction().getParent();
const GlobalValue *GV =		const GlobalValue *GV =
M->getNamedValue(Intrinsic::getName(Intrinsic::amdgcn_groupstaticsize));		M->getNamedValue(Intrinsic::getName(Intrinsic::amdgcn_groupstaticsize));
SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,		SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,
SIInstrInfo::MO_ABS32_LO);		SIInstrInfo::MO_ABS32_LO);
return {DAG.getMachineNode(AMDGPU::S_MOV_B32, DL, MVT::i32, GA), 0};		return {DAG.getMachineNode(AMDGPU::S_MOV_B32, DL, MVT::i32, GA), 0};
▲ Show 20 Lines • Show All 5,610 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIInstrInfo.td

Show First 20 Lines • Show All 249 Lines • ▼ Show 20 Lines	def SIload_d16_hi_i8 : SDNode<"AMDGPUISD::LOAD_D16_HI_I8",
[SDNPMayLoad, SDNPMemOperand, SDNPHasChain]		[SDNPMayLoad, SDNPMemOperand, SDNPHasChain]
>;		>;

def SIdenorm_mode : SDNode<"AMDGPUISD::DENORM_MODE",		def SIdenorm_mode : SDNode<"AMDGPUISD::DENORM_MODE",
SDTypeProfile<0 ,1, [SDTCisInt<0>]>,		SDTypeProfile<0 ,1, [SDTCisInt<0>]>,
[SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]		[SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]
>;		>;

		def SIfptrunc_round_upward : SDNode<"AMDGPUISD::FPTRUNC_ROUND_UPWARD",
		SDTFPRoundOp
		>;

		def SIfptrunc_round_downward : SDNode<"AMDGPUISD::FPTRUNC_ROUND_DOWNWARD",
		SDTFPRoundOp
		>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ValueType helpers		// ValueType helpers
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// Returns 1 if the source arguments have modifiers, 0 if they do not.		// Returns 1 if the source arguments have modifiers, 0 if they do not.
// XXX - do f16 instructions?		// XXX - do f16 instructions?
class isFloatType<ValueType SrcVT> {		class isFloatType<ValueType SrcVT> {
bit ret = !or(!eq(SrcVT.Value, f16.Value),		bit ret = !or(!eq(SrcVT.Value, f16.Value),
▲ Show 20 Lines • Show All 2,329 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIInstructions.td

Show First 20 Lines • Show All 170 Lines • ▼ Show 20 Lines
}		}

def EXIT_STRICT_WQM : SPseudoInstSI <(outs SReg_1:$sdst), (ins SReg_1:$src0)> {		def EXIT_STRICT_WQM : SPseudoInstSI <(outs SReg_1:$sdst), (ins SReg_1:$src0)> {
let hasSideEffects = 0;		let hasSideEffects = 0;
let mayLoad = 0;		let mayLoad = 0;
let mayStore = 0;		let mayStore = 0;
}		}

		// Pseudo instructions used for @llvm.fptrunc.round upward
		// and @llvm.fptrunc.round downward.
		// These intrinsics will be legalized to G_FPTRUNC_ROUND_UPWARD
		// and G_FPTRUNC_ROUND_DOWNWARD before being lowered to
		// FPTRUNC_UPWARD_PSEUDO and FPTRUNC_DOWNWARD_PSEUDO.
		foad [Done]: Can you implement instruction selection for the intrinsics by adding patterns here, instead of writing C++ code?
		jpages (author) [Done]: Like I said previously, I tried to use pure TableGen for both SelectionDAG and GlobalISel and I could not find a way to do it for both.
		// The final codegen is done in the ModeRegister pass.
		let Uses = [MODE, EXEC] in {
		def FPTRUNC_UPWARD_PSEUDO : VPseudoInstSI <(outs VGPR_32:$vdst),
		(ins VGPR_32:$src0),
		[(set f16:$vdst, (SIfptrunc_round_upward f32:$src0))]>;
		foad [Done]: Just end the line with `;` if there is nothing to go inside the `{}`.

		def FPTRUNC_DOWNWARD_PSEUDO : VPseudoInstSI <(outs VGPR_32:$vdst),
		(ins VGPR_32:$src0),
		[(set f16:$vdst, (SIfptrunc_round_downward f32:$src0))]>;
		} // End Uses = [MODE, EXEC]

// Invert the exec mask and overwrite the inactive lanes of dst with inactive,		// Invert the exec mask and overwrite the inactive lanes of dst with inactive,
// restoring it after we're done.		// restoring it after we're done.
let Defs = [SCC] in {		let Defs = [SCC] in {
def V_SET_INACTIVE_B32 : VPseudoInstSI <(outs VGPR_32:$vdst),		def V_SET_INACTIVE_B32 : VPseudoInstSI <(outs VGPR_32:$vdst),
(ins VGPR_32: $src, VSrc_b32:$inactive),		(ins VGPR_32: $src, VSrc_b32:$inactive),
[(set i32:$vdst, (int_amdgcn_set_inactive i32:$src, i32:$inactive))]> {		[(set i32:$vdst, (int_amdgcn_set_inactive i32:$src, i32:$inactive))]> {
let Constraints = "$src = $vdst";		let Constraints = "$src = $vdst";
}		}
▲ Show 20 Lines • Show All 2,884 Lines • ▼ Show 20 Lines	def G_SI_CALL : AMDGPUGenericInstruction {
let InOperandList = (ins type0:$src0, unknown:$callee);		let InOperandList = (ins type0:$src0, unknown:$callee);
let Size = 4;		let Size = 4;
let isCall = 1;		let isCall = 1;
let UseNamedOperandTable = 1;		let UseNamedOperandTable = 1;
let SchedRW = [WriteBranch];		let SchedRW = [WriteBranch];
// TODO: Should really base this on the call target		// TODO: Should really base this on the call target
let isConvergent = 1;		let isConvergent = 1;
}		}

		def G_FPTRUNC_ROUND_UPWARD : AMDGPUGenericInstruction {
		let OutOperandList = (outs type0:$vdst);
		let InOperandList = (ins type1:$src0);
		let hasSideEffects = 0;
		}

		def G_FPTRUNC_ROUND_DOWNWARD : AMDGPUGenericInstruction {
		let OutOperandList = (outs type0:$vdst);
		let InOperandList = (ins type1:$src0);
		let hasSideEffects = 0;
		}

llvm/lib/Target/AMDGPU/SIModeRegister.cpp

Show First 20 Lines • Show All 156 Lines • ▼ Show 20 Lines
FunctionPass *llvm::createSIModeRegisterPass() { return new SIModeRegister(); }		FunctionPass *llvm::createSIModeRegisterPass() { return new SIModeRegister(); }

// Determine the Mode register setting required for this instruction.		// Determine the Mode register setting required for this instruction.
// Instructions which don't use the Mode register return a null Status.		// Instructions which don't use the Mode register return a null Status.
// Note this currently only deals with instructions that use the floating point		// Note this currently only deals with instructions that use the floating point
// double precision setting.		// double precision setting.
Status SIModeRegister::getInstructionMode(MachineInstr &MI,		Status SIModeRegister::getInstructionMode(MachineInstr &MI,
const SIInstrInfo *TII) {		const SIInstrInfo *TII) {
if (TII->usesFPDPRounding(MI)) {		if (TII->usesFPDPRounding(MI) ||
		MI.getOpcode() == AMDGPU::FPTRUNC_UPWARD_PSEUDO ||
		MI.getOpcode() == AMDGPU::FPTRUNC_DOWNWARD_PSEUDO) {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::V_INTERP_P1LL_F16:		case AMDGPU::V_INTERP_P1LL_F16:
case AMDGPU::V_INTERP_P1LV_F16:		case AMDGPU::V_INTERP_P1LV_F16:
case AMDGPU::V_INTERP_P2_F16:		case AMDGPU::V_INTERP_P2_F16:
// f16 interpolation instructions need double precision round to zero		// f16 interpolation instructions need double precision round to zero
return Status(FP_ROUND_MODE_DP(3),		return Status(FP_ROUND_MODE_DP(3),
FP_ROUND_MODE_DP(FP_ROUND_ROUND_TO_ZERO));		FP_ROUND_MODE_DP(FP_ROUND_ROUND_TO_ZERO));
		case AMDGPU::FPTRUNC_UPWARD_PSEUDO: {
		// Replacing the pseudo by a real instruction
		MI.setDesc(TII->get(AMDGPU::V_CVT_F16_F32_e32));
		return Status(FP_ROUND_MODE_DP(3),
		FP_ROUND_MODE_DP(FP_ROUND_ROUND_TO_INF));
		}
		case AMDGPU::FPTRUNC_DOWNWARD_PSEUDO: {
		// Replacing the pseudo by a real instruction
		MI.setDesc(TII->get(AMDGPU::V_CVT_F16_F32_e32));
		return Status(FP_ROUND_MODE_DP(3),
		FP_ROUND_MODE_DP(FP_ROUND_ROUND_TO_NEGINF));
		}
default:		default:
return DefaultStatus;		return DefaultStatus;
}		}
}		}
return Status();		return Status();
}		}

// Insert a setreg instruction to update the Mode register.		// Insert a setreg instruction to update the Mode register.
▲ Show 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	bool SIModeRegister::runOnMachineFunction(MachineFunction &MF) {

// Phase 1 - determine the initial mode required by each block, and add setreg		// Phase 1 - determine the initial mode required by each block, and add setreg
// instructions for intra block requirements.		// instructions for intra block requirements.
for (MachineBasicBlock &BB : MF)		for (MachineBasicBlock &BB : MF)
processBlockPhase1(BB, TII);		processBlockPhase1(BB, TII);

// Phase 2 - determine the exit mode from each block. We add all blocks to the		// Phase 2 - determine the exit mode from each block. We add all blocks to the
// list here, but will also add any that need to be revisited during Phase 2		// list here, but will also add any that need to be revisited during Phase 2
// processing.		// processing.
		foad [Done]: You shouldn't need to add a new pass here to insert s_round_mode instructions. The SIModeRegister pass already knows how to do that. You should just be able to add your new PSEUDOs to the switch in getInstructionMode.
for (MachineBasicBlock &BB : MF)		for (MachineBasicBlock &BB : MF)
Phase2List.push(&BB);		Phase2List.push(&BB);
while (!Phase2List.empty()) {		while (!Phase2List.empty()) {
processBlockPhase2(*Phase2List.front(), TII);		processBlockPhase2(*Phase2List.front(), TII);
Phase2List.pop();		Phase2List.pop();
}		}

// Phase 3 - add an initial setreg to each block where the required entry mode		// Phase 3 - add an initial setreg to each block where the required entry mode
// is not satisfied by the exit mode of all its predecessors.		// is not satisfied by the exit mode of all its predecessors.
for (MachineBasicBlock &BB : MF)		for (MachineBasicBlock &BB : MF)
processBlockPhase3(BB, TII);		processBlockPhase3(BB, TII);
		foad [Not Done]: If you arrange for the pseudo to have exactly the same operands as the real instruction, then you don't need to build or delete instructions; all you need to do is change the opcode, which you can do with MI.setDesc().
		jpages (author) [Done]: I think this could work. I tried to "replace" my pseudo-instructions in Phase1, but it's not a good idea to add/delete instructions in the middle of an iteration on the basic block.

BlockInfo.clear();		BlockInfo.clear();

		foad [Not Done]: I'm not sure this needs another pass over all instructions in the function. Would it be cleaner to do it in phase 1? Maybe even inside getInstructionMode?
return Changed;		return Changed;
}		}
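
The review comments in this file suggest handling the new pseudos inside getInstructionMode and swapping the opcode in place (MI.setDesc() in the real pass) rather than building or deleting instructions. A minimal standalone sketch of that idea follows. It uses toy types instead of the LLVM classes; `Opcode`, `Instr`, `roundModeDP`, and this free-function `getInstructionMode` are hypothetical stand-ins, and the numeric rounding encodings (1 for +inf, 2 for -inf, field at bits 3:2) are assumptions about what the FP_ROUND_* macros in the diff expand to.

```cpp
#include <cstdint>

// Toy stand-ins for the LLVM types; every name here is hypothetical.
enum class Opcode {
  FptruncUpwardPseudo,
  FptruncDownwardPseudo,
  CvtF16F32,
  Other
};

// Assumed encodings of the 2-bit rounding field.
constexpr uint32_t RoundToInf = 1;    // toward +infinity
constexpr uint32_t RoundToNegInf = 2; // toward -infinity

// Place a 2-bit value in the double/half-precision rounding field,
// assumed to live at bits 3:2 of the MODE register.
constexpr uint32_t roundModeDP(uint32_t X) { return (X & 0x3u) << 2; }

struct Status {
  uint32_t Mask; // MODE bits the instruction depends on (0 = none)
  uint32_t Mode; // required value of those bits
};

struct Instr {
  Opcode Op;
};

// Return the mode an instruction requires. Pseudos are rewritten to the
// real conversion opcode in place, so no instruction is added or removed
// while the caller iterates over the block.
Status getInstructionMode(Instr &MI) {
  switch (MI.Op) {
  case Opcode::FptruncUpwardPseudo:
    MI.Op = Opcode::CvtF16F32;
    return Status{roundModeDP(3), roundModeDP(RoundToInf)};
  case Opcode::FptruncDownwardPseudo:
    MI.Op = Opcode::CvtF16F32;
    return Status{roundModeDP(3), roundModeDP(RoundToNegInf)};
  default:
    return Status{0, 0}; // null status: no MODE requirement
  }
}
```

Because the pseudo and the real instruction are given the same operand list, changing only the opcode is enough in this model, which is the property the reviewer relies on.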

llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir

	Show First 20 Lines • Show All 125 Lines • ▼ Show 20 Lines
	# DEBUG-NEXT: .. the first uncovered type index: 2, OK			# DEBUG-NEXT: .. the first uncovered type index: 2, OK
	# DEBUG-NEXT: .. the first uncovered imm index: 0, OK			# DEBUG-NEXT: .. the first uncovered imm index: 0, OK
	#			#
	# DEBUG-NEXT: G_FREEZE (opcode {{[0-9]+}}): 1 type index, 0 imm indices			# DEBUG-NEXT: G_FREEZE (opcode {{[0-9]+}}): 1 type index, 0 imm indices
	# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}			# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
	# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected			# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
	# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected			# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
	#			#
				# DEBUG-NEXT: G_INTRINSIC_FPTRUNC_ROUND (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
				# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
				# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
				#
	# DEBUG-NEXT: G_INTRINSIC_TRUNC (opcode {{[0-9]+}}): 1 type index, 0 imm indices			# DEBUG-NEXT: G_INTRINSIC_TRUNC (opcode {{[0-9]+}}): 1 type index, 0 imm indices
	# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}			# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
	# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected			# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
	# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected			# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
	# DEBUG-NEXT: G_INTRINSIC_ROUND (opcode {{[0-9]+}}): 1 type index, 0 imm indices			# DEBUG-NEXT: G_INTRINSIC_ROUND (opcode {{[0-9]+}}): 1 type index, 0 imm indices
	# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}			# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
	# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected			# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
	# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected			# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
	▲ Show 20 Lines • Show All 574 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/fail.llvm.fptrunc.round.ll

This file was added.

				; RUN: not --crash llc -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s --ignore-case --check-prefix=FAIL
				; RUN: not --crash llc -global-isel -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s --ignore-case --check-prefix=FAIL
				foad [Done]: Does this test work in a Release build? If not, you can add a "REQUIRES: asserts" line just after the RUN lines.
				jpages (author) [Done]: I just checked and it's working as well in Release without that.

				define amdgpu_gs void @test_fptrunc_round_legalization(double %a, i32 %data0, <4 x i32> %data1, half addrspace(1)* %out) {
				; FAIL: LLVM ERROR: Cannot select
				%res = call half @llvm.fptrunc.round.f64(double %a, metadata !"round.upward")
				store half %res, half addrspace(1)* %out, align 4
				ret void
				}

				declare half @llvm.fptrunc.round.f64(double, metadata)

llvm/test/CodeGen/AMDGPU/llvm.fptrunc.round.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck %s
				; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck %s
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1030 -verify-machineinstrs < %s | FileCheck %s

				define amdgpu_gs void @test_fptrunc_round_upward(float %a, i32 %data0, <4 x i32> %data1, half addrspace(1)* %out) {
				; CHECK-LABEL: test_fptrunc_round_upward:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 1), 1
				; CHECK-NEXT: v_cvt_f16_f32_e32 v0, v0
				; CHECK-NEXT: global_store_short v[6:7], v0, off
				; CHECK-NEXT: s_endpgm
				%res = call half @llvm.fptrunc.round(float %a, metadata !"round.upward")
				store half %res, half addrspace(1)* %out, align 4
				ret void
				}

				define amdgpu_gs void @test_fptrunc_round_downward(float %a, i32 %data0, <4 x i32> %data1, half addrspace(1)* %out) {
				; CHECK-LABEL: test_fptrunc_round_downward:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 3, 1), 1
				; CHECK-NEXT: v_cvt_f16_f32_e32 v0, v0
				; CHECK-NEXT: global_store_short v[6:7], v0, off
				; CHECK-NEXT: s_endpgm
				%res = call half @llvm.fptrunc.round(float %a, metadata !"round.downward")
				store half %res, half addrspace(1)* %out, align 4
				ret void
				}

				define amdgpu_gs void @test_fptrunc_round_upward_multiple_calls(float %a, float %b, i32 %data0, <4 x i32> %data1, half addrspace(1)* %out) {
				; CHECK-LABEL: test_fptrunc_round_upward_multiple_calls:
				; CHECK: ; %bb.0:
				; CHECK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 1), 1
				; CHECK-NEXT: v_cvt_f16_f32_e32 v0, v0
				; CHECK-NEXT: v_cvt_f16_f32_e32 v2, v1
				; CHECK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 2
				; CHECK-NEXT: v_cvt_f16_f32_e32 v1, v1
				; CHECK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 3, 1), 0
				; CHECK-NEXT: v_add_f16_e32 v0, v0, v2
				; CHECK-NEXT: v_add_f16_e32 v0, v1, v0
				; CHECK-NEXT: global_store_short v[7:8], v0, off
				; CHECK-NEXT: s_endpgm
				%res1 = call half @llvm.fptrunc.round(float %a, metadata !"round.upward")
				%res2 = call half @llvm.fptrunc.round(float %b, metadata !"round.upward")
				%res3 = call half @llvm.fptrunc.round(float %b, metadata !"round.downward")
				%res4 = fadd half %res1, %res2
				%res5 = fadd half %res3, %res4
				store half %res5, half addrspace(1)* %out, align 4
				ret void
				}

				declare half @llvm.fptrunc.round(float, metadata)
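
The `s_setreg_imm32_b32 hwreg(HW_REG_MODE, offset, width), imm` lines in the checks above can be read as bit-field writes to the MODE register. The sketch below models that reading so the expected sequences can be followed by hand. It is only an illustration under stated assumptions: `setReg` and `dpRoundField` are hypothetical helpers, the half/double rounding field is assumed to sit at bits 3:2, the field is assumed to start at 0 (round to nearest), and the encodings 1 = toward +inf and 2 = toward -inf are assumed.

```cpp
#include <cstdint>

// Hypothetical model of "s_setreg_imm32_b32 hwreg(HW_REG_MODE, Offset, Width), Imm":
// overwrite Width bits of Mode, starting at bit Offset, with the low bits of Imm.
// Assumes Width < 32.
constexpr uint32_t setReg(uint32_t Mode, unsigned Offset, unsigned Width,
                          uint32_t Imm) {
  return (Mode & ~(((1u << Width) - 1u) << Offset)) |
         ((Imm & ((1u << Width) - 1u)) << Offset);
}

// Extract the rounding field assumed to control f16/f64 results (bits 3:2).
constexpr uint32_t dpRoundField(uint32_t Mode) { return (Mode >> 2) & 0x3u; }
```

Under these assumptions, `hwreg(HW_REG_MODE, 2, 1), 1` turns a zero field into 1 (upward), `hwreg(HW_REG_MODE, 3, 1), 1` turns it into 2 (downward), and in the multiple-calls test the sequence (2, 1) = 1, then (2, 2) = 2, then (3, 1) = 0 moves the field through upward, downward, and back to 0 before the `v_add_f16` instructions.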

llvm/test/Verifier/llvm.fptrunc.round.ll

This file was added.

				; RUN: not opt -verify < %s 2>&1 | FileCheck %s

				declare half @llvm.fptrunc.round(float, metadata)

				define void @test_fptrunc_round_dynamic(float %a) {
				; CHECK: unsupported rounding mode argument
				%res = call half @llvm.fptrunc.round(float %a, metadata !"round.dynamic")
				; CHECK: unsupported rounding mode argument
				%res1 = call half @llvm.fptrunc.round(float %a, metadata !"round.test")
				; CHECK: invalid value for llvm.fptrunc.round metadata operand (the operand should be a string)
				%res2 = call half @llvm.fptrunc.round(float %a, metadata i32 5)
				ret void
				}
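
The verifier test above rejects `round.dynamic` and unknown strings such as `round.test`. The Verifier.cpp change itself is not shown in this part of the page, so the following is only a standalone sketch of a check with that observable behavior, not the patch's code. `RoundingMode` here is a local toy enum, and `parseRoundingMode` and `isAcceptedRoundingArg` are hypothetical names; the string spellings are the ones used by LLVM's constrained floating-point intrinsics.

```cpp
#include <string>

// Local toy enum; not the LLVM type.
enum class RoundingMode {
  TowardZero,
  NearestTiesToEven,
  TowardPositive,
  TowardNegative,
  NearestTiesToAway,
  Dynamic,
  Invalid
};

// Map a rounding-mode metadata string to a mode (hypothetical helper).
RoundingMode parseRoundingMode(const std::string &S) {
  if (S == "round.towardzero")
    return RoundingMode::TowardZero;
  if (S == "round.tonearest")
    return RoundingMode::NearestTiesToEven;
  if (S == "round.upward")
    return RoundingMode::TowardPositive;
  if (S == "round.downward")
    return RoundingMode::TowardNegative;
  if (S == "round.tonearestaway")
    return RoundingMode::NearestTiesToAway;
  if (S == "round.dynamic")
    return RoundingMode::Dynamic;
  return RoundingMode::Invalid;
}

// Mirrors the two "unsupported rounding mode argument" cases in the test:
// unknown strings and the dynamic mode are both rejected.
bool isAcceptedRoundingArg(const std::string &S) {
  RoundingMode RM = parseRoundingMode(S);
  return RM != RoundingMode::Invalid && RM != RoundingMode::Dynamic;
}
```

The third case in the test, a non-string metadata operand (`metadata i32 5`), is a type check on the operand and is not modeled here. Note also that passing the verifier does not imply backend support: the SIISelLowering.cpp hunk in this diff only lowers the upward and downward modes.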

Diff 407733
