Add a new intrinsic to precisely control the rounding mode when converting
from f32 to f16.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6348 | Err. I guess the intrinsic attributes are defined to always have a Chain. Should it be IntrNoMem instead? |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6348 | IIUC, if the rounding mode is allowed to be "dynamic" then the intrinsic needs to be IntrInaccessibleMemOnly (just like the constrained fp intrinsics). If we don't allow "dynamic" then it could be IntrNoMem. Personally I have no interest in supporting "dynamic" so I would vote for disallowing it and making the intrinsic IntrNoMem. |
Rebased.
The intrinsic has the IntrNoMem attribute, and is represented by INTRINSIC_WO_CHAIN in the DAG.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6348 | Yes the original version was defined to have a chain, I changed it with IntrNoMem/INTRINSIC_WO_CHAIN as suggested. Thank you for the suggestions. As a consequence this code is a bit simpler, and the lowering in the backend was modified to reflect that . |
It seems like people are mostly happy with the design now, so I am being a bit more picky with my review comments!
llvm/docs/LangRef.rst | ||
---|---|---|
24068 | Add "... with a specified rounding mode". Apart from that I think most of the Overview/Arguments/Semantics text could be copied verbatim from the definition of the fptrunc-to instruction? | |
24078 | Need to disallow "dynamic" if that's what we decided. | |
24084 | Probably need to add the text from fptrunc-to: "This instruction is assumed to execute in the default floating-point environment" (because of the stuff about exceptions) and then say *except* for the rounding mode. | |
llvm/include/llvm/IR/Intrinsics.td | ||
913 | "Truncate" | |
916 | Add IntrWillReturn. | |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
2258 | VRegs isn't used for anything so remove it. | |
2270 | We know it's not void, and has exactly one result, so simplify this. | |
2274 | We know it does not access memory. | |
llvm/lib/IR/Verifier.cpp | ||
4782 | MD could be nullptr here, so you need to handle that. | |
4784 | Check that the rounding mode is not "dynamic"? | |
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
698 ↗ | (On Diff #399125) | Can this be done with patterns in the tablegen files instead of C++ code? Then they should work for both SelectionDAG and GlobalISel. |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
3252 ↗ | (On Diff #399125) | Can this be done with patterns in the tablegen files instead of C++ code? |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
4978 | Should this be "return false" to indicate a legalization failure in the normal way? | |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
4576 | sgpr seems wrong. This is specifying the mapping for the result value and the input value (not the rounding mode which is no longer an operand). I think this should just use the same mapping as G_FPTRUNC (line 3640). | |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
7006 | I'm not quite sure but I think you should return SDValue() here to indicate a lowering/legalization failure? | |
llvm/lib/Target/AMDGPU/SIISelLowering.h | ||
134 ↗ | (On Diff #399768) | No longer needed. |
I think I'd like to see a target independent ISD opcode added for this. We don't use ISD::INTRINSIC_WO_CHAIN for target independent intrinsics.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6367 | This is always false. And if it isn't you would only push one argument. | |
6372 | I'd probably just use TLI.getPointerTy instead of MVT::i8. | |
6388 | I know this was taken from visitTargetIntrinsic, but I don't really understand it. | |
6392 | I don't think this is needed. AssertZExt isn't useful for an FP result. | |
6394 | I don't think this alignment stuff is needed. The return type is FP so it probably doesn't do anything. |
Rebased, as suggested I added a target independent ISD opcode.
Thanks again for all the comments!
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6388 | I removed this line, I'm not sure of what it was doing either. | |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
3252 ↗ | (On Diff #399125) | Maybe it's possible, but I could not find a way to match both the G_FPTRUNC_ROUND_UPWARD and the selection DAG version. So I let this one in C++. |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
192–193 | Like I said previously, I tried to use pure tablegen for both SelectionDag and global-isel and I could not find a way to do it for both. |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
3252 ↗ | (On Diff #399125) | You can get some clues about GlobalISel pattern problems by running tablegen with -warn-on-skipped-patterns like this: $ llvm-tblgen -gen-global-isel -I lib/Target/AMDGPU -I build/include -I include -I lib/Target lib/Target/AMDGPU/AMDGPUGISel.td -warn-on-skipped-patterns ... lib/Target/AMDGPU/SIInstructions.td:186:1: warning: Skipped pattern: Pattern operator lacks an equivalent Instruction (AMDGPUISD::FPTRUNC_ROUND_UPWARD) def FPTRUNC_UPWARD_PSEUDO : VPseudoInstSI <(outs VGPR_32:$vdst), ^ ... lib/Target/AMDGPU/SIInstructions.td:190:1: warning: Skipped pattern: Pattern operator lacks an equivalent Instruction (AMDGPUISD::FPTRUNC_ROUND_DOWNWARD) def FPTRUNC_DOWNWARD_PSEUDO : VPseudoInstSI <(outs VGPR_32:$vdst), ^ This means you need to add some GINodeEquiv lines in AMDGPUGISel.td. |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
198 | Just end the line with ; if there is nothing to go inside the {}. |
Here's what I came up with to get the globalisel pattern matching working, plus a few other suggested tweaks: https://reviews.llvm.org/differential/diff/404879/
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
3252 ↗ | (On Diff #399125) | Thanks again, I didn't know about this command |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6366 | You probably don't need ComputeValueVTs. There shouldn't be any structs or arrays here that need to broken down. I think you can do EVT VT = TLI.getValueType(I.getType) And then use VT in place of VTs |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6366 | Indeed it works, thanks! |
This is looking pretty good, just another round of minor comments.
llvm/docs/LangRef.rst | ||
---|---|---|
24080 | This seems to be repeating the "must be larger" text from a few lines above. | |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
2271 | return true. | |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
6348 | You don't really need a SmallVector here, since you know you have exactly two operands, you could just pass them as two separate arguments to the DAG.getNode call. But it's up to you. | |
6352 | Shouldn't this be getArgOperand(1)? How does it work? | |
6356 | This isn't used. | |
6357 | TLI is already available in this function. | |
6370 | All intrinsics returning floating point type are classed as FPMathOperators, so you can do this part unconditionally. | |
6375 | Use sdl here too. | |
llvm/lib/IR/Verifier.cpp | ||
4782 | I am not an expert on verifying metadata but I think you probably need to check that it is an MDString here, otherwise the cast<MDString> below could fail? And really we ought to have some tests in test/Verifier/llvm.fptrunc.round.ll. | |
4786 | I am not sure but I think RoundMode != RoundingMode::Dynamic needs to be RoundMode.getValue() != ... (or *RoundMode != ...). | |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
4749 | Do you also need to check that the operand is MVT::f32? Maybe add a test where the operand is f64, and check that it fails to legalize? |
I think this looks good now. Are you happy with it @craig.topper @sepavloff ?
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2259 | Should probably use getArgOperand here, though it doesn't make any difference except for bounds checking the index that you pass in. | |
llvm/lib/IR/Verifier.cpp | ||
4784 | isa is a bit more natural than dyn_cast here. | |
llvm/test/CodeGen/AMDGPU/fail.llvm.fptrunc.round.ll | ||
2 | Does this test work in a Release build? If not, you can add a "REQUIRES: asserts" line just after the RUN lines. |
It looks good to me.
I had a look at generic changes only, someone with expertise in AMDGPU should also approve this patch.
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2269 | Flags could be set in the call to MIRBuilder.buildInstr. |
llvm/docs/LangRef.rst | ||
---|---|---|
24089 | Should we document that this isn't supported on all targets, and that a particular target may not support all rounding modes? |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2268 | You can also do this on the end of the buildInstr line if you prefer, no need for MIB at all. |