Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic for llvm.is.fpclass
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can do it as a follow up commit, but the existing combines we have for AMDGPU::FP_CLASS should be ported to use the generic intrinsic. Also, llvm.amdgcn.class should get bitcode upgraded to the generic
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll | ||
---|---|---|
182 | Should add some vector cases too |
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll | ||
---|---|---|
3 | Should also test/handle globalisel |
I just realized the amdgpu intrinsic allows non-immediate arguments, but is_fpclass does not so these are not equivalent
- SelectionDAG fpclass vector support
- GlobalISel llvm.is.fpclass support for AMDGPU
- rebase
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2320 ↗ | (On Diff #472930) | This should get an IRTranslator test to make sure the flags are passed through |
2324–2325 ↗ | (On Diff #472930) | getUniqueInteger is unnecessarily fancy, can just cast to ConstantInt directly |
2332 ↗ | (On Diff #472930) | Do you really need the float type operand? I know bfloat16 isn't going to work without it, but I thought the plan was to introduce FP types to LLT |
llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td | ||
133 | Should avoid defining an AMDGPU node for this and move this to generic code | |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
870 ↗ | (On Diff #472930) | I don't see why you need to manually select this (maybe sharing the pattern between the existing intrinsic is annoying because the new intrinsic uses immarg?) |
880–883 ↗ | (On Diff #472930) | Should be no reason to check this here |
894–902 ↗ | (On Diff #472930) | You can just unconditionally materialize the constant into a register and let SIFoldOperands sort out the constant bus restriction |
905–906 ↗ | (On Diff #472930) | You shouldn't need to special case the result constraint |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
3945 ↗ | (On Diff #472930) | Pretty sure this default constructs to null |
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll | ||
3 | Should use some share prefixes, a lot of these functions are the same. Also needs a gfx7 and 8 run lines for the half promotion | |
1923 | v3f16 and v4f16 are also potentially interesting |
- SelectionDAG fpclass vector support
- GlobalISel llvm.is.fpclass support for AMDGPU
- Address comments, add custom half promotion for gfx7
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2320 ↗ | (On Diff #472930) | Not sure if I completely hit the mark with my added test, but to me it seemed that not all flags were possible (e.g., nnan flag didn't work as it required a fp return type). For now I've added flag related tests that explicitly test the addition of nofpexcept. Do let me know if there's something missing or whether this copyFlagsFromInstruction is better omitted. |
2332 ↗ | (On Diff #472930) | I believe it's not necessary for amdgpu but required for the G_IS_FPCLASS target opcode. Leaving it out results in verifier errors (I also am unaware about introducing FP types and LLT). |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
870 ↗ | (On Diff #472930) | I did look on whether I could re-use some of the existing tablegen but I couldn't get it quite into the right shape for it to match. llvm.is.fpclass requires the mask to be an immarg as you mentioned so materializing the immediate into a register anywhere before this function results in a verifier error. |
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll | ||
3 | I'm not that well versed in how gfx7 should do half promotion. I feel like either gfx7 selectiondag or gfx7 globalisel half promotion tests are incorrect (and if not, selectiondag version does seem suboptimal). |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
2450 ↗ | (On Diff #474002) | This will do the wrong thing for snans and also denormals inputs are flushed |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
2450 ↗ | (On Diff #474002) | I also don't see the corresponding DAG legalization. It's such a special case I think this should be split into a separate patch anyway. |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6492–6493 ↗ | (On Diff #474002) | Why did this change? |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
318–320 | It seems annoying to have such a long list of types here - it'll need updating whenever we introduce a new one. Can you use something like FloatVectorTypes instead? |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2332 ↗ | (On Diff #472930) | Sorry, I meant that the MachineVerifier will fail for the G_IS_FPCLASS instruction. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
2450 ↗ | (On Diff #474002) |
I put the corresponding SelectionDAG type widening code for IS_FPCLASS is in target custom function LowerIS_FPCLASS as I couldn't bypass expansion in SelectionDAGBuilder.cpp when marking the action for the instruction with f16 as promote (i.e., it would call IS_FPCLASS expansion code even when trying to promote).
As in, the widening code, or IS_FPCLASS support for amdgpu gfx7? |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
6492–6493 ↗ | (On Diff #474002) | isOperationLegalOrCustom will return false if the type is considered illegal regardless of whether the instruction's type is marked legal or custom whereas isOperationCustom won't explicitly check for type legality and returns whether the action was set to custom. I basically just wanted it to go through to target custom code. (May revert this in favor of using the expand code for f16 in case there is no f16 fp class instruction for amdgpu) |
For this patch I'd like to drop all the attempts to handle legalizing the f16 case and move that to a separate patch. It's a much more complicated edge case that doesn't have much in common with the base handling
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2332 ↗ | (On Diff #472930) | Right, it's there in the operand list. I mean more abstractly, why is it there? |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
2450 ↗ | (On Diff #474002) | OK, there are several issues here. None of this should be done in target code. I also don't approve of doing this expansion in the DAG builder, but see that's a pre-existing issue. GlobalISel does need to do the same expansion. |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
6492–6493 ↗ | (On Diff #474002) | For the no f16 case, I think we need to do software expansion to get correct results for denormal values |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
318–320 | This should be unnecessary, we have no vector class instructions. These should just expand into scalars | |
2731 | This doesn't work correctly for denormals. The f16 denormal value won't be denormal after casting to f32 (if it wasn't flushed to zero under DAZ or FTZ modes) |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2332 ↗ | (On Diff #472930) | I can see the semantics being used in the target independent expansion of IS_FPCLASS in SelectionDAG (e.g., for retrieving inf of a particular fp semantic). I'm inferring that the rationale could be: GlobalISel will require a similar implementation and therefore requires the semantics. I haven't looked into whether any alternatives exist that don't require passing of the semantics through the operand, though. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
2450 ↗ | (On Diff #474002) | I was looking at implementing the SelectionDAG target independent expansion for GlobalISel lower(). I'll first remove f16 legalizing for cases where there is no f16 instructions available for amdgpu for this diff and move the GlobalISel's expansion/lower to another diff. |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
318–320 | If not set as custom (or legal), these'll get expanded through the target independent expansion. Bypassing said target independent expansion does result into the desired scalarizing. |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2332 ↗ | (On Diff #472930) | Currently the LLT directly implies the semantics for every operation |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
318–320 | This is one of the problems with doing this kind of expansion in SelectionDAGBuilder. This should go through the usual legalization paths |
- Remove IS_FPCLASS amdgpu f16 legalization, split tests into f16 and not f16 cases, temporarily disable gfx7 glisel tests
- Rebase
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2332 ↗ | (On Diff #472930) | @sepavloff Do you happen to recall the rationale of the fp semantic operand for G_IS_FPCLASS? My knowledge about it are a bit shallow but perhaps it can be removed |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2332 ↗ | (On Diff #472930) | It is used to workaround limitations of GlobalISel, - lack of floating-point types. Without this operand it is impossible to distinguish between half and bfloat16 and also between different flavors of 8-bit floats. If LLT supported floating-point types, this operand could be removed. |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2332 ↗ | (On Diff #472930) | But this is a problem for every single operation, not just this one. We don't have a decided upon strategy for dealing with this, so it doesn't make sense to me to try to deal with it here |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
314 | Can you add a fixme that we just want scalarization? | |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
870 ↗ | (On Diff #472930) | You might need to split it into a different pattern instantiation, but you would just need the S_MOV_B32 from the mask to the constant (although I actually would expect it to work if you directly folded the constant anyway, since the operand should have been copied to VGPR anyway). Something like: class ClassPat<Instruction inst, ValueType vt> : GCNPat < (fp_class (VOP3Mods vt:$src0, i32:$src0_mods), (i32 timm:mask)) (inst $src0_mods, VSrc_b32:$src0, $src0_mods, (S_MOV_B32 $mask)) >; |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
983 ↗ | (On Diff #475421) | I think this clampScalar isn't doing anything and can be dropped |
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll | ||
12 | Can you also add some cases where the input will be an SGPR? | |
107 | s/float/f32 in these function names |
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fpclass-flags.ll | ||
---|---|---|
16 ↗ | (On Diff #475421) | I've been wondering whether the flag copy from the IR intrinsic to G_IS_FPCLASS in IRTranslator should be removed altogether. I'd have to weaken the flags' constraints as they all require scalar or vector fp return types. Additionally, Any use of fast math flags outside of existing uses will most likely require amending langref. E.g., current descriptions of some fast math flags describe how input can result into a poison value but this wouldn't be possible for G_IS_FPCLASS as it's a bool return. Let me know what you think, I can see some of the flags being useful by folding into constant bool values (e.g., not a nan flag + G_IS_FPCLASS test for nans) but I may be a bit naïve on useful cases beyond said folding. |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
2320 ↗ | (On Diff #472930) | You can remove the flag copy if you want, although the flags may be introduced in the future |
2333–2334 ↗ | (On Diff #475421) | I just realized there's no point in doing this. G_IS_FPCLASS is not marked as mayRaiseFPException, so the flag is implied |
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fpclass-flags.ll | ||
16 ↗ | (On Diff #475421) | OK, might as well drop this test if we have end to end tests and there's nothing unique to test in the IRTranslator |
- Remove patfrag dependency for is_fpclass
- Add dedicated patterns
- Remove globalisel manual selection and depend on selectiondag tablegen
- Address comments
LGTM with nits. you have some dead checks and dead code
llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td | ||
---|---|---|
132–134 | Whole file is now whitespace only changes which can be dropped | |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
913–922 ↗ | (On Diff #477969) | Dead code |
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll | ||
94–95 ↗ | (On Diff #477969) | This is broken for signaling nans. You dropped this from the patch but left these dead checks around |
Sorry, haven't gotten github access yet: could you (or somebody in AMDGPU group) land this for me? 😅
Can you add a fixme that we just want scalarization?