This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add optimization patterns to combine fp32->fp16 conversions
Needs ReviewPublic

Authored by pendingchaos on Dec 1 2018, 10:57 AM.

Details

Summary

This has a build_vector (or equivalent) of the low or high words of a
cvt_pkrtz_f16_f32 be selected to a single v_cvt_pkrtz_f16_f32.

Diff Detail

Event Timeline

pendingchaos created this revision.Dec 1 2018, 10:57 AM

I don't have commit access.

arsenm added inline comments.Dec 16 2018, 10:32 PM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2102

Capitalize and punctuate

2137–2138

I think you should need only one or the other here, not both. Either way, you could move this out of the function (and I think there's a wrapper for this somewhere already)

2148

You can use the constructor directly without the extra = APFloat

2182

Capitalize hi

2211–2216

computeKnownBits?

lib/Target/AMDGPU/SIInstructions.td
1583

Should use isVI, or maybe these should be distinguished by GCN3Encoding? Needs a comment for why these are separated

test/CodeGen/AMDGPU/cvt_pkrtz_f16_f32_combine.ll
24

Probably should explicitly test the different encodings for the different sub targets

pendingchaos added inline comments.Dec 17 2018, 1:24 PM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2211–2216

I don't see how that would help with obtaining the source of the low/high 16 bits (the cvt_pkrtz node) and end up make the code more generic/smaller? Since it would still have to match for "and(cvt_pkrtz(v, ), 0xffff)" and such.

I just realized that this function could handle cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) (without any ands or shifts). Should I make it so (and do something similar for SelectCvtRtzF16F32)?

lib/Target/AMDGPU/SIInstructions.td
1583

Since all GCN versions support the VOP3a form, shouldn't it use isGCN (to combine the modifiers into the instruction on SI/CI)?

The VOP2 form is only supported on SI/CI, so isSICI is used. IIRC VOP2 ended up being used when no modifiers could be folded.

arsenm added inline comments.Jan 21 2019, 3:00 PM
lib/Target/AMDGPU/SIInstructions.td
1583

isGCN is obsolete and should be removed anywhere it's used.

Since we try to shrink instructions later, we only want to select the _e64 version when possible

In this update:

  • The 64-bit encodings are always selected and SelectCvtRtzF16F32*Mods() have been removed.
  • SelectCvtRtzF16F32() is now implemented with SelectCvtRtzF16F32Lo().
  • cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) are handled in SelectCvtRtzF16F32LoHiImpl().
  • The test explicitly tests the different encodings for the different sub targets.
  • The getConstantValue() helper is used.
arsenm added inline comments.Apr 7 2020, 2:11 PM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
2147

Constant folding during selection is pretty weird. Can we just do this in InstSimplify?