This has a build_vector (or equivalent) of the low or high words of a
cvt_pkrtz_f16_f32 be selected to a single v_cvt_pkrtz_f16_f32.
Details
Diff Detail
Event Timeline
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
---|---|---|
2102 | Capitalize and punctuate | |
2137–2138 | I think you should need only one or the other here, not both. Either way, you could move this out of the function (and I think there's a wrapper for this somewhere already) | |
2148 | You can use the constructor directly without the extra = APFloat | |
2182 | Capitalize hi | |
2211–2216 | computeKnownBits? | |
lib/Target/AMDGPU/SIInstructions.td | ||
1583 | Should use isVI, or maybe these should be distinguished by GCN3Encoding? Needs a comment for why these are separated | |
test/CodeGen/AMDGPU/cvt_pkrtz_f16_f32_combine.ll | ||
24 | Probably should explicitly test the different encodings for the different sub targets |
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
---|---|---|
2211–2216 | I don't see how that would help with obtaining the source of the low/high 16 bits (the cvt_pkrtz node) and end up make the code more generic/smaller? Since it would still have to match for "and(cvt_pkrtz(v, ), 0xffff)" and such. I just realized that this function could handle cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) (without any ands or shifts). Should I make it so (and do something similar for SelectCvtRtzF16F32)? | |
lib/Target/AMDGPU/SIInstructions.td | ||
1583 | Since all GCN versions support the VOP3a form, shouldn't it use isGCN (to combine the modifiers into the instruction on SI/CI)? The VOP2 form is only supported on SI/CI, so isSICI is used. IIRC VOP2 ended up being used when no modifiers could be folded. |
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
1583 | isGCN is obsolete and should be removed anywhere it's used. Since we try to shrink instructions later, we only want to select the _e64 version when possible |
In this update:
- The 64-bit encodings are always selected and SelectCvtRtzF16F32*Mods() have been removed.
- SelectCvtRtzF16F32() is now implemented with SelectCvtRtzF16F32Lo().
- cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) are handled in SelectCvtRtzF16F32LoHiImpl().
- The test explicitly tests the different encodings for the different sub targets.
- The getConstantValue() helper is used.
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
---|---|---|
2147 | Constant folding during selection is pretty weird. Can we just do this in InstSimplify? |
Capitalize and punctuate