This makes a build_vector (or an equivalent pattern) of the low or high words of cvt_pkrtz_f16_f32 results select to a single v_cvt_pkrtz_f16_f32.
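The combine relies on a bit-level identity. Take the low half of one packed conversion and the high half of another. The result has the same bits as a single packed conversion of the two surviving operands.

The standalone C++ sketch models that identity; it is not LLVM code. The helper names (`f32_to_f16_rtz`, `cvt_pkrtz`, `lo16`, `hi16`, `build_vector`, `combine_identity_holds`) are hypothetical. The sketch makes these assumptions:
- Operand 0 lands in bits [15:0].
- Conversion follows IEEE round-toward-zero, with finite overflow clamping to the largest finite half (0x7BFF).
- NaNs map to a canonical quiet NaN. Exact hardware NaN behavior is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of one lane of v_cvt_pkrtz_f16_f32: binary32 -> binary16
// bits, rounding toward zero. Assumptions: NaN -> quiet NaN 0x7E00 (with sign),
// finite overflow -> largest finite half 0x7BFF (with sign).
uint16_t f32_to_f16_rtz(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  const uint32_t sign = (bits >> 16) & 0x8000u;  // bit 31 -> bit 15
  const uint32_t exp = (bits >> 23) & 0xFFu;
  const uint32_t mant = bits & 0x7FFFFFu;

  if (exp == 0xFFu)  // Inf, or NaN (mapped to a quiet NaN)
    return static_cast<uint16_t>(sign | (mant != 0 ? 0x7E00u : 0x7C00u));

  const int e = static_cast<int>(exp) - 127 + 15;  // rebias exponent for f16
  if (e >= 0x1F)  // too large for f16: RTZ clamps to 65504
    return static_cast<uint16_t>(sign | 0x7BFFu);
  if (e <= 0) {   // f16 subnormal range, or zero
    if (e < -10)  // below the smallest subnormal: truncates to signed zero
      return static_cast<uint16_t>(sign);
    const uint32_t m = mant | 0x800000u;  // restore the implicit leading 1
    return static_cast<uint16_t>(sign | (m >> (14 - e)));
  }
  // Normal f16: discarding the 13 low mantissa bits is exactly RTZ.
  return static_cast<uint16_t>(sign | (static_cast<uint32_t>(e) << 10) |
                               (mant >> 13));
}

// Packed conversion: operand 0 in bits [15:0], operand 1 in bits [31:16].
uint32_t cvt_pkrtz(float lo, float hi) {
  return static_cast<uint32_t>(f32_to_f16_rtz(lo)) |
         (static_cast<uint32_t>(f32_to_f16_rtz(hi)) << 16);
}

uint16_t lo16(uint32_t v) { return static_cast<uint16_t>(v & 0xFFFFu); }
uint16_t hi16(uint32_t v) { return static_cast<uint16_t>(v >> 16); }

// A 2 x 16-bit build_vector: element 0 occupies the low word.
uint32_t build_vector(uint16_t lo, uint16_t hi) {
  return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

// build_vector(lo16(pk(a, b)), hi16(pk(c, d))) has the same bits as pk(a, d),
// so the two conversions can be replaced by one.
bool combine_identity_holds(float a, float b, float c, float d) {
  return build_vector(lo16(cvt_pkrtz(a, b)), hi16(cvt_pkrtz(c, d))) ==
         cvt_pkrtz(a, d);
}
```

For example, with a = 1.0, b = 2.0, c = -1.0 and d = 0.5, both sides evaluate to 0x38003C00.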
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

Line | Comment
---|---
2045 | Capitalize and punctuate
2080–2081 | I think you should only need one or the other here, not both. Either way, you could move this out of the function (and I think there's a wrapper for this somewhere already)
2091 | You can use the constructor directly without the extra = APFloat
2125 | Capitalize hi
2154–2159 | computeKnownBits?

lib/Target/AMDGPU/SIInstructions.td

Line | Comment
---|---
1597 | Should use isVI, or maybe these should be distinguished by GCN3Encoding? Needs a comment for why these are separated

test/CodeGen/AMDGPU/cvt_pkrtz_f16_f32_combine.ll

Line | Comment
---|---
23 | Probably should explicitly test the different encodings for the different sub targets
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

Line | Comment
---|---
2154–2159 | I don't see how that would help with obtaining the source of the low/high 16 bits (the cvt_pkrtz node), or how it would make the code more generic/smaller, since it would still have to match "and(cvt_pkrtz(v, ), 0xffff)" and such. I just realized that this function could handle cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) (without any ands or shifts). Should I make it do so (and do something similar for SelectCvtRtzF16F32)?
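The cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) forms fit naturally with the and/shift forms. Converting +0.0 yields the all-zero half 0x0000, so each masked or shifted packed result is bit-identical to a packed conversion with a zero operand.

The standalone C++ sketch checks this at the bit level. It is not LLVM code. `f32_to_f16_rtz`, `cvt_pkrtz` and `zero_operand_forms_hold` are hypothetical helpers. The model assumes IEEE round-toward-zero with operand 0 in bits [15:0]; its NaN and overflow handling are simplifying assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of one lane of v_cvt_pkrtz_f16_f32 (binary32 -> binary16,
// round toward zero). Assumptions: NaN -> quiet NaN, overflow -> 0x7BFF.
uint16_t f32_to_f16_rtz(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  const uint32_t sign = (bits >> 16) & 0x8000u;
  const uint32_t exp = (bits >> 23) & 0xFFu;
  const uint32_t mant = bits & 0x7FFFFFu;
  if (exp == 0xFFu)  // Inf or NaN
    return static_cast<uint16_t>(sign | (mant != 0 ? 0x7E00u : 0x7C00u));
  const int e = static_cast<int>(exp) - 127 + 15;  // rebias for f16
  if (e >= 0x1F)  // overflow: RTZ clamps to the largest finite half
    return static_cast<uint16_t>(sign | 0x7BFFu);
  if (e <= 0) {   // subnormal range or zero
    if (e < -10)
      return static_cast<uint16_t>(sign);  // truncates to signed zero
    return static_cast<uint16_t>(sign | ((mant | 0x800000u) >> (14 - e)));
  }
  return static_cast<uint16_t>(sign | (static_cast<uint32_t>(e) << 10) |
                               (mant >> 13));  // drop 13 bits = RTZ
}

// Operand 0 in bits [15:0], operand 1 in bits [31:16].
uint32_t cvt_pkrtz(float lo, float hi) {
  return static_cast<uint32_t>(f32_to_f16_rtz(lo)) |
         (static_cast<uint32_t>(f32_to_f16_rtz(hi)) << 16);
}

// Each and/shift form on the left equals a cvt_pkrtz with a +0.0 operand.
bool zero_operand_forms_hold(float v, float x) {
  const uint32_t pk_v_x = cvt_pkrtz(v, x);
  const uint32_t pk_x_v = cvt_pkrtz(x, v);
  return (pk_v_x & 0xFFFFu) == cvt_pkrtz(v, 0.0f) &&      // and(pk(v, x), 0xffff)
         (pk_x_v >> 16) == cvt_pkrtz(v, 0.0f) &&          // srl(pk(x, v), 16)
         (pk_v_x << 16) == cvt_pkrtz(0.0f, v) &&          // shl(pk(v, x), 16)
         (pk_x_v & 0xFFFF0000u) == cvt_pkrtz(0.0f, v);    // and(pk(x, v), 0xffff0000)
}
```

The zero has to be +0.0. cvt_pkrtz(v, -0.0) has 0x8000 in its high half, so a matcher that treats a zero operand as "this half is all zeros" presumably needs to check for positive zero specifically.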
lib/Target/AMDGPU/SIInstructions.td

Line | Comment
---|---
1597 | Since all GCN versions support the VOP3a form, shouldn't it use isGCN (to combine the modifiers into the instruction on SI/CI)? The VOP2 form is only supported on SI/CI, so isSICI is used. IIRC VOP2 ended up being used when no modifiers could be folded.
lib/Target/AMDGPU/SIInstructions.td

Line | Comment
---|---
1597 | isGCN is obsolete and should be removed anywhere it's used. Since we try to shrink instructions later, we only want to select the _e64 version when possible.
In this update:
- The 64-bit encodings are always selected and SelectCvtRtzF16F32*Mods() have been removed.
- SelectCvtRtzF16F32() is now implemented with SelectCvtRtzF16F32Lo().
- cvt_pkrtz(v, 0) and cvt_pkrtz(0, v) are handled in SelectCvtRtzF16F32LoHiImpl().
- The test explicitly tests the different encodings for the different sub targets.
- The getConstantValue() helper is used.
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

Line | Comment
---|---
2090 | Constant folding during selection is pretty weird. Can we just do this in InstSimplify?
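Wherever the constant fold ends up, it has to reproduce the instruction's round-toward-zero conversion, not the default round-to-nearest-even. In LLVM this would presumably mean APFloat::convert to IEEEhalf with APFloat::rmTowardZero.

The standalone C++ sketch only illustrates why the rounding mode matters. `to_half_rtz`, `to_half_rne`, `split_for_half` and `fold_cvt_pkrtz` are hypothetical helpers. Both converters deliberately handle only values that are normal f16 numbers: there is no NaN, Inf, overflow or subnormal handling.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical decomposition of a binary32 value for conversion to binary16.
// Precondition (asserted): the value is a normal f16 number after rebiasing.
struct HalfParts {
  uint16_t sign;  // sign already moved to bit 15
  uint32_t exp;   // f16 biased exponent, 1..30
  uint32_t mant;  // full 23-bit binary32 mantissa
};

HalfParts split_for_half(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  const int e = static_cast<int>((bits >> 23) & 0xFFu) - 127 + 15;
  assert(e >= 1 && e <= 30 && "sketch only handles normal-range values");
  return {static_cast<uint16_t>((bits >> 16) & 0x8000u),
          static_cast<uint32_t>(e), bits & 0x7FFFFFu};
}

// Round toward zero: drop the 13 low mantissa bits (the instruction's mode).
uint16_t to_half_rtz(float f) {
  const HalfParts p = split_for_half(f);
  return static_cast<uint16_t>(p.sign | (p.exp << 10) | (p.mant >> 13));
}

// Round to nearest, ties to even: the default mode, wrong for this fold.
uint16_t to_half_rne(float f) {
  const HalfParts p = split_for_half(f);
  uint32_t r = (p.exp << 10) | (p.mant >> 13);
  const uint32_t rem = p.mant & 0x1FFFu;  // the 13 discarded bits
  if (rem > 0x1000u || (rem == 0x1000u && (r & 1u)))
    ++r;  // a carry into the exponent (even up to Inf) is the correct result
  return static_cast<uint16_t>(p.sign | r);
}

// Hypothetical fold of cvt_pkrtz(c0, c1) with constant operands into a single
// 32-bit constant: c0 in bits [15:0], c1 in bits [31:16].
uint32_t fold_cvt_pkrtz(float c0, float c1) {
  return static_cast<uint32_t>(to_half_rtz(c0)) |
         (static_cast<uint32_t>(to_half_rtz(c1)) << 16);
}
```

For 1.000732421875 (1 + 0.75 ulp of f16 at 1.0), truncation gives 0x3C00 while round-to-nearest gives 0x3C01. A folder using the default rounding mode would therefore change the result.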