This will prevent the following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
Diff Detail
Repository: rL LLVM
Event Timeline
lib/Target/AMDGPU/SIISelLowering.cpp:3560 ↗ (On Diff #75257)
I don't think this is correct, because it's replacing SINT_TO_FP with CVT_F32_UBYTE0, which is an unsigned conversion.
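For illustration (not part of the patch): CVT_F32_UBYTE0 converts byte 0 of its operand as an unsigned value, so it can only stand in for SINT_TO_FP when the operand is known non-negative. A minimal standalone sketch of the difference, with the hardware conversion modeled as a plain C++ function:

```cpp
#include <cstdint>
#include <cstdio>

// Model of the hardware conversion: take byte 0 of the operand and
// convert it to f32 as an unsigned value.
static float cvt_f32_ubyte0(uint32_t X) { return (float)(X & 0xFF); }

int main() {
  int32_t X = -1; // low byte is 0xFF
  // SINT_TO_FP yields -1.0, but the unsigned byte conversion yields 255.0.
  printf("sint_to_fp: %f  cvt_f32_ubyte0: %f\n",
         (float)X, cvt_f32_ubyte0((uint32_t)X));
  return 0;
}
```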
lib/Target/AMDGPU/SIISelLowering.cpp:3525–3527 ↗ (On Diff #75257)
I think calling simplifyDemandedBits either here or in performUCharToFloatCombine will eliminate the need for this code.
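A minimal sketch of what that suggestion could look like, assuming the 2016-era TargetLowering::SimplifyDemandedBits signature (with explicit KnownZero/KnownOne outputs); the helper name and placement are hypothetical, not the patch itself:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/Target/TargetLowering.h"
using namespace llvm;

// Hypothetical combine fragment: only byte 0 of the CVT_F32_UBYTE0 source
// is demanded, so operations that only affect bits 8..31 can be dropped.
static SDValue simplifyUByteSource(SDNode *N,
                                   TargetLowering::DAGCombinerInfo &DCI) {
  SDValue Src = N->getOperand(0);
  APInt Demanded = APInt::getLowBitsSet(32, 8);

  APInt KnownZero, KnownOne;
  TargetLowering::TargetLoweringOpt TLO(DCI.DAG, !DCI.isBeforeLegalize(),
                                        !DCI.isBeforeLegalizeOps());
  const TargetLowering &TLI = DCI.DAG.getTargetLoweringInfo();
  if (TLI.SimplifyDemandedBits(Src, Demanded, KnownZero, KnownOne, TLO)) {
    DCI.CommitTargetLoweringOpt(TLO);
    return SDValue(N, 0); // Node was updated in place.
  }
  return SDValue();
}
```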
lib/Target/AMDGPU/SIISelLowering.cpp:3560 ↗ (On Diff #75257)
Nevermind, I see that this transform is correct, because performUCharToFloatCombine() is checking that the high bits are all zero.
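The check being referred to is, in spirit, a known-bits query on an i32 source; a hypothetical sketch (the helper name is made up, and the exact code in performUCharToFloatCombine may differ):

```cpp
// Prove the source has no bits set outside byte 0, i.e. its value is in
// [0, 255], so converting byte 0 as unsigned is exact.
static bool isKnownUnsignedByte(SelectionDAG &DAG, SDValue Src) {
  return DAG.MaskedValueIsZero(Src, APInt::getHighBitsSet(32, 24));
}
```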
lib/Target/AMDGPU/SIISelLowering.cpp:3525–3527 ↗ (On Diff #75257)
In this case, simplifyDemandedBits will replace zero_extend with any_extend.
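On a hypothetical DAG fragment, the effect being described would look like this: only byte 0 of the extended value is demanded, so the extension is relaxed:

```
Before:  t1: i32 = zero_extend t0: i16
         t2: f32 = AMDGPUISD::CVT_F32_UBYTE0 t1
After:   t1: i32 = any_extend t0: i16
         t2: f32 = AMDGPUISD::CVT_F32_UBYTE0 t1
```

any_extend leaves bits 16..31 undefined rather than zero, which is harmless for CVT_F32_UBYTE0 itself since it reads only byte 0, but it means the high bits of t1 are no longer provably zero afterwards.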
LGTM. Can you mention in the commit message that this will prevent a regression when enabling i16 support, and note the test that would regress.
lib/Target/AMDGPU/SIISelLowering.cpp:3560 ↗ (On Diff #75257)
That checks whether the high 24 bits are zero. For the signed case it needs to check that 25 bits are zero, so it is still incorrect.
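In mask terms, the difference between the two checks is one bit. A small illustration, assuming an i32 source named Src and a SelectionDAG named DAG in scope (the placement of such a check is hypothetical):

```cpp
// High 24 bits clear (mask 0xFFFFFF00): value is in [0, 255], a valid
// unsigned byte.
bool OkUnsigned = DAG.MaskedValueIsZero(Src, APInt::getHighBitsSet(32, 24));

// High 25 bits clear (mask 0xFFFFFF80): additionally requires bit 7 to be
// zero, restricting the value to [0, 127]. That keeps the low byte
// non-negative even when read as a signed 8-bit quantity, which is the
// stricter condition the comment above asks for in the SINT_TO_FP case.
bool OkSigned = DAG.MaskedValueIsZero(Src, APInt::getHighBitsSet(32, 25));
```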