This will prevent the following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
lib/Target/AMDGPU/SIISelLowering.cpp:3560
I don't think this is correct, because it's replacing SINT_TO_FP with CVT_F32_UBYTE0, which is an unsigned conversion.
lib/Target/AMDGPU/SIISelLowering.cpp:3525–3527
I think calling simplifyDemandedBits either here or in the performUCharToFloat combine will eliminate the need for this code.

lib/Target/AMDGPU/SIISelLowering.cpp:3560
Nevermind, I see that this transform is correct, because performUCharToFloatCombine() is checking that the high bits are all zero.
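For context, the guard being referred to might look roughly like the following fragment (a minimal sketch assuming the usual combine signature with N and DCI in scope, not the in-tree code): CVT_F32_UBYTE0 only reads the low byte, so the rewrite is sound only when the remaining bits are provably zero.

    // Sketch of the known-bits guard described above, not the committed code.
    // CVT_F32_UBYTE0 converts only bits [7:0], so the source must be known to
    // fit in an unsigned byte.
    SDValue Src = N->getOperand(0);
    if (!DCI.DAG.MaskedValueIsZero(Src, APInt::getHighBitsSet(32, 24)))
      return SDValue(); // High 24 bits not known zero; keep the original node.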
lib/Target/AMDGPU/SIISelLowering.cpp:3525–3527
In this case, simplifyDemandedBits will replace zero_extend with any_extend.
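For reference, the suggested call would typically be made from the combine roughly as sketched below (hedged: N and DCI are assumed from the standard combine signature, and this uses the DAGCombinerInfo overload of TargetLowering::SimplifyDemandedBits rather than the code under review). The effect described above follows because any_extend leaves the high bits undefined, so they are no longer known zero afterwards.

    // Sketch of the suggested approach, not the committed change: demand only
    // the low 8 bits of the i32 source, since CVT_F32_UBYTE0 ignores the rest.
    const TargetLowering &TLI = DCI.DAG.getTargetLoweringInfo();
    SDValue Src = N->getOperand(0);
    // With only bits [7:0] demanded, (zero_extend i8) can legally become
    // (any_extend i8) -- after which the high bits stop being known zero.
    if (TLI.SimplifyDemandedBits(Src, APInt::getLowBitsSet(32, 8), DCI))
      return SDValue(N, 0);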
LGTM. Can you mention in the commit message that this will prevent a regression when enabling i16 support, and note the test that would regress.
lib/Target/AMDGPU/SIISelLowering.cpp:3560
That checks whether the high 24 bits are zero. For the signed case it needs to check that the high 25 bits are zero, so it is still incorrect.
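To make the width requirement concrete, here is a small self-contained illustration (cvtF32Ubyte0 below is a hypothetical model of the instruction's semantics, not a real API): with only the high 24 bits known zero, bit 7 may still be set, and a signed byte-sized source then disagrees with the unsigned conversion.

    #include <cstdint>
    #include <cstdio>

    // Hypothetical model of V_CVT_F32_UBYTE0: unsigned conversion of bits [7:0].
    static float cvtF32Ubyte0(uint32_t Reg) {
      return static_cast<float>(Reg & 0xffu);
    }

    int main() {
      uint32_t Reg = 0x000000C8; // High 24 bits zero; value is 200.

      // Unsigned source: uint_to_fp(200) == 200.0 == cvt_f32_ubyte0(Reg). Sound.
      std::printf("unsigned: %f vs %f\n", (float)Reg, cvtF32Ubyte0(Reg));

      // Signed i8 source with the same bit pattern: sint_to_fp gives -56.0,
      // while cvt_f32_ubyte0 still yields 200.0. Excluding this case requires
      // bit 7 to be known zero too, i.e. 25 known-zero high bits rather than 24.
      int8_t S = static_cast<int8_t>(Reg); // -56
      std::printf("signed:   %f vs %f\n", (float)S, cvtF32Ubyte0(Reg));
      return 0;
    }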