This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP
ClosedPublic

Authored by kzhuravl on Oct 19 2016, 5:03 PM.

Details

Summary

This will prevent following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

Diff Detail

Repository
rL LLVM

Event Timeline

kzhuravl updated this revision to Diff 75257.Oct 19 2016, 5:03 PM
kzhuravl retitled this revision from to [AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP.
kzhuravl updated this object.
kzhuravl added a reviewer: tstellarAMD.
kzhuravl added a subscriber: llvm-commits.
lib/Target/AMDGPU/SIISelLowering.cpp
3560 ↗(On Diff #75257)

I don't think this is correct, because it's replacing SINT_TO_FP with CVT_F32_UBYTE0, which is an unsigned conversion.

lib/Target/AMDGPU/SIISelLowering.cpp
3525–3527 ↗(On Diff #75257)

I think calling simplifyDemandedBits either here or in performUCharToFloat combine will eliminate the need for this code.

3560 ↗(On Diff #75257)

Nevermind, I see that this transform is correct, because performUCharToFloatCombine() is checking that the high bits are all zero.

kzhuravl updated this revision to Diff 75449.Oct 21 2016, 10:46 AM
kzhuravl edited edge metadata.

Use getZExtOrTrunc

kzhuravl added inline comments.Oct 21 2016, 10:46 AM
lib/Target/AMDGPU/SIISelLowering.cpp
3525–3527 ↗(On Diff #75257)

In this case, simplifyDemandedBits will replace zero_extend with any_extend.

kzhuravl updated this revision to Diff 75459.Oct 21 2016, 12:08 PM
kzhuravl edited edge metadata.

Only check for zero_extend

tstellarAMD accepted this revision.Oct 21 2016, 1:09 PM
tstellarAMD edited edge metadata.

LGTM. Can you mention in the commit message that this will prevent a regression when enabling i16 support, and note the test that would regress.

This revision is now accepted and ready to land.Oct 21 2016, 1:09 PM
kzhuravl updated this object.Oct 21 2016, 3:11 PM
kzhuravl edited edge metadata.
This revision was automatically updated to reflect the committed changes.
arsenm added inline comments.Oct 26 2016, 12:41 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3560 ↗(On Diff #75257)

That checks if the high 24 bits are zero. For the signed case it needs to check if 25 bits are 0 so it is still incorrect