This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Implement {{s|u}}int_to_fp i64 -> f32
ClosedPublic

Authored by arsenm on Jan 11 2016, 8:13 AM.

Details

Summary

The old lowering for uint_to_fp failed OpenCL conformance.
It might be OK for fast math mode, but I'm not sure.
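
For readers without the diff at hand, here is a minimal sketch of one way to convert an unsigned 64-bit integer to f32 with round-to-nearest-even, which is the kind of exact rounding a conformance test would check. It is not taken from the patch: the function names `uint64ToFloatRNE` and `countLeadingZeros64` are hypothetical, and the approach (normalize, keep the top 24 bits, round on the discarded 40 bits) is only assumed to resemble what the lowering does.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: number of leading zero bits in a nonzero 64-bit value.
static int countLeadingZeros64(uint64_t x) {
  int n = 0;
  while ((x & (UINT64_C(1) << 63)) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// Sketch of u64 -> f32 with round-to-nearest-even (assumed approach, not the
// patch's code).
float uint64ToFloatRNE(uint64_t u) {
  if (u == 0)
    return 0.0f;

  int lz = countLeadingZeros64(u);
  // Biased exponent of the leading one bit (bit position 63 - lz).
  uint32_t e = 127u + 63u - static_cast<uint32_t>(lz);

  // Normalize so the leading one sits in bit 63, then drop it (implicit bit).
  u = (u << lz) & UINT64_C(0x7fffffffffffffff);

  // The low 40 bits do not fit in the 23-bit mantissa and decide the rounding.
  uint64_t t = u & UINT64_C(0xffffffffff);
  const uint64_t half = UINT64_C(0x8000000000);

  // Exponent field plus the top 23 fraction bits.
  uint32_t v = (e << 23) | static_cast<uint32_t>(u >> 40);

  // Round up above one half; on an exact tie, round to the even mantissa.
  uint32_t r = t > half ? 1u : (t == half ? (v & 1u) : 0u);
  v += r; // A carry out of the mantissa correctly increments the exponent.

  float f;
  std::memcpy(&f, &v, sizeof f);
  return f;
}
```

For example, `uint64ToFloatRNE(16777217)` lands exactly halfway between two representable floats and rounds down to the even neighbor, while `uint64ToFloatRNE(16777219)` rounds up.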

Event Timeline

arsenm updated this revision to Diff 44512.Jan 11 2016, 8:13 AM
arsenm retitled this revision from to AMDGPU: Implement {{s|u}}int_to_fp i64 -> f32.
arsenm updated this object.
arsenm added a reviewer: tstellarAMD.
arsenm added a subscriber: llvm-commits.

Other than the removed test, LGTM.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2233–2236

I was thinking a bit about this because of all the i64, but it quickly gets messy and it's not clear to me that there is a much better way. I wonder whether bitcasting u to v2i32 and only shifting the high dword by 8 results in better code, but I'm fine with not trying that.

test/CodeGen/AMDGPU/uint_to_fp.ll
124–132

I think the R600 variant of the test should stay.

arsenm added inline comments.Jan 11 2016, 1:21 PM
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2233–2236

SC performs a few combines that we are missing and that affect this code; I'm working on them. For example, a shift by more than 32 bits is split into a 32-bit shift and a mov 0. It's best to implement those separately rather than trying to emit them specially here.
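
The shift split discussed in this thread can be illustrated with a small sketch. It assumes a logical right shift (the comments do not say which direction is meant), and the names `Dwords`, `split`, `join`, and `lshr64ByAtLeast32` are hypothetical. With a shift amount of 40 it reduces to shifting only the high dword by 8.

```cpp
#include <cstdint>

// A 64-bit value viewed as two 32-bit halves, analogous to a v2i32 bitcast.
struct Dwords {
  uint32_t lo;
  uint32_t hi;
};

Dwords split(uint64_t u) {
  return Dwords{static_cast<uint32_t>(u), static_cast<uint32_t>(u >> 32)};
}

uint64_t join(Dwords d) {
  return (static_cast<uint64_t>(d.hi) << 32) | d.lo;
}

// Logical right shift of a 64-bit value by n, for n in [32, 63], using one
// 32-bit shift and a constant zero (the "mov 0") for the high half.
Dwords lshr64ByAtLeast32(Dwords v, unsigned n) {
  return Dwords{v.hi >> (n - 32), 0u};
}
```

Here `join(lshr64ByAtLeast32(split(u), 40))` equals `u >> 40`, computed as `hi >> 8` with a zero high dword.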

arsenm accepted this revision.Jan 11 2016, 2:05 PM
arsenm added a reviewer: arsenm.

Committed as r257393 with the R600 test re-added.

This revision is now accepted and ready to land.Jan 11 2016, 2:05 PM
arsenm closed this revision.Jan 11 2016, 2:05 PM