The old lowering of uint_to_fp failed OpenCL conformance.
It might be acceptable in fast-math mode, but I'm not sure.
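As background, here is one integer-only way to produce a correctly rounded (round-to-nearest-even) unsigned 64-bit to f32 conversion. It assumes the i64 → f32 case that the inline comments discuss. It is a minimal sketch, not the lowering from this patch, and `clz64` and `u64_to_f32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Count leading zeros of a nonzero 64-bit value (portable loop). */
static int clz64(uint64_t x) {
    assert(x != 0);
    int n = 0;
    while (!(x & (UINT64_C(1) << 63))) { x <<= 1; n++; }
    return n;
}

/* Sketch: u64 -> f32 using only integer ops, rounding to nearest even. */
static float u64_to_f32(uint64_t u) {
    if (u == 0) return 0.0f;
    int lz = clz64(u);
    uint64_t norm = u << lz;                       /* leading 1 now at bit 63 */
    uint32_t biased_exp = (uint32_t)(127 + 63 - lz);
    uint32_t mant = (uint32_t)(norm >> 40) & 0x7FFFFF;  /* bits 62..40 */
    uint64_t rest = norm & ((UINT64_C(1) << 40) - 1);   /* discarded bits 39..0 */
    uint64_t half = UINT64_C(1) << 39;
    uint32_t bits = (biased_exp << 23) | mant;
    /* Round up above the halfway point, or at it when the mantissa is odd.
       A carry out of the mantissa correctly bumps the exponent. */
    if (rest > half || (rest == half && (bits & 1))) bits += 1;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, 2^24 + 1 lies exactly halfway between two representable floats and rounds to the even one, 2^24. A lowering that drops or mishandles the discarded low bits can be off by one ulp on inputs like this.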
Reviewers: tstellarAMD, arsenm
Event Timeline
Other than the removed test, LGTM.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp, lines 2233–2236:
I was thinking a bit about this because of all the i64 operations, but it quickly gets messy, and it's not clear to me that there is a much better way. I wonder whether bitcasting u to v2i32 and shifting only the high dword by 8 would result in better code, but I'm fine with not trying that.
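The v2i32 idea can be sketched in C. Once the value is normalized so its leading one sits at bit 63, the 23 mantissa bits, the round bit and the sticky bit can all be read from the two 32-bit halves, with only the high dword shifted (by 8 for the mantissa). This illustrates the bit arithmetic under that assumption. It makes no claim about which form produces better AMDGPU code, and `round_fields`, `fields_i64` and `fields_v2i32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Fields needed to round a normalized u64 (leading 1 at bit 63) to f32. */
struct round_fields { uint32_t mant, round, sticky; };

/* Reference version: read the fields from the full 64-bit value. */
static struct round_fields fields_i64(uint64_t norm) {
    assert(norm >> 63);
    struct round_fields r;
    r.mant = (uint32_t)(norm >> 40) & 0x7FFFFF;           /* bits 62..40 */
    r.round = (uint32_t)(norm >> 39) & 1;                  /* bit 39 */
    r.sticky = (norm & ((UINT64_C(1) << 39) - 1)) != 0;    /* bits 38..0 */
    return r;
}

/* Same fields from the two 32-bit halves, as if u were bitcast to v2i32. */
static struct round_fields fields_v2i32(uint64_t norm) {
    assert(norm >> 63);
    uint32_t lo = (uint32_t)norm;
    uint32_t hi = (uint32_t)(norm >> 32);
    struct round_fields r;
    r.mant = (hi >> 8) & 0x7FFFFF;          /* only the high dword is shifted */
    r.round = (hi >> 7) & 1;                /* bit 39 is bit 7 of hi */
    r.sticky = ((hi & 0x7F) | lo) != 0;     /* bits 38..32 of hi, plus all of lo */
    return r;
}
```

Either way, the round-to-nearest-even decision is `round && (sticky || (mant & 1))`.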
test/CodeGen/AMDGPU/uint_to_fp.ll, lines 124–132:
I think the R600 variant of the test should stay.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp, lines 2233–2236:
I'm working on a few missing combines that affect this and that SC does. For example, the shift by more than 32 bits is split into a 32-bit shift and a mov 0. It's best to implement those combines separately rather than trying to emit those sequences specially here.
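The split mentioned in this reply can be sketched as follows. It assumes a logical right shift of an i64 by a constant amount k in [32, 63], with the value held as two 32-bit halves. `u64_pair` and `lshr_ge32` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves. */
struct u64_pair { uint32_t lo, hi; };

/* Logical right shift by a constant k with 32 <= k <= 63: the new low half
   is one 32-bit shift of the old high half, and the new high half is just
   the constant 0 (the "mov 0"). */
static struct u64_pair lshr_ge32(struct u64_pair v, unsigned k) {
    assert(k >= 32 && k <= 63);
    struct u64_pair r;
    r.lo = v.hi >> (k - 32);
    r.hi = 0;
    return r;
}
```

With k = 40, this reduces to shifting only the high dword by 8, the variant suggested in the first inline comment.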