Diff Detail
Repository: rG LLVM Github Monorepo

Event Timeline
This is a minor code quality improvement and helps reduce the conversion code length by half.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
- 2527: Why is zero undef OK? The high half could be all zeroes.
- 2565: This is redundant since it should assert in getNode.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2100–2101: You need to assign these constants to variables. The evaluation order of function arguments isn't defined, so this could result in a different instruction ordering depending on the host compiler (see the sketch after this comment block).
- 2123–2124: You can just pass {Dst} directly to the buildInstr call.
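The evaluation-order comment above points at a general C++ pitfall rather than anything GlobalISel-specific. Here is a self-contained toy sketch of it; the Builder type below is a stand-in for illustration, not LLVM's MachineIRBuilder:

```cpp
#include <iostream>
#include <vector>

// Toy stand-in for an IR builder: every build call appends an "instruction".
struct Builder {
  std::vector<int> Insts;
  int buildConstant(int V) {
    Insts.push_back(V);
    return V;
  }
  void buildSelect(int TrueVal, int FalseVal) { (void)TrueVal; (void)FalseVal; }
};

int main() {
  // The evaluation order of the two arguments is unspecified, so the two
  // constants may be emitted as {32, 0} or {0, 32} depending on the compiler.
  Builder B;
  B.buildSelect(B.buildConstant(32), B.buildConstant(0));
  for (int V : B.Insts)
    std::cout << V << ' '; // ordering here is host-compiler dependent
  std::cout << '\n';

  // Deterministic: name the results first, then pass the variables.
  Builder B2;
  int C32 = B2.buildConstant(32);
  int C0 = B2.buildConstant(0);
  B2.buildSelect(C32, C0);
  // B2.Insts is always {32, 0}.
  return 0;
}
```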
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
- 2527: For the all-0 case, the following 'select' will return 32, so that all hi bits are shifted out. In that case, only the low 32 bits need converting (see the scalar sketch after this comment block).

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2123–2124: I remember I tried that, but it still did not compile. Let me try again.
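For readers following the thread, here is a self-contained scalar sketch of the conversion idea as described in the comment above. It is only an illustration of the approach (clamped leading-zero count of the high word, shift, a single 32-bit convert with a sticky bit, then ldexp), not the patch's actual code:

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Illustrative scalar model of a u64 -> f32 convert along the lines discussed
// above (not the patch's code): normalize by the leading-zero count of the
// high word clamped to 32, convert 32 bits once, then rescale with ldexp.
static float U64ToF32Sketch(uint64_t X) {
  uint32_t Hi = uint32_t(X >> 32);
  unsigned ShAmt = Hi ? unsigned(__builtin_clz(Hi)) : 32; // all-zero hi -> 32
  uint64_t Norm = X << ShAmt;          // significant bits moved into the top word
  uint32_t Top = uint32_t(Norm >> 32); // when Hi == 0 this is just Lo(X)
  uint32_t Rest = uint32_t(Norm);
  Top |= (Rest != 0);                  // sticky bit keeps the rounding correct
  float F = float(Top);                // a single 32-bit unsigned convert
  return ldexpf(F, 32 - int(ShAmt));   // when Hi == 0 this is ldexp(F, 0)
}

int main() {
  // Sanity check against the compiler's own conversion (prints 1 on typical
  // round-to-nearest platforms).
  for (uint64_t V : {0ull, 1ull, 0xffffffffull, 0x123456789abcdefull, ~0ull})
    std::printf("%d\n", U64ToF32Sketch(V) == float(V));
  return 0;
}
```

In the all-zero high-word case, ShAmt is 32, Rest is 0, and the ldexp scales by 2^0, matching the comment above that only the low 32 bits need converting.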
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2123–2124: It still does not compile. I noticed that 'buildIntrinsic' has two prototypes, and {Dst} makes the call ambiguous for the compiler.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2123–2124: This ambiguity isn't great. Using ArrayRef<Register>{Dst} works.
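For reference, a self-contained toy that reproduces the ambiguity, assuming the two buildIntrinsic prototypes differ in taking ArrayRef<Register> vs. ArrayRef<DstOp>; the types below are simplified stand-ins for illustration, not LLVM's:

```cpp
#include <cstdio>
#include <initializer_list>

// Simplified stand-ins (not LLVM's types) that reproduce the overload issue.
struct Register { unsigned Id = 0; };
struct DstOp {
  DstOp(Register R) : Reg(R) {} // implicit conversion, as with LLVM's DstOp
  Register Reg;
};

// Minimal ArrayRef-like view with the two constructors that matter here.
template <typename T> struct Span {
  Span(const T &) {}                  // single-element constructor
  Span(std::initializer_list<T>) {}   // initializer-list constructor
};

// Two overloads, mirroring buildIntrinsic taking ArrayRef<Register> and
// ArrayRef<DstOp> result lists.
void buildIntrinsic(Span<Register>) { std::puts("ArrayRef<Register> overload"); }
void buildIntrinsic(Span<DstOp>) { std::puts("ArrayRef<DstOp> overload"); }

int main() {
  Register Dst{1};
  // buildIntrinsic({Dst});            // error: ambiguous - {Dst} can form
  //                                   // either Span via a user conversion
  buildIntrinsic(Span<Register>{Dst}); // spelling out the type disambiguates
  return 0;
}
```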
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
- 2455–2456: Can you rewrite this whole block comment to only describe what the code does now?

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2071: Move this inside the "if (Signed)" case.

llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll
- 1085–1097: Not related to your patch, but we should generate v_cvt_f32_ubyte0 here, shouldn't we?

llvm/test/CodeGen/AMDGPU/sint_to_fp.i64.ll
- 1–3: Can you autogenerate the checks for this file and uint_to_fp.i64.ll and then rebase?
llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll
- 1085–1097: Yes, but nothing is trying to reduce the bitwidth of anything right now.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2105: Can't you buildCTLZ here and let the legalizer worry about how to handle the zero case? In particular, it would be better to do v_ffbh followed by v_min_u32 with 32 instead of a compare and select (see the check after this comment block).
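As I understand it, v_ffbh_u32 returns -1 (all ones) for a zero input, which is what makes the min-with-32 form equivalent to the compare-and-select. A self-contained scalar check of that equivalence (the helpers below are stand-ins for illustration, not LLVM or ISA code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar model of v_ffbh_u32: count leading zeros, all ones for a zero input.
static uint32_t FfbhU32(uint32_t X) {
  return X ? uint32_t(__builtin_clz(X)) : ~0u;
}

// Compare-and-select form: ShAmt = (Hi == 0) ? 32 : ctlz_zero_undef(Hi).
static uint32_t ShAmtCmpSel(uint32_t Hi) {
  return Hi == 0 ? 32u : uint32_t(__builtin_clz(Hi));
}

// Suggested form: count, then clamp with an unsigned min against 32
// (v_ffbh_u32 + v_min_u32), with no compare or select.
static uint32_t ShAmtUMin(uint32_t Hi) { return std::min(FfbhU32(Hi), 32u); }

int main() {
  for (uint32_t Hi : {0u, 1u, 0x80u, 0x7fffffffu, 0xffffffffu})
    assert(ShAmtCmpSel(Hi) == ShAmtUMin(Hi));
  return 0;
}
```

This is the same clamp that the later buildUMin suggestion spells out at the MachineIRBuilder level.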
llvm/test/CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll
- 1085–1097: The constant folding on those intrinsics is not supported yet. You may notice that the unsigned byte conversion is already simplified a lot due to the generic SDNodes used; the only missing part is the LDEXP(v, 0). Once we teach the combiner to understand ffbh_i32 and ldexp, we should get the expected result. I also found other related minor issues and will prepare patches to improve them soon.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2105: I reused part of the original code and didn't check whether we support CTLZ without the undefined-zero behavior. We may try that later.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2105: Yeah, that works. I could use CTLZ directly.
LGTM with nit
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
- 2482: Subtarget is already available in the class.
Please rebase on D107429 or similar. Without that, I can't see the full ISA for either the old or the new conversion sequences.
Patch looks OK, thanks; just some minor comments inline.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
- 2102–2103: You could use buildUMin(S32, LS, MaxShAmt) here instead of compare+select.
- 2105: Nit: I think LLVM style is to put braces around the "else" part as well if there are braces around the "if" part.

llvm/test/CodeGen/AMDGPU/uint_to_fp.i64.ll
- 151–152: Why does this suddenly start using VALU instructions for uniform values?
llvm/test/CodeGen/AMDGPU/sint_to_fp.i64.ll
- 233–251: I think you might be able to shave off another instruction from this sequence with something like:

      v_alignbit v0, v1, v2, 31   ; extract bits 62..31
      v_ashrrev  v3, 31, v2       ; duplicate sign bit 32 times
      v_xor      v0, v0, v3       ; mask is 0 where bits 62..31 match sign bit
      v_ffbh_u32 v0, v0           ; count how many of the high bits from 62..31 match the sign bit
      v_min_u32  v0, 32, v0       ; clamp to 32

  Now v0 is the shift amount for the v_lshlrev_b64.
llvm/test/CodeGen/AMDGPU/sint_to_fp.i64.ll
- 233–251: See D107507 for further enhancement. I chose a different sequence with fewer instructions because v_alignbit is not available on pre-GCN targets and has no SALU equivalent. Also, the final 32-bit integer conversion is revised, inspired by D107474. Overall, this reduces uitofp by 1 instruction (or 2 with D107474) and sitofp by 2 instructions.