Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Needs a dedicated test which just runs the pass with opt for all of the operations
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
179 ↗ | (On Diff #69978) | You can't use zext to promote any operation, you must use sext for the signed operations. |
179 ↗ | (On Diff #69978) | This also won't get selects for the min/max pattern |
183 ↗ | (On Diff #69978) | This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16 |
test/CodeGen/AMDGPU/mul_uint24.ll | ||
28 ↗ | (On Diff #69978) | I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to |
There are various integer intrinsics that can also be handled but that can be a later patch
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
122 ↗ | (On Diff #71067) | I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive |
144 ↗ | (On Diff #71067) | You shouldn't need this |
153 ↗ | (On Diff #71067) | Ditto |
173 ↗ | (On Diff #71067) | ditto |
296–298 ↗ | (On Diff #71067) | I think all of these should be skipped if the target doesn't have i16 instructions |
test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll | ||
234–243 ↗ | (On Diff #71067) | I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling |
241 ↗ | (On Diff #71067) | There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct) |
267–276 ↗ | (On Diff #71067) | I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code |
340 ↗ | (On Diff #71067) | There should be a test that shows exact is preserved |
521–525 ↗ | (On Diff #71067) | There should also be tests that this happens with vectors |
Does the min/max pattern actually get matched for i16 after the promotions are inserted?
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
257–263 ↗ | (On Diff #72311) | I think you can always zero extend for select, since you will be discarding the high-bits with the truncate. |
test/CodeGen/AMDGPU/mul_uint24.ll | ||
39–42 ↗ | (On Diff #72311) | Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRS: |
Yes
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
257–263 ↗ | (On Diff #72311) | min/max tests fail when always zero extending |
Just one small comment about the sext/zext for selects. With that change, this LGTM.
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
257–263 ↗ | (On Diff #72435) | Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better. |
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
257–263 ↗ | (On Diff #72435) | some min patterns do not get matched, which causes min lit test to fail |
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
361 ↗ | (On Diff #72435) | I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations not when it does not support them. |