Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Needs a dedicated test which just runs the pass with opt for all of the operations
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 179 ↗ | (On Diff #69978) | You can't use zext to promote any operation, you must use sext for the signed operations. |
| 179 ↗ | (On Diff #69978) | This also won't get selects for the min/max pattern |
| 183 ↗ | (On Diff #69978) | This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16 |
| test/CodeGen/AMDGPU/mul_uint24.ll | ||
| 28 ↗ | (On Diff #69978) | I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to |
There are various integer intrinsics that can also be handled but that can be a later patch
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 122 ↗ | (On Diff #71067) | I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive |
| 144 ↗ | (On Diff #71067) | You shouldn't need this |
| 153 ↗ | (On Diff #71067) | Ditto |
| 173 ↗ | (On Diff #71067) | ditto |
| 296–298 ↗ | (On Diff #71067) | I think all of these should be skipped if the target doesn't have i16 instructions |
| test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll | ||
| 234–243 ↗ | (On Diff #71067) | I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling |
| 241 ↗ | (On Diff #71067) | There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct) |
| 267–276 ↗ | (On Diff #71067) | I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code |
| 340 ↗ | (On Diff #71067) | There should be a test that shows exact is preserved |
| 521–525 ↗ | (On Diff #71067) | There should also be tests that this happens with vectors |
Does the min/max pattern actually get matched for i16 after the promotions are inserted?
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 257–263 ↗ | (On Diff #72311) | I think you can always zero extend for select, since you will be discarding the high-bits with the truncate. |
| test/CodeGen/AMDGPU/mul_uint24.ll | ||
| 39–42 ↗ | (On Diff #72311) | Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRS: |
Yes
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 257–263 ↗ | (On Diff #72311) | min/max tests fail when always zero extending |
Just one small comment about the sext/zext for selects. With that change, this LGTM.
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 257–263 ↗ | (On Diff #72435) | Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better. |
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 257–263 ↗ | (On Diff #72435) | some min patterns do not get matched, which causes min lit test to fail |
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 361 ↗ | (On Diff #72435) | I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations not when it does not support them. |