Details
Diff Detail
Event Timeline
Needs a dedicated test which just runs the pass with opt for all of the operations
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 179 | You can't use zext to promote any operation, you must use sext for the signed operations. | |
| 179 | This also won't get selects for the min/max pattern | |
| 183 | This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16 | |
| test/CodeGen/AMDGPU/mul_uint24.ll | ||
| 28 | I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to | |
There are various integer intrinsics that can also be handled but that can be a later patch
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 83 | I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive | |
| 105 | You shouldn't need this | |
| 114 | Ditto | |
| 134 | ditto | |
| 197–199 | I think all of these should be skipped if the target doesn't have i16 instructions | |
| test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll | ||
| 234–243 ↗ | (On Diff #71067) | I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling |
| 241 ↗ | (On Diff #71067) | There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct) |
| 267–276 ↗ | (On Diff #71067) | I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code |
| 340 ↗ | (On Diff #71067) | There should be a test that shows exact is preserved |
| 521–525 ↗ | (On Diff #71067) | There should also be tests that this happens with vectors |
Does the min/max pattern actually get matched for i16 after the promotions are inserted?
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 209–215 | I think you can always zero extend for select, since you will be discarding the high-bits with the truncate. | |
| test/CodeGen/AMDGPU/mul_uint24.ll | ||
| 36–39 | Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRS: | |
Yes
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 209–215 | min/max tests fail when always zero extending | |
Just one small comment about the sext/zext for selects. With that change, this LGTM.
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 209–215 | Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better. | |
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 209–215 | some min patterns do not get matched, which causes min lit test to fail | |
| lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
|---|---|---|
| 198 | I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations not when it does not support them. | |
CmpInst has an isSigned() memeber function that you can use instead.