Details
Diff Detail
Event Timeline
Needs a dedicated test which just runs the pass with opt for all of the operations
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
288 | You can't use zext to promote any operation, you must use sext for the signed operations. | |
288 | This also won't get selects for the min/max pattern | |
292 | This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16 | |
test/CodeGen/AMDGPU/mul_uint24.ll | ||
28 | I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to |
There are various integer intrinsics that can also be handled but that can be a later patch
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
122 | I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive | |
144 | You shouldn't need this | |
153 | Ditto | |
173 | ditto | |
296–298 | I think all of these should be skipped if the target doesn't have i16 instructions | |
test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll | ||
234–243 | I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling | |
241 | There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct) | |
267–276 | I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code | |
340 | There should be a test that shows exact is preserved | |
521–525 | There should also be tests that this happens with vectors |
Does the min/max pattern actually get matched for i16 after the promotions are inserted?
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
248–254 | I think you can always zero extend for select, since you will be discarding the high-bits with the truncate. | |
test/CodeGen/AMDGPU/mul_uint24.ll | ||
36–39 | Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRS: |
Yes
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
248–254 | min/max tests fail when always zero extending |
Just one small comment about the sext/zext for selects. With that change, this LGTM.
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
248–254 | Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better. |
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
248–254 | some min patterns do not get matched, which causes min lit test to fail |
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
297 | I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations not when it does not support them. |
CmpInst has an isSigned() memeber function that you can use instead.