Details
Diff Detail
Event Timeline
Needs a dedicated test which just runs the pass with opt for all of the operations
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
352 | You can't use zext to promote any operation, you must use sext for the signed operations. | |
352 | This also won't get selects for the min/max pattern | |
356 | This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16 | |
test/CodeGen/AMDGPU/mul_uint24.ll | ||
31 | I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to |
There are various integer intrinsics that can also be handled but that can be a later patch
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
131 | I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive | |
153 | You shouldn't need this | |
162 | Ditto | |
182 | ditto | |
360–362 | I think all of these should be skipped if the target doesn't have i16 instructions | |
test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll | ||
1–10 | I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling | |
8 | There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct) | |
34–43 | I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code | |
107 | There should be a test that shows exact is preserved | |
288–292 | There should also be tests that this happens with vectors |
Does the min/max pattern actually get matched for i16 after the promotions are inserted?
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
257–263 | I think you can always zero extend for select, since you will be discarding the high-bits with the truncate. | |
test/CodeGen/AMDGPU/mul_uint24.ll | ||
39–42 | Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRS: |
Yes
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
257–263 | min/max tests fail when always zero extending |
Just one small comment about the sext/zext for selects. With that change, this LGTM.
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
257–263 | Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better. |
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
257–263 | some min patterns do not get matched, which causes min lit test to fail |
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp | ||
---|---|---|
361 | I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations not when it does not support them. |
CmpInst has an isSigned() memeber function that you can use instead.