This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
ClosedPublic

Authored by kzhuravl on Sep 1 2016, 3:46 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

kzhuravl updated this revision to Diff 69978.Sep 1 2016, 3:46 AM
kzhuravl retitled this revision from to [AMDGPU] Promote uniform i16 ops to i32 ops.
kzhuravl updated this object.
kzhuravl added reviewers: tstellarAMD, arsenm, wdng, Restricted Project.
arsenm edited edge metadata.Sep 1 2016, 9:53 AM

Needs a dedicated test which just runs the pass with opt for all of the operations

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
179 ↗(On Diff #69978)

You can't use zext to promote any operation, you must use sext for the signed operations.

179 ↗(On Diff #69978)

This also won't get selects for the min/max pattern

183 ↗(On Diff #69978)

This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16

test/CodeGen/AMDGPU/mul_uint24.ll
28 ↗(On Diff #69978)

I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to

kzhuravl updated this revision to Diff 71067.Sep 12 2016, 3:47 PM
kzhuravl removed a reviewer: Restricted Project.
kzhuravl marked 4 inline comments as done.

Address review feedback

kzhuravl added a subscriber: Restricted Project.Sep 12 2016, 3:48 PM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
51 ↗(On Diff #71067)

CmpInst has an isSigned() memeber function that you can use instead.

62 ↗(On Diff #71067)

Same thing here.

test/CodeGen/AMDGPU/ctlz.ll
247 ↗(On Diff #71067)

Did you mean to change this?

There are various integer intrinsics that can also be handled but that can be a later patch

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
122 ↗(On Diff #71067)

I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive

144 ↗(On Diff #71067)

You shouldn't need this

153 ↗(On Diff #71067)

Ditto

173 ↗(On Diff #71067)

ditto

296–298 ↗(On Diff #71067)

I think all of these should be skipped if the target doesn't have i16 instructions

test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll
234–243 ↗(On Diff #71067)

I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling

241 ↗(On Diff #71067)

There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct)

267–276 ↗(On Diff #71067)

I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code

340 ↗(On Diff #71067)

There should be a test that shows exact is preserved

521–525 ↗(On Diff #71067)

There should also be tests that this happens with vectors

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

kzhuravl marked 13 inline comments as done.Sep 23 2016, 11:15 AM

There are various integer intrinsics that can also be handled but that can be a later patch

Ok

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

Yes

kzhuravl updated this revision to Diff 72311.Sep 23 2016, 11:16 AM

Address review feedback

wdng edited edge metadata.Sep 23 2016, 11:27 AM

There are various integer intrinsics that can also be handled but that can be a later patch

Ok

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

Yes

Does this patch work for umed3 instruction? Just want to check.

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
257–263 ↗(On Diff #72311)

I think you can always zero extend for select, since you will be discarding the high-bits with the truncate.

test/CodeGen/AMDGPU/mul_uint24.ll
39–42 ↗(On Diff #72311)

Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRS:

kzhuravl marked an inline comment as done.Sep 25 2016, 5:14 PM
In D24125#550988, @wdng wrote:

Does this patch work for umed3 instruction? Just want to check.

Yes

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
257–263 ↗(On Diff #72311)

min/max tests fail when always zero extending

kzhuravl updated this revision to Diff 72435.Sep 25 2016, 5:21 PM
kzhuravl edited edge metadata.

Address review feedback

Just one small comment about the sext/zext for selects. With that change, this LGTM.

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
257–263 ↗(On Diff #72435)

Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better.

kzhuravl added inline comments.Sep 28 2016, 10:23 AM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
257–263 ↗(On Diff #72435)

some min patterns do not get matched, which causes min lit test to fail

tstellarAMD requested changes to this revision.Sep 28 2016, 10:32 AM
tstellarAMD edited edge metadata.
tstellarAMD added inline comments.
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
361 ↗(On Diff #72435)

I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations not when it does not support them.

This revision now requires changes to proceed.Sep 28 2016, 10:32 AM
kzhuravl updated this revision to Diff 72868.Sep 28 2016, 11:36 AM
kzhuravl edited edge metadata.

Address review feedback

kzhuravl marked an inline comment as done.Sep 28 2016, 11:37 AM
tstellarAMD accepted this revision.Sep 28 2016, 12:26 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Sep 28 2016, 12:26 PM
kzhuravl retitled this revision from [AMDGPU] Promote uniform i16 ops to i32 ops to [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions.Sep 28 2016, 1:10 PM
kzhuravl edited edge metadata.
This revision was automatically updated to reflect the committed changes.
llvm/trunk/test/CodeGen/AMDGPU/mul_uint24-r600.ll