This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
ClosedPublic

Authored by kzhuravl on Sep 1 2016, 3:46 AM.

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 69978.Sep 1 2016, 3:46 AM
kzhuravl retitled this revision from to [AMDGPU] Promote uniform i16 ops to i32 ops.
kzhuravl updated this object.
kzhuravl added reviewers: tstellarAMD, arsenm, wdng, Restricted Project.
arsenm edited edge metadata.Sep 1 2016, 9:53 AM

Needs a dedicated test which just runs the pass with opt for all of the operations

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
179

You can't use zext to promote any operation, you must use sext for the signed operations.

179

This also won't get selects for the min/max pattern

183

This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16

test/CodeGen/AMDGPU/mul_uint24.ll
28

I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to

kzhuravl updated this revision to Diff 71067.Sep 12 2016, 3:47 PM
kzhuravl removed a reviewer: Restricted Project.
kzhuravl marked 4 inline comments as done.

Address review feedback

kzhuravl added a subscriber: Restricted Project.Sep 12 2016, 3:48 PM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
51

CmpInst has an isSigned() memeber function that you can use instead.

62

Same thing here.

test/CodeGen/AMDGPU/ctlz.ll
247 ↗(On Diff #71067)

Did you mean to change this?

There are various integer intrinsics that can also be handled but that can be a later patch

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
83

I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive

105

You shouldn't need this

114

Ditto

134

ditto

197–199

I think all of these should be skipped if the target doesn't have i16 instructions

test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll
234–243 ↗(On Diff #71067)

I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling

241 ↗(On Diff #71067)

There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct)

267–276 ↗(On Diff #71067)

I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code

340 ↗(On Diff #71067)

There should be a test that shows exact is preserved

521–525 ↗(On Diff #71067)

There should also be tests that this happens with vectors

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

kzhuravl marked 13 inline comments as done.Sep 23 2016, 11:15 AM

There are various integer intrinsics that can also be handled but that can be a later patch

Ok

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

Yes

kzhuravl updated this revision to Diff 72311.Sep 23 2016, 11:16 AM

Address review feedback

wdng edited edge metadata.Sep 23 2016, 11:27 AM

There are various integer intrinsics that can also be handled but that can be a later patch

Ok

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

Yes

Does this patch work for umed3 instruction? Just want to check.

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
209–215

I think you can always zero extend for select, since you will be discarding the high-bits with the truncate.

test/CodeGen/AMDGPU/mul_uint24.ll
36–39

Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRS:

kzhuravl marked an inline comment as done.Sep 25 2016, 5:14 PM
In D24125#550988, @wdng wrote:

Does this patch work for umed3 instruction? Just want to check.

Yes

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
209–215

min/max tests fail when always zero extending

kzhuravl updated this revision to Diff 72435.Sep 25 2016, 5:21 PM
kzhuravl edited edge metadata.

Address review feedback

Just one small comment about the sext/zext for selects. With that change, this LGTM.

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
209–215

Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better.

kzhuravl added inline comments.Sep 28 2016, 10:23 AM
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
209–215

some min patterns do not get matched, which causes min lit test to fail

tstellarAMD requested changes to this revision.Sep 28 2016, 10:32 AM
tstellarAMD edited edge metadata.
tstellarAMD added inline comments.
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
198

I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations not when it does not support them.

This revision now requires changes to proceed.Sep 28 2016, 10:32 AM
kzhuravl updated this revision to Diff 72868.Sep 28 2016, 11:36 AM
kzhuravl edited edge metadata.

Address review feedback

kzhuravl marked an inline comment as done.Sep 28 2016, 11:37 AM
tstellarAMD accepted this revision.Sep 28 2016, 12:26 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Sep 28 2016, 12:26 PM
kzhuravl retitled this revision from [AMDGPU] Promote uniform i16 ops to i32 ops to [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions.Sep 28 2016, 1:10 PM
kzhuravl edited edge metadata.
This revision was automatically updated to reflect the committed changes.