This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] SDWA operands should not intersect with potential MIs
ClosedPublic

Authored by SamWot on May 3 2017, 7:14 AM.

Details

Summary

There should be no intesection between SDWA operands and potential MIs. E.g.:

v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0
v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0
v_add_u32 v3, v4, v2

In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed)
To solve this problem we keep track of every SDWA operand that should be "pulled out" - operand created from MI matched by another SDWA operand.

Diff Detail

Repository
rL LLVM

Event Timeline

SamWot created this revision.May 3 2017, 7:14 AM
SamWot added subscribers: b-sumner, Restricted Project.May 3 2017, 7:16 AM
arsenm added inline comments.May 3 2017, 10:03 AM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
696 ↗(On Diff #97635)

Capitalize, punctuate

712 ↗(On Diff #97635)

Don't use shared_ptr

724 ↗(On Diff #97635)

This management of the multiple maps seems overcomplicated to me.

Also, since these are only potential cases that could still fail to fold, shouldn't this be handled while folding the operands rather than eliminating them from the potential set in case they fold into other uses?

Shouldn't this be handled as part of the iteration order? If you made SDWAOperandsVector some kind of SetVector and changed the discovery order to post order you should see the final foldable uses first

test/CodeGen/AMDGPU/sdwa-peephole.ll
420–421 ↗(On Diff #97635)

Run instnamer on the test

SamWot updated this revision to Diff 97821.May 4 2017, 6:57 AM

Resolved some issues

arsenm added inline comments.May 5 2017, 10:49 AM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
478 ↗(On Diff #97821)

shared_ptr should not be used anywhere

SamWot updated this revision to Diff 98432.May 10 2017, 5:57 AM

Got rid of shared_ptr

SamWot added inline comments.May 10 2017, 5:58 AM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
724 ↗(On Diff #97635)

Also, since these are only potential cases that could still fail to fold, shouldn't this be handled while folding the operands rather than eliminating them from the potential set in case they fold into other uses?

I moved check if instruction is convertible earlier to protect from failed folds.
Also, I'm eliminating SDWA operand, not potential instruction. This instruction would be able to fold with other SDWA operands

Shouldn't this be handled as part of the iteration order? If you made SDWAOperandsVector some kind of SetVector and changed the discovery order to post order you should see the final foldable uses first

Wouldn't this be even more complicated - to rely on some kind of order to ensure that one instruction folds before that another?
Also, I don't understand what order should this be. Problem is that there are both source and destination SDWA operands. While source operands folds into uses, destination operands folds backward into defs.

SamWot updated this revision to Diff 98627.May 11 2017, 6:16 AM

Small fix

arsenm added inline comments.May 15 2017, 1:11 AM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
724 ↗(On Diff #97635)

It's normal for passes to depend on program order for things like this. Multiple maps that need to be kept consistent I think is harder to follow and reason about

SamWot updated this revision to Diff 99130.May 16 2017, 5:54 AM

Simplified process. Got rid of PulledOut vector.

vpykhtin accepted this revision.May 18 2017, 4:40 AM

I agree if this is a bugfix.

This revision is now accepted and ready to land.May 18 2017, 4:40 AM
This revision was automatically updated to reflect the committed changes.