This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold (x << y) >> y -> x & (-1 >> y)
ClosedPublic

Authored by lebedev.ri on Jun 9 2018, 3:26 AM.
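The fold proposed in this revision, (x << y) >> y -> x & (-1 >> y), can be sanity-checked exhaustively at a small bit width. Below is a minimal Python sketch of the identity; the 8-bit width and helper names are illustrative choices of mine, not part of the patch (LLVM proves such folds with Alive, so this is only an illustration):

```python
# Exhaustive check of the fold at 8 bits: (x << y) >> y == x & (-1 >> y)
BITS = 8
MASK = (1 << BITS) - 1  # the all-ones value of the modelled type

def fold_lhs(x, y):
    # (x << y) >> y, with the shl result truncated to the type width
    return ((x << y) & MASK) >> y

def fold_rhs(x, y):
    # x & (-1 >> y); -1 lshr y leaves the low (BITS - y) bits set
    return x & (MASK >> y)

assert all(fold_lhs(x, y) == fold_rhs(x, y)
           for x in range(1 << BITS)
           for y in range(BITS))
print("fold verified for all 8-bit inputs")
```

Both sides simply discard the high y bits of x, which is why the shift pair can be replaced by a single mask.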

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Jun 9 2018, 3:26 AM
lebedev.ri updated this revision to Diff 150626.Jun 9 2018, 7:13 AM

Simplify matcher, NFC.

spatel accepted this revision.Jun 10 2018, 10:03 AM

LGTM.

lib/Transforms/InstCombine/InstCombineShifts.cpp
732–735 ↗(On Diff #150626)

As with D47981, we could consolidate this, but the constant version doesn't need the one-use check.

I think it's fine either way, but let's keep both patches consistent in their structure.

This revision is now accepted and ready to land.Jun 10 2018, 10:03 AM

LGTM.

Thank you for the review!

lib/Transforms/InstCombine/InstCombineShifts.cpp
732–735 ↗(On Diff #150626)

As with D47981, we could consolidate this, but the constant version doesn't need the one-use check.

I haven't really thought about that.
Sounds like I should add multi-use tests with constants to the tests.

spatel added inline comments.Jun 10 2018, 10:09 AM
lib/Transforms/InstCombine/InstCombineShifts.cpp
732–735 ↗(On Diff #150626)

Oops - misplaced this inline comment. It was meant for the block where ShlAmt == ShAmt.

lebedev.ri added inline comments.Jun 10 2018, 12:30 PM
lib/Transforms/InstCombine/InstCombineShifts.cpp
732–735 ↗(On Diff #150626)

As with D47981, we could consolidate this, but the constant version doesn't need the one-use check.

Hmm, this is actually broken. That is only true if either the shifts are the same constant,
or the shl is nuw; otherwise it produces one extra instruction.
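For context on the nuw case mentioned above: when the shl is nuw, no set bits are shifted out, so the lshr restores x exactly and no mask (hence no extra instruction) is needed. A small Python sketch, modelling nuw at 8 bits (the width and helper name are mine, for illustration only):

```python
# When shl is nuw (no unsigned wrap), lshr by the same amount is a no-op pair.
BITS = 8
MASK = (1 << BITS) - 1

def shl_is_nuw(x, y):
    # nuw holds when the shift loses no set bits (no unsigned overflow)
    return (x << y) <= MASK

for x in range(1 << BITS):
    for y in range(BITS):
        if shl_is_nuw(x, y):
            # with nuw, the lshr undoes the shl exactly; no mask is needed
            assert ((x << y) >> y) == x
print("nuw shl followed by lshr always restores x")
```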

lebedev.ri added inline comments.Jun 10 2018, 1:08 PM
lib/Transforms/InstCombine/InstCombineShifts.cpp
732–735 ↗(On Diff #150626)

I will add tests, but there is one test in test/Transforms/InstCombine/shift.ll that breaks
if I 'fix' this, so maybe this is intentional..

This revision was automatically updated to reflect the committed changes.
lebedev.ri added reviewers: mareko, bogner, rampitec.

Ok, that didn't go as planned :)
I did not notice the AMDGPU test change (yay tests), so the bots broke.
I have not yet evaluated the effect of that change on AMDGPU codegen; I will do so tomorrow.

lebedev.ri reopened this revision.Jun 10 2018, 2:17 PM
This revision is now accepted and ready to land.Jun 10 2018, 2:17 PM
nhaehnle added inline comments.Jun 11 2018, 2:59 AM
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
898–899 ↗(On Diff #150662)

Indeed, the pattern matching in the backend does not recognize this and ends up generating worse code.

nhaehnle requested changes to this revision.Jun 11 2018, 2:59 AM
This revision now requires changes to proceed.Jun 11 2018, 2:59 AM
lebedev.ri added a comment.EditedJun 11 2018, 3:02 AM
This comment has been deleted.
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
898–899 ↗(On Diff #150662)

Yep, see D48005 for the backend tests.

@nhaehnle AMDGPU patches (D48007, D48010, D48012) are ready to land, please accept this differential :)

In general, two shifts are better for AMDGPU: any shift immediate can be folded directly into the shift instruction, while the rather large mask produced by this change requires either an extra 4 bytes in the encoding or, even worse, a move and an extra register.

What's the rationale for the folding?

In addition, as the tests suggest, we would expect the pattern to be folded into a bfe instruction, but D48005 shows it becomes at best a "bfm" (with an extra register to hold the mask) plus an "and". That is basically a regression for our target. There would probably be no concern if the sequence were converted to a bfe as expected.

In general, two shifts are better for AMDGPU: any shift immediate can be folded directly into the shift instruction, while the rather large mask produced by this change requires either an extra 4 bytes in the encoding or, even worse, a move and an extra register.

What's the rationale for the folding?

In addition, as the tests suggest, we would expect the pattern to be folded into a bfe instruction, but D48005 shows it becomes at best a "bfm" (with an extra register to hold the mask) plus an "and". That is basically a regression for our target. There would probably be no concern if the sequence were converted to a bfe as expected.

You did note that I have posted D48007, D48010, D48012 to fix the AMDGPU backend, right?
Any other concerns on top of that?

rampitec accepted this revision.Jun 11 2018, 10:04 AM

Ah! I see the reviews to fold it into a bfe. I have no concerns then, LGTM.

Ah! I see the reviews to fold it into a bfe. I have no concerns then, LGTM.

Thank you for the reassurance!
Though I'm still waiting for @nhaehnle to re-accept this.

nhaehnle accepted this revision.Jun 15 2018, 2:42 AM
This revision is now accepted and ready to land.Jun 15 2018, 2:42 AM

Thanks!
Do note that there do not seem to be any folds for the case when the offset is not 0.
E.g. (x >> start) & (-1 >> (width - y)) might be foldable to (UBFE $x, $start, $width).
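The suggested ubfe-style fold can be illustrated with a small sketch. Reading the -1 >> ... mask as the all-ones value shifted down so that exactly width low bits survive, the pattern reduces to an unsigned bitfield extract. The helper names (ubfe, masked_shift) and the 32-bit width are illustrative assumptions of mine, not LLVM or AMDGPU API:

```python
# Sketch: (x >> start) & (-1 >> (BITS - width)) behaves as a ubfe.
BITS = 32
ALL_ONES = (1 << BITS) - 1

def ubfe(x, start, width):
    # unsigned bitfield extract: 'width' bits of x starting at bit 'start'
    return (x >> start) & ((1 << width) - 1)

def masked_shift(x, start, width):
    # the pattern from the comment, with the mask spelled as -1 lshr'd
    return (x >> start) & (ALL_ONES >> (BITS - width))

for x in (0x12345678, 0xDEADBEEF, 0):
    for start in range(0, BITS, 5):
        for width in range(1, BITS - start + 1, 7):
            assert masked_shift(x, start, width) == ubfe(x, start, width)
print("pattern matches ubfe for the sampled cases")
```

The equality holds because ALL_ONES >> (BITS - width) is exactly (1 << width) - 1, i.e. a mask of the low width bits.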

This revision was automatically updated to reflect the committed changes.
lebedev.ri added inline comments.Jun 17 2018, 11:07 AM
llvm/trunk/lib/Transforms/InstCombine/InstCombineShifts.cpp
741–744

BTW, it's interesting to note that none of these masks are fine-grained.
Alive says https://rise4fun.com/Alive/Yes (lol)

Though in practice, from what I have seen in the tests, the mask somehow seems to be adjusted later.