This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Transform `(binop1 (binop2 (lshift X,Amt),Mask),(lshift Y,Amt))`
ClosedPublic

Authored by goldstein.w.n on Jun 9 2023, 11:02 AM.

Details

Summary

If Mask and Amt are not constants and binop1 and binop2 are
the same we can transform to:
(binop (lshift (binop X, Y), Amt), Mask)

If binop is add, lshift must be shl.

If Mask and Amt are constants C and C1 respectively.
We can transform to:
(lshift1 (binop1 (binop2 X, (inv_lshift1 C, C1), Y)), C1)

Saving an instruction IFF:
lshift1 is same opcode as lshift2
Either bitwise1 and/or bitwise2 is and.

Proofs(1/2): https://alive2.llvm.org/ce/z/BjN-m_
Proofs(2/2): https://alive2.llvm.org/ce/z/bZn5QB

This is to help fix the regression caused in D151807

Diff Detail

Event Timeline

goldstein.w.n created this revision.Jun 9 2023, 11:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2023, 11:02 AM
goldstein.w.n requested review of this revision.Jun 9 2023, 11:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2023, 11:02 AM
nikic added inline comments.Jun 9 2023, 1:12 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2096

This is probably not safe, it looks like m_LogicalShift also supports constant expressions.

2108

So I guess this works, but it's mostly by accident and doesn't capture the core requirement. We're moving a bitwise op before the shift, which for and/or requires that the shifted out bits are zero (this also extends to add/sub): https://alive2.llvm.org/ce/z/ieAz2T This requirement doesn't exist for and, which is why this check happens to work.

I would prefer checking that the constant round-trips the shift rather than checking opcodes here. It's okay to enforce this for and as well, as demanded bits simplification will zero out the low bits.

Dramatically increase cases covered

goldstein.w.n retitled this revision from [InstCombine] Transform `(bitwise1 (bitwise2 (lshift1 X, C), C1), (lshift2 Y, C))` to [InstCombine] Transform `(binop1 (binop2 (lshift X,Amt),Mask),(lshift Y,Amt))`.Jun 10 2023, 1:08 AM
goldstein.w.n edited the summary of this revision. (Show Details)
goldstein.w.n marked an inline comment as done.Jun 10 2023, 1:11 AM
goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2108

Okay, updated for all the cases I could think of.

I didn't extend to sub just because 1) it has some special cases and 2) sub X, Const always canonicalizes to add X, -Const so covering add is enough.

Current workings are:
if both ops are same -> do generic transform and just re-associate the shift. (if add, then only if shl)
if both are bitwise and 1 is and -> just do transform
if both are bitwise no and -> check mask -> do transform
if outer binop is and -> just do transform
else -> fail

nikic added a comment.Jun 11 2023, 9:30 AM

It looks like there are some more tests that need to be updated.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
771

Here and elsewhere: Keep variable scope minimal.

775

ConstantExpr

785

it's

786

irrelevant

795

reassociate

Though this is really more "distribute" than reassociate.

812

Does this case trigger for anything other than non-splat vectors? If not, please remove it. (Generally, don't add non-splat vector support it it requires any additional complexity.)

820

Isn't this also fine for add/shl, as in the case above? https://alive2.llvm.org/ce/z/ztthn5

goldstein.w.n marked 6 inline comments as done.Jun 11 2023, 12:42 PM

It looks like there are some more tests that need to be updated.

Fixed.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
812

Why is that? I'd argue this is very minimal complexity and think it is an issue that a significant number of our combines only work for splats. Generally we mark combines that fail because of lacking splat support not as "*_fail", but "*_todo" which seems to imply we *want* to handle them. Leaving out these 4-lines of code (that also provide an early exit before more expensive checks) seems equivelent to choosing to leave a simple addressable TODO.

goldstein.w.n edited the summary of this revision. (Show Details)

Fix nits and handle add better

nikic added inline comments.Jun 11 2023, 2:29 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
808

Handling add+shl should be safe here as well: https://alive2.llvm.org/ce/z/dWf_6D

I think you can extract a helper to check Instruction::isBitwiseLogicOp(BinOpc) || ShOpc == Instruction::Shl and use that everywhere that isBitWiseLogicOp() is used.

(I wanted to suggest just excluding the add+lshr combination upfront so we don't have to worry about it everywhere else, but it is valid for the and case above. If you're okay with losing that, we can just check this in IsValidBinOpc and not worry about it after that anymore.)

812

It's not a lot of code, but it does increase the verification space of the transform non-trivially, for a transform that already has some combinatorial explosion going on.

Anyway, I'm okay with keeping this, but I would suggest thinking about whether non-trivial non-splat vector handling is really a good use of implementation, review and long-term maintenance time, for the optimization benefit it provides.

goldstein.w.n added inline comments.Jun 11 2023, 11:27 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
808

Handling add+shl should be safe here as well: https://alive2.llvm.org/ce/z/dWf_6D

I think you can extract a helper to check Instruction::isBitwiseLogicOp(BinOpc) || ShOpc == Instruction::Shl and use that everywhere that isBitWiseLogicOp() is used.

Good catch (when I was playing around with combinations, I was using lshr so missed most of the add + shl cases).

(I wanted to suggest just excluding the add+lshr combination upfront so we don't have to worry about it everywhere else, but it is valid for the and case above. If you're okay with losing that, we can just check this in IsValidBinOpc and not worry about it after that anymore.)

I added IsCompletelyDistributable and early out if its not met after this check so the following two checks can ignore it.

812

It's not a lot of code, but it does increase the verification space of the transform non-trivially, for a transform that already has some combinatorial explosion going on.

Anyway, I'm okay with keeping this, but I would suggest thinking about whether non-trivial non-splat vector handling is really a good use of implementation, review and long-term maintenance time, for the optimization benefit it provides.

I guess this (add + lshr) highlight a difference in philosophy between us (that has come up on other reviews). My general feeling is the more that can be covered, the better. Its a lot more painful to have to rewrite code because the compiler isn't getting your case and updating the compiler doesn't reap rewards for most until at least a year or so down the line. More cases does imply code complexity, but generally think a clean 10k LOC is clearer than 1k messy LOC i.e, as long as coding standards are kept high and the niche cases don't degrade the quality of the file, they are worth it.
Maybe I'm trivializing some aspect of it however.

Add helper for detecting easily distributable ops + ruleout non-distributable ops earlier

nikic added a comment.Jun 12 2023, 8:30 AM

It looks like binop-and-shifts.ll reverted to the baseline checks.

llvm/lib/Transforms/InstCombine/InstCombineInternal.h
453–467

I don't think listing the preconditions for the transform is particularly useful in the API docs, especially if they are this convoluted.

Alternatively, maybe move the comment into the source file.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
811–818
819–833

Or alternative extract all the checks into a function returning bool, but unnecessary // Pass please.

nikic added inline comments.Jun 12 2023, 9:35 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
812

My general philosophy here is that transforms should be "principled", and not based around long lists of special cases. For pragmatic reasons, I am willing to accept unprincipled transforms for cases that have been encountered in real code, but I prefer not to have unprincipled generalizations of transforms "just because we can".

For example, if you want to implement a fold for eq, you should always also handle ne, otherwise you're just handling one special case of a more general fold. If that same fold also works for unsigned predicates, it would be strongly encouraged to handle that as well. But if it's possible to generalize that fold to signed predicates, but this requires introducing special cases for known sign bit, known power of two, known bits and signed min, then ... that's not a principled extension, it's just a bag of random special cases. Adding those special cases may be fine if there is reason to believe that they will actually be useful. But we should not add those special cases just because it leaves the transform "incomplete" in a rather pedantic sense. At least that's my 2c.

Of course, the point where we go from a principled extension to a bag of special cases is subjective. The particular case that started this discussion isn't really problematic (it's really less of a special-case extension and more a precisely expressed pre-condition), and deserves less attention than this discussion ended up giving it ;)

nikic added inline comments.Jun 12 2023, 10:31 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
811–818

Ignore this bit, it's not going to work because it's followed by else if ... in that case, see the note below about just extracting these checks into a bool-returning function, so that you can use early returns.

goldstein.w.n marked 2 inline comments as done.Jun 12 2023, 11:07 AM

It looks like binop-and-shifts.ll reverted to the baseline checks.

Fixed.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
812

My general philosophy here is that transforms should be "principled", and not based around long lists of special cases. For pragmatic reasons, I am willing to accept unprincipled transforms for cases that have been encountered in real code, but I prefer not to have unprincipled generalizations of transforms "just because we can".

For example, if you want to implement a fold for eq, you should always also handle ne, otherwise you're just handling one special case of a more general fold. If that same fold also works for unsigned predicates, it would be strongly encouraged to handle that as well. But if it's possible to generalize that fold to signed predicates, but this requires introducing special cases for known sign bit, known power of two, known bits and signed min, then ... that's not a principled extension, it's just a bag of random special cases. Adding those special cases may be fine if there is reason to believe that they will actually be useful. But we should not add those special cases just because it leaves the transform "incomplete" in a rather pedantic sense. At least that's my 2c.

Ah, so I guess I often end up on the pedantic end of wanting "complete" transformations.

Of course, the point where we go from a principled extension to a bag of special cases is subjective. The particular case that started this discussion isn't really problematic (it's really less of a special-case extension and more a precisely expressed pre-condition), and deserves less attention than this discussion ended up giving it ;)

Sorry :(
but thank you for sharing your 2c :)

On the note of potentially more useful patches, any chance you can let me know if:
D149423
and
D152226

are on your radar / todolist (wherever they may be ordered on that list).
Just want to know if I should be looking for new reviewers for them or
if you intend to review when you have the time (again, whenever that
may be).

Move checks for when transform is valid to helper.

nikic accepted this revision.Jun 12 2023, 12:07 PM

LGTM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
772

and -> any?

llvm/test/Transforms/InstCombine/or-shifted-masks.ll
133

This highlights another possible extension: We can have an extra binop on both sides, not just on one.

This revision is now accepted and ready to land.Jun 12 2023, 12:07 PM

Move comments

nikic added inline comments.Jun 13 2023, 6:42 AM
llvm/test/Transforms/InstCombine/or-shifted-masks.ll
133

Tested this patch together with D151807, and it looks like handling this case would be needed to get the patterns in one instcombine run instead of instcombine,early-cse,instcombine.

goldstein.w.n added inline comments.Jun 13 2023, 10:32 AM
llvm/test/Transforms/InstCombine/or-shifted-masks.ll
133

kk, working on a follow up patch.

goldstein.w.n added inline comments.Jun 13 2023, 3:30 PM
llvm/test/Transforms/InstCombine/or-shifted-masks.ll
133

I'm going to push this one first as is and split multi-binop version to new patch. Handling generic chains necessitates an entire rewrite + alot more complexity so may end up not being worth it if its more modularly done across multiple passes.