This is an archive of the discontinued LLVM Phabricator instance.


Differential D49229

[AggressiveInstCombine] Fold redundant masking operations of shifted value
Abandoned · Public

Authored by dnsampaio on Jul 12 2018, 5:44 AM.


Details

Reviewers

spatel
efriedma
lebedev.ri
labrinea

Summary

Allow redundant shift masks to be reduced.

Let
OR = x1 | x2, where
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

The x2 operation can be seen as
x2 = (x >> 8) & (0xAB00 >> 8)

which is equivalent to

x2 = (x & 0xAB00) >> 8

And finally reduced to
x2 = x1 >> 8
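
The identity in the summary can be checked exhaustively for a small bit width. The following is a minimal sketch, assuming 16-bit unsigned values and a logical right shift. The helper names (`maskHigh`, `shiftThenMask`, `reuseMasked`, `summaryIdentityHolds`) are hypothetical and are not part of the patch.

```cpp
#include <cstdint>

// x1 from the summary: the value masked before any shift.
constexpr std::uint16_t maskHigh(std::uint16_t x) {
  return static_cast<std::uint16_t>(x & 0xAB00u);
}

// Original form of x2: shift first, then mask with 0xAB.
constexpr std::uint16_t shiftThenMask(std::uint16_t x) {
  return static_cast<std::uint16_t>((x >> 8) & 0xABu);
}

// Reduced form of x2: reuse x1 and shift it.
constexpr std::uint16_t reuseMasked(std::uint16_t x) {
  return static_cast<std::uint16_t>(maskHigh(x) >> 8);
}

// Returns true if both forms agree for every 16-bit input.
bool summaryIdentityHolds() {
  for (std::uint32_t v = 0; v <= 0xFFFFu; ++v) {
    const auto x = static_cast<std::uint16_t>(v);
    if (shiftThenMask(x) != reuseMasked(x))
      return false;
  }
  return true;
}
```

Calling `summaryIdentityHolds()` compares the two forms of x2 over all 65536 inputs. The sketch says nothing about `ashr` or about wider types.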


Event Timeline

dnsampaio created this revision. Jul 12 2018, 5:44 AM

Herald added a subscriber: llvm-commits. Jul 12 2018, 5:44 AM

Would be good if you could also put these folds into https://rise4fun.com/Alive and link them here,
to validate that at least the cases tested here are handled correctly.

test/Transforms/InstCombine/D48278.ll
1	(On Diff #155155)	Use `./utils/update_test_checks.py`. Don't use `-O3`; specify `-instcombine`. Please clean up the test cases: run `-O3 -instnamer` on them beforehand. How about giving the file a more meaningful name?

lebedev.ri added inline comments. Jul 12 2018, 5:53 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2838	Is this even for instcombine? I wonder if this should be aggressiveinstcombine, or something else?

spatel added inline comments. Jul 12 2018, 6:51 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2838	Right - you'll notice that in my potential suggestions in D48278, I did not include instcombine. I think this has the same problem as that patch, but when you do it in instcombine, it gets multiplied by another 8x factor because we run instcombine so many times in the normal opt pass pipelines. In general, we don't walk user lists in instcombine because that's not efficient (same concern that @efriedma raised in the other patch I think). On top of that, this patch is using recursive value tracking which is also expensive. I suggest looking at (new)-gvn or early-cse to see if we can find the redundant op efficiently.

Constrained the transformation so that it happens only when the masked value has exactly 2 users (an AND and a SHIFT).
Removed value tracking operations.

Fixed execution costs. See below.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2838	If it is due to the running costs, that is fixed. If it is due to the complexity of it... I believe it is quite straightforward, no?
2838	No problem. I can remove both the value tracking and the iteration, so that it works only in cases where the masked value has exactly 2 users: an `and` and a shift operation.

dmgreen added a subscriber: dmgreen. Jul 13 2018, 1:45 AM

Moved test to correct folder

Using hasNUses() instead of getNumUses() ==
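
For context, `Value::getNumUses()` walks the entire use list, whereas `Value::hasNUses(N)` can stop as soon as it has seen more than N uses. The sketch below models that difference on a plain singly linked list. `countAll` and `hasExactly` are hypothetical stand-ins written for illustration, not LLVM code, and the `visited` counter exists only to make the cost visible.

```cpp
#include <cstddef>
#include <forward_list>

// Counts every element: the cost grows with the length of the list,
// analogous to computing the full number of uses.
template <typename T>
std::size_t countAll(const std::forward_list<T> &uses, std::size_t &visited) {
  std::size_t n = 0;
  for (auto it = uses.begin(); it != uses.end(); ++it) {
    ++n;
    ++visited;
  }
  return n;
}

// Answers "exactly n elements?" while visiting at most n + 1 of them.
template <typename T>
bool hasExactly(const std::forward_list<T> &uses, std::size_t n,
                std::size_t &visited) {
  for (auto it = uses.begin(); it != uses.end(); ++it) {
    ++visited;
    if (n == 0)
      return false; // One element too many: stop early.
    --n;
  }
  return n == 0;
}
```

On a value with many users, the early exit keeps the check cheap no matter how long the use list is.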

Please always upload all patches with the full context (-U99999)
I think this was said in the other review, but while i acknowledge the missing fold, i'm not convinced that this approach is the right one. I think this should be done in smaller, fine-grained steps. I think, first one would be to canonicalize the and-mask before shifts. https://rise4fun.com/Alive/rOb Incidentally, that would already solve these, at least at -O3: https://godbolt.org/g/CkJzQd

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1416	Newline
2838	Not the point. What @spatel said: In general, we don't walk user lists in instcombine because that's not efficient (same concern that @efriedma raised in the other patch I think). We simply don't. For this to be in instcombine, you'd need to match the `%4 = or i32 %2, %3` and look at its parents. Or maybe this should be in some other pass.
test/Transforms/InstCombine/FoldRedundantShiftedMasks.ll
27–32	Please clean up the tests a little: prefix all these numeric variables with an explicit `tmp` prefix.

In D49229#1163319, @lebedev.ri wrote:

Please always upload all patches with the full context (-U99999)

Sorry, I thought that it being an entire function it wouldn't matter. Lesson learned.

I think this was said in the other review, but while i acknowledge the missing fold, i'm not convinced that this approach is the right one. I think this should be done in smaller, fine-grained steps. I think, first one would be to canonicalize the and-mask before shifts.

The problem with this canonicalization is that it does not necessarily reduce code size. It might be profitable in some cases and problematic in others. I won't take part in the decision on whether the normalization should be changed, as I do not have a strong opinion about it.
Also, from the canonicalization thread, I understood that they wanted to canonicalize only shl. That won't help in my case, where I want the redundant `and` of ashr and lshr results to be handled.

About using the value's users in InstCombine: well, it wouldn't be the first transformation to do it. Starting the match all the way from the bottom, i.e. from the "or" operation, and matching both sides of it could be a solution, but it is much less generic. It would have to be called with every binary operation as a starting point in order to cover the same cases as this one.

So what do you think is the best?
Create a separate pass? If so, when should it be called, as to maximize the chances of the pattern being detected?

Use InstCombine and start all the way from the bottom? Should it be added to all binary operations or do I leave it just in the 'or' operation?

Or leave as it is, apart from the corrections just described?

All of your test cases are rooted at an or, so it makes sense to search up from there. Why not start with searching just from or (and xors?) and then add the search from more operators in later patches?

Detect desired pattern from the binary operation using the results.

dnsampaio updated this revision to Diff 156475. Jul 20 2018, 6:16 AM

labrinea added inline comments. Jul 23 2018, 7:10 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2024	You could add a more descriptive comment like:
// Fold redundant masking.
// Let C3 = C1 lsh C2
// (X & C1) | ((X lsh C2) & C3) -> (X & C1) | ((X & C1) lsh C2)
// Also handles the commutative cases.
2027	You could change that into `m_c_Or` to cover the commutative case too. Then you can get rid of the code duplication.
2031	You could replace the lines 2031-2035 with
BinaryOperator *Op0 = cast<BinaryOperator>(I.getOperand(0));
BinaryOperator *Op1 = cast<BinaryOperator>(I.getOperand(1));
BinaryOperator *And, *Shift;
if ((Shift = dyn_cast<BinaryOperator>(Op0->getOperand(0))))
  And = Op1;
else if ((Shift = dyn_cast<BinaryOperator>(Op0->getOperand(1))))
  And = Op1;
else if ((Shift = dyn_cast<BinaryOperator>(Op1->getOperand(0))))
  And = Op0;
else
  And = Op0;
and then remove the commutative pattern match from below (lines 2049-2070).
test/Transforms/InstCombine/FoldRedundantShiftedMasks.ll
5	The reference to Phabricator is confusing. Please remove it. It'll become irrelevant with time anyway.

labrinea added a reviewer: labrinea. Jul 23 2018, 7:11 AM

samparker added inline comments. Jul 23 2018, 7:13 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2024	Looks like the convention here is to perform more specific folding after the calls to SimplifyAssociativeOrCommutative, etc. And there is already a case for '(A & C)|(B & D)'.
2027	We already know I is an Or.

labrinea added inline comments. Jul 23 2018, 7:38 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2024	Yeap, Sam is right. Line 2079 of the original source.
2031	I made a mistake at the else clause. Should be
else {
  Shift = cast<BinaryOperator>(Op1->getOperand(1));
  And = Op0;
}

lebedev.ri added inline comments. Jul 23 2018, 7:50 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2024	Just put this into a new function and call it from here.
2026–2027	s/`X1`/`X`/
2027	> if (match(&I, m_Or(m_c_And(m_Value(X1), m_APInt(AndC)),
@labrinea is correct. This is backwards. `m_c_And` won't ever matter: the constant will be on the RHS, so it should be `m_And`. But this isn't true of `m_Or`; that one will be missing some commutative cases. It should be `m_c_Or`. (I thought I commented that already?)
2028	Same here, constant will always be RHS, no need for commutativity.
2028	s/`m_Value(X2)`/`m_Deferred(X)`/
2030	And this is no longer needed.
2049–2052	This should be
if (match(&I, m_c_Or(m_And(m_OneUse(m_Shift(m_Value(X), m_APInt(ShftAmt))),
                           m_APInt(ShAndC)),
                     m_And(m_Deferred(X), m_APInt(AndC))))) {
2053–2058	Capture them in `match()`.

Did most fixes. I just don't know how to capture non-leaf nodes of the pattern being matched. Using other match operations would actually be more complicated than just passing the operands as arguments to the new function, now that I already know they are AND operations due to the function call placement.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

2053–2058

I don't see how to capture nodes that are not the leaves of the pattern. Should I use another match? It would be easier to just check whether the first operand of the OR is the "and(X, AndC)", no?

if (match(&I, m_c_Or(m_And(m_OneUse(m_Shift(m_Value(X), m_APInt(ShftAmt))),
                           m_APInt(ShAndC)),
                     m_And(m_Deferred(X), m_APInt(AndC))))) {
  // getOperand() returns a Value *, so cast to the operator type.
  auto *And = cast<BinaryOperator>(I.getOperand(0));
  auto *ShtAnd = cast<BinaryOperator>(I.getOperand(1));
  if (And->getOperand(0) != X)
    std::swap(And, ShtAnd);
  auto *Shift = cast<BinaryOperator>(ShtAnd->getOperand(0));

lebedev.ri added inline comments. Jul 24 2018, 4:51 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2053–2058	See `m_CombineAnd()`

Moved to a separate function. Placed the function call after knowing more about the operands.
Added the ashr case, which was being wrongly treated as lshr.
Added comments, including one arguing that this function would be useless if `and` instructions were moved before any type of shift operation.
Using m_c_Or, and passing operands as arguments.
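
The ashr fix matters because an arithmetic shift copies the sign bit into the vacated high bits, so it agrees with a logical shift only when the mask discards those bits. Below is a minimal sketch on 16-bit values. The helpers `lshr16`, `ashr16` and `shiftsAgreeUnderMask` are hypothetical stand-ins for the IR instructions, not code from the patch.

```cpp
#include <cstdint>

// Logical shift right: vacated high bits are filled with zeros.
std::uint16_t lshr16(std::uint16_t x, unsigned amount) {
  return static_cast<std::uint16_t>(x >> amount);
}

// Arithmetic shift right: vacated high bits are filled with the sign bit.
// Written out by hand so the result does not depend on how the compiler
// shifts negative signed integers. Assumes amount < 16.
std::uint16_t ashr16(std::uint16_t x, unsigned amount) {
  std::uint16_t result = lshr16(x, amount);
  if (x & 0x8000u)
    result = static_cast<std::uint16_t>(result | ~(0xFFFFu >> amount));
  return result;
}

// True if masking the two kinds of shift gives the same value for x.
bool shiftsAgreeUnderMask(std::uint16_t x, unsigned amount,
                          std::uint16_t mask) {
  return (lshr16(x, amount) & mask) == (ashr16(x, amount) & mask);
}
```

With a shift amount of 8, a mask such as 0x00AB hides the sign-filled bits and both shifts agree. A mask such as 0xFFAB exposes them, and the results differ for any input with the top bit set.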

lebedev.ri added inline comments. Jul 24 2018, 5:01 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2026–2036	See `m_CombineAnd()`.
2042–2046	This will miscompile with types larger than i64. Please add tests.

lebedev.ri added inline comments. Jul 24 2018, 5:07 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2031	This is super subtle: defining more variables after the first one, which is being initialized.
2042–2046	Hmm, no, actually. disregard that. If the shift is too large, then that old shift would have been folded to `undef` already.

dnsampaio marked 2 inline comments as done. Jul 24 2018, 8:36 AM

dnsampaio added inline comments.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2026–2036	Nice one, it took me some time to understand that I could match and then capture. Thanks. :)

All required values are obtained during the pattern matching.

lebedev.ri added inline comments. Jul 24 2018, 8:55 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2022	I don't think `A` and `B` are used anymore?
2028	We create two new instructions, but we only make sure that one instruction (`Shift`, i think?) goes away. The outermost `m_And()` should too be `m_OneUse()`, i think.
2033	Early return please.

dnsampaio marked 2 inline comments as done. Jul 24 2018, 9:14 AM

dnsampaio added inline comments.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2028	Actually no. We are eliminating one of the and operations.
x1 = v & 0xFF00
x2 = v >> 8
x3 = x2 & 0xFF
or = x1 | x3
x4 = x1 + 5;
We eliminate x3 and it becomes
x1 = v & 0xFF00
x3' = x1 >> 8
or' = x3 | x1
x4 = x1 + 5;
x1 can have more than 1 user. x3 will be removed by dead-code elimination. We replace or.

Removed unused arguments.
Early exit.

dnsampaio added inline comments. Jul 24 2018, 9:19 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2028	Small correction:
x1 = v & 0xFF00
x3' = x1 >> 8
or' = x3' | x1
x4 = x1 + 5

Relaxed conditions for which the transformation is applied.
Added more tests for ashr.

dnsampaio updated this revision to Diff 157279. Jul 25 2018, 8:15 AM

dnsampaio marked 2 inline comments as done. Jul 29 2018, 7:11 AM

Replaces all uses of the innermost and with the new shift.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2028	But indeed, we must replace all uses of the innermost AND with the new shift.

lebedev.ri added inline comments. Jul 30 2018, 4:10 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058	But how do you know it's dead? I don't think you checked that it is one-use? And if it is one-use, then it will be dead after the parent instruction will be replaced, so why do we care? I guess you want to add a bit more tests.

As we are replacing all the uses of the DeadAnd, it is not required to create and replace the visiting Or operation. Replace all uses does the job already.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058	The new shift, being created, holds the same value of the DeadAnd. So we can just replace all uses of the DeadAnd with the new shifted value. Test added, to be uploaded.

dnsampaio marked an inline comment as done. Jul 30 2018, 5:03 AM

dnsampaio added inline comments.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058	Perhaps the naming is misleading. It is "dead" in the sense that the new shift holds the same value, and all its uses must be replaced. But no, it is not restricted to be a single use. Adding example test.

Added test that the inner and must be replaced by the new shift operation.
Converted the function to bool, as it does not require to create the Or operation after the replaceAll.

spatel added inline comments. Jul 30 2018, 8:11 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058	I'll say it again: this is getting weird because we're trying to make instcombine do something it should not be doing. You crippled the transform to try to make it fit, and now you're trying to expand it to handle the motivating cases. The most basic case where this transform should fire looks like this:
define void @shift_r1(i32 %x) {
  %r1 = and i32 %x, 172
  %sh = shl i32 %x, 8
  %r2 = and i32 %sh, 44032
  tail call void @use(i32 %r1)
  tail call void @use(i32 %r2)
  ret void
}
-->
define void @shift_r1(i32 %x) {
  %r1 = and i32 %x, 172
  %r2 = shl i32 %r1, 8
  tail call void @use(i32 %r1)
  tail call void @use(i32 %r2)
  ret void
}
There is no 'or' in the pattern. The optimization is about recognizing a common subexpression and using it to remove an instruction. That could be CSE, GVN, or a standalone pass. I don't see how this fits in instcombine.
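
The two constants in this example are related by 44032 = 172 << 8, which is what makes `%r2` recomputable from `%r1`. Here is a minimal sketch of that equivalence, assuming 32-bit unsigned arithmetic in which left shifts discard overflowing bits (as `shl i32` does). The function names are hypothetical and simply mirror the IR value names.

```cpp
#include <cstdint>

// The second mask is the first mask moved left by the shift amount.
static_assert((172u << 8) == 44032u, "44032 is 172 shifted left by 8");

// %r1 = and i32 %x, 172
constexpr std::uint32_t r1(std::uint32_t x) { return x & 172u; }

// Before: %sh = shl i32 %x, 8 followed by %r2 = and i32 %sh, 44032
constexpr std::uint32_t r2Before(std::uint32_t x) { return (x << 8) & 44032u; }

// After: %r2 = shl i32 %r1, 8
constexpr std::uint32_t r2After(std::uint32_t x) { return r1(x) << 8; }

// Compares both forms for every value of the low byte (the only bits the
// masks can observe) combined with a few arbitrary settings of the upper bits.
bool formsAgree() {
  const std::uint32_t upperBits[] = {0x00000000u, 0xDEADBE00u, 0xFFFFFF00u};
  for (std::uint32_t upper : upperBits)
    for (std::uint32_t low = 0; low <= 0xFFu; ++low)
      if (r2Before(upper | low) != r2After(upper | low))
        return false;
  return true;
}
```

Neither form involves an `or`, which is the point being made: the redundancy is between the two `and` instructions alone.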

dnsampaio added inline comments. Aug 2 2018, 3:38 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058	Hi Sanjay, So all expensive operations have been eliminated, I do not see why it wouldn't fit in InstCombine. We detect a pattern and we reduce it. The transformation won't fit the EarlyCSE pass, as it might be required to move the producer `and` up, such as in:
define void @shift_r1(i32 %x) {
  %sh = shl i32 %x, 8
  %r2 = and i32 %sh, 44032
  tail call void @use(i32 %r2)
  %r1 = and i32 %x, 172
  tail call void @use(i32 %r1)
  ret void
}
We must move `%r1` before `%sh`. I could very well do a stand-alone BasicBlockPass, but I don't think it would have a very compelling reason to exist.

So all expensive operations have been eliminated, I do not see why it wouldn't fit in InstCombine. We detect a pattern and we reduce it.

Because the pattern that we are matching is larger than it needs to be (as the comment in the test file clearly shows - there is no 'or' in the minimal pattern). This problem of trying to make everything fit in instcombine has been discussed several times on llvm-dev in the last ~year. Eg:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117151.html

Did you look at (new)-GVN to see if it fits in there?

If this is in instcombine (in addition to missing the pattern when there is no 'or'), I think you have to limit the transform based on uses as Roman mentioned in an earlier comment.

So this started life in the DAGCombiner, where issues around the implementation were raised, along with the point that it would be useful to have it earlier in the pipeline. But it seems there hasn't really been much thought, or discussion, about how this would fit well in the existing passes... I think DAG combine has always been the right place for this because we're trying to reuse values, something that DAGs are good for. In DAGCombiner::visitANDLike, we already handle ANDs with SRL operands and the motivating example can be addressed with very little effort:

+        uint32_t ShiftedMask = CAnd->getZExtValue() << ShiftBits;
+        SDNode *Res =
+          DAG.UpdateNodeOperands(N, N0->getOperand(0), DAG.getConstant(ShiftedMask, DL, VT));
+        if (Res != N)
+          return DAG.getNode(ISD::SRL, DL, VT, SDValue(Res, 0), N0->getOperand(1));
+        else
+          return SDValue(DAG.UpdateNodeOperands(N, N0, SDValue(CAnd, 0)), 0);

I don't know the cost of calling UpdateNodeOperands, potentially twice, but I feel the simplicity suggests that the transform is most suited to the DAG.

In D49229#1185608, @spatel wrote:

So all expensive operations have been eliminated, I do not see why it wouldn't fit in InstCombine. We detect a pattern and we reduce it.

Because the pattern that we are matching is larger than it needs to be (as the comment in the test file clearly shows - there is no 'or' in the minimal pattern). This problem of trying to make everything fit in instcombine has been discussed several times on llvm-dev in the last ~year. Eg:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117151.html

I agree that it won't handle all cases. But one would need to come up with a more generic approach in order to create a new pass that handles this: something like an abstract known-bits analysis that tells when two values hold the same bits coming from a given instruction, or some simplification by demanded bits from the same values. It is feasible, but it is not my intention to do so now.

Did you look at (new)-GVN to see if it fits in there?

I must confess that I did not quite understand the whole workflow of newGVN, but from what I did see, it mostly wouldn't fit. It seems to behave like InstCombine, expecting to replace the current instruction being visited. And it would require creating a value just to detect whether that value has a leader and then reuse it. It is not that complicated, but quite awkward IMO.

If this is in instcombine (in addition to missing the pattern when there is no 'or'), I think you have to limit the transform based on uses as Roman mentioned in an earlier comment.

Why? The @mulUses in the test shows the case where it does not require %x2 to have a single user. The shift operation dominates the and being replaced.

And I agree with @samparker: our motivating example is quite simple; it is the @foo in our tests. I just intended to make it more generic so it would act on more shift operations, not to overcomplicate it. I acknowledge that doing the transformation in the IR could be good, but it fits much more simply in the DAG. So I really see no problem in having it in both places (as in D48278). Either way, it would be nice to come to a definitive solution.

Moving this pattern matching to AggressiveInstCombine following a suggestion of @lebedev.ri . Now it searches for minimal required patterns as desired by @spatel.

Added missing test-file

dnsampaio edited the summary of this revision. Aug 13 2018, 7:30 AM

samparker removed a reviewer: samparker. Feb 7 2019, 8:38 AM

lebedev.ri requested changes to this revision. Jun 21 2019, 10:51 AM

This revision now requires changes to proceed. Jun 21 2019, 10:51 AM

Hi @lebedev.ri,
Nice that you looked at this one, as I am not quite sure what to do about it. Any suggestions?

Perhaps this makes it clearer:
https://rise4fun.com/Alive/4TLv

In D49229#1562329, @dnsampaio wrote:

Hi @lebedev.ri,
Nice that you looked at this one, as I am not quite sure what to do about it. Any suggestions?

As can be seen from the discussion here, while the problem this is trying to solve is real,
the actual solution that should be done is very much non-obvious. It kind of doesn't fit anywhere.
Or maybe there's some astounding perf numbers (SPEC?) that justify this solution?
I don't have any further ideas presently, sorry. I only marked it to remove from review queue.

dnsampaio abandoned this revision. Jun 28 2019, 8:10 AM

Revision Contents

Path	Size
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp	99 lines
lib/Transforms/InstCombine/InstCombineInternal.h	1 line
test/Transforms/InstCombine/FoldRedundantShiftedMasks.ll	88 lines

Diff 155377

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

Context not available.
  if (Instruction *X = foldShuffledBinop(I))
    return X;

+  if (Instruction *Shift = foldRedundantShiftedMasks(&I))
+    return Shift;
  // See if we can simplify any instructions used by the instruction whose sole
  // purpose is to compute bits we don't care about.
  if (SimplifyDemandedInstructionBits(I))
Context not available.

  return nullptr;
}

+// fold expressions x1 and x2 alike:
+//   x1 = ( and, x, 0x00FF )
+//   x2 = (( shl x, 8 ) and 0xFF00 )
+// into
+//   x2 = shl x1, 8 ; reuse the computation of x1
+Instruction *InstCombiner::foldRedundantShiftedMasks(BinaryOperator *AND) {
+  // 1st Check our desired pattern / structure
+  if (!AND || AND->getOpcode() != Instruction::And)
+    return nullptr;
+
+  Instruction *SHIFT = dyn_cast<BinaryOperator>(AND->getOperand(0));
+  if (!SHIFT || (SHIFT->getNumOperands() != 2) || (!SHIFT->hasOneUse()) ||
+      !SHIFT->isShift())
+    return nullptr;
+
+  unsigned N0Opcode = SHIFT->getOpcode();
+
+  ConstantInt *ShiftAmount = dyn_cast<ConstantInt>(SHIFT->getOperand(1));
+  if (!ShiftAmount)
+    return nullptr;
+
+  const APInt &ShiftValue = ShiftAmount->getValue();
+  const ConstantInt *Mask = dyn_cast<ConstantInt>(AND->getOperand(1));
+  if (!Mask)
+    return nullptr;
+
+  const APInt &MaskValue = Mask->getValue();
+  Value *MaskedValue = dyn_cast<Value>(SHIFT->getOperand(0));
+  if (!MaskedValue || (MaskedValue->getNumUses() != 2))
+    return nullptr;
+
+  BinaryOperator *OtherUser =
+      dyn_cast<BinaryOperator>(*MaskedValue->users().begin());
+
+  if (!OtherUser)
+    return nullptr;
+
+  if (OtherUser == SHIFT)
+    OtherUser =
+        dyn_cast<BinaryOperator>(*std::next(MaskedValue->users().begin()));
		lebedev.ri: Is this even for instcombine? I wonder if this should be aggressiveinstcombine, or something else?
		dnsampaio (author): If it is due to the running costs, that is fixed. If it is due to the complexity of it ... I believe it is quite straightforward, no?
		spatel: Right - you'll notice that in my potential suggestions in D48278, I did not include instcombine. I think this has the same problem as that patch, but when you do it in instcombine, it gets multiplied by another 8x factor because we run instcombine so many times in the normal opt pass pipelines. In general, we don't walk user lists in instcombine because that's not efficient (same concern that @efriedma raised in the other patch I think). On top of that, this patch is using recursive value tracking which is also expensive. I suggest looking at (new)-gvn or early-cse to see if we can find the redundant op efficiently.
		dnsampaio (author): No problem. I can remove both the value tracking and the iteration, so that it works only in cases where the masked value has exactly 2 users: an `and` and a shift operation.
		lebedev.ri: Not the point. What @spatel said:
		> In general, we don't walk user lists in instcombine because that's not efficient (same concern that @efriedma raised in the other patch I think).
		We simply don't. For this to be in instcombine, you'd need to match the `%4 = or i32 %2, %3` and look at its parents. Or maybe this should be in some other pass.

		  if (!OtherUser || OtherUser == SHIFT ||
		      OtherUser->getOpcode() != Instruction::And ||
		      OtherUser->getParent() != SHIFT->getParent())
		    return nullptr;

		  ConstantInt *OtherMask = dyn_cast<ConstantInt>(OtherUser->getOperand(1));

		  if (!OtherMask)
		    return nullptr;

		  const APInt &OtherMaskValue = OtherMask->getValue();

		  if (OtherMaskValue.getBitWidth() != MaskValue.getBitWidth())
		    return nullptr;

		  // 2nd Check if the masks and shifted masks match
		  switch (N0Opcode) {
		  case Instruction::Shl:
		    if (!(OtherMaskValue.shl(ShiftValue) == MaskValue ||
		          MaskValue.lshr(ShiftValue) == OtherMaskValue))
		      return nullptr;
		    break;
		  case Instruction::AShr:
		    if (!(OtherMaskValue.ashr(ShiftValue) == MaskValue))
		      return nullptr;
		    break;
		  case Instruction::LShr:
		    if (!(OtherMaskValue.lshr(ShiftValue) == MaskValue ||
		          MaskValue.shl(ShiftValue) == OtherMaskValue))
		      return nullptr;
		    break;
		  default:
		    return nullptr;
		  }

		  LLVM_DEBUG(
		      dbgs() << "\tValue being masked and shift-masked: ";
		      MaskedValue->dump();
		      dbgs() << "\t\tApplied mask: 0x" << OtherMaskValue.toString(16, false)
		             << " : ";
		      OtherUser->dump();
		      dbgs() << "\n\n\tShifted by: " << ShiftValue.getZExtValue() << " : ";
		      SHIFT->dump();
		      dbgs() << "\t\tAnd masked by: 0x" << MaskValue.toString(16, false)
		             << " : ";
		      AND->dump();
		      dbgs() << "\n\tCan just shift the masked value from ";
		      OtherUser->dump(););
		  // 3rd If OtherUser (the new producer) runs after this SHIFT, then we must
		  // move it higher.
		  if (!DT.dominates(OtherUser, SHIFT))
		    OtherUser->moveBefore(SHIFT);

		  return BinaryOperator::Create((Instruction::BinaryOps)(N0Opcode), OtherUser,
		                                ShiftAmount);

		  return nullptr;
		}
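The mask comparison in step 2 can be modelled on plain integers, which makes it easier to try out candidate mask pairs by hand. This is a minimal sketch, assuming 32-bit values and a shift amount below 32; `ShiftKind`, `ashr32`, and `masksCompatible` are hypothetical names introduced here, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three shift opcodes handled by the patch.
enum class ShiftKind { Shl, LShr, AShr };

// Arithmetic shift right on a 32-bit pattern; requires s < 32.
uint32_t ashr32(uint32_t v, unsigned s) {
  uint32_t r = v >> s;
  if (s != 0 && (v & 0x80000000u))
    r |= ~(0xFFFFFFFFu >> s); // replicate the sign bit into the vacated bits
  return r;
}

// Mirrors the switch in the patch: `mask` is applied after the shift,
// `otherMask` is applied directly to the unshifted value.
bool masksCompatible(ShiftKind kind, uint32_t mask, uint32_t otherMask,
                     unsigned s) {
  switch (kind) {
  case ShiftKind::Shl:
    return (otherMask << s) == mask || (mask >> s) == otherMask;
  case ShiftKind::AShr:
    return ashr32(otherMask, s) == mask;
  case ShiftKind::LShr:
    return (otherMask >> s) == mask || (mask << s) == otherMask;
  }
  return false;
}
```

With the constants used in the tests, `masksCompatible(ShiftKind::Shl, 44032, 172, 8)` holds, while pairing 44032 with an unrelated mask such as 255 does not.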
Context not available.

lib/Transforms/InstCombine/InstCombineInternal.h

Context not available.
	Instruction *visitICmpInst(ICmpInst &I);
	Instruction *FoldShiftByConstant(Value *Op0, Constant *Op1,
	                                 BinaryOperator &I);
		Instruction *foldRedundantShiftedMasks(BinaryOperator *AND);
	Instruction *commonCastTransforms(CastInst &CI);
	Instruction *commonPointerCastTransforms(CastInst &CI);
	Instruction *visitTrunc(TruncInst &CI);
Context not available.

test/Transforms/InstCombine/FoldRedundantShiftedMasks.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -instcombine %s -o - | FileCheck %s

				; https://reviews.llvm.org/D48278
				; Fold redundant masking operations of shifted value
				labrinea: The reference to Phabricator is confusing. Please remove it. It'll become irrelevant with time anyway.
				; In a case where
				; x1 = a & 0xFF
				; x2 = a << 8 & 0xFF00
				; we can see x2 as:
				; x2 = a << 8 & 0xFF << 8
				; that can be translated to
				; x2 = (a & 0xFF) << 8
				; that is
				; x2 = x1 << 8


				define i32 @shl(i16 %a) {
				; CHECK-LABEL: @shl(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = and i16 [[A:%.*]], 172
				; CHECK-NEXT: [[TMP1:%.*]] = shl nuw i16 [[TMP0]], 8
				; CHECK-NEXT: [[TMP2:%.*]] = or i16 [[TMP0]], [[TMP1]]
				; CHECK-NEXT: [[TMP3:%.*]] = zext i16 [[TMP2]] to i32
				; CHECK-NEXT: ret i32 [[TMP3]]
				;
				entry:
				%0 = sext i16 %a to i32
				%1 = shl i32 %0, 8
				%2 = and i32 %0, 172
				%3 = and i32 %1, 44032
				%4 = or i32 %2, %3
				ret i32 %4
				lebedev.ri: Please clean up the tests a little; prefix all these numeric variables with an explicit `tmp` prefix.
				}

				define i32 @lshr(i16 %a) {
				; CHECK-LABEL: @lshr(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = and i16 [[A:%.*]], -21504
				; CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[TMP0]] to i32
				; CHECK-NEXT: [[TMP2:%.*]] = lshr exact i32 [[TMP1]], 8
				; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[TMP2]], [[TMP1]]
				; CHECK-NEXT: ret i32 [[TMP3]]
				;
				entry:
				%0 = sext i16 %a to i32
				%1 = lshr i32 %0, 8
				%2 = and i32 %1, 172
				%3 = and i32 %0, 44032
				%4 = or i32 %2, %3
				ret i32 %4
				}

				define i32 @ashr(i16 %a) {
				; CHECK-LABEL: @ashr(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = and i16 [[A:%.*]], -21504
				; CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[TMP0]] to i32
				; CHECK-NEXT: [[TMP2:%.*]] = lshr exact i32 [[TMP1]], 8
				; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[TMP2]], [[TMP1]]
				; CHECK-NEXT: ret i32 [[TMP3]]
				;
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 44032
				%2 = ashr i32 %0, 8
				%3 = and i32 %2, 172
				%4 = or i32 %1, %3
				ret i32 %4
				}

				define i32 @shl_nogood(i16 %a) {
				; CHECK-LABEL: @shl_nogood(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = sext i16 [[A:%.*]] to i32
				; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[TMP0]], 172
				; CHECK-NEXT: [[TMP2:%.*]] = shl i32 [[TMP0]], [[TMP1]]
				; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[TMP2]], 44032
				; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP1]], [[TMP3]]
				; CHECK-NEXT: ret i32 [[TMP4]]
				;
				entry:
				%0 = sext i16 %a to i32
				%1 = and i32 %0, 172
				%2 = shl i32 %0, %1
				%3 = and i32 %2, 44032
				%4 = or i32 %1, %3
				ret i32 %4
				}
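Because the three positive tests take a single `i16` argument, the input IR and the CHECK lines can be compared by brute force over all 65536 inputs. The sketch below is an illustration only, not part of the patch or of the LLVM test suite: it transcribes each function by hand into C++, and every helper name is hypothetical. `@ashr` shares the same CHECK lines as `@lshr`, so both are compared against one expected form.

```cpp
#include <cassert>
#include <cstdint>

// sext i16 -> i32, kept as an unsigned bit pattern.
uint32_t sext16(uint16_t a) { return (a & 0x8000u) ? (0xFFFF0000u | a) : a; }

// Arithmetic shift right on a 32-bit pattern; requires s < 32.
uint32_t ashr32(uint32_t v, unsigned s) {
  uint32_t r = v >> s;
  if (s != 0 && (v & 0x80000000u))
    r |= ~(0xFFFFFFFFu >> s);
  return r;
}

// @shl: input IR and the form described by its CHECK lines.
uint32_t shlInput(uint16_t a) {
  uint32_t x = sext16(a);
  return (x & 172u) | ((x << 8) & 44032u);
}
uint32_t shlExpected(uint16_t a) {
  uint16_t t0 = static_cast<uint16_t>(a & 172u);
  uint16_t t1 = static_cast<uint16_t>(t0 << 8);
  return static_cast<uint16_t>(t0 | t1); // zext i16 -> i32
}

// @lshr and @ashr: input IR for each, plus the shared expected form.
uint32_t lshrInput(uint16_t a) {
  uint32_t x = sext16(a);
  return ((x >> 8) & 172u) | (x & 44032u);
}
uint32_t ashrInput(uint16_t a) {
  uint32_t x = sext16(a);
  return (x & 44032u) | (ashr32(x, 8) & 172u);
}
uint32_t shrExpected(uint16_t a) {
  uint32_t t1 = a & 0xAC00u; // and i16 %a, -21504, then zext to i32
  return (t1 >> 8) | t1;
}

// Compare every i16 input for all three tests.
bool testsAgree() {
  for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
    uint16_t a = static_cast<uint16_t>(v);
    if (shlInput(a) != shlExpected(a) || lshrInput(a) != shrExpected(a) ||
        ashrInput(a) != shrExpected(a))
      return false;
  }
  return true;
}
```

`testsAgree()` is expected to return `true`. Since mask 172 only selects bits below bit 8, the sign bits introduced by `ashr` never survive the mask, which would explain why `@ashr` and `@lshr` end up with identical CHECK lines.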

Revision Contents

Diff 155377

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

lib/Transforms/InstCombine/InstCombineInternal.h

test/Transforms/InstCombine/FoldRedundantShiftedMasks.ll
