This is an archive of the discontinued LLVM Phabricator instance.

Differential D49229

[AggressiveInstCombine] Fold redundant masking operations of shifted value
Abandoned Public

Authored by dnsampaio on Jul 12 2018, 5:44 AM.

Details

Reviewers

spatel
efriedma
lebedev.ri
labrinea

Summary

Allow redundant shift masks to be reduced.

Let
OR = x1 | x2, where
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB

The x2 operation can be seen as
x2 = (x >> 8) & (0xAB00 >> 8)
=>
x2 = (x & 0xAB00) >> 8

And finally reduced to
x2 = x1 >> 8
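
The equivalence claimed in the summary can be checked outside of LLVM. The following is a minimal C++ sketch, not part of the patch: the helper names (`x1Of`, `x2Original`, `x2Folded`, `assertFoldHolds`) are made up for illustration, and `x` is assumed to be a 32-bit unsigned value so that `>>` is a logical shift.

```cpp
#include <cassert>
#include <cstdint>

// x1 = x & 0xAB00, the mask that already exists in the code.
uint32_t x1Of(uint32_t x) { return x & 0xAB00u; }

// x2 as originally written: shift first, then mask.
uint32_t x2Original(uint32_t x) { return (x >> 8) & 0xABu; }

// x2 after the fold: reuse x1 and only shift it.
uint32_t x2Folded(uint32_t x) { return x1Of(x) >> 8; }

// Aborts (when assertions are enabled) if the two forms ever disagree.
// Only bits 8..15 of x can reach the result, so all 16-bit inputs suffice.
void assertFoldHolds() {
  for (uint32_t x = 0; x <= 0xFFFFu; ++x)
    assert(x2Original(x) == x2Folded(x));
}
```

Calling `assertFoldHolds()` from a test or from `main` exercises every input that can influence the result; it says nothing about `ashr`, which needs the extra care discussed later in the review.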

Diff Detail

Event Timeline

dnsampaio created this revision.Jul 12 2018, 5:44 AM

Herald added a subscriber: llvm-commits.Jul 12 2018, 5:44 AM

Would be good if you could also put these folds into https://rise4fun.com/Alive and link them here,
to validate that at least the cases tested here are handled correctly.

test/Transforms/InstCombine/D48278.ll
1 ↗	(On Diff #155155)	Use `./utils/update_test_checks.py`. Don't use `-O3`; specify `-instcombine`. Please clean up the test cases: run `-O3 -instnamer` on them beforehand. How about giving the file a more meaningful name?

lebedev.ri added inline comments.Jul 12 2018, 5:53 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2838 ↗	(On Diff #155155)	Is this even for instcombine? I wonder if this should be aggressiveinstcombine, or something else?

spatel added inline comments.Jul 12 2018, 6:51 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2838 ↗	(On Diff #155155)	Right - you'll notice that in my potential suggestions in D48278, I did not include instcombine. I think this has the same problem as that patch, but when you do it in instcombine, it gets multiplied by another 8x factor because we run instcombine so many times in the normal opt pass pipelines. In general, we don't walk user lists in instcombine because that's not efficient (same concern that @efriedma raised in the other patch I think). On top of that, this patch is using recursive value tracking which is also expensive. I suggest looking at (new)-gvn or early-cse to see if we can find the redundant op efficiently.

Constrain to allow the transformation to happen only when the masked value has only 2 users (an AND and a SHIFT).
Removed value tracking operations.

Fixed execution costs. See below.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2838 ↗	(On Diff #155155)	If it is due to the running costs, that is fixed. If it is due to the complexity of it... I believe it is quite straightforward, no?
2838 ↗	(On Diff #155155)	No problem. I can remove both the value tracking and the iteration, so that it works only in cases where the masked value has exactly 2 users, an and and a shift operation.

dmgreen added a subscriber: dmgreen.Jul 13 2018, 1:45 AM

Moved test to correct folder

Using hasNUses() instead of numUses() ==

Please always upload all patches with the full context (-U99999)
I think this was said in the other review, but while i acknowledge the missing fold, i'm not convinced that this approach is the right one. I think this should be done in smaller, fine-grained steps. I think, first one would be to canonicalize the and-mask before shifts. https://rise4fun.com/Alive/rOb Incidentally, that would already solve these, at least at -O3: https://godbolt.org/g/CkJzQd

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1416 ↗	(On Diff #155642)	Newline
2838 ↗	(On Diff #155155)	Not the point. What @spatel said: In general, we don't walk user lists in instcombine because that's not efficient (same concern that @efriedma raised in the other patch I think). We simply don't. For this to be in instcombine, you'd need to match the `%4 = or i32 %2, %3` and look at its parents. Or maybe this should be in some other pass.
test/Transforms/InstCombine/FoldRedundantShiftedMasks.ll
26–31 ↗	(On Diff #155642)	Please cleanup the tests a little, prefix all these numeric variables with explicit `tmp` prefix.

In D49229#1163319, @lebedev.ri wrote:

Please always upload all patches with the full context (-U99999)

Sorry, I thought that it being an entire function it wouldn't matter. Lesson learned.

I think this was said in the other review, but while i acknowledge the missing fold, i'm not convinced that this approach is the right one. I think this should be done in smaller, fine-grained steps. I think, first one would be to canonicalize the and-mask before shifts.

The problem with this canonicalization is that it does not necessarily reduce code size. It might be profitable in certain cases and problematic in others. I won't take part in the decision on whether the normalization should be changed, as I do not have a strong opinion about it.
Also, from the canonicalization thread, I understood that they wanted to canonicalize only shl. That won't help in my case, where I want the redundant and of ashr and lshr results to be handled.

About using the value's uses in InstCombine: well, it wouldn't be the first transformation to do it. The idea of starting the match all the way from the bottom, e.g. from the "or" operation, and matching both sides of it could be a solution, but it is much less generic. It would need to be called from every binary operation as a starting point in order to cover the same cases as this one.

So what do you think is the best?
Create a separate pass? If so, when should it be called, as to maximize the chances of the pattern being detected?

Use InstCombine and start all the way from the bottom? Should it be added to all binary operations or do I leave it just in the 'or' operation?

Or leave as it is, apart from the corrections just described?

All of your test cases are rooted at an or, so it makes sense to search up from there. Why not start with searching just from or (and xors?) and then add the search from more operators in later patches?

Detect desired pattern from the binary operation using the results.

dnsampaio updated this revision to Diff 156475.Jul 20 2018, 6:16 AM

labrinea added inline comments.Jul 23 2018, 7:10 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2024 ↗	(On Diff #156475)	You could add a more descriptive comment like:
// Fold redundant masking.
// Let C3 = C1 lsh C2
// (X & C1) | ((X lsh C2) & C3) -> (X & C1) | ((X & C1) lsh C2)
// Also handles the commutative cases.
2027 ↗	(On Diff #156475)	You could change that into `m_c_Or` to cover the commutative case too. Then you can get rid of the code duplication.
2031 ↗	(On Diff #156475)	You could replace lines 2031-2035 with:
BinaryOperator *Op0 = cast<BinaryOperator>(I.getOperand(0));
BinaryOperator *Op1 = cast<BinaryOperator>(I.getOperand(1));
BinaryOperator *And, *Shift;
if ((Shift = dyn_cast<BinaryOperator>(Op0->getOperand(0))))
  And = Op1;
else if ((Shift = dyn_cast<BinaryOperator>(Op0->getOperand(1))))
  And = Op1;
else if ((Shift = dyn_cast<BinaryOperator>(Op1->getOperand(0))))
  And = Op0;
else
  And = Op0;
and then remove the commutative pattern match from below (lines 2049-2070).
test/Transforms/InstCombine/FoldRedundantShiftedMasks.ll
4 ↗	(On Diff #156475)	The reference to Phabricator is confusing. Please remove it. It'll become irrelevant with time anyway.

labrinea added a reviewer: labrinea.Jul 23 2018, 7:11 AM

samparker added inline comments.Jul 23 2018, 7:13 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2024 ↗	(On Diff #156475)	Looks like the convention here is to perform more specific folding after the calls to SimplifyAssociativeOrCommutative, etc.. And there is already a case for '(A & C)\|(B & D)'.
2027 ↗	(On Diff #156475)	We already know I is an Or.

labrinea added inline comments.Jul 23 2018, 7:38 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2024 ↗	(On Diff #156475)	Yeap, Sam is right. Line 2079 of the original source.
2031 ↗	(On Diff #156475)	I made a mistake at the else clause. Should be else { Shift = cast<BinaryOperator>(Op1->getOperand(1)); And = Op0; }

lebedev.ri added inline comments.Jul 23 2018, 7:50 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2024 ↗	(On Diff #156475)	Just put this into a new function and call it from here.
2026–2027 ↗	(On Diff #156475)	s/`X1`/`X`/
2027 ↗	(On Diff #156475)	`if (match(&I, m_Or(m_c_And(m_Value(X1), m_APInt(AndC)),` @labrinea is correct. This is backwards. `m_c_And` won't ever matter: the constant will be on the RHS, so it should be `m_And`. But this isn't true of `m_Or`; that one will be missing some commutative cases. It should be `m_c_Or`. (i thought i commented that already?)
2028 ↗	(On Diff #156475)	Same here, constant will always be RHS, no need for commutativity.
2028 ↗	(On Diff #156475)	s/`m_Value(X2)`/`m_Deferred(X)`/
2030 ↗	(On Diff #156475)	And this is no longer needed.
2049–2052 ↗	(On Diff #156475)	This should be if (match(&I, m_c_Or(m_And(m_OneUse(m_Shift(m_Value(X), m_APInt(ShftAmt))), m_APInt(ShAndC)), m_And(m_Deferred(X), m_APInt(AndC)))) {
2053–2058 ↗	(On Diff #156475)	Capture them in `match()`.

Did most fixes. I just don't know how to capture non-leaf nodes of the pattern being matched. Using other match operations would actually be more complicated than just passing the operands as arguments to the new function, now that I already know they are AND operations due to the function call placement.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

2053–2058 ↗	(On Diff #156475)	I don't see how to capture nodes that are not the leaves of the pattern. Should I use another match? It would be easier to just check if the first operand of the OR is the "and(X, AndC)", no?

if (match(&I, m_c_Or(m_And(m_OneUse(m_Shift(m_Value(X), m_APInt(ShftAmt))),
                           m_APInt(ShAndC)),
                     m_And(m_Deferred(X), m_APInt(AndC))))) {
  BinaryOperator *And = I.getOperand(0), *ShtAnd = I.getOperand(1), *Shift;
  if (And->getOperand(0) != X)
    std::swap(And, ShtAnd);
  Shift = ShtAnd->getOperand(0);

lebedev.ri added inline comments.Jul 24 2018, 4:51 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2053–2058 ↗	(On Diff #156475)	See `m_CombineAnd()`

Moved to a separate function. Placed the function call after knowing more about the operands.
Added the ashr case, which was being wrongly treated as lshr.
Added comments, including one arguing that this function would be useless if and instructions are moved before any type of shift operation.
Using m_c_Or, and passing operands as arguments.
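
To illustrate why `ashr` cannot simply be treated as `lshr`, here is a small sketch in plain C++ (it is not the patch's code). The names `lshr`, `ashr` and `maskHidesSignFill` are illustrative, a 32-bit width is assumed, and the arithmetic shift is modelled on unsigned values so that its behaviour does not depend on the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Logical shift right: vacated high bits are filled with zeros.
uint32_t lshr(uint32_t x, unsigned s) {
  assert(s < 32 && "shift amount must be smaller than the bit width");
  return x >> s;
}

// Arithmetic shift right: vacated high bits are filled with the sign bit.
uint32_t ashr(uint32_t x, unsigned s) {
  assert(s < 32 && "shift amount must be smaller than the bit width");
  uint32_t vacated = ~(0xFFFFFFFFu >> s); // the top s bits
  return (x >> s) | ((x & 0x80000000u) ? vacated : 0u);
}

// True when the mask clears all s vacated bits, so that masking the result
// of ashr gives the same value as masking the result of lshr.
bool maskHidesSignFill(uint32_t mask, unsigned s) {
  assert(s < 32 && "shift amount must be smaller than the bit width");
  return (mask & ~(0xFFFFFFFFu >> s)) == 0;
}
```

With `x = 0x8000AB00` and a shift of 8, the mask `0xAB` hides the sign fill and both shifts agree after masking, whereas a mask such as `0xFF0000AB` keeps the filled bits and the two results differ.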

lebedev.ri added inline comments.Jul 24 2018, 5:01 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2026–2036 ↗	(On Diff #157006)	See `m_CombineAnd()`.
2042–2046 ↗	(On Diff #157006)	This will miscompile with types larger than i64. Please add tests.

lebedev.ri added inline comments.Jul 24 2018, 5:07 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2031 ↗	(On Diff #157006)	This is super subtle, defining more variables after the first one which is being initialized.
2042–2046 ↗	(On Diff #157006)	Hmm, no, actually. disregard that. If the shift is too large, then that old shift would have been folded to `undef` already.

dnsampaio marked 2 inline comments as done.Jul 24 2018, 8:36 AM

dnsampaio added inline comments.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2026–2036 ↗	(On Diff #157006)	Nice one, it took me some time to understand that I could match and then capture. Thanks. :)

All required values are obtained during the pattern matching.

lebedev.ri added inline comments.Jul 24 2018, 8:55 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2022 ↗	(On Diff #157047)	I don't think `A` and `B` are used anymore?
2028 ↗	(On Diff #157047)	We create two new instructions, but we only make sure that one instruction (`Shift`, i think?) goes away. The outermost `m_And()` should too be `m_OneUse()`, i think.
2033 ↗	(On Diff #157047)	Early return please.

dnsampaio marked 2 inline comments as done.Jul 24 2018, 9:14 AM

dnsampaio added inline comments.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2028 ↗	(On Diff #157047)	Actually no. We are eliminating one of the and operations.
x1 = v & 0xFF00
x2 = v >> 8
x3 = x2 & 0xFF
or = x1 | x3
x4 = x1 + 5;
We eliminate x3 and it becomes
x1 = v & 0xFF00
x3' = x1 >> 8
or' = x3 | x1
x4 = x1 + 5;
x1 can have more than 1 user. x3 will be removed by dead-code elimination. We replace or.

Removed unused arguments.
Early exit.

dnsampaio added inline comments.Jul 24 2018, 9:19 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2028 ↗	(On Diff #157047)	Small correction:
x1 = v & 0xFF00
x3' = x1 >> 8
or' = x3' | x1
x4 = x1 + 5

Relaxed conditions for which the transformation is applied.
Added more tests for ashr.

dnsampaio updated this revision to Diff 157279.Jul 25 2018, 8:15 AM

dnsampaio marked 2 inline comments as done.Jul 29 2018, 7:11 AM

Replaces all uses of the innermost and with the new shift.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2028 ↗	(On Diff #157047)	But indeed, we must replace all uses of the innermost AND with the new shift.

lebedev.ri added inline comments.Jul 30 2018, 4:10 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058 ↗	(On Diff #157930)	But how do you know it's dead? I don't think you checked that it is one-use? And if it is one-use, then it will be dead after the parent instruction will be replaced, so why do we care? I guess you want to add a bit more tests.

As we are replacing all the uses of the DeadAnd, it is not required to create and replace the visiting Or operation. Replace all uses does the job already.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058 ↗	(On Diff #157930)	The new shift, being created, holds the same value of the DeadAnd. So we can just replace all uses of the DeadAnd with the new shifted value. Test added, to be uploaded.

dnsampaio marked an inline comment as done.Jul 30 2018, 5:03 AM

dnsampaio added inline comments.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058 ↗	(On Diff #157930)	Perhaps the naming is misleading. It is "dead" in the sense that the new shift holds the same value, and all its uses must be replaced. But no, it is not restricted to a single use. Adding an example test.

Added test that the inner and must be replaced by the new shift operation.
Converted the function to bool, as it does not require to create the Or operation after the replaceAll.

spatel added inline comments.Jul 30 2018, 8:11 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058 ↗	(On Diff #157930)	I'll say it again: this is getting weird because we're trying to make instcombine do something it should not be doing. You crippled the transform to try to make it fit, and now you're trying to expand it to handle the motivating cases. The most basic case where this transform should fire looks like this:
define void @shift_r1(i32 %x) {
  %r1 = and i32 %x, 172
  %sh = shl i32 %x, 8
  %r2 = and i32 %sh, 44032
  tail call void @use(i32 %r1)
  tail call void @use(i32 %r2)
  ret void
}
-->
define void @shift_r1(i32 %x) {
  %r1 = and i32 %x, 172
  %r2 = shl i32 %r1, 8
  tail call void @use(i32 %r1)
  tail call void @use(i32 %r2)
  ret void
}
There is no 'or' in the pattern. The optimization is about recognizing a common subexpression and using it to remove an instruction. That could be CSE, GVN, or a standalone pass. I don't see how this fits in instcombine.
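
The minimal `shl` pattern from the comment above can be mirrored in plain C++ to see that no `or` is involved. This is only a sketch under the assumption of 32-bit unsigned arithmetic; `r1Of`, `r2Before`, `r2After` and `assertMinimalPatternHolds` are illustrative names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// %r1 = and i32 %x, 172
uint32_t r1Of(uint32_t x) { return x & 172u; }

// Before: %sh = shl i32 %x, 8 ; %r2 = and i32 %sh, 44032
uint32_t r2Before(uint32_t x) { return (x << 8) & 44032u; }

// After: %r2 = shl i32 %r1, 8, reusing %r1 instead of masking again.
uint32_t r2After(uint32_t x) { return r1Of(x) << 8; }

// Only the low 8 bits of x can reach %r2, so 256 inputs cover every case.
void assertMinimalPatternHolds() {
  for (uint32_t x = 0; x <= 0xFFu; ++x)
    assert(r2Before(x) == r2After(x));
}
```

The rewrite works here because 44032 is exactly 172 shifted left by 8, so the second mask selects the same bits of `%x` as the first one.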

dnsampaio added inline comments.Aug 2 2018, 3:38 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2058 ↗	(On Diff #157930)	Hi Sanjay, So all expensive operations have been eliminated, I do not see why it wouldn't fit in InstCombine. We detect a pattern and we reduce it. The transformation won't fit the EarlyCSE pass, as it might be required to move the producer `and` up, such as in:
define void @shift_r1(i32 %x) {
  %sh = shl i32 %x, 8
  %r2 = and i32 %sh, 44032
  tail call void @use(i32 %r2)
  %r1 = and i32 %x, 172
  tail call void @use(i32 %r1)
  ret void
}
We must move `%r1` before `%sh`. I could very well do a stand-alone BasicBlockPass, but I don't think there would be a very compelling reason for it to exist.

So all expensive operations have been eliminated, I do not see why it wouldn't fit in InstCombine. We detect a pattern and we reduce it.

Because the pattern that we are matching is larger than it needs to be (as the comment in the test file clearly shows - there is no 'or' in the minimal pattern). This problem of trying to make everything fit in instcombine has been discussed several times on llvm-dev in the last ~year. Eg:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117151.html

Did you look at (new)-GVN to see if it fits in there?

If this is in instcombine (in addition to missing the pattern when there is no 'or'), I think you have to limit the transform based on uses as Roman mentioned in an earlier comment.

So this started life in the DAGCombiner, where issues around the implementation were raised, along with the point that it would be useful to have it earlier in the pipeline. But it seems there hasn't really been much thought, or discussion, about how this would fit in the existing passes... I think DAG combine has always been the right place for this because we're trying to reuse values, something that DAGs are good for. In DAGCombiner::visitANDLike, we already handle ANDs with SRL operands and the motivating example can be addressed with very little effort:

+        uint32_t ShiftedMask = CAnd->getZExtValue() << ShiftBits;
+        SDNode *Res =
+          DAG.UpdateNodeOperands(N, N0->getOperand(0), DAG.getConstant(ShiftedMask, DL, VT));
+        if (Res != N)
+          return DAG.getNode(ISD::SRL, DL, VT, SDValue(Res, 0), N0->getOperand(1));
+        else
+          return SDValue(DAG.UpdateNodeOperands(N, N0, SDValue(CAnd, 0)), 0);

I don't know the cost of calling UpdateNodeOperands, potentially twice, but I feel the simplicity suggests that the transform is most suited to the DAG.
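
The identity behind the `ShiftedMask` computation in the snippet above can be sketched in standalone C++. This is an illustration under the assumption of 32-bit values and a logical right shift (SRL); the function names are hypothetical and nothing here uses the SelectionDAG API.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `CAnd->getZExtValue() << ShiftBits` from the snippet above.
uint32_t shiftedMask(uint32_t andMask, unsigned shiftBits) {
  assert(shiftBits < 32 && "shift amount must be smaller than the bit width");
  return andMask << shiftBits;
}

// Original form: (x srl s) & c.
uint32_t maskAfterSrl(uint32_t x, uint32_t andMask, unsigned shiftBits) {
  assert(shiftBits < 32 && "shift amount must be smaller than the bit width");
  return (x >> shiftBits) & andMask;
}

// Rewritten form: (x & (c << s)) srl s.
uint32_t srlAfterMask(uint32_t x, uint32_t andMask, unsigned shiftBits) {
  return (x & shiftedMask(andMask, shiftBits)) >> shiftBits;
}

// Compares both forms on a deterministic sample of inputs.
bool formsAgree(uint32_t andMask, unsigned shiftBits) {
  uint32_t x = 0x12345678u;
  for (int i = 0; i < 100000; ++i) {
    if (maskAfterSrl(x, andMask, shiftBits) != srlAfterMask(x, andMask, shiftBits))
      return false;
    x = x * 1664525u + 1013904223u; // simple LCG step to vary the sample
  }
  return true;
}
```

Mask bits lost by the left shift are harmless for a logical right shift, because the corresponding bits of `x >> s` are always zero; the same argument does not carry over to an arithmetic shift.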

In D49229#1185608, @spatel wrote:

So all expensive operations have been eliminated, I do not see why it wouldn't fit in InstCombine. We detect a pattern and we reduce it.

Because the pattern that we are matching is larger than it needs to be (as the comment in the test file clearly shows - there is no 'or' in the minimal pattern). This problem of trying to make everything fit in instcombine has been discussed several times on llvm-dev in the last ~year. Eg:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117151.html

I agree that it won't handle all cases. But one would need to come up with a more generic approach in order to create a new pass that handles this. Something like an abstract known bits, that tells that two values hold the same bits coming from a given instruction, or some simplification by demanded bits from the same values. It is feasible, but it is not my intention to do so now.

Did you look at (new)-GVN to see if it fits in there?

I must confess that I did not quite understand all the work-flow of newGVN, but from what I did see, it mostly wouldn't fit. It seems to behave like InstCombine, expecting to replace the current instruction being visited. And it would require to create one value as to detect if there is a leader of that value and then reuse it. It is not that complicated, but quite awkward IMO.

If this is in instcombine (in addition to missing the pattern when there is no 'or'), I think you have to limit the transform based on uses as Roman mentioned in an earlier comment.

Why? The @mulUses in the test shows the case where it does not require %x2 to have a single user. The shift operation dominates the and being replaced.

And I agree with @samparker, our motivating example is quite simple; it is the @foo in our tests. I just intended to make it more generic so it would act on more shift operations, not to overcomplicate it. I acknowledge that doing the transformation in the IR could be good, but the DAG fits much more simply. So I really see no problem in having it in both places (as in D48278). Either way, it would be nice to come to a definitive solution.

Moving this pattern matching to AggressiveInstCombine following a suggestion of @lebedev.ri . Now it searches for minimal required patterns as desired by @spatel.
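
As a rough picture of the block-local search described in this update, the sketch below caches masks by source value and looks up a reusable one for a masked shift. It is a deliberately simplified model and not the patch itself: `AndInst`, `ShiftedAndInst` and `findReusableAnd` are hypothetical names, integer ids stand in for LLVM values, only logical right shifts on 32-bit values with an exact mask match are considered, and dominance is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for `dst = src & mask`; ids replace LLVM values.
struct AndInst {
  int dst;
  int src;
  uint32_t mask;
};

// Hypothetical stand-in for `dst = (src >> shift) & mask` (logical shift).
struct ShiftedAndInst {
  int dst;
  int src;
  unsigned shift;
  uint32_t mask;
};

// Returns the id of an earlier `and` whose result can be shifted to produce
// the value of `sa`, or -1 if no exact match is cached.
int findReusableAnd(const std::vector<AndInst> &ands, const ShiftedAndInst &sa) {
  assert(sa.shift < 32 && "shift amount must be smaller than the bit width");

  // Cache every (source, mask) pair; emplace keeps the first one seen.
  std::map<std::pair<int, uint32_t>, int> cache;
  for (const AndInst &a : ands)
    cache.emplace(std::make_pair(a.src, a.mask), a.dst);

  // (x >> s) & m equals (x & (m << s)) >> s, so look for the mask m << s.
  auto it = cache.find(std::make_pair(sa.src, sa.mask << sa.shift));
  return it == cache.end() ? -1 : it->second;
}
```

For example, with a cached `and` of value 0 by `0xAB00`, a later `(value0 >> 8) & 0xAB` resolves to that cached instruction, while a different mask or a different source value finds nothing.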

Added missing test-file

dnsampaio edited the summary of this revision. (Show Details)Aug 13 2018, 7:30 AM

samparker removed a reviewer: samparker.Feb 7 2019, 8:38 AM

lebedev.ri requested changes to this revision.Jun 21 2019, 10:51 AM

This revision now requires changes to proceed.Jun 21 2019, 10:51 AM

Hi @lebedev.ri,
Nice that you looked at this one, as I am not quite sure what to do about it. Any suggestions?

Perhaps this makes it clearer:
https://rise4fun.com/Alive/4TLv

In D49229#1562329, @dnsampaio wrote:

Hi @lebedev.ri,
Nice that you looked at this one, as I am not quite sure what to do about it. Any suggestions?

As can be seen from the discussion here, while the problem this is trying to solve is real,
the actual solution that should be done is very much non-obvious. It kind of doesn't fit anywhere.
Or maybe there's some astounding perf numbers (SPEC?) that justify this solution?
I don't have any further ideas presently, sorry. I only marked it to remove from review queue.

dnsampaio abandoned this revision.Jun 28 2019, 8:10 AM

Revision Contents

Path	Size
include/llvm/Transforms/Utils/Local.h	1 line
lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp	1 line
lib/Transforms/Utils/Local.cpp	223 lines
test/Transforms/AggressiveInstCombine/EliminateRedundantMasks.ll	244 lines

Diff 159530

include/llvm/Transforms/Utils/Local.h

	(166 lines not shown)
	/// Scan the specified basic block and try to simplify any instructions in it			/// Scan the specified basic block and try to simplify any instructions in it
	/// and recursively delete dead instructions.			/// and recursively delete dead instructions.
	///			///
	/// This returns true if it changed the code, note that it can delete			/// This returns true if it changed the code, note that it can delete
	/// instructions in other blocks as well in this block.			/// instructions in other blocks as well in this block.
	bool SimplifyInstructionsInBlock(BasicBlock *BB,			bool SimplifyInstructionsInBlock(BasicBlock *BB,
	const TargetLibraryInfo *TLI = nullptr);			const TargetLibraryInfo *TLI = nullptr);

				bool EliminateRedundantMasks(BasicBlock &BB);
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Control Flow Graph Restructuring.			// Control Flow Graph Restructuring.
	//			//

	/// Like BasicBlock::removePredecessor, this method is called when we're about			/// Like BasicBlock::removePredecessor, this method is called when we're about
	/// to delete Pred as a predecessor of BB. If BB contains any PHI nodes, this			/// to delete Pred as a predecessor of BB. If BB contains any PHI nodes, this
	/// drops the entries in the PHI nodes for Pred.			/// drops the entries in the PHI nodes for Pred.
	///			///
	(294 lines not shown)

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

	(159 lines not shown)
	}			}

	/// This is the entry point for folds that could be implemented in regular			/// This is the entry point for folds that could be implemented in regular
	/// InstCombine, but they are separated because they are not expected to			/// InstCombine, but they are separated because they are not expected to
	/// occur frequently and/or have more than a constant-length pattern match.			/// occur frequently and/or have more than a constant-length pattern match.
	static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {			static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {
	bool MadeChange = false;			bool MadeChange = false;
	for (BasicBlock &BB : F) {			for (BasicBlock &BB : F) {
				MadeChange \|= EliminateRedundantMasks(BB);
	// Ignore unreachable basic blocks.			// Ignore unreachable basic blocks.
	if (!DT.isReachableFromEntry(&BB))			if (!DT.isReachableFromEntry(&BB))
	continue;			continue;
	// Do not delete instructions under here and invalidate the iterator.			// Do not delete instructions under here and invalidate the iterator.
	// Walk the block backwards for efficiency. We're matching a chain of			// Walk the block backwards for efficiency. We're matching a chain of
	// use->defs, so we're more likely to succeed by starting from the bottom.			// use->defs, so we're more likely to succeed by starting from the bottom.
	// Also, we want to avoid matching partial patterns.			// Also, we want to avoid matching partial patterns.
	// TODO: It would be more efficient if we removed dead instructions			// TODO: It would be more efficient if we removed dead instructions
	(82 lines not shown)

lib/Transforms/Utils/Local.cpp

(17 lines not shown)
#include "llvm/ADT/DenseMapInfo.h"		#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/DenseSet.h"		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/Hashing.h"		#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/TinyPtrVector.h"		#include "llvm/ADT/TinyPtrVector.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/EHPersonalities.h"		#include "llvm/Analysis/EHPersonalities.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LazyValueInfo.h"		#include "llvm/Analysis/LazyValueInfo.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
(572 lines not shown)
#endif

while (!WorkList.empty()) {		while (!WorkList.empty()) {
Instruction *I = WorkList.pop_back_val();		Instruction *I = WorkList.pop_back_val();
MadeChange \|= simplifyAndDCEInstruction(I, WorkList, DL, TLI);		MadeChange \|= simplifyAndDCEInstruction(I, WorkList, DL, TLI);
}		}
return MadeChange;		return MadeChange;
}		}

		/// Eliminate redundant masks
		/// let the pattern be:
		/// X1 = and(X, c1)
		/// X2 = and(shift(X, c3), c2)
		/// where c1, c2, c3 are constants
		/// We try to infer if we can just reuse X1 to compute X2
		/// obtaining X2 = shift(X1, c3)
		static cl::opt<unsigned> EliminateRedundantMasksSearchLimit(
		"erm-search-limit", cl::desc("Maximum instructions to compare a mask to."),
		cl::init(6), cl::Hidden);

		bool llvm::EliminateRedundantMasks(BasicBlock &BB) {
		struct APIntLT {
		bool operator()(APInt a, APInt b) { return a.ult(b); }
		};

		typedef std::map<APInt, BinaryOperator *, APIntLT> ANDS;

		bool HasChanges = false;
		// Cache for every masked value the bits that we don't know to be zero
		std::map<Value *, APInt> ValuesNonZeroCache;
		// A given mask should be extracted from a value once. Cache them so as to
		// detect duplicates
		std::map<Value *, ANDS> Ands;
		// Duplicated masking operations are removed at the 2nd stage
		SmallSet<Instruction *, 8> ToRemove;

		// 1st step: Cache all (and X, c1) values, using key (X, "effective c1")
		// If we detect duplicated AND operations, erase those that come after.
		for (Instruction &I : BB) {
		const Type *T = I.getType();
		// At the moment we limit it to work in integer types, sized <= 64bits
		if (!(T->isIntegerTy() && T->isSized()) \|\| T->isVectorTy())
		continue;

		const APInt *MaskR;
		BinaryOperator *AndOp;
		Value *X;

		if (!match(&I,
		m_CombineAnd(m_BinOp(AndOp), m_And(m_Value(X), m_APInt(MaskR)))))
		continue;

		APInt Mask = *MaskR;
		if (ValuesNonZeroCache.find(X) == ValuesNonZeroCache.end()) {
		KnownBits KB =
		computeKnownBits(X, BB.getParent()->getParent()->getDataLayout());
		APInt NotZero = ~KB.Zero;

		Mask &= NotZero;
		ValuesNonZeroCache[X] = NotZero;
		}

		if (BinaryOperator *ReferenceAnd = Ands[X][Mask]) {
		// Replace all uses of this found and with the already cached one
		LLVM_DEBUG(dbgs() << "Replacing: "; AndOp->dump(); dbgs() << " With: ";
		ReferenceAnd->dump());
		AndOp->replaceAllUsesWith(ReferenceAnd);
		ToRemove.insert(AndOp);
		HasChanges = true;
		continue;
		}

		Ands[X][Mask] = AndOp;
		LLVM_DEBUG(dbgs() << "Value: "; X->dump(); dbgs() << "\t is masked by ";
		AndOp->dump();
		dbgs() << "\tThe mask: 0x" << Mask.toString(16, false) << '\n');
		}

		// 2nd stage: We remove the dead masking operations
		for (Instruction *I : ToRemove) {
		I->removeFromParent();
		I->deleteValue();
		}

		ToRemove.clear();

		// 3rd stage: We check backwards if masking of shift operations also extract
		// the same mask, replacing their operand for the existing mask.

		// Once we decided to reuse a given value, we must ensure all (and) ops
		// dominate their uses. ToExecuteBefore holds the first user for an (and).
		std::map<BinaryOperator *, BinaryOperator *> ToExecuteBefore;

		for (auto II = BB.rbegin(); II != BB.rend(); II++) {
		Instruction *I = &*II;
		const Type *T = I->getType();
		if (!(T->isIntegerTy() && T->isSized()) \|\| T->isVectorTy())
		continue;

		ConstantInt *ShiftAmtC;
		const APInt *SMask;
		BinaryOperator *Shift, *DeadAnd;
		Value *X;

		if (!match(I, m_CombineAnd(
		m_BinOp(DeadAnd),
		m_And(m_CombineAnd(
		m_BinOp(Shift),
		m_OneUse(m_Shift(m_Value(X),
		m_ConstantInt(ShiftAmtC)))),
		m_APInt(SMask)))))
		continue;
		LLVM_DEBUG(dbgs() << "Matched: "; DeadAnd->dump(); Shift->dump(););

		auto ValueAndsI = Ands.find(X);
		if (ValueAndsI == Ands.end())
		continue;

		const APInt *ShiftAmt = &ShiftAmtC->getValue();
		const unsigned VShiftAmt = ShiftAmt->getZExtValue();
		const bool SafeAShr = SMask->countLeadingOnes() < VShiftAmt;
		const bool AShrToLShr = ValuesNonZeroCache[X].isSignBitClear() \|\|
		SMask->countLeadingZeros() >= VShiftAmt;
		const auto Opcode = Shift->getOpcode();
		// We try to find a direct match of the masked value
		BinaryOperator *RAnd = nullptr;
		if (Opcode == Instruction::Shl) {
		APInt EffectiveMask = ValuesNonZeroCache[X] & SMask->lshr(VShiftAmt);
		LLVM_DEBUG(dbgs() << "\tThe mask: 0x" << EffectiveMask.toString(16, false)
		<< '\n');

		RAnd = ValueAndsI->second[EffectiveMask];
		} else if (Opcode == Instruction::LShr \|\| SafeAShr \|\| AShrToLShr) {
		APInt EffectiveMask = ValuesNonZeroCache[X] & SMask->shl(VShiftAmt);
		LLVM_DEBUG(dbgs() << "\tThe mask: 0x" << EffectiveMask.toString(16, false)
		<< "\nNon Zero mask: 0x"
		<< ValuesNonZeroCache[X].toString(16, false) << '\n');

		RAnd = ValueAndsI->second[EffectiveMask];
		}

		// If we can't match it, try to explore the existing masks to see if any of
		// them suits our required bits. Limited to search up to
		// n masks = erm-search-limit[default=6].
		if (!RAnd) {
		unsigned C = EliminateRedundantMasksSearchLimit;
		for (auto ToTest = ValueAndsI->second.begin(),
		End = ValueAndsI->second.end();
		ToTest != End && C != 0; C--, ToTest++) {
		if (Shift->getOpcode() == Instruction::Shl) {
		if (ToTest->first.shl(VShiftAmt) == *SMask) {
		RAnd = ToTest->second;
		break;
		}
		} else if (Shift->getOpcode() == Instruction::LShr) {
		if (ToTest->first.lshr(VShiftAmt) == *SMask) {
		RAnd = ToTest->second;
		break;
		}
		}
		// Instruction::AShr
		else if (ToTest->first.ashr(VShiftAmt) == *SMask \|\|
		(AShrToLShr && ToTest->first.lshr(VShiftAmt) == *SMask)) {
		RAnd = ToTest->second;
		break;
		}
		}
		}

		if (!RAnd)
		continue;

		LLVM_DEBUG(dbgs() << "Reusing result of: "; RAnd->dump();
		dbgs() << " To compute: "; Shift->dump();
		dbgs() << " Eliminating: "; DeadAnd->dump());
		HasChanges = true;
		if (Opcode == Instruction::AShr && AShrToLShr) {
		BinaryOperator *newShift = BinaryOperator::CreateLShr(RAnd, ShiftAmtC);
		newShift->insertBefore(Shift);
		Shift->replaceAllUsesWith(newShift);
		Shift->removeFromParent();
		Shift->deleteValue();
		Shift = newShift;
		} else
		Shift->setOperand(0, RAnd);
		#ifndef NDEBUG
		KnownBits Before =
		computeKnownBits(DeadAnd, BB.getParent()->getParent()->getDataLayout());
		KnownBits After =
		computeKnownBits(Shift, BB.getParent()->getParent()->getDataLayout());
		(void)Before;
		(void)After;
		LLVM_DEBUG(dbgs() << "";

		if (!(Before.Zero == After.Zero && Before.One == After.One)) {
		dbgs() << "Before zero: 0x" << Before.Zero.toString(16, false)
		<< "\nAfter zero: 0x" << After.Zero.toString(16, false)
		<< "\nBefore one: 0x" << Before.One.toString(16, false)
		<< "\nAfter one: 0x" << After.One.toString(16, false);
		BB.dump();
		errs() << "This transformation is invalid!";
		});
		assert(Before.Zero == After.Zero && Before.One == After.One);
		#endif
		DeadAnd->replaceAllUsesWith(Shift);
		ValuesNonZeroCache.erase(DeadAnd);
		Ands.erase(DeadAnd);
		ToRemove.insert(DeadAnd);
		ToExecuteBefore[RAnd] = Shift;
		}

		for (Instruction *I : ToRemove) {
		I->removeFromParent();
		I->deleteValue();
		}

		ToRemove.clear();

		auto End = BB.end();
		for (auto &ProducerUser : ToExecuteBefore) {
		auto Producer = ProducerUser.first->getIterator();
		auto User = ProducerUser.second->getIterator();
		while (++Producer != End && Producer != User)
		;
		if (Producer == End)
		ProducerUser.first->moveBefore(ProducerUser.second);
		}
		ToExecuteBefore.clear();
		return HasChanges;
		}
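
The AShrToLShr condition above (SMask->countLeadingZeros() >= VShiftAmt) rests on the fact that an arithmetic and a logical right shift by the same amount can differ only in the top VShiftAmt bits, so they agree under any mask that clears those bits. A minimal standalone sketch of that reasoning follows. It is not part of the patch: the helper names are hypothetical, and plain 16-bit integers stand in for APInt so the check can be exhaustive.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of leading zero bits in a 16-bit mask.
unsigned leadingZeros16(uint16_t Mask) {
  unsigned N = 0;
  for (uint16_t Bit = 0x8000; Bit != 0 && !(Mask & Bit); Bit >>= 1)
    ++N;
  return N;
}

// Arithmetic shift right on 16 bits, spelled out so it does not depend on
// how the compiler shifts negative signed values. Expects Amt in [0, 15].
uint16_t ashr16(uint16_t X, unsigned Amt) {
  uint16_t Logical = static_cast<uint16_t>(X >> Amt);
  if ((X & 0x8000) == 0 || Amt == 0)
    return Logical;
  // Replicate the sign bit into the Amt vacated high bits.
  uint16_t Fill = static_cast<uint16_t>(0xFFFFu << (16 - Amt));
  return static_cast<uint16_t>(Logical | Fill);
}

// Returns true if (X ashr Amt) & Mask == (X lshr Amt) & Mask for every
// 16-bit X, i.e. the ashr can be rewritten as an lshr under this mask.
bool ashrMatchesLshrUnderMask(uint16_t Mask, unsigned Amt) {
  for (uint32_t I = 0; I <= 0xFFFF; ++I) {
    uint16_t X = static_cast<uint16_t>(I);
    uint16_t L = static_cast<uint16_t>(X >> Amt);
    if ((ashr16(X, Amt) & Mask) != (L & Mask))
      return false;
  }
  return true;
}
```

With a shift of 8, the mask 0x00FF has 8 leading zeros and the two shifts agree for every input; the mask 0x01FF keeps one of the sign-filled bits, so they do not.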

//===----------------------------------------------------------------------===//
// Control Flow Graph Restructuring.
//

/// RemovePredecessorAndSimplify - Like BasicBlock::removePredecessor, this
/// method is called when we're about to delete Pred as a predecessor of BB. If
/// BB contains any PHI nodes, this drops the entries in the PHI nodes for Pred.
///

test/Transforms/AggressiveInstCombine/EliminateRedundantMasks.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -aggressive-instcombine %s -o - | FileCheck %s

				; Fold redundant masking operations of shifted value
				; In a case where
				; x1 = a & 0xFF00
				; x2 = (a >> 8) & 0xFF
				; we can see x2 as:
				; x2 = (a >> 8) & (0xFF00 >> 8)
				; that can be translated to
				; x2 = (a & 0xFF00) >> 8
				; that is
				; x2 = x1 >> 8


				define i32 @shl(i16 %a) {
				; CHECK-LABEL: @shl(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[SXT:%.*]] = sext i16 [[A:%.*]] to i32
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[SXT]], 172
				; CHECK-NEXT: [[SH:%.*]] = shl i32 [[AND]], 8
				; CHECK-NEXT: [[OUT:%.*]] = or i32 [[AND]], [[SH]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%sxt = sext i16 %a to i32
				%sh = shl i32 %sxt, 8
				%and = and i32 %sxt, 172
				%dead = and i32 %sh, 44032
				%out = or i32 %and, %dead
				ret i32 %out
				}

				define i32 @lshr(i16 %a) {
				; CHECK-LABEL: @lshr(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[SXT:%.*]] = sext i16 [[A:%.*]] to i32
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[SXT]], 44032
				; CHECK-NEXT: [[SH:%.*]] = lshr i32 [[AND]], 8
				; CHECK-NEXT: [[OUT:%.*]] = or i32 [[SH]], [[AND]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%sxt = sext i16 %a to i32
				%sh = lshr i32 %sxt, 8
				%dead = and i32 %sh, 172
				%and = and i32 %sxt, 44032
				%out = or i32 %dead, %and
				ret i32 %out
				}

				define i32 @ashr(i16 %a) {
				; CHECK-LABEL: @ashr(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 44032
				; CHECK-NEXT: [[TMP0:%.*]] = lshr i32 [[AND]], 8
				; CHECK-NEXT: [[OUT:%.*]] = or i32 [[AND]], [[TMP0]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%ext = sext i16 %a to i32
				%and = and i32 %ext, 44032
				%sh = ashr i32 %ext, 8
				%dead = and i32 %sh, 172
				%out = or i32 %and, %dead
				ret i32 %out
				}

				; Check that when the sign bit may be one, we can't ignore the upper lost bits
				define i32 @ashr_no_good(i16 %a) {
				; CHECK-LABEL: @ashr_no_good(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32
				; CHECK-NEXT: [[SH:%.*]] = ashr i32 [[EXT]], 8
				; CHECK-NEXT: [[NOTDEAD:%.*]] = and i32 [[SH]], 2113895935
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 32255
				; CHECK-NEXT: [[OUT:%.*]] = or i32 [[AND]], [[NOTDEAD]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%ext = sext i16 %a to i32
				%sh = ashr i32 %ext, 8
				%notdead = and i32 %sh, 2113895935 ; 0x7DFF7DFF
				%and = and i32 %ext, 32255 ; 0x7DFF
				%out = or i32 %and, %notdead
				ret i32 %out
				}

				define i32 @ashr2(i16 %a) {
				; CHECK-LABEL: @ashr2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 65280
				; CHECK-NEXT: [[TMP0:%.*]] = lshr i32 [[AND]], 8
				; CHECK-NEXT: [[OUT:%.*]] = or i32 [[AND]], [[TMP0]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%ext = sext i16 %a to i32
				%and = and i32 %ext, 65280
				%sh = ashr i32 %ext, 8
				%dead = and i32 %sh, 255
				%out = or i32 %and, %dead
				ret i32 %out
				}

				define i16 @ashr3(i16 %a) {
				; CHECK-LABEL: @ashr3(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[AND:%.*]] = and i16 [[A:%.*]], -256
				; CHECK-NEXT: [[TMP0:%.*]] = lshr i16 [[AND]], 8
				; CHECK-NEXT: [[OUT:%.*]] = or i16 [[AND]], [[TMP0]]
				; CHECK-NEXT: ret i16 [[OUT]]
				;
				entry:
				%and = and i16 %a, 65280
				%sh = ashr i16 %a, 8
				%dead = and i16 %sh, 255
				%out = or i16 %and, %dead
				ret i16 %out
				}

				define i32 @ashr_nogood(i16 %a) {
				; CHECK-LABEL: @ashr_nogood(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 65280
				; CHECK-NEXT: [[SH:%.*]] = ashr i32 [[EXT]], 8
				; CHECK-NEXT: [[NOTDEAD:%.*]] = and i32 [[SH]], 65535
				; CHECK-NEXT: [[OUT:%.*]] = or i32 [[AND]], [[NOTDEAD]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%ext = sext i16 %a to i32
				%and = and i32 %ext, 65280
				%sh = ashr i32 %ext, 8
				%notdead = and i32 %sh, 65535
				%out = or i32 %and, %notdead
				ret i32 %out
				}

				define i32 @shl_nogood(i16 %a) {
				; CHECK-LABEL: @shl_nogood(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[EXT:%.*]] = sext i16 [[A:%.*]] to i32
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[EXT]], 172
				; CHECK-NEXT: [[SH:%.*]] = shl i32 [[EXT]], [[AND]]
				; CHECK-NEXT: [[NOTDEAD:%.*]] = and i32 [[SH]], 44032
				; CHECK-NEXT: [[OUT:%.*]] = or i32 [[AND]], [[NOTDEAD]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%ext = sext i16 %a to i32
				%and = and i32 %ext, 172
				%sh = shl i32 %ext, %and
				%notdead = and i32 %sh, 44032
				%out = or i32 %and, %notdead
				ret i32 %out
				}

				define void @foo(i16* %a) {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LD:%.*]] = load i16, i16* [[A:%.*]], align 2
				; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[LD]] to i32
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[CONV]], 65280
				; CHECK-NEXT: [[SH:%.*]] = lshr i32 [[AND]], 8
				; CHECK-NEXT: [[OR:%.*]] = or i32 [[SH]], [[AND]]
				; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[OR]] to i16
				; CHECK-NEXT: store i16 [[CONV4]], i16* [[A]], align 2
				; CHECK-NEXT: ret void
				;
				entry:
				%ld = load i16, i16* %a, align 2
				%conv = sext i16 %ld to i32
				%and = and i32 %conv, 65280
				%sh = lshr i32 %conv, 8
				%and3 = and i32 %sh, 255
				%or = or i32 %and3, %and
				%conv4 = trunc i32 %or to i16
				store i16 %conv4, i16* %a, align 2
				ret void
				}

				; Check that all uses of %dead are replaced
				define i32 @mulUses(i32 %a) {
				; CHECK-LABEL: @mulUses(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[A:%.*]], 44032
				; CHECK-NEXT: [[SH:%.*]] = lshr i32 [[AND]], 8
				; CHECK-NEXT: [[OR:%.*]] = or i32 [[SH]], [[AND]]
				; CHECK-NEXT: [[ADD:%.*]] = add i32 [[OR]], [[SH]]
				; CHECK-NEXT: [[OUT:%.*]] = mul i32 [[ADD]], [[SH]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%sh = lshr i32 %a, 8
				%dead = and i32 %sh, 172
				%and = and i32 %a, 44032
				%or = or i32 %dead, %and
				%add = add i32 %or, %dead
				%out = mul i32 %add, %dead
				ret i32 %out
				}

				; If the shift operation is used elsewhere, it must live
				define i32 @mulUsesShift_no_good(i32 %a) {
				; CHECK-LABEL: @mulUsesShift_no_good(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[SH:%.*]] = lshr i32 [[A:%.*]], 8
				; CHECK-NEXT: [[NOTDEAD:%.*]] = and i32 [[SH]], 172
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[A]], 44032
				; CHECK-NEXT: [[OR:%.*]] = or i32 [[NOTDEAD]], [[AND]]
				; CHECK-NEXT: [[ADD:%.*]] = add i32 [[OR]], [[NOTDEAD]]
				; CHECK-NEXT: [[OUT:%.*]] = mul i32 [[ADD]], [[SH]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%sh = lshr i32 %a, 8
				%notdead = and i32 %sh, 172
				%and = and i32 %a, 44032
				%or = or i32 %notdead, %and
				%add = add i32 %or, %notdead
				%out = mul i32 %add, %sh
				ret i32 %out
				}

				define void @shift_r1(i32 %x) {
				; CHECK-LABEL: @shift_r1(
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[X:%.*]], 172
				; CHECK-NEXT: [[SH:%.*]] = shl i32 [[AND]], 8
				; CHECK-NEXT: [[CL1:%.*]] = call i32 @mulUses(i32 [[SH]])
				; CHECK-NEXT: [[CL2:%.*]] = call i32 @mulUses(i32 [[AND]])
				; CHECK-NEXT: ret void
				;
				%sh = shl i32 %x, 8
				%dead = and i32 %sh, 44032
				%cl1 = call i32 @mulUses(i32 %dead)
				%and = and i32 %x, 172
				%cl2 = call i32 @mulUses(i32 %and)
				ret void
				}

Revision Contents

Diff 159530

include/llvm/Transforms/Utils/Local.h

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

lib/Transforms/Utils/Local.cpp

test/Transforms/AggressiveInstCombine/EliminateRedundantMasks.ll
