This is an archive of the discontinued LLVM Phabricator instance.

[LogicCombine 1/?] Implement a general way to simplify logical operations.
ClosedPublic

Authored by bcl5980 on Jan 28 2023, 2:15 AM.

Details

Summary

This patch involves boolean ring to simplify logical operations. We can treat & as ring multiplication and ^ as ring addition.
So we need to canonicalize all other operations to * +. Like:

a & b -> a * b
a ^ b -> a + b
~a -> a + 1
a | b -> a * b + a + b
c ? a : b -> c * a + (c + 1) * b

In the code, we use a mask set to represent an expression. Every value that is not comes from logical operations could be a bit in the mask.
The mask itself is a multiplication chain. The mask set is an addiction chain.
We can calculate two expressions based on boolean algebras.

For now, the initial patch only enabled on and/or/xor, Later we can enhance the code step by step.

Reference: https://en.wikipedia.org/wiki/Boolean_ring

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
bcl5980 updated this revision to Diff 493831.Jan 31 2023, 10:58 PM

variable rename. NFC

bcl5980 retitled this revision from [AggressiveInstCombine] Implement a general way to simplify complex logical operations. to [ComplexLogicCombine 1/?] Implement a general way to simplify complex logical operations..
bcl5980 updated this revision to Diff 493875.Feb 1 2023, 1:52 AM

fix a bug in exprAnd

bcl5980 updated this revision to Diff 494173.Feb 1 2023, 11:26 PM

Define LogicalExpr as a class and split it into a new header file.

bcl5980 updated this revision to Diff 494201.Feb 2 2023, 1:22 AM
bcl5980 updated this revision to Diff 494505.Feb 2 2023, 10:09 PM

add one more test.

bcl5980 updated this revision to Diff 494540.Feb 3 2023, 12:57 AM

This is cool!

bcl5980 updated this revision to Diff 494597.Feb 3 2023, 5:52 AM

Fix test failure.

Ping.
Anyone can help to review it? Or is this necessary for llvm?
I try to run it on test-suite, it looks only spec2017 502.gcc can trigger this 6 times.

spatel added a comment.Feb 6 2023, 6:48 AM

Ping.
Anyone can help to review it? Or is this necessary for llvm?
I try to run it on test-suite, it looks only spec2017 502.gcc can trigger this 6 times.

It seems like a nice improvement. It would be great if it eventually allows removing some of the pattern-specific complex logic reductions that we have in InstCombine/InstSimplify:
https://github.com/llvm/llvm-project/blob/b5ee4f755fcff56243f6ff0cea9e7a722259304a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1719

I had not seen "boolean ring" before this. Do you know if that is implemented in any other programs/compilers?

Ping.
Anyone can help to review it? Or is this necessary for llvm?
I try to run it on test-suite, it looks only spec2017 502.gcc can trigger this 6 times.

It seems like a nice improvement. It would be great if it eventually allows removing some of the pattern-specific complex logic reductions that we have in InstCombine/InstSimplify:
https://github.com/llvm/llvm-project/blob/b5ee4f755fcff56243f6ff0cea9e7a722259304a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1719

The initial patch can't replace the code in foldComplexAndOrPatterns. I will continue work on it that maybe one day we can replace all patterns in foldComplexAndOrPatterns.
For now, this code only enabled on O3, but the foldComplexAndOrPatterns may work on O2 also. I'm not sure the CPU overhead for this patch. If later we can proof this is not a big overhead we can move it to Instcombine.

I had not seen "boolean ring" before this. Do you know if that is implemented in any other programs/compilers?

I'm sorry I am a beginner on compiler, the only compiler I touch is Clang/LLVM. So I don't know if any other programs/compiler have similar code or not. This patch is written based on my understand of boolean ring.

I'm still trying to understand how this works, so I only looked at the high-level comments to start.

Is it possible to convert existing logic tests in InstCombien/InstSimplify to use branches, then run those tests with "opt -aggressive-instcombine" and verify that the results are correct?

llvm/lib/Transforms/AggressiveInstCombine/ComplexLogicalOpsCombine.cpp
14 ↗(On Diff #494597)

I don't understand the term "unsigned set" in this context. Could this be "bit set"? But this is also what you are calling a "mask" later in this description and in the code itself?

15 ↗(On Diff #494597)

Word choice:
"Every value that is not comes from logic operation should be the leaf node." ->
"Any value that is not a logic operation is a leaf node." ?

21 ↗(On Diff #494597)

Now it says "unsigned value", but didn't we just define that as "mask"?
Can we replace "unsigned set" with "set of masks"?

25 ↗(On Diff #494597)

Is "b * c" -> 6 in this example?

llvm/lib/Transforms/AggressiveInstCombine/LogicalExpr.h
10 ↗(On Diff #494597)

Is it correct to write it like this:
For source values {a,b,c,d}, we can represent them as a bitmask with 'a' as the least-significant-bit: {dcba}.
?

13 ↗(On Diff #494597)

Does the multiplication between LHS and RHS mean the top-level logic operation in this example is an "and"?

15 ↗(On Diff #494597)

I don't understand what happened in these steps. What is the relationship of "*" and "+" to "|" and ","?

43–44 ↗(On Diff #494597)

If we are using uin64_t as the basic "mask" of values and we have magic constants for the 2 high-bits, does it mean that the "MaxLogicOpLeafsToScan" must be less than 62? Is that enforced with assertions or other logic?

bcl5980 added inline comments.Feb 13 2023, 6:39 PM
llvm/lib/Transforms/AggressiveInstCombine/ComplexLogicalOpsCombine.cpp
14 ↗(On Diff #494597)

I will update the comments to set of masks.

25 ↗(On Diff #494597)

Yeah, thanks for the finding.

llvm/lib/Transforms/AggressiveInstCombine/LogicalExpr.h
13 ↗(On Diff #494597)

Yes, it means LHS & RHS

15 ↗(On Diff #494597)

That's a little tricky here. The bitset means and/multiplication chain. So when we do a & b, the bitset should be 1 | 2 to set a and b 's corresponding bits.
"," actually the same to "+", I just want to show the pattern means LHS.
How about I write the comments like:

{0b1111, 0b1001, 0b0010 , 0b1101} * {0b0001, 0b0101}
-->
(0b1111 + 0b1001 + 0b0010 + 0b1101) * (0b0001 + 0b0101)
-->
(0b1111 + 0b1001 + 0b0010 + 0b1101) * 0b0001+ (0b1111 + 0b1001 + 0b0010 + 0b1101) * 0b0101
-->
(0b1111 | 0b0001) + (0b1001| 0b0001) + (0b0010 | 0b0001) + (0b1101 | 0b0001) + (0b1111 | 0b0101) + (0b1001| 0b0101) + (0b0010 + 0b0101) + (0b1101 + 0b0101)
--> 
(0b1111 + 0b1001 + 0b0010 + 0b1101 + 0b1111 + 0b1101 + 0b0111 + 0b1101
-->
0b1001 + 0b0010 + 0b1101 + 0b0111
-->
{0b1001, 0b0010, 0b1101, 0b0111}
-->
a * d + b + a * c * d + a * b * c
43–44 ↗(On Diff #494597)

Yeah, you are right. We need an assertion here to make sure the max leaf number is less than 62. I will update it later.

I'm still trying to understand how this works, so I only looked at the high-level comments to start.

Is it possible to convert existing logic tests in InstCombien/InstSimplify to use branches, then run those tests with "opt -aggressive-instcombine" and verify that the results are correct?

I'm not sure if it is possible, but if we want to test more cases I can try to add an option to make it work on every logical operations. And I can run local to test but I'm not sure how to show the result on the review.

bcl5980 updated this revision to Diff 497199.Feb 13 2023, 9:42 PM

Address comments.

I'm still trying to understand how this works, so I only looked at the high-level comments to start.

Is it possible to convert existing logic tests in InstCombien/InstSimplify to use branches, then run those tests with "opt -aggressive-instcombine" and verify that the results are correct?

@spatel I try to run all tests of InstCombine and some of current tests can be improved. The detail diff is D144071.

bcl5980 updated this revision to Diff 497553.Feb 14 2023, 11:09 PM

Move complex-logic-combine to Analysis

I'm still trying to understand how this works, so I only looked at the high-level comments to start.

Is it possible to convert existing logic tests in InstCombien/InstSimplify to use branches, then run those tests with "opt -aggressive-instcombine" and verify that the results are correct?

@spatel I try to run all tests of InstCombine and some of current tests can be improved. The detail diff is D144071.

Nice! The results look promising. I'm still trying to understand how this works by reading the code (rather than reading the wikipedia reference page).
I'm confused, so it's still just a couple of high-level questions/comments.

llvm/include/llvm/Analysis/LogicalExpr.h
17

I'm still confused by the notation. Each "-->" step needs a comment to describe exactly what is happening. If we are not showing some unique math/logic property with each of the terms in the equation/set, then it would be easier to follow the logic with a smaller example.

In this step, we are splitting the RHS masks to operate over the LHS? But are those "+" and "*" symbols representing real math operations or are they bitwise logical operations?

21

Here we have distributed the RHS mask values over the LHS mask values? Why did "*" become "|"?

25

I don't know what operation was done there. It's not logical-or or multiplication?

llvm/test/Transforms/AggressiveInstCombine/complex-logic.ll
4 ↗(On Diff #497553)

We're going to need more tests than this. Name each test to suggest what it demonstrates (or add test comments, so that is clear).

We should build up to the complex examples. Start with single value logic to prove that works:

define i32 @x_and_not_x(i32 %x) {
  %notx = xor i32 %x, -1
  %r = and i32 %x, %notx
  ret i32 %r
}

...and more like that. Next, show some 2 value tests. Then, show 3 values and finally 4 values.

bcl5980 added inline comments.Feb 15 2023, 7:26 PM
llvm/include/llvm/Analysis/LogicalExpr.h
17

"+" and "*" representing math operation on the boolean ring.
"+" is the same to xor, "*" is the same to and. And they also follow the distributive laws and commutative law like the normal "*", "+".

21

For example: ab * bd = abd
The expression ab * bd convert to mask will be 0b0011 * 0b1010.
The result abcd convert to mask will become 0b1011.
So for the "*" operation is actually "|" LHS and RHS 's masks.

25

For the "+" operation, we can replace them to xor. So if we find two mask is the same, we can remove both of them.

bcl5980 updated this revision to Diff 497929.Feb 16 2023, 1:40 AM

update more tests.

I like the test progression now. You could pre-commit the baseline file.

Can you replace the example in the code comment in LogicalExpr.h with the steps used to solve @leaf3_complex_ret_leaf?

That's a smaller test, but it seems like it covers many interesting reductions. If that's still too big to be a good introduction, then make it smaller.

The first line should be the expression in logical form using the usual C logic operators:

((a & b) | (a ^ c)) ^ (!(b & c) & a)

Next line: show how that is expanded to a logic form suitable for boolean ring - 'or' becomes 'and' and 'xor', so (make sure this is correct):

(((a & b) & (a ^ c)) ^ (a & b) ^ (a ^ c)) ^ (((b & c) ^ -1) & a)

Is convert to ring masks/operators the next step (or is this 2 steps)?

(((001 * 010) * (001 + 100)) + (001 * 010) + (001 + 100)) + (((010 * 100) + ???) * 001

Now go through each distributive, associative, destructive step just like the code would do and describe it.

bcl5980 updated this revision to Diff 498234.Feb 16 2023, 6:46 PM

Update comments.
Rebase tests.

@spatel , do you think this change can split into a individual pass? It looks the sequence of instruction iterator is reversed but I prefer to call the complex logic combine based on normal sequence.
But I'm not sure if this code is worth to add a new pass. If it is worth, where should this pass been inserted?

bcl5980 updated this revision to Diff 498266.Feb 17 2023, 12:17 AM

Treat mask value 0 as ExprZero and remove current ExprZero.

@spatel , do you think this change can split into a individual pass? It looks the sequence of instruction iterator is reversed but I prefer to call the complex logic combine based on normal sequence.
But I'm not sure if this code is worth to add a new pass. If it is worth, where should this pass been inserted?

It would probably be better for all of the transforms if we update AggressiveInstCombine to use a worklist instead of a simple iterator. VectorCombine was updated like that not too long ago.
But we can defer that to a follow-up if there's no concern about correctness.

I was curious if enabling this for all logic ops would cause any compile-time regressions, but it seems like it has almost no cost:
https://llvm-compile-time-tracker.com/compare.php?from=0e90cd7551f2d0b151f7406e8f3848ec54e650bf&to=ae505cb2a674ac4c240c94a74fc04ee274321697&stat=instructions:u

llvm/include/llvm/Analysis/LogicalExpr.h
10–16

This is difficult to parse. We must differentiate the logical ops "or" and "and" from the English words.
Header comments should use "///" to auto-generate doxygen.

See if this is a correct edit (adjust to fit 80-columns as necessary):

/// For a logical expression represented by bitmasks, the "and" logic operator
/// represented by "&" is translated to "*" and is then evaluated as the "or" of 
/// the bitmasks. For example, pattern "a & b" is represented by the logical 
/// expression "01 * 10", and the expression is reduced to "11". So the 
/// operation "&" between two logical expressions (not "xor", only "and" chain)
/// is actually bitwise "or" of the masks. There is one exception: if one of the 
/// operands is constant 0, the entire mask represents 0. We do not "or" the 
/// masks in that case.
18–23
/// The evaluation of a pattern for bitwise "xor" is represented by a "+" math operator. 
/// But it also has one exception to normal math rules: if two masks are identical, we 
/// remove them. For example with "a ^ a", the logical expression is "1 + 1". We eliminate 
/// them from the logical expression rather than "or" the bits.
///
/// We use commutative, associative, and distributive laws of arithmetic multiplication
/// and addition to reduce the expression.
33

How we got to this is still not obvious. Add at least one intermediate line between these two.
IIUC, we are evaluating "*" before "+" regardless of the parentheses?

86

It took me a long time to realize that "multiplication" in this expression is really just "bitwise-or". This could use a comment to make it clearer.

llvm/lib/Analysis/ComplexLogicCombine.cpp
9 ↗(On Diff #498266)

help to find -> attempts to find

20 ↗(On Diff #498266)

Use quotes around "and" / "xor" / "or" if we are referring to a logic operation.

22 ↗(On Diff #498266)

by a unsigned set -> as a chain of bitsets ?

25–26 ↗(On Diff #498266)

Delete the last line - that's an implementation detail that could change in the future.

bcl5980 updated this revision to Diff 498707.Feb 19 2023, 6:12 PM
bcl5980 edited the summary of this revision. (Show Details)

I was curious if enabling this for all logic ops would cause any compile-time regressions, but it seems like it has almost no cost:
https://llvm-compile-time-tracker.com/compare.php?from=0e90cd7551f2d0b151f7406e8f3848ec54e650bf&to=ae505cb2a674ac4c240c94a74fc04ee274321697&stat=instructions:u

Based on the result, can I just enable it for all logical operations by default and remove the option?

bcl5980 updated this revision to Diff 498754.Feb 20 2023, 1:09 AM

some renaming.

I was curious if enabling this for all logic ops would cause any compile-time regressions, but it seems like it has almost no cost:
https://llvm-compile-time-tracker.com/compare.php?from=0e90cd7551f2d0b151f7406e8f3848ec54e650bf&to=ae505cb2a674ac4c240c94a74fc04ee274321697&stat=instructions:u

Based on the result, can I just enable it for all logical operations by default and remove the option?

Yes, let's make the transform more general (and the patch becomes simpler if we remove the option).

spatel added inline comments.Feb 20 2023, 7:43 AM
llvm/include/llvm/Analysis/LogicalExpr.h
36

The "*" operation with -1 is also a special-case, so we should mention it in the text above here.
If I understand the code, if we have "a * -1", then the result is always "a".

37–39

Spelling: Caculate -> Calculate
Spelling: addiction -> addition

96

When we get here, we know that both LHSMask and RHSMask are not equal to "0".
Can we also assert that if a mask has ExprAllOne set, then no other bit in the mask is set?
So would it be clearer to move this check up and write this as:

// 1 & a -> a
// a & 1 -> a
if (LHSMask == ExprAllOne)
  NewMask = RHSMask;
else if (RHSMask == ExprAllOne)
  NewMask = LHSMask;
bcl5980 updated this revision to Diff 498982.Feb 20 2023, 5:29 PM

This seems close to ready to me. It would be great if other reviewers have a look too. :)

llvm/lib/Analysis/ComplexLogicCombine.cpp
1 ↗(On Diff #498982)

I think we should remove "Complex" from the name. This can handle all LogicOp combining now (and eventually, it could be part of regular InstCombine).

21 ↗(On Diff #498982)

We -> we

25 ↗(On Diff #498982)

Final -> Finally,

45 ↗(On Diff #498982)

If you agree with the earlier comment about removing "Complex" from the name, then change the "clc" also. It could just be "logic-combine-max-leafs" or something like that.

llvm/test/Transforms/AggressiveInstCombine/complex-logic.ll
12 ↗(On Diff #498982)

We should have more than i1 types in these tests if we are handling all bitwise logic now.

spatel added inline comments.Feb 21 2023, 1:21 PM
llvm/lib/Analysis/ComplexLogicCombine.cpp
65 ↗(On Diff #498982)

Should not need "llvm::" specifier here?

66 ↗(On Diff #498982)

The "LeafCnt == 0" is redundant. Move the "LeafBits == 0" check above the popcount for efficiency.

76 ↗(On Diff #498982)

Shouldn't need "llvm::" ?

135–136 ↗(On Diff #498982)

!BO->isBitwiseLogicOp()

180 ↗(On Diff #498982)

earsed -> erased

196 ↗(On Diff #498982)

What does this TODO mean? Either describe the planned follow-up with more details or remove this line.

203 ↗(On Diff #498982)

"can't be larger"

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
830–831

I.isBitwiseLogicOp()

869–870

What does the "for now" mean? Will it change in the future? What limitation does this imply?

bcl5980 added inline comments.Feb 21 2023, 6:35 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
869–870

There are 2 limitations to be trade off. here The higher level the LogicalOpsHelper create, the more logical node cached, which means it can save more cpu timing. But it will maintain more leaf nodes. By default the max of leaf node is 8 , which is not enough for whole function I guess.
So I write the comment here to mention me we can do something here later. Like split the helper based on types to make the code more efficient, adjust the default value of max leaf node number, use APInt to support more bits.

The most headache thing for me is test the cpu overhead. @nikic , can you add my github fork to the llvm-compile-time-tracker.com? This serial patches need a lot of CPU overhead tests I think.

bcl5980 added inline comments.Feb 21 2023, 6:38 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
869–870
bcl5980 updated this revision to Diff 499357.Feb 21 2023, 7:25 PM

Address comments.

bcl5980 updated this revision to Diff 499398.Feb 22 2023, 12:32 AM
bcl5980 retitled this revision from [ComplexLogicCombine 1/?] Implement a general way to simplify complex logical operations. to [LogicCombine 1/?] Implement a general way to simplify logical operations..Feb 22 2023, 1:00 AM

Please rename the test file as a preliminary step, so we will again show diffs in this patch. We also need to add negative tests to show current limitations and also that the combining is not making wrong logic reductions.

llvm/include/llvm/Analysis/LogicCombine.h
43

This isn't really a "helper" - this is the main part of the code. Can we name this "LogicCombiner"?

54–55

I don't think we need "LeafSet". If you make "LeafValues" a SmallSetVector, we won't insert duplicate values into the vector. Is that the only job of the LeafSet?

llvm/lib/Analysis/LogicCombine.cpp
134

This comment isn't necessary - I think the code is clear enough now.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
828

/// Reduce bitwise logic sequences.

869–875

Adjust wording (make sure I understand it correctly):

// TODO: Combining at the function-level would allow more 
// caching of nodes which saves on compile-time, but it may hit the
// max depth or value limits before finding a solution. We could split
// the helper based on types to make the code more efficient, adjust 
// the value of max depth/values, or use APInt to support tracking 
// more than 63 leaf values.

But I doubt that we have a real problem here for the vast majority of programs? We are tracking up to 8 different leaf values with a depth of 8 logic instructions. Maybe add (currently) negative tests with those large number of instructions to show what is needed to hit the limits?

885

Change name: foldComplexLogic -> foldBitwiseLogic

This call should be moved before "foldSqrt()" as the code comment suggests.

bcl5980 added inline comments.Feb 22 2023, 5:09 PM
llvm/include/llvm/Analysis/LogicCombine.h
54–55

I need LeafValues to access value by index. It looks SmallSetVector can't do that.

bcl5980 updated this revision to Diff 499702.Feb 22 2023, 6:16 PM

Address comments.

I need LeafValues to access value by index. It looks SmallSetVector can't do that.

SmallSetVector allows indexing. This is the patch I tried after applying this patch (I had to fix line endings first), and no tests fail:

diff --git a/llvm/include/llvm/Analysis/LogicCombine.h b/llvm/include/llvm/Analysis/LogicCombine.h
index 3fdcf7998321..56a3d8f36b16 100644
--- a/llvm/include/llvm/Analysis/LogicCombine.h
+++ b/llvm/include/llvm/Analysis/LogicCombine.h
@@ -8,8 +8,7 @@

 #include "LogicalExpr.h"
 #include "llvm/ADT/DenseMap.h"
-#include "llvm/ADT/SmallPtrSet.h"
-#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
@@ -50,8 +49,7 @@ private:
   friend class LogicalOpNode;

   SmallDenseMap<Value *, LogicalOpNode *, 16> LogicalOpNodes;
-  SmallPtrSet<Value *, 8> LeafSet;
-  SmallVector<Value *, 8> LeafValues;
+  SmallSetVector<Value *, 8> LeafValues;

   void clear();

diff --git a/llvm/lib/Analysis/LogicCombine.cpp b/llvm/lib/Analysis/LogicCombine.cpp
index 28d9488cab96..3b410cdacd32 100644
--- a/llvm/lib/Analysis/LogicCombine.cpp
+++ b/llvm/lib/Analysis/LogicCombine.cpp
@@ -101,17 +101,16 @@ void LogicCombiner::clear() {
   for (auto node : LogicalOpNodes)
     delete node.second;
   LogicalOpNodes.clear();
-  LeafSet.clear();
   LeafValues.clear();
 }

 LogicalOpNode *LogicCombiner::visitLeafNode(Value *Val, unsigned Depth) {
   // Depth is 0 means the root is not logical operation. We can't
   // do anything for that.
-  if (Depth == 0 || LeafSet.size() >= MaxLogicOpLeafsToScan)
+  if (Depth == 0 || LeafValues.size() >= MaxLogicOpLeafsToScan)
     return nullptr;

-  uint64_t ExprVal = 1ULL << LeafSet.size();
+  uint64_t ExprVal = 1ULL << LeafValues.size();
   // Constant Zero,AllOne are special leaf nodes. They involve
   // LogicalExpr's calculation so we must detect them at first.
   if (auto ConstVal = dyn_cast<ConstantInt>(Val)) {
@@ -120,9 +119,8 @@ LogicalOpNode *LogicCombiner::visitLeafNode(Value *Val, unsigned Depth) {
     else if (ConstVal->isAllOnesValue())
       ExprVal = LogicalExpr::ExprAllOne;
   }
-  if (ExprVal != LogicalExpr::ExprAllOne && ExprVal != 0 &&
-      LeafSet.insert(Val).second)
-    LeafValues.push_back(Val);
+  if (ExprVal != LogicalExpr::ExprAllOne && ExprVal != 0)
+    LeafValues.insert(Val);
   LogicalOpNode *Node = new LogicalOpNode(this, Val, LogicalExpr(ExprVal));
   LogicalOpNodes[Val] = Node;
   return Node;
bcl5980 updated this revision to Diff 499834.Feb 23 2023, 6:31 AM

Remove leafset.

I need LeafValues to access value by index. It looks SmallSetVector can't do that.

SmallSetVector allows indexing. This is the patch I tried after applying this patch (I had to fix line endings first), and no tests fail:

diff --git a/llvm/include/llvm/Analysis/LogicCombine.h b/llvm/include/llvm/Analysis/LogicCombine.h
index 3fdcf7998321..56a3d8f36b16 100644
--- a/llvm/include/llvm/Analysis/LogicCombine.h
+++ b/llvm/include/llvm/Analysis/LogicCombine.h
@@ -8,8 +8,7 @@

 #include "LogicalExpr.h"
 #include "llvm/ADT/DenseMap.h"
-#include "llvm/ADT/SmallPtrSet.h"
-#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
@@ -50,8 +49,7 @@ private:
   friend class LogicalOpNode;

   SmallDenseMap<Value *, LogicalOpNode *, 16> LogicalOpNodes;
-  SmallPtrSet<Value *, 8> LeafSet;
-  SmallVector<Value *, 8> LeafValues;
+  SmallSetVector<Value *, 8> LeafValues;

   void clear();

diff --git a/llvm/lib/Analysis/LogicCombine.cpp b/llvm/lib/Analysis/LogicCombine.cpp
index 28d9488cab96..3b410cdacd32 100644
--- a/llvm/lib/Analysis/LogicCombine.cpp
+++ b/llvm/lib/Analysis/LogicCombine.cpp
@@ -101,17 +101,16 @@ void LogicCombiner::clear() {
   for (auto node : LogicalOpNodes)
     delete node.second;
   LogicalOpNodes.clear();
-  LeafSet.clear();
   LeafValues.clear();
 }

 LogicalOpNode *LogicCombiner::visitLeafNode(Value *Val, unsigned Depth) {
   // Depth is 0 means the root is not logical operation. We can't
   // do anything for that.
-  if (Depth == 0 || LeafSet.size() >= MaxLogicOpLeafsToScan)
+  if (Depth == 0 || LeafValues.size() >= MaxLogicOpLeafsToScan)
     return nullptr;

-  uint64_t ExprVal = 1ULL << LeafSet.size();
+  uint64_t ExprVal = 1ULL << LeafValues.size();
   // Constant Zero,AllOne are special leaf nodes. They involve
   // LogicalExpr's calculation so we must detect them at first.
   if (auto ConstVal = dyn_cast<ConstantInt>(Val)) {
@@ -120,9 +119,8 @@ LogicalOpNode *LogicCombiner::visitLeafNode(Value *Val, unsigned Depth) {
     else if (ConstVal->isAllOnesValue())
       ExprVal = LogicalExpr::ExprAllOne;
   }
-  if (ExprVal != LogicalExpr::ExprAllOne && ExprVal != 0 &&
-      LeafSet.insert(Val).second)
-    LeafValues.push_back(Val);
+  if (ExprVal != LogicalExpr::ExprAllOne && ExprVal != 0)
+    LeafValues.insert(Val);
   LogicalOpNode *Node = new LogicalOpNode(this, Val, LogicalExpr(ExprVal));
   LogicalOpNodes[Val] = Node;
   return Node;

Thanks, I have already update it.

spatel accepted this revision.Feb 23 2023, 7:08 AM

LGTM

llvm/include/llvm/Analysis/LogicalExpr.h
95

spelling: "special"

This revision is now accepted and ready to land.Feb 23 2023, 7:08 AM
nikic added inline comments.Feb 23 2023, 9:11 AM
llvm/include/llvm/Analysis/LogicCombine.h
23

It's weird that LogicalOpNode has a reference back to LogicCombiner. Is this just for printing? Would it be better to pass it to the print method?

35

Why the explicit empty dtor?

llvm/include/llvm/Analysis/LogicalExpr.h
11

bitsets?

28

An example

57

Am I missing something, or is LeafMask never actually used?

59

Unnecessary inline?

87

Hm, can these actually occur? It looks like they should be excluded by a ^ a canonicalization.

95

special

121

insert returns an iterator you can pass to erase.

llvm/lib/Analysis/LogicCombine.cpp
17
18

It would be more helpful to write them in binary form here, e.g. 111.

52

I think you might be looking for Value->printAsOperand() here?

78

You can use ListSeparator and avoid the need to split out the last case.

94

Same here, ListSeparator.

103

Should these be using unique_ptr instead?

Or possibly SpecificBumpPtrAllocator?

nikic added a comment.Feb 23 2023, 1:24 PM

I'm concerned about the caching here. It looks like you reuse one LogicCombiner instance for a basic block. However, isn't it possible for some of the instructions that have been inserted into LogicalOpNodes to be deleted, in which case the map may contain dangling pointers. If the pointer is reused by a newly allocated instruction, the cached information will be incorrect.

bcl5980 updated this revision to Diff 500020.Feb 23 2023, 5:50 PM

Address comments.

llvm/include/llvm/Analysis/LogicCombine.h
23

I call print in operator "<<" . I can't pass the LogicCombiner into the override operator "<<". Do you have any ideal for that?

llvm/include/llvm/Analysis/LogicalExpr.h
57

The initial patch hasn't use LeafMask. The following up change D143155 will use that.

87

For now it should happen in the case with constant 0 like: %and = and i8 %a, 0.
Constant 0 can be represented by empty set or one element with 0 value. If I canonicalize one element with 0 value to empty set then it won't happen.

I'm concerned about the caching here. It looks like you reuse one LogicCombiner instance for a basic block. However, isn't it possible for some of the instructions that have been inserted into LogicalOpNodes to be deleted, in which case the map may contain dangling pointers. If the pointer is reused by a newly allocated instruction, the cached information will be incorrect.

The main reason for caching is saving compile time. The new patch will remove all the instructions already inserted into the caches and I think functional it works now.
@nikic @spatel if possible can we use the llvm-compile-time-track to test how much compile time increase if we enable the LogicCombiner for every single instruction?

bcl5980 updated this revision to Diff 500031.Feb 23 2023, 6:30 PM

use SpecificBumpPtrAllocator

bcl5980 updated this revision to Diff 500042.Feb 23 2023, 7:00 PM

minor bug fix.

bcl5980 updated this revision to Diff 500044.Feb 23 2023, 7:05 PM

code clean

spatel added inline comments.Feb 24 2023, 6:23 AM
llvm/include/llvm/Analysis/LogicalExpr.h
57

I did not notice that LeafMask is unused. It would be better to remove it from this patch.

If you only add it when it becomes necessary, then we can accurately measure the cost of each enhancement. It also makes the review easier because we can see exactly where/how the new features impact the original code.

I'm concerned about the caching here. It looks like you reuse one LogicCombiner instance for a basic block. However, isn't it possible for some of the instructions that have been inserted into LogicalOpNodes to be deleted, in which case the map may contain dangling pointers. If the pointer is reused by a newly allocated instruction, the cached information will be incorrect.

The main reason for caching is saving compile time. The new patch will remove all the instructions already inserted into the caches and I think functional it works now.
@nikic @spatel if possible can we use the llvm-compile-time-track to test how much compile time increase if we enable the LogicCombiner for every single instruction?

If this -- https://github.com/llvm/llvm-project/commit/efcf6c2b1f8490a9258d40abb90b21da60a15919 -- is the experiment that you wanted to try, it seems to have no significant difference:
https://llvm-compile-time-tracker.com/compare.php?from=3592d05438acc1034905feff7ff555f4fd4c5774&to=efcf6c2b1f8490a9258d40abb90b21da60a15919&stat=instructions:u

nikic added inline comments.Feb 24 2023, 1:13 PM
llvm/include/llvm/Analysis/LogicCombine.h
23

Right, this would not work with operator<<, one would have to call the print method. If << is used a lot, then this makes sense.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
869–870

I've added your fork now. Sorry, I didn't notice this request before.

bcl5980 updated this revision to Diff 500657.Feb 26 2023, 8:50 PM

remove leafmask

bcl5980 added a comment.EditedFeb 26 2023, 8:56 PM

I'm concerned about the caching here. It looks like you reuse one LogicCombiner instance for a basic block. However, isn't it possible for some of the instructions that have been inserted into LogicalOpNodes to be deleted, in which case the map may contain dangling pointers. If the pointer is reused by a newly allocated instruction, the cached information will be incorrect.

The main reason for caching is saving compile time. The new patch will remove all the instructions already inserted into the caches and I think functional it works now.
@nikic @spatel if possible can we use the llvm-compile-time-track to test how much compile time increase if we enable the LogicCombiner for every single instruction?

If this -- https://github.com/llvm/llvm-project/commit/efcf6c2b1f8490a9258d40abb90b21da60a15919 -- is the experiment that you wanted to try, it seems to have no significant difference:
https://llvm-compile-time-tracker.com/compare.php?from=3592d05438acc1034905feff7ff555f4fd4c5774&to=efcf6c2b1f8490a9258d40abb90b21da60a15919&stat=instructions:u

For now I send 5 patches for logic combine(D143046, D143155, D144373, D144077, D144373). After more patches involved the compile time also increased.
This is the result I enable LogicCombiner for every instruction after all my patches send to review:
http://llvm-compile-time-tracker.com/compare.php?from=4dd4eb939caef1138c655e22bb4adc8978f16427&to=8556a41a4ad4e8cf48bc316c9b5692b0de8e3d39&stat=instructions%3Au

And later I still need to send more patches to figure out some headache things like avoid undef unsafe pattern, restore "|" from the logical expression. So for now I still prefer to cache the result on basic block level.

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

AggressiveInstCombine runs only with -O3, no?

Removing instcombine folds would regress performance with lower levels. This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

AggressiveInstCombine runs only with -O3, no?

Removing instcombine folds would regress performance with lower levels. This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

Ah yes you're right. Might be worth considering running it under -O2 as well?

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

AggressiveInstCombine runs only with -O3, no?

Removing instcombine folds would regress performance with lower levels. This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

Ah yes you're right. Might be worth considering running it under -O2 as well?

Yeah! I believe that last time even the impact on compile times was quite low.

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

This patch can't replace the normal patterns in instcombine. Because this patch works bad between logical node itself. For example,

define i8 @test(i8 %a, i8 %b, i8 %c, i8 %d) {
   %oab = or i8 %a, %b
   %ocd = or i8 %c, %d
   %and = and i8 %oab, %ocd
   %r = and i8 %and, %and
   ret i8 %r
}

Obviously the %r is the same to %and. But both of them in the logical node is a complicated set of bitsets. For now it's not easy to find %and is the same to %r.

any other coments for the initial patch? @nikic

This revision was landed with ongoing or failed builds.Mar 2 2023, 4:46 AM
This revision was automatically updated to reflect the committed changes.

This breaks module builds (i.e. LLVM_ENABLE_MODULES=ON):

/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicalExpr.h:53:7: error: redefinition of 'LogicalExpr'
class LogicalExpr {
      ^
/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicCombine.h:9:10: note: '/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicalExpr.h' included multiple times, additional include site in header from module 'LLVM_Analysis.LogicalExpr'
#include "LogicalExpr.h"
         ^
<module-includes>:67:10: note: '/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicalExpr.h' included multiple times, additional include site in header from module 'LLVM_Analysis.LogicalExpr'
#include "Analysis/LogicalExpr.h"
         ^
/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicalExpr.h:53:7: note: unguarded header; consider using #ifdef guards or #pragma once
class LogicalExpr {

I've added header guards to llvm/include/llvm/Analysis/LogicalExpr.h and llvm/include/llvm/LogicCombine.h in ff65a586677eb127ea70ca84b91204c0b9940b00.

any other coments for the initial patch? @nikic

I think it would be better to have approvals for all patches and commit this all at once. Otherwise we might end up with nontrivial unfinished code.

And my question was not answered before commit.

This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

any other coments for the initial patch? @nikic

I think it would be better to have approvals for all patches and commit this all at once. Otherwise we might end up with nontrivial unfinished code.

reverted by 76df706bca14affdcf0dd91561c8e6805035608f

And my question was not answered before commit.

This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

patch until D143155, there are 7 patterns triggered by SPEC 2017. No pattern trigged from llvm test suite. Haven't try bootstrap build yet.

If no one have any further comments I will abandon this serial patches. It looks it can't improve too much.