Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[LogicCombine 1/?] Implement a general way to simplify logical operations.
ClosedPublic

Authored by bcl5980 on Jan 28 2023, 2:15 AM.

Details

Summary

This patch involves boolean ring to simplify logical operations. We can treat & as ring multiplication and ^ as ring addition.
So we need to canonicalize all other operations to * +. Like:

a & b -> a * b
a ^ b -> a + b
~a -> a + 1
a | b -> a * b + a + b
c ? a : b -> c * a + (c + 1) * b

In the code, we use a mask set to represent an expression. Every value that is not comes from logical operations could be a bit in the mask.
The mask itself is a multiplication chain. The mask set is an addiction chain.
We can calculate two expressions based on boolean algebras.

For now, the initial patch only enabled on and/or/xor, Later we can enhance the code step by step.

Reference: https://en.wikipedia.org/wiki/Boolean_ring

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
spatel added inline comments.Feb 21 2023, 1:21 PM
llvm/lib/Analysis/ComplexLogicCombine.cpp
65 ↗(On Diff #498982)

Should not need "llvm::" specifier here?

66 ↗(On Diff #498982)

The "LeafCnt == 0" is redundant. Move the "LeafBits == 0" check above the popcount for efficiency.

76 ↗(On Diff #498982)

Shouldn't need "llvm::" ?

135–136 ↗(On Diff #498982)

!BO->isBitwiseLogicOp()

180 ↗(On Diff #498982)

earsed -> erased

196 ↗(On Diff #498982)

What does this TODO mean? Either describe the planned follow-up with more details or remove this line.

203 ↗(On Diff #498982)

"can't be larger"

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
830–831

I.isBitwiseLogicOp()

869–870

What does the "for now" mean? Will it change in the future? What limitation does this imply?

bcl5980 added inline comments.Feb 21 2023, 6:35 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
869–870

There are 2 limitations to be trade off. here The higher level the LogicalOpsHelper create, the more logical node cached, which means it can save more cpu timing. But it will maintain more leaf nodes. By default the max of leaf node is 8 , which is not enough for whole function I guess.
So I write the comment here to mention me we can do something here later. Like split the helper based on types to make the code more efficient, adjust the default value of max leaf node number, use APInt to support more bits.

The most headache thing for me is test the cpu overhead. @nikic , can you add my github fork to the llvm-compile-time-tracker.com? This serial patches need a lot of CPU overhead tests I think.

bcl5980 added inline comments.Feb 21 2023, 6:38 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
869–870
bcl5980 updated this revision to Diff 499357.Feb 21 2023, 7:25 PM

Address comments.

bcl5980 updated this revision to Diff 499398.Feb 22 2023, 12:32 AM
bcl5980 retitled this revision from [ComplexLogicCombine 1/?] Implement a general way to simplify complex logical operations. to [LogicCombine 1/?] Implement a general way to simplify logical operations..Feb 22 2023, 1:00 AM

Please rename the test file as a preliminary step, so we will again show diffs in this patch. We also need to add negative tests to show current limitations and also that the combining is not making wrong logic reductions.

llvm/include/llvm/Analysis/LogicCombine.h
43

This isn't really a "helper" - this is the main part of the code. Can we name this "LogicCombiner"?

54–55

I don't think we need "LeafSet". If you make "LeafValues" a SmallSetVector, we won't insert duplicate values into the vector. Is that the only job of the LeafSet?

llvm/lib/Analysis/LogicCombine.cpp
134

This comment isn't necessary - I think the code is clear enough now.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
828

/// Reduce bitwise logic sequences.

869–875

Adjust wording (make sure I understand it correctly):

// TODO: Combining at the function-level would allow more 
// caching of nodes which saves on compile-time, but it may hit the
// max depth or value limits before finding a solution. We could split
// the helper based on types to make the code more efficient, adjust 
// the value of max depth/values, or use APInt to support tracking 
// more than 63 leaf values.

But I doubt that we have a real problem here for the vast majority of programs? We are tracking up to 8 different leaf values with a depth of 8 logic instructions. Maybe add (currently) negative tests with those large number of instructions to show what is needed to hit the limits?

885

Change name: foldComplexLogic -> foldBitwiseLogic

This call should be moved before "foldSqrt()" as the code comment suggests.

bcl5980 added inline comments.Feb 22 2023, 5:09 PM
llvm/include/llvm/Analysis/LogicCombine.h
54–55

I need LeafValues to access value by index. It looks SmallSetVector can't do that.

bcl5980 updated this revision to Diff 499702.Feb 22 2023, 6:16 PM

Address comments.

I need LeafValues to access value by index. It looks SmallSetVector can't do that.

SmallSetVector allows indexing. This is the patch I tried after applying this patch (I had to fix line endings first), and no tests fail:

diff --git a/llvm/include/llvm/Analysis/LogicCombine.h b/llvm/include/llvm/Analysis/LogicCombine.h
index 3fdcf7998321..56a3d8f36b16 100644
--- a/llvm/include/llvm/Analysis/LogicCombine.h
+++ b/llvm/include/llvm/Analysis/LogicCombine.h
@@ -8,8 +8,7 @@

 #include "LogicalExpr.h"
 #include "llvm/ADT/DenseMap.h"
-#include "llvm/ADT/SmallPtrSet.h"
-#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
@@ -50,8 +49,7 @@ private:
   friend class LogicalOpNode;

   SmallDenseMap<Value *, LogicalOpNode *, 16> LogicalOpNodes;
-  SmallPtrSet<Value *, 8> LeafSet;
-  SmallVector<Value *, 8> LeafValues;
+  SmallSetVector<Value *, 8> LeafValues;

   void clear();

diff --git a/llvm/lib/Analysis/LogicCombine.cpp b/llvm/lib/Analysis/LogicCombine.cpp
index 28d9488cab96..3b410cdacd32 100644
--- a/llvm/lib/Analysis/LogicCombine.cpp
+++ b/llvm/lib/Analysis/LogicCombine.cpp
@@ -101,17 +101,16 @@ void LogicCombiner::clear() {
   for (auto node : LogicalOpNodes)
     delete node.second;
   LogicalOpNodes.clear();
-  LeafSet.clear();
   LeafValues.clear();
 }

 LogicalOpNode *LogicCombiner::visitLeafNode(Value *Val, unsigned Depth) {
   // Depth is 0 means the root is not logical operation. We can't
   // do anything for that.
-  if (Depth == 0 || LeafSet.size() >= MaxLogicOpLeafsToScan)
+  if (Depth == 0 || LeafValues.size() >= MaxLogicOpLeafsToScan)
     return nullptr;

-  uint64_t ExprVal = 1ULL << LeafSet.size();
+  uint64_t ExprVal = 1ULL << LeafValues.size();
   // Constant Zero,AllOne are special leaf nodes. They involve
   // LogicalExpr's calculation so we must detect them at first.
   if (auto ConstVal = dyn_cast<ConstantInt>(Val)) {
@@ -120,9 +119,8 @@ LogicalOpNode *LogicCombiner::visitLeafNode(Value *Val, unsigned Depth) {
     else if (ConstVal->isAllOnesValue())
       ExprVal = LogicalExpr::ExprAllOne;
   }
-  if (ExprVal != LogicalExpr::ExprAllOne && ExprVal != 0 &&
-      LeafSet.insert(Val).second)
-    LeafValues.push_back(Val);
+  if (ExprVal != LogicalExpr::ExprAllOne && ExprVal != 0)
+    LeafValues.insert(Val);
   LogicalOpNode *Node = new LogicalOpNode(this, Val, LogicalExpr(ExprVal));
   LogicalOpNodes[Val] = Node;
   return Node;
bcl5980 updated this revision to Diff 499834.Feb 23 2023, 6:31 AM

Remove leafset.

I need LeafValues to access value by index. It looks SmallSetVector can't do that.

SmallSetVector allows indexing. This is the patch I tried after applying this patch (I had to fix line endings first), and no tests fail:

diff --git a/llvm/include/llvm/Analysis/LogicCombine.h b/llvm/include/llvm/Analysis/LogicCombine.h
index 3fdcf7998321..56a3d8f36b16 100644
--- a/llvm/include/llvm/Analysis/LogicCombine.h
+++ b/llvm/include/llvm/Analysis/LogicCombine.h
@@ -8,8 +8,7 @@

 #include "LogicalExpr.h"
 #include "llvm/ADT/DenseMap.h"
-#include "llvm/ADT/SmallPtrSet.h"
-#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
@@ -50,8 +49,7 @@ private:
   friend class LogicalOpNode;

   SmallDenseMap<Value *, LogicalOpNode *, 16> LogicalOpNodes;
-  SmallPtrSet<Value *, 8> LeafSet;
-  SmallVector<Value *, 8> LeafValues;
+  SmallSetVector<Value *, 8> LeafValues;

   void clear();

diff --git a/llvm/lib/Analysis/LogicCombine.cpp b/llvm/lib/Analysis/LogicCombine.cpp
index 28d9488cab96..3b410cdacd32 100644
--- a/llvm/lib/Analysis/LogicCombine.cpp
+++ b/llvm/lib/Analysis/LogicCombine.cpp
@@ -101,17 +101,16 @@ void LogicCombiner::clear() {
   for (auto node : LogicalOpNodes)
     delete node.second;
   LogicalOpNodes.clear();
-  LeafSet.clear();
   LeafValues.clear();
 }

 LogicalOpNode *LogicCombiner::visitLeafNode(Value *Val, unsigned Depth) {
   // Depth is 0 means the root is not logical operation. We can't
   // do anything for that.
-  if (Depth == 0 || LeafSet.size() >= MaxLogicOpLeafsToScan)
+  if (Depth == 0 || LeafValues.size() >= MaxLogicOpLeafsToScan)
     return nullptr;

-  uint64_t ExprVal = 1ULL << LeafSet.size();
+  uint64_t ExprVal = 1ULL << LeafValues.size();
   // Constant Zero,AllOne are special leaf nodes. They involve
   // LogicalExpr's calculation so we must detect them at first.
   if (auto ConstVal = dyn_cast<ConstantInt>(Val)) {
@@ -120,9 +119,8 @@ LogicalOpNode *LogicCombiner::visitLeafNode(Value *Val, unsigned Depth) {
     else if (ConstVal->isAllOnesValue())
       ExprVal = LogicalExpr::ExprAllOne;
   }
-  if (ExprVal != LogicalExpr::ExprAllOne && ExprVal != 0 &&
-      LeafSet.insert(Val).second)
-    LeafValues.push_back(Val);
+  if (ExprVal != LogicalExpr::ExprAllOne && ExprVal != 0)
+    LeafValues.insert(Val);
   LogicalOpNode *Node = new LogicalOpNode(this, Val, LogicalExpr(ExprVal));
   LogicalOpNodes[Val] = Node;
   return Node;

Thanks, I have already update it.

spatel accepted this revision.Feb 23 2023, 7:08 AM

LGTM

llvm/include/llvm/Analysis/LogicalExpr.h
95

spelling: "special"

This revision is now accepted and ready to land.Feb 23 2023, 7:08 AM
nikic added inline comments.Feb 23 2023, 9:11 AM
llvm/include/llvm/Analysis/LogicCombine.h
23

It's weird that LogicalOpNode has a reference back to LogicCombiner. Is this just for printing? Would it be better to pass it to the print method?

35

Why the explicit empty dtor?

llvm/include/llvm/Analysis/LogicalExpr.h
11

bitsets?

28

An example

57

Am I missing something, or is LeafMask never actually used?

59

Unnecessary inline?

87

Hm, can these actually occur? It looks like they should be excluded by a ^ a canonicalization.

95

special

121

insert returns an iterator you can pass to erase.

llvm/lib/Analysis/LogicCombine.cpp
17
18

It would be more helpful to write them in binary form here, e.g. 111.

52

I think you might be looking for Value->printAsOperand() here?

78

You can use ListSeparator and avoid the need to split out the last case.

94

Same here, ListSeparator.

103

Should these be using unique_ptr instead?

Or possibly SpecificBumpPtrAllocator?

nikic added a comment.Feb 23 2023, 1:24 PM

I'm concerned about the caching here. It looks like you reuse one LogicCombiner instance for a basic block. However, isn't it possible for some of the instructions that have been inserted into LogicalOpNodes to be deleted, in which case the map may contain dangling pointers. If the pointer is reused by a newly allocated instruction, the cached information will be incorrect.

bcl5980 updated this revision to Diff 500020.Feb 23 2023, 5:50 PM

Address comments.

llvm/include/llvm/Analysis/LogicCombine.h
23

I call print in operator "<<" . I can't pass the LogicCombiner into the override operator "<<". Do you have any ideal for that?

llvm/include/llvm/Analysis/LogicalExpr.h
57

The initial patch hasn't use LeafMask. The following up change D143155 will use that.

87

For now it should happen in the case with constant 0 like: %and = and i8 %a, 0.
Constant 0 can be represented by empty set or one element with 0 value. If I canonicalize one element with 0 value to empty set then it won't happen.

I'm concerned about the caching here. It looks like you reuse one LogicCombiner instance for a basic block. However, isn't it possible for some of the instructions that have been inserted into LogicalOpNodes to be deleted, in which case the map may contain dangling pointers. If the pointer is reused by a newly allocated instruction, the cached information will be incorrect.

The main reason for caching is saving compile time. The new patch will remove all the instructions already inserted into the caches and I think functional it works now.
@nikic @spatel if possible can we use the llvm-compile-time-track to test how much compile time increase if we enable the LogicCombiner for every single instruction?

bcl5980 updated this revision to Diff 500031.Feb 23 2023, 6:30 PM

use SpecificBumpPtrAllocator

bcl5980 updated this revision to Diff 500042.Feb 23 2023, 7:00 PM

minor bug fix.

bcl5980 updated this revision to Diff 500044.Feb 23 2023, 7:05 PM

code clean

spatel added inline comments.Feb 24 2023, 6:23 AM
llvm/include/llvm/Analysis/LogicalExpr.h
57

I did not notice that LeafMask is unused. It would be better to remove it from this patch.

If you only add it when it becomes necessary, then we can accurately measure the cost of each enhancement. It also makes the review easier because we can see exactly where/how the new features impact the original code.

I'm concerned about the caching here. It looks like you reuse one LogicCombiner instance for a basic block. However, isn't it possible for some of the instructions that have been inserted into LogicalOpNodes to be deleted, in which case the map may contain dangling pointers. If the pointer is reused by a newly allocated instruction, the cached information will be incorrect.

The main reason for caching is saving compile time. The new patch will remove all the instructions already inserted into the caches and I think functional it works now.
@nikic @spatel if possible can we use the llvm-compile-time-track to test how much compile time increase if we enable the LogicCombiner for every single instruction?

If this -- https://github.com/llvm/llvm-project/commit/efcf6c2b1f8490a9258d40abb90b21da60a15919 -- is the experiment that you wanted to try, it seems to have no significant difference:
https://llvm-compile-time-tracker.com/compare.php?from=3592d05438acc1034905feff7ff555f4fd4c5774&to=efcf6c2b1f8490a9258d40abb90b21da60a15919&stat=instructions:u

nikic added inline comments.Feb 24 2023, 1:13 PM
llvm/include/llvm/Analysis/LogicCombine.h
23

Right, this would not work with operator<<, one would have to call the print method. If << is used a lot, then this makes sense.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
869–870

I've added your fork now. Sorry, I didn't notice this request before.

bcl5980 updated this revision to Diff 500657.Feb 26 2023, 8:50 PM

remove leafmask

bcl5980 added a comment.EditedFeb 26 2023, 8:56 PM

I'm concerned about the caching here. It looks like you reuse one LogicCombiner instance for a basic block. However, isn't it possible for some of the instructions that have been inserted into LogicalOpNodes to be deleted, in which case the map may contain dangling pointers. If the pointer is reused by a newly allocated instruction, the cached information will be incorrect.

The main reason for caching is saving compile time. The new patch will remove all the instructions already inserted into the caches and I think functional it works now.
@nikic @spatel if possible can we use the llvm-compile-time-track to test how much compile time increase if we enable the LogicCombiner for every single instruction?

If this -- https://github.com/llvm/llvm-project/commit/efcf6c2b1f8490a9258d40abb90b21da60a15919 -- is the experiment that you wanted to try, it seems to have no significant difference:
https://llvm-compile-time-tracker.com/compare.php?from=3592d05438acc1034905feff7ff555f4fd4c5774&to=efcf6c2b1f8490a9258d40abb90b21da60a15919&stat=instructions:u

For now I send 5 patches for logic combine(D143046, D143155, D144373, D144077, D144373). After more patches involved the compile time also increased.
This is the result I enable LogicCombiner for every instruction after all my patches send to review:
http://llvm-compile-time-tracker.com/compare.php?from=4dd4eb939caef1138c655e22bb4adc8978f16427&to=8556a41a4ad4e8cf48bc316c9b5692b0de8e3d39&stat=instructions%3Au

And later I still need to send more patches to figure out some headache things like avoid undef unsafe pattern, restore "|" from the logical expression. So for now I still prefer to cache the result on basic block level.

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

AggressiveInstCombine runs only with -O3, no?

Removing instcombine folds would regress performance with lower levels. This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

AggressiveInstCombine runs only with -O3, no?

Removing instcombine folds would regress performance with lower levels. This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

Ah yes you're right. Might be worth considering running it under -O2 as well?

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

AggressiveInstCombine runs only with -O3, no?

Removing instcombine folds would regress performance with lower levels. This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

Ah yes you're right. Might be worth considering running it under -O2 as well?

Yeah! I believe that last time even the impact on compile times was quite low.

with this patch, would it possible to remove some similar optimizations from instcombine, those being subsumed by this?

This patch can't replace the normal patterns in instcombine. Because this patch works bad between logical node itself. For example,

define i8 @test(i8 %a, i8 %b, i8 %c, i8 %d) {
   %oab = or i8 %a, %b
   %ocd = or i8 %c, %d
   %and = and i8 %oab, %ocd
   %r = and i8 %and, %and
   ret i8 %r
}

Obviously the %r is the same to %and. But both of them in the logical node is a complicated set of bitsets. For now it's not easy to find %and is the same to %r.

any other coments for the initial patch? @nikic

This revision was landed with ongoing or failed builds.Mar 2 2023, 4:46 AM
This revision was automatically updated to reflect the committed changes.

This breaks module builds (i.e. LLVM_ENABLE_MODULES=ON):

/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicalExpr.h:53:7: error: redefinition of 'LogicalExpr'
class LogicalExpr {
      ^
/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicCombine.h:9:10: note: '/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicalExpr.h' included multiple times, additional include site in header from module 'LLVM_Analysis.LogicalExpr'
#include "LogicalExpr.h"
         ^
<module-includes>:67:10: note: '/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicalExpr.h' included multiple times, additional include site in header from module 'LLVM_Analysis.LogicalExpr'
#include "Analysis/LogicalExpr.h"
         ^
/Users/alex/llvm-project/llvm/include/llvm/Analysis/LogicalExpr.h:53:7: note: unguarded header; consider using #ifdef guards or #pragma once
class LogicalExpr {

I've added header guards to llvm/include/llvm/Analysis/LogicalExpr.h and llvm/include/llvm/LogicCombine.h in ff65a586677eb127ea70ca84b91204c0b9940b00.

any other coments for the initial patch? @nikic

I think it would be better to have approvals for all patches and commit this all at once. Otherwise we might end up with nontrivial unfinished code.

And my question was not answered before commit.

This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

any other coments for the initial patch? @nikic

I think it would be better to have approvals for all patches and commit this all at once. Otherwise we might end up with nontrivial unfinished code.

reverted by 76df706bca14affdcf0dd91561c8e6805035608f

And my question was not answered before commit.

This leads me to the question how ofter this fires in llvm test suite / SPEC / clang bootstrap? Is it worth it?

patch until D143155, there are 7 patterns triggered by SPEC 2017. No pattern trigged from llvm test suite. Haven't try bootstrap build yet.

If no one have any further comments I will abandon this serial patches. It looks it can't improve too much.