This is an archive of the discontinued LLVM Phabricator instance.


Differential D124119

[InstCombine] Combine instructions of type or/and where AND masks can be combined.
Closed, Public

Authored by bipmis on Apr 20 2022, 12:54 PM.

Details

Reviewers

spatel
RKSimon
dmgreen
nikic
alexfh

Commits

rGd87bfa9ad0af: [InstCombine] Combine instructions of type or/and where AND masks can be…
rGec4adf1f6c33: [InstCombine] Combine instructions of type or/and where AND masks can be…

Summary

The patch simplifies patterns such as:

(A | (B & C0)) | (B & C1) -> A | (B & (C0 | C1))
((B & C0) | A) | (B & C1) -> (B & (C0 | C1)) | A

In some scenarios, such as a byte reverse on a half word, this pattern appears multiple times, and this conversion optimizes each occurrence.
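
The identity behind the summary can be sanity-checked outside LLVM. The sketch below is not part of the patch, and the helper names (`before`, `after`, `identityHolds`, `selfTest`) are made up for illustration. It assumes C0 and C1 are constants, so that C0 | C1 folds away and the right-hand side needs one 'and' and one 'or' instead of two of each.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: (A | (B & C0)) | (B & C1)
constexpr uint8_t before(uint8_t a, uint8_t b, uint8_t c0, uint8_t c1) {
  return static_cast<uint8_t>((a | (b & c0)) | (b & c1));
}

// Right-hand side of the fold: A | (B & (C0 | C1))
constexpr uint8_t after(uint8_t a, uint8_t b, uint8_t c0, uint8_t c1) {
  return static_cast<uint8_t>(a | (b & (c0 | c1)));
}

// Compares both sides for every combination of 4-bit operands.
// The operations are bitwise, so wider values behave the same way per bit.
bool identityHolds() {
  for (unsigned a = 0; a < 16; ++a)
    for (unsigned b = 0; b < 16; ++b)
      for (unsigned c0 = 0; c0 < 16; ++c0)
        for (unsigned c1 = 0; c1 < 16; ++c1)
          if (before(a, b, c0, c1) != after(a, b, c0, c1))
            return false;
  return true;
}

// Hypothetical entry point for running the check.
void selfTest() { assert(identityHolds()); }
```

This only checks that the values agree; it says nothing about use counts or which form InstCombine treats as canonical.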

Diff Detail

Unit Tests: Failed

	Time	Test
	60,080 ms	x64 debian > LLVM.CodeGen/NVPTX::wmma.py
	60,050 ms	x64 debian > libFuzzer.libFuzzer::large.test

Event Timeline

bipmis requested review of this revision. Apr 20 2022, 12:54 PM

bipmis created this revision.

bipmis edited the summary of this revision.

bipmis mentioned this in D123029: [AArch64] Optimize patterns where AND's on same operand with multiple masks can be combined. Apr 20 2022, 12:59 PM

Harbormaster completed remote builds in B160513: Diff 424000. Apr 20 2022, 2:13 PM

RKSimon retitled this revision from [AArch64] Combine instructions of type or/and where AND masks can be combined. to [InstCombine] Combine instructions of type or/and where AND masks can be combined. Apr 20 2022, 2:15 PM

RKSimon added a reviewer: nikic.

Sounds great, doing this in InstCombine as opposed to DAG. That should make all the cost modelling and whatnot come out better.

Do you have alive proofs for the changes? And it might be a good idea to add a few more tests, for one-use cases and negative checks that shouldn't fire. Precommitting the tests is also a good idea.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2754	Looks like this was unneeded.
llvm/test/Transforms/InstCombine/and-or.ll
359	You can remove local_unnamed_addr #0

Thanks. Adding additional test for showing the other commute scenario. Also re-patching with the tests committed and comments incorporated.
Alive link for showing the transform is https://alive2.llvm.org/ce/z/XeMdDp

Harbormaster completed remote builds in B160857: Diff 424477. Apr 22 2022, 8:37 AM

This seems like a problem with the -reassociate pass - it seems to invert the form we want here.

However, I'm not opposed to another relatively simple reassociation transform here in instcombine (we have many already), but the patch as written is not general enough, missing commuted patterns/tests, and missing some kind of one-use check as noted earlier.

We could generalize to a canonicalization that pushes the 'and' values together, and then the optimization of combining mask constants will fall out from that:
https://alive2.llvm.org/ce/z/ZHgrvL

As code, that would be something like this:

// (A & B) | (C | (A & D)) --> ((A & B) | (A & D)) | C
if (match(Op0, m_And(m_Value(A), m_Value(B))) &&
    match(Op1, m_c_Or(m_Value(C), m_c_And(m_Specific(A), m_Value(D)))))
...

There are already 4 commuted possibilities there, but we'd also need to test for the pattern where "B" is the repeated operand on the right side:

// (A & B) | (C | (B & D)) --> ((A & B) | (B & D)) | C

...and swap the operands of the final 'or'. So 16 total patterns to test for? If we make the matches require constants, we can reduce the number of possibilities (since we know that constants are always op1/right-side), but it's a less general transform.

On 2nd thought, we should model the transform like this:
https://alive2.llvm.org/ce/z/NAgoJM

So it's a missing optimization even without constant operands, and we really do need 16 commuted matches/tests. :)
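
To make the count of 16 concrete, here is a small standalone sketch (not LLVM code; `source`, `target` and `countVerifiedVariants` are hypothetical names). It assumes the 16 forms come from four independent binary choices: which operand of the first 'and' is repeated, the operand order of the inner 'and', the operand order of the inner 'or', and the operand order of the outer 'or'. Because the operations are bitwise, checking single-bit inputs covers every bit width.

```cpp
#include <cassert>

// Evaluates one of the 16 commuted source patterns on single-bit inputs.
//   bit 0 of `variant`: the repeated operand is B instead of A
//   bit 1: the inner 'and' is written (D & shared) instead of (shared & D)
//   bit 2: the inner 'or' is written (and1 | C) instead of (C | and1)
//   bit 3: the outer 'or' is written (inner | and0) instead of (and0 | inner)
bool source(unsigned variant, bool a, bool b, bool c, bool d) {
  bool shared = (variant & 1) ? b : a;
  bool and0 = a & b;
  bool and1 = (variant & 2) ? (d & shared) : (shared & d);
  bool inner = (variant & 4) ? (and1 | c) : (c | and1);
  return (variant & 8) ? (inner | and0) : (and0 | inner);
}

// Factored form: C | (shared & (other | D)), where `other` is the operand
// of the first 'and' that is not repeated.
bool target(unsigned variant, bool a, bool b, bool c, bool d) {
  bool shared = (variant & 1) ? b : a;
  bool other = (variant & 1) ? a : b;
  return c | (shared & (other | d));
}

// Counts the variants whose source and target agree on all 16 input rows.
int countVerifiedVariants() {
  int verified = 0;
  for (unsigned variant = 0; variant < 16; ++variant) {
    bool allRowsMatch = true;
    for (unsigned row = 0; row < 16; ++row) {
      bool a = row & 1, b = row & 2, c = row & 4, d = row & 8;
      if (source(variant, a, b, c, d) != target(variant, a, b, c, d))
        allRowsMatch = false;
    }
    verified += allRowsMatch;
  }
  return verified;
}
```

Commuting operands never changes the computed value, so all 16 variants agree with the factored form. The variants differ only in the shape of the IR, which is why the matcher and the tests have to cover each one.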

In D124119#3467871, @spatel wrote:
This seems like a problem with the -reassociate pass - it seems to invert the form we want here.

However, I'm not opposed to another relatively simple reassociation transform here in instcombine (we have many already), but the patch as written is not general enough, missing commuted patterns/tests, and missing some kind of one-use check as noted earlier.

We could generalize to a canonicalization that pushes the 'and' values together, and then the optimization of combining mask constants will fall out from that:
https://alive2.llvm.org/ce/z/ZHgrvL

As code, that would be something like this:
// (A & B) | (C | (A & D)) --> ((A & B) | (A & D)) | C
if (match(Op0, m_And(m_Value(A), m_Value(B))) &&
    match(Op1, m_c_Or(m_Value(C), m_c_And(m_Specific(A), m_Value(D)))))
...
There are already 4 commuted possibilities there, but we'd also need to test for the pattern where "B" is the repeated operand on the right side:
// (A & B) | (C | (B & D)) --> ((A & B) | (B & D)) | C
...and swap the operands of the final 'or'. So 16 total patterns to test for? If we make the matches require constants, we can reduce the number of possibilities (since we know that constants are always op1/right-side), but it's a less general transform.

I agree with this point. We can generalize to a canonicalization that pushes the 'and' values together, as the reassociate pass will then combine the two ANDs for both registers and masks. This also aligns with what you suggested in https://alive2.llvm.org/ce/z/NAgoJM.

Yes, for the generic scenario we are looking at 16 combinations. Let me push the tests for them first; I can then rebase my patch with the above implementation.

Rebased the patch with the suggested changes and tests for the different commuted forms.

Harbormaster completed remote builds in B161607: Diff 425520. Apr 27 2022, 8:51 AM

spatel added inline comments. Apr 27 2022, 12:52 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2983–2984	This is still missing one-use checks and tests. We should have tests where one or more of the intermediate values has some other use. Look in the test file that you are modifying for tests with `call void @use(i8)`. If we are creating 3 instructions, then both Op0 and Op1 must have only one use to ensure that we do not end up with more instructions than we started with.
2986	The code comments are confusing - don't reuse 'D' if we've already specified that as the outer-level `and` instruction. It seems better to match the final `or` with a commutative matcher rather than duplicate all of this code twice. It would be something like: if (match(&I, m_c_Or(m_And(m_Value(A), m_Value(B)), m_Or(m_Value(C), m_Value(D))))
2990	Why use m_c_BinOp rather than m_c_And? If we are not using 'X', then there is no reason to capture it - just use plain `m_Value()`.
2992	I think it would be better (more efficient) to create the optimal pattern directly rather than relying on subsequent folds to do it for us. We can create the or(and(or...)) sequence right here.
llvm/test/Transforms/InstCombine/and-or.ll
421	Notice that pat1-4 are canonicalized to the same form as pat5-8, so these are not testing the patterns that you intended. You'll need to add an extra instruction to these tests (and also pat1-4 in the next set of tests) to maintain `%c` as operand 0 of the `or`. Search for "thwart complexity-based canonicalization" in the InstCombine test dir for examples of how to do this.
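
The one-use argument can be illustrated with a simplified instruction-count model. This is a sketch under stated assumptions, not LLVM's actual bookkeeping: it assumes the direct or(and(or)) form suggested in the comment on line 2992, models only the four instructions of (A & B) | (C | (A & D)), and uses hypothetical names (`ExtraUses`, `instructionsAfterDirectFold`).

```cpp
#include <cassert>

// Which intermediate values of (A & B) | (C | (A & D)) have users outside
// this expression.
struct ExtraUses {
  bool op0;      // the outer 'and':  A & B
  bool op1;      // the inner 'or':   C | (A & D)
  bool innerAnd; // the inner 'and':  A & D
};

// The source expression is two 'and's and two 'or's.
constexpr int kInstructionsBefore = 4;

// Instructions alive after rewriting the root to C | (A & (B | D)).
int instructionsAfterDirectFold(ExtraUses uses) {
  int count = 3; // newly created: or(B, D), and(A, ...), or(C, ...)
  if (uses.op0)
    ++count; // A & B is still needed by its other user
  if (uses.op1)
    ++count; // C | (A & D) is still needed by its other user
  if (uses.op1 || uses.innerAnd)
    ++count; // A & D stays alive through the inner 'or' or its other user
  return count;
}
```

In this model the fold saves an instruction only when Op0 and Op1 are both single-use: an extra use of Op0 leaves the count at 4, and an extra use of Op1 raises it to 5.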

bipmis added inline comments. Apr 28 2022, 2:36 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2986	Yes, the method and the comment are based on the current implementation, where we do not actually reduce the number of instructions but reorder them so that the ANDs are ORed together; the existing tryFactorization() in InstCombine then folds them and reduces the instruction count.
2990	m_c_BinOp was used because it can match the 2 operands in either order, which is needed, and it could reduce extra code.
2992	Right. My point is: if there is an existing implementation in InstCombine that does this, should we redo the same here? I checked some examples and multiple commuted sequences, and the existing implementation could combine both constants and registers and reduce the number of instructions, which was the intention. In doing so, however, it does not retain the order of the operands.
llvm/test/Transforms/InstCombine/and-or.ll
421	Right, this is possibly due to the existing tryFactorization(), which folds or(and, and) to and(or()). It does so for all scenarios, but likely keeps the operands in a fixed position.

bipmis marked an inline comment as not done. Apr 28 2022, 3:23 AM

bipmis added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2990	But yes we could use m_c_And as well. Thanks for pointing this out.

I have modified and committed the tests with modifications requested. Please have a look if this looks ok. Subsequently will rebase my patch.

In the current implementation I have folded (A & B) | (C | (A & D)) -> C | ((A & D) | (A & B)). The number of instructions has not changed, as there is existing code in InstCombine which can do C | ((A & D) | (A & B)) -> C | (A & (D | B)). The advantage is that less code is required to implement this and, as suggested, we can reduce the duplicated code. It would now look as below:

if (match(&I, m_c_Or(m_OneUse(m_And(m_Value(A), m_Value(B))), m_OneUse(m_Or(m_Value(C), m_Value(D))))))
  {
    if (match(D, m_c_And(m_Specific(A), m_Value())) ||
        match(D, m_c_And(m_Specific(B), m_Value())))
      return BinaryOperator::CreateOr(C, Builder.CreateOr(D, Builder.CreateAnd(A, B)));
    if (match(C, m_c_And(m_Specific(A), m_Value())) ||
        match(C, m_c_And(m_Specific(B), m_Value())))
      return BinaryOperator::CreateOr(Builder.CreateOr(C, Builder.CreateAnd(A, B)), D);
  }

The other option is to directly generate the reduced number of instructions: (A & B) | (C | (A & D)) -> C | (A & (D | B)). In this case we will need to introduce more code. Should this be the preferred approach, and if so, what would be the reason for that? Thanks.

In D124119#3480733, @bipmis wrote:

I have modified and committed the tests with modifications requested. Please have a look if this looks ok. Subsequently will rebase my patch.

The changes to prevent commutes look right. I suspect we'll need even more one-use test variations to be sure we have that right, but that should not hold up rebasing/changing the patch.

Rebasing after incorporating the comments and test additions. Current implementation folds to a representation which can be reduced by the existing implementation in InstCombine.

Harbormaster completed remote builds in B162467: Diff 426719. May 3 2022, 9:34 AM

@spatel Could you review the rebased patch and let me know if you have any other comments. Thanks.

Sorry, I lost track of this patch. LGTM - see inline comments for minor improvements.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2985	I still don't like re-using 'D' in these comments. That's the uncaptured operand in the next match, so I'd just call it '?' to show that it's a "don't care" value.
2997	'C' is still operand 0 of the final 'or' (the same as in the above set of transforms), but these comments show it as operand 1.

This revision is now accepted and ready to land. May 12 2022, 6:20 AM

This revision was landed with ongoing or failed builds. May 16 2022, 4:44 AM

Closed by commit rGec4adf1f6c33: [InstCombine] Combine instructions of type or/and where AND masks can be… (authored by bipmis).

This revision was automatically updated to reflect the committed changes.

bipmis added a commit: rGec4adf1f6c33: [InstCombine] Combine instructions of type or/and where AND masks can be….

Hi Biplob,
This commit has increased the compilation time of certain translation units by a large factor. It rose from ~7s to at least 20 minutes. -ftime-report shows that the code already put quite a heavy load on InstCombinePass before this change:

Total Execution Time: 6.0698 seconds (6.0699 wall clock)

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
 2.0465 ( 34.4%)   0.0269 ( 23.8%)   2.0734 ( 34.2%)   2.0734 ( 34.2%)  InstCombinePass
 1.4016 ( 23.5%)   0.0360 ( 31.8%)   1.4376 ( 23.7%)   1.4376 ( 23.7%)  ModuleInlinerWrapperPass
 1.3901 ( 23.3%)   0.0360 ( 31.8%)   1.4261 ( 23.5%)   1.4262 ( 23.5%)  DevirtSCCRepeatedPass
 0.3090 (  5.2%)   0.0001 (  0.1%)   0.3091 (  5.1%)   0.3091 (  5.1%)  SROAPass

However after this change some parts of the algorithm may have become exponential. At least I haven't seen the compilation finish yet. We're trying to come up with an isolated test case, but please consider reverting the commit while working on a fix. Thanks!

@alexfh it sounds more likely that there are 2 transforms fighting each other? Can you find a bugpoint test case by just setting a short ish timeout?

In D124119#3545576, @RKSimon wrote:

@alexfh it sounds more likely that there are 2 transforms fighting each other? Can you find a bugpoint test case by just setting a short ish timeout?

I'm working on a reduced test case. So far I have collected profile from the first few seconds of clang invocation. It may help understanding what's happening:

- 99.58% clang::BackendConsumer::HandleTranslationUnit
   - 99.43% clang::EmitBackendOutput
      - 99.43% (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline
         - llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run
            - 99.24% llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run
               - llvm::ModuleToFunctionPassAdaptor::run
                  - 99.24% llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run
                     - llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run
                        - 98.82% llvm::detail::PassModel<llvm::Function, llvm::InstCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run
                           - llvm::InstCombinePass::run
                              - 98.81% combineInstructionsOverFunction
                                 - 98.26% llvm::InstCombinerImpl::run
                                    - 72.80% llvm::InstCombinerImpl::visitOr
                                       - 21.67% llvm::InstCombinerImpl::SimplifyUsingDistributiveLaws
                                          - 18.71% SimplifyOrInst
                                             - 9.82% SimplifyAssociativeBinOp
                                                - 9.44% SimplifyOrInst
                                                   - 3.44% expandCommutativeBinOp
                                                      - 3.36% expandBinOp
                                                         + 3.07% SimplifyOrInst
                                                   + 2.36% expandBinOp
                                                     2.19% simplifyOrLogic
                                             + 3.53% expandBinOp
                                               2.36% simplifyOrLogic
                                             + 1.69% expandCommutativeBinOp
                                          + 1.58% llvm::InstCombinerImpl::tryFactorization
                                          + 0.68% llvm::Constant::getAllOnesValue
                                       - 20.35% SimplifyOrInst
                                          - 9.20% SimplifyAssociativeBinOp
                                             - 9.03% SimplifyOrInst
                                                + 3.94% expandCommutativeBinOp
                                                + 3.12% expandBinOp
                                                  1.21% simplifyOrLogic
                                          + 7.08% expandBinOp
                                          - 2.00% expandCommutativeBinOp
                                             + 1.95% expandBinOp
                                            1.29% simplifyOrLogic
                                       - 10.74% llvm::InstCombinerImpl::SimplifyDemandedInstructionBits
                                          - 10.61% llvm::InstCombinerImpl::SimplifyDemandedUseBits
                                             - 9.99% llvm::InstCombinerImpl::SimplifyDemandedBits
                                                - 9.75% llvm::InstCombinerImpl::SimplifyDemandedUseBits
                                                   - 6.72% llvm::InstCombinerImpl::SimplifyDemandedBits
                                                      + 6.55% llvm::InstCombinerImpl::SimplifyDemandedUseBits
                                                   + 1.86% llvm::InstCombinerImpl::SimplifyMultipleUseDemandedBits
                                       - 9.86% llvm::InstCombinerImpl::SimplifyAssociativeOrCommutative
                                          - 9.42% SimplifyOrInst
                                             + 4.36% expandCommutativeBinOp
                                             + 3.20% expandBinOp
                                               1.14% simplifyOrLogic
                                       + 5.44% llvm::InstCombinerImpl::matchBSwapOrBitReverse
                                       + 0.95% llvm::InstCombinerImpl::matchSelectFromAndOr
                                         0.59% llvm::IRBuilderBase::CreateAnd
                                    - 16.98% llvm::InstCombinerImpl::visitAnd
                                       + 8.63% llvm::InstCombinerImpl::SimplifyDemandedInstructionBits
                                       + 3.31% SimplifyAndInst
                                       + 1.59% llvm::InstCombinerImpl::SimplifyAssociativeOrCommutative
                                    + 3.26% llvm::InstCombinerImpl::visitLShr

@alexfh The transformation here is from one form of OR to another, which the existing implementation in InstCombine can actually handle. I don't think we saw a compile-time issue in llvm-test-suite. It would be good to analyze the specific test scenario that triggers this issue.

I was wrong with the initial analysis. The patch doesn't just make the performance worse. It makes clang loop indefinitely on a certain input.

$ cat q.cc
int e;
int u;
int ag;
void f() {
  int ak = e;
  int al((unsigned char)(ak >> 23) & 925);
  if (ak)
    al = (ak >> 23 & u) | ((unsigned char)(ak >> 23) & 925) | (u >> 23 & 157);
  ag = al;
}

$ time ./clang-15-10515 --target=x86_64--linux-gnu -O1  -c q.cc
^C

real    0m45.072s
user    0m0.025s
sys     0m0.099s

This is definitely a problem that has to be fixed. If you don't have an obvious fix in mind, please revert while investigating.

In D124119#3549716, @alexfh wrote:
I was wrong with the initial analysis. The patch doesn't just make the performance worse. It makes clang loop indefinitely on a certain input.
$ cat q.cc
int e;
int u;
int ag;
void f() {
  int ak = e;
  int al((unsigned char)(ak >> 23) & 925);
  if (ak)
    al = (ak >> 23 & u) | ((unsigned char)(ak >> 23) & 925) | (u >> 23 & 157);
  ag = al;
}

$ time ./clang-15-10515 --target=x86_64--linux-gnu -O1  -c q.cc
^C

real    0m45.072s
user    0m0.025s
sys     0m0.099s
This is definitely a problem that has to be fixed. If you don't have an obvious fix in mind, please revert while investigating.

A bit cleaner test case:

int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

alexfh added a reverting change: rGaa98e7e1eb96: Revert "[InstCombine] Combine instructions of type or/and where AND masks can…. Jun 1 2022, 5:20 AM

I've reverted the commit in aa98e7e1eb960712533d39bbc98a05a6e70a9683 to unblock our internal release.

@alexfh Verified this is an issue and thanks for reverting. I have the fix for the same which I can commit. Let me know if I can do it now or at a later date. Thanks.

In D124119#3549954, @bipmis wrote:

@alexfh Verified this is an issue and thanks for reverting. I have the fix for the same which I can commit. Let me know if I can do it now or at a later date. Thanks.

Feel free to commit it. I can then verify it on the original test case.

Update the patch to fix the test case

int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

@alexfh I think it would be good if you can try the patch and approve for commit if it looks fine. Thanks.

bipmis added a reviewer: alexfh. Jun 1 2022, 7:19 AM

Harbormaster completed remote builds in B167253: Diff 433387. Jun 1 2022, 8:04 AM

We should have at least one minimized version of the test that caused the infinite loop in the updated version of the patch or pre-committed, so we guard against that same bug in the future.

I didn't step through to see if more can be removed, but I got it down to this, and it would infinite loop with "opt -instcombine":

declare void @use(i32)

define i32 @f(i32 %a, i32 %b) {
  %shr = ashr i32 %a, 23
  %conv = trunc i32 %shr to i8
  %conv1 = zext i8 %conv to i32
  %and = and i32 %conv1, 925
  call void @use(i32 %and)
  %and3 = and i32 %shr, %b
  %or = or i32 %and3, %and
  %shr8 = ashr i32 %b, 23
  %and9 = and i32 %shr8, 157
  %r = or i32 %or, %and9
  ret i32 %r
}

Similar bug reported here:
https://github.com/llvm/llvm-project/issues/55801

In D124119#3550149, @bipmis wrote:

@alexfh I think it would be good if you can try the patch and approve for commit if it looks fine. Thanks.

I have verified clang with your latest patch on the original file, and it doesn't hang now. However, I'm not an expert in InstCombine and can't properly review the change.

In D124119#3556322, @alexfh wrote:

In D124119#3550149, @bipmis wrote:

@alexfh I think it would be good if you can try the patch and approve for commit if it looks fine. Thanks.

I have verified clang with your latest patch on the original file, and it doesn't hang now. However, I'm not an expert in InstCombine and can't properly review the change.

And yes, please add a regression test.

Added regression tests to the patch.

Harbormaster completed remote builds in B168019: Diff 434433. Jun 6 2022, 5:35 AM

IIUC, the problem/fix (and it would be good to put something like this in the updated commit message for easier reference):
The previous revision/commit did not check one-use of an intermediate value that this transform re-uses. When that value has another use, an existing transform will try to invert the transform here. By adding one-use checks, we avoid the infinite loops seen with the earlier commit.
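
The failure mode described above can be pictured with a toy rewrite loop. This is only a model of two folds undoing each other, not InstCombine code: the thread does not name the opposing transform, so `existingFold` is a stand-in, and `Form`, `ToyExpr`, `newFold` and `rewritesUntilFixpoint` are likewise hypothetical names.

```cpp
#include <cassert>

// Two shapes of the same expression.
enum class Form { AndsApart, AndsTogether };

struct ToyExpr {
  Form form;
  bool intermediateHasOtherUse; // the reused value has a user elsewhere
};

// Stand-in for the fold from this review: groups the 'and' values together.
// With requireOneUse set, it refuses to fire on a multi-use intermediate.
bool newFold(ToyExpr &expr, bool requireOneUse) {
  if (expr.form != Form::AndsApart)
    return false;
  if (requireOneUse && expr.intermediateHasOtherUse)
    return false;
  expr.form = Form::AndsTogether;
  return true;
}

// Stand-in for a pre-existing fold that inverts the new one when the
// intermediate value has another use.
bool existingFold(ToyExpr &expr) {
  if (expr.form != Form::AndsTogether || !expr.intermediateHasOtherUse)
    return false;
  expr.form = Form::AndsApart;
  return true;
}

// Applies the folds until neither fires. Returns the number of rewrites,
// or -1 if the limit is reached without converging.
int rewritesUntilFixpoint(ToyExpr expr, bool requireOneUse, int limit = 1000) {
  int rewrites = 0;
  while (rewrites < limit) {
    if (newFold(expr, requireOneUse) || existingFold(expr)) {
      ++rewrites;
      continue;
    }
    return rewrites;
  }
  return -1;
}
```

Without the one-use guard, an expression whose intermediate value has another use flips between the two shapes until the limit is hit. With the guard, the new fold never fires on that input and the loop converges immediately.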

This revision was landed with ongoing or failed builds. Jun 9 2022, 2:59 AM

bipmis added a commit: rGd87bfa9ad0af: [InstCombine] Combine instructions of type or/and where AND masks can be….

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp	18 lines
llvm/test/Transforms/InstCombine/and-or.ll	30 lines
llvm/test/Transforms/InstCombine/or.ll	10 lines

Diff 424477

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

[2,744 unchanged lines hidden; context: if (Op0->hasOneUse() || Op1->hasOneUse()) {]
         return replaceInstUsesWith(I, V);
       if (Value *V = matchSelectFromAndOr(D, B, A, C))
         return replaceInstUsesWith(I, V);
       if (Value *V = matchSelectFromAndOr(D, B, C, A))
         return replaceInstUsesWith(I, V);
     }
   }

+  // (A | B) | (C & D)
+  if (match(Op0, m_Or(m_Value(A), m_Value(B))) &&
Inline comment (dmgreen, Not Done): Looks like this was unneeded.
+      match(Op1, m_And(m_Value(C), m_Value(D)))) {
+    const APInt *C0, *C1;
+    // (A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
+    if (match(B, m_c_And(m_Specific(C), m_APInt(C0))) &&
+        match(D, m_APInt(C1))) {
+      Constant *C01 = ConstantInt::get(Ty, *C0 | *C1);
+      return BinaryOperator::CreateOr(A, Builder.CreateAnd(C, C01));
+    }
+    // ((B & C0) | A) | (B & C1) -> (B & C0|C1) | A
+    if (match(A, m_c_And(m_Specific(C), m_APInt(C0))) &&
+        match(D, m_APInt(C1))) {
+      Constant *C01 = ConstantInt::get(Ty, *C0 | *C1);
+      return BinaryOperator::CreateOr(Builder.CreateAnd(C, C01), B);
+    }
+  }
+
   // (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C
   if (match(Op0, m_Xor(m_Value(A), m_Value(B))))
     if (match(Op1, m_Xor(m_Xor(m_Specific(B), m_Value(C)), m_Specific(A))))
       return BinaryOperator::CreateOr(Op0, C);

   // ((A ^ C) ^ B) | (B ^ A) -> (B ^ A) | C
   if (match(Op0, m_Xor(m_Xor(m_Value(A), m_Value(C)), m_Value(B))))
     if (match(Op1, m_Xor(m_Specific(B), m_Specific(A))))
[196 unchanged lines hidden; context: Instruction *InstCombinerImpl::visitOr(BinaryOperator &I) {]

   // An or recurrence w/loop invariant step is equivelent to (or start, step)
   PHINode *PN = nullptr;
   Value *Start = nullptr, *Step = nullptr;
   if (matchSimpleRecurrence(&I, PN, Start, Step) && DT.dominates(Step, PN))
     return replaceInstUsesWith(I, Builder.CreateOr(Start, Step));

   return nullptr;
 }

Inline comment (spatel, Not Done): This is still missing one-use checks and tests. We should have tests where one or more of the intermediate values has some other use. Look in the test file that you are modifying for tests with `call void @use(i8)`. If we are creating 3 instructions, then both Op0 and Op1 must have only one use to ensure that we do not end up with more instructions than we started with.
 /// A ^ B can be specified using other logic ops in a variety of patterns. We
Inline comment (spatel, Not Done): I still don't like re-using 'D' in these comments. That's the uncaptured operand in the next match, so I'd just call it '?' to show that it's a "don't care" value.
 /// can fold these early and efficiently by morphing an existing instruction.
Inline comment (spatel, Not Done): The code comments are confusing - don't reuse 'D' if we've already specified that as the outer-level `and` instruction. It seems better to match the final `or` with a commutative matcher rather than duplicate all of this code twice. It would be something like: if (match(&I, m_c_Or(m_And(m_Value(A), m_Value(B)), m_Or(m_Value(C), m_Value(D))))
Inline comment (bipmis, author, Done): Ya the method and the comment is based on the current implementation where we do not actually reduce number of instructions, but reorder such that the AND's are now ORed and the existing implementation in InstCombine tryFactorization() folds them and reduces the instructions.
 static Instruction *foldXorToXor(BinaryOperator &I,
                                  InstCombiner::BuilderTy &Builder) {
   assert(I.getOpcode() == Instruction::Xor);
   Value *Op0 = I.getOperand(0);
Inline comment (spatel, Not Done): Why use m_c_BinOp rather than m_c_And? If we are not using 'X', then there is no reason to capture it - just use plain `m_Value()`.
Inline comment (bipmis, author, Not Done): m_c_BinOp was used as it can look for 2 operands in either order which is needed and could reduce extra code.
Inline comment (bipmis, author, Done): But yes we could use m_c_And as well. Thanks for pointing this out.
   Value *Op1 = I.getOperand(1);
   Value *A, *B;
Inline comment (spatel, Not Done): I think it would be better (more efficient) to create the optimal pattern directly rather than relying on subsequent folds to do it for us. We can create the or(and(or...)) sequence right here.
Inline comment (bipmis, author, Done): Right. My point of view is if there is an existing implementation in the instruction combine to do this, should we redo the same here. I checked with some examples and multiple commuted sequences and the existing implementation could combine both constants and registers and reduce the number of instructions which was the intention. In doing so however it is not retaining the order of the operands correctly.

   // There are 4 commuted variants for each of the basic patterns.

   // (A & B) ^ (A | B) -> A ^ B
   // (A & B) ^ (B | A) -> A ^ B
Inline comment (spatel, Not Done): 'C' is still operand 0 of the final 'or' (the same as in the above set of transforms), but these comments show it as operand 1.
   // (A | B) ^ (A & B) -> A ^ B
   // (A | B) ^ (B & A) -> A ^ B
   if (match(&I, m_c_Xor(m_And(m_Value(A), m_Value(B)),
                         m_c_Or(m_Deferred(A), m_Deferred(B)))))
     return BinaryOperator::CreateXor(A, B);

   // (A | ~B) ^ (~A | B) -> A ^ B
   // (~B | A) ^ (~A | B) -> A ^ B
[706 unchanged lines hidden]

llvm/test/Transforms/InstCombine/and-or.ll

[333 unchanged lines hidden]
   %extra_use_of_or = mul i8 %or, %and
   ret i8 %extra_use_of_or
 }

 ; ((B & C0) | A) | (B & C1) -> (B & C0|C1) | A
 define i64 @or_or_and(i64 %x) {
 ; CHECK-LABEL: @or_or_and(
 ; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[X:%.*]], 8
-; CHECK-NEXT: [[SHL:%.*]] = and i64 [[TMP1]], 71776119061217280
 ; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[X]], 8
 ; CHECK-NEXT: [[SHL3:%.*]] = and i64 [[TMP2]], -72057594037927936
-; CHECK-NEXT: [[OR:%.*]] = or i64 [[SHL]], [[SHL3]]
-; CHECK-NEXT: [[SHL6:%.*]] = and i64 [[TMP1]], 1095216660480
-; CHECK-NEXT: [[OR7:%.*]] = or i64 [[OR]], [[SHL6]]
+; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[TMP1]], 71777214277877760
+; CHECK-NEXT: [[OR7:%.*]] = or i64 [[TMP3]], [[SHL3]]
 ; CHECK-NEXT: ret i64 [[OR7]]
 ;
   %1 = lshr i64 %x, 8
   %shl = and i64 %1, 71776119061217280
   %2 = shl i64 %x, 8
   %shl3 = and i64 %2, -72057594037927936
   %or = or i64 %shl, %shl3
   %shl6 = and i64 %1, 1095216660480
   %or7 = or i64 %or, %shl6
   ret i64 %or7
 }

 ; (A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
 define i64 @or_or_and_commute0(i64 %x) {
Inline comment (dmgreen, Not Done): You can remove local_unnamed_addr #0
 ; CHECK-LABEL: @or_or_and_commute0(
 ; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[X:%.*]], 8
-; CHECK-NEXT: [[SHL:%.*]] = and i64 [[TMP1]], 71776119061217280
 ; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[X]], 8
 ; CHECK-NEXT: [[SHL3:%.*]] = and i64 [[TMP2]], -72057594037927936
-; CHECK-NEXT: [[OR:%.*]] = or i64 [[SHL3]], [[SHL]]
-; CHECK-NEXT: [[SHL6:%.*]] = and i64 [[TMP1]], 1095216660480
-; CHECK-NEXT: [[OR7:%.*]] = or i64 [[OR]], [[SHL6]]
+; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[TMP1]], 71777214277877760
+; CHECK-NEXT: [[OR7:%.*]] = or i64 [[SHL3]], [[TMP3]]
 ; CHECK-NEXT: ret i64 [[OR7]]
 ;
   %1 = lshr i64 %x, 8
   %shl = and i64 %1, 71776119061217280
   %2 = shl i64 %x, 8
   %shl3 = and i64 %2, -72057594037927936
   %or = or i64 %shl3, %shl
   %shl6 = and i64 %1, 1095216660480
   %or7 = or i64 %or, %shl6
   ret i64 %or7
 }

 define i64 @or_or_or_and_complex(i64 noundef %i) {
 ; CHECK-LABEL: @or_or_or_and_complex(
 ; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[I:%.*]], 8
-; CHECK-NEXT: [[SHL:%.*]] = and i64 [[TMP1]], 71776119061217280
 ; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[I]], 8
-; CHECK-NEXT: [[SHL3:%.*]] = and i64 [[TMP2]], -72057594037927936
-; CHECK-NEXT: [[OR:%.*]] = or i64 [[SHL]], [[SHL3]]
-; CHECK-NEXT: [[SHL6:%.*]] = and i64 [[TMP1]], 1095216660480
-; CHECK-NEXT: [[OR7:%.*]] = or i64 [[OR]], [[SHL6]]
-; CHECK-NEXT: [[SHL10:%.*]] = and i64 [[TMP2]], 280375465082880
-; CHECK-NEXT: [[OR11:%.*]] = or i64 [[OR7]], [[SHL10]]
-; CHECK-NEXT: [[SHL14:%.*]] = and i64 [[TMP1]], 16711680
-; CHECK-NEXT: [[OR15:%.*]] = or i64 [[OR11]], [[SHL14]]
-; CHECK-NEXT: [[SHL18:%.*]] = and i64 [[TMP2]], 4278190080
-; CHECK-NEXT: [[OR19:%.*]] = or i64 [[OR15]], [[SHL18]]
-; CHECK-NEXT: [[AND21:%.*]] = and i64 [[TMP1]], 255
-; CHECK-NEXT: [[OR23:%.*]] = or i64 [[OR19]], [[AND21]]
-; CHECK-NEXT: [[SHL26:%.*]] = and i64 [[TMP2]], 65280
-; CHECK-NEXT: [[OR27:%.*]] = or i64 [[OR23]], [[SHL26]]
+; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[TMP1]], 71777214294589695
+; CHECK-NEXT: [[TMP4:%.*]] = and i64 [[TMP2]], -71777214294589696
+; CHECK-NEXT: [[OR27:%.*]] = or i64 [[TMP3]], [[TMP4]]
 ; CHECK-NEXT: ret i64 [[OR27]]
 ;
   %1 = lshr i64 %i, 8
   %shl = and i64 %1, 71776119061217280
   %2 = shl i64 %i, 8
   %shl3 = and i64 %2, -72057594037927936
   %or = or i64 %shl, %shl3
   %shl6 = and i64 %1, 1095216660480
   %or7 = or i64 %or, %shl6
   %shl10 = and i64 %2, 280375465082880
   %or11 = or i64 %or7, %shl10
   %shl14 = and i64 %1, 16711680
   %or15 = or i64 %or11, %shl14
   %shl18 = and i64 %2, 4278190080
   %or19 = or i64 %or15, %shl18
   %and21 = and i64 %1, 255
   %or23 = or i64 %or19, %and21
   %shl26 = and i64 %2, 65280
   %or27 = or i64 %or23, %shl26
   ret i64 %or27
 }
Inline comment (spatel, Not Done): Notice that pat1-4 are canonicalized to the same form as pat5-8, so these are not testing the patterns that you intended. You'll need to add an extra instruction to these tests (and also pat1-4 in the next set of tests) to maintain `%c` as operand 0 of the `or`. Search for "thwart complexity-based canonicalization" in the InstCombine test dir for examples of how to do this.
Inline comment (bipmis, author, Done): Right this is possibly due to the existing implementation tryFactorization() which folds the or(and, and) to and(or()). It does it for all scenarios, however likely maintains a fixed position of the operands.
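
As a sanity check on the merged constants in the `or_or_or_and_complex` CHECK lines, the following is a minimal C++ sketch (not part of the patch; `swapChain` and `swapMerged` are hypothetical names). It assumes the eight masks from the test body, confirms at compile time that OR-ing the masks applied to each shifted value gives the two constants in the new CHECK lines, and provides both shapes of the computation for comparison.

```cpp
#include <cstdint>

// Masks applied to %1 = (i >> 8) in the test body.
constexpr uint64_t kShrMasks[4] = {71776119061217280ull, 1095216660480ull,
                                   16711680ull, 255ull};
// Masks applied to %2 = (i << 8) in the test body.
constexpr uint64_t kShlMasks[4] = {
    static_cast<uint64_t>(-72057594037927936ll), 280375465082880ull,
    4278190080ull, 65280ull};

// The merged masks must equal the constants in the new CHECK lines.
static_assert((kShrMasks[0] | kShrMasks[1]) == 71777214277877760ull,
              "two-mask merge from or_or_and");
static_assert((kShrMasks[0] | kShrMasks[1] | kShrMasks[2] | kShrMasks[3]) ==
                  71777214294589695ull,
              "merged lshr mask");
static_assert((kShlMasks[0] | kShlMasks[1] | kShlMasks[2] | kShlMasks[3]) ==
                  static_cast<uint64_t>(-71777214294589696ll),
              "merged shl mask");

// Original shape: eight masked values OR-ed together one at a time.
inline uint64_t swapChain(uint64_t I) {
  const uint64_t Shr = I >> 8, Shl = I << 8;
  uint64_t Result = 0;
  for (int K = 0; K < 4; ++K) {
    Result |= Shr & kShrMasks[K];
    Result |= Shl & kShlMasks[K];
  }
  return Result;
}

// Folded shape: one merged mask per shifted value, then a single OR.
inline uint64_t swapMerged(uint64_t I) {
  return ((I >> 8) & 71777214294589695ull) |
         ((I << 8) & static_cast<uint64_t>(-71777214294589696ll));
}
```

Both functions swap the two bytes inside each 16-bit lane of the input, which is the byte-reverse-on-half-word pattern mentioned in the summary.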

llvm/test/Transforms/InstCombine/or.ll

[377 unchanged lines hidden]
   %D = or <2 x i32> %C1, %C2
   %E = icmp ne <2 x i32> %D, zeroinitializer
   ret <2 x i1> %E
 }

 ; PR4216
 define i32 @test30(i32 %A) {
 ; CHECK-LABEL: @test30(
-; CHECK-NEXT: [[D:%.*]] = and i32 [[A:%.*]], -58312
-; CHECK-NEXT: [[E:%.*]] = or i32 [[D]], 32962
+; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[A:%.*]], -58312
+; CHECK-NEXT: [[E:%.*]] = or i32 [[TMP1]], 32962
 ; CHECK-NEXT: ret i32 [[E]]
 ;
   %B = or i32 %A, 32962 ; 0b1000_0000_1100_0010
   %C = and i32 %A, -65536 ; 0xffff0000
   %D = and i32 %B, 40186 ; 0b1001_1100_1111_1010
   %E = or i32 %D, %C
   ret i32 %E
 }

 define <2 x i32> @test30vec(<2 x i32> %A) {
 ; CHECK-LABEL: @test30vec(
-; CHECK-NEXT: [[C:%.*]] = and <2 x i32> [[A:%.*]], <i32 -65536, i32 -65536>
-; CHECK-NEXT: [[B:%.*]] = and <2 x i32> [[A]], <i32 7224, i32 7224>
-; CHECK-NEXT: [[D:%.*]] = or <2 x i32> [[B]], <i32 32962, i32 32962>
-; CHECK-NEXT: [[E:%.*]] = or <2 x i32> [[D]], [[C]]
+; CHECK-NEXT: [[TMP1:%.*]] = and <2 x i32> [[A:%.*]], <i32 -58312, i32 -58312>
+; CHECK-NEXT: [[E:%.*]] = or <2 x i32> [[TMP1]], <i32 32962, i32 32962>
 ; CHECK-NEXT: ret <2 x i32> [[E]]
 ;
   %B = or <2 x i32> %A, <i32 32962, i32 32962>
   %C = and <2 x i32> %A, <i32 -65536, i32 -65536>
   %D = and <2 x i32> %B, <i32 40186, i32 40186>
   %E = or <2 x i32> %D, %C
   ret <2 x i32> %E
 }
[1,135 unchanged lines hidden]
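
The `test30` expectation can be checked the same way. The sketch below is plain C++, not LLVM code, and its function names are hypothetical. It compares the input expression `((A | 32962) & 40186) | (A & -65536)` with the expected form `(A & -58312) | 32962`, assuming 32-bit wrap-around arithmetic to mirror `i32`. The comparison is exhaustive over the low 16 bits, which are the only bits the small constants touch, and samples a few patterns for the high half.

```cpp
#include <cstdint>
#include <initializer_list>

// Input shape of test30: %E = ((A | 32962) & 40186) | (A & -65536)
constexpr uint32_t test30Original(uint32_t A) {
  return ((A | 32962u) & 40186u) | (A & static_cast<uint32_t>(-65536));
}

// Expected shape from the CHECK lines: (A & -58312) | 32962
constexpr uint32_t test30Folded(uint32_t A) {
  return (A & static_cast<uint32_t>(-58312)) | 32962u;
}

// Returns true when both shapes agree on every tested input.
inline bool test30Agrees() {
  for (uint32_t Hi : {0x00000000u, 0xFFFF0000u, 0xA5A50000u})
    for (uint32_t Lo = 0; Lo <= 0xFFFFu; ++Lo)
      if (test30Original(Hi | Lo) != test30Folded(Hi | Lo))
        return false;
  return true;
}
```

The two forms agree because the bits of 32962 are forced to one by the final `or`, so they can be dropped from the combined mask: 40186 with those bits cleared is 7224, and 7224 combined with the high-half mask -65536 is -58312.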