This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] try to replace subtract-from-constant with xor
ClosedPublic

Authored by spatel on Jun 18 2022, 8:11 AM.

Details

Summary

This is almost the same as the abandoned D48529, but it allows splat vector constants too.
This restores the original generic combine, replacing the x86-specific code that was added with the alternate patch D48557.
This transform is a less-restricted form of an existing InstCombine fold and of the SDAG equivalent proposed for it in D128080:
https://alive2.llvm.org/ce/z/OUm6N_
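For illustration, here is a hypothetical sketch of the fold (not taken from the patch's tests; the combine itself operates on SelectionDAG nodes, but the IR equivalent is easier to show). The condition is that every possibly-set bit of the variable operand is also set in the constant, so no bit of the subtraction can borrow from an adjacent bit:

define i8 @sub_from_const(i8 %a) {
  %x = and i8 %a, 7                      ; possibly-set bits of %x: 0b00000111
  %r = sub i8 7, %x                      ; no borrow possible -> xor i8 %x, 7
  ret i8 %r
}

define <2 x i8> @sub_from_const_splat(<2 x i8> %a) {
  %x = and <2 x i8> %a, <i8 7, i8 7>
  %r = sub <2 x i8> <i8 7, i8 7>, %x     ; splat-vector constants are now handled too
  ret <2 x i8> %r
}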

Diff Detail

Event Timeline

spatel created this revision.Jun 18 2022, 8:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2022, 8:11 AM
spatel requested review of this revision.Jun 18 2022, 8:11 AM
RKSimon added inline comments.Jun 20 2022, 1:32 AM
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
277–329

Based off @foad's comments on D128080 - we need to either add a TLI override to control this or add a way for AMDGPU to reverse it.

RKSimon added inline comments.Jun 20 2022, 1:35 AM
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
277–329

Hmm - how similar is TargetLowering::preferIncOfAddToSubOfNot?

asb accepted this revision.Jun 20 2022, 2:07 AM

Nice change! RISC-V changes LGTM, and it seems to cause various small improvements in codegen for the GCC torture suite.

This revision is now accepted and ready to land.Jun 20 2022, 2:07 AM
spatel planned changes to this revision.Jun 20 2022, 3:33 AM
spatel added inline comments.
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
277–329

At first glance, that seems more about constant materialization, while this is about a "sub-from" instruction with an immediate operand.

That hook defaults to true but is overridden to false for vector types by ARM/AArch64/PowerPC.

deadalnix added inline comments.Jun 26 2022, 6:40 PM
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
277–329

TBH, it seems like a missed opportunity from the AMDGPU backend rather than a major problem with this patch.

Isn't there someone from AMD we can get help from here?

foad added inline comments.Jun 27 2022, 4:42 AM
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
277–329

AMDGPUDAGToDAGISel::SelectDS1Addr1Offset would have to be taught to match appropriate uses of ISD::XOR as well as ISD::SUB (similar to how addressing mode matchers have to match OR as well as ADD?). This code was added by @arsenm in D12283. It doesn't sound like a very important real world use case so maybe we could just ignore the regression?
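For context, a rough, hypothetical reconstruction of the kind of pattern this test exercises (not copied from ds-sub-offset.ll): the workitem ID is known to have only low bits set, so the sub-from-constant passes the no-borrow check and becomes an xor, which SelectDS1Addr1Offset no longer folds into the DS instruction's offset field.

define amdgpu_kernel void @ds_sub_offset_sketch() {
  %tid = call i32 @llvm.amdgcn.workitem.id.x()   ; known to be small
  %shl = shl i32 %tid, 2
  %sub = sub i32 65532, %shl                     ; previously selected as base + offset
  %ptr = inttoptr i32 %sub to ptr addrspace(3)
  store i32 123, ptr addrspace(3) %ptr
  ret void
}

declare i32 @llvm.amdgcn.workitem.id.x()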

deadalnix added inline comments.Jun 27 2022, 2:17 PM
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
277–329

I think there is the regression itself, which is probably okay, but then there is what the test is actually testing for: the maximum offset.

So maybe a second test case for the maximum offset - one that does not trigger the regression - should be added here, and then we can move on.

spatel updated this revision to Diff 441412.Jun 30 2022, 8:35 AM

Patch updated:
No code changes. Based on review comments, I added 2 AMDGPU offset transform tests that show no diffs with this patch, and added comments to the changed tests in case there is motivation to restore those patterns.

This revision is now accepted and ready to land.Jun 30 2022, 8:35 AM
spatel added a comment.Jul 7 2022, 7:51 AM

Ping (AMDGPU) - are the test updates sufficient?

arsenm added inline comments.Jul 7 2022, 8:25 AM
llvm/test/CodeGen/AMDGPU/ds-sub-offset.ll
277–329

It might be nice if isBaseWithConstantOffset recognized this, as it does for add/or, but I don't think this is that important.

llvm/test/CodeGen/AMDGPU/setcc-multiple-use.ll
18–19

It's a bit weird to see a vector operation traded for a scalar one, but this seems fine.

This revision was landed with ongoing or failed builds.Jul 8 2022, 5:17 AM
This revision was automatically updated to reflect the committed changes.
barannikov88 added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3766

It might be more profitable to do this transformation in SimplifyDemandedBits. Some bits of the result might not be used downstream, in which case the check can be relaxed.
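As an illustrative sketch of that idea (my own example, not from the review): if a user of the sub only demands the low bits, only those bits need to be borrow-free, so a demanded-bits driven version could fire even where the current whole-value known-bits check fails.

define i8 @sub_low_bits_demanded(i8 %x) {
  %s = sub i8 7, %x     ; %x is fully unknown, so 0xFF is not a subset of 7 and
                        ; the current check fails...
  %r = and i8 %s, 7     ; ...but only bits 0-2 are demanded, borrows only propagate
                        ; upward, and those bits match xor i8 %x, 7
  ret i8 %r
}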

spatel marked an inline comment as done.Jul 8 2022, 8:35 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3766

The known-bits vs. demanded-bits transforms are two different things if I'm seeing it correctly, although there can be some overlap. Demanded bits may already be used to shrink the constants, so that could enable this transform.

If you have an example in mind that doesn't transform as expected, would you please file an issue, so we can track the fix? If it's not working here in codegen, it may also be missing from the IR equivalent.

I updated the IR side to match this patch with:
79bb915fb60b

barannikov88 added inline comments.Jul 8 2022, 11:32 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3766

> The known-bits vs. demanded-bits transforms are two different things if I'm seeing it correctly

My understanding is as follows. SimplifyDemandedBits recurses down to the operands of the node, propagating information about which bits are needed of them. It might help to simplify the operands. On the way back from the recursion, the operands tell the node which bits of them are known so that the node itself can be simplified. I could be wrong.
I noticed that SimplifyDemandedBits is mainly used in DAGCombiner, mostly as a last resort if no other transformation could be applied. For some reason this function is not called for ISD::SUB.

> in which case the check can be relaxed.

On second thought, it does not look so simple. Suppose we have two 2-bit operands:

C = 0b10
X = 0b1?

and we're demanding bit 1 (counting from 0) of the SUB. We still have to check for a possible borrow out of bit 0: with X = 0b11, the sub gives 0b11 (bit 1 set) while the xor gives 0b01 (bit 1 clear).

barannikov88 added inline comments.Jul 8 2022, 3:09 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3766

By the way, a more general check would be to ignore a borrow from the (nonexistent) bit above the MSB. E.g.:
(~X.KnownZero).isSubsetOf(C | SignMask)
or
(~(X.KnownZero | SignMask)).isSubsetOf(C)
or, which is probably a bit cleaner,
(C - ~X.KnownZero) == (C ^ ~X.KnownZero)
if I'm not mistaken.
This, however, does not affect any existing tests. At least not in the backend.

spatel marked an inline comment as done.Jul 10 2022, 4:08 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3766

Yes, that looks like a good enhancement. Here's an attempt at a general proof and a specific example for a regression test:
https://alive2.llvm.org/ce/z/KABK4Z
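A hypothetical IR instance of the relaxed check (not necessarily the example behind the alive2 link): the only possibly-set bit of %x that is not covered by the constant is the sign bit, and the borrow out of the sign bit is discarded by the fixed-width subtraction.

define i8 @sub_ignore_msb_borrow(i8 %a) {
  %x = and i8 %a, -128   ; %x is either 0 or 0x80
  %r = sub i8 64, %x     ; fails the current check (0x80 is not a subset of 64),
                         ; but still equals xor i8 %x, 64 for both values of %x
  ret i8 %r
}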

bjope added a subscriber: bjope.Jul 11 2022, 1:08 PM

Hi @spatel

I've seen some regressions with this (or maybe with the similar update in instcombine that replaces sub by xor). Those patterns typically involve a sub that is being used as an index in a GEP. And I think that both SCEV for regular IR and our downstream machine IR scalar evolution are having a hard time understanding that the xor is a subtract in disguise. So instead of ending up with a tight loop with a negative stride for the memory accesses, we now end up with xor operations inside the loop. Not quite sure how to deal with the regressions. Maybe the scalar evolution implementations can be improved here. Or maybe our downstream ISel needs to select sub instead of xor (if the reverse transform is easy).

My target has lots of different instructions that involve add/sub. We can fold it into "multiply and add/subtract", we can fold it into addressing modes as an offset or by using a post-update on the pointer, and we can do add/subtract for both "general purpose" registers and for "pointer" registers. However, logical operations are more limited (especially when it comes to pointer arithmetic, since we will also end up moving values between GPRs that can do the logical operations and the pointer registers that can be used for addressing).

Since this is one instruction instead of another single instruction (we do not reduce the number of instructions in these combines), I'm interested to understand what deems an xor to be better than a sub. Are logical operations considered "better" than arithmetic operations in general, or what is the rule?

> Hi @spatel
>
> I've seen some regressions with this (or maybe with the similar update in instcombine that replaces sub by xor). Those patterns typically involve a sub that is being used as an index in a GEP. And I think that both SCEV for regular IR and our downstream machine IR scalar evolution are having a hard time understanding that the xor is a subtract in disguise. So instead of ending up with a tight loop with a negative stride for the memory accesses, we now end up with xor operations inside the loop. Not quite sure how to deal with the regressions. Maybe the scalar evolution implementations can be improved here. Or maybe our downstream ISel needs to select sub instead of xor (if the reverse transform is easy).
>
> My target has lots of different instructions that involve add/sub. We can fold it into "multiply and add/subtract", we can fold it into addressing modes as an offset or by using a post-update on the pointer, and we can do add/subtract for both "general purpose" registers and for "pointer" registers. However, logical operations are more limited (especially when it comes to pointer arithmetic, since we will also end up moving values between GPRs that can do the logical operations and the pointer registers that can be used for addressing).
>
> Since this is one instruction instead of another single instruction (we do not reduce the number of instructions in these combines), I'm interested to understand what deems an xor to be better than a sub. Are logical operations considered "better" than arithmetic operations in general, or what is the rule?

Thanks for letting me know. The main codegen motivation for this transform is that it can allow using an immediate-form xor instruction rather than a separate load of the immediate when used as Op0 of a subtract. That kind of improvement is seen in the RISCV and AArch64 diffs. Our folds for bitwise logic tend to be better than sub too, and known-bits / demanded-bits are also easier with xor. The bit-tracking improvement is why I figured we should also do this in instcombine.

But it was already a borderline codegen transform because of the AMDGPU diffs, so we do very likely need to restrict this with a TLI hook. If you have examples of the regressions you're seeing that can be translated to an in-tree target, that would be great.
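To make the immediate-form point concrete (my own sketch, not one of the patch's test diffs): on RISC-V, for example, there is no reverse-subtract-with-immediate, so the sub form needs the constant materialized into a register first, while the xor form can typically be selected as a single xori.

define i32 @sub_vs_xor_imm(i32 %a) {
  %x = and i32 %a, 63
  %r = sub i32 63, %x    ; roughly li+sub before this combine, a single xori after
  ret i32 %r
}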

bjope added a comment.Jul 11 2022, 3:16 PM

Not really sure if this ends up as a regression or not, but with this source we can see some differences if, for example, using hexagon or systemz as targets.
(This example is related to doing sub->xor in instcombine, so it is not exactly the patch in this review.)

Input C program (sub_xor.c):

static unsigned ARR[100];

extern void populate(unsigned*);

unsigned foo(unsigned *a) {
    populate(ARR);
    unsigned sum = 0;
    for (int j = 1; j < 4; ++j) {
        unsigned *ptr = &ARR[99];
        for (int i = 0; i < 70; ++i) {
            sum += j * *(ptr - i);
        }
    }
    return sum;
}

Compile with clang -target hexagon -O2 sub_xor.c -o sub_xor.ll -emit-llvm -S and the only difference will be some subs ending up as xors (depending on whether the instcombine transform from commit 79bb915fb60b2cd2 is reverted or not).
If we also compare the result of llc -O2 sub_xor.ll -o - -print-after-all, given those two different versions of sub_xor.ll from the prior step, we can see that Loop Strength Reduce can eliminate the sub but it does not eliminate the xor.

Well, this is the best in-tree example I have so far (while aiming at having a full test from C code; I guess this could be reduced to a much smaller test just running LSR to show that we get a different result for sub vs. xor).

> Not really sure if this ends up as a regression or not, but with this source we can see some differences if, for example, using hexagon or systemz as targets.
> (This example is related to doing sub->xor in instcombine, so it is not exactly the patch in this review.)

Thanks for the example. The instcombine variation of this fold has a different problem that I noticed in another example.
Can you try the patch below and see if that fixes the regressions for your target?

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 535a7736454c..c51ce0e8d1c3 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -1966,13 +1966,6 @@ Instruction *InstCombinerImpl::visitSub(BinaryOperator &I) {
       return BinaryOperator::CreateAdd(X, ConstantExpr::getSub(C, C2));
   }
 
-  // If there's no chance any bit will need to borrow from an adjacent bit:
-  // sub C, X --> xor X, C
-  const APInt *Op0C;
-  if (match(Op0, m_APInt(Op0C)) &&
-      (~computeKnownBits(Op1, 0, &I).Zero).isSubsetOf(*Op0C))
-    return BinaryOperator::CreateXor(Op1, Op0);
-
   {
     Value *Y;
     // X-(X+Y) == -Y    X-(Y+X) == -Y
@@ -2231,7 +2224,20 @@ Instruction *InstCombinerImpl::visitSub(BinaryOperator &I) {
         I, Builder.CreateIntrinsic(Intrinsic::ctpop, {I.getType()},
                                    {Builder.CreateNot(X)}));
 
-  return TryToNarrowDeduceFlags();
+  if (Instruction *R = TryToNarrowDeduceFlags())
+    return R;
+
+  // If there's no chance any bit will need to borrow from an adjacent bit:
+  // sub C, X --> xor X, C
+  // Avoid this fold if the sub has no-wrap flags because that could be an
+  // information-losing transform that we cannot recover from.
+  const APInt *Op0C;
+  if (!I.hasNoSignedWrap() && !I.hasNoUnsignedWrap() &&
+      match(Op0, m_APInt(Op0C)) &&
+      (~computeKnownBits(Op1, 0, &I).Zero).isSubsetOf(*Op0C))
+    return BinaryOperator::CreateXor(Op1, Op0);
+
+  return nullptr;
 }
 
 /// This eliminates floating-point negation in either 'fneg(X)' or
foad added a comment.Jul 12 2022, 8:49 AM
> +  // If there's no chance any bit will need to borrow from an adjacent bit:
> +  // sub C, X --> xor X, C
> +  // Avoid this fold if the sub has no-wrap flags because that could be an
> +  // information-losing transform that we cannot recover from.
> +  const APInt *Op0C;
> +  if (!I.hasNoSignedWrap() && !I.hasNoUnsignedWrap() &&
> +      match(Op0, m_APInt(Op0C)) &&
> +      (~computeKnownBits(Op1, 0, &I).Zero).isSubsetOf(*Op0C))
> +    return BinaryOperator::CreateXor(Op1, Op0);

If there's no chance any bit will need to borrow from an adjacent bit, then there is no unsigned wrap by definition, so I'm not sure losing the nuw flag would actually lose any information.

> If there's no chance any bit will need to borrow from an adjacent bit, then there is no unsigned wrap by definition, so I'm not sure losing the nuw flag would actually lose any information.

Yes, good point. That suggests that we should also improve later passes to recognize xor as sub when profitable.

The test that I was looking at where I don't think we can recover is:

define i1 @test_negative_combined_sub_signed_overflow(i8 %x) {
; CHECK-LABEL: @test_negative_combined_sub_signed_overflow(
; CHECK-NEXT:    ret i1 false
;
  %y = sub nsw i8 127, %x
  %z = icmp slt i8 %y, -1
  ret i1 %z
}

If we convert that to xor (after applying the equivalent of the SDAG patch in d0eec5f7e787), then we would have:

%y = xor i8 %x, 127
%z = icmp slt i8 %y, -1

And now we can't fold the whole thing to 'false' anymore (with nsw, %x must be non-negative, so %y stays in the range [0, 127] and is never signed-less-than -1; the xor form loses that information).

foad added a comment.Jul 12 2022, 10:00 AM

Right, my argument only applies for nuw. I agree that losing nsw can still lose information.

bjope added a comment.Jul 12 2022, 1:37 PM

> Not really sure if this ends up as a regression or not, but with this source we can see some differences if, for example, using hexagon or systemz as targets.
> (This example is related to doing sub->xor in instcombine, so it is not exactly the patch in this review.)
>
> Thanks for the example. The instcombine variation of this fold has a different problem that I noticed in another example.
> Can you try the patch below and see if that fixes the regressions for your target?

I reran some benchmarks with that patch, and the cycle counts were restored in those tests that earlier showed a regression. So that is great.
There is still a small fear that xor is harder for us to lower into machine IR compared to sub (considering register constraints, etc.). But I guess that is something we should deal with downstream somehow (investigate whether we can select xor as a sub when possible).