This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Combine OR as ADD when no common bits are set
ClosedPublic

Authored by bjope on Mar 25 2019, 2:45 AM.

Details

Summary

The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.

This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).

To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.
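The premise behind the canonicalization can be sanity-checked with plain integers; this is an illustrative sketch, not LLVM code:

```python
# When two values share no set bits, OR and ADD compute the same result:
# no bit position receives input from both operands, so no carries occur.
def no_common_bits(a: int, b: int) -> bool:
    return (a & b) == 0

a, b = 0b1010, 0b0101
assert no_common_bits(a, b)
assert (a | b) == (a + b) == 0b1111
```

This is why value tracking (known-bits analysis) is enough to treat such an OR as an ADD for combining purposes.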

Diff Detail

Repository
rL LLVM

Event Timeline

bjope created this revision.Mar 25 2019, 2:45 AM
bjope added a comment.Mar 25 2019, 2:51 AM

Hello reviewers! Do you think this is a good idea?

If you agree with the idea presented here, then I'll probably need some help from Hexagon regarding the llvm/test/CodeGen/Hexagon/subi-asl.ll test case (which no longer triggers "subi-asl"). Should the test case be updated? Should we still get subi-asl here?

I've mostly seen improvements for our OOT target when doing this, but for example llvm/test/CodeGen/X86/split-store.ll also exposes a case when we trigger a rewrite into using SUB.

Hello reviewers! Do you think this is a good idea?

It's an interesting idea. :)

I've mostly seen improvements for our OOT target when doing this, but for example llvm/test/CodeGen/X86/split-store.ll also exposes a case when we trigger a rewrite into using SUB.

Yes, we'd classify that as a slight regression for x86.

Do you have a sense of how many different folds we're missing in the tests where you show improvements? If it's a small number, we're probably better off just duplicating that code inside 'visitOR', so we don't have to deal with the regressions.

bjope added a comment.Mar 26 2019, 8:31 AM

Hello reviewers! Do you think this is a good idea?

It's an interesting idea. :)

I've mostly seen improvements for our OOT target when doing this, but for example llvm/test/CodeGen/X86/split-store.ll also exposes a case when we trigger a rewrite into using SUB.

Yes, we'd classify that as a slight regression for x86.

Isn't split-store.ll showing an improvement (we get one subb instead of andb+orb)?

However, signbit-shift.ll might show a regression (since we get one more instruction and use an extra register). That said, I'm not familiar enough with the vector instructions to know whether it really is a regression (maybe those movdqa instructions are easier to schedule, or have shorter latency, or something).

Do you have a sense of how many different folds we're missing in the tests where you show improvements? If it's a small number, we're probably better off just duplicating that code inside 'visitOR', so we don't have to deal with the regressions.

We have lots of selection patterns using "add", for example patterns involving addressing modes, multiply-and-add, etc. Currently our OOT target uses some hooks to rewrite ADD->OR in the first rounds of the DAG combiner, and then in the last DAG combiner run we instead rewrite OR->ADD. That is a little bit hacky, so I'm trying to avoid it and either duplicate patterns (detecting or-with-no-common-bits in tablegen patterns) or rewrite to "add" in the PreprocessISelDAG hook. While working on that I noticed lots of regressions caused by no longer doing the DAG combines that would have triggered had we used ADD instead of OR in the last DAG combine run.

Examples:

  • not triggering the reassociation like ((X + C) + Y) => ((X + Y) + C) resulted in some regressions
  • not triggering the combines involving SUB such as ((0 - A) + B) => (B - A) resulted in some regressions
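Both folds in the list above are plain integer identities, so they remain valid when an OR with no common bits stands in for the ADD; a small sketch with hand-picked disjoint bit patterns (illustrative values, not from the patch):

```python
# Reassociation ((X + C) + Y) -> ((X + Y) + C), where the outer node may
# have been canonicalized to an OR: with pairwise-disjoint bits the OR,
# the ADD, and the reassociated form all agree.
X, C, Y = 0b1000, 0b0001, 0b0100
lhs = X + C                     # may appear as (X | C) in the DAG
assert (lhs & Y) == 0           # no common bits with Y
assert (lhs | Y) == (lhs + Y) == ((X + Y) + C)

# The SUB combine ((0 - A) + B) -> (B - A) is ordinary algebra.
A, B = 3, 10
assert ((0 - A) + B) == (B - A)
```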

I can try to make it more selective (duplicating some specific folds). I actually started out that way, but then I figured that it might be better to do all possible folds from visitADD.

Normally we try to do most folds in visitADD before we rewrite into OR (so that should be the normal case when we have ADD in the input). But if, for example, opt has already rewritten ADD->OR so that we have OR in the input, we never even try those folds. This makes the DAG combiner sensitive to the input form, IMHO. This patch could amend that, but if I start to be more selective we will most likely still end up with different results for semantically equivalent input.
Note however that I'm not only focusing on being less sensitive to whether ADD or OR has been used in the input to SelectionDAG. Some of the problems I've seen are related to the order of combines/lowering, such as doing the ADD->OR rewrite before other simplifications that would have triggered the folds in visitADD, for example if we had delayed the ADD->OR rewrite until after legalization.

Hello reviewers! Do you think this is a good idea?

It's an interesting idea. :)

I've mostly seen improvements for our OOT target when doing this, but for example llvm/test/CodeGen/X86/split-store.ll also exposes a case when we trigger a rewrite into using SUB.

Yes, we'd classify that as a slight regression for x86.

Isn't split-store.ll showing an improvement (we get one subb instead of andb+orb)?

Yes - sorry, I reversed that with signbit-shift.ll. So we would call signbit-shift.ll a slight regression because of the extra mov instruction. We are probably missing a generic combine. Might be similar to the Hexagon diff?

However, signbit-shift.ll might show a regression (since we get one more instruction and use an extra register). However, I'm not that familiar with the vector instructions to understand if it really is a regression (maybe those movdqa instructions are easier to schedule, or having shorter latency or something).

The Hexagon testcase can be fixed; it's probably just a matter of changing the selection pattern for the instruction we're checking.

Hello reviewers! Do you think this is a good idea?

It's an interesting idea. :)

I've mostly seen improvements for our OOT target when doing this, but for example llvm/test/CodeGen/X86/split-store.ll also exposes a case when we trigger a rewrite into using SUB.

Yes, we'd classify that as a slight regression for x86.

Isn't split-store.ll showing an improvement (we get one subb instead of andb+orb)?

Yes - sorry, I reversed that with signbit-shift.ll. So we would call signbit-shift.ll a slight regression because of the extra mov instruction. We are probably missing a generic combine. Might be similar to the Hexagon diff?

It seems to be foldAddSubMasked1 that now does the fold

(or (and (AssertSext X, i1), 1), C) --> (sub C, X)

which looks pretty nice in the DAG, but at least in this case (X86 + vector ops) the load of the splat vector C from the constant pool can't be folded into the PSUBD, so we get a separate MOVAPS for that:

  t24: v4i32,ch = MOVAPSrm<Mem:(load 16 from constant-pool)> Register:i64 $rip, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstantPool:i32<<4 x i32> <i32 42, i32 42, i32 42, i32 42>> 0, Register:i32 $noreg, t0
  t21: v4i32 = PCMPGTDrr t2, V_SETALLONES:v4i32
t20: v4i32 = PSUBDrr t24, t21

instead of this output from isel, where the load from the constant pool is folded into the POR:

    t20: v4i32 = PCMPGTDrr t2, V_SETALLONES:v4i32
  t22: v4i32 = PSRLDri t20, TargetConstant:i8<31>
t15: v4i32,ch = PORrm<Mem:(load 16 from constant-pool)> t22, Register:i64 $rip, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstantPool:i32<<4 x i32> <i32 42, i32 42, i32 42, i32 42>> 0, Register:i32 $noreg, t0

AFAICT, the code after ISel looks better when we do the rewrite into using sub (the height of the DAG is reduced by one). However, since both PCMPGTD and PSUBD have tied operands we lose out during register allocation, and we have to insert an extra move to pass the result in xmm0. I do not think that we can detect that in the generic DAGCombiner. And if the test case had just been a little bit different (receiving %x in xmm1, if that is possible), then there wouldn't have been an extra MOVDQA.

So is this regression anything to bother about? (I can add another test case where %x isn't taken from the first argument.)
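For reference, the arithmetic behind the foldAddSubMasked1 rewrite above checks out with plain integers; a sketch using the splat constant 42 from the test, not LLVM code:

```python
# foldAddSubMasked1: (or (and (AssertSext X, i1), 1), C) --> (sub C, X).
# A sign-extended i1 X is either 0 or -1, so (X & 1) is 0 or 1, and when
# C has its low bit clear the OR cannot carry:
#   X =  0: (0 & 1)  | C == C     == C - 0
#   X = -1: (-1 & 1) | C == C | 1 == C + 1 == C - (-1)
C = 42                      # low bit clear, as in the splat <42,42,42,42>
for X in (0, -1):
    assert ((X & 1) | C) == C - X
```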

bjope added a comment.Mar 28 2019, 6:14 AM

The Hexagon testcase can be fixed; it's probably just a matter of changing the selection pattern for the instruction we're checking.

I've done some more analysis for llvm/test/CodeGen/Hexagon/subi-asl.ll.

The first DAG combine that makes a difference is that with this patch we fold

    t30: i32 = shl t11, Constant:i32<1>
  t31: i32 = sub Constant:i32<0>, t30
t28: i32 = or t9, t31

into

  t30: i32 = shl t11, Constant:i32<1>
t32: i32 = sub t9, t30

IMO that looks like a nice fold.
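The fold is sound because t31 = (sub 0, t30) has no bits in common with t9 here, so the OR is really an ADD and t9 + (0 - t30) == t9 - t30. A sketch with i32 two's-complement arithmetic, assuming for illustration that t9 is confined to the low bit (t30 = shl t11, 1 is always even, so its negation has the low bit clear):

```python
MASK = 0xFFFFFFFF                        # model i32 wrap-around
for t11 in range(16):
    t30 = (t11 << 1) & MASK              # shl t11, 1 -> even
    t31 = (0 - t30) & MASK               # sub 0, t30 -> also even
    for t9 in (0, 1):                    # illustrative low-bit value
        assert (t9 & t31) == 0           # no common bits set
        assert (t9 | t31) == (t9 - t30) & MASK   # or == sub
```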

A few combines later we end up with the following before selection:

Address tree balanced selection DAG:SelectionDAG has 27 nodes:
  t0: ch = EntryToken
    t44: i32 = HexagonISD::CONST32_GP TargetGlobalAddress:i32<i32* @this_insn_number> 0
  t11: i32,ch = load<(dereferenceable load 4 from @this_insn_number)> t0, t44, undef:i32
    t2: i32,ch = CopyFromReg t0, Register:i32 %1
  t41: i32,ch = load<(load 2 from %ir.cgep59, align 4), sext from i16> t0, t2, undef:i32
  t30: i32 = shl t11, Constant:i32<1>
    t18: ch = TokenFactor t41:1, t11:1
    t17: i32,ch = CopyFromReg t0, Register:i32 %0
  t23: ch,glue = CopyToReg t18, Register:i32 $r0, t17
        t37: i1 = setcc t41, Constant:i32<56>, seteq:ch
        t48: i32 = sub Constant:i32<1>, t30
        t47: i32 = sub Constant:i32<0>, t30
      t49: i32 = select t37, t48, t47
    t25: ch,glue = CopyToReg t23, Register:i32 $r1, t49, t23:1
  t27: ch,glue = HexagonISD::TC_RETURN t25, TargetGlobalAddress:i32<void (%struct.rtx_def*, i32)* @reg_is_born> 0, Register:i32 $r0, Register:i32 $r1, RegisterMask:Untyped

and since there are now two uses of t30, the patterns for selecting subi_asl won't trigger (they check that there is only one use of the shl).

Without the patch we instead would get

Address tree balanced selection DAG:SelectionDAG has 27 nodes:
  t0: ch = EntryToken
    t40: i32 = HexagonISD::CONST32_GP TargetGlobalAddress:i32<i32* @this_insn_number> 0
  t11: i32,ch = load<(dereferenceable load 4 from @this_insn_number)> t0, t40, undef:i32
    t2: i32,ch = CopyFromReg t0, Register:i32 %1
  t38: i32,ch = load<(load 2 from %ir.cgep59, align 4), sext from i16> t0, t2, undef:i32
    t30: i32 = shl t11, Constant:i32<1>
  t31: i32 = sub Constant:i32<0>, t30
    t18: ch = TokenFactor t38:1, t11:1
    t17: i32,ch = CopyFromReg t0, Register:i32 %0
  t23: ch,glue = CopyToReg t18, Register:i32 $r0, t17
        t35: i1 = setcc t38, Constant:i32<56>, seteq:ch
        t41: i32 = or t31, Constant:i32<1>
      t42: i32 = select t35, t41, t31
    t25: ch,glue = CopyToReg t23, Register:i32 $r1, t42, t23:1
  t27: ch,glue = HexagonISD::TC_RETURN t25, TargetGlobalAddress:i32<void (%struct.rtx_def*, i32)* @reg_is_born> 0, Register:i32 $r0, Register:i32 $r1, RegisterMask:Untyped

So the height of the DAG (looking at the operands of the select) seems to be one less with this patch: (select ... (sub (shl)) ...) instead of (select ... (or (sub (shl ..))) ...).
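The two DAGs compute the same select operands: t30 = shl t11, 1 is always even, so t31 = (sub 0, t30) has its low bit clear, and OR-ing in 1 is the same as adding 1. A sketch in i32 two's-complement arithmetic:

```python
MASK = 0xFFFFFFFF                        # model i32 wrap-around
for t11 in range(64):
    t30 = (t11 << 1) & MASK              # shl t11, 1 -> even
    t47 = (0 - t30) & MASK               # sub 0, t30 (false arm, both DAGs)
    t48 = (1 - t30) & MASK               # sub 1, t30 (true arm, with patch)
    t41 = t47 | 1                        # or t31, 1  (true arm, without patch)
    assert t41 == t48                    # the select arms agree
```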

If you think that the old codegen (using subi_asl) actually is superior here, then we might need a pattern doing

(sub 1, (shl x, c)) => (setbit_i (subi_asl_ri 0, x, c), 0)

But I guess with a predicate to only do it if the shl has multiple uses, because otherwise

(sub 1, (shl x, c)) => (subi_asl_ri 1, x, c)

would be better.
I actually think that we need even more predicates to distinguish when the setbit_i solution is better, because it does not look optimal (at least not with my limited knowledge about Hexagon).

bjope updated this revision to Diff 193124.Apr 1 2019, 10:45 AM

Added an extra test in test/CodeGen/X86/signbit-shift.ll to show that add_zext_ifpos_vec_splat only turns up as a regression due to register constraints.

The new add_zext_ifpos_vec_splat2 shows an improvement, since we get

pcmpeqd %xmm0, %xmm0
pcmpgtd %xmm0, %xmm1
movdqa .LCPI3_0(%rip), %xmm0 # xmm0 = [42,42,42,42]
psubd %xmm1, %xmm0

instead of

movdqa %xmm1, %xmm0
pcmpeqd %xmm1, %xmm1
pcmpgtd %xmm1, %xmm0
psrld $31, %xmm0
por .LCPI3_0(%rip), %xmm0

spatel added a comment.Apr 2 2019, 6:22 AM

Thanks for expanding on the x86 example. I agree now that it's a good idea to try these optimizations.
@RKSimon may know from looking, but this might mean we can remove the more specific fold from rL357351 ?

Thanks for expanding on the x86 example. I agree now that it's a good idea to try these optimizations.
@RKSimon may know from looking, but this might mean we can remove the more specific fold from rL357351 ?

IIRC that transform was being done pre-DAG - so the OR was already stuck on the wrong side of the zext - would be interested to see though (we still don't match PAVGB if it uses OR after ZEXT).

Something I didn't do but I think would be useful is to add a 'bool SelectionDAG::isAddLike(SDValue N, SDValue &Op0, SDValue &Op1)' helper that will match against ADD/OR/SHL etc. that all perform some form of addition.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2230 ↗(On Diff #193124)

I realise it reduces the patch - but would it be better to move this in the new visitAdd?

2236 ↗(On Diff #193124)

move this in the new visitAdd?

bjope updated this revision to Diff 193514.Apr 3 2019, 8:56 AM

Updated according to suggestion from @RKSimon, i.e. moving the combines only
done for ADD into the visitADD function.

I also renamed the old visitADDLike method to visitADDLikeCommutative, and the
new method visitADDorNoCommonBitsOR (from earlier version of this patch) is
now called visitADDLike.

bjope added a comment.Apr 9 2019, 6:29 AM

Friendly ping!

lebedev.ri added inline comments.
llvm/test/CodeGen/X86/signbit-shift.ll
31–58 ↗(On Diff #193514)

Aren't these two tests identical?

bjope added inline comments.Apr 9 2019, 7:59 AM
llvm/test/CodeGen/X86/signbit-shift.ll
31–58 ↗(On Diff #193514)

add_zext_ifpos_vec_splat2 is receiving %x in the second argument (so I guess it will be mapped to xmm1).

That second test was added to show that what seems to be a degradation in add_zext_ifpos_vec_splat happens due to register constraints (resulting in an extra movdqa at the end). Maybe I should omit the new test before commit. After all, it does not serve any extra purpose when it comes to testing "signbit-shift".

lebedev.ri added inline comments.Apr 9 2019, 8:33 AM
llvm/test/CodeGen/X86/signbit-shift.ll
31–58 ↗(On Diff #193514)

That extra move may be a regalloc problem (or maybe that is already optimal).
The 'real' problem here is that we now have that extra movdqa {{.*#+}} xmm1 = [42,42,42,42],
I guess because psubd can only take a memory operand in the other operand.
I'd drop the extra test.

bjope updated this revision to Diff 194645.Apr 11 2019, 1:28 AM

Removed the add_zext_ifpos_vec_splat2 test from test/CodeGen/X86/signbit-shift.ll again (as suggested by @lebedev.ri). That test was only added to demonstrate why add_zext_ifpos_vec_splat gets an extra movdqa with this patch (due to unfortunate reg constraints), but it did not contribute anything new when it comes to testing "signbit-shift".

If there are still worries about this, then maybe I can do the visitADD refactoring in a separate patch? And then I can limit this patch to the part where we start to use visitADDLike from visitOR.

spatel accepted this revision.Apr 18 2019, 9:03 AM
spatel added subscribers: rampitec, arsenm.

LGTM (see inline for a couple of nits) - but I'd prefer that someone with AMDGPU knowledge (@arsenm @nhaehnle @rampitec ?) confirm those diffs too.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2106 ↗(On Diff #194645)

typo: necessarily

2107 ↗(On Diff #194645)

typo: known -> know

This revision is now accepted and ready to land.Apr 18 2019, 9:03 AM
bjope added a comment.Apr 19 2019, 3:06 AM

LGTM (see inline for a couple of nits) - but I'd prefer that someone with AMDGPU knowledge (@arsenm @nhaehnle @rampitec ?) confirm those diffs too.

Thanks! I won't push this until next week anyway.

LGTM for amdgpu test changes.

This revision was automatically updated to reflect the committed changes.