This is an archive of the discontinued LLVM Phabricator instance.

X86: More efficient codegen for 64-bit compares on 32-bit target
ClosedPublic

Authored by hans on Nov 9 2015, 4:11 AM.


Details

Reviewers

majnemer
mkuper
DavidKreitzer

Commits

rGdcc250045274: X86: More efficient legalization of wide integer compares
rL253572: X86: More efficient legalization of wide integer compares

Summary

This patch changes the lowering of 64-bit integer compares on 32-bit x86 to be more efficient.

Example:

define i32 @test_slt(i64 %a, i64 %b) {
entry:
  %cmp = icmp slt i64 %a, %b
  br i1 %cmp, label %bb1, label %bb2
bb1:
  ret i32 1
bb2:
  ret i32 2
}

Before this patch:

test_slt:
        movl    4(%esp), %eax
        movl    8(%esp), %ecx
        cmpl    12(%esp), %eax
        setae   %al
        cmpl    16(%esp), %ecx
        setge   %cl
        je      .LBB2_2
        movb    %cl, %al
.LBB2_2:
        testb   %al, %al
        jne     .LBB2_4
        movl    $1, %eax
        retl
.LBB2_4:
        movl    $2, %eax
        retl

After this patch:

test_slt:
        movl    4(%esp), %eax
        movl    8(%esp), %ecx
        cmpl    12(%esp), %eax
        sbbl    16(%esp), %ecx
        jge     .LBB1_2
        movl    $1, %eax
        retl
.LBB1_2:
        movl    $2, %eax
        retl

On a 32-bit Clang bootstrap, this results in a 19 KB binary size reduction.
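To see why the cmpl/sbbl/jge sequence above is a correct signed 64-bit compare, here is a minimal C++ model of it. It is a sketch, not LLVM code: Flags, sbb32 and slt64_via_sbb are hypothetical names, and the model assumes the usual x86 semantics, where SUB/SBB set CF on unsigned borrow and OF on signed overflow, and jl/jge test SF != OF and SF == OF respectively.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the flags a 32-bit x86 SUB/SBB leaves behind.
struct Flags {
  bool cf;  // unsigned borrow out
  bool sf;  // sign bit of the 32-bit result
  bool of;  // signed overflow
};

// Models a 32-bit SBB computing a - b - borrow_in (modulo 2^32).
// A plain SUB/CMP is the case borrow_in == false.
Flags sbb32(uint32_t a, uint32_t b, bool borrow_in) {
  uint32_t in = borrow_in ? 1u : 0u;
  uint32_t result = a - b - in;
  int64_t exact = int64_t(int32_t(a)) - int64_t(int32_t(b)) - in;
  Flags f;
  f.cf = uint64_t(a) < uint64_t(b) + in;
  f.sf = (result >> 31) != 0;
  f.of = exact != int64_t(int32_t(result));
  return f;
}

// Models "cmpl lo; sbbl hi; jl": true iff a < b as signed 64-bit values.
bool slt64_via_sbb(int64_t a, int64_t b) {
  uint64_t ua = uint64_t(a), ub = uint64_t(b);
  Flags lo = sbb32(uint32_t(ua), uint32_t(ub), false);              // cmpl: only CF is used
  Flags hi = sbb32(uint32_t(ua >> 32), uint32_t(ub >> 32), lo.cf);  // sbbl
  return hi.sf != hi.of;  // "l" condition; jge is taken when this is false
}

// Cross-checks the model against the native comparison on edge cases.
void check_slt64_via_sbb() {
  const int64_t vals[] = {0, 1, -1, INT64_MIN, INT64_MAX,
                          int64_t(1) << 32, -(int64_t(1) << 32),
                          0xFFFFFFFFLL, 0x80000000LL, -0x80000000LL};
  for (int64_t a : vals)
    for (int64_t b : vals)
      assert(slt64_via_sbb(a, b) == (a < b));
}
```

The key point is that SF != OF after the SBB gives the sign of the exact (non-wrapped) high-half difference minus the borrow, which is negative exactly when a < b.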


Event Timeline

hans updated this revision to Diff 39675. Nov 9 2015, 4:11 AM

hans retitled this revision to "X86: More efficient codegen for 64-bit compare-and-branch".

hans updated this object.

hans added reviewers: mkuper, majnemer.

hans added subscribers: llvm-commits, hansw.

Hi Hans,

For the eq version, I find it a bit surprising that the new code is faster, but if benchmarking says it is, who am I to argue. Adding Dave to the review for another opinion.

For the lt version, the new code definitely looks better than what we had before. It looks like there's another option though, which is more similar in spirit to the old code than to the new, but looks much nicer. This is what ICC produces:

test(long long, long long):
        movl      4(%esp), %eax 
        subl      12(%esp), %eax
        movl      8(%esp), %edx 
        sbbl      16(%esp), %edx
        jge       ..B1.3        
        movl      $1, %eax      
        ret                     
..B1.3:                         
        movl      $2, %eax      
        ret

Do you think it may be worth lowering the new pseudo to that, instead of the proposed sequence?

Michael

lib/Target/X86/X86ISelLowering.cpp
15301	I'm not a huge fan of this, but producing the pseudo in a target-specific pre-legalization DAG combine sounds like it may cause too many problems due to making the comparison opaque to other early combines.
21247	Why is this guaranteed?

mkuper added a reviewer: DavidKreitzer. Nov 9 2015, 5:37 AM

Thanks for the quick reply!

In D14496#285017, @mkuper wrote:

Do you think it may be worth lowering the new pseudo to that, instead of the proposed sequence?

Ooh, that looks very nice indeed. Thinking about cmp as a subtraction, this wide subtraction seems like a natural way to do it. I wonder why MSVC doesn't do it this way.

By the way, is there a reason ICC is using %edx instead of %eax for the second move and sbb? It seems to me that the register pressure here should be the same as for my proposed sequence, i.e. only one register needed.

Maybe we could also use this for the equality comparisons?

I'm mainly coming at this from a binary size perspective. Your suggestion is 4 bytes shorter than my test_slt. It's also 2 bytes shorter than my test_eq, so I like it :-)

ICC doesn't do this for equality comparisons.
In fact, it produces code that's very close to what we currently have in clang (before your patch).

As to eax/edx - I can't see a good reason for that. Doesn't necessarily mean there isn't one, of course, just that I can't see one...

jevinskie added a subscriber: jevinskie. Nov 9 2015, 9:39 AM

(1) The subl/sbbl sequence cannot distinguish between greater-than and equal-to, so it doesn't work for ==/!= without an additional instruction, at which point it's no better than the current sequence. This also means you have to be careful about how you order the operands depending on the condition. For (a < b), you'd generate what Michael showed:

test(long long, long long):
        movl      4(%esp), %eax 
        subl      12(%esp), %eax
        movl      8(%esp), %edx 
        sbbl      16(%esp), %edx
        jge       ..B1.3        
        movl      $1, %eax      
        ret                     
..B1.3:                         
        movl      $2, %eax      
        ret

For (a >= b), you can simply replace the "jge" with "jl". For (a > b), you have to reverse the sense of the subtraction like this:

test(long long, long long):
        movl     12(%esp), %eax 
        subl      4(%esp), %eax
        movl      16(%esp), %edx 
        sbbl      8(%esp), %edx
        jge       ..B1.3        
        movl      $1, %eax      
        ret                     
..B1.3:                         
        movl      $2, %eax      
        ret

(a <= b) would have the same operand order but with "jl" instead of "jge".
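David's operand-ordering rules reduce to one "subtract, then test the sign" primitive. The C++ below is a hedged sketch under the same assumptions as the assembly above: sub_sbb_lt is a hypothetical helper that computes the x86 "l" outcome (SF != OF after subl/sbbl) arithmetically, and the four wrappers show which conditions use it directly, which only invert the branch, and which swap the operands.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the outcome of "subl lo; sbbl hi; jl" for a and b,
// computed as the sign of the exact high-half difference minus the borrow.
bool sub_sbb_lt(int64_t a, int64_t b) {
  uint64_t ua = uint64_t(a), ub = uint64_t(b);
  int64_t borrow = uint32_t(ua) < uint32_t(ub) ? 1 : 0;  // CF from subl
  int64_t a_hi = int32_t(uint32_t(ua >> 32));            // signed high halves
  int64_t b_hi = int32_t(uint32_t(ub >> 32));
  return a_hi - b_hi - borrow < 0;                       // SF != OF after sbbl
}

bool slt(int64_t a, int64_t b) { return sub_sbb_lt(a, b); }   // a - b, test "l"
bool sge(int64_t a, int64_t b) { return !sub_sbb_lt(a, b); }  // a - b, test "ge"
bool sgt(int64_t a, int64_t b) { return sub_sbb_lt(b, a); }   // b - a, test "l"
bool sle(int64_t a, int64_t b) { return !sub_sbb_lt(b, a); }  // b - a, test "ge"

// Sanity check: every relation agrees with the native operators.
void check_relations() {
  const int64_t vals[] = {0, 1, -1, INT64_MIN, INT64_MAX, int64_t(1) << 32};
  for (int64_t a : vals)
    for (int64_t b : vals) {
      assert(slt(a, b) == (a < b) && sge(a, b) == (a >= b));
      assert(sgt(a, b) == (a > b) && sle(a, b) == (a <= b));
    }
}
```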

(2) I think there is no clear winner between the current 1-branch implementation of ==/!= and the proposed 2-branch implementation. The branch prediction effect will be data dependent, and the context will determine whether the extra branch or the longer dependence chain of the current implementation is more harmful. For example, one situation where the 1-branch implementation will shine is when the compare operands almost always compare unequal, but the lower bits often compare equal.
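The two equality strategies can be contrasted in C++. This is only an illustrative sketch: the function names are hypothetical, and the 2-branch version assumes the high halves are tested first, which is an assumption rather than something stated in the review. Both return the same answer; they differ in how many data-dependent branches the generated code needs, which is the trade-off described above. A C++ compiler is free to emit either shape for either function, so the correspondence is conceptual.

```cpp
#include <cassert>
#include <cstdint>

// 1-branch form: fold both halves together, then test once.
// This mirrors (setcc (or (xor hi1 hi2) (xor lo1 lo2)) 0 eq).
bool eq64_one_branch(uint64_t a, uint64_t b) {
  uint32_t hi = uint32_t(a >> 32) ^ uint32_t(b >> 32);
  uint32_t lo = uint32_t(a) ^ uint32_t(b);
  return (hi | lo) == 0;
}

// 2-branch form: conceptually one branch on the high halves, and one on
// the low halves only if the high halves matched.
bool eq64_two_branch(uint64_t a, uint64_t b) {
  if (uint32_t(a >> 32) != uint32_t(b >> 32))
    return false;
  return uint32_t(a) == uint32_t(b);
}

// Both forms agree with the native comparison.
void check_equality_forms() {
  const uint64_t vals[] = {0, 1, 0xFFFFFFFFull, 0x100000000ull,
                           0x100000001ull, ~0ull};
  for (uint64_t a : vals)
    for (uint64_t b : vals) {
      assert(eq64_one_branch(a, b) == (a == b));
      assert(eq64_two_branch(a, b) == (a == b));
    }
}
```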

(3) ICC uses both eax and edx to give the post-RA scheduler more flexibility, which is sometimes useful, sometimes not. (In Michael's code snippet, it clearly didn't accomplish anything.)

David, Michael: thank you very much for your input. Not only is the code you suggest better, but it should be easier to generate too since it doesn't change the CFG.

I've been trying to make this work today, but am not sure exactly where to put it. The approach of pattern-matching for "(setcc (or (xor hi1 hi2) (xor lo1 lo2)) 0 {eq,ne})" is brittle because some nodes can get "simplified", e.g. (setcc a 0 ugt) will become (setcc a 0 ne). We could of course try to unsimplify it, but I wonder if there's a better way.

Michael mentioned maybe we could do a target-specific DAG-combine. But that would need to be done before legalization, so the types would be illegal. Is it OK to legalize them in a combine? I don't think the combine has access to the logic for expanding integers etc?

Can I make this a target-specific legalization somehow? Where would I do that?

Or, could we make this a generic thing? We have SUBE and SUBC that would let us do subtract with borrow, but I don't think we can pick up the flags from that. Is there some way we could represent this in the generic selection dag?

I'll keep working on this tomorrow, just wanted to put down my notes in case you have any comments. The backend is still new territory for me.

Here is a new version of the patch, using the SUB-SBB method.

I'm trying a new approach, adding a custom SETCC_PARTS node. This avoids the trouble of pattern-matching, allows this to fire in more situations (e.g. not just branches, but all setcc's), and would make it easier to do the same thing for other targets.

I did try to come up with a way to make this more generic, using ISD::SUBC and SUBE, but couldn't figure out how to detect the outcome based on the result from SUBE. For signed comparisons we can just compare it against zero (a comparison that will get folded away nicely), but what to do for unsigned?
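The signed/unsigned asymmetry can be illustrated with a small C++ model of a SUBC/SUBE-style chain. SubResult, sub_with_borrow, ult64 and slt64 are hypothetical names, not LLVM APIs. For an unsigned compare, the answer is the borrow out of the high half, which is a flag rather than the SUBE value result; for a signed compare, it is the sign of the exact high-half difference. Note that this model looks at the exact difference in the signed case: testing the wrapped 32-bit value against zero would give the wrong answer when the high-half subtraction overflows, which is why the x86 "l" condition tests SF != OF rather than SF alone.

```cpp
#include <cassert>
#include <cstdint>

// One step of a multi-word subtraction: wrapped difference plus borrow-out.
struct SubResult {
  uint32_t value;
  bool borrow;
};

SubResult sub_with_borrow(uint32_t a, uint32_t b, bool borrow_in) {
  uint32_t in = borrow_in ? 1u : 0u;
  return {a - b - in, uint64_t(a) < uint64_t(b) + in};
}

// Unsigned a < b: the borrow out of the high half is the answer.
bool ult64(uint64_t a, uint64_t b) {
  SubResult lo = sub_with_borrow(uint32_t(a), uint32_t(b), false);  // SUBC
  SubResult hi = sub_with_borrow(uint32_t(a >> 32), uint32_t(b >> 32),
                                 lo.borrow);                        // SUBE
  return hi.borrow;
}

// Signed a < b: the sign of the exact (non-wrapped) high-half difference.
bool slt64(int64_t a, int64_t b) {
  uint64_t ua = uint64_t(a), ub = uint64_t(b);
  SubResult lo = sub_with_borrow(uint32_t(ua), uint32_t(ub), false);
  int64_t a_hi = int32_t(uint32_t(ua >> 32));
  int64_t b_hi = int32_t(uint32_t(ub >> 32));
  return a_hi - b_hi - (lo.borrow ? 1 : 0) < 0;
}

void check_sub_chain() {
  const uint64_t vals[] = {0, 1, 0xFFFFFFFFull, 0x100000000ull,
                           0x7FFFFFFFFFFFFFFFull, 0x8000000000000000ull, ~0ull};
  for (uint64_t a : vals)
    for (uint64_t b : vals) {
      assert(ult64(a, b) == (a < b));
      assert(slt64(int64_t(a), int64_t(b)) == (int64_t(a) < int64_t(b)));
    }
}
```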

I've removed the change to equality comparisons. It might still be worth looking at some more since it lowers the register pressure (maybe use when optimizing for size?).

DavidKreitzer added inline comments. Nov 12 2015, 3:19 PM

include/llvm/CodeGen/ISDOpcodes.h
375	wit --> with
378	I think the operand ordering ought to be [LHSLo, LHSHi, RHSLo, RHSHi] to be consistent with SHL_PARTS, et al.
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2709	Should probably delete the extra blank line.
2772	This looks reasonable to me. I am just wondering whether it would be better to refactor this such that we always generate SETCC_PARTS here regardless of whether the target has a custom lowering and then implement a default lowering that does most of what this routine is currently doing.
lib/Target/X86/X86ISelLowering.cpp
14595	Your operand ordering for SETCC_PARTS doesn't match your comment in ISDOpcodes.h. The two should at least be in sync, and I already gave my opinion on what the operand order ought to be. :-)
14625	The code sequences you are generating here look great!
test/CodeGen/X86/wide-integer-cmp.ll
39	I think you meant LHSHi here.
61	Again, this looks odd. Did you mean RHSHi here and at line 62?
80	Also RHSHi here and at line 81.

hans marked 5 inline comments as done. Nov 13 2015, 12:59 AM

hans added inline comments.

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2772	Can you provide an example of how always generating SETCC_PARTS and having a default lowering would help? One problem I see is that we'd need to legalize it too. For example a 128-bit SETCC could be legalized to 64-bit SETCC_PARTS, and on a 32-bit target we'd need to legalize that too. With the current patch, we only legalize SETCC to SETCC_PARTS for types the target has said it can efficiently lower.
lib/Target/X86/X86ISelLowering.cpp
14595	Oops. Thanks for catching this. So much for trying to document things :-) I've updated the code to use the order you suggested.
14625	Thanks to you and Michael for suggesting it :-)
test/CodeGen/X86/wide-integer-cmp.ll
80	Thanks! Not sure how I managed to get them so backwards.

Addressing David's comments.

DavidKreitzer added inline comments. Nov 13 2015, 12:43 PM

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2772	I didn't have any particular example in mind. I was just thinking about alternative design choices. The current SETCC_PARTS design seems like it is putting a little too much target-specific knowledge in this target-independent piece of type legalization. Another option you might like, in line with your earlier thinking ("I did try to come up with a way to make this more generic, using ISD::SUBC and SUBE"), is to define a new SETCCE opcode that behaves just like SETCC except that it has an additional borrow operand. So the sequence you'd generate during type legalization is "SUBC LHSLo, RHSLo" followed by "SETCCE LHSHi, RHSHi, <SUBC borrow>, <cond>". And then the SETCCE would be implemented using SBB+SETcc rather than CMP+SETcc as SETCC is currently implemented. It is probably easier and/or more natural to extend SETCCE to other targets than SETCC_PARTS. What do you think?

DavidKreitzer added inline comments. Nov 13 2015, 4:59 PM

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2772	Another advantage of the SETCCE approach is that it can be used to implement arbitrary precision integer compares. For example, an __int128 compare on x86 could be legalized to SUBC, SUBE, SUBE, SETCCE.
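The arbitrary-precision case can be modeled in a few lines of C++. This is a sketch with hypothetical names (Int128Limbs, from_i64, slt128), assuming a 128-bit value stored as four 32-bit limbs, least significant first, as on 32-bit x86. The three low limbs only propagate a borrow (SUBC, SUBE, SUBE), and the top limb alone decides the signed result (SETCCE).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit integer as four 32-bit limbs, least significant first.
struct Int128Limbs {
  uint32_t limb[4];
};

// Sign-extends a 64-bit value into four limbs.
Int128Limbs from_i64(int64_t v) {
  uint64_t u = uint64_t(v);
  uint32_t ext = v < 0 ? 0xFFFFFFFFu : 0u;
  return {{uint32_t(u), uint32_t(u >> 32), ext, ext}};
}

// Signed a < b via SUBC, SUBE, SUBE, SETCCE: the three low limbs are only
// needed for their borrow-out; their differences are discarded.
bool slt128(const Int128Limbs &a, const Int128Limbs &b) {
  bool borrow = false;  // SUBC has no incoming borrow
  for (int i = 0; i < 3; ++i)
    borrow = uint64_t(a.limb[i]) < uint64_t(b.limb[i]) + (borrow ? 1 : 0);
  // SETCCE "lt" on the top limbs: sign of top(a) - top(b) - borrow.
  int64_t a_top = int32_t(a.limb[3]);
  int64_t b_top = int32_t(b.limb[3]);
  return a_top - b_top - (borrow ? 1 : 0) < 0;
}

// Agrees with native 64-bit comparison for sign-extended inputs.
void check_slt128() {
  const int64_t vals[] = {0, 1, -1, INT64_MIN, INT64_MAX, int64_t(1) << 40};
  for (int64_t x : vals)
    for (int64_t y : vals)
      assert(slt128(from_i64(x), from_i64(y)) == (x < y));
}
```

The same loop extends to any number of limbs, which is the property that makes SETCCE attractive for arbitrary-width compares.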

Apologies for the delay.

Here is a version using the proposed SETCCE approach. I think this turned out really well, and of course the code for e.g. 128-bit compares is now very nice.

The only downside I can think of is that this gives targets less flexibility in lowering. For example, with SETCC_PARTS a target could choose to lower with a three-way branch on the high part if it wanted to. On the other hand, that flexibility meant more work; the SETCCE node should be very straightforward for targets to handle, and it handles the cases with more than two halves nicely, so I think it's the better option.

Please take a look and let me know what you think.

I really like the way this turned out! Thanks so much for doing the rework. I just had a couple minor additional suggestions.

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2772	I would expect every target to be able to support this (at which point, much of the code in this routine can be removed). Maybe it would be a good idea to add a FIXME comment to that effect.
lib/Target/X86/X86ISelLowering.cpp
14598	I would recommend against this restriction. The SETCCE operation itself makes perfect sense for other conditions. (For example, SETCCE "eq" computes op0 - op1 - carry == 0.) For the purposes of large integer compare lowering, it happens to only be useful for < and >=, but we might find other good uses for it. (I'm imagining some fancy DAG combine optimizations ...)
test/CodeGen/X86/wide-integer-cmp.ll
87	Nice! This is SO much better than the current code!
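David's SETCCE "eq" example can be made concrete. A small C++ sketch (setcce_eq and naive_eq64 are hypothetical names), assuming the semantics stated in the comment, i.e. op0 - op1 - carry == 0 at the operand width. It also shows why, as noted earlier in the review, this does not by itself give a wide equality test: the borrow from the low halves only records whether lo(a) < lo(b), so differing low halves can go unnoticed.

```cpp
#include <cassert>
#include <cstdint>

// Model of SETCCE "eq" on 32-bit operands, per the review comment:
// true iff op0 - op1 - carry == 0 (modulo 2^32).
bool setcce_eq(uint32_t op0, uint32_t op1, bool carry) {
  return op0 - op1 - (carry ? 1u : 0u) == 0;
}

// Feeds the low-half borrow into SETCCE "eq" on the high halves. This is
// NOT a 64-bit equality test: the borrow only records lo(a) < lo(b).
bool naive_eq64(uint64_t a, uint64_t b) {
  bool borrow = uint32_t(a) < uint32_t(b);  // SUBC on the low halves
  return setcce_eq(uint32_t(a >> 32), uint32_t(b >> 32), borrow);
}

// Inputs that are unequal, yet naive_eq64 reports them as equal.
void show_false_positives() {
  assert(naive_eq64(6, 5));               // low halves differ, no borrow
  assert(naive_eq64(0x100000000ull, 1));  // borrow cancels the high difference
}
```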

hans marked an inline comment as done. Nov 18 2015, 9:34 AM

hans added inline comments.

lib/Target/X86/X86ISelLowering.cpp
14598	Sounds good to me.

Addressing review comments.

I also ran into a problem when the carry value gets folded to CARRY_FALSE. Added code to handle that and a test.

Please take a look.
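The CARRY_FALSE fold in visitSETCCE is sound because a SETCCE whose carry is known to be false evaluates LHS - RHS - 0, which is exactly what SETCC compares. Below is a minimal C++ check of that claim, assuming a hypothetical four-condition subset (Cond) and 32-bit signed operands; setcc and setcce here are illustrative models, not the ISD nodes themselves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of condition codes, for illustration only.
enum class Cond { LT, GE, EQ, NE };

// Model of SETCC on signed 32-bit operands.
bool setcc(int32_t lhs, int32_t rhs, Cond cc) {
  switch (cc) {
  case Cond::LT: return lhs < rhs;
  case Cond::GE: return lhs >= rhs;
  case Cond::EQ: return lhs == rhs;
  case Cond::NE: return lhs != rhs;
  }
  return false;
}

// Model of SETCCE: the condition is evaluated on lhs - rhs - carry.
// LT/GE use the exact (signed) value; EQ/NE use the 32-bit wrapped value.
bool setcce(int32_t lhs, int32_t rhs, bool carry, Cond cc) {
  int64_t diff = int64_t(lhs) - rhs - (carry ? 1 : 0);
  switch (cc) {
  case Cond::LT: return diff < 0;
  case Cond::GE: return diff >= 0;
  case Cond::EQ: return uint32_t(diff) == 0;
  case Cond::NE: return uint32_t(diff) != 0;
  }
  return false;
}

// With carry == false the two models agree, which is what the fold relies on.
void check_carry_false_fold() {
  const int32_t vals[] = {0, 1, -1, INT32_MIN, INT32_MAX};
  const Cond conds[] = {Cond::LT, Cond::GE, Cond::EQ, Cond::NE};
  for (int32_t a : vals)
    for (int32_t b : vals)
      for (Cond cc : conds)
        assert(setcce(a, b, false, cc) == setcc(a, b, cc));
}
```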

DavidKreitzer added inline comments. Nov 19 2015, 7:32 AM

lib/Target/X86/X86ISelLowering.cpp
14597	I like the idea of leveraging TranslateX86CC, but I believe these two optimizations at the beginning of TranslateX86CC are invalid for SETCCE and somehow need to be avoided for it: "// X > -1 -> X == 0, jump !sign." and "// X < 0 -> X == 0, jump on sign." They are invalid in the one special case where X is 0x80000000 and the carry is true. FWIW, this optimization is valid: "// X < 1 -> X <= 0".
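The special case David points out can be reproduced with a C++ model of the flags a 32-bit SBB produces (SbbFlags, sbb_flags and the two predicates are hypothetical names). It compares the correct signed "l" test (SF != OF) for SETCCE x, 0, carry against the "X < 0 -> jump on sign" shortcut, which looks at SF alone. With the carry clear the subtraction cannot overflow and the two agree; with the carry set, x - 0 - 1 overflows only when x is 0x80000000 (INT32_MIN), so that is the only input where they disagree. The sketch checks a handful of sample inputs, not all 2^32 values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of SF/OF after a 32-bit SBB computing x - y - carry.
struct SbbFlags {
  bool sf;  // sign of the 32-bit result
  bool of;  // signed overflow
};

SbbFlags sbb_flags(int32_t x, int32_t y, bool carry) {
  int64_t exact = int64_t(x) - y - (carry ? 1 : 0);
  int32_t wrapped = int32_t(uint32_t(exact));  // what the register ends up holding
  return {wrapped < 0, exact != wrapped};
}

// SETCCE x, 0, carry with signed "less than": x86 condition "l" (SF != OF).
bool setcce_lt_zero(int32_t x, bool carry) {
  SbbFlags f = sbb_flags(x, 0, carry);
  return f.sf != f.of;
}

// The "X < 0 -> jump on sign" shortcut, which inspects SF alone.
bool sign_only_lt_zero(int32_t x, bool carry) {
  return sbb_flags(x, 0, carry).sf;
}

// Sample inputs on which the shortcut gives a different answer.
std::vector<int32_t> mismatches(bool carry) {
  const int32_t samples[] = {0, 1, -1, 2, -2, INT32_MAX, INT32_MIN, INT32_MIN + 1};
  std::vector<int32_t> out;
  for (int32_t x : samples)
    if (setcce_lt_zero(x, carry) != sign_only_lt_zero(x, carry))
      out.push_back(x);
  return out;
}

void check_sign_shortcut() {
  assert(mismatches(false).empty());
  std::vector<int32_t> m = mismatches(true);
  assert(m.size() == 1 && m[0] == INT32_MIN);
}
```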

hans added inline comments. Nov 19 2015, 7:40 AM

lib/Target/X86/X86ISelLowering.cpp
14597	Thanks for catching that!

Breaking out the actual operation translation part from TranslateX86CC and using that instead.

I like it! Everything looks good to me now.

Closed by commit rL253572: X86: More efficient legalization of wide integer compares (authored by hans). Nov 19 2015, 8:37 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                Size
include/llvm/CodeGen/ISDOpcodes.h                   6 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp            15 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp            3 lines
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp   60 lines
lib/CodeGen/SelectionDAG/LegalizeTypes.h            1 line
lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp     1 line
lib/Target/X86/X86ISelLowering.h                    1 line
lib/Target/X86/X86ISelLowering.cpp                  52 lines
test/CodeGen/X86/2012-08-17-legalizer-crash.ll      3 lines
test/CodeGen/X86/atomic-minmax-i6432.ll             8 lines
test/CodeGen/X86/atomic128.ll                       52 lines
test/CodeGen/X86/avx512-cmp.ll                      25 lines
test/CodeGen/X86/wide-integer-cmp.ll                130 lines
test/CodeGen/X86/win32-pic-jumptable.ll             4 lines

Diff 40647

include/llvm/CodeGen/ISDOpcodes.h

@@ ... @@ enum NodeType {
     /// SetCC operator - This evaluates to a true value iff the condition is
     /// true. If the result value type is not i1 then the high bits conform
     /// to getBooleanContents. The operands to this are the left and right
     /// operands to compare (ops #0, and #1) and the condition code to compare
     /// them with (op #2) as a CondCodeSDNode. If the operands are vector types
     /// then the result type must also be a vector type.
     SETCC,

+    /// Like SetCC, ops #0 and #1 are the LHS and RHS operands to compare, but
+    /// op #2 is a carry value. This operator checks the result of
+    /// "LHS - RHS - Carry", and can be used to compare two wide integers:
+    /// (setcce lhshi rhshi (subc lhslo rhslo) cc). Only valid for integers.
+    SETCCE,
+
     /// SHL_PARTS/SRA_PARTS/SRL_PARTS - These operators are used for expanded
     /// integer shift operations. The operation ordering is:
     /// [Lo,Hi] = op [LoLHS,HiLHS], Amt
     SHL_PARTS, SRA_PARTS, SRL_PARTS,

     /// Conversion operators. These are all single input single output
     /// operations. For all of these, the result type must be strictly
     /// wider or narrower (depending on the operation) than the source

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


@@ ... @@ private:
     SDValue visitCTLZ_ZERO_UNDEF(SDNode *N);
     SDValue visitCTTZ(SDNode *N);
     SDValue visitCTTZ_ZERO_UNDEF(SDNode *N);
     SDValue visitCTPOP(SDNode *N);
     SDValue visitSELECT(SDNode *N);
     SDValue visitVSELECT(SDNode *N);
     SDValue visitSELECT_CC(SDNode *N);
     SDValue visitSETCC(SDNode *N);
+    SDValue visitSETCCE(SDNode *N);
     SDValue visitSIGN_EXTEND(SDNode *N);
     SDValue visitZERO_EXTEND(SDNode *N);
     SDValue visitANY_EXTEND(SDNode *N);
     SDValue visitSIGN_EXTEND_INREG(SDNode *N);
     SDValue visitSIGN_EXTEND_VECTOR_INREG(SDNode *N);
     SDValue visitTRUNCATE(SDNode *N);
     SDValue visitBITCAST(SDNode *N);
     SDValue visitBUILD_PAIR(SDNode *N);
@@ ... @@ SDValue DAGCombiner::visit(SDNode *N) {
   case ISD::CTLZ_ZERO_UNDEF: return visitCTLZ_ZERO_UNDEF(N);
   case ISD::CTTZ: return visitCTTZ(N);
   case ISD::CTTZ_ZERO_UNDEF: return visitCTTZ_ZERO_UNDEF(N);
   case ISD::CTPOP: return visitCTPOP(N);
   case ISD::SELECT: return visitSELECT(N);
   case ISD::VSELECT: return visitVSELECT(N);
   case ISD::SELECT_CC: return visitSELECT_CC(N);
   case ISD::SETCC: return visitSETCC(N);
+  case ISD::SETCCE: return visitSETCCE(N);
   case ISD::SIGN_EXTEND: return visitSIGN_EXTEND(N);
   case ISD::ZERO_EXTEND: return visitZERO_EXTEND(N);
   case ISD::ANY_EXTEND: return visitANY_EXTEND(N);
   case ISD::SIGN_EXTEND_INREG: return visitSIGN_EXTEND_INREG(N);
   case ISD::SIGN_EXTEND_VECTOR_INREG: return visitSIGN_EXTEND_VECTOR_INREG(N);
   case ISD::TRUNCATE: return visitTRUNCATE(N);
   case ISD::BITCAST: return visitBITCAST(N);
   case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);
@@ ... @@
 }

 SDValue DAGCombiner::visitSETCC(SDNode *N) {
   return SimplifySetCC(N->getValueType(0), N->getOperand(0), N->getOperand(1),
                        cast<CondCodeSDNode>(N->getOperand(2))->get(),
                        SDLoc(N));
 }

+SDValue DAGCombiner::visitSETCCE(SDNode *N) {
+  SDValue LHS = N->getOperand(0);
+  SDValue RHS = N->getOperand(1);
+  SDValue Carry = N->getOperand(2);
+  SDValue Cond = N->getOperand(3);
+
+  // If Carry is false, fold to a regular SETCC.
+  if (Carry.getOpcode() == ISD::CARRY_FALSE)
+    return DAG.getNode(ISD::SETCC, SDLoc(N), N->getVTList(), LHS, RHS, Cond);
+
+  return SDValue();
+}
+
 /// Try to fold a sext/zext/aext dag node into a ConstantSDNode or
 /// a build_vector of constants.
 /// This function is called by the DAGCombiner when visiting sext/zext/aext
 /// dag nodes (see for example method DAGCombiner::visitSIGN_EXTEND).
 /// Vector extends are not folded if operations are legal; this is to
 /// avoid introducing illegal build_vector dag nodes.
 static SDNode *tryToFoldExtendOfConstant(SDNode *N, const TargetLowering &TLI,
                                          SelectionDAG &DAG, bool LegalTypes,

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

@@ ... @@ case ISD::ATOMIC_STORE: {
     Action = TLI.getOperationAction(Node->getOpcode(),
                                     Node->getOperand(2).getValueType());
     break;
   }
   case ISD::SELECT_CC:
   case ISD::SETCC:
   case ISD::BR_CC: {
     unsigned CCOperand = Node->getOpcode() == ISD::SELECT_CC ? 4 :
-                         Node->getOpcode() == ISD::SETCC ? 2 : 1;
+                         Node->getOpcode() == ISD::SETCC ? 2 :
+                         Node->getOpcode() == ISD::SETCCE ? 3 : 1;
     unsigned CompareOperand = Node->getOpcode() == ISD::BR_CC ? 2 : 0;
     MVT OpVT = Node->getOperand(CompareOperand).getSimpleValueType();
     ISD::CondCode CCCode =
         cast<CondCodeSDNode>(Node->getOperand(CCOperand))->get();
     Action = TLI.getCondCodeAction(CCCode, OpVT);
     if (Action == TargetLowering::Legal) {
       if (Node->getOpcode() == ISD::SELECT_CC)
         Action = TLI.getOperationAction(Node->getOpcode(),

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

@@ ... @@ bool DAGTypeLegalizer::ExpandIntegerOperand(SDNode *N, unsigned OpNo) {
   case ISD::BITCAST: Res = ExpandOp_BITCAST(N); break;
   case ISD::BR_CC: Res = ExpandIntOp_BR_CC(N); break;
   case ISD::BUILD_VECTOR: Res = ExpandOp_BUILD_VECTOR(N); break;
   case ISD::EXTRACT_ELEMENT: Res = ExpandOp_EXTRACT_ELEMENT(N); break;
   case ISD::INSERT_VECTOR_ELT: Res = ExpandOp_INSERT_VECTOR_ELT(N); break;
   case ISD::SCALAR_TO_VECTOR: Res = ExpandOp_SCALAR_TO_VECTOR(N); break;
   case ISD::SELECT_CC: Res = ExpandIntOp_SELECT_CC(N); break;
   case ISD::SETCC: Res = ExpandIntOp_SETCC(N); break;
+  case ISD::SETCCE: Res = ExpandIntOp_SETCCE(N); break;
   case ISD::SINT_TO_FP: Res = ExpandIntOp_SINT_TO_FP(N); break;
   case ISD::STORE: Res = ExpandIntOp_STORE(cast<StoreSDNode>(N), OpNo); break;
   case ISD::TRUNCATE: Res = ExpandIntOp_TRUNCATE(N); break;
   case ISD::UINT_TO_FP: Res = ExpandIntOp_UINT_TO_FP(N); break;

   case ISD::SHL:
   case ISD::SRA:
   case ISD::SRL:
@@ ... @@ void DAGTypeLegalizer::IntegerExpandSetCCOperands(SDValue &NewLHS,
   if (ConstantSDNode *CST = dyn_cast<ConstantSDNode>(NewRHS))
     if ((CCCode == ISD::SETLT && CST->isNullValue()) ||     // X < 0
         (CCCode == ISD::SETGT && CST->isAllOnesValue())) {  // X > -1
       NewLHS = LHSHi;
       NewRHS = RHSHi;
       return;
     }

   // FIXME: This generated code sucks.
   ISD::CondCode LowCC;
   switch (CCCode) {
   default: llvm_unreachable("Unknown integer setcc!");
   case ISD::SETLT:
   case ISD::SETULT: LowCC = ISD::SETULT; break;
   case ISD::SETGT:
   case ISD::SETUGT: LowCC = ISD::SETUGT; break;
   case ISD::SETLE:
@@ ... @@ if ((Tmp1C && Tmp1C->isNullValue()) ||
     // low part is known false, returns high part.
     // For LE / GE, if high part is known false, ignore the low part.
     // For LT / GT, if high part is known true, ignore the low part.
     NewLHS = Tmp2;
     NewRHS = SDValue();
     return;
   }

+  if (LHSHi == RHSHi) {
+    // Comparing the low bits is enough.
+    NewLHS = Tmp1;
+    NewRHS = SDValue();
+    return;
+  }
+
+  // Lower with SETCCE if the target supports it.
+  // FIXME: Make all targets support this, then remove the other lowering.
+  if (TLI.getOperationAction(
+          ISD::SETCCE,
+          TLI.getTypeToExpandTo(*DAG.getContext(), LHSLo.getValueType())) ==
+      TargetLowering::Custom) {
+    // SETCCE can detect < and >= directly. For > and <=, flip operands and
+    // condition code.
+    bool FlipOperands = false;
+    switch (CCCode) {
+    case ISD::SETGT:  CCCode = ISD::SETLT;  FlipOperands = true; break;
+    case ISD::SETUGT: CCCode = ISD::SETULT; FlipOperands = true; break;
+    case ISD::SETLE:  CCCode = ISD::SETGE;  FlipOperands = true; break;
+    case ISD::SETULE: CCCode = ISD::SETUGE; FlipOperands = true; break;
+    default: break;
+    }
+    if (FlipOperands) {
+      std::swap(LHSLo, RHSLo);
+      std::swap(LHSHi, RHSHi);
+    }
+    // Perform a wide subtraction, feeding the carry from the low part into
+    // SETCCE. The SETCCE operation is essentially looking at the high part of
+    // the result of LHS - RHS. It is negative iff LHS < RHS. It is zero or
+    // positive iff LHS >= RHS.
+    SDVTList VTList = DAG.getVTList(LHSLo.getValueType(), MVT::Glue);
+    SDValue LowCmp = DAG.getNode(ISD::SUBC, dl, VTList, LHSLo, RHSLo);
+    SDValue Res =
+        DAG.getNode(ISD::SETCCE, dl, getSetCCResultType(LHSLo.getValueType()),
+                    LHSHi, RHSHi, LowCmp.getValue(1), DAG.getCondCode(CCCode));
+    NewLHS = Res;
+    NewRHS = SDValue();
+    return;
+  }
+
   NewLHS = TLI.SimplifySetCC(getSetCCResultType(LHSHi.getValueType()),
                              LHSHi, RHSHi, ISD::SETEQ, false,
                              DagCombineInfo, dl);
   if (!NewLHS.getNode())
     NewLHS = DAG.getSetCC(dl, getSetCCResultType(LHSHi.getValueType()),
                           LHSHi, RHSHi, ISD::SETEQ);
   NewLHS = DAG.getSelect(dl, Tmp1.getValueType(),
                          NewLHS, Tmp1, Tmp2);
[… 48 lines not shown …]	if (!NewRHS.getNode()) {
return NewLHS;		return NewLHS;
}		}

// Otherwise, update N to have the operands specified.		// Otherwise, update N to have the operands specified.
return SDValue(DAG.UpdateNodeOperands(N, NewLHS, NewRHS,		return SDValue(DAG.UpdateNodeOperands(N, NewLHS, NewRHS,
DAG.getCondCode(CCCode)), 0);		DAG.getCondCode(CCCode)), 0);
}		}

		SDValue DAGTypeLegalizer::ExpandIntOp_SETCCE(SDNode *N) {
		  SDValue LHS = N->getOperand(0);
		  SDValue RHS = N->getOperand(1);
		  SDValue Carry = N->getOperand(2);
		  SDValue Cond = N->getOperand(3);
		  SDLoc dl = SDLoc(N);

		  SDValue LHSLo, LHSHi, RHSLo, RHSHi;
		  GetExpandedInteger(LHS, LHSLo, LHSHi);
		  GetExpandedInteger(RHS, RHSLo, RHSHi);

		  // Expand to a SUBE for the low part and a smaller SETCCE for the high.
		  SDVTList VTList = DAG.getVTList(LHSLo.getValueType(), MVT::Glue);
		  SDValue LowCmp = DAG.getNode(ISD::SUBE, dl, VTList, LHSLo, RHSLo, Carry);
		  return DAG.getNode(ISD::SETCCE, dl, N->getValueType(0), LHSHi, RHSHi,
		                     LowCmp.getValue(1), Cond);
		}
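Because ExpandIntOp_SETCCE turns each further expansion into a SUBE on the low half plus a narrower SETCCE, a compare wider than two registers becomes a single borrow chain, as in the SUBC, SUBE, SUBE, SETCCE example for __int128 on 32-bit x86 mentioned in the review. A sketch of that chain over 32-bit limbs stored least significant first; the function name and the limb layout are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Signed a < b for N-limb integers (32-bit limbs, least significant first).
// Every limb except the top one behaves like SUBC (first) or SUBE (rest):
// only its borrow-out matters. The top limb plays SETCCE with SETLT.
template <std::size_t N>
bool wideSignedLess(const uint32_t (&a)[N], const uint32_t (&b)[N]) {
  static_assert(N >= 1, "need at least one limb");
  unsigned borrow = 0;
  for (std::size_t i = 0; i + 1 < N; ++i) {
    // a[i] - b[i] - borrow wraps to a value with bit 63 set exactly when it
    // borrows, since both limbs are below 2^32.
    uint64_t diff = uint64_t(a[i]) - uint64_t(b[i]) - borrow;
    borrow = unsigned(diff >> 63);
  }
  // Top limb: signed and evaluated exactly, so only its sign matters.
  int64_t top = int64_t(int32_t(a[N - 1])) - int64_t(int32_t(b[N - 1])) - borrow;
  return top < 0;
}
```

With N = 4 this is the 128-bit case: three borrow-only steps followed by one signed test on the top limb.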

SDValue DAGTypeLegalizer::ExpandIntOp_Shift(SDNode *N) {		SDValue DAGTypeLegalizer::ExpandIntOp_Shift(SDNode *N) {
// The value being shifted is legal, but the shift amount is too big.		// The value being shifted is legal, but the shift amount is too big.
// It follows that either the result of the shift is undefined, or the		// It follows that either the result of the shift is undefined, or the
// upper half of the shift amount is zero. Just use the lower half.		// upper half of the shift amount is zero. Just use the lower half.
SDValue Lo, Hi;		SDValue Lo, Hi;
GetExpandedInteger(N->getOperand(1), Lo, Hi);		GetExpandedInteger(N->getOperand(1), Lo, Hi);
return SDValue(DAG.UpdateNodeOperands(N, N->getOperand(0), Lo), 0);		return SDValue(DAG.UpdateNodeOperands(N, N->getOperand(0), Lo), 0);
}		}
[… 392 lines not shown …]

lib/CodeGen/SelectionDAG/LegalizeTypes.h

[… 350 lines not shown …]	private:
bool ExpandShiftWithKnownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi);		bool ExpandShiftWithKnownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi);
bool ExpandShiftWithUnknownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi);		bool ExpandShiftWithUnknownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi);

// Integer Operand Expansion.		// Integer Operand Expansion.
bool ExpandIntegerOperand(SDNode *N, unsigned OperandNo);		bool ExpandIntegerOperand(SDNode *N, unsigned OperandNo);
SDValue ExpandIntOp_BR_CC(SDNode *N);		SDValue ExpandIntOp_BR_CC(SDNode *N);
SDValue ExpandIntOp_SELECT_CC(SDNode *N);		SDValue ExpandIntOp_SELECT_CC(SDNode *N);
SDValue ExpandIntOp_SETCC(SDNode *N);		SDValue ExpandIntOp_SETCC(SDNode *N);
		SDValue ExpandIntOp_SETCCE(SDNode *N);
SDValue ExpandIntOp_Shift(SDNode *N);		SDValue ExpandIntOp_Shift(SDNode *N);
SDValue ExpandIntOp_SINT_TO_FP(SDNode *N);		SDValue ExpandIntOp_SINT_TO_FP(SDNode *N);
SDValue ExpandIntOp_STORE(StoreSDNode *N, unsigned OpNo);		SDValue ExpandIntOp_STORE(StoreSDNode *N, unsigned OpNo);
SDValue ExpandIntOp_TRUNCATE(SDNode *N);		SDValue ExpandIntOp_TRUNCATE(SDNode *N);
SDValue ExpandIntOp_UINT_TO_FP(SDNode *N);		SDValue ExpandIntOp_UINT_TO_FP(SDNode *N);
SDValue ExpandIntOp_RETURNADDR(SDNode *N);		SDValue ExpandIntOp_RETURNADDR(SDNode *N);
SDValue ExpandIntOp_ATOMIC_STORE(SDNode *N);		SDValue ExpandIntOp_ATOMIC_STORE(SDNode *N);

[… 451 lines not shown …]

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

[… 203 lines not shown …]	#endif
case ISD::FPOW: return "fpow";		case ISD::FPOW: return "fpow";
case ISD::SMIN: return "smin";		case ISD::SMIN: return "smin";
case ISD::SMAX: return "smax";		case ISD::SMAX: return "smax";
case ISD::UMIN: return "umin";		case ISD::UMIN: return "umin";
case ISD::UMAX: return "umax";		case ISD::UMAX: return "umax";

case ISD::FPOWI: return "fpowi";		case ISD::FPOWI: return "fpowi";
case ISD::SETCC: return "setcc";		case ISD::SETCC: return "setcc";
		case ISD::SETCCE: return "setcce";
case ISD::SELECT: return "select";		case ISD::SELECT: return "select";
case ISD::VSELECT: return "vselect";		case ISD::VSELECT: return "vselect";
case ISD::SELECT_CC: return "select_cc";		case ISD::SELECT_CC: return "select_cc";
case ISD::INSERT_VECTOR_ELT: return "insert_vector_elt";		case ISD::INSERT_VECTOR_ELT: return "insert_vector_elt";
case ISD::EXTRACT_VECTOR_ELT: return "extract_vector_elt";		case ISD::EXTRACT_VECTOR_ELT: return "extract_vector_elt";
case ISD::CONCAT_VECTORS: return "concat_vectors";		case ISD::CONCAT_VECTORS: return "concat_vectors";
case ISD::INSERT_SUBVECTOR: return "insert_subvector";		case ISD::INSERT_SUBVECTOR: return "insert_subvector";
case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";		case ISD::EXTRACT_SUBVECTOR: return "extract_subvector";
[… 506 lines not shown …]

lib/Target/X86/X86ISelLowering.h

[… 1,015 lines not shown …]	private:
SDValue LowerUINT_TO_FP_i32(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerUINT_TO_FP_i32(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerUINT_TO_FP_vec(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerUINT_TO_FP_vec(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_TO_SINT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_TO_SINT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_TO_UINT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_TO_UINT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerToBT(SDValue And, ISD::CondCode CC,		SDValue LowerToBT(SDValue And, ISD::CondCode CC,
SDLoc dl, SelectionDAG &DAG) const;		SDLoc dl, SelectionDAG &DAG) const;
SDValue LowerSETCC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSETCC(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerSETCCE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerJumpTable(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerJumpTable(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVASTART(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVASTART(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVAARG(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVAARG(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const;
[… 123 lines not shown …]

lib/Target/X86/X86ISelLowering.cpp

[… 415 lines not shown …]	X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
setOperationAction(ISD::SELECT , MVT::f64 , Custom);		setOperationAction(ISD::SELECT , MVT::f64 , Custom);
setOperationAction(ISD::SELECT , MVT::f80 , Custom);		setOperationAction(ISD::SELECT , MVT::f80 , Custom);
setOperationAction(ISD::SETCC , MVT::i8 , Custom);		setOperationAction(ISD::SETCC , MVT::i8 , Custom);
setOperationAction(ISD::SETCC , MVT::i16 , Custom);		setOperationAction(ISD::SETCC , MVT::i16 , Custom);
setOperationAction(ISD::SETCC , MVT::i32 , Custom);		setOperationAction(ISD::SETCC , MVT::i32 , Custom);
setOperationAction(ISD::SETCC , MVT::f32 , Custom);		setOperationAction(ISD::SETCC , MVT::f32 , Custom);
setOperationAction(ISD::SETCC , MVT::f64 , Custom);		setOperationAction(ISD::SETCC , MVT::f64 , Custom);
setOperationAction(ISD::SETCC , MVT::f80 , Custom);		setOperationAction(ISD::SETCC , MVT::f80 , Custom);
		setOperationAction(ISD::SETCCE , MVT::i8 , Custom);
		setOperationAction(ISD::SETCCE , MVT::i16 , Custom);
		setOperationAction(ISD::SETCCE , MVT::i32 , Custom);
if (Subtarget->is64Bit()) {		if (Subtarget->is64Bit()) {
setOperationAction(ISD::SELECT , MVT::i64 , Custom);		setOperationAction(ISD::SELECT , MVT::i64 , Custom);
setOperationAction(ISD::SETCC , MVT::i64 , Custom);		setOperationAction(ISD::SETCC , MVT::i64 , Custom);
		setOperationAction(ISD::SETCCE , MVT::i64 , Custom);
}		}
setOperationAction(ISD::EH_RETURN , MVT::Other, Custom);		setOperationAction(ISD::EH_RETURN , MVT::Other, Custom);
// NOTE: EH_SJLJ_SETJMP/_LONGJMP supported here is NOT intended to support		// NOTE: EH_SJLJ_SETJMP/_LONGJMP supported here is NOT intended to support
// SjLj exception handling but a light-weight setjmp/longjmp replacement to		// SjLj exception handling but a light-weight setjmp/longjmp replacement to
// support continuation, user-level threading, and etc.. As a result, no		// support continuation, user-level threading, and etc.. As a result, no
// other SjLj exception interfaces are implemented and please don't build		// other SjLj exception interfaces are implemented and please don't build
// your own exception handling based on them.		// your own exception handling based on them.
// LLVM/Clang supports zero-cost DWARF exception handling.		// LLVM/Clang supports zero-cost DWARF exception handling.
[… 3,519 lines not shown …]	static bool isX86CCUnsigned(unsigned X86CC) {
case X86::COND_NE: return true;		case X86::COND_NE: return true;
case X86::COND_B: return true;		case X86::COND_B: return true;
case X86::COND_A: return true;		case X86::COND_A: return true;
case X86::COND_BE: return true;		case X86::COND_BE: return true;
case X86::COND_AE: return true;		case X86::COND_AE: return true;
}		}
}		}

		static X86::CondCode TranslateIntegerX86CC(ISD::CondCode SetCCOpcode) {
		  switch (SetCCOpcode) {
		  default: llvm_unreachable("Invalid integer condition!");
		  case ISD::SETEQ:  return X86::COND_E;
		  case ISD::SETGT:  return X86::COND_G;
		  case ISD::SETGE:  return X86::COND_GE;
		  case ISD::SETLT:  return X86::COND_L;
		  case ISD::SETLE:  return X86::COND_LE;
		  case ISD::SETNE:  return X86::COND_NE;
		  case ISD::SETULT: return X86::COND_B;
		  case ISD::SETUGT: return X86::COND_A;
		  case ISD::SETULE: return X86::COND_BE;
		  case ISD::SETUGE: return X86::COND_AE;
		  }
		}
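TranslateIntegerX86CC is only a table, but its correctness rests on how each x86 condition reads EFLAGS after a subtraction. The sketch below assumes the standard x86 flag definitions (CF is the unsigned borrow, ZF is set for a zero result, SF is the result's sign bit, OF is signed overflow); the Flags struct, the X86Cond enum and the helpers are hypothetical mirrors of X86::COND_*, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Flags left by "cmp b, a", i.e. by computing a - b on 32-bit operands.
struct Flags { bool cf, zf, sf, of; };

static Flags cmpFlags(uint32_t a, uint32_t b) {
  uint32_t r = a - b;
  Flags f;
  f.cf = a < b;                            // unsigned borrow
  f.zf = r == 0;
  f.sf = (r >> 31) & 1;                    // sign bit of the result
  f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;  // signed overflow
  return f;
}

// Hypothetical mirror of the X86::COND_* values returned above.
enum class X86Cond { E, NE, L, GE, LE, G, B, AE, BE, A };

static bool evalCond(X86Cond cc, Flags f) {
  switch (cc) {
  case X86Cond::E:  return f.zf;                   // from ISD::SETEQ
  case X86Cond::NE: return !f.zf;                  // from ISD::SETNE
  case X86Cond::L:  return f.sf != f.of;           // from ISD::SETLT
  case X86Cond::GE: return f.sf == f.of;           // from ISD::SETGE
  case X86Cond::LE: return f.zf || f.sf != f.of;   // from ISD::SETLE
  case X86Cond::G:  return !f.zf && f.sf == f.of;  // from ISD::SETGT
  case X86Cond::B:  return f.cf;                   // from ISD::SETULT
  case X86Cond::AE: return !f.cf;                  // from ISD::SETUGE
  case X86Cond::BE: return f.cf || f.zf;           // from ISD::SETULE
  case X86Cond::A:  return !f.cf && !f.zf;         // from ISD::SETUGT
  }
  return false;
}
```

Signed conditions compare SF with OF rather than reading SF alone, because the 32-bit result's sign bit is wrong whenever the subtraction overflows.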

/// Do a one-to-one translation of a ISD::CondCode to the X86-specific		/// Do a one-to-one translation of a ISD::CondCode to the X86-specific
/// condition code, returning the condition code and the LHS/RHS of the		/// condition code, returning the condition code and the LHS/RHS of the
/// comparison to make.		/// comparison to make.
static unsigned TranslateX86CC(ISD::CondCode SetCCOpcode, SDLoc DL, bool isFP,		static unsigned TranslateX86CC(ISD::CondCode SetCCOpcode, SDLoc DL, bool isFP,
SDValue &LHS, SDValue &RHS, SelectionDAG &DAG) {		SDValue &LHS, SDValue &RHS, SelectionDAG &DAG) {
if (!isFP) {		if (!isFP) {
if (ConstantSDNode *RHSC = dyn_cast<ConstantSDNode>(RHS)) {		if (ConstantSDNode *RHSC = dyn_cast<ConstantSDNode>(RHS)) {
if (SetCCOpcode == ISD::SETGT && RHSC->isAllOnesValue()) {		if (SetCCOpcode == ISD::SETGT && RHSC->isAllOnesValue()) {
// X > -1 -> X == 0, jump !sign.		// X > -1 -> X == 0, jump !sign.
RHS = DAG.getConstant(0, DL, RHS.getValueType());		RHS = DAG.getConstant(0, DL, RHS.getValueType());
return X86::COND_NS;		return X86::COND_NS;
}		}
if (SetCCOpcode == ISD::SETLT && RHSC->isNullValue()) {		if (SetCCOpcode == ISD::SETLT && RHSC->isNullValue()) {
// X < 0 -> X == 0, jump on sign.		// X < 0 -> X == 0, jump on sign.
return X86::COND_S;		return X86::COND_S;
}		}
if (SetCCOpcode == ISD::SETLT && RHSC->getZExtValue() == 1) {		if (SetCCOpcode == ISD::SETLT && RHSC->getZExtValue() == 1) {
// X < 1 -> X <= 0		// X < 1 -> X <= 0
RHS = DAG.getConstant(0, DL, RHS.getValueType());		RHS = DAG.getConstant(0, DL, RHS.getValueType());
return X86::COND_LE;		return X86::COND_LE;
}		}
}		}

switch (SetCCOpcode) {		return TranslateIntegerX86CC(SetCCOpcode);
default: llvm_unreachable("Invalid integer condition!");
case ISD::SETEQ: return X86::COND_E;
case ISD::SETGT: return X86::COND_G;
case ISD::SETGE: return X86::COND_GE;
case ISD::SETLT: return X86::COND_L;
case ISD::SETLE: return X86::COND_LE;
case ISD::SETNE: return X86::COND_NE;
case ISD::SETULT: return X86::COND_B;
case ISD::SETUGT: return X86::COND_A;
case ISD::SETULE: return X86::COND_BE;
case ISD::SETUGE: return X86::COND_AE;
}
}		}

// First determine if it is required or is profitable to flip the operands.		// First determine if it is required or is profitable to flip the operands.

// If LHS is a foldable load, but RHS is not, flip the condition.		// If LHS is a foldable load, but RHS is not, flip the condition.
if (ISD::isNON_EXTLoad(LHS.getNode()) &&		if (ISD::isNON_EXTLoad(LHS.getNode()) &&
!ISD::isNON_EXTLoad(RHS.getNode())) {		!ISD::isNON_EXTLoad(RHS.getNode())) {
SetCCOpcode = getSetCCSwappedOperands(SetCCOpcode);		SetCCOpcode = getSetCCSwappedOperands(SetCCOpcode);
[… 10,567 lines not shown …]	SDValue X86TargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const {
EFLAGS = ConvertCmpIfNecessary(EFLAGS, DAG);		EFLAGS = ConvertCmpIfNecessary(EFLAGS, DAG);
SDValue SetCC = DAG.getNode(X86ISD::SETCC, dl, MVT::i8,		SDValue SetCC = DAG.getNode(X86ISD::SETCC, dl, MVT::i8,
DAG.getConstant(X86CC, dl, MVT::i8), EFLAGS);		DAG.getConstant(X86CC, dl, MVT::i8), EFLAGS);
if (VT == MVT::i1)		if (VT == MVT::i1)
return DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, SetCC);		return DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, SetCC);
return SetCC;		return SetCC;
}		}

		SDValue X86TargetLowering::LowerSETCCE(SDValue Op, SelectionDAG &DAG) const {
		  SDValue LHS = Op.getOperand(0);
		  SDValue RHS = Op.getOperand(1);
		  SDValue Carry = Op.getOperand(2);
		  SDValue Cond = Op.getOperand(3);
		  SDLoc DL(Op);

		[Inline comment] DavidKreitzer: Your operand ordering for SETCC_PARTS doesn't match your comment in ISDOpcodes.h. The two should at least be in sync, and I already gave my opinion on what the operand order ought to be. :-)
		[Inline comment] hans (author): Oops. Thanks for catching this. So much for trying to document things :-) I've updated the code to use the order you suggested.
		  assert(LHS.getSimpleValueType().isInteger() && "SETCCE is integer only.");
		  X86::CondCode CC = TranslateIntegerX86CC(cast<CondCodeSDNode>(Cond)->get());
		[Inline comment] DavidKreitzer: I like the idea of leveraging TranslateX86CC, but I believe these two optimizations at the beginning of TranslateX86CC are invalid for SETCCE and somehow need to be avoided for it: "X > -1 -> X == 0, jump !sign" and "X < 0 -> X == 0, jump on sign". They are invalid in the one special case where X is 0x80000000 and the carry is true. FWIW, this optimization is valid: "X < 1 -> X <= 0".
		[Inline comment] hans (author): Thanks for catching that!
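The counterexample from the review can be checked on plain integers. SETCCE with SETLT asks whether X - RHS - carry is negative in exact arithmetic, while the "X < 0 -> jump on sign" rewrite only looks at bit 31 of the wrapped 32-bit result; "X > -1 -> jump !sign" fails on the same input. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Exact meaning of SETCCE(X, RHS, carry, SETLT) on 32-bit signed halves:
// is X - RHS - carry negative when computed without wraparound?
static bool setcceLT(int32_t x, int32_t rhs, bool carry) {
  return int64_t(x) - int64_t(rhs) - carry < 0;
}

// Exact meaning of SETCCE(X, RHS, carry, SETLE).
static bool setcceLE(int32_t x, int32_t rhs, bool carry) {
  return int64_t(x) - int64_t(rhs) - carry <= 0;
}

// Exact meaning of SETCCE(X, RHS, carry, SETGT).
static bool setcceGT(int32_t x, int32_t rhs, bool carry) {
  return int64_t(x) - int64_t(rhs) - carry > 0;
}

// What a "jump on sign" rewrite tests instead: SF, the top bit of the
// wrapped 32-bit result of X - 0 - carry.
static bool signFlagOnly(int32_t x, bool carry) {
  uint32_t r = uint32_t(x) - uint32_t(carry);
  return (r >> 31) & 1;
}
```

Only X == INT32_MIN (0x80000000) with the carry set wraps, which is the single special case named in the review. "X < 1 -> X <= 0" is an identity on integers (X - 1 - carry < 0 exactly when X - carry <= 0), so it stays valid.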

		[Inline comment] DavidKreitzer: I would recommend against this restriction. The SETCCE operation itself makes perfect sense for other conditions. (For example, SETCCE "eq" computes op0 - op1 - carry == 0.) For the purposes of large integer compare lowering, it happens to only be useful for < and >=, but we might find other good uses for it. (I'm imagining some fancy DAG combine optimizations ...)
		[Inline comment] hans (author): Sounds good to me.
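The "eq" case mentioned above also shows why, for wide compares, SETCCE is only useful for < and >=: on the high halves, Hi(a) - Hi(b) - borrow can be zero even when the full values differ. A small value-level sketch; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// SETCCE(Hi(a), Hi(b), borrow, SETEQ), with the borrow produced by SUBC on
// the low halves: true when Hi(a) - Hi(b) - borrow is zero modulo 2^32.
static bool setcceEqOnHighHalves(uint64_t a, uint64_t b) {
  bool borrow = uint32_t(a) < uint32_t(b);
  uint32_t hiDiff = uint32_t(a >> 32) - uint32_t(b >> 32) - uint32_t(borrow);
  return hiDiff == 0;
}
```

For example, 0x1_00000000 and 0x0_FFFFFFFF differ, yet the low halves borrow and the high halves then cancel, so an equality test on wide values cannot be built from a single SETCCE this way.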
		  assert(Carry.getOpcode() != ISD::CARRY_FALSE);
		  SDVTList VTs = DAG.getVTList(LHS.getValueType(), MVT::i32);
		  SDValue Cmp = DAG.getNode(X86ISD::SBB, DL, VTs, LHS, RHS, Carry);
		  return DAG.getNode(X86ISD::SETCC, DL, Op.getValueType(),
		                     DAG.getConstant(CC, DL, MVT::i8), Cmp.getValue(1));
		}
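LowerSETCCE ends up as the cmpl/sbbl pair seen in the updated tests: cmpl on the low halves sets CF, sbbl on the high halves consumes it, and X86ISD::SETCC (or a branch) reads the resulting flags. The sketch below models that pair, assuming the standard x86 definitions of SBB's flags; the struct and helper names are illustrative, not LLVM or x86 APIs.

```cpp
#include <cassert>
#include <cstdint>

struct Flags { bool cf, sf, of; };

// Flags left by "sbb b, a", i.e. by computing a - b - carryIn on 32 bits.
static Flags sbbFlags(uint32_t a, uint32_t b, bool carryIn) {
  uint32_t r = a - b - uint32_t(carryIn);
  int64_t exact = int64_t(int32_t(a)) - int64_t(int32_t(b)) - carryIn;
  Flags f;
  f.cf = uint64_t(a) < uint64_t(b) + carryIn;  // unsigned borrow out
  f.sf = (r >> 31) & 1;                        // sign bit of the 32-bit result
  f.of = exact != int64_t(int32_t(r));         // signed result did not fit
  return f;
}

// cmpl on the low halves produces the borrow, sbbl on the high halves
// consumes it; the high-half flags decide the 64-bit comparison.
static Flags wideCmpFlags(uint64_t a, uint64_t b) {
  bool borrow = uint32_t(a) < uint32_t(b);
  return sbbFlags(uint32_t(a >> 32), uint32_t(b >> 32), borrow);
}

static bool condL(Flags f)  { return f.sf != f.of; }  // like X86::COND_L
static bool condGE(Flags f) { return f.sf == f.of; }  // like X86::COND_GE
static bool condB(Flags f)  { return f.cf; }          // like X86::COND_B
static bool condAE(Flags f) { return !f.cf; }         // like X86::COND_AE
```

COND_L and COND_GE compare SF with OF because the exact high-half difference can need 33 bits; OF records exactly when the 32-bit sign bit is wrong.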

// isX86LogicalCmp - Return true if opcode is a X86 logical comparison.		// isX86LogicalCmp - Return true if opcode is a X86 logical comparison.
static bool isX86LogicalCmp(SDValue Op) {		static bool isX86LogicalCmp(SDValue Op) {
unsigned Opc = Op.getNode()->getOpcode();		unsigned Opc = Op.getNode()->getOpcode();
if (Opc == X86ISD::CMP || Opc == X86ISD::COMI || Opc == X86ISD::UCOMI ||		if (Opc == X86ISD::CMP || Opc == X86ISD::COMI || Opc == X86ISD::UCOMI ||
Opc == X86ISD::SAHF)		Opc == X86ISD::SAHF)
return true;		return true;
if (Op.getResNo() == 1 &&		if (Op.getResNo() == 1 &&
(Opc == X86ISD::ADD ||		(Opc == X86ISD::ADD ||
Opc == X86ISD::SUB ||		Opc == X86ISD::SUB ||
Opc == X86ISD::ADC ||		Opc == X86ISD::ADC ||
Opc == X86ISD::SBB ||		Opc == X86ISD::SBB ||
Opc == X86ISD::SMUL ||		Opc == X86ISD::SMUL ||
Opc == X86ISD::UMUL ||		Opc == X86ISD::UMUL ||
Opc == X86ISD::INC ||		Opc == X86ISD::INC ||
Opc == X86ISD::DEC ||		Opc == X86ISD::DEC ||
Opc == X86ISD::OR ||		Opc == X86ISD::OR ||
Opc == X86ISD::XOR ||		Opc == X86ISD::XOR ||
Opc == X86ISD::AND))		Opc == X86ISD::AND))
return true;		return true;

		[Inline comment] DavidKreitzer: The code sequences you are generating here look great!
		[Inline comment] hans (author): Thanks to you and Michael for suggesting it :-)
if (Op.getResNo() == 2 && Opc == X86ISD::UMUL)		if (Op.getResNo() == 2 && Opc == X86ISD::UMUL)
return true;		return true;

return false;		return false;
}		}

static bool isTruncWithZeroHighBitsInput(SDValue V, SelectionDAG &DAG) {		static bool isTruncWithZeroHighBitsInput(SDValue V, SelectionDAG &DAG) {
if (V.getOpcode() != ISD::TRUNCATE)		if (V.getOpcode() != ISD::TRUNCATE)
[… 659 lines not shown …]	static bool isXor1OfSetCC(SDValue Op) {
}		}
return false;		return false;
}		}

SDValue X86TargetLowering::LowerBRCOND(SDValue Op, SelectionDAG &DAG) const {		SDValue X86TargetLowering::LowerBRCOND(SDValue Op, SelectionDAG &DAG) const {
bool addTest = true;		bool addTest = true;
SDValue Chain = Op.getOperand(0);		SDValue Chain = Op.getOperand(0);
SDValue Cond = Op.getOperand(1);		SDValue Cond = Op.getOperand(1);
SDValue Dest = Op.getOperand(2);		SDValue Dest = Op.getOperand(2);
		[Inline comment] mkuper: I'm not a huge fan of this, but producing the pseudo in a target-specific pre-legalization DAG combine sounds like it may cause too many problems due to making the comparison opaque to other early combines.
SDLoc dl(Op);		SDLoc dl(Op);
SDValue CC;		SDValue CC;
bool Inverted = false;		bool Inverted = false;

if (Cond.getOpcode() == ISD::SETCC) {		if (Cond.getOpcode() == ISD::SETCC) {
// Check for setcc([su]{add,sub,mul}o == 0).		// Check for setcc([su]{add,sub,mul}o == 0).
if (cast<CondCodeSDNode>(Cond.getOperand(2))->get() == ISD::SETEQ &&		if (cast<CondCodeSDNode>(Cond.getOperand(2))->get() == ISD::SETEQ &&
isa<ConstantSDNode>(Cond.getOperand(1)) &&		isa<ConstantSDNode>(Cond.getOperand(1)) &&
[… 4,371 lines not shown …]	SDValue X86TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);		case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);
case ISD::FP_EXTEND: return LowerFP_EXTEND(Op, DAG);		case ISD::FP_EXTEND: return LowerFP_EXTEND(Op, DAG);
case ISD::LOAD: return LowerExtendedLoad(Op, Subtarget, DAG);		case ISD::LOAD: return LowerExtendedLoad(Op, Subtarget, DAG);
case ISD::FABS:		case ISD::FABS:
case ISD::FNEG: return LowerFABSorFNEG(Op, DAG);		case ISD::FNEG: return LowerFABSorFNEG(Op, DAG);
case ISD::FCOPYSIGN: return LowerFCOPYSIGN(Op, DAG);		case ISD::FCOPYSIGN: return LowerFCOPYSIGN(Op, DAG);
case ISD::FGETSIGN: return LowerFGETSIGN(Op, DAG);		case ISD::FGETSIGN: return LowerFGETSIGN(Op, DAG);
case ISD::SETCC: return LowerSETCC(Op, DAG);		case ISD::SETCC: return LowerSETCC(Op, DAG);
		case ISD::SETCCE: return LowerSETCCE(Op, DAG);
case ISD::SELECT: return LowerSELECT(Op, DAG);		case ISD::SELECT: return LowerSELECT(Op, DAG);
case ISD::BRCOND: return LowerBRCOND(Op, DAG);		case ISD::BRCOND: return LowerBRCOND(Op, DAG);
case ISD::JumpTable: return LowerJumpTable(Op, DAG);		case ISD::JumpTable: return LowerJumpTable(Op, DAG);
case ISD::VASTART: return LowerVASTART(Op, DAG);		case ISD::VASTART: return LowerVASTART(Op, DAG);
case ISD::VAARG: return LowerVAARG(Op, DAG);		case ISD::VAARG: return LowerVAARG(Op, DAG);
case ISD::VACOPY: return LowerVACOPY(Op, Subtarget, DAG);		case ISD::VACOPY: return LowerVACOPY(Op, Subtarget, DAG);
case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, Subtarget, DAG);		case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, Subtarget, DAG);
case ISD::INTRINSIC_VOID:		case ISD::INTRINSIC_VOID:
[… 1,541 lines not shown …]	X86TargetLowering::EmitLoweredAtomicFP(MachineInstr *MI,
MachineBasicBlock *BB) const {		MachineBasicBlock *BB) const {
// Combine the following atomic floating-point modification pattern:		// Combine the following atomic floating-point modification pattern:
// a.store(reg OP a.load(acquire), release)		// a.store(reg OP a.load(acquire), release)
// Transform them into:		// Transform them into:
// OPss (%gpr), %xmm		// OPss (%gpr), %xmm
// movss %xmm, (%gpr)		// movss %xmm, (%gpr)
// Or sd equivalent for 64-bit operations.		// Or sd equivalent for 64-bit operations.
unsigned MOp, FOp;		unsigned MOp, FOp;
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
		[Inline comment] mkuper: Why is this guaranteed?
default: llvm_unreachable("unexpected instr type for EmitLoweredAtomicFP");		default: llvm_unreachable("unexpected instr type for EmitLoweredAtomicFP");
case X86::RELEASE_FADD32mr: MOp = X86::MOVSSmr; FOp = X86::ADDSSrm; break;		case X86::RELEASE_FADD32mr: MOp = X86::MOVSSmr; FOp = X86::ADDSSrm; break;
case X86::RELEASE_FADD64mr: MOp = X86::MOVSDmr; FOp = X86::ADDSDrm; break;		case X86::RELEASE_FADD64mr: MOp = X86::MOVSDmr; FOp = X86::ADDSDrm; break;
}		}
const X86InstrInfo *TII = Subtarget->getInstrInfo();		const X86InstrInfo *TII = Subtarget->getInstrInfo();
DebugLoc DL = MI->getDebugLoc();		DebugLoc DL = MI->getDebugLoc();
MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();		MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
MachineOperand MSrc = MI->getOperand(0);		MachineOperand MSrc = MI->getOperand(0);
[… 6,380 lines not shown …]

test/CodeGen/X86/2012-08-17-legalizer-crash.ll

	[… 20 lines not shown …]
	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i576 %srcval2, i576* %1, align 8			store i576 %srcval2, i576* %1, align 8
	br label %if.end			br label %if.end

	if.end: ; preds = %if.then, %entry			if.end: ; preds = %if.then, %entry
	ret void			ret void

	; CHECK-LABEL: fn1:			; CHECK-LABEL: fn1:
	; CHECK: shrq $32, [[REG:%.*]]			; CHECK: jb
	; CHECK: sete
	}			}

test/CodeGen/X86/atomic-minmax-i6432.ll

	; RUN: llc -march=x86 -mattr=+cmov,cx16 -mtriple=i386-pc-linux -verify-machineinstrs < %s | FileCheck %s -check-prefix=LINUX			; RUN: llc -march=x86 -mattr=+cmov,cx16 -mtriple=i386-pc-linux -verify-machineinstrs < %s | FileCheck %s -check-prefix=LINUX
	; RUN: llc -march=x86 -mattr=cx16 -mtriple=i386-macosx -relocation-model=pic -verify-machineinstrs < %s | FileCheck %s -check-prefix=PIC			; RUN: llc -march=x86 -mattr=cx16 -mtriple=i386-macosx -relocation-model=pic -verify-machineinstrs < %s | FileCheck %s -check-prefix=PIC

	@sc64 = external global i64			@sc64 = external global i64

	define void @atomic_maxmin_i6432() {			define void @atomic_maxmin_i6432() {
	; LINUX: atomic_maxmin_i6432			; LINUX: atomic_maxmin_i6432
	%1 = atomicrmw max i64* @sc64, i64 5 acquire			%1 = atomicrmw max i64* @sc64, i64 5 acquire
	; LINUX: [[LABEL:.LBB[0-9]+_[0-9]+]]			; LINUX: [[LABEL:.LBB[0-9]+_[0-9]+]]
	; LINUX: cmpl			; LINUX: cmpl
	; LINUX: seta			; LINUX: sbbl
	; LINUX: cmovne			; LINUX: cmovne
	; LINUX: cmovne			; LINUX: cmovne
	; LINUX: lock cmpxchg8b			; LINUX: lock cmpxchg8b
	; LINUX: jne [[LABEL]]			; LINUX: jne [[LABEL]]
	%2 = atomicrmw min i64* @sc64, i64 6 acquire			%2 = atomicrmw min i64* @sc64, i64 6 acquire
	; LINUX: [[LABEL:.LBB[0-9]+_[0-9]+]]			; LINUX: [[LABEL:.LBB[0-9]+_[0-9]+]]
	; LINUX: cmpl			; LINUX: cmpl
	; LINUX: setb			; LINUX: sbbl
	; LINUX: cmovne			; LINUX: cmovne
	; LINUX: cmovne			; LINUX: cmovne
	; LINUX: lock cmpxchg8b			; LINUX: lock cmpxchg8b
	; LINUX: jne [[LABEL]]			; LINUX: jne [[LABEL]]
	%3 = atomicrmw umax i64* @sc64, i64 7 acquire			%3 = atomicrmw umax i64* @sc64, i64 7 acquire
	; LINUX: [[LABEL:.LBB[0-9]+_[0-9]+]]			; LINUX: [[LABEL:.LBB[0-9]+_[0-9]+]]
	; LINUX: cmpl			; LINUX: cmpl
	; LINUX: seta			; LINUX: sbbl
	; LINUX: cmovne			; LINUX: cmovne
	; LINUX: cmovne			; LINUX: cmovne
	; LINUX: lock cmpxchg8b			; LINUX: lock cmpxchg8b
	; LINUX: jne [[LABEL]]			; LINUX: jne [[LABEL]]
	%4 = atomicrmw umin i64* @sc64, i64 8 acquire			%4 = atomicrmw umin i64* @sc64, i64 8 acquire
	; LINUX: [[LABEL:.LBB[0-9]+_[0-9]+]]			; LINUX: [[LABEL:.LBB[0-9]+_[0-9]+]]
	; LINUX: cmpl			; LINUX: cmpl
	; LINUX: setb			; LINUX: sbbl
	; LINUX: cmovne			; LINUX: cmovne
	; LINUX: cmovne			; LINUX: cmovne
	; LINUX: lock cmpxchg8b			; LINUX: lock cmpxchg8b
	; LINUX: jne [[LABEL]]			; LINUX: jne [[LABEL]]
	ret void			ret void
	}			}

	; rdar://12453106			; rdar://12453106
	[… 12 lines not shown …]

test/CodeGen/X86/atomic128.ll

	[… 113 lines not shown …]

	define void @fetch_and_min(i128* %p, i128 %bits) {			define void @fetch_and_min(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_min:			; CHECK-LABEL: fetch_and_min:
	; CHECK-DAG: movq %rdx, [[INCHI:%[a-z0-9]+]]			; CHECK-DAG: movq %rdx, [[INCHI:%[a-z0-9]+]]
	; CHECK-DAG: movq (%rdi), %rax			; CHECK-DAG: movq (%rdi), %rax
	; CHECK-DAG: movq 8(%rdi), %rdx			; CHECK-DAG: movq 8(%rdi), %rdx

	; CHECK: [[LOOP:.?LBB[0-9]+_[0-9]+]]:			; CHECK: [[LOOP:.?LBB[0-9]+_[0-9]+]]:
	; CHECK: cmpq %rsi, %rax			; CHECK: cmpq
	; CHECK: setbe [[CMP:%[a-z0-9]+]]			; CHECK: sbbq
	; CHECK: cmpq [[INCHI]], %rdx			; CHECK: setg
	; CHECK: setle [[HICMP:%[a-z0-9]+]]
	; CHECK: je [[USE_LO:.?LBB[0-9]+_[0-9]+]]

	; CHECK: movb [[HICMP]], [[CMP]]
	; CHECK: [[USE_LO]]:
	; CHECK: testb [[CMP]], [[CMP]]
	; CHECK: movq %rsi, %rbx
	; CHECK: cmovneq %rax, %rbx			; CHECK: cmovneq %rax, %rbx
	; CHECK: movq [[INCHI]], %rcx			; CHECK: movq [[INCHI]], %rcx
	; CHECK: cmovneq %rdx, %rcx			; CHECK: cmovneq %rdx, %rcx
	; CHECK: lock			; CHECK: lock
	; CHECK: cmpxchg16b (%rdi)			; CHECK: cmpxchg16b (%rdi)
	; CHECK: jne [[LOOP]]			; CHECK: jne [[LOOP]]

	; CHECK: movq %rax, _var			; CHECK: movq %rax, _var
	; CHECK: movq %rdx, _var+8			; CHECK: movq %rdx, _var+8

	%val = atomicrmw min i128* %p, i128 %bits seq_cst			%val = atomicrmw min i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_max(i128* %p, i128 %bits) {			define void @fetch_and_max(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_max:			; CHECK-LABEL: fetch_and_max:
	; CHECK-DAG: movq %rdx, [[INCHI:%[a-z0-9]+]]			; CHECK-DAG: movq %rdx, [[INCHI:%[a-z0-9]+]]
	; CHECK-DAG: movq (%rdi), %rax			; CHECK-DAG: movq (%rdi), %rax
	; CHECK-DAG: movq 8(%rdi), %rdx			; CHECK-DAG: movq 8(%rdi), %rdx

	; CHECK: [[LOOP:.?LBB[0-9]+_[0-9]+]]:			; CHECK: [[LOOP:.?LBB[0-9]+_[0-9]+]]:
	; CHECK: cmpq %rsi, %rax			; CHECK: cmpq
	; CHECK: setae [[CMP:%[a-z0-9]+]]			; CHECK: sbbq
	; CHECK: cmpq [[INCHI]], %rdx			; CHECK: setge
	; CHECK: setge [[HICMP:%[a-z0-9]+]]
	; CHECK: je [[USE_LO:.?LBB[0-9]+_[0-9]+]]

	; CHECK: movb [[HICMP]], [[CMP]]
	; CHECK: [[USE_LO]]:
	; CHECK: testb [[CMP]], [[CMP]]
	; CHECK: movq %rsi, %rbx
	; CHECK: cmovneq %rax, %rbx			; CHECK: cmovneq %rax, %rbx
	; CHECK: movq [[INCHI]], %rcx			; CHECK: movq [[INCHI]], %rcx
	; CHECK: cmovneq %rdx, %rcx			; CHECK: cmovneq %rdx, %rcx
	; CHECK: lock			; CHECK: lock
	; CHECK: cmpxchg16b (%rdi)			; CHECK: cmpxchg16b (%rdi)
	; CHECK: jne [[LOOP]]			; CHECK: jne [[LOOP]]

	; CHECK: movq %rax, _var			; CHECK: movq %rax, _var
	; CHECK: movq %rdx, _var+8			; CHECK: movq %rdx, _var+8

	%val = atomicrmw max i128* %p, i128 %bits seq_cst			%val = atomicrmw max i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_umin(i128* %p, i128 %bits) {			define void @fetch_and_umin(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_umin:			; CHECK-LABEL: fetch_and_umin:
	; CHECK-DAG: movq %rdx, [[INCHI:%[a-z0-9]+]]			; CHECK-DAG: movq %rdx, [[INCHI:%[a-z0-9]+]]
	; CHECK-DAG: movq (%rdi), %rax			; CHECK-DAG: movq (%rdi), %rax
	; CHECK-DAG: movq 8(%rdi), %rdx			; CHECK-DAG: movq 8(%rdi), %rdx

	; CHECK: [[LOOP:.?LBB[0-9]+_[0-9]+]]:			; CHECK: [[LOOP:.?LBB[0-9]+_[0-9]+]]:
	; CHECK: cmpq %rsi, %rax			; CHECK: cmpq
	; CHECK: setbe [[CMP:%[a-z0-9]+]]			; CHECK: sbbq
	; CHECK: cmpq [[INCHI]], %rdx			; CHECK: seta
	; CHECK: setbe [[HICMP:%[a-z0-9]+]]
	; CHECK: je [[USE_LO:.?LBB[0-9]+_[0-9]+]]

	; CHECK: movb [[HICMP]], [[CMP]]
	; CHECK: [[USE_LO]]:
	; CHECK: testb [[CMP]], [[CMP]]
	; CHECK: movq %rsi, %rbx
	; CHECK: cmovneq %rax, %rbx			; CHECK: cmovneq %rax, %rbx
	; CHECK: movq [[INCHI]], %rcx			; CHECK: movq [[INCHI]], %rcx
	; CHECK: cmovneq %rdx, %rcx			; CHECK: cmovneq %rdx, %rcx
	; CHECK: lock			; CHECK: lock
	; CHECK: cmpxchg16b (%rdi)			; CHECK: cmpxchg16b (%rdi)
	; CHECK: jne [[LOOP]]			; CHECK: jne [[LOOP]]

	; CHECK: movq %rax, _var			; CHECK: movq %rax, _var
	; CHECK: movq %rdx, _var+8			; CHECK: movq %rdx, _var+8

	%val = atomicrmw umin i128* %p, i128 %bits seq_cst			%val = atomicrmw umin i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_umax(i128* %p, i128 %bits) {			define void @fetch_and_umax(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_umax:			; CHECK-LABEL: fetch_and_umax:
	; CHECK-DAG: movq %rdx, [[INCHI:%[a-z0-9]+]]			; CHECK-DAG: movq %rdx, [[INCHI:%[a-z0-9]+]]
	; CHECK-DAG: movq (%rdi), %rax			; CHECK-DAG: movq (%rdi), %rax
	; CHECK-DAG: movq 8(%rdi), %rdx			; CHECK-DAG: movq 8(%rdi), %rdx

	; CHECK: [[LOOP:.?LBB[0-9]+_[0-9]+]]:			; CHECK: [[LOOP:.?LBB[0-9]+_[0-9]+]]:
	; CHECK: cmpq %rax, %rsi			; CHECK: cmpq
	; CHECK: setb [[CMP:%[a-z0-9]+]]			; CHECK: sbbq
	; CHECK: cmpq [[INCHI]], %rdx			; CHECK: setb
	; CHECK: seta [[HICMP:%[a-z0-9]+]]
	; CHECK: je [[USE_LO:.?LBB[0-9]+_[0-9]+]]

	; CHECK: movb [[HICMP]], [[CMP]]
	; CHECK: [[USE_LO]]:
	; CHECK: testb [[CMP]], [[CMP]]
	; CHECK: movq %rsi, %rbx
	; CHECK: cmovneq %rax, %rbx			; CHECK: cmovneq %rax, %rbx
	; CHECK: movq [[INCHI]], %rcx			; CHECK: movq [[INCHI]], %rcx
	; CHECK: cmovneq %rdx, %rcx			; CHECK: cmovneq %rdx, %rcx
	; CHECK: lock			; CHECK: lock
	; CHECK: cmpxchg16b (%rdi)			; CHECK: cmpxchg16b (%rdi)
	; CHECK: jne [[LOOP]]			; CHECK: jne [[LOOP]]

	; CHECK: movq %rax, _var			; CHECK: movq %rax, _var
	…

test/CodeGen/X86/avx512-cmp.ll

	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl --show-mc-encoding | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl --show-mc-encoding | FileCheck %s
	; RUN: llc < %s -mtriple=i386-apple-darwin -mcpu=knl | FileCheck %s --check-prefix AVX512-32

	; CHECK-LABEL: test1			; CHECK-LABEL: test1
	; CHECK: vucomisd {{.*}}encoding: [0x62			; CHECK: vucomisd {{.*}}encoding: [0x62
	define double @test1(double %a, double %b) nounwind {			define double @test1(double %a, double %b) nounwind {
	%tobool = fcmp une double %a, %b			%tobool = fcmp une double %a, %b
	br i1 %tobool, label %l1, label %l2			br i1 %tobool, label %l1, label %l2

	l1:			l1:
	…
	%b = and i64 %a, 1			%b = and i64 %a, 1
	%cmp10.i = icmp eq i64 %b, 0			%cmp10.i = icmp eq i64 %b, 0
	br i1 %cmp10.i, label %A, label %B			br i1 %cmp10.i, label %A, label %B
	A:			A:
	ret i32 6			ret i32 6
	B:			B:
	ret i32 7			ret i32 7
	}			}

	; AVX512-32-LABEL: test10
	; AVX512-32: movl 4(%esp), %ecx
	; AVX512-32: cmpl $9, (%ecx)
	; AVX512-32: seta %al
	; AVX512-32: cmpl $0, 4(%ecx)
	; AVX512-32: setg %cl
	; AVX512-32: je
	; AVX512-32: movb %cl, %al
	; AVX512-32: testb $1, %al

	define void @test10(i64* %i.addr) {

	%x = load i64, i64* %i.addr, align 8
	%cmp = icmp slt i64 %x, 10
	br i1 %cmp, label %true, label %false

	true:
	ret void

	false:
	ret void
	}
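The AVX512-32 checks for `test10` show the pre-patch strategy: an unsigned compare of the low words (`seta`), a signed compare of the high words (`setg`), and a `je`/`movb` that keeps the low-word result only when the high words are equal. Below is a C++ model of that selection logic, written for illustration; the function name is hypothetical, and the 32-bit halves are emulated with shifts and casts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the old lowering of "a <s b" for 64-bit values on a
// 32-bit target: two independent 32-bit compares plus a select on high-equal.
bool slt_select_on_high_equal(int64_t a, int64_t b) {
  uint32_t alo = static_cast<uint32_t>(a);
  uint32_t blo = static_cast<uint32_t>(b);
  int32_t ahi = static_cast<int32_t>(static_cast<uint64_t>(a) >> 32);
  int32_t bhi = static_cast<int32_t>(static_cast<uint64_t>(b) >> 32);
  bool lo_result = alo < blo;  // low words compare as unsigned
  bool hi_result = ahi < bhi;  // high words compare as signed
  // Only when the high words tie does the low-word result decide.
  return ahi == bhi ? lo_result : hi_result;
}
```

The result is correct but needs two `setcc`s plus a branch or move to merge them; the `cmp`/`sbb` form produces a single flag result instead.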

test/CodeGen/X86/wide-integer-cmp.ll

This file was added.

				; RUN: llc -mtriple=i686-linux-gnu %s -o - | FileCheck %s


				define i32 @branch_eq(i64 %a, i64 %b) {
				entry:
				%cmp = icmp eq i64 %a, %b
				br i1 %cmp, label %bb1, label %bb2
				bb1:
				ret i32 1
				bb2:
				ret i32 2

				; CHECK-LABEL: branch_eq:
				; CHECK: movl 4(%esp), [[LHSLo:%[a-z]+]]
				; CHECK: movl 8(%esp), [[LHSHi:%[a-z]+]]
				; CHECK: xorl 16(%esp), [[LHSHi]]
				; CHECK: xorl 12(%esp), [[LHSLo]]
				; CHECK: orl [[LHSHi]], [[LHSLo]]
				; CHECK: jne [[FALSE:.LBB[0-9_]+]]
				; CHECK: movl $1, %eax
				; CHECK: retl
				; CHECK: [[FALSE]]:
				; CHECK: movl $2, %eax
				; CHECK: retl
				}
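The `branch_eq` checks expect equality to be tested without any per-half branching: XOR the matching halves, OR the two results, and branch on zero. A small C++ sketch of that identity follows; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Equality of two 64-bit values using only 32-bit operations, mirroring the
// xorl / xorl / orl / jne sequence in the branch_eq checks.
bool eq64_via_xor_or(uint64_t a, uint64_t b) {
  uint32_t lo = static_cast<uint32_t>(a) ^ static_cast<uint32_t>(b);
  uint32_t hi = static_cast<uint32_t>(a >> 32) ^ static_cast<uint32_t>(b >> 32);
  // Each XOR is zero only if its halves match, so the OR is zero only if
  // every bit of a and b agrees.
  return (lo | hi) == 0;
}
```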

				define i32 @branch_slt(i64 %a, i64 %b) {
				entry:
				%cmp = icmp slt i64 %a, %b
				br i1 %cmp, label %bb1, label %bb2
				bb1:
				ret i32 1
				bb2:
				ret i32 2

				; CHECK-LABEL: branch_slt:
				; CHECK: movl 4(%esp), [[LHSLo:%[a-z]+]]
				; CHECK: movl 8(%esp), [[LHSHi:%[a-z]+]]
				; CHECK: cmpl 12(%esp), [[LHSLo]]
				DavidKreitzer: I think you meant LHSHi here.
				; CHECK: sbbl 16(%esp), [[LHSHi]]
				; CHECK: jge [[FALSE:.LBB[0-9_]+]]
				; CHECK: movl $1, %eax
				; CHECK: retl
				; CHECK: [[FALSE]]:
				; CHECK: movl $2, %eax
				; CHECK: retl
				}
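In `branch_slt`, `cmpl` on the low words only produces a borrow, and `sbbl` on the high words folds that borrow in. `jge` then reads SF == OF from the high-word result. Below is a C++ emulation of the flags left by that pair of instructions, sketched under the assumption that SBB sets OF for signed overflow of the exact difference `aHi - bHi - borrow`. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct SbbFlags {
  bool cf;  // unsigned borrow out of the high word
  bool sf;  // sign bit of the truncated 32-bit high-word result
  bool of;  // signed overflow of the high-word subtraction
};

// Emulates "cmpl bLo, aLo; sbbl bHi, aHi" and returns the flags after sbbl.
SbbFlags cmp_sbb(uint64_t a, uint64_t b) {
  uint32_t alo = static_cast<uint32_t>(a), ahi = static_cast<uint32_t>(a >> 32);
  uint32_t blo = static_cast<uint32_t>(b), bhi = static_cast<uint32_t>(b >> 32);
  uint32_t borrow = alo < blo ? 1u : 0u;  // CF after cmpl; its result is discarded
  uint32_t res = ahi - bhi - borrow;      // wraps modulo 2^32, like the hardware
  int64_t exact = static_cast<int64_t>(static_cast<int32_t>(ahi)) -
                  static_cast<int32_t>(bhi) - borrow;
  SbbFlags f;
  f.cf = static_cast<uint64_t>(ahi) < static_cast<uint64_t>(bhi) + borrow;
  f.sf = (res >> 31) != 0;
  f.of = exact < INT32_MIN || exact > INT32_MAX;
  return f;
}

// jge is taken when SF == OF, so "a <s b" holds exactly when SF != OF.
bool slt64(int64_t a, int64_t b) {
  SbbFlags f = cmp_sbb(static_cast<uint64_t>(a), static_cast<uint64_t>(b));
  return f.sf != f.of;
}
```

The same emulated CF also gives the unsigned result (`jb`/`jae`), which is why one instruction pair serves both signed and unsigned predicates.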

				define i32 @branch_ule(i64 %a, i64 %b) {
				entry:
				%cmp = icmp ule i64 %a, %b
				br i1 %cmp, label %bb1, label %bb2
				bb1:
				ret i32 1
				bb2:
				ret i32 2

				; CHECK-LABEL: branch_ule:
				; CHECK: movl 12(%esp), [[RHSLo:%[a-z]+]]
				; CHECK: movl 16(%esp), [[RHSHi:%[a-z]+]]
				; CHECK: cmpl 4(%esp), [[RHSLo]]
				DavidKreitzer: Again, this looks odd. Did you mean RHSHi here and at line 62?
				; CHECK: sbbl 8(%esp), [[RHSHi]]
				; CHECK: jb [[FALSE:.LBB[0-9_]+]]
				; CHECK: movl $1, %eax
				; CHECK: retl
				; CHECK: [[FALSE]]:
				; CHECK: movl $2, %eax
				; CHECK: retl
				}
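`branch_ule` loads `%b` into registers and subtracts `%a` from it, so the final borrow means `b <u a`; `jb` therefore goes to the false block. The scaled-down C++ model below checks the identity `a <=u b == !(borrow of b - a)` exhaustively. It assumes 8-bit values split into two 4-bit limbs as a stand-in for 64-bit values in 32-bit halves, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Final borrow of "cmp subLo from minLo; sbb subHi from minHi" on 4-bit limbs,
// i.e. whether minuend - subtrahend is negative.
bool borrow_of_sub(uint8_t minuend, uint8_t subtrahend) {
  unsigned mlo = minuend & 0xF, mhi = minuend >> 4;
  unsigned slo = subtrahend & 0xF, shi = subtrahend >> 4;
  unsigned borrow = mlo < slo ? 1u : 0u;  // borrow out of the low limb
  return mhi < shi + borrow;              // borrow out of the high limb
}

// a <=u b is !(b <u a): swap the operands and take the no-borrow case.
bool ule_via_swapped_sbb(uint8_t a, uint8_t b) {
  return !borrow_of_sub(b, a);
}

// Checks the identity for all 65536 pairs of 8-bit values.
bool check_all_pairs() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      if (ule_via_swapped_sbb(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) != (a <= b))
        return false;
  return true;
}
```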

				define i32 @set_gt(i64 %a, i64 %b) {
				entry:
				%cmp = icmp sgt i64 %a, %b
				%res = select i1 %cmp, i32 1, i32 0
				ret i32 %res

				; CHECK-LABEL: set_gt:
				; CHECK: movl 12(%esp), [[RHSLo:%[a-z]+]]
				; CHECK: movl 16(%esp), [[RHSHi:%[a-z]+]]
				; CHECK: cmpl 4(%esp), [[RHSLo]]
				DavidKreitzer: Also RHSHi here and at line 81.
				hans: Thanks! Not sure how I managed to get them so backwards.
				; CHECK: sbbl 8(%esp), [[RHSHi]]
				; CHECK: setl %al
				; CHECK: retl
				}
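After the final `sbb`, ZF reflects only the top word, so a `cmp`/`sbb` chain can answer only "less than"-style questions directly. `set_gt` and `branch_ule` handle the other predicates by swapping operands (`setl` for `sgt`, `jb` to the false block for `ule`). Below is a hypothetical C++ table of that canonicalization, consistent with the checks in this file but not taken from the patch. Native 64-bit compares stand in for the flags.

```cpp
#include <cassert>
#include <cstdint>

enum class Pred { SLT, SGE, SGT, SLE, ULT, UGE, UGT, ULE };
enum class Cond { L, GE, B, AE };  // condition codes read after the sbb chain

struct Lowering {
  bool swap;  // compute b - a instead of a - b
  Cond cc;
};

// Hypothetical mapping from an integer predicate to (operand order, condition).
Lowering lower(Pred p) {
  switch (p) {
    case Pred::SLT: return {false, Cond::L};
    case Pred::SGE: return {false, Cond::GE};
    case Pred::SGT: return {true, Cond::L};    // a > b  <=> b < a (set_gt: setl)
    case Pred::SLE: return {true, Cond::GE};   // a <= b <=> !(b < a)
    case Pred::ULT: return {false, Cond::B};
    case Pred::UGE: return {false, Cond::AE};
    case Pred::UGT: return {true, Cond::B};
    case Pred::ULE: return {true, Cond::AE};   // branch_ule: jb goes to false
  }
  return {false, Cond::L};
}

// Evaluates the lowered form: L/GE are signed less / not less and B/AE are
// unsigned less / not less, applied to the possibly swapped operand pair.
bool eval(Pred p, int64_t a, int64_t b) {
  Lowering l = lower(p);
  int64_t x = l.swap ? b : a;
  int64_t y = l.swap ? a : b;
  uint64_t ux = static_cast<uint64_t>(x), uy = static_cast<uint64_t>(y);
  switch (l.cc) {
    case Cond::L:  return x < y;
    case Cond::GE: return !(x < y);
    case Cond::B:  return ux < uy;
    case Cond::AE: return !(ux < uy);
  }
  return false;
}
```

Equality and inequality are not in the table; they use the XOR/OR form from `branch_eq` instead.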

				define i32 @test_wide(i128 %a, i128 %b) {
				entry:
				DavidKreitzer: Nice! This is SO much better than the current code!
				%cmp = icmp slt i128 %a, %b
				br i1 %cmp, label %bb1, label %bb2
				bb1:
				ret i32 1
				bb2:
				ret i32 2

				; CHECK-LABEL: test_wide:
				; CHECK: cmpl 24(%esp)
				; CHECK: sbbl 28(%esp)
				; CHECK: sbbl 32(%esp)
				; CHECK: sbbl 36(%esp)
				; CHECK: jge [[FALSE:.LBB[0-9_]+]]
				; CHECK: movl $1, %eax
				; CHECK: retl
				; CHECK: [[FALSE]]:
				; CHECK: movl $2, %eax
				; CHECK: retl
				}
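`test_wide` shows that the technique scales to any width: one `cmpl` followed by an `sbbl` per remaining word, with the signed result read from the last one. Below is a C++ sketch of that chain for an `i128` held as four 32-bit limbs, least significant first (x86's little-endian order). The `U128` alias and `slt128` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 128-bit value as four 32-bit limbs, least significant limb first.
using U128 = std::array<uint32_t, 4>;

// Signed a < b via a cmp/sbb/sbb/sbb chain: propagate the borrow through the
// lower limbs, then take the sign of the exact top-limb difference, which is
// what SF != OF reports after the last sbb.
bool slt128(const U128& a, const U128& b) {
  uint32_t borrow = 0;  // the first iteration acts as cmp (no borrow in)
  for (std::size_t i = 0; i + 1 < a.size(); ++i) {
    // Borrow out of limb i: a[i] - b[i] - borrow is negative.
    borrow = static_cast<uint64_t>(a[i]) < static_cast<uint64_t>(b[i]) + borrow ? 1u : 0u;
  }
  int64_t top = static_cast<int64_t>(static_cast<int32_t>(a[3])) -
                static_cast<int32_t>(b[3]) - borrow;
  return top < 0;
}
```

Only the top limb is treated as signed; every lower limb participates purely through its unsigned borrow.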

				define i32 @test_carry_false(i64 %a, i64 %b) {
				entry:
				%x = and i64 %a, -4294967296 ;0xffffffff00000000
				%y = and i64 %b, -4294967296
				%cmp = icmp slt i64 %x, %y
				br i1 %cmp, label %bb1, label %bb2
				bb1:
				ret i32 1
				bb2:
				ret i32 2

				; The comparison of the low bits will be folded to a CARRY_FALSE node. Make
				; sure the code can handle that.
				; CHECK-LABEL: carry_false:
				; CHECK: movl 8(%esp), [[LHSHi:%[a-z]+]]
				; CHECK: cmpl 16(%esp), [[LHSHi]]
				; CHECK: jge [[FALSE:.LBB[0-9_]+]]
				; CHECK: movl $1, %eax
				; CHECK: retl
				; CHECK: [[FALSE]]:
				; CHECK: movl $2, %eax
				; CHECK: retl
				}
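In `test_carry_false`, both operands have their low 32 bits masked to zero, so the low-word compare can never borrow. That is the CARRY_FALSE case the test comment mentions, and the `sbb` degenerates into the single `cmpl` of the high words that the checks expect. Below is a C++ model of why that is sound; it illustrates the arithmetic only, not the DAG combine itself, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// For 64-bit values whose low halves are both zero, "cmp lo" yields no
// borrow, so "x <s y" reduces to a signed compare of the high halves alone.
bool slt_high_only(int64_t x, int64_t y) {
  int32_t xhi = static_cast<int32_t>(static_cast<uint64_t>(x) >> 32);
  int32_t yhi = static_cast<int32_t>(static_cast<uint64_t>(y) >> 32);
  return xhi < yhi;  // borrow-in from the low halves is known to be 0
}
```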

test/CodeGen/X86/win32-pic-jumptable.ll

	; RUN: llc < %s -relocation-model=pic | FileCheck %s			; RUN: llc < %s -relocation-model=pic | FileCheck %s

	; CHECK: calll L0$pb			; CHECK: calll L0$pb
	; CHECK-NEXT: L0$pb:			; CHECK-NEXT: L0$pb:
	; CHECK-NEXT: popl %eax			; CHECK-NEXT: popl %eax
	; CHECK-NEXT: addl LJTI0_0(,%ecx,4), %eax			; CHECK-NEXT: addl LJTI0_0(,%ecx,4), %eax
	; CHECK-NEXT: jmpl *%eax			; CHECK-NEXT: jmpl *%eax

	; CHECK: LJTI0_0:			; CHECK: LJTI0_0:
				; CHECK-NEXT: .long LBB0_2-L0$pb
				; CHECK-NEXT: .long LBB0_3-L0$pb
	; CHECK-NEXT: .long LBB0_4-L0$pb			; CHECK-NEXT: .long LBB0_4-L0$pb
	; CHECK-NEXT: .long LBB0_5-L0$pb			; CHECK-NEXT: .long LBB0_5-L0$pb
	; CHECK-NEXT: .long LBB0_6-L0$pb
	; CHECK-NEXT: .long LBB0_7-L0$pb


	target triple = "i686--windows-itanium"			target triple = "i686--windows-itanium"
	define i32 @f(i64 %x) {			define i32 @f(i64 %x) {
	bb0:			bb0:
	switch i64 %x, label %bb5 [			switch i64 %x, label %bb5 [
	i64 1, label %bb1			i64 1, label %bb1
	i64 2, label %bb2			i64 2, label %bb2
	…

Revision Contents

Diff 40647

include/llvm/CodeGen/ISDOpcodes.h

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

lib/CodeGen/SelectionDAG/LegalizeTypes.h

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

lib/Target/X86/X86ISelLowering.h

lib/Target/X86/X86ISelLowering.cpp

test/CodeGen/X86/2012-08-17-legalizer-crash.ll

test/CodeGen/X86/atomic-minmax-i6432.ll

test/CodeGen/X86/atomic128.ll

test/CodeGen/X86/avx512-cmp.ll

test/CodeGen/X86/wide-integer-cmp.ll

test/CodeGen/X86/win32-pic-jumptable.ll
