Details

Reviewers

echristo
javed.absar
efriedma

Commits

rG73e8a784e62f: [SelectionDAG] Improve the legalisation lowering of UMULO.
rL339922: [SelectionDAG] Improve the legalisation lowering of UMULO.

Summary

There is no way in the universe that doing a full-width division in
software will be faster than doing an overflowing multiplication in
software in the first place, especially given that the same full-width
multiplication needs to be done anyway.
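
For reference, the division-based check that this patch replaces can be sketched in Rust roughly as follows. This is only an illustration at 64 bits, modelled on the removed code in the diff; the name `umulo_via_division` is made up for this sketch. On a target without a native full-width divide, the `/` below may turn into a library call.

```rust
/// Hypothetical name; models the old lowering: multiply, divide back, compare.
fn umulo_via_division(lhs: u64, rhs: u64) -> (u64, bool) {
    let product = lhs.wrapping_mul(rhs);
    // Divide by 1 instead of 0 so that the division is always defined.
    let divisor = if rhs == 0 { 1 } else { rhs };
    // If the multiplication did not wrap, dividing by `rhs` gives back `lhs`.
    let overflow = rhs != 0 && product / divisor != lhs;
    (product, overflow)
}

fn main() {
    let samples = [0u64, 1, 2, 3, 1 << 32, u64::MAX / 2, u64::MAX];
    for &l in &samples {
        for &r in &samples {
            // Compare against the standard library on every sample pair.
            assert_eq!(umulo_via_division(l, r), l.overflowing_mul(r));
        }
    }
    println!("division-based check agrees with overflowing_mul on all samples");
}
```

The point of the sketch is the cost: one full-width multiply plus one full-width divide, where the divide only serves to recover a single overflow bit.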

This patch replaces the previous implementation with a direct lowering
into an overflowing multiplication algorithm based on half-width
operations.

Correctness of the algorithm was verified by exhaustively checking the
output of this algorithm for overflowing multiplication of 16-bit
integers against an obviously correct widening multiplication. Barring
any oversights introduced by porting the algorithm to DAG, confidence in
the correctness of this algorithm is extremely high.

The following table shows the change in both runtime (t) and code size
(s). The change is expressed as a multiplier of the original, so anything
under 1 is “better” and anything above 1 is worse.

+-------+-----------+-----------+-------------+-------------+
| Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
|   X64 |     -     |     -     |    ~0.5     |    ~0.64    |
|  i686 |   ~0.5    |   ~0.6666 |    ~0.05    |    ~0.9     |
| armv7 |     -     |   ~0.75   |      -      |    ~1.4     |
+-------+-----------+-----------+-------------+-------------+

Performance numbers were collected by running the overflowing
multiplication in a loop under `perf` on two x86_64 machines (one Intel
Haswell, the other AMD Ryzen). Size numbers were collected by looking at
the size of a function containing an overflowing multiply in a loop.

All in all, both performance and size have improved, except in the case
of armv7, where code size has regressed for the 128-bit multiply. The
u128*u128 overflowing multiply on 32-bit platforms seems to benefit from
this change a lot, taking only 5% of the time of the original algorithm
to calculate the same thing.

The final benefit of this change is that LLVM is now capable of lowering
the overflowing unsigned multiply for integers of any bit-width as long
as the target is capable of lowering regular multiplication for the same
bit-width. Previously, 128-bit overflowing multiply was the widest
possible.
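
To illustrate how the same sequence scales to wider types, here is a rough Rust sketch of a 128-bit overflowing multiply built from 64-bit halves. It follows the `%0` to `%5` sequence documented in the patch's code comment. The function name `umulo_u128_via_halves` and the sample values are made up for this sketch, and it is checked only on a handful of test vectors rather than exhaustively.

```rust
/// Hypothetical helper: u128 overflowing multiply from u64 halves.
fn umulo_u128_via_halves(l: u128, r: u128) -> (u128, bool) {
    let (lhs_lo, rhs_lo) = (l as u64, r as u64);
    let (lhs_hi, rhs_hi) = ((l >> 64) as u64, (r >> 64) as u64);

    // %0: two non-zero high halves always overflow.
    let overflow0 = lhs_hi != 0 && rhs_hi != 0;
    // %1, %2: half-width overflowing multiplies of the cross terms.
    let (r1, overflow1) = lhs_hi.overflowing_mul(rhs_lo);
    let (r2, overflow2) = rhs_hi.overflowing_mul(lhs_lo);
    // %3: full-width multiply of the zero-extended low halves; cannot overflow.
    let r3 = (lhs_lo as u128) * (rhs_lo as u128);
    // %4: cross terms moved into the high half and summed.
    let r4 = ((r1 as u128) << 64).wrapping_add((r2 as u128) << 64);
    // %5: final add, whose carry is the last overflow source.
    let (r5, overflow5) = r4.overflowing_add(r3);
    (r5, overflow0 || overflow1 || overflow2 || overflow5)
}

fn main() {
    let samples = [
        0u128,
        1,
        2,
        u64::MAX as u128,
        (u64::MAX as u128) + 1,
        1 << 127,
        u128::MAX,
        0x1234_5678_9abc_def0_0fed_cba9_8765_4321,
    ];
    for &l in &samples {
        for &r in &samples {
            // The standard library serves as the reference here.
            assert_eq!(umulo_u128_via_halves(l, r), l.overflowing_mul(r));
        }
    }
    println!("all sample pairs agree");
}
```

The only full-width operations are one multiply of zero-extended halves, shifts and adds, which is why the width limit now follows that of regular multiplication.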

Notes:

- This change might have broken some tests I have not caught. I have no idea what tests are present and how to run them all, so I’ll leave it up to CI to build and run the tests.
  - ninja check-all seems to pass locally, but 1) I don’t have all targets enabled; and 2) some of my previous revisions failed tests at CI even when I had all targets enabled…
- I have no idea how style in LLVM is enforced, so I tried my best to match the style of the surrounding code by hand;
- I have no idea who the reviewers should be, so I just picked Eric, who seems to have introduced this code in the first place;
- I do not have commit access, so somebody will have to land this for me.

Diff Detail

Repository: rL LLVM

Event Timeline

nagisa created this revision. · Aug 5 2018, 8:00 AM

Herald added a reviewer: javed.absar. · Aug 5 2018, 8:00 AM

Herald added a subscriber: kristof.beyls.

nagisa edited the summary of this revision. · Aug 5 2018, 8:02 AM

rkruppe added a subscriber: rkruppe. · Aug 5 2018, 9:09 AM

I’ve added some codegen tests. Those just ensure that targets are able to lower this operation more than anything else, although I did manually verify the x64 assembly to check whether it really looks like what I expect.

I would very much love to add a test that actually executes some machine code with a number of test vectors, but I believe no such test family exists in LLVM.

Herald added subscribers: the_o, brucehoult, MartinMosbeck and 18 others. · Aug 6 2018, 1:51 PM

I would very much love to add a test that actually executes some machine code with a number of test vectors, but I believe no such test family exists in LLVM.

That would require being able to execute code for an arbitrary target, which would in general require an emulator, which is a lot more hassle than it's worth.

No tests for umul.with.overflow.i64 on 32-bit?

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2717	This add can overflow, but I guess that can only happen if %0 is true?
2743	Transforming umul_lohi to a call to __multi3 isn't particularly useful. We could expand it inline, but it's a lot of code.
2754	Maybe BUILD_PAIR here instead, since it's going to get split anyway?
test/CodeGen/ARM/muloti2.ll
3 ↗	(On Diff #159383)	v6 and v7 are basically identical for this purpose; the code isn't using any v7-specific instructions.
4 ↗	(On Diff #159383)	AArch64 tests aren't allowed in test/CodeGen/ARM; they have to go in test/CodeGen/AArch64. (It's possible to build LLVM with the AArch64 backend enabled, but not the ARM backend.) Probably should also test ARMv6 in Thumb mode for completeness (although I expect the result to be messy).

In D50310#1190302, @efriedma wrote:

No tests for umul.with.overflow.i64 on 32-bit?

Thanks for the review. On the one hand, I forgot as it was getting late; on the other hand, those are already exercised by the i128 umulo lowering, which uses i64 umulo in its implementation. I will add a few tests targeting i64 umulo specifically later today.

nagisa added inline comments. · Aug 7 2018, 3:40 AM

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2717	When this operation overflows, one or more of the `%1.1`, `%2.1` and `%0` will already be true and thus the whole operation will already have overflow bit set.
2743	The intention behind using `umul_lohi` was specifically to tell targets that a widening multiply is expected here, so that targets which natively support this operation could use it without having to inspect the operands for `ZERO_EXTEND`. Alas, targets like 32-bit ARM outright refuse to lower such an operation when the output is e.g. `i64,i64`, and, I assume, many more targets would have trouble with an `i128,i128 umul_lohi`.
2754	Will try, but not sure if you can add two "pairs" together.
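
The claim about the `%4` add can be checked with a small Rust sketch. To keep the exhaustive loop tiny, it assumes 8-bit operands with 4-bit halves (emulated with masks) rather than the 16-bit setup used for the original verification. All names are made up for this sketch. It asserts that whenever the add wraps, `%0` is already set, which is enough for the overall overflow bit to be correct.

```rust
const HALF_BITS: u32 = 4;
const HALF_MASK: u8 = 0x0F;

/// Splits an 8-bit value into its (high, low) 4-bit halves.
fn split(x: u8) -> (u8, u8) {
    (x >> HALF_BITS, x & HALF_MASK)
}

/// True when the `%4` add wraps, i.e. the shifted cross products do not fit.
fn partial_sum_wraps(l: u8, r: u8) -> bool {
    let ((lhs_hi, lhs_lo), (rhs_hi, rhs_lo)) = (split(l), split(r));
    // Cross products truncated to the half width, like `%1.0` and `%2.0`.
    let r1 = (lhs_hi * rhs_lo) & HALF_MASK;
    let r2 = (rhs_hi * lhs_lo) & HALF_MASK;
    (r1 << HALF_BITS).overflowing_add(r2 << HALF_BITS).1
}

/// The `%0` flag: both high halves are non-zero.
fn both_high_halves_nonzero(l: u8, r: u8) -> bool {
    split(l).0 != 0 && split(r).0 != 0
}

fn main() {
    let mut wraps = 0u32;
    for l in 0..=u8::MAX {
        for r in 0..=u8::MAX {
            if partial_sum_wraps(l, r) {
                wraps += 1;
                assert!(both_high_halves_nonzero(l, r), "l={}, r={}", l, r);
            }
        }
    }
    println!("%4 wrapped for {} operand pairs; %0 was set for all of them", wraps);
}
```

The reasoning behind the result: if either high half is zero, one of the two cross products is zero, so the sum of the shifted values cannot carry out.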

Changed code to use BUILD_PAIR;
Added the tests for i64 umulo;
Moved the AArch64 tests to the right places;
Added a Thumbv6 i128 umulo test.

Please choose a different name for the tests; "muloti2.ll" isn't usefully indicating what the files actually test.

Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication.

What exactly did you verify? Looking again, I'm pretty sure your example IR doesn't compute the correct result: you never compute the product %LHS.HI * %RHS.HI.

test/CodeGen/X86/muloti.ll
60	This test is pretty useless; could get rid of it, I guess, since it's covered by muloti2.ll.

Herald added a subscriber: PkmX. · Aug 8 2018, 12:59 PM

In D50310#1192833, @efriedma wrote:

Please choose a different name for the tests; "muloti2.ll" isn't usefully indicating what the files actually test.

Does something like umulo-legalisation-lowering.ll sound good?

In D50310#1192833, @efriedma wrote:

Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication.

What exactly did you verify? Looking again, I'm pretty sure your example IR doesn't compute the correct result: you never compute the product %LHS.HI * %RHS.HI.

The computation of %LHS.HI * %RHS.HI is only necessary to compute the overflow bit. If multiplication was used, the check would look like this:

%product = { iNh, i1 } @umul.with.overflow.iNh(iNh %LHS.HI, iNh %RHS.HI)
%0 = %product.0 != 0 || %product.1 ; equivalent of the current %0

The %0 = %LHS.HI != 0 && %RHS.HI != 0 is an optimisation that calculates the same information without doing the multiply. Computing the product may be more efficient for some specific bitwidths on some targets, but I found the %LHS.HI != 0 && %RHS.HI != 0 variant to be more palatable in the general case.
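
The equivalence of the two forms of `%0` can be confirmed exhaustively for 8-bit halves with a short Rust sketch. The function names are made up for this illustration.

```rust
/// The cheap form used by the patch: both high halves are non-zero.
fn both_nonzero(lhs_hi: u8, rhs_hi: u8) -> bool {
    lhs_hi != 0 && rhs_hi != 0
}

/// The multiply-based form: the half-width product is non-zero or overflowed.
fn product_check(lhs_hi: u8, rhs_hi: u8) -> bool {
    let (product, overflowed) = lhs_hi.overflowing_mul(rhs_hi);
    product != 0 || overflowed
}

fn main() {
    for a in 0..=u8::MAX {
        for b in 0..=u8::MAX {
            assert_eq!(both_nonzero(a, b), product_check(a, b), "a={}, b={}", a, b);
        }
    }
    println!("both forms agree for all 65536 pairs of 8-bit high halves");
}
```

Note that the `|| overflowed` part matters: for example 16 * 16 wraps to exactly 0 in 8 bits, so testing the truncated product alone would miss that case.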

The following Rust code is what I used to check the correctness of current algorithm exhaustively in a reasonable time-frame. I’m willing to port this test to C if there’s a demand for it.

type Half = u8;
type Full = u16;
type Double = u32;

const HALF_BITS: u32 = 8;
const FULL_BITS: u32 = 16;

pub fn obviously_correct(l: Full, r: Full) -> (Full, bool) {
    // Do a widening multiplication and check the high half to see if the multiplication
    // overflowed. Also correctly handles result wrapping in case of overflow.
    let doublewide = (l as Double).wrapping_mul(r as Double);
    (doublewide as Full, (doublewide >> FULL_BITS) != 0)
}

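pub fn actual_implementation(l: Full, r: Full) -> (Full, bool) {
    // Split both operands into low and high halves.
    let (lhs_lo, rhs_lo) = (l as Half, r as Half);
    let (lhs_hi, rhs_hi) = ((l >> HALF_BITS) as Half, (r >> HALF_BITS) as Half);

    // %0: if both high halves are non-zero, the product cannot fit in `Full`.
    let overflow0 = lhs_hi != 0 && rhs_hi != 0;
    // %1 and %2: the cross products, computed at half width.
    let (r1, overflow1) = lhs_hi.overflowing_mul(rhs_lo);
    let (r2, overflow2) = rhs_hi.overflowing_mul(lhs_lo);
    // %3: the product of the low halves, which always fits in `Full`.
    let r3 = (lhs_lo as Full).wrapping_mul(rhs_lo as Full);
    // %4: both cross products moved into the high half and summed.
    let r4 = ((r1 as Full) << HALF_BITS).wrapping_add((r2 as Full) << HALF_BITS);
    // %5: the final sum; its carry is the last source of overflow.
    let (r5, overflow5) = r4.overflowing_add(r3);
    (r5, overflow0 || overflow1 || overflow2 || overflow5)
}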

pub fn main() {
    for lhs in Full::min_value()..=Full::max_value() {
        for rhs in Full::min_value()..=Full::max_value() {
            assert_eq!(obviously_correct(lhs, rhs), actual_implementation(lhs, rhs),
                       "results did not match for lhs={}, rhs={}", lhs, rhs);
        }
    }
}

The computation of %LHS.HI * %RHS.HI is only necessary to compute the overflow bit.

Oh, sorry, you're right, not sure what I was thinking.

I was reading the AArch64 code and thinking it looked strange, but the issue was just that the code was doing the operations in a strange order. An i128 multiply normally generates umulh+madd+madd, but for some reason your expansion generates mul+umulh+madd+add. Not really important.

Updated test filenames to better reflect what they are testing.

LGTM. (Do you want me to commit this for you?)

This revision is now accepted and ready to land. · Aug 13 2018, 4:57 PM

In D50310#1198215, @efriedma wrote:

Do you want me to commit this for you?

Yes, please. Thanks!

Closed by commit rL339922: [SelectionDAG] Improve the legalisation lowering of UMULO. (authored by efriedma). · Aug 16 2018, 11:40 AM

This revision was automatically updated to reflect the committed changes.

Herald added subscribers: jocewei, jrtc27. · Aug 16 2018, 11:40 AM

Diff 159212

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

[2,699 lines not shown] void DAGTypeLegalizer::ExpandIntRes_TRUNCATE(SDNode *N,
   Hi = DAG.getNode(ISD::TRUNCATE, dl, NVT, Hi);
 }

 void DAGTypeLegalizer::ExpandIntRes_XMULO(SDNode *N,
                                           SDValue &Lo, SDValue &Hi) {
   EVT VT = N->getValueType(0);
   SDLoc dl(N);

-  // A divide for UMULO should be faster than a function call.
   if (N->getOpcode() == ISD::UMULO) {
+    // This section expands the operation into the following sequence of
+    // instructions. `iNh` here refers to a type which has half the bit width of
+    // the type the original operation operated on.
+    //
+    // %0 = %LHS.HI != 0 && %RHS.HI != 0
+    // %1 = { iNh, i1 } @umul.with.overflow.iNh(iNh %LHS.HI, iNh %RHS.LO)
+    // %2 = { iNh, i1 } @umul.with.overflow.iNh(iNh %RHS.HI, iNh %LHS.LO)
+    // %3 = mul nuw iN (%LHS.LOW as iN), (%RHS.LOW as iN)
+    // %4 = add iN (%1.0 as iN) << Nh, (%2.0 as iN) << Nh
+    // %5 = { iN, i1 } @uadd.with.overflow.iN( %4, %3 )
+    //
+    // %res = { %5.0, %0 || %1.1 || %2.1 || %5.1 }
     SDValue LHS = N->getOperand(0), RHS = N->getOperand(1);
-
-    SDValue MUL = DAG.getNode(ISD::MUL, dl, LHS.getValueType(), LHS, RHS);
-    SplitInteger(MUL, Lo, Hi);
-
-    // A divide for UMULO will be faster than a function call. Select to
-    // make sure we aren't using 0.
-    SDValue isZero = DAG.getSetCC(dl, getSetCCResultType(VT),
-                                  RHS, DAG.getConstant(0, dl, VT), ISD::SETEQ);
-    SDValue NotZero = DAG.getSelect(dl, VT, isZero,
-                                    DAG.getConstant(1, dl, VT), RHS);
-    SDValue DIV = DAG.getNode(ISD::UDIV, dl, VT, MUL, NotZero);
-    SDValue Overflow = DAG.getSetCC(dl, N->getValueType(1), DIV, LHS,
-                                    ISD::SETNE);
-    Overflow = DAG.getSelect(dl, N->getValueType(1), isZero,
-                             DAG.getConstant(0, dl, N->getValueType(1)),
-                             Overflow);
+    SDValue LHSHigh, LHSLow, RHSHigh, RHSLow;
+    SplitInteger(LHS, LHSLow, LHSHigh);
+    SplitInteger(RHS, RHSLow, RHSHigh);
+    EVT HalfVT = LHSLow.getValueType()
+      , BitVT = N->getValueType(1);
+    SDVTList VTHalfMulO = DAG.getVTList(HalfVT, BitVT);
+    SDVTList VTFullAddO = DAG.getVTList(VT, BitVT);
+
+    SDValue HalfZero = DAG.getConstant(0, dl, HalfVT);
+    SDValue Overflow = DAG.getNode(ISD::AND, dl, BitVT,
+      DAG.getSetCC(dl, BitVT, LHSHigh, HalfZero, ISD::SETNE),
+      DAG.getSetCC(dl, BitVT, RHSHigh, HalfZero, ISD::SETNE));
+
+    SDValue One = DAG.getNode(ISD::UMULO, dl, VTHalfMulO, LHSHigh, RHSLow);
+    Overflow = DAG.getNode(ISD::OR, dl, BitVT, Overflow, One.getValue(1));
+
+    SDValue Two = DAG.getNode(ISD::UMULO, dl, VTHalfMulO, RHSHigh, LHSLow);
+    Overflow = DAG.getNode(ISD::OR, dl, BitVT, Overflow, Two.getValue(1));
+
+    // Cannot use `UMUL_LOHI` directly, because some 32-bit targets (ARM) do not
+    // know how to expand `i64,i64 = umul_lohi a, b` and abort (why isn’t this
+    // operation recursively legalized?).
+    //
+    // Many backends understand this pattern and will convert into LOHI
+    // themselves, if applicable.
+    SDValue Three = DAG.getNode(ISD::MUL, dl, VT,
+      DAG.getNode(ISD::ZERO_EXTEND, dl, VT, LHSLow),
+      DAG.getNode(ISD::ZERO_EXTEND, dl, VT, RHSLow));
+
+    MVT ShiftAmountTy = TLI.getScalarShiftAmountTy(DAG.getDataLayout(), VT);
+    auto ShiftAmount = DAG.getConstant(One.getValueSizeInBits(), dl, ShiftAmountTy);
+    SDValue OneInHigh = DAG.getNode(ISD::SHL, dl, VT,
+      DAG.getNode(ISD::ANY_EXTEND, dl, VT, One.getValue(0)), ShiftAmount);
+    SDValue TwoInHigh = DAG.getNode(ISD::SHL, dl, VT,
+      DAG.getNode(ISD::ANY_EXTEND, dl, VT, Two.getValue(0)), ShiftAmount);
+    SDValue Four = DAG.getNode(ISD::ADD, dl, VT, OneInHigh, TwoInHigh);
+    SDValue Five = DAG.getNode(ISD::UADDO, dl, VTFullAddO, Three, Four);
+    Overflow = DAG.getNode(ISD::OR, dl, BitVT, Overflow, Five.getValue(1));
+    SplitInteger(Five, Lo, Hi);
     ReplaceValueWith(SDValue(N, 1), Overflow);
     return;
   }

   Type *RetTy = VT.getTypeForEVT(*DAG.getContext());
   EVT PtrVT = TLI.getPointerTy(DAG.getDataLayout());
   Type *PtrTy = PtrVT.getTypeForEVT(*DAG.getContext());

[864 lines not shown]

test/CodeGen/X86/muloti.ll

[51 lines not shown] ; CHECK: foo
   store i64 %b.coerce0, i64* %4
   %5 = getelementptr %0, %0* %3, i32 0, i32 1
   store i64 %b.coerce1, i64* %5
   %b = load i128, i128* %coerce1, align 16
   store i128 %b, i128* %b.addr, align 16
   %tmp = load i128, i128* %a.addr, align 16
   %tmp2 = load i128, i128* %b.addr, align 16
   %6 = call %1 @llvm.umul.with.overflow.i128(i128 %tmp, i128 %tmp2)
-; CHECK: cmov
-; CHECK: divti3
+; CHECK-NOT: divti3
   %7 = extractvalue %1 %6, 0
   %8 = extractvalue %1 %6, 1
   br i1 %8, label %overflow, label %nooverflow

 overflow: ; preds = %entry
   call void @llvm.trap()
   unreachable

[12 lines not shown]

test/CodeGen/X86/select.ll

[47 lines not shown]
	; GENERIC-NEXT: shll $3, %eax			; GENERIC-NEXT: shll $3, %eax
	; GENERIC-NEXT: cmpl $32768, %eax ## imm = 0x8000			; GENERIC-NEXT: cmpl $32768, %eax ## imm = 0x8000
	; GENERIC-NEXT: jge LBB1_1			; GENERIC-NEXT: jge LBB1_1
	; GENERIC-NEXT: ## %bb.2: ## %bb91			; GENERIC-NEXT: ## %bb.2: ## %bb91
	; GENERIC-NEXT: xorl %eax, %eax			; GENERIC-NEXT: xorl %eax, %eax
	; GENERIC-NEXT: popq %rcx			; GENERIC-NEXT: popq %rcx
	; GENERIC-NEXT: retq			; GENERIC-NEXT: retq
	; GENERIC-NEXT: LBB1_1: ## %bb90			; GENERIC-NEXT: LBB1_1: ## %bb90
				; GENERIC-NEXT: ud2
	;			;
	; ATOM-LABEL: test2:			; ATOM-LABEL: test2:
	; ATOM: ## %bb.0: ## %entry			; ATOM: ## %bb.0: ## %entry
	; ATOM-NEXT: pushq %rax			; ATOM-NEXT: pushq %rax
	; ATOM-NEXT: callq _return_false			; ATOM-NEXT: callq _return_false
	; ATOM-NEXT: xorl %ecx, %ecx			; ATOM-NEXT: xorl %ecx, %ecx
	; ATOM-NEXT: movl $-480, %edx ## imm = 0xFE20			; ATOM-NEXT: movl $-480, %edx ## imm = 0xFE20
	; ATOM-NEXT: testb $1, %al			; ATOM-NEXT: testb $1, %al
	; ATOM-NEXT: cmovnel %ecx, %edx			; ATOM-NEXT: cmovnel %ecx, %edx
	; ATOM-NEXT: shll $3, %edx			; ATOM-NEXT: shll $3, %edx
	; ATOM-NEXT: cmpl $32768, %edx ## imm = 0x8000			; ATOM-NEXT: cmpl $32768, %edx ## imm = 0x8000
	; ATOM-NEXT: jge LBB1_1			; ATOM-NEXT: jge LBB1_1
	; ATOM-NEXT: ## %bb.2: ## %bb91			; ATOM-NEXT: ## %bb.2: ## %bb91
	; ATOM-NEXT: xorl %eax, %eax			; ATOM-NEXT: xorl %eax, %eax
	; ATOM-NEXT: popq %rcx			; ATOM-NEXT: popq %rcx
	; ATOM-NEXT: retq			; ATOM-NEXT: retq
	; ATOM-NEXT: LBB1_1: ## %bb90			; ATOM-NEXT: LBB1_1: ## %bb90
				; ATOM-NEXT: ud2
	;			;
	; MCU-LABEL: test2:			; MCU-LABEL: test2:
	; MCU: # %bb.0: # %entry			; MCU: # %bb.0: # %entry
	; MCU-NEXT: calll return_false			; MCU-NEXT: calll return_false
	; MCU-NEXT: xorl %ecx, %ecx			; MCU-NEXT: xorl %ecx, %ecx
	; MCU-NEXT: testb $1, %al			; MCU-NEXT: testb $1, %al
	; MCU-NEXT: jne .LBB1_2			; MCU-NEXT: jne .LBB1_2
	; MCU-NEXT: # %bb.1: # %entry			; MCU-NEXT: # %bb.1: # %entry
[550 lines not shown]
	; MCU-NEXT: movl {{[0-9]+}}(%esp), %edx			; MCU-NEXT: movl {{[0-9]+}}(%esp), %edx
	; MCU-NEXT: .LBB13_2:			; MCU-NEXT: .LBB13_2:
	; MCU-NEXT: retl			; MCU-NEXT: retl
	%cmp = icmp ne i64 %x, 0			%cmp = icmp ne i64 %x, 0
	%cond = select i1 %cmp, i64 -1, i64 %y			%cond = select i1 %cmp, i64 -1, i64 %y
	ret i64 %cond			ret i64 %cond
	}			}


	declare noalias i8* @_Znam(i64) noredzone

	define noalias i8* @test12(i64 %count) nounwind ssp noredzone {
	; GENERIC-LABEL: test12:
	; GENERIC: ## %bb.0: ## %entry
	; GENERIC-NEXT: movl $4, %ecx
	; GENERIC-NEXT: movq %rdi, %rax
	; GENERIC-NEXT: mulq %rcx
	; GENERIC-NEXT: movq $-1, %rdi
	; GENERIC-NEXT: cmovnoq %rax, %rdi
	; GENERIC-NEXT: jmp __Znam ## TAILCALL
	;
	; ATOM-LABEL: test12:
	; ATOM: ## %bb.0: ## %entry
	; ATOM-NEXT: movq %rdi, %rax
	; ATOM-NEXT: movl $4, %ecx
	; ATOM-NEXT: movq $-1, %rdi
	; ATOM-NEXT: mulq %rcx
	; ATOM-NEXT: cmovnoq %rax, %rdi
	; ATOM-NEXT: jmp __Znam ## TAILCALL
	;
	; MCU-LABEL: test12:
	; MCU: # %bb.0: # %entry
	; MCU-NEXT: pushl %ebp
	; MCU-NEXT: pushl %ebx
	; MCU-NEXT: pushl %edi
	; MCU-NEXT: pushl %esi
	; MCU-NEXT: movl %edx, %ebx
	; MCU-NEXT: movl %eax, %ebp
	; MCU-NEXT: movl $4, %ecx
	; MCU-NEXT: mull %ecx
	; MCU-NEXT: movl %eax, %esi
	; MCU-NEXT: leal (%edx,%ebx,4), %edi
	; MCU-NEXT: movl %edi, %edx
	; MCU-NEXT: pushl $0
	; MCU-NEXT: pushl $4
	; MCU-NEXT: calll __udivdi3
	; MCU-NEXT: addl $8, %esp
	; MCU-NEXT: xorl %ebx, %edx
	; MCU-NEXT: xorl %ebp, %eax
	; MCU-NEXT: orl %edx, %eax
	; MCU-NEXT: movl $-1, %eax
	; MCU-NEXT: movl $-1, %edx
	; MCU-NEXT: jne .LBB14_2
	; MCU-NEXT: # %bb.1: # %entry
	; MCU-NEXT: movl %esi, %eax
	; MCU-NEXT: movl %edi, %edx
	; MCU-NEXT: .LBB14_2: # %entry
	; MCU-NEXT: popl %esi
	; MCU-NEXT: popl %edi
	; MCU-NEXT: popl %ebx
	; MCU-NEXT: popl %ebp
	; MCU-NEXT: jmp _Znam # TAILCALL
	entry:
	%A = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %count, i64 4)
	%B = extractvalue { i64, i1 } %A, 1
	%C = extractvalue { i64, i1 } %A, 0
	%D = select i1 %B, i64 -1, i64 %C
	%call = tail call noalias i8* @_Znam(i64 %D) nounwind noredzone
	ret i8* %call
	}

	declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64) nounwind readnone

	define i32 @test13(i32 %a, i32 %b) nounwind {			define i32 @test13(i32 %a, i32 %b) nounwind {
	; GENERIC-LABEL: test13:			; GENERIC-LABEL: test13:
	; GENERIC: ## %bb.0:			; GENERIC: ## %bb.0:
	; GENERIC-NEXT: cmpl %esi, %edi			; GENERIC-NEXT: cmpl %esi, %edi
	; GENERIC-NEXT: sbbl %eax, %eax			; GENERIC-NEXT: sbbl %eax, %eax
	; GENERIC-NEXT: retq			; GENERIC-NEXT: retq
	;			;
	; ATOM-LABEL: test13:			; ATOM-LABEL: test13:
[145 lines not shown]
	; ATOM-NEXT: movl %esi, %eax			; ATOM-NEXT: movl %esi, %eax
	; ATOM-NEXT: nop			; ATOM-NEXT: nop
	; ATOM-NEXT: nop			; ATOM-NEXT: nop
	; ATOM-NEXT: retq			; ATOM-NEXT: retq
	;			;
	; MCU-LABEL: test18:			; MCU-LABEL: test18:
	; MCU: # %bb.0:			; MCU: # %bb.0:
	; MCU-NEXT: cmpl $15, %eax			; MCU-NEXT: cmpl $15, %eax
	; MCU-NEXT: jl .LBB20_2			; MCU-NEXT: jl .LBB19_2
	; MCU-NEXT: # %bb.1:			; MCU-NEXT: # %bb.1:
	; MCU-NEXT: movl %ecx, %edx			; MCU-NEXT: movl %ecx, %edx
	; MCU-NEXT: .LBB20_2:			; MCU-NEXT: .LBB19_2:
	; MCU-NEXT: movl %edx, %eax			; MCU-NEXT: movl %edx, %eax
	; MCU-NEXT: retl			; MCU-NEXT: retl
	%cmp = icmp slt i32 %x, 15			%cmp = icmp slt i32 %x, 15
	%sel = select i1 %cmp, i8 %a, i8 %b			%sel = select i1 %cmp, i8 %a, i8 %b
	ret i8 %sel			ret i8 %sel
	}			}

	define i32 @trunc_select_miscompile(i32 %a, i1 zeroext %cc) {			define i32 @trunc_select_miscompile(i32 %a, i1 zeroext %cc) {
[20 lines not shown]
	define void @clamp_i8(i32 %src, i8* %dst) {			define void @clamp_i8(i32 %src, i8* %dst) {
	; GENERIC-LABEL: clamp_i8:			; GENERIC-LABEL: clamp_i8:
	; GENERIC: ## %bb.0:			; GENERIC: ## %bb.0:
	; GENERIC-NEXT: cmpl $127, %edi			; GENERIC-NEXT: cmpl $127, %edi
	; GENERIC-NEXT: movl $127, %eax			; GENERIC-NEXT: movl $127, %eax
	; GENERIC-NEXT: cmovlel %edi, %eax			; GENERIC-NEXT: cmovlel %edi, %eax
	; GENERIC-NEXT: cmpl $-128, %eax			; GENERIC-NEXT: cmpl $-128, %eax
	; GENERIC-NEXT: movb $-128, %cl			; GENERIC-NEXT: movb $-128, %cl
	; GENERIC-NEXT: jl LBB22_2			; GENERIC-NEXT: jl LBB21_2
	; GENERIC-NEXT: ## %bb.1:			; GENERIC-NEXT: ## %bb.1:
	; GENERIC-NEXT: movl %eax, %ecx			; GENERIC-NEXT: movl %eax, %ecx
	; GENERIC-NEXT: LBB22_2:			; GENERIC-NEXT: LBB21_2:
	; GENERIC-NEXT: movb %cl, (%rsi)			; GENERIC-NEXT: movb %cl, (%rsi)
	; GENERIC-NEXT: retq			; GENERIC-NEXT: retq
	;			;
	; ATOM-LABEL: clamp_i8:			; ATOM-LABEL: clamp_i8:
	; ATOM: ## %bb.0:			; ATOM: ## %bb.0:
	; ATOM-NEXT: cmpl $127, %edi			; ATOM-NEXT: cmpl $127, %edi
	; ATOM-NEXT: movl $127, %eax			; ATOM-NEXT: movl $127, %eax
	; ATOM-NEXT: movb $-128, %cl			; ATOM-NEXT: movb $-128, %cl
	; ATOM-NEXT: cmovlel %edi, %eax			; ATOM-NEXT: cmovlel %edi, %eax
	; ATOM-NEXT: cmpl $-128, %eax			; ATOM-NEXT: cmpl $-128, %eax
	; ATOM-NEXT: jl LBB22_2			; ATOM-NEXT: jl LBB21_2
	; ATOM-NEXT: ## %bb.1:			; ATOM-NEXT: ## %bb.1:
	; ATOM-NEXT: movl %eax, %ecx			; ATOM-NEXT: movl %eax, %ecx
	; ATOM-NEXT: LBB22_2:			; ATOM-NEXT: LBB21_2:
	; ATOM-NEXT: movb %cl, (%rsi)			; ATOM-NEXT: movb %cl, (%rsi)
	; ATOM-NEXT: retq			; ATOM-NEXT: retq
	;			;
	; MCU-LABEL: clamp_i8:			; MCU-LABEL: clamp_i8:
	; MCU: # %bb.0:			; MCU: # %bb.0:
	; MCU-NEXT: cmpl $127, %eax			; MCU-NEXT: cmpl $127, %eax
	; MCU-NEXT: movl $127, %ecx			; MCU-NEXT: movl $127, %ecx
	; MCU-NEXT: jg .LBB22_2			; MCU-NEXT: jg .LBB21_2
	; MCU-NEXT: # %bb.1:			; MCU-NEXT: # %bb.1:
	; MCU-NEXT: movl %eax, %ecx			; MCU-NEXT: movl %eax, %ecx
	; MCU-NEXT: .LBB22_2:			; MCU-NEXT: .LBB21_2:
	; MCU-NEXT: cmpl $-128, %ecx			; MCU-NEXT: cmpl $-128, %ecx
	; MCU-NEXT: movb $-128, %al			; MCU-NEXT: movb $-128, %al
	; MCU-NEXT: jl .LBB22_4			; MCU-NEXT: jl .LBB21_4
	; MCU-NEXT: # %bb.3:			; MCU-NEXT: # %bb.3:
	; MCU-NEXT: movl %ecx, %eax			; MCU-NEXT: movl %ecx, %eax
	; MCU-NEXT: .LBB22_4:			; MCU-NEXT: .LBB21_4:
	; MCU-NEXT: movb %al, (%edx)			; MCU-NEXT: movb %al, (%edx)
	; MCU-NEXT: retl			; MCU-NEXT: retl
	%cmp = icmp sgt i32 %src, 127			%cmp = icmp sgt i32 %src, 127
	%sel1 = select i1 %cmp, i32 127, i32 %src			%sel1 = select i1 %cmp, i32 127, i32 %src
	%cmp1 = icmp slt i32 %sel1, -128			%cmp1 = icmp slt i32 %sel1, -128
	%sel2 = select i1 %cmp1, i32 -128, i32 %sel1			%sel2 = select i1 %cmp1, i32 -128, i32 %sel1
	%conv = trunc i32 %sel2 to i8			%conv = trunc i32 %sel2 to i8
	store i8 %conv, i8* %dst, align 2			store i8 %conv, i8* %dst, align 2
[23 lines not shown]
	; ATOM-NEXT: cmovgel %eax, %ecx			; ATOM-NEXT: cmovgel %eax, %ecx
	; ATOM-NEXT: movw %cx, (%rsi)			; ATOM-NEXT: movw %cx, (%rsi)
	; ATOM-NEXT: retq			; ATOM-NEXT: retq
	;			;
	; MCU-LABEL: clamp:			; MCU-LABEL: clamp:
	; MCU: # %bb.0:			; MCU: # %bb.0:
	; MCU-NEXT: cmpl $32767, %eax # imm = 0x7FFF			; MCU-NEXT: cmpl $32767, %eax # imm = 0x7FFF
	; MCU-NEXT: movl $32767, %ecx # imm = 0x7FFF			; MCU-NEXT: movl $32767, %ecx # imm = 0x7FFF
	; MCU-NEXT: jg .LBB23_2			; MCU-NEXT: jg .LBB22_2
	; MCU-NEXT: # %bb.1:			; MCU-NEXT: # %bb.1:
	; MCU-NEXT: movl %eax, %ecx			; MCU-NEXT: movl %eax, %ecx
	; MCU-NEXT: .LBB23_2:			; MCU-NEXT: .LBB22_2:
	; MCU-NEXT: cmpl $-32768, %ecx # imm = 0x8000			; MCU-NEXT: cmpl $-32768, %ecx # imm = 0x8000
	; MCU-NEXT: movl $32768, %eax # imm = 0x8000			; MCU-NEXT: movl $32768, %eax # imm = 0x8000
	; MCU-NEXT: jl .LBB23_4			; MCU-NEXT: jl .LBB22_4
	; MCU-NEXT: # %bb.3:			; MCU-NEXT: # %bb.3:
	; MCU-NEXT: movl %ecx, %eax			; MCU-NEXT: movl %ecx, %eax
	; MCU-NEXT: .LBB23_4:			; MCU-NEXT: .LBB22_4:
	; MCU-NEXT: movw %ax, (%edx)			; MCU-NEXT: movw %ax, (%edx)
	; MCU-NEXT: retl			; MCU-NEXT: retl
	%cmp = icmp sgt i32 %src, 32767			%cmp = icmp sgt i32 %src, 32767
	%sel1 = select i1 %cmp, i32 32767, i32 %src			%sel1 = select i1 %cmp, i32 32767, i32 %src
	%cmp1 = icmp slt i32 %sel1, -32768			%cmp1 = icmp slt i32 %sel1, -32768
	%sel2 = select i1 %cmp1, i32 -32768, i32 %sel1			%sel2 = select i1 %cmp1, i32 -32768, i32 %sel1
	%conv = trunc i32 %sel2 to i16			%conv = trunc i32 %sel2 to i16
	store i16 %conv, i16* %dst, align 2			store i16 %conv, i16* %dst, align 2
	ret void			ret void
	}			}

	define void @test19() {			define void @test19() {
	; This is a massive reduction of an llvm-stress test case that generates			; This is a massive reduction of an llvm-stress test case that generates
	; interesting chains feeding setcc and eventually a f32 select operation. This			; interesting chains feeding setcc and eventually a f32 select operation. This
	; is intended to exercise the SELECT formation in the DAG combine simplifying			; is intended to exercise the SELECT formation in the DAG combine simplifying
	; a simplified select_cc node. If it it regresses and is no longer triggering			; a simplified select_cc node. If it it regresses and is no longer triggering
	; that code path, it can be deleted.			; that code path, it can be deleted.
	;			;
	; CHECK-LABEL: test19:			; CHECK-LABEL: test19:
	; CHECK: ## %bb.0: ## %BB			; CHECK: ## %bb.0: ## %BB
	; CHECK-NEXT: movl $-1, %eax			; CHECK-NEXT: movl $-1, %eax
	; CHECK-NEXT: movb $1, %cl			; CHECK-NEXT: movb $1, %cl
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB24_1: ## %CF			; CHECK-NEXT: LBB23_1: ## %CF
	; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: testb %cl, %cl			; CHECK-NEXT: testb %cl, %cl
	; CHECK-NEXT: jne LBB24_1			; CHECK-NEXT: jne LBB23_1
	; CHECK-NEXT: ## %bb.2: ## %CF250			; CHECK-NEXT: ## %bb.2: ## %CF250
	; CHECK-NEXT: ## in Loop: Header=BB24_1 Depth=1			; CHECK-NEXT: ## in Loop: Header=BB23_1 Depth=1
	; CHECK-NEXT: jne LBB24_1			; CHECK-NEXT: jne LBB23_1
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB24_3: ## %CF242			; CHECK-NEXT: LBB23_3: ## %CF242
	; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: cmpl %eax, %eax			; CHECK-NEXT: cmpl %eax, %eax
	; CHECK-NEXT: ucomiss %xmm0, %xmm0			; CHECK-NEXT: ucomiss %xmm0, %xmm0
	; CHECK-NEXT: jp LBB24_3			; CHECK-NEXT: jp LBB23_3
	; CHECK-NEXT: ## %bb.4: ## %CF244			; CHECK-NEXT: ## %bb.4: ## %CF244
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	;			;
	; MCU-LABEL: test19:			; MCU-LABEL: test19:
	; MCU: # %bb.0: # %BB			; MCU: # %bb.0: # %BB
	; MCU-NEXT: movl $-1, %ecx			; MCU-NEXT: movl $-1, %ecx
	; MCU-NEXT: movb $1, %al			; MCU-NEXT: movb $1, %al
	; MCU-NEXT: .p2align 4, 0x90			; MCU-NEXT: .p2align 4, 0x90
	; MCU-NEXT: .LBB24_1: # %CF			; MCU-NEXT: .LBB23_1: # %CF
	; MCU-NEXT: # =>This Inner Loop Header: Depth=1			; MCU-NEXT: # =>This Inner Loop Header: Depth=1
	; MCU-NEXT: testb %al, %al			; MCU-NEXT: testb %al, %al
	; MCU-NEXT: jne .LBB24_1			; MCU-NEXT: jne .LBB23_1
	; MCU-NEXT: # %bb.2: # %CF250			; MCU-NEXT: # %bb.2: # %CF250
	; MCU-NEXT: # in Loop: Header=BB24_1 Depth=1			; MCU-NEXT: # in Loop: Header=BB23_1 Depth=1
	; MCU-NEXT: jne .LBB24_1			; MCU-NEXT: jne .LBB23_1
	; MCU-NEXT: # %bb.3: # %CF242.preheader			; MCU-NEXT: # %bb.3: # %CF242.preheader
	; MCU-NEXT: fldz			; MCU-NEXT: fldz
	; MCU-NEXT: .p2align 4, 0x90			; MCU-NEXT: .p2align 4, 0x90
	; MCU-NEXT: .LBB24_4: # %CF242			; MCU-NEXT: .LBB23_4: # %CF242
	; MCU-NEXT: # =>This Inner Loop Header: Depth=1			; MCU-NEXT: # =>This Inner Loop Header: Depth=1
	; MCU-NEXT: cmpl %eax, %ecx			; MCU-NEXT: cmpl %eax, %ecx
	; MCU-NEXT: fucom %st(0)			; MCU-NEXT: fucom %st(0)
	; MCU-NEXT: fnstsw %ax			; MCU-NEXT: fnstsw %ax
	; MCU-NEXT: # kill: def $ah killed $ah killed $ax			; MCU-NEXT: # kill: def $ah killed $ah killed $ax
	; MCU-NEXT: sahf			; MCU-NEXT: sahf
	; MCU-NEXT: jp .LBB24_4			; MCU-NEXT: jp .LBB23_4
	; MCU-NEXT: # %bb.5: # %CF244			; MCU-NEXT: # %bb.5: # %CF244
	; MCU-NEXT: fstp %st(0)			; MCU-NEXT: fstp %st(0)
	; MCU-NEXT: retl			; MCU-NEXT: retl
	BB:			BB:
	br label %CF			br label %CF

	CF:			CF:
	%Cmp10 = icmp ule i8 undef, undef			%Cmp10 = icmp ule i8 undef, undef
	; (52 lines not shown)
	; CHECK-NEXT: testb $1, %sil			; CHECK-NEXT: testb $1, %sil
	; CHECK-NEXT: cmovel %edi, %eax			; CHECK-NEXT: cmovel %edi, %eax
	; CHECK-NEXT: ## kill: def $ax killed $ax killed $eax			; CHECK-NEXT: ## kill: def $ax killed $ax killed $eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	;			;
	; MCU-LABEL: select_xor_1b:			; MCU-LABEL: select_xor_1b:
	; MCU: # %bb.0: # %entry			; MCU: # %bb.0: # %entry
	; MCU-NEXT: testb $1, %dl			; MCU-NEXT: testb $1, %dl
	; MCU-NEXT: je .LBB26_2			; MCU-NEXT: je .LBB25_2
	; MCU-NEXT: # %bb.1:			; MCU-NEXT: # %bb.1:
	; MCU-NEXT: xorl $43, %eax			; MCU-NEXT: xorl $43, %eax
	; MCU-NEXT: .LBB26_2: # %entry			; MCU-NEXT: .LBB25_2: # %entry
	; MCU-NEXT: # kill: def $ax killed $ax killed $eax			; MCU-NEXT: # kill: def $ax killed $ax killed $eax
	; MCU-NEXT: retl			; MCU-NEXT: retl
	entry:			entry:
	%and = and i8 %cond, 1			%and = and i8 %cond, 1
	%cmp10 = icmp ne i8 %and, 1			%cmp10 = icmp ne i8 %and, 1
	%0 = xor i16 %A, 43			%0 = xor i16 %A, 43
	%1 = select i1 %cmp10, i16 %A, i16 %0			%1 = select i1 %cmp10, i16 %A, i16 %0
	ret i16 %1			ret i16 %1
	; (32 lines not shown)
	; CHECK-NEXT: testb $1, %dl			; CHECK-NEXT: testb $1, %dl
	; CHECK-NEXT: cmovel %edi, %esi			; CHECK-NEXT: cmovel %edi, %esi
	; CHECK-NEXT: movl %esi, %eax			; CHECK-NEXT: movl %esi, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	;			;
	; MCU-LABEL: select_xor_2b:			; MCU-LABEL: select_xor_2b:
	; MCU: # %bb.0: # %entry			; MCU: # %bb.0: # %entry
	; MCU-NEXT: testb $1, %cl			; MCU-NEXT: testb $1, %cl
	; MCU-NEXT: je .LBB28_2			; MCU-NEXT: je .LBB27_2
	; MCU-NEXT: # %bb.1:			; MCU-NEXT: # %bb.1:
	; MCU-NEXT: xorl %edx, %eax			; MCU-NEXT: xorl %edx, %eax
	; MCU-NEXT: .LBB28_2: # %entry			; MCU-NEXT: .LBB27_2: # %entry
	; MCU-NEXT: retl			; MCU-NEXT: retl
	entry:			entry:
	%and = and i8 %cond, 1			%and = and i8 %cond, 1
	%cmp10 = icmp ne i8 %and, 1			%cmp10 = icmp ne i8 %and, 1
	%0 = xor i32 %B, %A			%0 = xor i32 %B, %A
	%1 = select i1 %cmp10, i32 %A, i32 %0			%1 = select i1 %cmp10, i32 %A, i32 %0
	ret i32 %1			ret i32 %1
	}			}
	; (31 lines not shown)
	; CHECK-NEXT: testb $1, %dl			; CHECK-NEXT: testb $1, %dl
	; CHECK-NEXT: cmovel %edi, %esi			; CHECK-NEXT: cmovel %edi, %esi
	; CHECK-NEXT: movl %esi, %eax			; CHECK-NEXT: movl %esi, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	;			;
	; MCU-LABEL: select_or_b:			; MCU-LABEL: select_or_b:
	; MCU: # %bb.0: # %entry			; MCU: # %bb.0: # %entry
	; MCU-NEXT: testb $1, %cl			; MCU-NEXT: testb $1, %cl
	; MCU-NEXT: je .LBB30_2			; MCU-NEXT: je .LBB29_2
	; MCU-NEXT: # %bb.1:			; MCU-NEXT: # %bb.1:
	; MCU-NEXT: orl %edx, %eax			; MCU-NEXT: orl %edx, %eax
	; MCU-NEXT: .LBB30_2: # %entry			; MCU-NEXT: .LBB29_2: # %entry
	; MCU-NEXT: retl			; MCU-NEXT: retl
	entry:			entry:
	%and = and i8 %cond, 1			%and = and i8 %cond, 1
	%cmp10 = icmp ne i8 %and, 1			%cmp10 = icmp ne i8 %and, 1
	%0 = or i32 %B, %A			%0 = or i32 %B, %A
	%1 = select i1 %cmp10, i32 %A, i32 %0			%1 = select i1 %cmp10, i32 %A, i32 %0
	ret i32 %1			ret i32 %1
	}			}
	; (31 lines not shown)
	; CHECK-NEXT: testb $1, %dl			; CHECK-NEXT: testb $1, %dl
	; CHECK-NEXT: cmovel %edi, %esi			; CHECK-NEXT: cmovel %edi, %esi
	; CHECK-NEXT: movl %esi, %eax			; CHECK-NEXT: movl %esi, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	;			;
	; MCU-LABEL: select_or_1b:			; MCU-LABEL: select_or_1b:
	; MCU: # %bb.0: # %entry			; MCU: # %bb.0: # %entry
	; MCU-NEXT: testb $1, %cl			; MCU-NEXT: testb $1, %cl
	; MCU-NEXT: je .LBB32_2			; MCU-NEXT: je .LBB31_2
	; MCU-NEXT: # %bb.1:			; MCU-NEXT: # %bb.1:
	; MCU-NEXT: orl %edx, %eax			; MCU-NEXT: orl %edx, %eax
	; MCU-NEXT: .LBB32_2: # %entry			; MCU-NEXT: .LBB31_2: # %entry
	; MCU-NEXT: retl			; MCU-NEXT: retl
	entry:			entry:
	%and = and i32 %cond, 1			%and = and i32 %cond, 1
	%cmp10 = icmp ne i32 %and, 1			%cmp10 = icmp ne i32 %and, 1
	%0 = or i32 %B, %A			%0 = or i32 %B, %A
	%1 = select i1 %cmp10, i32 %A, i32 %0			%1 = select i1 %cmp10, i32 %A, i32 %0
	ret i32 %1			ret i32 %1
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

Improve the legalisation lowering of UMULO
ClosedPublic

Diff 159212

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

test/CodeGen/X86/muloti.ll

test/CodeGen/X86/select.ll
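
For readers who want to see the idea behind the patch outside of SelectionDAG, the following is a minimal sketch of an overflowing unsigned multiply built only from half-width operations, checked against a widening multiply. It is not the code from LegalizeIntegerTypes.cpp. The 32-bit width, the 16-bit halves, and every name in it (`umulo16`, `umulo32`, `HalfResult`, `MulResult`, `agrees_with_reference`) are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical result types for this sketch (not LLVM types).
struct HalfResult { uint16_t value; bool overflow; };
struct MulResult  { uint32_t value; bool overflow; };

// Half-width overflowing multiply: 16 x 16 -> 16 bits plus an overflow flag.
// Assumed to be cheap on the target; modelled here with a 32-bit product.
static HalfResult umulo16(uint16_t x, uint16_t y) {
  uint32_t wide = uint32_t(x) * uint32_t(y);
  return {uint16_t(wide), wide > 0xFFFFu};
}

// Full-width overflowing multiply expressed through half-width pieces.
// With a = aH:aL and b = bH:bL,
//   a * b = aH*bH * 2^32 + (aH*bL + bH*aL) * 2^16 + aL*bL.
static MulResult umulo32(uint32_t a, uint32_t b) {
  uint16_t aL = uint16_t(a), aH = uint16_t(a >> 16);
  uint16_t bL = uint16_t(b), bH = uint16_t(b >> 16);

  // If both high halves are non-zero the product is at least 2^32.
  bool bothHigh = (aH != 0) && (bH != 0);

  // Cross terms: each must fit in 16 bits because it is shifted left by 16.
  HalfResult one = umulo16(aH, bL);
  HalfResult two = umulo16(bH, aL);
  // Truncating the sum is safe: when bothHigh is false, one term is zero.
  uint16_t cross = uint16_t(one.value + two.value);

  // Widening multiply of the low halves; this cannot overflow 32 bits.
  uint32_t low = uint32_t(aL) * uint32_t(bL);
  uint32_t high = uint32_t(cross) << 16;
  uint32_t value = low + high;
  bool carry = value < low;  // unsigned wrap-around means a carry out

  return {value, bothHigh || one.overflow || two.overflow || carry};
}

// Reference check against an "obviously correct" widening multiplication.
static bool agrees_with_reference(uint32_t a, uint32_t b) {
  uint64_t wide = uint64_t(a) * uint64_t(b);
  MulResult r = umulo32(a, b);
  return r.value == uint32_t(wide) && r.overflow == ((wide >> 32) != 0);
}
```

Calling `agrees_with_reference` over a range of inputs mirrors, on a small scale, the exhaustive comparison against a widening multiplication that the summary describes. No division is involved at any point, which is the property the patch is after.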
