This is an archive of the discontinued LLVM Phabricator instance.

Differential D106058

[DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z))
ClosedPublic

Authored by RKSimon on Jul 15 2021, 5:05 AM.

Details

Reviewers

spatel
lebedev.ri
jholewinski
jlebar
craig.topper
tra

Commits

rG0aece73aba65: [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z))

Summary

Similar to the folds performed in InstCombinerImpl::foldSelectOpOp, this attempts to push a select further up to help merge a pair of binops.

I'm primarily interested in select(cond,add(x,y),add(x,z)) folds to help expose pointer math (see https://bugs.llvm.org/show_bug.cgi?id=51069 etc.) but I've tried to use the more generic isBinOp().
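The shape of the fold can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Expr`, `makeLeaf`, `makeBinop`, `makeSelect`, `toString` and the string opcodes are hypothetical names invented here, not LLVM's SelectionDAG API, and the one-use checks and value-type checks from the actual patch are left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node (hypothetical; NOT LLVM's SDNode).
struct Expr {
  std::string op;                 // "leaf", "select", or a binop name such as "add"
  std::string name;               // only meaningful for leaves
  std::shared_ptr<Expr> a, b, c;  // operands; select uses (cond, true, false)
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeLeaf(std::string n) {
  return std::make_shared<Expr>(Expr{"leaf", std::move(n), nullptr, nullptr, nullptr});
}
ExprPtr makeBinop(std::string op, ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{std::move(op), "", std::move(l), std::move(r), nullptr});
}
ExprPtr makeSelect(ExprPtr cond, ExprPtr t, ExprPtr f) {
  return std::make_shared<Expr>(Expr{"select", "", std::move(cond), std::move(t), std::move(f)});
}

std::string toString(const ExprPtr &e) {
  if (e->op == "leaf")
    return e->name;
  if (e->op == "select")
    return "select(" + toString(e->a) + "," + toString(e->b) + "," + toString(e->c) + ")";
  return e->op + "(" + toString(e->a) + "," + toString(e->b) + ")";
}

// Returns the folded tree, or nullptr when the pattern does not match.
// Operand identity is modelled by pointer equality of the shared nodes.
ExprPtr foldSelectOfBinops(const ExprPtr &sel) {
  if (sel->op != "select")
    return nullptr;
  const ExprPtr &t = sel->b;
  const ExprPtr &f = sel->c;
  bool isBinop = t->op != "leaf" && t->op != "select";
  if (!isBinop || t->op != f->op)
    return nullptr;
  // select(cond, binop(x, y), binop(z, y)) --> binop(select(cond, x, z), y)
  if (t->b == f->b)
    return makeBinop(t->op, makeSelect(sel->a, t->a, f->a), t->b);
  // select(cond, binop(x, y), binop(x, z)) --> binop(x, select(cond, y, z))
  if (t->a == f->a)
    return makeBinop(t->op, t->a, makeSelect(sel->a, t->b, f->b));
  return nullptr;
}
```

Calling `foldSelectOfBinops` on `select(c, add(x,y), add(x,z))` built from shared `x` yields a tree that prints as `add(x,select(c,y,z))`; mismatched opcodes return `nullptr`.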

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	2,600 ms	x64 debian > libarcher.critical::critical.c
	2,720 ms	x64 debian > libarcher.parallel::parallel-firstprivate.c
	2,790 ms	x64 debian > libarcher.races::critical-unrelated.c
	2,640 ms	x64 debian > libarcher.races::lock-nested-unrelated.c
	2,920 ms	x64 debian > libarcher.races::lock-unrelated.c
	(17 tests failed in total)

Event Timeline

RKSimon created this revision. Jul 15 2021, 5:05 AM

Herald added subscribers: ecnelises, pengfei, hiraditya. Jul 15 2021, 5:05 AM

RKSimon requested review of this revision. Jul 15 2021, 5:05 AM

Herald added a project: Restricted Project. Jul 15 2021, 5:05 AM

Harbormaster completed remote builds in B114207: Diff 358924. Jul 15 2021, 5:39 AM

Looks good to me.
I think NVPTX tests should just be adjusted beforehand.

llvm/test/CodeGen/NVPTX/fast-math.ll
147	I think in all of these you want to have different divisors.

This revision is now accepted and ready to land. Jul 15 2021, 6:15 AM

RKSimon mentioned this in rG3cc38703d5ab: [NVPTX] Tweak fast-math tests to avoid select(binop(x,y),binop(x,z)) fold. Jul 15 2021, 7:47 AM

rebase

RKSimon edited the summary of this revision. Jul 15 2021, 8:04 AM

This revision was landed with ongoing or failed builds. Jul 15 2021, 8:19 AM

Closed by commit rG0aece73aba65: [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z)) (authored by RKSimon).

This revision was automatically updated to reflect the committed changes.

RKSimon added a commit: rG0aece73aba65: [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z)).

Harbormaster completed remote builds in B114240: Diff 358968. Jul 15 2021, 8:58 AM

tra added inline comments. Jul 15 2021, 10:29 AM

llvm/test/CodeGen/NVPTX/fast-math.ll
150	This looks like it may be a performance regression. Judging by the name of the tests, we do expect to see reciprocal and multiplies, not `div`.

tra added inline comments. Jul 15 2021, 10:31 AM

llvm/test/CodeGen/NVPTX/fast-math.ll
150	Never mind. These changes are not in the final version of the patch.

lebedev.ri added inline comments. Jul 15 2021, 10:34 AM

llvm/test/CodeGen/NVPTX/fast-math.ll
150	Then you were already missing some kind of an inverse transform, because middle-end would have already done this fold: https://godbolt.org/z/cb38zEj9Y

RKSimon added inline comments. Jul 15 2021, 10:41 AM

llvm/test/CodeGen/NVPTX/fast-math.ll
150	I made a kludge fix in rG3cc38703d5ab05be0b01c31f829d19b47f183c5f so the tests still exercise the combineRepeatedFPDivisors code path, which appeared to be their purpose. But as @lebedev.ri said, we're unlikely to see the original test code patterns in real-world code, as InstCombine will have already broken the pattern.

tra added inline comments. Jul 15 2021, 11:02 AM

llvm/test/CodeGen/NVPTX/fast-math.ll
150	This version of the patch resulted in `div` showing up in the PTX. The godbolt example above looks good and still generates rcp/mul/mul, which is what we want. If the generated PTX changes to use `div` once godbolt picks up these changes, that would be a regression. Changing the test just hides the problem. It might've been better to disable the test with a comment that the test is still correct, but that there may be an issue in NVPTX which needs to be fixed. I'm not sure yet why/how your change affects NVPTX. I'll check what's going on.

RKSimon added inline comments. Jul 15 2021, 11:09 AM

llvm/test/CodeGen/NVPTX/fast-math.ll
150	This shows the llc output before/after opt -O3 https://godbolt.org/z/dz4PGvjEf

lebedev.ri added inline comments. Jul 15 2021, 11:12 AM

llvm/test/CodeGen/NVPTX/fast-math.ll
150	> The godbolt example above looks good and still generates rcp/mul/mul, which is what we want.
	No it doesn't: https://godbolt.org/z/P43Y5E76E

tra added inline comments. Jul 15 2021, 11:19 AM

llvm/test/CodeGen/NVPTX/fast-math.ll
150	Your original link does produce correct PTX without div for me. However, RKSimon's example shows that there's indeed something weird going on that prevents lowering to multiplication by reciprocal. To make it clear - I have no issue with this patch. It's apparently just exposed an issue in the NVPTX back-end. I'll deal with it.

Is it intentional that two fdiv arcp get folded into fdiv w/o arcp?

This is part of what made the difference for NVPTX tests.
Debug output from llc on IR here https://godbolt.org/z/zcrh5n8n8:

SelectionDAG has 21 nodes:
  t0: ch = EntryToken
    t14: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_3', undef:i32
  t15: f32 = extract_vector_elt t14, Constant:i32<0>
            t4: v1i8,ch = load<(dereferenceable invariant load (s8) from `i1 addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_0', undef:i32
          t5: i8 = extract_vector_elt t4, Constant:i32<0>
        t6: i1 = truncate t5
            t8: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_1', undef:i32
          t9: f32 = extract_vector_elt t8, Constant:i32<0>
        t16: f32 = fdiv arcp t9, t15
            t11: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_2', undef:i32
          t12: f32 = extract_vector_elt t11, Constant:i32<0>
        t17: f32 = fdiv arcp t12, t15
      t18: f32 = select t6, t16, t17
    t19: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t18
  t20: ch = NVPTXISD::RET_FLAG t19

Combining: t20: ch = NVPTXISD::RET_FLAG t19

Combining: t19: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t18

Combining: t18: f32 = select t6, t16, t17
Creating new node: t21: f32 = select t6, t9, t12
Creating new node: t22: f32 = fdiv t21, t15
 ... into: t22: f32 = fdiv t21, t15

Without arcp we have no choice now but to lower into a regular div instruction.

That said, even if we were to preserve arcp, we'd run into the second issue.
NVPTX itself does not know how to lower FDIV32rr_prec arcp correctly and lowers it as a regular div.

Previously two divs+select were combined into two multiplications by reciprocal and that was what made it look like we can lower div to multiplication by reciprocal.
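A small numeric sketch of the three shapes discussed here, assuming plain IEEE float arithmetic (no fast-math compiler flags); the function names are hypothetical. The select fold itself computes the same division as the selected arm, while the reciprocal form is the rewrite that `arcp` permits and may differ in the last bits. After the fold only one division by the divisor remains, so there is no longer a repeated divisor to share a reciprocal.

```cpp
#include <cassert>
#include <cmath>

// Original test shape: select(pred, a / d, b / d), two divisions by one divisor.
float selectOfDivs(bool pred, float a, float b, float d) {
  return pred ? a / d : b / d;
}

// Shape after the fold: a single division of the selected numerator.
float divOfSelect(bool pred, float a, float b, float d) {
  return (pred ? a : b) / d;
}

// Reciprocal shape the old tests expected (rcp/mul/mul/selp): one reciprocal
// shared by two multiplies. Only valid when reciprocal rewrites are allowed.
float selectOfRecipMuls(bool pred, float a, float b, float d) {
  float r = 1.0f / d;
  return pred ? a * r : b * r;
}
```

The first two functions are interchangeable for any inputs; the third trades a division for a multiply at the cost of a possible rounding difference.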

In D106058#2881782, @tra wrote:

Is it intentional that two fdiv arcp get folded into fdiv w/o arcp?

Hm, I guess we need to intersect the fast-math flags from both original instructions and apply the result to the new instruction.

OK, I'll look at adding intersected flag preservation tomorrow - cheers.
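The idea of intersecting flags can be sketched as a bitwise AND over flag sets. This is a minimal illustration only: the enum values, `FlagSet`, `intersectFlags` and `hasFlag` are hypothetical names chosen here and do not reflect how LLVM's SDNodeFlags is laid out or which call the follow-up commit uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits for illustration only.
enum FastMathFlag : std::uint8_t {
  NoNaNs = 1u << 0,
  NoInfs = 1u << 1,
  AllowReciprocal = 1u << 2,  // plays the role of `arcp`
  AllowReassoc = 1u << 3
};

using FlagSet = std::uint8_t;

// A flag may be kept on the merged binop only if both original binops had it.
constexpr FlagSet intersectFlags(FlagSet lhs, FlagSet rhs) {
  return static_cast<FlagSet>(lhs & rhs);
}

constexpr bool hasFlag(FlagSet set, FastMathFlag flag) {
  return (set & flag) != 0;
}
```

With both `fdiv` nodes carrying `AllowReciprocal`, the intersection keeps it, which matches the `fdiv arcp` + `fdiv arcp` case in the debug output above; a flag present on only one side is dropped.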

RKSimon mentioned this in rG1a6a8443c226: [DAG] Move select(cc, binop(), binop()) folds into DAGCombiner…. Jul 18 2021, 6:57 AM

RKSimon mentioned this in rGfcb710a7ad4f: [NVPTX] Add select(cc,binop(),binop()) fast-math tests. Jul 18 2021, 7:40 AM

RKSimon mentioned this in rGfd7a54c70908: [DAG] DAGCombiner::foldSelectOfBinops - propagate the common flags to the…. Jul 18 2021, 10:39 AM

@tra Common flag propagation should be working now - in rGfcb710a7ad4f I've added back the fast-math tests that I tweaked (I've kept the tweaked test variant as well) so it should show the missing NVPTX arcp lowering. Shout on this ticket if I've missed anything.

In D106058#2887076, @RKSimon wrote:

@tra Common flag propagation should be working now - in rGfcb710a7ad4f I've added back the fast-math tests that I tweaked (I've kept the tweaked test variant as well) so it should show the missing NVPTX arcp lowering. Shout on this ticket if I've missed anything.

Thank you. I've confirmed that arcp does get propagated now.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	24 lines
llvm/test/CodeGen/NVPTX/fast-math.ll	16 lines
llvm/test/CodeGen/X86/select.ll	47 lines

Diff 358924

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

       if (TLI.isOperationLegal(ISD::SELECT_CC, VT) ||
                                          N2, N0.getOperand(2));
         SelectNode->setFlags(Flags);
         return SelectNode;
       }

     return SimplifySelect(DL, N0, N1, N2);
   }

+  if (N1.getOpcode() == N2.getOpcode() && TLI.isBinOp(N1.getOpcode()) &&
+      N->isOnlyUserOf(N0.getNode()) && N->isOnlyUserOf(N1.getNode())) {
+    // Fold select(cond, binop(x, y), binop(z, y))
+    //  --> binop(select(cond, x, z), y)
+    if (N1.getOperand(1) == N2.getOperand(1)) {
+      SDValue NewSel =
+          DAG.getSelect(DL, VT, N0, N1.getOperand(0), N2.getOperand(0));
+      return DAG.getNode(N1.getOpcode(), DL, VT, NewSel, N1.getOperand(1));
+    }
+
+    // Fold select(cond, binop(x, y), binop(x, z))
+    //  --> binop(x, select(cond, y, z))
+    // Second op VT might be different (e.g. shift amount type)
+    if (N1.getOperand(0) == N2.getOperand(0) &&
+        VT == N1.getOperand(1).getValueType() &&
+        VT == N2.getOperand(1).getValueType()) {
+      SDValue NewSel =
+          DAG.getSelect(DL, VT, N0, N1.getOperand(1), N2.getOperand(1));
+      return DAG.getNode(N1.getOpcode(), DL, VT, N1.getOperand(0), NewSel);
+    }
+
+    // TODO: Handle isCommutativeBinOp as well ?
+  }
+
   return SDValue();
 }

 // This function assumes all the vselect's arguments are CONCAT_VECTOR
 // nodes and that the condition is a BV of ConstantSDNodes (or undefs).
 static SDValue ConvertSelectToConcatVector(SDNode *N, SelectionDAG &DAG) {
   SDLoc DL(N);
   SDValue Cond = N->getOperand(0);

llvm/test/CodeGen/NVPTX/fast-math.ll

 ; CHECK-LABEL: fcos_approx
 ; CHECK: cos.approx.f32
 define float @fcos_approx(float %a) #0 {
   %r = tail call float @llvm.cos.f32(float %a)
   ret float %r
 }

 ; CHECK-LABEL: repeated_div_recip_allowed
 define float @repeated_div_recip_allowed(i1 %pred, float %a, float %b, float %divisor) {
-; CHECK: rcp.rn.f32
-; CHECK: mul.rn.f32
-; CHECK: mul.rn.f32
 ; CHECK: selp.f32
+; CHECK: div.rn.f32
   %x = fdiv arcp float %a, %divisor
   %y = fdiv arcp float %b, %divisor
   %z = select i1 %pred, float %x, float %y
   ret float %z
 }

 ; CHECK-LABEL: repeated_div_recip_allowed_ftz
 define float @repeated_div_recip_allowed_ftz(i1 %pred, float %a, float %b, float %divisor) #1 {
-; CHECK: rcp.rn.ftz.f32
-; CHECK: mul.rn.ftz.f32
-; CHECK: mul.rn.ftz.f32
 ; CHECK: selp.f32
+; CHECK: div.rn.ftz.f32
   %x = fdiv arcp float %a, %divisor
   %y = fdiv arcp float %b, %divisor
   %z = select i1 %pred, float %x, float %y
   ret float %z
 }

 ; CHECK-LABEL: repeated_div_fast
 define float @repeated_div_fast(i1 %pred, float %a, float %b, float %divisor) #0 {
-; CHECK: rcp.approx.f32
-; CHECK: mul.f32
-; CHECK: mul.f32
 ; CHECK: selp.f32
+; CHECK: div.approx.f32
   %x = fdiv float %a, %divisor
   %y = fdiv float %b, %divisor
   %z = select i1 %pred, float %x, float %y
   ret float %z
 }

 ; CHECK-LABEL: repeated_div_fast_ftz
 define float @repeated_div_fast_ftz(i1 %pred, float %a, float %b, float %divisor) #0 #1 {
-; CHECK: rcp.approx.ftz.f32
-; CHECK: mul.ftz.f32
-; CHECK: mul.ftz.f32
 ; CHECK: selp.f32
+; CHECK: div.approx.ftz.f32
   %x = fdiv float %a, %divisor
   %y = fdiv float %b, %divisor
   %z = select i1 %pred, float %x, float %y
   ret float %z
 }

 attributes #0 = { "unsafe-fp-math" = "true" }
 attributes #1 = { "denormal-fp-math-f32" = "preserve-sign" }

llvm/test/CodeGen/X86/select.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-apple-darwin10 | FileCheck %s --check-prefix=CHECK --check-prefix=GENERIC
 ; RUN: llc < %s -mtriple=x86_64-apple-darwin10 -mcpu=atom | FileCheck %s --check-prefix=CHECK --check-prefix=ATOM
 ; RUN: llc < %s -mtriple=i386-apple-darwin10 -mcpu=athlon | FileCheck %s --check-prefix=ATHLON
 ; RUN: llc < %s -mtriple=i386-intel-elfiamcu | FileCheck %s --check-prefix=MCU

 ; PR5757
 %0 = type { i64, i32 }

 define i32 @test1(%0* %p, %0* %q, i1 %r) nounwind {
-; CHECK-LABEL: test1:
-; CHECK: ## %bb.0:
-; CHECK-NEXT: addq $8, %rdi
-; CHECK-NEXT: addq $8, %rsi
-; CHECK-NEXT: testb $1, %dl
-; CHECK-NEXT: cmovneq %rdi, %rsi
-; CHECK-NEXT: movl (%rsi), %eax
-; CHECK-NEXT: retq
+; GENERIC-LABEL: test1:
+; GENERIC: ## %bb.0:
+; GENERIC-NEXT: testb $1, %dl
+; GENERIC-NEXT: cmoveq %rsi, %rdi
+; GENERIC-NEXT: movl 8(%rdi), %eax
+; GENERIC-NEXT: retq
+;
+; ATOM-LABEL: test1:
+; ATOM: ## %bb.0:
+; ATOM-NEXT: testb $1, %dl
+; ATOM-NEXT: cmoveq %rsi, %rdi
+; ATOM-NEXT: movl 8(%rdi), %eax
+; ATOM-NEXT: nop
+; ATOM-NEXT: nop
+; ATOM-NEXT: retq
 ;
 ; ATHLON-LABEL: test1:
 ; ATHLON: ## %bb.0:
-; ATHLON-NEXT: movl {{[0-9]+}}(%esp), %eax
-; ATHLON-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; ATHLON-NEXT: addl $8, %ecx
-; ATHLON-NEXT: addl $8, %eax
 ; ATHLON-NEXT: testb $1, {{[0-9]+}}(%esp)
-; ATHLON-NEXT: cmovnel %ecx, %eax
-; ATHLON-NEXT: movl (%eax), %eax
+; ATHLON-NEXT: leal {{[0-9]+}}(%esp), %eax
+; ATHLON-NEXT: leal {{[0-9]+}}(%esp), %ecx
+; ATHLON-NEXT: cmovnel %eax, %ecx
+; ATHLON-NEXT: movl (%ecx), %eax
+; ATHLON-NEXT: movl 8(%eax), %eax
 ; ATHLON-NEXT: retl
 ;
 ; MCU-LABEL: test1:
 ; MCU: # %bb.0:
 ; MCU-NEXT: testb $1, %cl
-; MCU-NEXT: jne .LBB0_1
-; MCU-NEXT: # %bb.2:
-; MCU-NEXT: addl $8, %edx
-; MCU-NEXT: movl (%edx), %eax
-; MCU-NEXT: retl
-; MCU-NEXT: .LBB0_1:
-; MCU-NEXT: addl $8, %eax
-; MCU-NEXT: movl (%eax), %eax
+; MCU-NEXT: jne .LBB0_2
+; MCU-NEXT: # %bb.1:
+; MCU-NEXT: movl %edx, %eax
+; MCU-NEXT: .LBB0_2:
+; MCU-NEXT: movl 8(%eax), %eax
 ; MCU-NEXT: retl
   %t0 = load %0, %0* %p
   %t1 = load %0, %0* %q
   %t4 = select i1 %r, %0 %t0, %0 %t1
   %t5 = extractvalue %0 %t4, 1
   ret i32 %t5
 }
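The effect on test1 can be mirrored in source form. This is a hedged sketch, not taken from the patch: `Pair` stands in for `%0 = type { i64, i32 }` and the function names are hypothetical. The first function corresponds to the old output (offset both pointers, then select), the second to the new output (select the base pointer, then apply the offset once, as in `movl 8(%rdi), %eax`).

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the IR type %0 = type { i64, i32 }.
struct Pair {
  std::int64_t first;
  std::int32_t second;
};

// select(r, add(p, off), add(q, off)): compute both field addresses, then pick one.
std::int32_t selectOfOffsets(const Pair *p, const Pair *q, bool r) {
  const std::int32_t *pa = &p->second;
  const std::int32_t *qa = &q->second;
  return *(r ? pa : qa);
}

// add(select(r, p, q), off): pick the base pointer, then apply the offset once.
std::int32_t offsetOfSelect(const Pair *p, const Pair *q, bool r) {
  const Pair *base = r ? p : q;
  return base->second;
}
```

Both functions return the same value for any valid inputs; the second form needs only one address computation, which can then fold into the load's addressing mode.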