This is an archive of the discontinued LLVM Phabricator instance.


Differential D9992

Disable DAGCombine for -O0 and optnone
Needs Review · Public

Authored by iid_iunknown on May 25 2015, 8:00 AM.


Details

Reviewers

grosbach
delena
echristo

Summary

Yet another attempt to disable DAGCombine for -O0 and optnone for better debugging experience.

Some additional info on the subject can be found in D7181, D8614 and in PR22346.

The previous attempt was reverted because of failures in instruction selection, which depends on transforms performed from within DAGCombine. DAGCombine calls LegalizeOp, which performs transformations such as ConstantPool -> <TargetConstantPool + Wrapper>, among others. Without them, instruction selection breaks.

This patch disables the combines but leaves LegalizeOp enabled to satisfy instruction selection. The dependent tests were adjusted to reflect the code generated without combining.
It also includes a fix in ARMISelDAGToDAG.cpp that prevents llc from crashing when compiling the CodeGen/ARM/2010-05-18-LocalAllocCrash.ll test with DAGCombine disabled, as described in PR22346.
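The control flow described above can be modelled in a few lines. This is a self-contained C++17 sketch, not LLVM code: `Level`, `OptLevel`, `Node`, `RunStats` and `run` are hypothetical stand-ins for `CombineLevel`, `CodeGenOpt::Level`, `SDNode` and `DAGCombiner::Run`. It only illustrates the intended behaviour: at -O0 the worklist is still drained after DAG legalization so that legalization runs, while the combine step itself is skipped.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-ins for CombineLevel, CodeGenOpt::Level and SDNode.
enum class Level {
  BeforeLegalizeTypes,
  AfterLegalizeTypes,
  AfterLegalizeVectorOps,
  AfterLegalizeDAG
};
enum class OptLevel { None, Default };

struct Node {
  std::string Op;
  bool Legalized = false;
  bool Combined = false;
};

struct RunStats {
  int LegalizedCount = 0;
  int CombinedCount = 0;
  bool EarlyExit = false;
};

// Toy version of the patched DAGCombiner::Run control flow.
RunStats run(std::vector<Node> &Nodes, Level AtLevel, OptLevel OL) {
  RunStats S;
  const bool LegalizeOps = AtLevel == Level::AfterLegalizeDAG;
  const bool DoCombine = OL != OptLevel::None;

  // Neither legalization nor combining is needed: skip the worklist entirely.
  if (!DoCombine && !LegalizeOps) {
    S.EarlyExit = true;
    return S;
  }

  std::deque<Node *> Worklist;
  for (Node &N : Nodes)
    Worklist.push_back(&N);

  while (!Worklist.empty()) {
    Node *N = Worklist.front();
    Worklist.pop_front();

    // Legalization still runs at -O0 because instruction selection depends
    // on it (e.g. ConstantPool -> TargetConstantPool + Wrapper).
    if (LegalizeOps && !N->Legalized) {
      N->Legalized = true;
      ++S.LegalizedCount;
    }

    // Only the optimizing combine step is switched off.
    if (DoCombine && !N->Combined) {
      N->Combined = true;
      ++S.CombinedCount;
    }
  }
  return S;
}
```

The early return mirrors the `if (!DoCombine && !LegalizeOps) return;` check in the patch: before DAG legalization there is nothing to do at -O0, so the worklist is not even built.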

I might be missing some important points, so I would like the community to take a sharp look at this and raise any concerns or criticism about the patch.

*UPDATE*
The new version of the patch fixes broken FMA lowering on X86 (see the comment from Michael Kuperstein).

Previously, X86 FMA lowering was done in the combine step, so DAGCombine was required to produce the proper target-specific FMA nodes. The patch moves this logic to the Legalize step to make FMA lowering independent of DAGCombine. It also adds a new CodeGen/X86/fma-no-dag-combine.ll test.

This broke fma_patterns.ll and other tests that expected operations like a * b - c to be combined into (fma a, b, (fneg c)): the patch was producing vfmadd instead of the expected vfmsub. The reason is that FNEG lowering runs before the added FMA lowering. FNEG gets transformed into other nodes, which prevents FMA lowering from matching the vfmsub pattern. To overcome this, the patch skips FNEG lowering if the FNEG is an FMA operand (LowerFABSorFNEG in X86ISelLowering.cpp).
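The ordering problem can be illustrated with a toy expression tree. This C++17 sketch is hypothetical: `Expr`, `leaf`, `node`, `lowerFNeg` and `selectFMA` are illustrative names, not LLVM's SelectionDAG or the X86 lowering code. It assumes, as a simplification, that FNEG is lowered to an XOR with a sign-bit mask and that the FMA lowering only recognizes the subtracting form when it can still see an FNEG on the addend.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression tree; not LLVM's SelectionDAG.
struct Expr {
  std::string Op; // "fma", "fneg", "xor" or "leaf"
  std::shared_ptr<Expr> A, B, C;
};
using ExprP = std::shared_ptr<Expr>;

ExprP leaf() {
  return std::make_shared<Expr>(Expr{"leaf", nullptr, nullptr, nullptr});
}

ExprP node(std::string Op, ExprP A, ExprP B = nullptr, ExprP C = nullptr) {
  return std::make_shared<Expr>(
      Expr{std::move(Op), std::move(A), std::move(B), std::move(C)});
}

// Stand-in for FNEG lowering: the negation becomes an XOR with a sign-bit
// mask (the mask is modelled as a plain leaf).
ExprP lowerFNeg(const ExprP &E) {
  if (E && E->Op == "fneg")
    return node("xor", E->A, leaf());
  return E;
}

// Stand-in for FMA lowering: only fma(a, b, fneg c) is recognized as the
// subtracting form; anything else falls back to the adding form.
std::string selectFMA(const ExprP &E) {
  assert(E && E->Op == "fma");
  return (E->C && E->C->Op == "fneg") ? "vfmsub" : "vfmadd";
}
```

Matching the FMA first keeps the FNEG visible and yields vfmsub; lowering the FNEG first hides it behind an XOR, so only vfmadd matches, which is the regression described above.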


Event Timeline

iid_iunknown updated this revision to Diff 26436. · May 25 2015, 8:00 AM

iid_iunknown retitled this revision to Disable DAGCombine for -O0 and optnone.

iid_iunknown updated this object.

iid_iunknown edited the test plan for this revision.

iid_iunknown added reviewers: probinson, echristo, grosbach.

iid_iunknown set the repository for this revision to rL LLVM.

iid_iunknown added a subscriber: Unknown Object (MLST).

asl added a subscriber: asl. · May 25 2015, 8:14 AM

I had one inline comment about how to test for the correct conditions.

I think the updated 'dag-optnone.ll' already covers everything that's in 'fastmath-optnone.ll' so you don't need to add the latter as a new test.

I'm not qualified to review all the other test changes, someone else will need to do that.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 1204: It should be sufficient to test OptLevel here. OptimizeNone should have reset OptLevel for the current function, so you should not need to check the attribute directly.
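The reviewer's point, sketched as a hypothetical C++17 model: if the per-function optimization level is already forced to None for optnone functions before the combiner runs (in LLVM, SelectionDAGISel appears to do this through a small helper class, OptLevelChanger), then the combiner's condition reduces to a single OptLevel test. `CodeGenOpt`, `Function`, `effectiveOptLevel` and `doCombine` here are illustrative stand-ins only.

```cpp
#include <cassert>

// Hypothetical stand-ins; not LLVM's CodeGenOpt or Function classes.
enum class CodeGenOpt { None, Less, Default, Aggressive };

struct Function {
  bool HasOptNone;
};

// Per-function override assumed to run before SelectionDAG isel starts.
CodeGenOpt effectiveOptLevel(CodeGenOpt Global, const Function &F) {
  return F.HasOptNone ? CodeGenOpt::None : Global;
}

// Once the override is in place, the combiner only needs this check;
// testing the optnone attribute again would be redundant.
bool doCombine(CodeGenOpt OL) { return OL != CodeGenOpt::None; }
```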

In D9992#178684, @probinson wrote:

Thank you for your feedback, Paul!
I will update the patch according to your remarks. The new version will also contain a fix for the CodeGen/ARM/2010-05-18-LocalAllocCrash.ll test, which crashes without DAGCombine, as you noticed in PR22346. I somehow missed it when creating the patch file.

Corrections according to the remarks from Paul.
Fix for CodeGen/ARM/2010-05-18-LocalAllocCrash.ll crash (ARMISelDAGToDAG.cpp).

iid_iunknown updated this object. · May 27 2015, 2:10 PM

So, there's something that I'm fairly sure will break on x86, and it doesn't seem to be covered by a test. :-(
The x86 PerformFMACombine actually performs an essential part of isel - it lowers a target-independent ISD node into a target-dependent one.

Now, for cases where the FMA comes from an x86 intrinsic, it's not an issue, since we never get the target-independent ISD. For cases where the FMA itself is constructed by a DAGCombine (which is what fma_patterns tests), it's not an issue either, because the FMA never gets formed. But it will be hit when the FMA comes from a target-independent intrinsic.

TL;DR:
Try compiling this with -mattr=+fma -O0 with your patch, I expect it to fail:

declare <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float>  %b, <4 x float>  %c)

define <4 x float> @test_fma1(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2) {
  %res = call <4 x float> @llvm.fma.v4f32(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
  ret <4 x float> %res
}
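Why the test above is expected to fail can be shown with a hypothetical C++17 model (not X86 backend code): the selector only has a pattern for the target-specific node, and the only path from the generic FMA node to it runs through the combine. `selectNode`, `combine`, `compile` and the `VFMADD` string are illustrative stand-ins.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model: the selector only has a pattern for the
// target-specific node, not for the generic ISD::FMA node.
std::optional<std::string> selectNode(const std::string &Opc) {
  if (Opc == "X86ISD::FMADD")
    return std::string("VFMADD");
  return std::nullopt; // models a "Cannot select" failure
}

// Stand-in for the x86 FMA combine: the only place that turns the generic
// node into the target node.
std::string combine(const std::string &Opc) {
  return Opc == "ISD::FMA" ? "X86ISD::FMADD" : Opc;
}

std::optional<std::string> compile(const std::string &Opc, bool DoCombine) {
  const std::string N = DoCombine ? combine(Opc) : Opc;
  return selectNode(N);
}
```

A target-specific FMA intrinsic already arrives as the target node, so it is selectable either way; only the generic llvm.fma path loses its lowering when the combine is disabled.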

Wow... then it should be part of the Legalize step.

Thank you, Michael! You are right: your test fails with the patch.
The next version of the patch moves FMA lowering on X86 from the combine to the Legalize step, so it no longer depends on DAGCombine being available. It also adds the test you provided. I will update the review summary with a description of the changes. Would you be able to take a look, please?

Fix for broken FMA lowering on X86 according to the comment from Michael Kuperstein.

Herald added a subscriber: aemerson. · Jun 2 2015, 10:27 AM

compnerd added a subscriber: compnerd. · Jun 2 2015, 11:51 AM

I'm not sure it's a good idea to move the entire combine into lowering.
The FNEG treatment is an actual combine. And I'm not 100% sure it will work correctly now (e.g. in cases where there are multiple users for the FNEG).

It would probably be better to leave the combine as is, but add a trivial "safety-net" lowering for ISD::FMA -> X86ISD::FMADD.
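A hypothetical sketch of the suggested safety net: the existing combine is left untouched, and custom lowering maps exactly one generic node to its target node without inspecting operands. `Node` and `lowerOperation` are illustrative stand-ins; in the real backend this would presumably be a case in the X86 lowering hook that rebuilds the node with the same three operands.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an SDNode: an opcode plus operand names.
struct Node {
  std::string Opc;
  std::vector<std::string> Ops;
};

// One-node "safety net" lowering: rewrite generic FMA to the target node,
// keeping the operands as they are. No FNEG folding happens here; that
// stays in the combine.
Node lowerOperation(const Node &N) {
  if (N.Opc == "ISD::FMA")
    return Node{"X86ISD::FMADD", N.Ops};
  return N;
}
```

Because the lowering never looks through its operands, questions such as an FNEG with multiple users stay entirely within the combine.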

Adding Elena, who knows this code better than I do.

All lit tests should pass in -O0 mode. The code may not be optimal, but compilation should not fail and the generated code should be correct.

lib/Target/X86/X86ISelLowering.cpp
Line 17616 (on Diff #26984): You should not do the DAG combiner's work in lowering. Lowering should deal with a single SDNode. Please put the FMA combine back and just translate ISD::FMA to X86ISD::FMADD here.

In D9992#182775, @mkuper wrote:

I'm not sure it's a good idea to move the entire combine into lowering.
The FNEG treatment is an actual combine. And I'm not 100% sure it will work correctly now (e.g. in cases where there are multiple users for the FNEG).

It would probably be better to leave the combine as is, but add a trivial "safety-net" lowering for ISD::FMA -> X86ISD::FMADD.

Thanks Michael. I will look into this.

In D9992#182779, @delena wrote:

All lit tests should pass in -O0 mode. The code may not be optimal, but compilation should not fail and the generated code should be correct.

Not sure I get this point correctly.
Many tests currently run without -O0, and their CHECKs expect some optimizations to happen. This doesn't mean the tests won't compile with -O0; it means the CHECKs will have to be changed if -O0 is used.

The patch disabled DAGCombine for -O0. This broke some tests that run with -O0, because their CHECKs were written with DAGCombine in mind. I changed the CHECKs for the tests where the code generated with and without DAGCombine at -O0 differed only slightly. Several tests, however, showed substantial differences in the generated code and would have required a full redesign of their CHECKs. For those, I enabled DAGCombine instead by changing -O0 to -O1, which yields nearly the same code as the CHECKs expect. Please let me know if this is not the best way to handle such tests and we should instead stick to the first option (keep -O0 and redesign the CHECKs).

lib/Target/X86/X86ISelLowering.cpp
Line 17616 (on Diff #26984): Thank you for the clarification, Elena. I will submit a new patch shortly.

I'm just afraid that some tests will fail instruction selection if you run them without DAGCombine.
So you should run ALL tests with -O0 and make sure that they don't fail.

Elena

I think what Elena means is that it may be a good idea to check that the tests that currently run with -O3 don't *crash* (because of selection failures) when run with -O0.
They can, of course, fail due to unmatched CHECK lines.

probinson resigned from this revision. · Jan 19 2016, 2:39 PM

probinson removed a reviewer: probinson.

Revision Contents

Path                                          Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp      85 lines
test/CodeGen/AArch64/aarch64_f16_be.ll        10 lines
test/CodeGen/AArch64/and-mask-removal.ll      2 lines
test/CodeGen/ARM/Windows/alloca.ll            2 lines
test/CodeGen/ARM/alloc-no-stack-realign.ll    54 lines
test/CodeGen/ARM/big-endian-ret-f64.ll        4 lines
test/CodeGen/ARM/vst3.ll                      2 lines
test/CodeGen/X86/atomic16.ll                  20 lines
test/CodeGen/X86/atomic32.ll                  16 lines
test/CodeGen/X86/atomic6432.ll                12 lines
test/CodeGen/X86/dag-optnone.ll               45 lines
test/CodeGen/X86/fast-isel-gep.ll             3 lines
test/CodeGen/X86/fastmath-optnone.ll          35 lines (new file)
test/CodeGen/X86/inline-asm-tied.ll           5 lines
test/CodeGen/X86/musttail.ll                  6 lines
test/CodeGen/X86/switch.ll                    30 lines
test/CodeGen/X86/win32_sret.ll                14 lines
test/CodeGen/X86/win64_eh.ll                  14 lines
test/CodeGen/XCore/threads.ll                 4 lines
test/DebugInfo/ARM/line.test                  7 lines
test/DebugInfo/X86/op_deref.ll                4 lines
test/DebugInfo/X86/vla.ll                     4 lines
test/tools/llvm-symbolizer/ppc64.test         2 lines

Base revision: 237899 (fastmath-optnone.ll is a new file, revision 0).

Diff 26436

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ 1,192 unchanged lines not shown @@
 // Main DAG Combiner implementation
 //===----------------------------------------------------------------------===//

 void DAGCombiner::Run(CombineLevel AtLevel) {
   // set the instance variables, so that the various visit routines may use it.
   Level = AtLevel;
   LegalOperations = Level >= AfterLegalizeVectorOps;
   LegalTypes = Level >= AfterLegalizeTypes;
+  bool LegalizeOps = Level == AfterLegalizeDAG;
+  bool DoCombine = OptLevel != CodeGenOpt::None &&
+                   !DAG.getMachineFunction().getFunction()->hasFnAttribute(
+                       Attribute::OptimizeNone);
probinson (inline comment): It should be sufficient to test OptLevel here. OptimizeNone should have reset OptLevel for the current function, so you should not need to check the attribute directly.
+
+  if (!DoCombine && !LegalizeOps)
+    return;

   // Add all the dag nodes to the worklist.
   for (SelectionDAG::allnodes_iterator I = DAG.allnodes_begin(),
        E = DAG.allnodes_end(); I != E; ++I)
     AddToWorklist(I);

   // Create a dummy node (which is not added to allnodes), that adds a reference
   // to the root node, preventing it from being deleted, and tracking any
@@ 19 unchanged lines not shown @@ while (!WorklistMap.empty()) {
     // reduced number of uses, allowing other xforms.
     if (recursivelyDeleteUnusedNodes(N))
       continue;

     WorklistRemover DeadNodes(*this);

     // If this combine is running after legalizing the DAG, re-legalize any
     // nodes pulled off the worklist.
-    if (Level == AfterLegalizeDAG) {
+    if (LegalizeOps) {
       SmallSetVector<SDNode *, 16> UpdatedNodes;
       bool NIsValid = DAG.LegalizeOp(N, UpdatedNodes);

       for (SDNode *LN : UpdatedNodes) {
         AddToWorklist(LN);
         AddUsersToWorklist(LN);
       }
       if (!NIsValid)
         continue;
     }

+    if (DoCombine) {
       DEBUG(dbgs() << "\nCombining: "; N->dump(&DAG));

       // Add any operands of the new node which have not yet been combined to the
       // worklist as well. Because the worklist uniques things already, this
       // won't repeatedly process the same operand.
       CombinedNodes.insert(N);
       for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i)
         if (!CombinedNodes.count(N->getOperand(i).getNode()))
           AddToWorklist(N->getOperand(i).getNode());

       SDValue RV = combine(N);

       if (!RV.getNode())
         continue;

       ++NodesCombined;

       // If we get back the same node we passed in, rather than a new node or
       // zero, we know that the node must have defined multiple values and
       // CombineTo was used. Since CombineTo takes care of the worklist
       // mechanics for us, we have no work to do in this case.
       if (RV.getNode() == N)
         continue;

       assert(N->getOpcode() != ISD::DELETED_NODE &&
              RV.getNode()->getOpcode() != ISD::DELETED_NODE &&
              "Node was deleted but visit returned new node!");

       DEBUG(dbgs() << " ... into: ";
             RV.getNode()->dump(&DAG));

       // Transfer debug value.
       DAG.TransferDbgValues(SDValue(N, 0), RV);
       if (N->getNumValues() == RV.getNode()->getNumValues())
         DAG.ReplaceAllUsesWith(N, RV.getNode());
       else {
         assert(N->getValueType(0) == RV.getValueType() &&
                N->getNumValues() == 1 && "Type mismatch");
         SDValue OpV = RV;
         DAG.ReplaceAllUsesWith(N, &OpV);
       }

       // Push the new node and any users onto the worklist
       AddToWorklist(RV.getNode());
       AddUsersToWorklist(RV.getNode());
+    }

     // Finally, if the node is now dead, remove it from the graph. The node
     // may not be dead if the replacement process recursively simplified to
     // something else needing this node. This will also take care of adding any
     // operands which have lost a user to the worklist.
     recursivelyDeleteUnusedNodes(N);
   }
@@ 12,713 unchanged lines not shown @@

test/CodeGen/AArch64/aarch64_f16_be.ll

@@ 26 unchanged lines not shown @@ ; CHECK-BE: st1
   ret void
 }

 define void @test_bitcast_v8f16_to_fp128(<8 x half> %a) {
 ; CHECK-LABEL: test_bitcast_v8f16_to_fp128:
 ; CHECK-NOT: st1

 ; CHECK-BE-LABEL: test_bitcast_v8f16_to_fp128:
-; CHECK-BE: st1
+; CHECK-BE: rev64
+; CHECK-BE: ext
+; CHECK-BE: rev64
+; CHECK-BE: ext
+; CHECK-BE: str

   %x = alloca fp128, align 16
   %y = bitcast <8 x half> %a to fp128
   store fp128 %y, fp128* %x, align 16
   ret void
 }

 define void @test_bitcast_v4f16_to_v2f32(<4 x half> %a) {
@@ 9 unchanged lines not shown @@ ; CHECK-BE: st1
   ret void
 }

 define void @test_bitcast_v4f16_to_v1f64(<4 x half> %a) {
 ; CHECK-LABEL: test_bitcast_v4f16_to_v1f64:
 ; CHECK-NOT: st1

 ; CHECK-BE-LABEL: test_bitcast_v4f16_to_v1f64:
-; CHECK-BE: st1
+; CHECK-BE: rev64
+; CHECK-BE: rev64
+; CHECK-BE: str

   %x = alloca <1 x double>, align 8
   %y = bitcast <4 x half> %a to <1 x double>
   store <1 x double> %y, <1 x double>* %x, align 8
   ret void
 }

test/CodeGen/AArch64/and-mask-removal.ll

-; RUN: llc -O0 -fast-isel=false -mtriple=arm64-apple-darwin < %s | FileCheck %s
+; RUN: llc -O1 -fast-isel=false -mtriple=arm64-apple-darwin < %s | FileCheck %s

 @board = common global [400 x i8] zeroinitializer, align 1
 @next_string = common global i32 0, align 4
 @string_number = common global [400 x i32] zeroinitializer, align 4

 ; Function Attrs: nounwind ssp
 define void @new_position(i32 %pos) {
 entry:
@@ 260 unchanged lines not shown @@

test/CodeGen/ARM/Windows/alloca.ll

-; RUN: llc -O0 -mtriple thumbv7-windows-itanium -filetype asm -o - %s | FileCheck %s
+; RUN: llc -O1 -mtriple thumbv7-windows-itanium -filetype asm -o - %s | FileCheck %s

 declare arm_aapcs_vfpcc i32 @num_entries()

 define arm_aapcs_vfpcc void @test___builtin_alloca() {
 entry:
   %array = alloca i8*, align 4
   %call = call arm_aapcs_vfpcc i32 @num_entries()
   %mul = mul i32 4, %call
@@ 13 unchanged lines not shown @@

test/CodeGen/ARM/alloc-no-stack-realign.ll

-; RUN: llc < %s -mtriple=armv7-apple-ios -O0 | FileCheck %s -check-prefix=NO-REALIGN
-; RUN: llc < %s -mtriple=armv7-apple-ios -O0 | FileCheck %s -check-prefix=REALIGN
+; RUN: llc < %s -mtriple=armv7-apple-ios -O1 | FileCheck %s -check-prefix=NO-REALIGN
+; RUN: llc < %s -mtriple=armv7-apple-ios -O1 | FileCheck %s -check-prefix=REALIGN

 ; rdar://12713765
 ; When realign-stack is set to false, make sure we are not creating stack
 ; objects that are assumed to be 64-byte aligned.
 @T3_retval = common global <16 x float> zeroinitializer, align 16

 define void @test1(<16 x float>* noalias sret %agg.result) nounwind ssp "no-realign-stack" {
 entry:
 ; NO-REALIGN-LABEL: test1
 ; NO-REALIGN: mov r[[R2:[0-9]+]], r[[R1:[0-9]+]]
 ; NO-REALIGN: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
-; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32
-; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #48
+; NO-REALIGN: add r[[R9:[0-9]+]], r[[R1]], #32
+; NO-REALIGN: add r[[R1]], r[[R1]], #48
+; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R9]]:128]
 ; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]

-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1:[0-9]+]], #48
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]!
+; NO-REALIGN: add r[[R1:[0-9]+]], r[[R3:[0-9]+]], #48
 ; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; NO-REALIGN: add r[[R1:[0-9]+]], r[[R3:[0-9]+]], #32
+; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; NO-REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R3]]:128]!
+; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R3]]:128]

-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R0:0]], #48
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R0]], #32
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; NO-REALIGN: add r[[R1:[0-9]+]], r[[R0:0]], #48
+; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; NO-REALIGN: add r[[R1:[0-9]+]], r[[R0]], #32
+; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
 ; NO-REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]!
 ; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]

   %retval = alloca <16 x float>, align 16
   %0 = load <16 x float>, <16 x float>* @T3_retval, align 16
   store <16 x float> %0, <16 x float>* %retval
   %1 = load <16 x float>, <16 x float>* %retval
   store <16 x float> %1, <16 x float>* %agg.result, align 16
   ret void
 }

 define void @test2(<16 x float>* noalias sret %agg.result) nounwind ssp {
 entry:
 ; REALIGN-LABEL: test2
 ; REALIGN: bfc sp, #0, #6
 ; REALIGN: mov r[[R2:[0-9]+]], r[[R1:[0-9]+]]
 ; REALIGN: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
-; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32
-; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #48
+; REALIGN: add r[[R9:[0-9]+]], r[[R1]], #32
+; REALIGN: add r[[R1:[0-9]+]], r[[R1]], #48
+; REALIGN: mov r[[R3:[0-9]+]], sp
+; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R9]]:128]
 ; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]

-; REALIGN: orr r[[R2:[0-9]+]], r[[R1:[0-9]+]], #48
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: orr r[[R2:[0-9]+]], r[[R1]], #32
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: orr r[[R2:[0-9]+]], r[[R1]], #16
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; REALIGN: orr r[[R1:[0-9]+]], r[[R3:[0-9]+]], #48
+; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; REALIGN: orr r[[R1:[0-9]+]], r[[R3]], #32
+; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R3]]:128]
+; REALIGN: orr r[[R1:[0-9]+]], r[[R3]], #16
 ; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]

 ; REALIGN: add r[[R1:[0-9]+]], r[[R0:0]], #48
 ; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
 ; REALIGN: add r[[R1:[0-9]+]], r[[R0]], #32
 ; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
 ; REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]!
 ; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]
   %retval = alloca <16 x float>, align 16
   %0 = load <16 x float>, <16 x float>* @T3_retval, align 16
   store <16 x float> %0, <16 x float>* %retval
   %1 = load <16 x float>, <16 x float>* %retval
   store <16 x float> %1, <16 x float>* %agg.result, align 16
   ret void
 }

test/CodeGen/ARM/big-endian-ret-f64.ll

 ; RUN: llc -mtriple=armebv7a-eabi %s -O0 -o - | FileCheck %s
 ; RUN: llc -mtriple=armebv8a-eabi %s -O0 -o - | FileCheck %s

 define double @fn() {
 ; CHECK-LABEL: fn
-; CHECK: ldr r0, [sp]
-; CHECK: ldr r1, [sp, #4]
+; CHECK: vldr [[REG:d[0-9]+]], [sp]
+; CHECK: vmov r1, r0, [[REG]]
   %r = alloca double, align 8
   %1 = load double, double* %r, align 8
   ret double %1
 }

test/CodeGen/ARM/vst3.ll

-; RUN: llc -mtriple=arm-eabi -mattr=+neon -fast-isel=0 -O0 %s -o - | FileCheck %s
+; RUN: llc -mtriple=arm-eabi -mattr=+neon -fast-isel=0 -O1 %s -o - | FileCheck %s

 define void @vst3i8(i8* %A, <8 x i8>* %B) nounwind {
 ;CHECK-LABEL: vst3i8:
 ;Check the alignment value. Max for this instruction is 64 bits:
 ;This test runs at -O0 so do not check for specific register numbers.
 ;CHECK: vst3.8 {d{{.}}, d{{.}}, d{{.}}}, [r{{.}}:64]
   %tmp1 = load <8 x i8>, <8 x i8>* %B
   call void @llvm.arm.neon.vst3.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 32)
@@ 131 unchanged lines not shown @@

test/CodeGen/X86/atomic16.ll

@@ 64 unchanged lines not shown @@
 ; X64-LABEL: atomic_fetch_and16
 ; X32-LABEL: atomic_fetch_and16
   %t1 = atomicrmw and i16* @sc16, i16 3 acquire
 ; X64: lock
 ; X64: andw $3, {{.*}} # encoding: [0x66,0xf0
 ; X32: lock
 ; X32: andw $3
   %t2 = atomicrmw and i16* @sc16, i16 5 acquire
-; X64: andl
+; X64: andw
 ; X64: lock
 ; X64: cmpxchgw
-; X32: andl
+; X32: andw
 ; X32: lock
 ; X32: cmpxchgw
   %t3 = atomicrmw and i16* @sc16, i16 %t2 acquire
 ; X64: lock
 ; X64: andw {{.*}} # encoding: [0x66,0xf0
 ; X32: lock
 ; X32: andw
   ret void
 ; X64: ret
 ; X32: ret
 }

 define void @atomic_fetch_or16() nounwind {
 ; X64-LABEL: atomic_fetch_or16
 ; X32-LABEL: atomic_fetch_or16
   %t1 = atomicrmw or i16* @sc16, i16 3 acquire
 ; X64: lock
 ; X64: orw $3, {{.*}} # encoding: [0x66,0xf0
 ; X32: lock
 ; X32: orw $3
   %t2 = atomicrmw or i16* @sc16, i16 5 acquire
-; X64: orl
+; X64: orw
 ; X64: lock
 ; X64: cmpxchgw
-; X32: orl
+; X32: orw
 ; X32: lock
 ; X32: cmpxchgw
   %t3 = atomicrmw or i16* @sc16, i16 %t2 acquire
 ; X64: lock
 ; X64: orw {{.*}} # encoding: [0x66,0xf0
 ; X32: lock
 ; X32: orw
   ret void
 ; X64: ret
 ; X32: ret
 }

 define void @atomic_fetch_xor16() nounwind {
 ; X64-LABEL: atomic_fetch_xor16
 ; X32-LABEL: atomic_fetch_xor16
   %t1 = atomicrmw xor i16* @sc16, i16 3 acquire
 ; X64: lock
 ; X64: xorw $3, {{.*}} # encoding: [0x66,0xf0
 ; X32: lock
 ; X32: xorw $3
   %t2 = atomicrmw xor i16* @sc16, i16 5 acquire
-; X64: xorl
+; X64: xorw
 ; X64: lock
 ; X64: cmpxchgw
-; X32: xorl
+; X32: xorw
 ; X32: lock
 ; X32: cmpxchgw
   %t3 = atomicrmw xor i16* @sc16, i16 %t2 acquire
 ; X64: lock
 ; X64: xorw {{.*}} # encoding: [0x66,0xf0
 ; X32: lock
	; X32: xorw			; X32: xorw
	ret void			ret void
	; X64: ret			; X64: ret
	; X32: ret			; X32: ret
	}			}

	define void @atomic_fetch_nand16(i16 %x) nounwind {			define void @atomic_fetch_nand16(i16 %x) nounwind {
	; X64-LABEL: atomic_fetch_nand16			; X64-LABEL: atomic_fetch_nand16
	; X32-LABEL: atomic_fetch_nand16			; X32-LABEL: atomic_fetch_nand16
	%t1 = atomicrmw nand i16* @sc16, i16 %x acquire			%t1 = atomicrmw nand i16* @sc16, i16 %x acquire
	; X64: andl			; X64: andw
	; X64: notl			; X64: notw
	; X64: lock			; X64: lock
	; X64: cmpxchgw			; X64: cmpxchgw
	; X32: andl			; X32: andw
	; X32: notl			; X32: notw
	; X32: lock			; X32: lock
	; X32: cmpxchgw			; X32: cmpxchgw
	ret void			ret void
	; X64: ret			; X64: ret
	; X32: ret			; X32: ret
	}			}

	define void @atomic_fetch_max16(i16 %x) nounwind {			define void @atomic_fetch_max16(i16 %x) nounwind {

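The only change in atomic16.ll is the width of the logic op that feeds the cmpxchgw loop. The old checks expect 32-bit andl/orl/xorl (andl plus notl for nand), which suggests the combiner had promoted the i16 op; the new checks expect the 16-bit andw/orw/xorw/notw. The Python sketch below is a model only (registers are assumed to be plain integers masked to their width, and the helper names are hypothetical). It shows why either form is correct: every result bit of a bitwise op depends only on the same bit of the inputs, so truncating the 32-bit result yields exactly the 16-bit result.

```python
# Model only: registers are Python ints masked to their width (an assumption,
# not LLVM code). Compares the 16-bit and the widened 32-bit forms of the ops
# used inside the atomicrmw cmpxchgw loop in atomic16.ll.
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF

OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "nand": lambda a, b: ~(a & b),  # atomicrmw nand = not(and)
}


def op_w(kind, a, b):
    """16-bit form (andw/orw/xorw, andw+notw): only the low halves matter."""
    return OPS[kind](a & MASK16, b & MASK16) & MASK16


def op_l_truncated(kind, a, b):
    """Widened form (andl/orl/xorl, andl+notl) on full 32-bit registers,
    truncated to the 16 bits that cmpxchgw actually stores."""
    wide = OPS[kind](a & MASK32, b & MASK32) & MASK32
    return wide & MASK16


if __name__ == "__main__":
    # The upper register bits are deliberately non-zero to show that they
    # never leak into the low 16 bits.
    for kind in OPS:
        print(kind, hex(op_w(kind, 0xDEAD_0005, 3)),
              hex(op_l_truncated(kind, 0xDEAD_0005, 3)))
```
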
test/CodeGen/X86/atomic32.ll

; WITH-CMOV-LABEL: atomic_fetch_max32:

%t1 = atomicrmw max i32* @sc32, i32 %x acquire		%t1 = atomicrmw max i32* @sc32, i32 %x acquire
; WITH-CMOV: subl		; WITH-CMOV: subl
; WITH-CMOV: cmov		; WITH-CMOV: cmov
; WITH-CMOV: lock		; WITH-CMOV: lock
; WITH-CMOV: cmpxchgl		; WITH-CMOV: cmpxchgl

; NOCMOV: subl		; NOCMOV: subl
; NOCMOV: jge		; NOCMOV: setg [[REG:%[a-z]+]]
		; NOCMOV: testb $1, [[REG]]
		; NOCMOV: jne
; NOCMOV: lock		; NOCMOV: lock
; NOCMOV: cmpxchgl		; NOCMOV: cmpxchgl
ret void		ret void
; WITH-CMOV: ret		; WITH-CMOV: ret
; NOCMOV: ret		; NOCMOV: ret
}		}

define void @atomic_fetch_min32(i32 %x) nounwind {		define void @atomic_fetch_min32(i32 %x) nounwind {
; WITH-CMOV-LABEL: atomic_fetch_min32:		; WITH-CMOV-LABEL: atomic_fetch_min32:
; NOCMOV-LABEL: atomic_fetch_min32:		; NOCMOV-LABEL: atomic_fetch_min32:

%t1 = atomicrmw min i32* @sc32, i32 %x acquire		%t1 = atomicrmw min i32* @sc32, i32 %x acquire
; WITH-CMOV: subl		; WITH-CMOV: subl
; WITH-CMOV: cmov		; WITH-CMOV: cmov
; WITH-CMOV: lock		; WITH-CMOV: lock
; WITH-CMOV: cmpxchgl		; WITH-CMOV: cmpxchgl

; NOCMOV: subl		; NOCMOV: subl
; NOCMOV: jle		; NOCMOV: setle [[REG:%[a-z]+]]
		; NOCMOV: testb $1, [[REG]]
		; NOCMOV: jne
; NOCMOV: lock		; NOCMOV: lock
; NOCMOV: cmpxchgl		; NOCMOV: cmpxchgl
ret void		ret void
; WITH-CMOV: ret		; WITH-CMOV: ret
; NOCMOV: ret		; NOCMOV: ret
}		}

define void @atomic_fetch_umax32(i32 %x) nounwind {		define void @atomic_fetch_umax32(i32 %x) nounwind {
; WITH-CMOV-LABEL: atomic_fetch_umax32:		; WITH-CMOV-LABEL: atomic_fetch_umax32:
; NOCMOV-LABEL: atomic_fetch_umax32:		; NOCMOV-LABEL: atomic_fetch_umax32:

%t1 = atomicrmw umax i32* @sc32, i32 %x acquire		%t1 = atomicrmw umax i32* @sc32, i32 %x acquire
; WITH-CMOV: subl		; WITH-CMOV: subl
; WITH-CMOV: cmov		; WITH-CMOV: cmov
; WITH-CMOV: lock		; WITH-CMOV: lock
; WITH-CMOV: cmpxchgl		; WITH-CMOV: cmpxchgl

; NOCMOV: subl		; NOCMOV: subl
; NOCMOV: ja		; NOCMOV: seta [[REG:%[a-z]+]]
		; NOCMOV: testb $1, [[REG]]
		; NOCMOV: jne
; NOCMOV: lock		; NOCMOV: lock
; NOCMOV: cmpxchgl		; NOCMOV: cmpxchgl
ret void		ret void
; WITH-CMOV: ret		; WITH-CMOV: ret
; NOCMOV: ret		; NOCMOV: ret
}		}

define void @atomic_fetch_umin32(i32 %x) nounwind {		define void @atomic_fetch_umin32(i32 %x) nounwind {
; WITH-CMOV-LABEL: atomic_fetch_umin32:		; WITH-CMOV-LABEL: atomic_fetch_umin32:
; NOCMOV-LABEL: atomic_fetch_umin32:		; NOCMOV-LABEL: atomic_fetch_umin32:

%t1 = atomicrmw umin i32* @sc32, i32 %x acquire		%t1 = atomicrmw umin i32* @sc32, i32 %x acquire
; WITH-CMOV: subl		; WITH-CMOV: subl
; WITH-CMOV: cmov		; WITH-CMOV: cmov
; WITH-CMOV: lock		; WITH-CMOV: lock
; WITH-CMOV: cmpxchgl		; WITH-CMOV: cmpxchgl

; NOCMOV: subl		; NOCMOV: subl
; NOCMOV: jb		; NOCMOV: setbe [[REG:%[a-z]+]]
		; NOCMOV: testb $1, [[REG]]
		; NOCMOV: jne
; NOCMOV: lock		; NOCMOV: lock
; NOCMOV: cmpxchgl		; NOCMOV: cmpxchgl
ret void		ret void
; WITH-CMOV: ret		; WITH-CMOV: ret
; NOCMOV: ret		; NOCMOV: ret
}		}

define void @atomic_fetch_cmpxchg32() nounwind {		define void @atomic_fetch_cmpxchg32() nounwind {

test/CodeGen/X86/atomic6432.ll

	; X32: cmpxchg8b			; X32: cmpxchg8b
	ret void			ret void
	; X32: ret			; X32: ret
	}			}

	define void @atomic_fetch_sub64() nounwind {			define void @atomic_fetch_sub64() nounwind {
	; X32-LABEL: atomic_fetch_sub64:			; X32-LABEL: atomic_fetch_sub64:
	%t1 = atomicrmw sub i64* @sc64, i64 1 acquire			%t1 = atomicrmw sub i64* @sc64, i64 1 acquire
	; X32: addl $-1			; X32: subl $1
	; X32: adcl $-1			; X32: sbbl $0
	; X32: lock			; X32: lock
	; X32: cmpxchg8b			; X32: cmpxchg8b
	%t2 = atomicrmw sub i64* @sc64, i64 3 acquire			%t2 = atomicrmw sub i64* @sc64, i64 3 acquire
	; X32: addl $-3			; X32: subl $3
	; X32: adcl $-1			; X32: sbbl $0
	; X32: lock			; X32: lock
	; X32: cmpxchg8b			; X32: cmpxchg8b
	%t3 = atomicrmw sub i64* @sc64, i64 5 acquire			%t3 = atomicrmw sub i64* @sc64, i64 5 acquire
	; X32: addl $-5			; X32: subl $5
	; X32: adcl $-1			; X32: sbbl $0
	; X32: lock			; X32: lock
	; X32: cmpxchg8b			; X32: cmpxchg8b
	%t4 = atomicrmw sub i64* @sc64, i64 %t3 acquire			%t4 = atomicrmw sub i64* @sc64, i64 %t3 acquire
	; X32: subl			; X32: subl
	; X32: sbbl			; X32: sbbl
	; X32: lock			; X32: lock
	; X32: cmpxchg8b			; X32: cmpxchg8b
	ret void			ret void

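In atomic6432.ll a constant i64 subtraction on the 32-bit target used to be checked as addl $-C / adcl $-1 (the combined DAG adds the negated constant) and is now checked as subl $C / sbbl $0. The Python sketch below models both instruction pairs on 32-bit halves (an illustration under the assumption 0 < C < 2^32; the helper names are hypothetical) and compares them with a plain 64-bit subtraction. The carry out of the low-word add is exactly the inverse of the borrow out of the low-word subtract, so both pairs produce the same 64-bit result.

```python
# Model of the two 32-bit instruction pairs checked in atomic6432.ll for
# "atomicrmw sub i64 ..., C". Registers are Python ints masked to 32 bits;
# the helper names are illustrative, not LLVM APIs.
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sub_sbb(lo, hi, c):
    """subl $c, lo ; sbbl $0, hi (form checked without DAGCombine)."""
    borrow = 1 if lo < c else 0          # CF after subl
    new_lo = (lo - c) & MASK32
    new_hi = (hi - 0 - borrow) & MASK32  # sbbl $0 subtracts only the borrow
    return new_lo, new_hi


def add_adc(lo, hi, c):
    """addl $-c, lo ; adcl $-1, hi (form checked with DAGCombine).

    -c as a 64-bit value has low word (-c mod 2**32) and, for 0 < c < 2**32,
    high word 0xFFFFFFFF, which is the $-1 immediate of adcl."""
    neg = (-c) & MASK64
    neg_lo, neg_hi = neg & MASK32, neg >> 32
    total = lo + neg_lo
    carry = total >> 32                  # CF after addl
    new_lo = total & MASK32
    new_hi = (hi + neg_hi + carry) & MASK32
    return new_lo, new_hi


if __name__ == "__main__":
    lo, hi = 0x0000_0000, 0x0000_0001    # the 64-bit value 2**32
    # (4294967295, 0) twice: 2**32 - 1 split into (lo, hi)
    print(sub_sbb(lo, hi, 1), add_adc(lo, hi, 1))
```
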
test/CodeGen/X86/dag-optnone.ll

	; RUN: llc < %s -mtriple=x86_64-pc-win32 -O0 -mattr=+avx | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-pc-win32 -O0 -mattr=+avx | FileCheck %s --check-prefix=CHECK --check-prefix=OPT
				; RUN: llc < %s -mtriple=x86_64-pc-win32 -O1 -mattr=+avx | FileCheck %s --check-prefix=OPT

	; Background:			; Background:
	; If fast-isel bails out to normal selection, then the DAG combiner will run,			; If fast-isel bails out to normal selection, then the DAG combiner should not
	; even at -O0. In principle this should not happen (those are optimizations,			; run at -O0.
	; and we said -O0) but as a practical matter there are some instruction
	; selection patterns that depend on the legalizations and transforms that the
	; DAG combiner does.
	;			;
	; The 'optnone' attribute implicitly sets -O0 and fast-isel for the function.			; The 'optnone' attribute implicitly sets -O0 and fast-isel for the function.
	; The DAG combiner was disabled for 'optnone' (but not -O0) by r221168, then
	; re-enabled in r233153 because of problems with instruction selection patterns
	; mentioned above. (Note: because 'optnone' is supposed to match -O0, r221168
	; really should have disabled the combiner for both.)
	;			;
	; If instruction selection eventually becomes smart enough to run without DAG			; The test cases @foo[WithOptnone] prove that no DAG combine happens with
	; combiner, then the combiner can be turned off for -O0 (not just 'optnone')			; -O0 and with 'optnone' set. To prove this, we use a Windows triple to
	; and this test can go away. (To be replaced by a different test that verifies
	; the DAG combiner does not run at -O0 or for 'optnone' functions.)
	;
	; In the meantime, this test wants to make sure the combiner stays enabled for
	; 'optnone' functions, just as it is for -O0.


	; The test cases @foo[WithOptnone] prove that the same DAG combine happens
	; with -O0 and with 'optnone' set. To prove this, we use a Windows triple to
	; cause fast-isel to bail out (because something about the calling convention			; cause fast-isel to bail out (because something about the calling convention
	; is not handled in fast-isel). Then we have a repeated fadd that can be			; is not handled in fast-isel). Then we have a repeated fadd that can be
	; combined into an fmul. We show that this happens in both the non-optnone			; combined into an fmul. We show that this does not happen in either the
	; function and the optnone function.			; non-optnone function or the optnone function.

	define float @foo(float %x) #0 {			define float @foo(float %x) #0 {
	entry:			entry:
	%add = fadd fast float %x, %x			%add = fadd fast float %x, %x
	%add1 = fadd fast float %add, %x			%add1 = fadd fast float %add, %x
	ret float %add1			ret float %add1
	}			}

	; CHECK-LABEL: @foo			; CHECK-LABEL: @foo
	; CHECK-NOT: add			; CHECK-NOT: mul
	; CHECK: mul			; CHECK: add
				; CHECK: add
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	define float @fooWithOptnone(float %x) #1 {			define float @fooWithOptnone(float %x) #1 {
	entry:			entry:
	%add = fadd fast float %x, %x			%add = fadd fast float %x, %x
	%add1 = fadd fast float %add, %x			%add1 = fadd fast float %add, %x
	ret float %add1			ret float %add1
	}			}

	; CHECK-LABEL: @fooWithOptnone			; OPT-LABEL: @fooWithOptnone
	; CHECK-NOT: add			; OPT-NOT: mul
	; CHECK: mul			; OPT: add
	; CHECK-NEXT: ret			; OPT: add
				; OPT-NEXT: ret


	; The test case @bar is derived from an instruction selection failure case			; The test case @bar is derived from an instruction selection failure case
	; that was solved by r233153. It depends on -mattr=+avx.			; that was solved by r233153. It depends on -mattr=+avx.
	; Really all we're trying to prove is that it doesn't crash any more.			; Really all we're trying to prove is that it doesn't crash any more.

	@id84 = common global <16 x i32> zeroinitializer, align 64			@id84 = common global <16 x i32> zeroinitializer, align 64


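The fold that dag-optnone.ll and fastmath-optnone.ll look for is the rewrite of the fast-math chain (x + x) + x into a single multiply. That is why the old checks expect a mul and the new ones expect two adds. The toy Python rewriter below is hypothetical and is not how DAGCombiner is written. It only sketches the shape of that rewrite over tuple-shaped expressions. With this patch the corresponding step is simply not run at -O0 or for optnone functions, so the two fadds reach instruction selection unchanged.

```python
# Toy expression rewriter, for illustration only. A node is a leaf name (str),
# a constant (float), or a tuple (opcode, lhs, rhs).


def as_multiple(node):
    """Return (leaf, k) if node is leaf * k (a bare leaf counts as k = 1)."""
    if isinstance(node, str):
        return node, 1.0
    if isinstance(node, tuple):
        op, lhs, rhs = node
        if op == "fmul" and isinstance(lhs, str) and isinstance(rhs, float):
            return lhs, rhs
    return None


def combine(node, fast_math):
    """Bottom-up: fold fadd(a*k1, a*k2) into fmul(a, k1 + k2) if fast_math."""
    if not isinstance(node, tuple):
        return node
    op, lhs, rhs = node
    lhs, rhs = combine(lhs, fast_math), combine(rhs, fast_math)
    if op == "fadd" and fast_math:
        a, b = as_multiple(lhs), as_multiple(rhs)
        if a is not None and b is not None and a[0] == b[0]:
            return ("fmul", a[0], a[1] + b[1])
    return (op, lhs, rhs)


def evaluate(node, env):
    """Evaluate a node, taking leaf values from env."""
    if isinstance(node, float):
        return node
    if isinstance(node, str):
        return env[node]
    op, lhs, rhs = node
    a, b = evaluate(lhs, env), evaluate(rhs, env)
    return a + b if op == "fadd" else a * b


if __name__ == "__main__":
    foo = ("fadd", ("fadd", "x", "x"), "x")   # %add1 in @foo
    print(combine(foo, fast_math=True))       # ('fmul', 'x', 3.0)
```

A real combine has to respect the fast-math flags on each node; the toy collapses that into a single flag.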
test/CodeGen/X86/fast-isel-gep.ll

entry:
%tmp = load i64, i64* %x.addr ; <i64> [#uses=1]		%tmp = load i64, i64* %x.addr ; <i64> [#uses=1]
%add = add nsw i64 %tmp, 16 ; <i64> [#uses=1]		%add = add nsw i64 %tmp, 16 ; <i64> [#uses=1]
%tmp1 = load double*, double** %p.addr ; <double*> [#uses=1]		%tmp1 = load double*, double** %p.addr ; <double*> [#uses=1]
%arrayidx = getelementptr inbounds double, double* %tmp1, i64 %add ; <double*> [#uses=1]		%arrayidx = getelementptr inbounds double, double* %tmp1, i64 %add ; <double*> [#uses=1]
%tmp2 = load double, double* %arrayidx ; <double> [#uses=1]		%tmp2 = load double, double* %arrayidx ; <double> [#uses=1]
ret double %tmp2		ret double %tmp2

; X32-LABEL: test4:		; X32-LABEL: test4:
; X32: 128(%e{{.}},%e{{.}},8)		; X32: addl $16, [[REG:%e[a-z]+]]
		; X32: (%e{{.*}},[[REG]],8)
; X64-LABEL: test4:		; X64-LABEL: test4:
; X64: 128(%r{{.}},%r{{.}},8)		; X64: 128(%r{{.}},%r{{.}},8)
}		}

; PR8961 - Make sure the sext for the GEP addressing comes before the load that		; PR8961 - Make sure the sext for the GEP addressing comes before the load that
; is folded.		; is folded.
define i64 @test5(i8* %A, i32 %I, i64 %B) nounwind {		define i64 @test5(i8* %A, i32 %I, i64 %B) nounwind {
%v8 = getelementptr i8, i8* %A, i32 %I		%v8 = getelementptr i8, i8* %A, i32 %I

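For test4 the X32 checks lose the folded displacement. The old check expects (x + 16) * 8 folded into the memory operand as 128(%base,%index,8); the new one expects addl $16 on the index register first and a scaled index with no displacement. The Python model of x86 effective-address arithmetic below assumes 32-bit addresses that wrap modulo 2^32 and an index already truncated to 32 bits; the helper names are hypothetical. It shows that both forms address the same element.

```python
# Model of x86 effective-address arithmetic for test4 on the 32-bit target.
# Assumption: addresses wrap modulo 2**32; helper names are illustrative.
MASK32 = 0xFFFF_FFFF


def effective_address(base, index, scale, disp):
    """base + index*scale + disp, as computed by an x86 memory operand."""
    return (base + index * scale + disp) & MASK32


def folded(p, x):
    """128(%base,%index,8): the '+ 16' became a displacement of 16 * 8."""
    return effective_address(p, x, 8, 128)


def unfolded(p, x):
    """addl $16, REG ; (%base,REG,8): the add is done in a register first."""
    reg = (x + 16) & MASK32
    return effective_address(p, reg, 8, 0)


if __name__ == "__main__":
    p, x = 0x1000, 2
    print(hex(folded(p, x)), hex(unfolded(p, x)))  # 0x1090 0x1090
```
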
test/CodeGen/X86/fastmath-optnone.ll

				; RUN: llc < %s -mcpu=corei7 -march=x86-64 -mattr=+sse2 | FileCheck %s
				; Verify that floating-point operations inside 'optnone' functions
				; are not optimized even if unsafe-fp-math is set.

				define float @foo(float %x) #0 {
				entry:
				%add = fadd fast float %x, %x
				%add1 = fadd fast float %add, %x
				ret float %add1
				}

				; CHECK-LABEL: @foo
				; CHECK-NOT: add
				; CHECK: mul
				; CHECK-NOT: add
				; CHECK: ret

				define float @fooWithOptnone(float %x) #1 {
				entry:
				%add = fadd fast float %x, %x
				%add1 = fadd fast float %add, %x
				ret float %add1
				}

				; CHECK-LABEL: @fooWithOptnone
				; CHECK-NOT: mul
				; CHECK: add
				; CHECK-NOT: mul
				; CHECK: add
				; CHECK-NOT: mul
				; CHECK: ret


				attributes #0 = { "unsafe-fp-math"="true" }
				attributes #1 = { noinline optnone "unsafe-fp-math"="true" }

test/CodeGen/X86/inline-asm-tied.ll

	; RUN: llc < %s -mtriple=i386-apple-darwin9 -O0 -optimize-regalloc -regalloc=basic -no-integrated-as | FileCheck %s			; RUN: llc < %s -mtriple=i386-apple-darwin9 -O0 -optimize-regalloc -regalloc=basic -no-integrated-as | FileCheck %s
	; rdar://6992609			; rdar://6992609

	; CHECK: movl [[EDX:%e..]], 4(%esp)			; CHECK: movl [[REG1:%e..]], 4(%esp)
	; CHECK: movl [[EDX]], 4(%esp)			; CHECK: movl 4(%esp), [[REG2:%e..]]
				; CHECK: movl [[REG2]], 4(%esp)
	target triple = "i386-apple-darwin9.0"			target triple = "i386-apple-darwin9.0"
	@llvm.used = appending global [1 x i8*] [i8* bitcast (i64 (i64)* @_OSSwapInt64 to i8*)], section "llvm.metadata" ; <[1 x i8*]*> [#uses=0]			@llvm.used = appending global [1 x i8*] [i8* bitcast (i64 (i64)* @_OSSwapInt64 to i8*)], section "llvm.metadata" ; <[1 x i8*]*> [#uses=0]

	define i64 @_OSSwapInt64(i64 %_data) nounwind {			define i64 @_OSSwapInt64(i64 %_data) nounwind {
	entry:			entry:
	%retval = alloca i64 ; <i64*> [#uses=2]			%retval = alloca i64 ; <i64*> [#uses=2]
	%_data.addr = alloca i64 ; <i64*> [#uses=4]			%_data.addr = alloca i64 ; <i64*> [#uses=4]
	store i64 %_data, i64* %_data.addr			store i64 %_data, i64* %_data.addr

test/CodeGen/X86/musttail.ll

	declare void @capture(i8*)			declare void @capture(i8*)
	declare void @t3_callee(i32)			declare void @t3_callee(i32)

	; Test that we actually copy in and out stack arguments that aren't forwarded			; Test that we actually copy in and out stack arguments that aren't forwarded
	; without modification.			; without modification.
	define i32 @t4({}* %fn, i32 %n, i32 %r) {			define i32 @t4({}* %fn, i32 %n, i32 %r) {
	; CHECK-LABEL: t4:			; CHECK-LABEL: t4:
	; CHECK: incl %[[r:.*]]			; CHECK: incl %[[r:.*]]
	; CHECK: decl %[[n:.*]]			; CHECK: {{decl|subl}}
				; CHECK-SAME: %[[n:.*]]
	; CHECK: movl %[[r]], {{[0-9]+}}(%esp)			; CHECK: movl %[[r]], {{[0-9]+}}(%esp)
	; CHECK: movl %[[n]], {{[0-9]+}}(%esp)			; CHECK: movl %[[n]], {{[0-9]+}}(%esp)
	; CHECK: jmpl %{{.}}			; CHECK: jmpl %{{.}}

	entry:			entry:
	%r1 = add i32 %r, 1			%r1 = add i32 %r, 1
	%n1 = sub i32 %n, 1			%n1 = sub i32 %n, 1
	%fn_cast = bitcast {}* %fn to i32 ({}*, i32, i32)*			%fn_cast = bitcast {}* %fn to i32 ({}*, i32, i32)*
	%r2 = musttail call i32 %fn_cast({}* %fn, i32 %n1, i32 %r1)			%r2 = musttail call i32 %fn_cast({}* %fn, i32 %n1, i32 %r1)
	ret i32 %r2			ret i32 %r2
	}			}

	; Combine the complex stack frame with the parameter modification.			; Combine the complex stack frame with the parameter modification.
	define i32 @t5({}* %fn, i32 %n, i32 %r) alignstack(32) {			define i32 @t5({}* %fn, i32 %n, i32 %r) alignstack(32) {
	; CHECK-LABEL: t5:			; CHECK-LABEL: t5:
	; CHECK: pushl %ebp			; CHECK: pushl %ebp
	; CHECK: movl %esp, %ebp			; CHECK: movl %esp, %ebp
	; CHECK: pushl %esi			; CHECK: pushl %esi
	; Align the stack.			; Align the stack.
	; CHECK: andl $-32, %esp			; CHECK: andl $-32, %esp
	; CHECK: movl %esp, %esi			; CHECK: movl %esp, %esi
	; Modify the args.			; Modify the args.
	; CHECK: incl %[[r:.*]]			; CHECK: incl %[[r:.*]]
	; CHECK: decl %[[n:.*]]			; CHECK: {{decl|subl}}
				; CHECK-SAME: %[[n:.*]]
	; Store them through ebp, since that's the only stable arg pointer.			; Store them through ebp, since that's the only stable arg pointer.
	; CHECK: movl %[[r]], {{[0-9]+}}(%ebp)			; CHECK: movl %[[r]], {{[0-9]+}}(%ebp)
	; CHECK: movl %[[n]], {{[0-9]+}}(%ebp)			; CHECK: movl %[[n]], {{[0-9]+}}(%ebp)
	; Epilogue.			; Epilogue.
	; CHECK: leal {{[-0-9]+}}(%ebp), %esp			; CHECK: leal {{[-0-9]+}}(%ebp), %esp
	; CHECK: popl %esi			; CHECK: popl %esi
	; CHECK: popl %ebp			; CHECK: popl %ebp
	; CHECK: jmpl %{{.}}			; CHECK: jmpl %{{.}}

test/CodeGen/X86/switch.ll

	bb0: tail call void @g(i32 0) br label %return			bb0: tail call void @g(i32 0) br label %return
	bb1: tail call void @g(i32 1) br label %return			bb1: tail call void @g(i32 1) br label %return
	bb2: tail call void @g(i32 1) br label %return			bb2: tail call void @g(i32 1) br label %return
	return: ret void			return: ret void

	; Should be lowered as straight compares in -O0 mode.			; Should be lowered as straight compares in -O0 mode.
	; NOOPT-LABEL: basic			; NOOPT-LABEL: basic
	; NOOPT: subl $1, %eax			; NOOPT: subl $1, %eax
	; NOOPT: je			; NOOPT: sete [[R1:%.+]]
				; NOOPT: testb $1, [[R1]]
				; NOOPT: jne
	; NOOPT: subl $3, %eax			; NOOPT: subl $3, %eax
	; NOOPT: je			; NOOPT: sete [[R1]]
				; NOOPT: testb $1, [[R1]]
				; NOOPT: jne
	; NOOPT: subl $4, %eax			; NOOPT: subl $4, %eax
	; NOOPT: je			; NOOPT: sete [[R1]]
				; NOOPT: testb $1, [[R1]]
				; NOOPT: jne
	; NOOPT: subl $5, %eax			; NOOPT: subl $5, %eax
	; NOOPT: je			; NOOPT: sete [[R1]]
				; NOOPT: testb $1, [[R1]]
				; NOOPT: jne

	; Jump table otherwise.			; Jump table otherwise.
	; CHECK-LABEL: basic			; CHECK-LABEL: basic
	; CHECK: decl			; CHECK: decl
	; CHECK: cmpl $4			; CHECK: cmpl $4
	; CHECK: ja			; CHECK: ja
	; CHECK: jmpq *.LJTI			; CHECK: jmpq *.LJTI
	}			}
	; CHECK: leal -100			; CHECK: leal -100
	; CHECK: cmpl $4			; CHECK: cmpl $4
	; CHECK: jae			; CHECK: jae
	; CHECK: cmpl $3			; CHECK: cmpl $3
	; CHECK: ja			; CHECK: ja

	; We do this even at -O0, because it's cheap and makes codegen faster.			; We do this even at -O0, because it's cheap and makes codegen faster.
	; NOOPT-LABEL: simple_ranges			; NOOPT-LABEL: simple_ranges
	; NOOPT: subl $4			; NOOPT: subl $3
	; NOOPT: jb			; NOOPT: setbe [[R1:%.+]]
	; NOOPT: addl $-100			; NOOPT: testb $1, [[R1]]
	; NOOPT: subl $4			; NOOPT: jne
	; NOOPT: jb			; NOOPT: subl $100
				; NOOPT: subl $3
				; NOOPT: setbe [[R1]]
				; NOOPT: testb $1, [[R1]]
				; NOOPT: jne
	}			}


	define void @jt_is_better(i32 %x) {			define void @jt_is_better(i32 %x) {
	entry:			entry:
	switch i32 %x, label %return [			switch i32 %x, label %return [
	i32 0, label %bb0			i32 0, label %bb0
	i32 2, label %bb0			i32 2, label %bb0

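In simple_ranges the NOOPT range test changes from (x - lo) <u size (subl $4 / jb, and addl $-100 / subl $4 / jb) to (x - lo) <=u size - 1 (subl $3 / setbe, and subl $100 / subl $3 / setbe). The Python sketch below assumes the two ranges implied by those checks are [0, 4) and [100, 104), models 32-bit wraparound with a mask, and uses hypothetical helper names. It confirms that both unsigned comparisons accept exactly the signed i32 values inside the range, which is what makes a single-compare range check valid for either form.

```python
# Illustrative model of the range checks in switch.ll's simple_ranges.
# Values are 32-bit patterns; helper names are hypothetical.
MASK32 = 0xFFFF_FFFF


def in_range_ult(x, lo, size):
    """(x - lo) <u size, e.g. addl $-100 ; subl $4 ; jb (old NOOPT checks)."""
    return ((x - lo) & MASK32) < size


def in_range_ule(x, lo, size):
    """(x - lo) <=u size - 1, e.g. subl $100 ; subl $3 ; setbe (new checks)."""
    return ((x - lo) & MASK32) <= size - 1


def as_signed(x):
    """Reinterpret a 32-bit pattern as a signed i32."""
    return x - (1 << 32) if x & 0x8000_0000 else x


if __name__ == "__main__":
    for x in (0, 3, 4, 99, 100, 103, 104, (-1) & MASK32):
        print(as_signed(x), in_range_ult(x, 100, 4), in_range_ule(x, 100, 4))
```
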
test/CodeGen/X86/win32_sret.ll

; We specify -mcpu explicitly to avoid instruction reordering that happens on		; We specify -mcpu explicitly to avoid instruction reordering that happens on
; some setups (e.g., Atom) from affecting the output.		; some setups (e.g., Atom) from affecting the output.
; RUN: llc < %s -mcpu=core2 -mtriple=i686-pc-win32 | FileCheck %s -check-prefix=WIN32		; RUN: llc < %s -mcpu=core2 -mtriple=i686-pc-win32 | FileCheck %s -check-prefix=WIN32 -check-prefix=WIN32-NO-O0
; RUN: llc < %s -mcpu=core2 -mtriple=i686-pc-mingw32 | FileCheck %s -check-prefix=MINGW_X86		; RUN: llc < %s -mcpu=core2 -mtriple=i686-pc-mingw32 | FileCheck %s -check-prefix=MINGW_X86
; RUN: llc < %s -mcpu=core2 -mtriple=i686-pc-cygwin | FileCheck %s -check-prefix=CYGWIN		; RUN: llc < %s -mcpu=core2 -mtriple=i686-pc-cygwin | FileCheck %s -check-prefix=CYGWIN
; RUN: llc < %s -mcpu=core2 -mtriple=i386-pc-linux | FileCheck %s -check-prefix=LINUX		; RUN: llc < %s -mcpu=core2 -mtriple=i386-pc-linux | FileCheck %s -check-prefix=LINUX
; RUN: llc < %s -mcpu=core2 -O0 -mtriple=i686-pc-win32 | FileCheck %s -check-prefix=WIN32		; RUN: llc < %s -mcpu=core2 -O0 -mtriple=i686-pc-win32 | FileCheck %s -check-prefix=WIN32 -check-prefix=WIN32-O0
; RUN: llc < %s -mcpu=core2 -O0 -mtriple=i686-pc-mingw32 | FileCheck %s -check-prefix=MINGW_X86		; RUN: llc < %s -mcpu=core2 -O0 -mtriple=i686-pc-mingw32 | FileCheck %s -check-prefix=MINGW_X86
; RUN: llc < %s -mcpu=core2 -O0 -mtriple=i686-pc-cygwin | FileCheck %s -check-prefix=CYGWIN		; RUN: llc < %s -mcpu=core2 -O0 -mtriple=i686-pc-cygwin | FileCheck %s -check-prefix=CYGWIN
; RUN: llc < %s -mcpu=core2 -O0 -mtriple=i386-pc-linux | FileCheck %s -check-prefix=LINUX		; RUN: llc < %s -mcpu=core2 -O0 -mtriple=i386-pc-linux | FileCheck %s -check-prefix=LINUX

; The SysV ABI used by most Unixes and Mingw on x86 specifies that an sret pointer		; The SysV ABI used by most Unixes and Mingw on x86 specifies that an sret pointer
; is callee-cleanup. However, in MSVC's cdecl calling convention, sret pointer		; is callee-cleanup. However, in MSVC's cdecl calling convention, sret pointer
; arguments are caller-cleanup like normal arguments.		; arguments are caller-cleanup like normal arguments.

entry:
store i32 42, i32* %x, align 4		store i32 42, i32* %x, align 4
ret void		ret void
; WIN32-LABEL: {{^}}"?foo@C5@@QAE?AUS5@@XZ":		; WIN32-LABEL: {{^}}"?foo@C5@@QAE?AUS5@@XZ":
; MINGW_X86-LABEL: {{^}}"?foo@C5@@QAE?AUS5@@XZ":		; MINGW_X86-LABEL: {{^}}"?foo@C5@@QAE?AUS5@@XZ":
; CYGWIN-LABEL: {{^}}"?foo@C5@@QAE?AUS5@@XZ":		; CYGWIN-LABEL: {{^}}"?foo@C5@@QAE?AUS5@@XZ":
; LINUX-LABEL: {{^}}"?foo@C5@@QAE?AUS5@@XZ":		; LINUX-LABEL: {{^}}"?foo@C5@@QAE?AUS5@@XZ":

; The address of the return structure is passed as an implicit parameter.		; The address of the return structure is passed as an implicit parameter.
; In the -O0 build, %eax is spilled at the beginning of the function, hence we		; WIN32-NO-O0: {{[48]}}(%esp), %eax
; should match both 4(%esp) and 8(%esp).		; WIN32-O0: subl ${{[0-9]+}}, %esp
; WIN32: {{[48]}}(%esp), %eax		; WIN32-O0: {{[0-9]+}}(%esp), %eax
; WIN32: movl $42, (%eax)		; WIN32: movl $42, (%eax)
; WIN32: retl $4		; WIN32: retl $4
}		}

define void @call_foo5() {		define void @call_foo5() {
entry:		entry:
%c = alloca %class.C5, align 1		%c = alloca %class.C5, align 1
%s = alloca %struct.S5, align 4		%s = alloca %struct.S5, align 4
call x86_thiscallcc void @"\01?foo@C5@@QAE?AUS5@@XZ"(%struct.S5* sret %s, %class.C5* %c)		call x86_thiscallcc void @"\01?foo@C5@@QAE?AUS5@@XZ"(%struct.S5* sret %s, %class.C5* %c)
; WIN32-LABEL: {{^}}_call_foo5:		; WIN32-LABEL: {{^}}_call_foo5:

test/CodeGen/X86/win64_eh.ll

	; WIN64: .seh_stackalloc 8000			; WIN64: .seh_stackalloc 8000
	; WIN64: .seh_endprologue			; WIN64: .seh_endprologue
	; WIN64: addq $8000, %rsp			; WIN64: addq $8000, %rsp
	; WIN64: ret			; WIN64: ret
	; WIN64: .seh_endproc			; WIN64: .seh_endproc


	; Checks stack push			; Checks stack push
	define i32 @foo3(i32 %f_arg, i32 %e_arg, i32 %d_arg, i32 %c_arg, i32 %b_arg, i32 %a_arg) uwtable {			define i32 @foo3(i32 %g_arg, i32 %f_arg, i32 %e_arg, i32 %d_arg, i32 %c_arg, i32 %b_arg, i32 %a_arg) uwtable {
	entry:			entry:
	%a = alloca i32			%a = alloca i32
	%b = alloca i32			%b = alloca i32
	%c = alloca i32			%c = alloca i32
	%d = alloca i32			%d = alloca i32
	%e = alloca i32			%e = alloca i32
	%f = alloca i32			%f = alloca i32
				%g = alloca i32
	store i32 %a_arg, i32* %a			store i32 %a_arg, i32* %a
	store i32 %b_arg, i32* %b			store i32 %b_arg, i32* %b
	store i32 %c_arg, i32* %c			store i32 %c_arg, i32* %c
	store i32 %d_arg, i32* %d			store i32 %d_arg, i32* %d
	store i32 %e_arg, i32* %e			store i32 %e_arg, i32* %e
	store i32 %f_arg, i32* %f			store i32 %f_arg, i32* %f
				store i32 %g_arg, i32* %g
	%tmp = load i32, i32* %a			%tmp = load i32, i32* %a
	%tmp1 = mul i32 %tmp, 2			%tmp1 = mul i32 %tmp, 2
	%tmp2 = load i32, i32* %b			%tmp2 = load i32, i32* %b
	%tmp3 = mul i32 %tmp2, 3			%tmp3 = mul i32 %tmp2, 3
	%tmp4 = add i32 %tmp1, %tmp3			%tmp4 = add i32 %tmp1, %tmp3
	%tmp5 = load i32, i32* %c			%tmp5 = load i32, i32* %c
	%tmp6 = mul i32 %tmp5, 5			%tmp6 = mul i32 %tmp5, 5
	%tmp7 = add i32 %tmp4, %tmp6			%tmp7 = add i32 %tmp4, %tmp6
	%tmp8 = load i32, i32* %d			%tmp8 = load i32, i32* %d
	%tmp9 = mul i32 %tmp8, 7			%tmp9 = mul i32 %tmp8, 9
	%tmp10 = add i32 %tmp7, %tmp9			%tmp10 = add i32 %tmp7, %tmp9
	%tmp11 = load i32, i32* %e			%tmp11 = load i32, i32* %e
	%tmp12 = mul i32 %tmp11, 11			%tmp12 = mul i32 %tmp11, 11
	%tmp13 = add i32 %tmp10, %tmp12			%tmp13 = add i32 %tmp10, %tmp12
	%tmp14 = load i32, i32* %f			%tmp14 = load i32, i32* %f
	%tmp15 = mul i32 %tmp14, 13			%tmp15 = mul i32 %tmp14, 13
	%tmp16 = add i32 %tmp13, %tmp15			%tmp16 = add i32 %tmp13, %tmp15
	ret i32 %tmp16			ret i32 %tmp16
	}			}
	; WIN64-LABEL: foo3:			; WIN64-LABEL: foo3:
	; WIN64: .seh_proc foo3			; WIN64: .seh_proc foo3
	; WIN64: pushq %rsi			; WIN64: pushq %rsi
	; WIN64: .seh_pushreg 6			; WIN64: .seh_pushreg 6
	; NORM: subq $24, %rsp			; NORM: subq $32, %rsp
	; ATOM: leaq -24(%rsp), %rsp			; ATOM: leaq -32(%rsp), %rsp
	; WIN64: .seh_stackalloc 24			; WIN64: .seh_stackalloc 32
	; WIN64: .seh_endprologue			; WIN64: .seh_endprologue
	; WIN64: addq $24, %rsp			; WIN64: addq $32, %rsp
	; WIN64: popq %rsi			; WIN64: popq %rsi
	; WIN64: ret			; WIN64: ret
	; WIN64: .seh_endproc			; WIN64: .seh_endproc


	; Check emission of eh handler and handler data			; Check emission of eh handler and handler data
	declare i32 @_d_eh_personality(i32, i32, i64, i8*, i8*)			declare i32 @_d_eh_personality(i32, i32, i64, i8*, i8*)
	declare void @_d_eh_resume_unwind(i8*)			declare void @_d_eh_resume_unwind(i8*)

test/CodeGen/XCore/threads.ll

ConstantExpPhiNode:
br label %ConstantExpPhiNode		br label %ConstantExpPhiNode
exit:		exit:
ret void		ret void
}		}

define void @phiNode2( i1 %bool) {		define void @phiNode2( i1 %bool) {
; N.B. check an extra 'Node_crit_edge' (LBB12_1) is inserted		; N.B. check an extra 'Node_crit_edge' (LBB12_1) is inserted
; PHINODE-LABEL: phiNode2:		; PHINODE-LABEL: phiNode2:
; PHINODE: bf {{r[0-9]}}, .LBB12_3		; PHINODE: mkmsk [[MASK_REG:r[0-9]+]], 1
		; PHINODE: xor [[REG:r[0-9]+]], [[REG]], [[MASK_REG]]
		; PHINODE: bt [[REG]], .LBB12_3
; PHINODE: bu .LBB12_1		; PHINODE: bu .LBB12_1
; PHINODE-LABEL: .LBB12_1:		; PHINODE-LABEL: .LBB12_1:
; PHINODE: get r11, id		; PHINODE: get r11, id
; PHINODE-LABEL: .LBB12_2:		; PHINODE-LABEL: .LBB12_2:
; PHINODE: get r11, id		; PHINODE: get r11, id
; PHINODE: bu .LBB12_2		; PHINODE: bu .LBB12_2
; PHINODE-LABEL: .LBB12_3:		; PHINODE-LABEL: .LBB12_3:
entry:		entry:

test/DebugInfo/ARM/line.test

	; RUN: llc -mtriple=arm-none-linux -O0 -filetype=asm < %S/../Inputs/line.ll | FileCheck %S/../Inputs/line.ll

	; This is more complex than it looked. It's mixed up somewhere in SelectionDAG
	; (legalized as br_cc, losing the separation between the comparison and the
	; branch, then further lowered to CMPri + brcc but without the fidelity that
	; those two instructions are on separate lines)
	; XFAIL: *

test/DebugInfo/X86/op_deref.ll


	; CHECK-NOT: DW_TAG			; CHECK-NOT: DW_TAG
	; CHECK: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000067] = "vla")			; CHECK: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000067] = "vla")

	; Unfortunately llvm-dwarfdump can't unparse a list of DW_AT_locations			; Unfortunately llvm-dwarfdump can't unparse a list of DW_AT_locations
	; right now, so we check the asm output:			; right now, so we check the asm output:
	; RUN: llc -O0 -mtriple=x86_64-apple-darwin %s -o - -filetype=asm | FileCheck %s -check-prefix=ASM-CHECK			; RUN: llc -O0 -mtriple=x86_64-apple-darwin %s -o - -filetype=asm | FileCheck %s -check-prefix=ASM-CHECK
	; vla should have a register-indirect address at one point.			; vla should have a register-indirect address at one point.
	; ASM-CHECK: DEBUG_VALUE: vla <- RCX			; ASM-CHECK: DEBUG_VALUE: vla <- RDX
	; ASM-CHECK: DW_OP_breg2			; ASM-CHECK: DW_OP_breg1

	; RUN: llvm-as %s -o - | llvm-dis - | FileCheck %s --check-prefix=PRETTY-PRINT			; RUN: llvm-as %s -o - | llvm-dis - | FileCheck %s --check-prefix=PRETTY-PRINT
	; PRETTY-PRINT: DIExpression(DW_OP_deref, DW_OP_deref)			; PRETTY-PRINT: DIExpression(DW_OP_deref, DW_OP_deref)

	define void @testVLAwithSize(i32 %s) nounwind uwtable ssp {			define void @testVLAwithSize(i32 %s) nounwind uwtable ssp {
	entry:			entry:
	%s.addr = alloca i32, align 4			%s.addr = alloca i32, align 4
	%saved_stack = alloca i8*			%saved_stack = alloca i8*

test/DebugInfo/X86/vla.ll

	; RUN: llc -O0 -mtriple=x86_64-apple-darwin -filetype=asm %s -o - | FileCheck %s			; RUN: llc -O0 -mtriple=x86_64-apple-darwin -filetype=asm %s -o - | FileCheck %s
	; Ensure that we generate an indirect location for the variable length array a.			; Ensure that we generate an indirect location for the variable length array a.
	; CHECK: ##DEBUG_VALUE: vla:a <- RDX			; CHECK: ##DEBUG_VALUE: vla:a <- RSI
	; CHECK: DW_OP_breg1			; CHECK: DW_OP_breg4
	; rdar://problem/13658587			; rdar://problem/13658587
	;			;
	; generated from:			; generated from:
	;			;
	; int vla(int n) {			; int vla(int n) {
	; int a[n];			; int a[n];
	; a[0] = 42;			; a[0] = 42;
	; return a[n-1];			; return a[n-1];

test/tools/llvm-symbolizer/ppc64.test

	// ppc64 was compiled from this source on a big-endian 64-bit PowerPC box			// ppc64 was compiled from this source on a big-endian 64-bit PowerPC box
	// with just "clang -nostdlib":			// with just "clang -nostdlib":
	int foo() { return 0; }			int foo() { return 0; }
	int bar() { return foo(); }			int bar() { return foo(); }
	int _start() { return bar(); }			int _start() { return bar(); }

	RUN: %python -c "print('0x1000014c\n0x1000018c\n0x100001cc')" | llvm-symbolizer -obj=%p/Inputs/ppc64 | FileCheck %s			RUN: "%python" -c "print('0x1000014c\n0x1000018c\n0x100001cc')" | llvm-symbolizer -obj=%p/Inputs/ppc64 | FileCheck %s

	CHECK: foo			CHECK: foo
	CHECK: bar			CHECK: bar
	CHECK: _start			CHECK: _start

Diff 26436
