This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions.
ClosedPublic

Authored by chandlerc on Jul 22 2014, 3:45 AM.

Details

Reviewers

grosbach
arsenm
hfinkel

Commits

rG9a0051cd59d1: [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate…
rL213727: [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate

Summary

The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. This results in
subtle, frustrating churn in the particular order in which DAG combines
are applied which causes a number of minor regressions where we fail to
match a pattern previously matched by accident. AFAICT, all of these
should be using AddToWorklist directly or should be written in a less
brittle way. None of the changes seem drastically bad, and a few of the
changes seem distinctly better.
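
To make the worklist scheme described above concrete, here is a minimal standalone sketch. It is not the patch's code: `Node`, `Worklist`, and `rawSize` are hypothetical names, and it uses `std::vector` and `std::unordered_map` as stand-ins for LLVM's `SmallVector` and `DenseMap`. It only illustrates the two properties the summary relies on: re-adding a pending node does not grow the list, and removal nulls a slot instead of erasing it.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy stand-in for SDNode (hypothetical); only its address matters here.
struct Node {
  int id;
};

// Stack-like worklist without duplicates. Removed entries are nulled out
// rather than erased, so the indices stored in the map stay valid.
class Worklist {
  std::vector<Node *> order;                     // may contain nullptr holes
  std::unordered_map<Node *, std::size_t> index; // node -> slot in `order`

public:
  // Adds n only if it is not already pending. Returns true if it was added.
  bool add(Node *n) {
    if (!index.emplace(n, order.size()).second)
      return false; // already pending: the vector does not grow
    order.push_back(n);
    return true;
  }

  // Constant-time removal on average: null the slot, keep the vector intact.
  void remove(Node *n) {
    auto it = index.find(n);
    if (it == index.end())
      return; // not on the worklist
    order[it->second] = nullptr;
    index.erase(it);
  }

  bool empty() const { return index.empty(); }

  // Pops the most recently added live node, skipping nulled slots.
  Node *pop() {
    assert(!empty() && "pop() called on an empty worklist");
    Node *n = nullptr;
    do {
      n = order.back();
      order.pop_back();
    } while (!n);
    index.erase(n);
    return n;
  }

  // Number of slots in the backing vector, including nulled ones.
  std::size_t rawSize() const { return order.size(); }
};
```

Note that in this sketch re-adding a pending node leaves it where it is instead of moving it to the back, which is the kind of ordering change the summary blames for the test churn.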

A major change required to make this work is to significantly harden the
way in which the DAG combiner handles nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.
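
The dead-node handling described above can be sketched on a toy graph. This is an illustration under stated assumptions, not the routine from the patch: `Node`, `addOperand`, and `deleteUnused` are hypothetical names, use counts are tracked by hand, nodes are flagged as deleted rather than freed, and a plain vector named `frontier` stands in for the combiner worklist.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy DAG node (hypothetical): operands point "down", uses counts users.
struct Node {
  std::vector<Node *> operands;
  int uses = 0;
  bool deleted = false; // flagged instead of freed to keep the sketch simple
};

// Records that `user` uses `op`.
void addOperand(Node &user, Node &op) {
  user.operands.push_back(&op);
  ++op.uses;
}

// If n has no users, deletes it together with every operand that becomes
// dead as a result, and appends each surviving operand that lost a user to
// `frontier`. Returns false (and does nothing) if n still has users.
bool deleteUnused(Node *n, std::vector<Node *> &frontier) {
  assert(n && "expected a node");
  if (n->uses != 0)
    return false;

  std::vector<Node *> pending{n};
  while (!pending.empty()) {
    Node *cur = pending.back();
    pending.pop_back();
    if (cur->deleted)
      continue; // reached twice, e.g. via a repeated operand

    if (cur->uses == 0) {
      for (Node *op : cur->operands) {
        --op->uses;
        pending.push_back(op);
      }
      cur->operands.clear();
      cur->deleted = true;
    } else if (std::find(frontier.begin(), frontier.end(), cur) ==
               frontier.end()) {
      frontier.push_back(cur); // still live, but it lost a user
    }
  }

  // A node may have been queued as live and then died later in the walk.
  frontier.erase(std::remove_if(frontier.begin(), frontier.end(),
                                [](Node *x) { return x->deleted; }),
                 frontier.end());
  return true;
}
```

The point of the sketch is that deletion and frontier collection happen in one direct walk, without relying on the order in which a worklist would later revisit the operands.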

Note that I have *NO IDEA* what the changes on any architecture other than x86
really imply, please check these for your target! I just don't know how to
evaluate them. I've literally transcribed the test case changes necessary to
pass, but it may be more useful to patch in this change and compare A/B to
understand the differences for a particular test case.

I think the x86 changes are really minor and uninteresting, but the avx512 at
least is hiding a "regression" (but the test case is just noise, not testing
some performance invariant) that might be looked into. Not sure if any of the
others impact specific "important" code paths, but they didn't look terribly
interesting to me, or the changes were really minor.

However, maybe this entire approach is just deeply flawed? What do folks think,
is this worthwhile?

Diff Detail

Repository: rL LLVM

Event Timeline

chandlerc updated this revision to Diff 11743. Jul 22 2014, 3:45 AM

chandlerc retitled this revision from to [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions..

chandlerc updated this object.

chandlerc added reviewers: hfinkel, arsenm, grosbach.

chandlerc added a subscriber: Unknown Object (MLST).

Hi Chandler,

I've taken a look at the ARM changes, and I think they're mostly innocuous.

On the patch as a whole, I don't think relying on nodes being visited in a particular order for good codegen is a good idea anyway. It usually means you're only handling one particular edge-case of a more general construct. So I wouldn't worry too much about that myself. Still annoying for the person like you who comes along to try & make things better though.

I can't think of any problems with the new implementation, though I'm probably not the best person around for picking the right data structure. Hopefully someone else is better there.

Cheers.

Tim.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
101–107 ↗	(On Diff #11743)	This comment now seems out of date.
1096 ↗	(On Diff #11743)	Is this possible? Isn't that an immediate cycle?
test/CodeGen/ARM/sxt_rot.ll
12–13 ↗	(On Diff #11743)	This doesn't look ideal, but it's just a deficiency in the ARM patterns. I've got a fix ready to go, but I'll wait until this is in to avoid giving you conflicts to resolve or anything.

Chandler,

Overall, this seems quite reasonable. My only minor comment would be
that you should improve the naming and documentation of
recursivelyDeleteUnusedNodes. It is not currently clear from the
interface that items are added to the worklist at all, or that it is the
frontier of used nodes which are added. Given this seems to be
important from your description, it should be documented.

LGTM -- Mind you, I'm not particularly familiar with this code. So take
my LGTM with a grain or two of salt. :)

Philip

I just ran the test suite on PPC64/Linux with this patch, and it introduced no failures. ;)

Regarding the PowerPC test changes:

test/CodeGen/PowerPC/complex-return.ll looks like a CodeGen improvement (better store-to-load forwarding).

test/CodeGen/PowerPC/subsumes-pred-regs.ll is also a CodeGen improvement (we changed from comparing the value to 0 to comparing that it is not equal to 1 so that we can reuse the loaded '1' value later).

So I see only improvements here - LGTM.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1082 ↗	(On Diff #11743)	It seems like we now have a number of routines that do something like this. SDAG has an: void RemoveDeadNode(SDNode *N); which is also supposed to have behavior close to this (except that it does not update the DC worklist?). Maybe some refactoring could consolidate things?
1096 ↗	(On Diff #11743)	And if it is possible, do you need an isPredecessorOf check instead?

This revision is now accepted and ready to land. Jul 22 2014, 11:07 AM

Couple of comments inline.

test/CodeGen/X86/block-placement.ll
240 ↗	(On Diff #11743)	Inquiring minds here? It looks like it was deleted because it's no longer applicable? Does the code look ok?
test/CodeGen/X86/divide-by-constant.ll
60 ↗	(On Diff #11743)	Assume the additional check lines aren't because the code generated is actually worse?
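
For context on the divide-by-constant.ll question: in the diff, the CHECK lines for test3 (udiv i8 by 3) expect `imull $171` followed by `shrl $9`, and the updated version adds an `andl $65024` in between. The sketch below is a standalone arithmetic check, not part of the patch, and its function names are hypothetical. It checks that multiplying by 171 and shifting right by 9 matches division by 3 for every 8-bit input, and that the extra mask does not change the result.

```cpp
#include <cassert>
#include <cstdint>

// udiv i8 x, 3 as multiply-and-shift: imull $171 then shrl $9.
std::uint32_t divBy3MulShift(std::uint8_t x) {
  return (std::uint32_t{x} * 171u) >> 9;
}

// Same sequence with the additional andl $65024 (0xFE00) before the shift.
std::uint32_t divBy3Masked(std::uint8_t x) {
  std::uint32_t product = std::uint32_t{x} * 171u;
  // 255 * 171 = 43605, so the product always fits in 16 bits and the mask
  // only clears the low 9 bits that the shift discards anyway.
  assert((product >> 16) == 0);
  return (product & 65024u) >> 9;
}

// Returns true if both sequences equal x / 3 for every 8-bit input.
bool allInputsAgree() {
  for (unsigned v = 0; v <= 255; ++v) {
    auto x = static_cast<std::uint8_t>(v);
    if (divBy3MulShift(x) != v / 3u || divBy3Masked(x) != v / 3u)
      return false;
  }
  return true;
}
```

This only shows the two sequences compute the same value; it says nothing about which one schedules better on a given x86 core.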

arsenm added inline comments. Jul 22 2014, 2:52 PM

test/CodeGen/R600/r600-export-fix.ll
6 ↗	(On Diff #11743)	I'm pretty sure these are fine. I don't think the specific vector components matter, and Evergreen tests in general are overly sensitive to minor changes in scheduling

I've made the requested minor changes. Let me know if you'd like a fresh patch to review. See detailed replies to questions and comments below.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
101–107 ↗	(On Diff #11743)	Yea, updated comments here.
1082 ↗	(On Diff #11743)	The intent was for this to be a DAG-combiner specific deletion mechanism. Maybe it would be more clear as a method on DAGCombiner? I think I like that better. It also (IMO) removes some of the problems with clarifying that it manages the worklist.
1096 ↗	(On Diff #11743)	I saw crashes due to this at some point in my testing but I can't reproduce them. I'll remove it for now. Note that we don't really need to understand the nature of the cycle, just avoid creating an infinite loop in this function because we've already removed N from the set.
test/CodeGen/ARM/sxt_rot.ll
12–13 ↗	(On Diff #11743)	Yea, this seemed clearly like a regression. Thanks for prepping the fix.
test/CodeGen/X86/block-placement.ll
240 ↗	(On Diff #11743)	This test was trying to check for a very precise misbehavior when we formed a particularly convoluted loop structure (see the comment). After this patch, we don't form that loop structure. I have no way of re-reducing a test case that still does, and the value seems very low.
test/CodeGen/X86/divide-by-constant.ll
60 ↗	(On Diff #11743)	Correct. As it happens, this code is quite a bit better (fewer reg-reg dependencies).
test/CodeGen/X86/narrow-shl-load.ll
34–61 ↗	(On Diff #11743)	Note that the code for this test case is actually worse after my patch.

Before:

    test2:                  # @test2
    # BB#0:                 # %entry
        pushq   %rax
        movl    $127, 4(%rsp)
        movb    $0, 3(%rsp)
        movl    4(%rsp), %eax
        addl    %eax, %eax
        movsbl  %al, %eax
        sarl    %eax
        cmpl    $-1, %eax
        jne     .LBB1_2

After:

    test2:                  # @test2
    # BB#0:                 # %entry
        pushq   %rax
        movl    $127, 4(%rsp)
        movb    $0, 3(%rsp)
        movl    4(%rsp), %eax
        addl    %eax, %eax
        movsbl  %al, %eax
        shrl    %eax
        movzbl  %al, %eax
        cmpl    $255, %eax
        jne     .LBB1_2

I haven't tracked this down, but the test doesn't really give any information about what this was actually trying to test. The change it went in with added a single value type test to avoid one DAG combine. Without more context for what the desirable code was, it seemed a bad idea to keep the test at all. We probably should have something which recognizes that we could do "sarl, cmpl -1" rather than "shrl, movzbl, cmpl 255" (or equivalent other constants). But I'm not fussed about letting this regress minorly and letting someone who wants come along and clean it up later.
test/CodeGen/X86/store-narrow.ll
37 ↗	(On Diff #11743)	Note that this is probably a (very minor) regression as we don't actually create partial register stalls here.
test/CodeGen/X86/vec_extract-sse4.ll
7 ↗	(On Diff #11743)	This is also likely a minor regression as we fail to avoid the cross-register-bank copy, although in this case it seems likely to be quite minor.
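
As a sanity check on the narrow-shl-load.ll comment above, the two tails "sarl, cmpl -1" and "shrl, movzbl, cmpl 255" can be compared exhaustively over every byte that `movsbl %al, %eax` might sign-extend. The sketch below is a standalone model, not compiler code. The function names are hypothetical, it models only the instructions from `movsbl` onward, and it assumes the usual two's-complement wraparound when converting an out-of-range byte to `int8_t` (guaranteed since C++20, implementation-defined before).

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right by one, written so that it never shifts a negative
// value (the result of that was implementation-defined before C++20).
std::int32_t sar1(std::int32_t x) { return x < 0 ? ~((~x) >> 1) : (x >> 1); }

// "Before" tail: movsbl %al, %eax; sarl %eax; cmpl $-1, %eax.
bool beforeEqual(std::uint8_t al) {
  std::int32_t eax = static_cast<std::int8_t>(al); // movsbl
  return sar1(eax) == -1;
}

// "After" tail: movsbl %al, %eax; shrl %eax; movzbl %al, %eax; cmpl $255, %eax.
bool afterEqual(std::uint8_t al) {
  std::int32_t extended = static_cast<std::int8_t>(al); // movsbl
  std::uint32_t eax = static_cast<std::uint32_t>(extended);
  eax >>= 1;    // shrl: logical shift
  eax &= 0xFFu; // movzbl: keep the low byte
  return eax == 255u;
}

// Returns true if both tails take the same branch for every byte value.
bool sequencesAgree() {
  for (unsigned v = 0; v <= 255; ++v) {
    auto al = static_cast<std::uint8_t>(v);
    if (beforeEqual(al) != afterEqual(al))
      return false;
  }
  return true;
}
```

Under this model both comparisons succeed only for the bytes 0xFE and 0xFF, which is consistent with the comment's point that the shorter "sarl, cmpl -1" form would be an acceptable replacement at this branch.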

I think it's fine to hit the regressions later if/when they show up.

This LGTM w/ the recent updates. I second Tim's comment about node ordering assumptions. That's bad news anyway.

Closed by commit rL213727 (authored by @chandlerc).

Revision Contents

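Path                                                    Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp     123 lines
llvm/trunk/test/CodeGen/ARM/fold-stack-adjust.ll        4 lines
llvm/trunk/test/CodeGen/ARM/sxt_rot.ll                  3 lines
llvm/trunk/test/CodeGen/PowerPC/complex-return.ll       4 lines
llvm/trunk/test/CodeGen/PowerPC/subsumes-pred-regs.ll   2 lines
llvm/trunk/test/CodeGen/R600/r600-export-fix.ll         4 lines
llvm/trunk/test/CodeGen/R600/swizzle-export.ll          2 lines
llvm/trunk/test/CodeGen/Thumb2/thumb2-sxt_rot.ll        3 lines
llvm/trunk/test/CodeGen/Thumb2/thumb2-uxt_rot.ll        2 lines
llvm/trunk/test/CodeGen/X86/avx512-zext-load-crash.ll   14 lines
llvm/trunk/test/CodeGen/X86/block-placement.ll          38 lines
llvm/trunk/test/CodeGen/X86/divide-by-constant.ll       8 lines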

117 lines

34 lines

4 lines

4 lines

Diff 11804

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show All 12 Lines
// This pass is not a substitute for the LLVM IR instcombine pass. This pass is		// This pass is not a substitute for the LLVM IR instcombine pass. This pass is
// primarily intended to handle simplification opportunities that are implicit		// primarily intended to handle simplification opportunities that are implicit
// in the LLVM IR and exposed by the various codegen lowering phases.		// in the LLVM IR and exposed by the various codegen lowering phases.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/SelectionDAG.h"		#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	class DAGCombiner {
SelectionDAG &DAG;		SelectionDAG &DAG;
const TargetLowering &TLI;		const TargetLowering &TLI;
CombineLevel Level;		CombineLevel Level;
CodeGenOpt::Level OptLevel;		CodeGenOpt::Level OptLevel;
bool LegalOperations;		bool LegalOperations;
bool LegalTypes;		bool LegalTypes;
bool ForCodeSize;		bool ForCodeSize;

// Worklist of all of the nodes that need to be simplified.		/// \brief Worklist of all of the nodes that need to be simplified.
//		///
// This has the semantics that when adding to the worklist,		/// This must behave as a stack -- new nodes to process are pushed onto the
// the item added must be next to be processed. It should		/// back and when processing we pop off of the back.
// also only appear once. The naive approach to this takes		///
// linear time.		/// The worklist will not contain duplicates but may contain null entries
//		/// due to nodes being deleted from the underlying DAG.
// To reduce the insert/remove time to logarithmic, we use		SmallVector<SDNode *, 64> Worklist;
// a set and a vector to maintain our worklist.
//		/// \brief Mapping from an SDNode to its position on the worklist.
// The set contains the items on the worklist, but does not		///
// maintain the order they should be visited.		/// This is used to find and remove nodes from the worklist (by nulling
//		/// them) when they are deleted from the underlying DAG. It relies on
// The vector maintains the order nodes should be visited, but may		/// stable indices of nodes within the worklist.
// contain duplicate or removed nodes. When choosing a node to		DenseMap<SDNode *, unsigned> WorklistMap;
// visit, we pop off the order stack until we find an item that is
// also in the contents set. All operations are O(log N).
SmallPtrSet<SDNode*, 64> WorklistContents;
SmallVector<SDNode*, 64> WorklistOrder;

// AA - Used for DAG load/store alias analysis.		// AA - Used for DAG load/store alias analysis.
AliasAnalysis &AA;		AliasAnalysis &AA;

/// AddUsersToWorklist - When an instruction is simplified, add all users of		/// AddUsersToWorklist - When an instruction is simplified, add all users of
/// the instruction to the work lists because they might get more simplified		/// the instruction to the work lists because they might get more simplified
/// now.		/// now.
///		///
Show All 10 Lines	public:
/// AddToWorklist - Add to the work list making sure its instance is at the		/// AddToWorklist - Add to the work list making sure its instance is at the
/// back (next to be processed.)		/// back (next to be processed.)
void AddToWorklist(SDNode *N) {		void AddToWorklist(SDNode *N) {
// Skip handle nodes as they can't usefully be combined and confuse the		// Skip handle nodes as they can't usefully be combined and confuse the
// zero-use deletion strategy.		// zero-use deletion strategy.
if (N->getOpcode() == ISD::HANDLENODE)		if (N->getOpcode() == ISD::HANDLENODE)
return;		return;

WorklistContents.insert(N);		if (WorklistMap.insert(std::make_pair(N, Worklist.size())).second)
WorklistOrder.push_back(N);		Worklist.push_back(N);
}		}

/// removeFromWorklist - remove all instances of N from the worklist.		/// removeFromWorklist - remove all instances of N from the worklist.
///		///
void removeFromWorklist(SDNode *N) {		void removeFromWorklist(SDNode *N) {
WorklistContents.erase(N);		auto It = WorklistMap.find(N);
		if (It == WorklistMap.end())
		return; // Not in the worklist.

		// Null out the entry rather than erasing it to avoid a linear operation.
		Worklist[It->second] = nullptr;
		WorklistMap.erase(It);
}		}

		bool recursivelyDeleteUnusedNodes(SDNode *N);

SDValue CombineTo(SDNode *N, const SDValue *To, unsigned NumTo,		SDValue CombineTo(SDNode *N, const SDValue *To, unsigned NumTo,
bool AddTo = true);		bool AddTo = true);

SDValue CombineTo(SDNode *N, SDValue Res, bool AddTo = true) {		SDValue CombineTo(SDNode *N, SDValue Res, bool AddTo = true) {
return CombineTo(N, &Res, 1, AddTo);		return CombineTo(N, &Res, 1, AddTo);
}		}

SDValue CombineTo(SDNode *N, SDValue Res0, SDValue Res1,		SDValue CombineTo(SDNode *N, SDValue Res0, SDValue Res1,
▲ Show 20 Lines • Show All 914 Lines • ▼ Show 20 Lines	if (TLI.IsDesirableToPromoteOp(Op, PVT)) {
removeFromWorklist(N);		removeFromWorklist(N);
DAG.DeleteNode(N);		DAG.DeleteNode(N);
AddToWorklist(Result.getNode());		AddToWorklist(Result.getNode());
return true;		return true;
}		}
return false;		return false;
}		}

		/// \brief Recursively delete a node which has no uses and any operands for
		/// which it is the only use.
		///
		/// Note that this both deletes the nodes and removes them from the worklist.
		/// It also adds any nodes who have had a user deleted to the worklist as they
		/// may now have only one use and subject to other combines.
		bool DAGCombiner::recursivelyDeleteUnusedNodes(SDNode *N) {
		if (!N->use_empty())
		return false;

		SmallSetVector<SDNode *, 16> Nodes;
		Nodes.insert(N);
		do {
		N = Nodes.pop_back_val();
		if (!N)
		continue;

		if (N->use_empty()) {
		for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i)
		Nodes.insert(N->getOperand(i).getNode());

		removeFromWorklist(N);
		DAG.DeleteNode(N);
		} else {
		AddToWorklist(N);
		}
		} while (!Nodes.empty());
		return true;
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Main DAG Combiner implementation		// Main DAG Combiner implementation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

void DAGCombiner::Run(CombineLevel AtLevel) {		void DAGCombiner::Run(CombineLevel AtLevel) {
// set the instance variables, so that the various visit routines may use it.		// set the instance variables, so that the various visit routines may use it.
Level = AtLevel;		Level = AtLevel;
Show All 11 Lines	void DAGCombiner::Run(CombineLevel AtLevel) {
HandleSDNode Dummy(DAG.getRoot());		HandleSDNode Dummy(DAG.getRoot());

// The root of the dag may dangle to deleted nodes until the dag combiner is		// The root of the dag may dangle to deleted nodes until the dag combiner is
// done. Set it to null to avoid confusion.		// done. Set it to null to avoid confusion.
DAG.setRoot(SDValue());		DAG.setRoot(SDValue());

// while the worklist isn't empty, find a node and		// while the worklist isn't empty, find a node and
// try and combine it.		// try and combine it.
while (!WorklistContents.empty()) {		while (!WorklistMap.empty()) {
SDNode *N;		SDNode *N;
// The WorklistOrder holds the SDNodes in order, but it may contain		// The Worklist holds the SDNodes in order, but it may contain null entries.
// duplicates.
// In order to avoid a linear scan, we use a set (O(log N)) to hold what the
// worklist should contain, and check the node we want to visit is should
// actually be visited.
do {		do {
N = WorklistOrder.pop_back_val();		N = Worklist.pop_back_val();
} while (!WorklistContents.erase(N));		} while (!N);

		bool GoodWorklistEntry = WorklistMap.erase(N);
		(void)GoodWorklistEntry;
		assert(GoodWorklistEntry &&
		"Found a worklist entry without a corresponding map entry!");

// If N has no uses, it is dead. Make sure to revisit all N's operands once		// If N has no uses, it is dead. Make sure to revisit all N's operands once
// N is deleted from the DAG, since they too may now be dead or may have a		// N is deleted from the DAG, since they too may now be dead or may have a
// reduced number of uses, allowing other xforms.		// reduced number of uses, allowing other xforms.
if (N->use_empty()) {		if (recursivelyDeleteUnusedNodes(N))
for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i)
AddToWorklist(N->getOperand(i).getNode());

DAG.DeleteNode(N);
continue;		continue;
}
		WorklistRemover DeadNodes(*this);

SDValue RV = combine(N);		SDValue RV = combine(N);

if (!RV.getNode())		if (!RV.getNode())
continue;		continue;

++NodesCombined;		++NodesCombined;

Show All 11 Lines	while (!WorklistMap.empty()) {
DEBUG(dbgs() << "\nReplacing.3 ";		DEBUG(dbgs() << "\nReplacing.3 ";
N->dump(&DAG);		N->dump(&DAG);
dbgs() << "\nWith: ";		dbgs() << "\nWith: ";
RV.getNode()->dump(&DAG);		RV.getNode()->dump(&DAG);
dbgs() << '\n');		dbgs() << '\n');

// Transfer debug value.		// Transfer debug value.
DAG.TransferDbgValues(SDValue(N, 0), RV);		DAG.TransferDbgValues(SDValue(N, 0), RV);
WorklistRemover DeadNodes(*this);
if (N->getNumValues() == RV.getNode()->getNumValues())		if (N->getNumValues() == RV.getNode()->getNumValues())
DAG.ReplaceAllUsesWith(N, RV.getNode());		DAG.ReplaceAllUsesWith(N, RV.getNode());
else {		else {
assert(N->getValueType(0) == RV.getValueType() &&		assert(N->getValueType(0) == RV.getValueType() &&
N->getNumValues() == 1 && "Type mismatch");		N->getNumValues() == 1 && "Type mismatch");
SDValue OpV = RV;		SDValue OpV = RV;
DAG.ReplaceAllUsesWith(N, &OpV);		DAG.ReplaceAllUsesWith(N, &OpV);
}		}

// Push the new node and any users onto the worklist		// Push the new node and any users onto the worklist
AddToWorklist(RV.getNode());		AddToWorklist(RV.getNode());
AddUsersToWorklist(RV.getNode());		AddUsersToWorklist(RV.getNode());

// Add any uses of the old node to the worklist in case this node is the
// last one that uses them. They may become dead after this node is
// deleted.
for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i)
AddToWorklist(N->getOperand(i).getNode());

// Finally, if the node is now dead, remove it from the graph. The node		// Finally, if the node is now dead, remove it from the graph. The node
// may not be dead if the replacement process recursively simplified to		// may not be dead if the replacement process recursively simplified to
// something else needing this node.		// something else needing this node. This will also take care of adding any
if (N->use_empty()) {		// operands which have lost a user to the worklist.
// Nodes can be reintroduced into the worklist. Make sure we do not		recursivelyDeleteUnusedNodes(N);
// process a node that has been replaced.
removeFromWorklist(N);

// Finally, since the node is now dead, remove it from the graph.
DAG.DeleteNode(N);
}
}		}

// If the root changed (e.g. it was a dead load, update the root).		// If the root changed (e.g. it was a dead load, update the root).
DAG.setRoot(Dummy.getValue());		DAG.setRoot(Dummy.getValue());
DAG.RemoveDeadNodes();		DAG.RemoveDeadNodes();
}		}

SDValue DAGCombiner::visit(SDNode *N) {		SDValue DAGCombiner::visit(SDNode *N) {
▲ Show 20 Lines • Show All 10,619 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/ARM/fold-stack-adjust.ll

Show First 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	end:
; We want the epilogue to be the only thing in a basic block so that we hit		; We want the epilogue to be the only thing in a basic block so that we hit
; the correct edge-case (first inst in block is correct one to adjust).		; the correct edge-case (first inst in block is correct one to adjust).
ret void		ret void
}		}

define void @test_varsize(...) minsize {		define void @test_varsize(...) minsize {
; CHECK-T1-LABEL: test_varsize:		; CHECK-T1-LABEL: test_varsize:
; CHECK-T1: sub sp, #16		; CHECK-T1: sub sp, #16
; CHECK-T1: push {r2, r3, r4, r5, r7, lr}		; CHECK-T1: push {r5, r6, r7, lr}
; ...		; ...
; CHECK-T1: pop {r2, r3, r4, r5, r7}		; CHECK-T1: pop {r2, r3, r7}
; CHECK-T1: pop {r3}		; CHECK-T1: pop {r3}
; CHECK-T1: add sp, #16		; CHECK-T1: add sp, #16
; CHECK-T1: bx r3		; CHECK-T1: bx r3

; CHECK-LABEL: test_varsize:		; CHECK-LABEL: test_varsize:
; CHECK: sub sp, #16		; CHECK: sub sp, #16
; CHECK: push.w {r9, r10, r11, lr}		; CHECK: push.w {r9, r10, r11, lr}
; ...		; ...
Show All 38 Lines

llvm/trunk/test/CodeGen/ARM/sxt_rot.ll

	; RUN: llc -mtriple=arm-eabi -mattr=+v6 %s -o - \| FileCheck %s			; RUN: llc -mtriple=arm-eabi -mattr=+v6 %s -o - \| FileCheck %s

	define i32 @test0(i8 %A) {			define i32 @test0(i8 %A) {
	; CHECK: test0			; CHECK: test0
	; CHECK: sxtb r0, r0			; CHECK: sxtb r0, r0
	%B = sext i8 %A to i32			%B = sext i8 %A to i32
	ret i32 %B			ret i32 %B
	}			}

	define signext i8 @test1(i32 %A) {			define signext i8 @test1(i32 %A) {
	; CHECK: test1			; CHECK: test1
	; CHECK: sxtb r0, r0, ror #8			; CHECK: lsr r0, r0, #8
				; CHECK: sxtb r0, r0
	%B = lshr i32 %A, 8			%B = lshr i32 %A, 8
	%C = shl i32 %A, 24			%C = shl i32 %A, 24
	%D = or i32 %B, %C			%D = or i32 %B, %C
	%E = trunc i32 %D to i8			%E = trunc i32 %D to i8
	ret i8 %E			ret i8 %E
	}			}

	define signext i32 @test2(i32 %A, i32 %X) {			define signext i32 @test2(i32 %A, i32 %X) {
	Show All 10 Lines

llvm/trunk/test/CodeGen/PowerPC/complex-return.ll

Show All 18 Lines	entry:
%imag2 = getelementptr inbounds { ppc_fp128, ppc_fp128 }* %retval, i32 0, i32 1		%imag2 = getelementptr inbounds { ppc_fp128, ppc_fp128 }* %retval, i32 0, i32 1
store ppc_fp128 %x.real, ppc_fp128* %real1		store ppc_fp128 %x.real, ppc_fp128* %real1
store ppc_fp128 %x.imag, ppc_fp128* %imag2		store ppc_fp128 %x.imag, ppc_fp128* %imag2
%0 = load { ppc_fp128, ppc_fp128 }* %retval		%0 = load { ppc_fp128, ppc_fp128 }* %retval
ret { ppc_fp128, ppc_fp128 } %0		ret { ppc_fp128, ppc_fp128 } %0
}		}

; CHECK-LABEL: foo:		; CHECK-LABEL: foo:
; CHECK: lfd 3
; CHECK: lfd 4
; CHECK: lfd 1		; CHECK: lfd 1
; CHECK: lfd 2		; CHECK: lfd 2
		; CHECK: lfd 3
		; CHECK: lfd 4

define { float, float } @oof() nounwind {		define { float, float } @oof() nounwind {
entry:		entry:
%retval = alloca { float, float }, align 4		%retval = alloca { float, float }, align 4
%x = alloca { float, float }, align 4		%x = alloca { float, float }, align 4
%real = getelementptr inbounds { float, float }* %x, i32 0, i32 0		%real = getelementptr inbounds { float, float }* %x, i32 0, i32 0
%imag = getelementptr inbounds { float, float }* %x, i32 0, i32 1		%imag = getelementptr inbounds { float, float }* %x, i32 0, i32 1
store float 3.500000e+00, float* %real		store float 3.500000e+00, float* %real
Show All 17 Lines

llvm/trunk/test/CodeGen/PowerPC/subsumes-pred-regs.ll

	Show All 29 Lines
	if.end7.i37: ; preds = %test.exit27.i34, %if.end.i24			if.end7.i37: ; preds = %test.exit27.i34, %if.end.i24
	%tobool.i.i36 = icmp eq i8 undef, 0			%tobool.i.i36 = icmp eq i8 undef, 0
	br i1 %tobool.i.i36, label %return, label %if.then9.i39			br i1 %tobool.i.i36, label %return, label %if.then9.i39

	if.then9.i39: ; preds = %if.end7.i37			if.then9.i39: ; preds = %if.end7.i37
	br i1 %lnot.i.i16.i23, label %return, label %lor.rhs.i.i49			br i1 %lnot.i.i16.i23, label %return, label %lor.rhs.i.i49

	; CHECK: .LBB0_7:			; CHECK: .LBB0_7:
	; CHECK: beq 1, .LBB0_10			; CHECK: bne 1, .LBB0_10
	; CHECK: beq 0, .LBB0_10			; CHECK: beq 0, .LBB0_10
	; CHECK: .LBB0_9:			; CHECK: .LBB0_9:

	lor.rhs.i.i49: ; preds = %if.then9.i39			lor.rhs.i.i49: ; preds = %if.then9.i39
	%cmp.i.i.i.i48 = icmp ne i64 undef, 0			%cmp.i.i.i.i48 = icmp ne i64 undef, 0
	br label %return			br label %return

	land.rhs: ; preds = %lor.end			land.rhs: ; preds = %lor.end
	Show All 19 Lines

llvm/trunk/test/CodeGen/R600/r600-export-fix.ll

	; RUN: llc < %s -march=r600 -mcpu=cedar \| FileCheck %s			; RUN: llc < %s -march=r600 -mcpu=cedar \| FileCheck %s

	;CHECK: EXPORT T{{[0-9]}}.XYZW			;CHECK: EXPORT T{{[0-9]}}.XYZW
	;CHECK: EXPORT T{{[0-9]}}.0000			;CHECK: EXPORT T{{[0-9]}}.0000
	;CHECK: EXPORT T{{[0-9]}}.0000			;CHECK: EXPORT T{{[0-9]}}.0000
	;CHECK: EXPORT T{{[0-9]}}.0XZW			;CHECK: EXPORT T{{[0-9]}}.0XYZ
	;CHECK: EXPORT T{{[0-9]}}.XYZW			;CHECK: EXPORT T{{[0-9]}}.XYZW
	;CHECK: EXPORT T{{[0-9]}}.YX00			;CHECK: EXPORT T{{[0-9]}}.YZ00
	;CHECK: EXPORT T{{[0-9]}}.0000			;CHECK: EXPORT T{{[0-9]}}.0000
	;CHECK: EXPORT T{{[0-9]}}.0000			;CHECK: EXPORT T{{[0-9]}}.0000


	define void @main(<4 x float> inreg %reg0, <4 x float> inreg %reg1) #0 {			define void @main(<4 x float> inreg %reg0, <4 x float> inreg %reg1) #0 {
	main_body:			main_body:
	%0 = extractelement <4 x float> %reg1, i32 0			%0 = extractelement <4 x float> %reg1, i32 0
	%1 = extractelement <4 x float> %reg1, i32 1			%1 = extractelement <4 x float> %reg1, i32 1
	▲ Show 20 Lines • Show All 126 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/R600/swizzle-export.ll

Show First 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	main_body:
%74 = insertelement <4 x float> %73, float %52, i32 2		%74 = insertelement <4 x float> %73, float %52, i32 2
%75 = insertelement <4 x float> %74, float %55, i32 3		%75 = insertelement <4 x float> %74, float %55, i32 3
call void @llvm.R600.store.swizzle(<4 x float> %75, i32 3, i32 2)		call void @llvm.R600.store.swizzle(<4 x float> %75, i32 3, i32 2)
ret void		ret void
}		}

; EG-CHECK: @main2		; EG-CHECK: @main2
; EG-CHECK: T{{[0-9]+}}.XY__		; EG-CHECK: T{{[0-9]+}}.XY__
; EG-CHECK: T{{[0-9]+}}.YXZ0		; EG-CHECK: T{{[0-9]+}}.ZXY0

define void @main2(<4 x float> inreg %reg0, <4 x float> inreg %reg1) #0 {		define void @main2(<4 x float> inreg %reg0, <4 x float> inreg %reg1) #0 {
main_body:		main_body:
%0 = extractelement <4 x float> %reg1, i32 0		%0 = extractelement <4 x float> %reg1, i32 0
%1 = extractelement <4 x float> %reg1, i32 1		%1 = extractelement <4 x float> %reg1, i32 1
%2 = fadd float %0, 2.5		%2 = fadd float %0, 2.5
%3 = fmul float %1, 3.5		%3 = fmul float %1, 3.5
%4 = load <4 x float> addrspace(8)* getelementptr ([1024 x <4 x float>] addrspace(8)* null, i64 0, i32 1)		%4 = load <4 x float> addrspace(8)* getelementptr ([1024 x <4 x float>] addrspace(8)* null, i64 0, i32 1)
Show All 24 Lines

llvm/trunk/test/CodeGen/Thumb2/thumb2-sxt_rot.ll

	; RUN: llc -mtriple=thumb-eabi -mcpu=arm1156t2-s -mattr=+thumb2,+t2xtpk %s -o - \			; RUN: llc -mtriple=thumb-eabi -mcpu=arm1156t2-s -mattr=+thumb2,+t2xtpk %s -o - \
	; RUN: \| FileCheck %s			; RUN: \| FileCheck %s

	define i32 @test0(i8 %A) {			define i32 @test0(i8 %A) {
	; CHECK: test0			; CHECK: test0
	; CHECK: sxtb r0, r0			; CHECK: sxtb r0, r0
	%B = sext i8 %A to i32			%B = sext i8 %A to i32
	ret i32 %B			ret i32 %B
	}			}

	define signext i8 @test1(i32 %A) {			define signext i8 @test1(i32 %A) {
	; CHECK: test1			; CHECK: test1
	; CHECK: sxtb.w r0, r0, ror #8			; CHECK: lsrs r0, r0, #8
				; CHECK: sxtb r0, r0
	%B = lshr i32 %A, 8			%B = lshr i32 %A, 8
	%C = shl i32 %A, 24			%C = shl i32 %A, 24
	%D = or i32 %B, %C			%D = or i32 %B, %C
	%E = trunc i32 %D to i8			%E = trunc i32 %D to i8
	ret i8 %E			ret i8 %E
	}			}

	define signext i32 @test2(i32 %A, i32 %X) {			define signext i32 @test2(i32 %A, i32 %X) {
	Show All 11 Lines

llvm/trunk/test/CodeGen/Thumb2/thumb2-uxt_rot.ll

Show All 19 Lines	; M3: add r0, r1
%C.u = trunc i32 %B.u to i8		%C.u = trunc i32 %B.u to i8
%D.u = zext i8 %C.u to i32		%D.u = zext i8 %C.u to i32
%E.u = add i32 %A.u, %D.u		%E.u = add i32 %A.u, %D.u
ret i32 %E.u		ret i32 %E.u
}		}

define zeroext i32 @test3(i32 %A.u) {		define zeroext i32 @test3(i32 %A.u) {
; A8: test3		; A8: test3
; A8: uxth.w r0, r0, ror #8		; A8: ubfx r0, r0, #8, #16
%B.u = lshr i32 %A.u, 8		%B.u = lshr i32 %A.u, 8
%C.u = shl i32 %A.u, 24		%C.u = shl i32 %A.u, 24
%D.u = or i32 %B.u, %C.u		%D.u = or i32 %B.u, %C.u
%E.u = trunc i32 %D.u to i16		%E.u = trunc i32 %D.u to i16
%F.u = zext i16 %E.u to i32		%F.u = zext i16 %E.u to i32
ret i32 %F.u		ret i32 %F.u
}		}

llvm/trunk/test/CodeGen/X86/avx512-zext-load-crash.ll

	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl \| FileCheck %s

	define <8 x i16> @test_zext_load() {
	; CHECK: vmovq
	entry:
	%0 = load <2 x i16> ** undef, align 8
	%1 = getelementptr inbounds <2 x i16>* %0, i64 1
	%2 = load <2 x i16>* %0, align 1
	%3 = shufflevector <2 x i16> %2, <2 x i16> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%4 = load <2 x i16>* %1, align 1
	%5 = shufflevector <2 x i16> %4, <2 x i16> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%6 = shufflevector <8 x i16> %3, <8 x i16> %5, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x i16> %6
	}

llvm/trunk/test/CodeGen/X86/block-placement.ll

Show First 20 Lines • Show All 231 Lines • ▼ Show 20 Lines	body1:
%next = add i32 %iv, 1		%next = add i32 %iv, 1
%exitcond = icmp eq i32 %next, %i		%exitcond = icmp eq i32 %next, %i
br i1 %exitcond, label %exit, label %body0		br i1 %exitcond, label %exit, label %body0

exit:		exit:
ret i32 %base		ret i32 %base
}		}

define void @test_loop_rotate_reversed_blocks() {
; This test case (greatly reduced from an Olden bencmark) ensures that the loop
; rotate implementation doesn't assume that loops are laid out in a particular
; order. The first loop will get split into two basic blocks, with the loop
; header coming after the loop latch.
;
; CHECK: test_loop_rotate_reversed_blocks
; CHECK: %entry
; Look for a jump into the middle of the loop, and no branches mid-way.
; CHECK: jmp
; CHECK: %loop1
; CHECK-NOT: j{{\w}} .LBB{{.}}
; CHECK: %loop1
; CHECK: je

entry:
%cond1 = load volatile i1* undef
br i1 %cond1, label %loop2.preheader, label %loop1

loop1:
call i32 @f()
%cond2 = load volatile i1* undef
br i1 %cond2, label %loop2.preheader, label %loop1

loop2.preheader:
call i32 @f()
%cond3 = load volatile i1* undef
br i1 %cond3, label %exit, label %loop2

loop2:
call i32 @f()
%cond4 = load volatile i1* undef
br i1 %cond4, label %exit, label %loop2

exit:
ret void
}

define i32 @test_loop_align(i32 %i, i32* %a) {		define i32 @test_loop_align(i32 %i, i32* %a) {
; Check that we provide basic loop body alignment with the block placement		; Check that we provide basic loop body alignment with the block placement
; pass.		; pass.
; CHECK-LABEL: test_loop_align:		; CHECK-LABEL: test_loop_align:
; CHECK: %entry		; CHECK: %entry
; CHECK: .align [[ALIGN:[0-9]+]],		; CHECK: .align [[ALIGN:[0-9]+]],
; CHECK-NEXT: %body		; CHECK-NEXT: %body
; CHECK: %exit		; CHECK: %exit

llvm/trunk/test/CodeGen/X86/divide-by-constant.ll

	define zeroext i8 @test3(i8 zeroext %x, i8 zeroext %c) nounwind readnone ssp noredzone {			define zeroext i8 @test3(i8 zeroext %x, i8 zeroext %c) nounwind readnone ssp noredzone {
	entry:			entry:
	%div = udiv i8 %c, 3			%div = udiv i8 %c, 3
	ret i8 %div			ret i8 %div

	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: movzbl 8(%esp), %eax			; CHECK: movzbl 8(%esp), %eax
	; CHECK-NEXT: imull $171, %eax			; CHECK-NEXT: imull $171, %eax
				; CHECK-NEXT: andl $65024, %eax
	; CHECK-NEXT: shrl $9, %eax			; CHECK-NEXT: shrl $9, %eax
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	}			}

	define signext i16 @test4(i16 signext %x) nounwind {			define signext i16 @test4(i16 signext %x) nounwind {
	entry:			entry:
	%div = sdiv i16 %x, 33 ; <i32> [#uses=1]			%div = sdiv i16 %x, 33 ; <i32> [#uses=1]
	ret i16 %div			ret i16 %div
	Show All 9 Lines
	; CHECK: mull 4(%esp)			; CHECK: mull 4(%esp)
	}			}

	define signext i16 @test6(i16 signext %x) nounwind {			define signext i16 @test6(i16 signext %x) nounwind {
	entry:			entry:
	%div = sdiv i16 %x, 10			%div = sdiv i16 %x, 10
	ret i16 %div			ret i16 %div
	; CHECK-LABEL: test6:			; CHECK-LABEL: test6:
	; CHECK: imull $26215, %eax, %ecx			; CHECK: imull $26215, %eax
	; CHECK: sarl $18, %ecx			; CHECK: movl %eax, %ecx
	; CHECK: shrl $15, %eax			; CHECK: shrl $31, %ecx
				; CHECK: sarl $18, %eax
	}			}

	define i32 @test7(i32 %x) nounwind {			define i32 @test7(i32 %x) nounwind {
	%div = udiv i32 %x, 28			%div = udiv i32 %x, 28
	ret i32 %div			ret i32 %div
	; CHECK-LABEL: test7:			; CHECK-LABEL: test7:
	; CHECK: shrl $2			; CHECK: shrl $2
	; CHECK: movl $613566757			; CHECK: movl $613566757
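
The constants in the `test3` and `test6` check lines are multiply-and-shift replacements for division. A minimal C++ sketch of the arithmetic, offered as an illustration rather than as the code generator's actual implementation: the function names are hypothetical, the final add of the sign bit in the signed case is assumed (it is not among the CHECK lines shown), and `>>` on a negative value is assumed to be an arithmetic shift (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 8-bit division by 3, following the @test3 checks:
// multiply by 171, then shift right by 9 (171 * 3 = 513 = 2^9 + 1).
inline std::uint32_t udiv3_magic(std::uint8_t x) {
    std::uint32_t product = static_cast<std::uint32_t>(x) * 171u;
    // andl $65024 (0xFE00) keeps bits 9..15. The product is below 2^16 and
    // the low 9 bits are shifted out anyway, so the mask does not change
    // the quotient.
    return (product & 0xFE00u) >> 9;
}

// Signed 16-bit division by 10, following the right-hand @test6 checks:
// multiply by 26215 (26215 * 10 = 2^18 + 6), take the sign bit with a
// logical shift by 31, and shift the product arithmetically by 18.
inline std::int32_t sdiv10_magic(std::int16_t x) {
    std::int32_t product = static_cast<std::int32_t>(x) * 26215;
    std::int32_t sign =
        static_cast<std::int32_t>(static_cast<std::uint32_t>(product) >> 31);
    // Assumption: the two partial results are added to round toward zero.
    return (product >> 18) + sign;
}

// Exhaustively compares both routines against ordinary division.
inline void check_magic_division() {
    for (int x = 0; x <= 255; ++x) {
        assert(udiv3_magic(static_cast<std::uint8_t>(x)) ==
               static_cast<std::uint32_t>(x / 3));
    }
    for (int x = -32768; x <= 32767; ++x) {
        assert(sdiv10_magic(static_cast<std::int16_t>(x)) == x / 10);
    }
}
```

Under these assumptions the extra `andl` in `test3` is redundant for the value computed, which fits the summary's description of these changes as minor regressions.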

llvm/trunk/test/CodeGen/X86/fold-pcmpeqd-0.ll

	; RUN: llc < %s -mtriple=x86_64-apple-darwin | FileCheck --check-prefix=X86-64 %s
	; DISABLED: llc < %s -mtriple=i386-apple-darwin -mcpu=yonah -regalloc=linearscan | FileCheck --check-prefix=I386 %s

	; i386 test has been disabled when scheduler 2-addr hack is disabled.

	; This testcase shouldn't need to spill the -1 value,
	; so it should just use pcmpeqd to materialize an all-ones vector.
	; For i386, cp load of -1 are folded.

	; With -regalloc=greedy, the live range is split before spilling, so the first
	; pcmpeq doesn't get folded as a constant pool load.

	; I386-NOT: pcmpeqd
	; I386: orps LCPI0_2, %xmm
	; I386-NOT: pcmpeqd
	; I386: orps LCPI0_2, %xmm

	; X86-64: pcmpeqd
	; X86-64-NOT: pcmpeqd

	%struct.__ImageExecInfo = type <{ <4 x i32>, <4 x float>, <2 x i64>, i8, i8, i8*, i32, i32, i32, i32, i32 }>
	%struct._cl_image_format_t = type <{ i32, i32, i32 }>
	%struct._image2d_t = type <{ i8*, %struct._cl_image_format_t, i32, i32, i32, i32, i32, i32 }>

	define void @program_1(%struct._image2d_t* %dest, %struct._image2d_t* %t0, <4 x float> %p0, <4 x float> %p1, <4 x float> %p4, <4 x float> %p5, <4 x float> %p6) nounwind {
	entry:
	%tmp3.i = load i32* null ; <i32> [#uses=1]
	%cmp = icmp sgt i32 %tmp3.i, 200 ; <i1> [#uses=1]
	br i1 %cmp, label %forcond, label %ifthen

	ifthen: ; preds = %entry
	ret void

	forcond: ; preds = %entry
	%tmp3.i536 = load i32* null ; <i32> [#uses=1]
	%cmp12 = icmp slt i32 0, %tmp3.i536 ; <i1> [#uses=1]
	br i1 %cmp12, label %forbody, label %afterfor

	forbody: ; preds = %forcond
	%bitcast204.i313 = bitcast <4 x i32> zeroinitializer to <4 x float> ; <<4 x float>> [#uses=1]
	%mul233 = fmul <4 x float> %bitcast204.i313, zeroinitializer ; <<4 x float>> [#uses=1]
	%mul257 = fmul <4 x float> %mul233, zeroinitializer ; <<4 x float>> [#uses=1]
	%mul275 = fmul <4 x float> %mul257, zeroinitializer ; <<4 x float>> [#uses=1]
	%tmp51 = call <4 x float> @llvm.x86.sse.max.ps(<4 x float> %mul275, <4 x float> zeroinitializer) nounwind ; <<4 x float>> [#uses=1]
	%bitcast198.i182 = bitcast <4 x float> zeroinitializer to <4 x i32> ; <<4 x i32>> [#uses=0]
	%bitcast204.i185 = bitcast <4 x i32> zeroinitializer to <4 x float> ; <<4 x float>> [#uses=1]
	%tmp69 = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> zeroinitializer) nounwind ; <<4 x i32>> [#uses=1]
	%tmp70 = call <4 x float> @llvm.x86.sse2.cvtdq2ps(<4 x i32> %tmp69) nounwind ; <<4 x float>> [#uses=1]
	%sub140.i78 = fsub <4 x float> zeroinitializer, %tmp70 ; <<4 x float>> [#uses=2]
	%mul166.i86 = fmul <4 x float> zeroinitializer, %sub140.i78 ; <<4 x float>> [#uses=1]
	%add167.i87 = fadd <4 x float> %mul166.i86, < float 0x3FE62ACB60000000, float 0x3FE62ACB60000000, float 0x3FE62ACB60000000, float 0x3FE62ACB60000000 > ; <<4 x float>> [#uses=1]
	%mul171.i88 = fmul <4 x float> %add167.i87, %sub140.i78 ; <<4 x float>> [#uses=1]
	%add172.i89 = fadd <4 x float> %mul171.i88, < float 0x3FF0000A40000000, float 0x3FF0000A40000000, float 0x3FF0000A40000000, float 0x3FF0000A40000000 > ; <<4 x float>> [#uses=1]
	%bitcast176.i90 = bitcast <4 x float> %add172.i89 to <4 x i32> ; <<4 x i32>> [#uses=1]
	%andnps178.i92 = and <4 x i32> %bitcast176.i90, zeroinitializer ; <<4 x i32>> [#uses=1]
	%bitcast179.i93 = bitcast <4 x i32> %andnps178.i92 to <4 x float> ; <<4 x float>> [#uses=1]
	%mul186.i96 = fmul <4 x float> %bitcast179.i93, zeroinitializer ; <<4 x float>> [#uses=1]
	%bitcast190.i98 = bitcast <4 x float> %mul186.i96 to <4 x i32> ; <<4 x i32>> [#uses=1]
	%andnps192.i100 = and <4 x i32> %bitcast190.i98, zeroinitializer ; <<4 x i32>> [#uses=1]
	%xorps.i102 = xor <4 x i32> zeroinitializer, < i32 -1, i32 -1, i32 -1, i32 -1 > ; <<4 x i32>> [#uses=1]
	%orps203.i103 = or <4 x i32> %andnps192.i100, %xorps.i102 ; <<4 x i32>> [#uses=1]
	%bitcast204.i104 = bitcast <4 x i32> %orps203.i103 to <4 x float> ; <<4 x float>> [#uses=1]
	%cmple.i = call <4 x float> @llvm.x86.sse.cmp.ps(<4 x float> zeroinitializer, <4 x float> %tmp51, i8 2) nounwind ; <<4 x float>> [#uses=1]
	%tmp80 = call <4 x float> @llvm.x86.sse2.cvtdq2ps(<4 x i32> zeroinitializer) nounwind ; <<4 x float>> [#uses=1]
	%sub140.i = fsub <4 x float> zeroinitializer, %tmp80 ; <<4 x float>> [#uses=1]
	%bitcast148.i = bitcast <4 x float> zeroinitializer to <4 x i32> ; <<4 x i32>> [#uses=1]
	%andnps150.i = and <4 x i32> %bitcast148.i, < i32 -2139095041, i32 -2139095041, i32 -2139095041, i32 -2139095041 > ; <<4 x i32>> [#uses=0]
	%mul171.i = fmul <4 x float> zeroinitializer, %sub140.i ; <<4 x float>> [#uses=1]
	%add172.i = fadd <4 x float> %mul171.i, < float 0x3FF0000A40000000, float 0x3FF0000A40000000, float 0x3FF0000A40000000, float 0x3FF0000A40000000 > ; <<4 x float>> [#uses=1]
	%bitcast176.i = bitcast <4 x float> %add172.i to <4 x i32> ; <<4 x i32>> [#uses=1]
	%andnps178.i = and <4 x i32> %bitcast176.i, zeroinitializer ; <<4 x i32>> [#uses=1]
	%bitcast179.i = bitcast <4 x i32> %andnps178.i to <4 x float> ; <<4 x float>> [#uses=1]
	%mul186.i = fmul <4 x float> %bitcast179.i, zeroinitializer ; <<4 x float>> [#uses=1]
	%bitcast189.i = bitcast <4 x float> zeroinitializer to <4 x i32> ; <<4 x i32>> [#uses=0]
	%bitcast190.i = bitcast <4 x float> %mul186.i to <4 x i32> ; <<4 x i32>> [#uses=1]
	%andnps192.i = and <4 x i32> %bitcast190.i, zeroinitializer ; <<4 x i32>> [#uses=1]
	%bitcast198.i = bitcast <4 x float> %cmple.i to <4 x i32> ; <<4 x i32>> [#uses=1]
	%xorps.i = xor <4 x i32> %bitcast198.i, < i32 -1, i32 -1, i32 -1, i32 -1 > ; <<4 x i32>> [#uses=1]
	%orps203.i = or <4 x i32> %andnps192.i, %xorps.i ; <<4 x i32>> [#uses=1]
	%bitcast204.i = bitcast <4 x i32> %orps203.i to <4 x float> ; <<4 x float>> [#uses=1]
	%mul307 = fmul <4 x float> %bitcast204.i185, zeroinitializer ; <<4 x float>> [#uses=1]
	%mul310 = fmul <4 x float> %bitcast204.i104, zeroinitializer ; <<4 x float>> [#uses=2]
	%mul313 = fmul <4 x float> %bitcast204.i, zeroinitializer ; <<4 x float>> [#uses=1]
	%tmp82 = call <4 x float> @llvm.x86.sse.min.ps(<4 x float> %mul307, <4 x float> zeroinitializer) nounwind ; <<4 x float>> [#uses=1]
	%bitcast11.i15 = bitcast <4 x float> %tmp82 to <4 x i32> ; <<4 x i32>> [#uses=1]
	%andnps.i17 = and <4 x i32> %bitcast11.i15, zeroinitializer ; <<4 x i32>> [#uses=1]
	%orps.i18 = or <4 x i32> %andnps.i17, zeroinitializer ; <<4 x i32>> [#uses=1]
	%bitcast17.i19 = bitcast <4 x i32> %orps.i18 to <4 x float> ; <<4 x float>> [#uses=1]
	%tmp83 = call <4 x float> @llvm.x86.sse.min.ps(<4 x float> %mul310, <4 x float> zeroinitializer) nounwind ; <<4 x float>> [#uses=1]
	%bitcast.i3 = bitcast <4 x float> %mul310 to <4 x i32> ; <<4 x i32>> [#uses=1]
	%bitcast6.i4 = bitcast <4 x float> zeroinitializer to <4 x i32> ; <<4 x i32>> [#uses=2]
	%andps.i5 = and <4 x i32> %bitcast.i3, %bitcast6.i4 ; <<4 x i32>> [#uses=1]
	%bitcast11.i6 = bitcast <4 x float> %tmp83 to <4 x i32> ; <<4 x i32>> [#uses=1]
	%not.i7 = xor <4 x i32> %bitcast6.i4, < i32 -1, i32 -1, i32 -1, i32 -1 > ; <<4 x i32>> [#uses=1]
	%andnps.i8 = and <4 x i32> %bitcast11.i6, %not.i7 ; <<4 x i32>> [#uses=1]
	%orps.i9 = or <4 x i32> %andnps.i8, %andps.i5 ; <<4 x i32>> [#uses=1]
	%bitcast17.i10 = bitcast <4 x i32> %orps.i9 to <4 x float> ; <<4 x float>> [#uses=1]
	%bitcast.i = bitcast <4 x float> %mul313 to <4 x i32> ; <<4 x i32>> [#uses=1]
	%andps.i = and <4 x i32> %bitcast.i, zeroinitializer ; <<4 x i32>> [#uses=1]
	%orps.i = or <4 x i32> zeroinitializer, %andps.i ; <<4 x i32>> [#uses=1]
	%bitcast17.i = bitcast <4 x i32> %orps.i to <4 x float> ; <<4 x float>> [#uses=1]
	call void null(<4 x float> %bitcast17.i19, <4 x float> %bitcast17.i10, <4 x float> %bitcast17.i, <4 x float> zeroinitializer, %struct.__ImageExecInfo* null, <4 x i32> zeroinitializer) nounwind
	unreachable

	afterfor: ; preds = %forcond
	ret void
	}

	declare <4 x float> @llvm.x86.sse.cmp.ps(<4 x float>, <4 x float>, i8) nounwind readnone

	declare <4 x float> @llvm.x86.sse2.cvtdq2ps(<4 x i32>) nounwind readnone

	declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>) nounwind readnone

	declare <4 x float> @llvm.x86.sse.max.ps(<4 x float>, <4 x float>) nounwind readnone

	declare <4 x float> @llvm.x86.sse.min.ps(<4 x float>, <4 x float>) nounwind readnone

llvm/trunk/test/CodeGen/X86/narrow-shl-load.ll

while.cond: ; preds = %while.cond, %bb.nph
%conv22 = trunc i64 %add21 to i32		%conv22 = trunc i64 %add21 to i32
store i32 %conv22, i32* undef, align 4		store i32 %conv22, i32* undef, align 4
br i1 false, label %while.end, label %while.cond		br i1 false, label %while.end, label %while.cond

while.end: ; preds = %while.cond		while.end: ; preds = %while.cond
ret void		ret void
}		}


; DAGCombiner shouldn't fold the sdiv (ashr) away.
; rdar://8636812
; CHECK-LABEL: test2:
; CHECK: sarl

define i32 @test2() nounwind {
entry:
%i = alloca i32, align 4
%j = alloca i8, align 1
store i32 127, i32* %i, align 4
store i8 0, i8* %j, align 1
%tmp3 = load i32* %i, align 4
%mul = mul nsw i32 %tmp3, 2
%conv4 = trunc i32 %mul to i8
%conv5 = sext i8 %conv4 to i32
%div6 = sdiv i32 %conv5, 2
%conv7 = trunc i32 %div6 to i8
%conv9 = sext i8 %conv7 to i32
%cmp = icmp eq i32 %conv9, -1
br i1 %cmp, label %if.then, label %if.end

if.then: ; preds = %entry
ret i32 0

if.end: ; preds = %entry
call void @abort() noreturn
unreachable
}

declare void @abort() noreturn

declare void @exit(i32) noreturn
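
To see why the `sdiv` in `@test2` must survive, it helps to trace the values by hand. The sketch below models the IR in C++; `sext8`, `test2_model`, and `test2_if_folded` are hypothetical helper names, and the "folded" variant is only an illustration of one incorrect simplification (cancelling the multiply against the divide), not a claim about what the combiner previously did.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the low 8 bits of v to 32 bits without relying on
// implementation-defined narrowing conversions.
inline std::int32_t sext8(std::uint32_t v) {
    return static_cast<std::int32_t>((v & 0xFFu) ^ 0x80u) - 0x80;
}

// Follows @test2 step by step for the stored value i.
inline std::int32_t test2_model(std::int32_t i) {
    std::int32_t mul = i * 2;                                  // 127 * 2 = 254
    std::int32_t conv5 = sext8(static_cast<std::uint32_t>(mul)); // 0xFE -> -2
    std::int32_t div6 = conv5 / 2;                             // -2 / 2 = -1
    return sext8(static_cast<std::uint32_t>(div6));            // 0xFF -> -1
}

// What the comparison would see if the multiply and divide were
// (incorrectly) cancelled across the 8-bit truncation.
inline std::int32_t test2_if_folded(std::int32_t i) {
    return sext8(static_cast<std::uint32_t>(i));
}
```

With the stored value 127, the faithful model yields -1 and the function returns 0, while the cancelled form yields 127 and would reach `abort`. The truncation to `i8` between the `mul` and the `sdiv` is what makes the two differ.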

; DAG Combiner can't fold this into a load of the 1'th byte.		; DAG Combiner can't fold this into a load of the 1'th byte.
; PR8757		; PR8757
define i32 @test3(i32 *%P) nounwind ssp {		define i32 @test3(i32 *%P) nounwind ssp {
store volatile i32 128, i32* %P		store volatile i32 128, i32* %P
%tmp4.pre = load i32* %P		%tmp4.pre = load i32* %P
%phitmp = trunc i32 %tmp4.pre to i16		%phitmp = trunc i32 %tmp4.pre to i16
%phitmp13 = shl i16 %phitmp, 8		%phitmp13 = shl i16 %phitmp, 8
%phitmp14 = ashr i16 %phitmp13, 8		%phitmp14 = ashr i16 %phitmp13, 8

llvm/trunk/test/CodeGen/X86/store-narrow.ll

entry:
%CS = shl i32 %C, 8		%CS = shl i32 %C, 8
%D = or i32 %B, %CS		%D = or i32 %B, %CS
store i32 %D, i32* %a0, align 4		store i32 %D, i32* %a0, align 4
ret void		ret void
; X64-LABEL: test2:		; X64-LABEL: test2:
; X64: movb %sil, 1(%rdi)		; X64: movb %sil, 1(%rdi)

; X32-LABEL: test2:		; X32-LABEL: test2:
; X32: movb 8(%esp), %[[REG:[abcd]l]]		; X32: movzbl 8(%esp), %e[[REG:[abcd]]]x
; X32: movb %[[REG]], 1(%{{.*}})		; X32: movb %[[REG]]l, 1(%{{.*}})
}		}

define void @test3(i32* nocapture %a0, i16 zeroext %a1) nounwind ssp {		define void @test3(i32* nocapture %a0, i16 zeroext %a1) nounwind ssp {
entry:		entry:
%A = load i32* %a0, align 4		%A = load i32* %a0, align 4
%B = and i32 %A, -65536 ; 0xFFFF0000		%B = and i32 %A, -65536 ; 0xFFFF0000
%C = zext i16 %a1 to i32		%C = zext i16 %a1 to i32
%D = or i32 %B, %C		%D = or i32 %B, %C

llvm/trunk/test/CodeGen/X86/vec_extract-sse4.ll

	; RUN: llc < %s -mcpu=corei7 -march=x86 -mattr=+sse4.1 | FileCheck %s			; RUN: llc < %s -mcpu=corei7 -march=x86 -mattr=+sse4.1 | FileCheck %s

	define void @t1(float* %R, <4 x float>* %P1) nounwind {			define void @t1(float* %R, <4 x float>* %P1) nounwind {
	; CHECK-LABEL: @t1			; CHECK-LABEL: @t1
	; CHECK: movl 4(%esp), %[[R0:e[abcd]x]]			; CHECK: movl 4(%esp), %[[R0:e[abcd]x]]
	; CHECK-NEXT: movl 8(%esp), %[[R1:e[abcd]x]]			; CHECK-NEXT: movl 8(%esp), %[[R1:e[abcd]x]]
	; CHECK-NEXT: movl 12(%[[R1]]), %[[R2:e[abcd]x]]			; CHECK-NEXT: movss 12(%[[R1]]), %[[R2:xmm.*]]
	; CHECK-NEXT: movl %[[R2]], (%[[R0]])			; CHECK-NEXT: movss %[[R2]], (%[[R0]])
	; CHECK-NEXT: retl			; CHECK-NEXT: retl

	%X = load <4 x float>* %P1			%X = load <4 x float>* %P1
	%tmp = extractelement <4 x float> %X, i32 3			%tmp = extractelement <4 x float> %X, i32 3
	store float %tmp, float* %R			store float %tmp, float* %R
	ret void			ret void
	}			}


Revision Contents

Diff 11804

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/trunk/test/CodeGen/ARM/fold-stack-adjust.ll

llvm/trunk/test/CodeGen/ARM/sxt_rot.ll

llvm/trunk/test/CodeGen/PowerPC/complex-return.ll

llvm/trunk/test/CodeGen/PowerPC/subsumes-pred-regs.ll

llvm/trunk/test/CodeGen/R600/r600-export-fix.ll

llvm/trunk/test/CodeGen/R600/swizzle-export.ll

llvm/trunk/test/CodeGen/Thumb2/thumb2-sxt_rot.ll

llvm/trunk/test/CodeGen/Thumb2/thumb2-uxt_rot.ll

llvm/trunk/test/CodeGen/X86/avx512-zext-load-crash.ll

llvm/trunk/test/CodeGen/X86/block-placement.ll

llvm/trunk/test/CodeGen/X86/divide-by-constant.ll

llvm/trunk/test/CodeGen/X86/fold-pcmpeqd-0.ll

llvm/trunk/test/CodeGen/X86/narrow-shl-load.ll

llvm/trunk/test/CodeGen/X86/store-narrow.ll

llvm/trunk/test/CodeGen/X86/vec_extract-sse4.ll
