This is an archive of the discontinued LLVM Phabricator instance.

Differential D20017

Aggressive choosing best loop top
Abandoned (Public)

Authored by cycheng on May 6 2016, 6:06 AM.

Details

Reviewers

rsandifo
chandlerc
tjablin
davidxl
• tstellarAMD
kbarton
amehsan
hfinkel
morisset
nemanjai

Summary

We want to find a better loop top for this common pattern (and similar ones):

          entry               
            |                 
------> loop.header (body)    
|97%    /       \             
|      /50%      \50%         
--- latch <--- if.then        
       \    97%  /            
        \3%     /3%           
         loop.end

Currently, Branch Probability Basic Block Placement will generate BlockChain in this order:
entry -> loop.header -> if.then -> latch -> loop.end

With this order, latch needs a branch that jumps back to loop.header when the condition is true.

A better BlockChain order would be:
entry -> latch -> loop.header -> if.then -> loop.end

Then latch can fall through to loop.header without this jump.
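
To make the comparison concrete, here is a minimal sketch that scores both block orders. It assumes a deliberately simplified cost model that is not the one MachineBlockPlacement uses: an edge is free when its target directly follows its source in the layout, and costs one taken branch otherwise. The names `Edge`, `takenBranchCost`, and `summaryLoopEdges` are hypothetical, and the edge frequencies are derived from the probabilities in the diagram above, normalized to one execution of loop.header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One CFG edge and its frequency relative to one execution of loop.header.
struct Edge {
  std::string From;
  std::string To;
  double Freq;
};

// Simplified cost model (an assumption of this sketch): an edge is free when
// its target is placed directly after its source, and otherwise costs one
// taken branch each time it executes.
double takenBranchCost(const std::vector<Edge> &Edges,
                       const std::vector<std::string> &Layout) {
  double Cost = 0.0;
  for (const Edge &E : Edges) {
    assert(E.Freq >= 0.0 && "edge frequency must be non-negative");
    bool FallsThrough = false;
    for (std::size_t I = 0; I + 1 < Layout.size(); ++I)
      if (Layout[I] == E.From && Layout[I + 1] == E.To)
        FallsThrough = true;
    if (!FallsThrough)
      Cost += E.Freq;
  }
  return Cost;
}

// Edge frequencies derived from the probabilities in the summary diagram,
// normalized so that loop.header executes once.
std::vector<Edge> summaryLoopEdges() {
  const double HeaderToLatch = 0.5;
  const double HeaderToThen = 0.5;
  const double ThenToLatch = HeaderToThen * 0.97;
  const double ThenToEnd = HeaderToThen * 0.03;
  const double LatchFreq = HeaderToLatch + ThenToLatch;
  const double LatchToHeader = LatchFreq * 0.97;
  const double LatchToEnd = LatchFreq * 0.03;
  // Whatever does not arrive over the back edge arrives from entry.
  const double EntryToHeader = 1.0 - LatchToHeader;
  return {
      {"entry", "loop.header", EntryToHeader},
      {"loop.header", "latch", HeaderToLatch},
      {"loop.header", "if.then", HeaderToThen},
      {"if.then", "latch", ThenToLatch},
      {"if.then", "loop.end", ThenToEnd},
      {"latch", "loop.header", LatchToHeader},
      {"latch", "loop.end", LatchToEnd},
  };
}
```

Calling `takenBranchCost` with the order entry, loop.header, if.then, latch, loop.end gives about 1.47 taken branches per header execution under this model, and the order entry, latch, loop.header, if.then, loop.end gives about 1.06. The saving comes almost entirely from the hot latch -> loop.header back edge becoming a fall-through.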

Thanks Carrot for pointing out this performance issue: https://llvm.org/bugs/show_bug.cgi?id=25782

We also tested this patch on Power8 by running SPEC2006; gcc and libquantum improve by 5%.

Diff Detail

Event Timeline

cycheng updated this revision to Diff 56406. May 6 2016, 6:06 AM

cycheng retitled this revision from to Aggressive choosing best loop top.

cycheng updated this object.

cycheng added reviewers: hfinkel, nemanjai, tjablin, kbarton, amehsan, chandlerc, rsandifo, morisset.

cycheng added a subscriber: llvm-commits.

Maybe we already have this and I missed it. Could you add a test case that has the pattern you want to optimize? That will help with reviewing the code, and will also keep this optimization from being broken in the future.

amehsan added inline comments. May 9 2016, 5:36 AM

lib/CodeGen/MachineBlockPlacement.cpp
726–742	Could this hurt performance for the following example?

[ASCII CFG diagram garbled in this archive. Blocks: entry, loop header, if.then, Latch1, Latch2. cycheng redraws it in the reply below.]

Probabilities that matter: loop.header -> if.then 99%, if.then -> Latch1 99%, Latch1 -> loop.header 80%.

The problem is that before your change the branch from if.then to Latch1 is a fall-through. After your change it is a jump back. With a single latch this shouldn't hurt performance, but with multiple latches your code removes more fall-throughs than it creates. I am not sure how important this is. If this really is a problematic edge case, checking for a single latch should be straightforward using the LoopInfo API.

amehsan added inline comments. May 9 2016, 8:22 AM

lib/CodeGen/MachineBlockPlacement.cpp
726–742	In the graph that I have drawn, arrows from Latch1 and Latch2 should go back to loop header. I hope it was not confusing.

cycheng mentioned this in D20092: [AMDGPU] Fix issues introduced by aggressive block placement. May 10 2016, 2:04 AM

cycheng added inline comments. May 10 2016, 4:47 AM

lib/CodeGen/MachineBlockPlacement.cpp
726–742	Thanks Ehsan for pointing this out! I changed the latch1 -> loop.header probability to 90%, because my patch checks this probability and skips latch1 if it is <= 80%.

          entry
            |
+----> loop.header
|        /      \
| 90%   /1%      \99%
+--latch1 <--- if.then
|     |    99%    |
|     v 10%       | 1%
---latch2 <------+

original:      this patch:
latch2         latch1
loop.header    loop.header
if.then        if.then
latch1         latch2

When does the original win?
loop.header -> if.then -> latch1 -> latch2 -> loop.header
This patch requires two additional branches: if.then -> latch1 and latch2 -> loop.header.

When does this patch win?
loop.header -> latch1 -> loop.header

I think I should check the probability of loop.header -> latch1. At 50% it looks like we still benefit. Consider the two paths again, with loop.header -> latch1 = 50%:

loop.header -> if.then -> latch1 -> latch2 -> loop.header
path probability = 0.5 * 0.99 * 0.1 * 1.0 = 0.0495
original:   0.0495 * 1 (branch) = 0.0495
this patch: 0.0495 * 3 (branch) = 0.1485

loop.header -> latch1 -> loop.header
path probability = 0.5 * 0.90 = 0.45
original:   0.45 * 2 (branch) = 0.9
this patch: 0.45 * 1 (branch) = 0.45

Right?
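
The arithmetic in this comment can be replayed with a short sketch. It weights each path's branch count by the path probability and sums over the two paths cycheng lists. The type and function names (`LoopPath`, `expectedBranches`, `commentPaths`) are hypothetical, and only those two paths are modelled, so the totals are not a complete expected cost for the loop.

```cpp
#include <cassert>
#include <vector>

// A path through the loop: the probabilities of the edges along it and the
// number of taken branches it needs under each layout.
struct LoopPath {
  std::vector<double> EdgeProbs;
  unsigned BranchesOriginal;
  unsigned BranchesPatched;
};

// Probability of following every edge on the path.
double pathProbability(const LoopPath &P) {
  double Prob = 1.0;
  for (double E : P.EdgeProbs) {
    assert(E >= 0.0 && E <= 1.0 && "edge probability out of range");
    Prob *= E;
  }
  return Prob;
}

struct ExpectedCost {
  double Original;
  double Patched;
};

// Sum of (path probability * branch count) over the given paths.
ExpectedCost expectedBranches(const std::vector<LoopPath> &Paths) {
  ExpectedCost Cost = {0.0, 0.0};
  for (const LoopPath &P : Paths) {
    const double Prob = pathProbability(P);
    Cost.Original += Prob * P.BranchesOriginal;
    Cost.Patched += Prob * P.BranchesPatched;
  }
  return Cost;
}

// The two paths from the comment, with loop.header -> latch1 set to 50%.
std::vector<LoopPath> commentPaths() {
  return {
      // loop.header -> if.then -> latch1 -> latch2 -> loop.header
      {{0.5, 0.99, 0.1, 1.0}, 1, 3},
      // loop.header -> latch1 -> loop.header
      {{0.5, 0.90}, 2, 1},
  };
}
```

Over these two paths the weighted totals are 0.0495 + 0.9 = 0.9495 for the original layout and 0.1485 + 0.45 = 0.5985 for the patched one. The result depends directly on the edge probabilities fed in, which is the sensitivity amehsan raises in the reply that follows.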

cycheng added a parent revision: D20092: [AMDGPU] Fix issues introduced by aggressive block placement. May 10 2016, 4:49 AM

amehsan added inline comments. May 10 2016, 7:46 AM

lib/CodeGen/MachineBlockPlacement.cpp
726–742	The question is: how much can we rely on these numbers? The probabilities are chosen statically by the compiler (or, under FDO, they come from the training workload). So we might have probabilities during compilation that allow us to perform the code change, but the real probabilities during execution may differ and cause a slowdown. For the single-latch case, your change is very robust and does not cause degradation even if the actual probabilities differ from our static assumptions. But for the multiple-latch case this is not true. There is still one possibility: we may want to accept the risk of degradation in some cases to get an improvement in others. Personally, I prefer to be conservative and disable it for multiple latches unless some reason for accepting a risky change is given. But I may be wrong.

Fix AMDGPU test case failure

Herald added a reviewer: • tstellarAMD. May 10 2016, 7:55 AM

Have you benchmarked how this changes performance on other architectures? If you can't, we'll need to get others to do so, as this is likely to have pretty widespread effects.

We should also get some of the other folks who've been looking at loop layout to look at this. CC'ing David at least.

lib/CodeGen/AtomicExpandPass.cpp
900	Sink this to where it is used?
932–934	This comment doesn't really help here. What is "aggressive-best-top" and what does it mean? I think what you're actually doing is trying to give probabilities to override the default loop probabilities, because the spin loop for an atomic isn't expected to loop often, if at all. I'm actually surprised you would have a 50% probability of looping here; I would have expected a relatively low probability of spinning to be more appropriate. I think you should:
- split this out into an independent change with its own description and test case
- improve this comment to talk generically about the fact that the loop isn't expected to be taken unless there is contention.
Then the rest of this change can be predicated on that. I'm assuming that you can test this independently because it's actually annotating the IR.
lib/CodeGen/MachineBlockPlacement.cpp
114–117	Any particular reason for a flag? Or for calling this aggressive? It seems pretty straightforward to try to select the best loop top among latches.
780	I wouldn't reference SystemZ or a test case which might go away. You have the generic description here and can add details about why this is bad in a generic sense.
789	Why would you handle it if cold at all? Not clear why you need a different threshold here than the threshold used throughout this code for cold.
lib/Target/SystemZ/SystemZISelLowering.cpp
5298–5299	Much like in the atomic expand pass, I'd talk about the semantic reason why these probabilities are correct, rather than about some hypothetical layout. Is this testable independently? If so, I would separate this too into its own patch.
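
For context on the 50% figure discussed for AtomicExpandPass.cpp: branch weights on a two-way branch are read as relative frequencies, so weights of (1, 1) give each successor a probability of 1 / (1 + 1). The sketch below shows only that normalization. It is standalone arithmetic rather than LLVM's own implementation, and `TwoWayProb` and `probabilitiesFromWeights` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Probabilities of the true and false successors of a conditional branch.
struct TwoWayProb {
  double TrueProb;
  double FalseProb;
};

// Normalize two branch weights into probabilities: weight / (sum of weights).
TwoWayProb probabilitiesFromWeights(std::uint32_t TrueWeight,
                                    std::uint32_t FalseWeight) {
  const double Sum =
      static_cast<double>(TrueWeight) + static_cast<double>(FalseWeight);
  assert(Sum > 0.0 && "at least one weight must be non-zero");
  return {TrueWeight / Sum, FalseWeight / Sum};
}
```

With weights (1, 1), as in the patch, the exit edge and the retry edge of the cmpxchg loop each get 50%. A skewed pair such as (999, 1), chosen here purely for illustration, would give the retry edge 0.1%, which is closer to the "not expected to loop unless there is contention" reading suggested in the review.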

Sorry I missed this.

Chandler thanks for looping me in. I will take a look.

From a brief look at the summary, it seems to me this is exactly what the cost-based loop rotation enhancement (by Cong) is supposed to solve. This option is currently off by default, and is also guarded by profile feedback.

Can you comment out the guard at line 1555 that checks profile data, and then use option -mllvm precise-rotation-cost=true to see if it solves your problem? We plan to turn that on by default at some point after more performance experiments.

haicheng added a subscriber: haicheng. May 11 2016, 5:34 PM

Thanks for working on this. This pattern is also seen in spec2006/mcf, I will test this patch on AArch64/Kryo and let you know the results. FWIW, I think this is always good when the Loop has a single latch only, but for loops with multiple latches it might not always be beneficial without knowing the actual branch probabilities.

I verified that the cost-based rotation algorithm handles the issue as expected.

See the new test case added in r269267.

A new internal option is also added to enable this rotation without requiring PGO: llc -force-precise-rotation-cost. I recommend exploring the existing rotation strategy (i.e., help by testing your use cases with performance data so that we can turn it on by default if desired) instead of adding more redundant logic here.

The missing weight fixes can be extracted out separately.

I don't think the change in MBP is the right one to make; we don't want any more 'pattern' matching code like this in MBP, which can be very fragile. Please try the cost-based loop rotation, which is a general solution. For different targets, there are also the tuning parameters JumpInstCost and MisfetchCost, which can be tuned in a target-dependent way.

lib/CodeGen/AtomicExpandPass.cpp
900	please split out change in this file into a different patch.
lib/Target/SystemZ/SystemZISelLowering.cpp
5271	This should also be split out into a different patch.

davidxl added inline comments. May 11 2016, 10:33 PM

test/CodeGen/X86/code_placement_loop_rotation2.ll
8	By the way, this example shows that the new expected layout is not as optimal as the one when -force-precise-rotation-cost is on (see CHECK-PROFILE below).

The total branch cost of the optimal layout [e, f, c, d, h, b, g] is:
C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c) + C(g->h)

while the total cost of the aggressive loop top layout [h, b, g, f, c, d, e] is:
C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c) + C(g->h) + C(e->f)

Edge e->f is in the inner loop and is a hot edge, so the additional cost of C(e->f) can be high.

davidxl:

Yes, -force-precise-rotation-cost=true solves this issue, and the result looks better than this patch. I hope we can enable it by default.

Tested on Power8 with the ref data size (largest size), iteration = 1, CPU frequency governor set to "performance" (maximum speed), bound to physical CPU 0:

llvm: r269273 (2016 May 12)
-force-precise-rotation-cost=true vs. false; > 1.0 means true is faster.
Result:

400.perlbench 	1.00x
401.bzip2     	1.00x
403.gcc       	1.07x
429.mcf       	1.04x
445.gobmk     	1.02x
456.hmmer     	1.00x
458.sjeng     	1.01x
462.libquantum	1.00x
464.h264ref   	1.04x
471.omnetpp   	1.06x
473.astar     	0.97x
483.xalancbmk 	0.99x

llvm r267873 (2016 Apr 28) + this patch
aggressive-best-top=true vs. false; > 1.0 means true is faster.
Result:

400.perlbench 	1.01x
401.bzip2     	1.01x
403.gcc       	1.07x
429.mcf       	1.00x
445.gobmk     	1.02x
456.hmmer     	0.98x
458.sjeng     	0.99x
462.libquantum	1.03x
464.h264ref   	1.06x
471.omnetpp   	1.03x
473.astar     	1.00x
483.xalancbmk 	0.99x

So I think I should use the existing rotation strategy.

By the way, please look at this pattern:

          entry               
            |                 
------> loop.header (body)    
|97%    /       \             
|      /50%      \50%         
--- latch <--- if.then        
       |
       |3%
   loop.end

The current strategy generates this order: entry loop.header if.then latch loop.end
But I think this pattern is also eligible for optimization, right?

The loop rotation costs calculated by the current rotation strategy:

BB#1 ('header') to the top: 273
BB#4 ('if.then') to the top: 297
BB#2 ('latch') to the top: 297

lib/CodeGen/AtomicExpandPass.cpp
932–934	Exactly! And yes, that loop isn't expected to be taken, so I should use a lower value.
lib/CodeGen/MachineBlockPlacement.cpp
114–117	For testing purposes :P
test/CodeGen/X86/code_placement_loop_rotation2.ll
8	Thanks for pointing this out :D

chandlerc:

I can test on PowerPC and x86, but I have no idea about other backends :(

I am abandoning this patch because there is already a mature mechanism, "Precise (Loop) Rotation Cost", and I should rely on it.
Thanks for all of your help.

Revision Contents

Path	Size
lib/CodeGen/AtomicExpandPass.cpp	6 lines
lib/CodeGen/MachineBlockPlacement.cpp	57 lines
lib/Target/SystemZ/SystemZISelLowering.cpp	8 lines
test/CodeGen/AArch64/swifterror.ll	4 lines
test/CodeGen/AMDGPU/valu-i1.ll	27 lines
test/CodeGen/ARM/code-placement.ll	8 lines
test/CodeGen/ARM/swifterror.ll	4 lines
test/CodeGen/SystemZ/loop-01.ll	20 lines
test/CodeGen/SystemZ/swifterror.ll	4 lines
test/CodeGen/Thumb2/2010-02-11-phi-cycle.ll	8 lines
test/CodeGen/X86/block-placement.ll	12 lines
test/CodeGen/X86/code_placement_cold_loop_blocks.ll	4 lines
test/CodeGen/X86/code_placement_ignore_succ_in_inner_loop.ll	6 lines
test/CodeGen/X86/code_placement_loop_rotation2.ll	7 lines
test/CodeGen/X86/compact-unwind.ll	4 lines
test/CodeGen/X86/licm-dominance.ll	3 lines
test/CodeGen/X86/mbp-false-cfg-break.ll	2 lines
test/CodeGen/X86/swifterror.ll	4 lines
test/Transforms/AtomicExpand/ARM/atomic-expansion-v7.ll	22 lines
test/Transforms/AtomicExpand/ARM/atomic-expansion-v8.ll	8 lines

Diff 56713

lib/CodeGen/AtomicExpandPass.cpp

[... 16 lines not shown ...]

 #include "llvm/CodeGen/AtomicExpandUtils.h"
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/Module.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetLowering.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Target/TargetSubtargetInfo.h"

 using namespace llvm;

[... 858 lines not shown ...] bool llvm::expandAtomicRMWToCmpXchg(AtomicRMWInst *AI,

   //     %pair = cmpxchg iN* %addr, iN %loaded, iN %new
   //     %new_loaded = extractvalue { iN, i1 } %pair, 0
   //     %success = extractvalue { iN, i1 } %pair, 1
   //     br i1 %success, label %atomicrmw.end, label %loop
   // atomicrmw.end:
   //     [...]
   BasicBlock *ExitBB = BB->splitBasicBlock(AI->getIterator(), "atomicrmw.end");
   BasicBlock *LoopBB = BasicBlock::Create(Ctx, "atomicrmw.start", F, ExitBB);
+  MDNode *BrWeight = MDBuilder(AI->getContext()).createBranchWeights(1, 1);

   // This grabs the DebugLoc from AI.
   IRBuilder<> Builder(AI);

   // The split call above "helpfully" added a branch at the end of BB (to the
   // wrong place), but we want a load. It's easiest to just remove
   // the branch entirely.
   std::prev(BB->end())->eraseFromParent();

[... 15 lines not shown ...] bool llvm::expandAtomicRMWToCmpXchg(AtomicRMWInst *AI,

   Value *Success = nullptr;

   CreateCmpXchg(Builder, Addr, Loaded, NewVal, MemOpOrder,
                 Success, NewLoaded);
   assert(Success && NewLoaded);

   Loaded->addIncoming(NewLoaded, LoopBB);

-  Builder.CreateCondBr(Success, ExitBB, LoopBB);
+  // Set LoopBB -> ExitBB and LoopBB -> LoopBB with 50% probability, so block
+  // placement won't apply aggressive-best-top on LoopBB
+  Builder.CreateCondBr(Success, ExitBB, LoopBB, BrWeight);

   Builder.SetInsertPoint(ExitBB, ExitBB->begin());

   AI->replaceAllUsesWith(NewLoaded);
   AI->eraseFromParent();

   return true;
 }

[... 351 lines not shown ...]

lib/CodeGen/MachineBlockPlacement.cpp

[... 105 lines not shown ...] cl::desc("Cost that models the probablistic risk of an instruction "

              "misfetch due to a jump comparing to falling through, whose cost "
              "is zero."),
     cl::init(1), cl::Hidden);

 static cl::opt<unsigned> JumpInstCost("jump-inst-cost",
                                       cl::desc("Cost of jump instructions."),
                                       cl::init(1), cl::Hidden);

+static cl::opt<bool>
+AggressiveBestTop("aggressive-best-top",
+    cl::desc("Find best top from all latches even with conditional exit."),
+    cl::init(true), cl::Hidden);
+
 namespace {
 class BlockChain;
 /// \brief Type for our function-wide basic block -> block chain mapping.
 typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType;
 }

 namespace {
 /// \brief A chain of blocks which will be laid out contiguously.

[... 583 lines not shown ...] void MachineBlockPlacement::buildChain(

   DEBUG(dbgs() << "Finished forming chain for header block "
                << getBlockName(*Chain.begin()) << "\n");
 }

 /// \brief Find the best loop top block for layout.
 ///
 /// Look for a block which is strictly better than the loop header for laying
-/// out at the top of the loop. This looks for one and only one pattern:
+/// out at the top of the loop. This looks for two patterns:
+///
 /// a latch block with no conditional exit. This block will cause a conditional
 /// jump around it or will be the bottom of the loop if we lay it out in place,
 /// but if it it doesn't end up at the bottom of the loop for any reason,
 /// rotation alone won't fix it. Because such a block will always result in an
 /// unconditional jump (for the backedge) rotating it in front of the loop
 /// header is always profitable.
+///
+/// a latch block with conditional exit, and similar to following cfg:
+///
+///           entry                   original     better
+///             |                     layout       layout
+/// ------> loop.header (body)        --------     ------
+/// |97%    /       \                 entry        entry
+/// |      /50%      \50%             loop.header  latch
+/// --- latch <--- if.then            if.then      loop.header
+///        \    97%  /                latch        if.then
+///         \3%     /3%               loop.end     loop.end
+///          loop.end
+///
+/// In the "original layout", latch needs a branch jumping back to loop.header
+/// when the condition is true, but in the "better layout", latch can fall
+/// through to loop.header without this jump.
+
 MachineBasicBlock *
 MachineBlockPlacement::findBestLoopTop(MachineLoop &L,
                                        const BlockFilterSet &LoopBlockSet) {
   // Check that the header hasn't been fused with a preheader block due to
   // crazy branches. If it has, we need to start with the header at the top to
   // prevent pulling the preheader into the loop body.
   BlockChain &HeaderChain = *BlockToChain[L.getHeader()];
   if (!LoopBlockSet.count(*HeaderChain.begin()))
     return L.getHeader();

   DEBUG(dbgs() << "Finding best loop top for: " << getBlockName(L.getHeader())
                << "\n");

   BlockFrequency BestPredFreq;
   MachineBasicBlock *BestPred = nullptr;
   for (MachineBasicBlock *Pred : L.getHeader()->predecessors()) {
     if (!LoopBlockSet.count(Pred))
       continue;
     DEBUG(dbgs() << "    header pred: " << getBlockName(Pred) << ", "
                  << Pred->succ_size() << " successors, ";
           MBFI->printBlockFreq(dbgs(), Pred) << " freq\n");
-    if (Pred->succ_size() > 1)
-      continue;
+    if (Pred->succ_size() > 1) {
+      if (!AggressiveBestTop)
+        continue;
+
+      // Don't handle latch with only one predecessor
+      if (Pred->pred_size() < 2)
+        continue;
+
+      const BranchProbability HotProb(4, 5); // 80%
+
+      // Don't handle if latch -> loop.header is not hot.
+      auto ToHeaderProb = MBPI->getEdgeProbability(Pred, L.getHeader());
+      if (ToHeaderProb <= HotProb)
+        continue;
+
+      // Don't handle if loop.header -> latch is very cold:
+      // e.g. SystemZ atomicrmw instruction (atomicrmw-minmax-*.ll)
+      //
+      //  ------> loop.header (body)
+      //  |       /       \
+      //  |      /.1%      \99.9%
+      //  --- latch <--- if.then
+      //        |
+      //      loop.end
+      //
+      const BranchProbability VeryColdProb(1, 1000); // 0.1%
+      if (L.getHeader()->isSuccessor(Pred) &&
+          VeryColdProb >= MBPI->getEdgeProbability(L.getHeader(), Pred))
+        continue;
+    }

     BlockFrequency PredFreq = MBFI->getBlockFreq(Pred);
     if (!BestPred || PredFreq > BestPredFreq ||
         (!(PredFreq < BestPredFreq) &&
          Pred->isLayoutSuccessor(L.getHeader()))) {
       BestPred = Pred;
       BestPredFreq = PredFreq;
     }
   }

[... 784 lines not shown ...]

lib/Target/SystemZ/SystemZISelLowering.cpp

[... 5,261 lines not shown ...] SystemZTargetLowering::emitAtomicLoadMinMax(MachineInstr *MI,

   unsigned RotatedNewVal = (IsSubWord ? MRI.createVirtualRegister(RC) : NewVal);

   // Insert 3 basic blocks for the loop.
   MachineBasicBlock *StartMBB = MBB;
   MachineBasicBlock *DoneMBB = splitBlockBefore(MI, MBB);
   MachineBasicBlock *LoopMBB = emitBlockAfter(StartMBB);
   MachineBasicBlock *UseAltMBB = emitBlockAfter(LoopMBB);
   MachineBasicBlock *UpdateMBB = emitBlockAfter(UseAltMBB);
+  const BranchProbability VeryHighProb(999, 1000); // 99.9%
+  const BranchProbability VeryLowProb(1, 1000); // 0.1%

   // StartMBB:
   //   ...
   //   %OrigVal = L Disp(%Base)
   //   # fall through to LoopMMB
   MBB = StartMBB;
   BuildMI(MBB, DL, TII->get(LOpcode), OrigVal)
     .addOperand(Base).addImm(Disp).addReg(0);

[... 10 lines not shown ...] BuildMI(MBB, DL, TII->get(SystemZ::PHI), OldVal)

     .addReg(Dest).addMBB(UpdateMBB);
   if (IsSubWord)
     BuildMI(MBB, DL, TII->get(SystemZ::RLL), RotatedOldVal)
       .addReg(OldVal).addReg(BitShift).addImm(0);
   BuildMI(MBB, DL, TII->get(CompareOpcode))
     .addReg(RotatedOldVal).addReg(Src2);
   BuildMI(MBB, DL, TII->get(SystemZ::BRC))
     .addImm(SystemZ::CCMASK_ICMP).addImm(KeepOldMask).addMBB(UpdateMBB);
-  MBB->addSuccessor(UpdateMBB);
-  MBB->addSuccessor(UseAltMBB);
+  // Set LoopMBB -> UpdateMBB with VeryLowProb, so Block Placement Pass will
+  // layout in this order: LoopMBB UseAltMBB UpdateMBB
+  MBB->addSuccessor(UpdateMBB, VeryLowProb);
+  MBB->addSuccessor(UseAltMBB, VeryHighProb);

   // UseAltMBB:
   //   %RotatedAltVal = RISBG %RotatedOldVal, %Src2, 32, 31 + BitSize, 0
   //   # fall through to UpdateMMB
   MBB = UseAltMBB;
   if (IsSubWord)
     BuildMI(MBB, DL, TII->get(SystemZ::RISBG32), RotatedAltVal)
       .addReg(RotatedOldVal).addReg(Src2)

[... 743 lines not shown ...]

test/CodeGen/AArch64/swifterror.ll

Show First 20 Lines • Show All 155 Lines • ▼ Show 20 Lines	normal:
ret float 0.0		ret float 0.0
}		}

; "foo_loop" is a function that takes a swifterror parameter, it sets swifterror		; "foo_loop" is a function that takes a swifterror parameter, it sets swifterror
; under a certain condition inside a loop.		; under a certain condition inside a loop.
define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float %cc2) {		define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float %cc2) {
; CHECK-APPLE-LABEL: foo_loop:		; CHECK-APPLE-LABEL: foo_loop:
; CHECK-APPLE: mov x0, x19		; CHECK-APPLE: mov x0, x19
		; CHECK-APPLE: fcmp
		; CHECK-APPLE: b.gt
; CHECK-APPLE: cbz		; CHECK-APPLE: cbz
; CHECK-APPLE: orr w0, wzr, #0x10		; CHECK-APPLE: orr w0, wzr, #0x10
; CHECK-APPLE: malloc		; CHECK-APPLE: malloc
; CHECK-APPLE: strb w{{.*}}, [x0, #8]		; CHECK-APPLE: strb w{{.*}}, [x0, #8]
; CHECK-APPLE: fcmp
; CHECK-APPLE: b.le
; CHECK-APPLE: mov x19, x0		; CHECK-APPLE: mov x19, x0
; CHECK-APPLE: ret		; CHECK-APPLE: ret

; CHECK-O0-LABEL: foo_loop:		; CHECK-O0-LABEL: foo_loop:
; spill x19		; spill x19
; CHECK-O0: str x19		; CHECK-O0: str x19
; CHECk-O0: cbz		; CHECk-O0: cbz
; CHECK-O0: orr w{{.*}}, wzr, #0x10		; CHECK-O0: orr w{{.*}}, wzr, #0x10
▲ Show 20 Lines • Show All 208 Lines • Show Last 20 Lines

test/CodeGen/AMDGPU/valu-i1.ll

	Show First 20 Lines • Show All 118 Lines • ▼ Show 20 Lines
	; SI: s_xor_b64 [[OUTER_CMP_SREG]], exec, [[OUTER_CMP_SREG]]			; SI: s_xor_b64 [[OUTER_CMP_SREG]], exec, [[OUTER_CMP_SREG]]
	; SI: s_cbranch_execz BB3_2			; SI: s_cbranch_execz BB3_2

	; Initialize inner condition to false			; Initialize inner condition to false
	; SI: ; BB#1:			; SI: ; BB#1:
	; SI: s_mov_b64 [[ZERO:s\[[0-9]+:[0-9]+\]]], 0{{$}}			; SI: s_mov_b64 [[ZERO:s\[[0-9]+:[0-9]+\]]], 0{{$}}
	; SI: s_mov_b64 [[COND_STATE:s\[[0-9]+:[0-9]+\]]], [[ZERO]]			; SI: s_mov_b64 [[COND_STATE:s\[[0-9]+:[0-9]+\]]], [[ZERO]]

				; Loop
				; SI: BB3_4:
				; SI: buffer_store_dword
				; SI: v_cmp_ge_i64_e64 [[CMP:s\[[0-9]+:[0-9]+\]]]
				; SI: s_or_b64 [[TMP:s\[[0-9]+:[0-9]+\]]], [[CMP]], [[COND_STATE]]

				; SI: BB3_5:
				; SI: s_or_b64 exec, exec, [[ORNEG2:s\[[0-9]+:[0-9]+\]]]
				; SI: s_or_b64 [[COND_STATE]], [[ORNEG2]], [[TMP]]
				; SI: s_andn2_b64 exec, exec, [[COND_STATE]]
				; SI: s_cbranch_execz BB3_6

	; Clear exec bits for workitems that load -1s			; Clear exec bits for workitems that load -1s
	; SI: BB3_3:			; SI: BB3_3:
	; SI: buffer_load_dword [[B:v[0-9]+]]			; SI: buffer_load_dword [[B:v[0-9]+]]
	; SI: buffer_load_dword [[A:v[0-9]+]]			; SI: buffer_load_dword [[A:v[0-9]+]]
	; SI-DAG: v_cmp_ne_i32_e64 [[NEG1_CHECK_0:s\[[0-9]+:[0-9]+\]]], -1, [[A]]			; SI-DAG: v_cmp_ne_i32_e64 [[NEG1_CHECK_0:s\[[0-9]+:[0-9]+\]]], -1, [[A]]
	; SI-DAG: v_cmp_ne_i32_e32 [[NEG1_CHECK_1:vcc]], -1, [[B]]			; SI-DAG: v_cmp_ne_i32_e32 [[NEG1_CHECK_1:vcc]], -1, [[B]]
	; SI: s_and_b64 [[ORNEG1:s\[[0-9]+:[0-9]+\]]], [[NEG1_CHECK_1]], [[NEG1_CHECK_0]]			; SI: s_and_b64 [[ORNEG1:s\[[0-9]+:[0-9]+\]]], [[NEG1_CHECK_1]], [[NEG1_CHECK_0]]
	; SI: s_and_saveexec_b64 [[ORNEG2:s\[[0-9]+:[0-9]+\]]], [[ORNEG1]]			; SI: s_and_saveexec_b64 [[ORNEG2]], [[ORNEG1]]
	; SI: s_xor_b64 [[ORNEG2]], exec, [[ORNEG2]]			; SI: s_xor_b64 [[ORNEG2]], exec, [[ORNEG2]]
	; SI: s_cbranch_execz BB3_5			; SI: s_cbranch_execz BB3_5

	; SI: BB#4:			; SI: BB3_6
	; SI: buffer_store_dword
	; SI: v_cmp_ge_i64_e64 [[CMP:s\[[0-9]+:[0-9]+\]]]
	; SI: s_or_b64 [[TMP:s\[[0-9]+:[0-9]+\]]], [[CMP]], [[COND_STATE]]

	; SI: BB3_5:
	; SI: s_or_b64 exec, exec, [[ORNEG2]]
	; SI: s_or_b64 [[COND_STATE]], [[ORNEG2]], [[TMP]]
	; SI: s_andn2_b64 exec, exec, [[COND_STATE]]
	; SI: s_cbranch_execnz BB3_3

	; SI: BB#6
	; SI: s_or_b64 exec, exec, [[COND_STATE]]			; SI: s_or_b64 exec, exec, [[COND_STATE]]

	; SI: BB3_2:			; SI: BB3_2:
	; SI-NOT: [[COND_STATE]]			; SI-NOT: [[COND_STATE]]
	; SI: s_endpgm			; SI: s_endpgm

	define void @multi_vcond_loop(i32 addrspace(1)* noalias nocapture %arg, i32 addrspace(1)* noalias nocapture readonly %arg1, i32 addrspace(1)* noalias nocapture readonly %arg2, i32 addrspace(1)* noalias nocapture readonly %arg3) #1 {			define void @multi_vcond_loop(i32 addrspace(1)* noalias nocapture %arg, i32 addrspace(1)* noalias nocapture readonly %arg1, i32 addrspace(1)* noalias nocapture readonly %arg2, i32 addrspace(1)* noalias nocapture readonly %arg3) #1 {
	bb:			bb:
	Show All 34 Lines

test/CodeGen/ARM/code-placement.ll

	Show All 32 Lines
	; rdar://8117827			; rdar://8117827
	define i32 @t2(i32 %passes, i32* nocapture %src, i32 %size) nounwind readonly {			define i32 @t2(i32 %passes, i32* nocapture %src, i32 %size) nounwind readonly {
	entry:			entry:
	; CHECK-LABEL: t2:			; CHECK-LABEL: t2:
	; CHECK: beq LBB1_[[RET:.]]			; CHECK: beq LBB1_[[RET:.]]
	%0 = icmp eq i32 %passes, 0 ; <i1> [#uses=1]			%0 = icmp eq i32 %passes, 0 ; <i1> [#uses=1]
	br i1 %0, label %bb5, label %bb.nph15			br i1 %0, label %bb5, label %bb.nph15

				; bb3 Checking:
				; CHECK: LBB1_[[BB3:.]]: @ %bb3
				; CHECK: beq LBB1_[[RET]]
				; CHECK-NOT: b LBB1_

	; CHECK: LBB1_[[PREHDR:.]]: @ %bb2.preheader			; CHECK: LBB1_[[PREHDR:.]]: @ %bb2.preheader
	bb1: ; preds = %bb2.preheader, %bb1			bb1: ; preds = %bb2.preheader, %bb1
	; CHECK: LBB1_[[BB1:.]]: @ %bb1			; CHECK: LBB1_[[BB1:.]]: @ %bb1
	; CHECK: bne LBB1_[[BB1]]			; CHECK: bne LBB1_[[BB1]]
	%indvar = phi i32 [ %indvar.next, %bb1 ], [ 0, %bb2.preheader ] ; <i32> [#uses=2]			%indvar = phi i32 [ %indvar.next, %bb1 ], [ 0, %bb2.preheader ] ; <i32> [#uses=2]
	%sum.08 = phi i32 [ %2, %bb1 ], [ %sum.110, %bb2.preheader ] ; <i32> [#uses=1]			%sum.08 = phi i32 [ %2, %bb1 ], [ %sum.110, %bb2.preheader ] ; <i32> [#uses=1]
	%tmp17 = sub i32 %i.07, %indvar ; <i32> [#uses=1]			%tmp17 = sub i32 %i.07, %indvar ; <i32> [#uses=1]
	%scevgep = getelementptr i32, i32* %src, i32 %tmp17 ; <i32*> [#uses=1]			%scevgep = getelementptr i32, i32* %src, i32 %tmp17 ; <i32*> [#uses=1]
	%1 = load i32, i32* %scevgep, align 4 ; <i32> [#uses=1]			%1 = load i32, i32* %scevgep, align 4 ; <i32> [#uses=1]
	%2 = add nsw i32 %1, %sum.08 ; <i32> [#uses=2]			%2 = add nsw i32 %1, %sum.08 ; <i32> [#uses=2]
	%indvar.next = add i32 %indvar, 1 ; <i32> [#uses=2]			%indvar.next = add i32 %indvar, 1 ; <i32> [#uses=2]
	%exitcond = icmp eq i32 %indvar.next, %size ; <i1> [#uses=1]			%exitcond = icmp eq i32 %indvar.next, %size ; <i1> [#uses=1]
	br i1 %exitcond, label %bb3, label %bb1			br i1 %exitcond, label %bb3, label %bb1

	bb3: ; preds = %bb1, %bb2.preheader			bb3: ; preds = %bb1, %bb2.preheader
	; CHECK: LBB1_[[BB3:.]]: @ %bb3
	; CHECK: bne LBB1_[[PREHDR]]
	; CHECK-NOT: b LBB1_
	%sum.0.lcssa = phi i32 [ %sum.110, %bb2.preheader ], [ %2, %bb1 ] ; <i32> [#uses=2]			%sum.0.lcssa = phi i32 [ %sum.110, %bb2.preheader ], [ %2, %bb1 ] ; <i32> [#uses=2]
	%3 = add i32 %pass.011, 1 ; <i32> [#uses=2]			%3 = add i32 %pass.011, 1 ; <i32> [#uses=2]
	%exitcond18 = icmp eq i32 %3, %passes ; <i1> [#uses=1]			%exitcond18 = icmp eq i32 %3, %passes ; <i1> [#uses=1]
	br i1 %exitcond18, label %bb5, label %bb2.preheader			br i1 %exitcond18, label %bb5, label %bb2.preheader

	bb.nph15: ; preds = %entry			bb.nph15: ; preds = %entry
	%i.07 = add i32 %size, -1 ; <i32> [#uses=2]			%i.07 = add i32 %size, -1 ; <i32> [#uses=2]
	%4 = icmp sgt i32 %i.07, -1 ; <i1> [#uses=1]			%4 = icmp sgt i32 %i.07, -1 ; <i1> [#uses=1]
	Show All 13 Lines

test/CodeGen/ARM/swifterror.ll

	Show First 20 Lines • Show All 170 Lines • ▼ Show 20 Lines
	; "foo_loop" is a function that takes a swifterror parameter, it sets swifterror			; "foo_loop" is a function that takes a swifterror parameter, it sets swifterror
	; under a certain condition inside a loop.			; under a certain condition inside a loop.
	define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float %cc2) {			define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float %cc2) {
	; CHECK-APPLE-LABEL: foo_loop:			; CHECK-APPLE-LABEL: foo_loop:
	; CHECK-APPLE: mov [[CODE:r[0-9]+]], r0			; CHECK-APPLE: mov [[CODE:r[0-9]+]], r0
	; swifterror is kept in a register			; swifterror is kept in a register
	; CHECK-APPLE: mov [[ID:r[0-9]+]], r6			; CHECK-APPLE: mov [[ID:r[0-9]+]], r6
	; CHECK-APPLE: cmp [[CODE]], #0			; CHECK-APPLE: cmp [[CODE]], #0
	; CHECK-APPLE: beq			; CHECK-APPLE: beq [[BB_CONT:.]]
	; CHECK-APPLE: mov r0, #16			; CHECK-APPLE: mov r0, #16
	; CHECK-APPLE: malloc			; CHECK-APPLE: malloc
	; CHECK-APPLE: strb r{{.*}}, [{{.*}}[[ID]], #8]			; CHECK-APPLE: strb r{{.*}}, [{{.*}}[[ID]], #8]
	; CHECK-APPLE: ble			; CHECK-APPLE: b [[BB_CONT]]
	; CHECK-APPLE: mov r6, [[ID]]			; CHECK-APPLE: mov r6, [[ID]]

	; CHECK-O0-LABEL: foo_loop:			; CHECK-O0-LABEL: foo_loop:
	; CHECK-O0: mov r{{.*}}, r6			; CHECK-O0: mov r{{.*}}, r6
	; CHECK-O0: cmp r{{.*}}, #0			; CHECK-O0: cmp r{{.*}}, #0
	; CHECK-O0: beq			; CHECK-O0: beq
	; CHECK-O0-DAG: movw r{{.*}}, #1			; CHECK-O0-DAG: movw r{{.*}}, #1
	; CHECK-O0-DAG: mov r{{.*}}, #16			; CHECK-O0-DAG: mov r{{.*}}, #16
	▲ Show 20 Lines • Show All 190 Lines • Show Last 20 Lines

test/CodeGen/SystemZ/loop-01.ll

Show All 18 Lines	loop:
%next = add i64 %index, 1		%next = add i64 %index, 1
%cmp = icmp ne i64 %next, 100		%cmp = icmp ne i64 %next, 100
br i1 %cmp, label %loop, label %exit		br i1 %cmp, label %loop, label %exit

exit:		exit:
ret void		ret void
}		}

; Test a loop that should be converted into dbr form and then use BRCT.		; Test a loop
define void @f2(i32 *%src, i32 *%dest) {		define void @f2(i32 *%src, i32 *%dest) {
; CHECK-LABEL: f2:		; CHECK-LABEL: f2:
; CHECK: lhi [[REG:%r[0-5]]], 100		; CHECK: lhi [[REG:%r[0-5]]], 100
; CHECK: [[LABEL:\.[^:]*]]:{{.*}} %loop		; CHECK: j [[LABEL:.*]]
; CHECK: brct [[REG]], [[LABEL]]		; CHECK: [[LATCH:.*]]: # %loop.next
		; CHECK: [[LABEL]]: # %loop
		; CHECK: j [[LATCH]]
; CHECK: br %r14		; CHECK: br %r14
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%count = phi i32 [ 0, %entry ], [ %next, %loop.next ]		%count = phi i32 [ 0, %entry ], [ %next, %loop.next ]
%next = add i32 %count, 1		%next = add i32 %count, 1
%val = load volatile i32 , i32 *%src		%val = load volatile i32 , i32 *%src
%cmp = icmp eq i32 %val, 0		%cmp = icmp eq i32 %val, 0
br i1 %cmp, label %loop.next, label %loop.store		br i1 %cmp, label %loop.next, label %loop.store

loop.store:		loop.store:
%add = add i32 %val, 1		%add = add i32 %val, 1
store volatile i32 %add, i32 *%dest		store volatile i32 %add, i32 *%dest
br label %loop.next		br label %loop.next

loop.next:		loop.next:
%cont = icmp ne i32 %next, 100		%cont = icmp ne i32 %next, 100
br i1 %cont, label %loop, label %exit		br i1 %cont, label %loop, label %exit

exit:		exit:
ret void		ret void
}		}

; Like f2, but for BRCTG.		; Like f2.
define void @f3(i64 *%src, i64 *%dest) {		define void @f3(i64 *%src, i64 *%dest) {
; CHECK-LABEL: f3:		; CHECK-LABEL: f3:
; CHECK: lghi [[REG:%r[0-5]]], 100		; CHECK: lghi [[REG:%r[0-5]]], 100
; CHECK: [[LABEL:\.[^:]*]]:{{.*}} %loop		; CHECK: j [[LABEL:.*]]
; CHECK: brctg [[REG]], [[LABEL]]		; CHECK: [[LATCH:\.[^:]*]]: # %loop.next
		; CHECK: [[LABEL]]: # %loop
		; CHECK: j [[LATCH]]
; CHECK: br %r14		; CHECK: br %r14

entry:		entry:
br label %loop		br label %loop

loop:		loop:
%count = phi i64 [ 0, %entry ], [ %next, %loop.next ]		%count = phi i64 [ 0, %entry ], [ %next, %loop.next ]
%next = add i64 %count, 1		%next = add i64 %count, 1
%val = load volatile i64 , i64 *%src		%val = load volatile i64 , i64 *%src
%cmp = icmp eq i64 %val, 0		%cmp = icmp eq i64 %val, 0
Show All 15 Lines
; Test a loop with a 64-bit decremented counter in which the 32-bit		; Test a loop with a 64-bit decremented counter in which the 32-bit
; low part of the counter is used after the decrement. This is an example		; low part of the counter is used after the decrement. This is an example
; of a subregister use being the only thing that blocks a conversion to BRCTG.		; of a subregister use being the only thing that blocks a conversion to BRCTG.
define void @f4(i32 *%src, i32 *%dest, i64 *%dest2, i64 %count) {		define void @f4(i32 *%src, i32 *%dest, i64 *%dest2, i64 %count) {
; CHECK-LABEL: f4:		; CHECK-LABEL: f4:
; CHECK: aghi [[REG:%r[0-5]]], -1		; CHECK: aghi [[REG:%r[0-5]]], -1
; CHECK: lr [[REG2:%r[0-5]]], [[REG]]		; CHECK: lr [[REG2:%r[0-5]]], [[REG]]
; CHECK: stg [[REG2]],		; CHECK: stg [[REG2]],
; CHECK: jne {{\..*}}		; CHECK: je [[EXIT:.*]]
		; CHECK: [[EXIT]]:
; CHECK: br %r14		; CHECK: br %r14
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%left = phi i64 [ %count, %entry ], [ %next, %loop.next ]		%left = phi i64 [ %count, %entry ], [ %next, %loop.next ]
store volatile i64 %left, i64 *%dest2		store volatile i64 %left, i64 *%dest2
%val = load volatile i32 , i32 *%src		%val = load volatile i32 , i32 *%src
Show All 21 Lines

test/CodeGen/SystemZ/swifterror.ll

Show First 20 Lines • Show All 147 Lines • ▼ Show 20 Lines	normal:
ret float 0.0		ret float 0.0
}		}

; "foo_loop" is a function that takes a swifterror parameter, it sets swifterror		; "foo_loop" is a function that takes a swifterror parameter, it sets swifterror
; under a certain condition inside a loop.		; under a certain condition inside a loop.
define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float %cc2) {		define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float %cc2) {
; CHECK-LABEL: foo_loop:		; CHECK-LABEL: foo_loop:
; CHECK: lr %r[[REG1:[0-9]+]], %r2		; CHECK: lr %r[[REG1:[0-9]+]], %r2
		; CHECK: ceb %f8,
		; CHECK: jh
; CHECK: cije %r[[REG1]], 0		; CHECK: cije %r[[REG1]], 0
; CHECK: lghi %r2, 16		; CHECK: lghi %r2, 16
; CHECK: brasl %r14, malloc		; CHECK: brasl %r14, malloc
; CHECK: mvi 8(%r2), 1		; CHECK: mvi 8(%r2), 1
; CHECK: ceb %f8,
; CHECK: jnh
; CHECK: lgr %r9, %r2		; CHECK: lgr %r9, %r2
; CHECK: br %r14		; CHECK: br %r14
; CHECK-O0-LABEL: foo_loop:		; CHECK-O0-LABEL: foo_loop:
; spill to stack		; spill to stack
; CHECK-O0: stg %r9, [[OFFS:[0-9]+]](%r15)		; CHECK-O0: stg %r9, [[OFFS:[0-9]+]](%r15)
; CHECK-O0: chi %r{{.*}}, 0		; CHECK-O0: chi %r{{.*}}, 0
; CHECK-O0: je		; CHECK-O0: je
; CHECK-O0: lghi %r2, 16		; CHECK-O0: lghi %r2, 16
▲ Show 20 Lines • Show All 189 Lines • Show Last 20 Lines

test/CodeGen/Thumb2/2010-02-11-phi-cycle.ll

Show All 25 Lines	bb: ; preds = %bb.nph, %bb
br i1 %exitcond, label %return, label %bb		br i1 %exitcond, label %return, label %bb

return: ; preds = %bb, %entry		return: ; preds = %bb, %entry
ret i32 undef		ret i32 undef
}		}

define i32 @test_dead_cycle(i32 %n) nounwind {		define i32 @test_dead_cycle(i32 %n) nounwind {
; CHECK-LABEL: test_dead_cycle:		; CHECK-LABEL: test_dead_cycle:
; CHECK: blx
; CHECK-NOT: mov
; CHECK: blx
entry:		entry:
%0 = icmp eq i32 %n, 1 ; <i1> [#uses=1]		%0 = icmp eq i32 %n, 1 ; <i1> [#uses=1]
br i1 %0, label %return, label %bb.nph		br i1 %0, label %return, label %bb.nph

bb.nph: ; preds = %entry		bb.nph: ; preds = %entry
%tmp = add i32 %n, -1 ; <i32> [#uses=2]		%tmp = add i32 %n, -1 ; <i32> [#uses=2]
br label %bb		br label %bb

Show All 11 Lines	bb1: ; preds = %bb
%ins = or i64 %tmp6, %mask ; <i64> [#uses=1]		%ins = or i64 %tmp6, %mask ; <i64> [#uses=1]
tail call void @g(i64 %ins) nounwind		tail call void @g(i64 %ins) nounwind
br label %bb2		br label %bb2

bb2: ; preds = %bb1, %bb		bb2: ; preds = %bb1, %bb
; also check for duplicate induction variables (radar 7645034)		; also check for duplicate induction variables (radar 7645034)
; CHECK: subs r{{.*}}, #1		; CHECK: subs r{{.*}}, #1
; CHECK-NOT: subs r{{.*}}, #1		; CHECK-NOT: subs r{{.*}}, #1
; CHECK: pop		; CHECK: %bb
%u.0 = phi i64 [ %ins, %bb1 ], [ %u.17, %bb ] ; <i64> [#uses=2]		%u.0 = phi i64 [ %ins, %bb1 ], [ %u.17, %bb ] ; <i64> [#uses=2]
%indvar.next = add i32 %indvar, 1 ; <i32> [#uses=2]		%indvar.next = add i32 %indvar, 1 ; <i32> [#uses=2]
%exitcond = icmp eq i32 %indvar.next, %tmp ; <i1> [#uses=1]		%exitcond = icmp eq i32 %indvar.next, %tmp ; <i1> [#uses=1]
br i1 %exitcond, label %return, label %bb		br i1 %exitcond, label %return, label %bb

		; CHECK: blx
		; CHECK-NOT: mov
		; CHECK: blx
return: ; preds = %bb2, %entry		return: ; preds = %bb2, %entry
ret i32 undef		ret i32 undef
}		}

declare i32 @f()		declare i32 @f()

declare void @g(i64)		declare void @g(i64)

test/CodeGen/X86/block-placement.ll

	Show First 20 Lines • Show All 75 Lines • ▼ Show 20 Lines
	exit:			exit:
	ret i32 %b			ret i32 %b
	}			}

	define i32 @test_loop_cold_blocks(i32 %i, i32* %a) {			define i32 @test_loop_cold_blocks(i32 %i, i32* %a) {
	; Check that we sink cold loop blocks after the hot loop body.			; Check that we sink cold loop blocks after the hot loop body.
	; CHECK-LABEL: test_loop_cold_blocks:			; CHECK-LABEL: test_loop_cold_blocks:
	; CHECK: %entry			; CHECK: %entry
	; CHECK-NOT: .p2align
	; CHECK: %unlikely1
	; CHECK-NOT: .p2align
	; CHECK: %unlikely2
	; CHECK: .p2align			; CHECK: .p2align
				; CHECK: %body3
	; CHECK: %body1			; CHECK: %body1
	; CHECK: %body2			; CHECK: %body2
	; CHECK: %body3			; CHECK-NOT: .p2align
				; CHECK: %unlikely2
				; CHECK-NOT: .p2align
				; CHECK: %unlikely1
	; CHECK: %exit			; CHECK: %exit

	entry:			entry:
	br label %body1			br label %body1

	body1:			body1:
	%iv = phi i32 [ 0, %entry ], [ %next, %body3 ]			%iv = phi i32 [ 0, %entry ], [ %next, %body3 ]
	%base = phi i32 [ 0, %entry ], [ %sum, %body3 ]			%base = phi i32 [ 0, %entry ], [ %sum, %body3 ]
	▲ Show 20 Lines • Show All 850 Lines • ▼ Show 20 Lines
	; CHECK: %if.else			; CHECK: %if.else
	; CHECK: %if.end10			; CHECK: %if.end10
	; Second rotated loop top			; Second rotated loop top
	; CHECK: .p2align			; CHECK: .p2align
	; CHECK: %if.then24			; CHECK: %if.then24
	; CHECK: %while.cond.outer			; CHECK: %while.cond.outer
	; Third rotated loop top			; Third rotated loop top
	; CHECK: .p2align			; CHECK: .p2align
				; CHECK: %if.end20
	; CHECK: %while.cond			; CHECK: %while.cond
	; CHECK: %while.body			; CHECK: %while.body
	; CHECK: %land.lhs.true			; CHECK: %land.lhs.true
	; CHECK: %if.then19			; CHECK: %if.then19
	; CHECK: %if.end20
	; CHECK: %if.then8			; CHECK: %if.then8
	; CHECK: ret			; CHECK: ret

	entry:			entry:
	%shr = ashr i32 %n, 1			%shr = ashr i32 %n, 1
	%add = add nsw i32 %shr, 1			%add = add nsw i32 %shr, 1
	%arrayidx3 = getelementptr inbounds double, double* %ra, i64 1			%arrayidx3 = getelementptr inbounds double, double* %ra, i64 1
	br label %for.cond			br label %for.cond
	▲ Show 20 Lines • Show All 208 Lines • Show Last 20 Lines

test/CodeGen/X86/code_placement_cold_loop_blocks.ll

	; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s			; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s

	define void @foo() !prof !1 {			define void @foo() !prof !1 {
	; Test if a cold block in a loop will be placed at the end of the function			; Test if a cold block in a loop will be placed at the end of the function
	; chain.			; chain.
	;			;
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
				; CHECK: callq e
	; CHECK: callq b			; CHECK: callq b
	; CHECK: callq c			; CHECK: callq c
	; CHECK: callq e
	; CHECK: callq f			; CHECK: callq f
	; CHECK: callq d			; CHECK: callq d

	entry:			entry:
	br label %header			br label %header

	header:			header:
	call void @b()			call void @b()
	Show All 20 Lines

	define void @nested_loop_0() !prof !1 {			define void @nested_loop_0() !prof !1 {
	; Test if a block that is cold in the inner loop but not cold in the outer loop			; Test if a block that is cold in the inner loop but not cold in the outer loop
	; will merged to the outer loop chain.			; will merged to the outer loop chain.
	;			;
	; CHECK-LABEL: nested_loop_0:			; CHECK-LABEL: nested_loop_0:
	; CHECK: callq c			; CHECK: callq c
	; CHECK: callq d			; CHECK: callq d
	; CHECK: callq e
	; CHECK: callq b			; CHECK: callq b
				; CHECK: callq e
	; CHECK: callq f			; CHECK: callq f

	entry:			entry:
	br label %header			br label %header

	header:			header:
	call void @b()			call void @b()
	%call4 = call zeroext i1 @a()			%call4 = call zeroext i1 @a()
	▲ Show 20 Lines • Show All 66 Lines • Show Last 20 Lines

test/CodeGen/X86/code_placement_ignore_succ_in_inner_loop.ll

	; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s			; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s

	define void @foo() {			define void @foo() {
	; Test that when determining the edge probability from a node in an inner loop			; Test that when determining the edge probability from a node in an inner loop
	; to a node in an outer loop, the weights on edges in the inner loop should be			; to a node in an outer loop, the weights on edges in the inner loop should be
	; ignored if we are building the chain for the outer loop.			; ignored if we are building the chain for the outer loop.
	;			;
	; CHECK-LABEL: foo:
	; CHECK: callq c			; CHECK: callq c
	; CHECK: callq b			; CHECK: callq b

	entry:			entry:
	%call = call zeroext i1 @a()			%call = call zeroext i1 @a()
	br i1 %call, label %if.then, label %if.else, !prof !1			br i1 %call, label %if.then, label %if.else, !prof !1

	if.then:			if.then:
	%call1 = call zeroext i1 @a()			%call1 = call zeroext i1 @a()
	br i1 %call1, label %while.body, label %if.end.1, !prof !1			br i1 %call1, label %while.body, label %if.end.1, !prof !1

	while.body:			while.body:
	%call2 = call zeroext i1 @a()			%call2 = call zeroext i1 @a()
	br i1 %call2, label %if.then.1, label %while.cond			br i1 %call2, label %if.then.1, label %while.cond

	if.then.1:			if.then.1:
	call void @d()			call void @d()
	br label %while.cond			br label %while.cond

	while.cond:			while.cond:
	%call3 = call zeroext i1 @a()			%call3 = call zeroext i1 @a()
	br i1 %call3, label %while.body, label %if.end			br i1 %call3, label %while.body, label %if.end, !prof !5

	if.end.1:			if.end.1:
	call void @d()			call void @d()
	br label %if.end			br label %if.end

	if.else:			if.else:
	call void @b()			call void @b()
	br label %if.end			br label %if.end
	▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines

	declare zeroext i1 @a()			declare zeroext i1 @a()
	declare void @b()			declare void @b()
	declare void @c()			declare void @c()
	declare void @d()			declare void @d()
	declare void @e()			declare void @e()

	!1 = !{!"branch_weights", i32 10, i32 1}			!1 = !{!"branch_weights", i32 10, i32 1}
	!2 = !{!"branch_weights", i32 100, i32 1}			!2 = !{!"branch_weights", i32 80, i32 20}
	!3 = !{!"branch_weights", i32 1, i32 100}			!3 = !{!"branch_weights", i32 1, i32 100}
	!4 = !{!"branch_weights", i32 1, i32 1}			!4 = !{!"branch_weights", i32 1, i32 1}
				!5 = !{!"branch_weights", i32 80, i32 20}

test/CodeGen/X86/code_placement_loop_rotation2.ll

	; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s			; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux < %s | FileCheck %s
	; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -precise-rotation-cost < %s | FileCheck %s -check-prefix=CHECK-PROFILE			; RUN: llc -mcpu=corei7 -mtriple=x86_64-linux -precise-rotation-cost < %s | FileCheck %s -check-prefix=CHECK-PROFILE

	define void @foo() {			define void @foo() {
	; Test a nested loop case when profile data is not available.			; Test a nested loop case when profile data is not available.
	;			;
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
				; CHECK: jmp
				davidxl: By the way, this example shows that the new expected layout is not as optimal as the one when -force-precise-rotation-cost is on (see CHECK-PROFILE below). The total branch cost of the optimal layout [e, f, c, d, h, b, g] is C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c) + C(g->h), while the total cost of the aggressive loop top layout [h, b, g, f, c, d, e] is C(c->e) + C(f->h) + C(d->f) + C(h->exit) + C(b->c) + C(g->h) + C(e->f). Edge e->f is in the inner loop and is a hot edge, so the additional cost of C(e->f) can be high.
				cycheng (Author): Thanks for pointing this out :D
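The comparison in davidxl's comment can be reproduced with a small standalone sketch that lists, for a given block order, every CFG edge that is not a fall-through. The edge set below is an assumption: the full test body is collapsed in this view, so the edges were chosen to agree with the two edge lists in the comment, and blocks are named after the function each one calls. The sketch counts edges only; it does not model the frequency-weighted cost C(...) that the comment refers to, and `takenBranches` is a hypothetical helper, not part of MachineBlockPlacement.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A CFG edge from one block to another, identified by block name.
using Edge = std::pair<std::string, std::string>;

// Assumed CFG for the nested-loop test (hypothetical; picked so that it is
// consistent with the edge lists quoted in the review comment).
static const std::vector<Edge> Edges = {
    {"b", "c"}, {"b", "g"}, {"c", "d"}, {"c", "e"}, {"d", "f"}, {"e", "f"},
    {"f", "c"}, {"f", "h"}, {"g", "h"}, {"h", "b"}, {"h", "exit"}};

// Returns the edges that need a taken branch in the given layout, i.e. the
// edges whose destination is not placed immediately after their source.
std::vector<Edge> takenBranches(const std::vector<std::string> &Layout) {
  std::vector<Edge> Taken;
  for (const Edge &E : Edges) {
    bool FallThrough = false;
    for (std::size_t I = 0; I + 1 < Layout.size(); ++I)
      if (Layout[I] == E.first && Layout[I + 1] == E.second)
        FallThrough = true;
    if (!FallThrough)
      Taken.push_back(E);
  }
  return Taken;
}
```

Under the assumed edge set, `takenBranches({"e", "f", "c", "d", "h", "b", "g"})` yields six edges and `takenBranches({"h", "b", "g", "f", "c", "d", "e"})` yields seven, the extra one being e->f, which matches the two sums in the comment.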
				; CHECK: callq h
	; CHECK: callq b			; CHECK: callq b
				; CHECK: callq g
				; CHECK: callq f
	; CHECK: callq c			; CHECK: callq c
	; CHECK: callq d			; CHECK: callq d
	; CHECK: callq e			; CHECK: callq e
	; CHECK: callq f
	; CHECK: callq g
	; CHECK: callq h

	entry:			entry:
	br label %header			br label %header

	header:			header:
	call void @b()			call void @b()
	%call = call zeroext i1 @a()			%call = call zeroext i1 @a()
	br i1 %call, label %if.then, label %if.else, !prof !2			br i1 %call, label %if.then, label %if.else, !prof !2
	▲ Show 20 Lines • Show All 100 Lines • Show Last 20 Lines

test/CodeGen/X86/compact-unwind.ll

	Show First 20 Lines • Show All 60 Lines • ▼ Show 20 Lines

	declare void @OSMemoryBarrier() optsize			declare void @OSMemoryBarrier() optsize

	; Test the code below uses UNWIND_X86_64_MODE_STACK_IMMD compact unwind			; Test the code below uses UNWIND_X86_64_MODE_STACK_IMMD compact unwind
	; encoding.			; encoding.

	; NOFP-CU: Entry at offset 0x20:			; NOFP-CU: Entry at offset 0x20:
	; NOFP-CU-NEXT: start: 0x1d _test1			; NOFP-CU-NEXT: start: 0x1d _test1
	; NOFP-CU-NEXT: length: 0x42			; NOFP-CU-NEXT: length: 0x46
	; NOFP-CU-NEXT: compact encoding: 0x02040c0a			; NOFP-CU-NEXT: compact encoding: 0x02040c0a

	; NOFP-FROM-ASM: Entry at offset 0x20:			; NOFP-FROM-ASM: Entry at offset 0x20:
	; NOFP-FROM-ASM-NEXT: start: 0x1d _test1			; NOFP-FROM-ASM-NEXT: start: 0x1d _test1
	; NOFP-FROM-ASM-NEXT: length: 0x42			; NOFP-FROM-ASM-NEXT: length: 0x46
	; NOFP-FROM-ASM-NEXT: compact encoding: 0x02040c0a			; NOFP-FROM-ASM-NEXT: compact encoding: 0x02040c0a

	define void @test1(%class.ImageLoader* %image) optsize ssp uwtable {			define void @test1(%class.ImageLoader* %image) optsize ssp uwtable {
	entry:			entry:
	br label %for.cond1.preheader			br label %for.cond1.preheader

	for.cond1.preheader: ; preds = %for.inc10, %entry			for.cond1.preheader: ; preds = %for.inc10, %entry
	%p.019 = phi %"struct.dyld::MappedRanges"* [ @G1, %entry ], [ %1, %for.inc10 ]			%p.019 = phi %"struct.dyld::MappedRanges"* [ @G1, %entry ], [ %1, %for.inc10 ]
	Show All 29 Lines

test/CodeGen/X86/licm-dominance.ll

	; RUN: llc -asm-verbose=true < %s | FileCheck %s			; RUN: llc -asm-verbose=true < %s | FileCheck %s

	; MachineLICM should check dominance before hoisting instructions.			; MachineLICM should check dominance before hoisting instructions.
	; CHECK: ## in Loop:			; CHECK: ## %if.then26.i
				; CHECK-NEXT: ## in Loop:
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
	target triple = "x86_64-apple-macosx10.7.2"			target triple = "x86_64-apple-macosx10.7.2"

	define void @CMSColorWorldCreateParametricData() nounwind uwtable optsize ssp {			define void @CMSColorWorldCreateParametricData() nounwind uwtable optsize ssp {
	entry:			entry:
	Show All 24 Lines

test/CodeGen/X86/mbp-false-cfg-break.ll

	; RUN: llc < %s -march=x86-64 | FileCheck %s			; RUN: llc < %s -march=x86-64 -precise-rotation-cost=true | FileCheck %s

	define void @test(i1 %cnd) !prof !{!"function_entry_count", i64 1024} {			define void @test(i1 %cnd) !prof !{!"function_entry_count", i64 1024} {
	; CHECK-LABEL: @test			; CHECK-LABEL: @test
	; Using the assembly comments to indicate block order..			; Using the assembly comments to indicate block order..
	; CHECK: # %loop			; CHECK: # %loop
	; CHECK: # %backedge			; CHECK: # %backedge
	; CHECK: # %exit			; CHECK: # %exit
	; CHECK: # %rare			; CHECK: # %rare
	Show All 30 Lines

test/CodeGen/X86/swifterror.ll

Show First 20 Lines • Show All 149 Lines • ▼ Show 20 Lines	normal:
ret float 0.0		ret float 0.0
}		}

; "foo_loop" is a function that takes a swifterror parameter, it sets swifterror		; "foo_loop" is a function that takes a swifterror parameter, it sets swifterror
; under a certain condition inside a loop.		; under a certain condition inside a loop.
define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float %cc2) {		define float @foo_loop(%swift_error** swifterror %error_ptr_ref, i32 %cc, float %cc2) {
; CHECK-APPLE-LABEL: foo_loop:		; CHECK-APPLE-LABEL: foo_loop:
; CHECK-APPLE: movq %r12, %rax		; CHECK-APPLE: movq %r12, %rax
		; CHECK-APPLE: ucomiss
; CHECK-APPLE: testl		; CHECK-APPLE: testl
; CHECK-APPLE: je		; CHECK-APPLE: je
; CHECK-APPLE: movl $16, %edi		; CHECK-APPLE: movl $16, %edi
; CHECK-APPLE: malloc		; CHECK-APPLE: malloc
; CHECK-APPLE: movb $1, 8(%rax)		; CHECK-APPLE: movb $1, 8(%rax)
; CHECK-APPLE: ucomiss		; CHECK-APPLE: jmp
; CHECK-APPLE: jbe
; CHECK-APPLE: movq %rax, %r12		; CHECK-APPLE: movq %rax, %r12
; CHECK-APPLE: ret		; CHECK-APPLE: ret

; CHECK-O0-LABEL: foo_loop:		; CHECK-O0-LABEL: foo_loop:
; spill to stack		; spill to stack
; CHECK-O0: movq %r12, {{.*}}(%rsp)		; CHECK-O0: movq %r12, {{.*}}(%rsp)
; CHECK-O0: cmpl $0		; CHECK-O0: cmpl $0
; CHECK-O0: je		; CHECK-O0: je
▲ Show 20 Lines • Show All 187 Lines • Show Last 20 Lines

test/Transforms/AtomicExpand/ARM/atomic-expansion-v7.ll

	; RUN: opt -S -o - -mtriple=armv7-apple-ios7.0 -atomic-expand -codegen-opt-level=1 %s | FileCheck %s			; RUN: opt -S -o - -mtriple=armv7-apple-ios7.0 -atomic-expand -codegen-opt-level=1 %s | FileCheck %s

	define i8 @test_atomic_xchg_i8(i8* %ptr, i8 %xchgend) {			define i8 @test_atomic_xchg_i8(i8* %ptr, i8 %xchgend) {
	; CHECK-LABEL: @test_atomic_xchg_i8			; CHECK-LABEL: @test_atomic_xchg_i8
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[NEWVAL32:%.*]] = zext i8 %xchgend to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 %xchgend to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw xchg i8* %ptr, i8 %xchgend monotonic			%res = atomicrmw xchg i8* %ptr, i8 %xchgend monotonic
	ret i8 %res			ret i8 %res
	}			}

	define i16 @test_atomic_add_i16(i16* %ptr, i16 %addend) {			define i16 @test_atomic_add_i16(i16* %ptr, i16 %addend) {
	; CHECK-LABEL: @test_atomic_add_i16			; CHECK-LABEL: @test_atomic_add_i16
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16
	; CHECK: [[NEWVAL:%.*]] = add i16 [[OLDVAL]], %addend			; CHECK: [[NEWVAL:%.*]] = add i16 [[OLDVAL]], %addend
	; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16 %ptr)			; CHECK: [[TRYAGAIN:%.]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16 %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i16 [[OLDVAL]]			; CHECK: ret i16 [[OLDVAL]]
	%res = atomicrmw add i16* %ptr, i16 %addend seq_cst			%res = atomicrmw add i16* %ptr, i16 %addend seq_cst
	ret i16 %res			ret i16 %res
	}			}

	define i32 @test_atomic_sub_i32(i32* %ptr, i32 %subend) {			define i32 @test_atomic_sub_i32(i32* %ptr, i32 %subend) {
	; CHECK-LABEL: @test_atomic_sub_i32			; CHECK-LABEL: @test_atomic_sub_i32
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %ptr)			; CHECK: [[OLDVAL:%.*]] = call i32 @llvm.arm.ldrex.p0i32(i32* %ptr)
	; CHECK: [[NEWVAL:%.*]] = sub i32 [[OLDVAL]], %subend			; CHECK: [[NEWVAL:%.*]] = sub i32 [[OLDVAL]], %subend
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 [[NEWVAL]], i32* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 [[NEWVAL]], i32* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i32 [[OLDVAL]]			; CHECK: ret i32 [[OLDVAL]]
	%res = atomicrmw sub i32* %ptr, i32 %subend acquire			%res = atomicrmw sub i32* %ptr, i32 %subend acquire
	ret i32 %res			ret i32 %res
	}			}

	define i8 @test_atomic_and_i8(i8* %ptr, i8 %andend) {			define i8 @test_atomic_and_i8(i8* %ptr, i8 %andend) {
	; CHECK-LABEL: @test_atomic_and_i8			; CHECK-LABEL: @test_atomic_and_i8
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[NEWVAL:%.*]] = and i8 [[OLDVAL]], %andend			; CHECK: [[NEWVAL:%.*]] = and i8 [[OLDVAL]], %andend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw and i8* %ptr, i8 %andend release			%res = atomicrmw and i8* %ptr, i8 %andend release
	ret i8 %res			ret i8 %res
	}			}

	define i16 @test_atomic_nand_i16(i16* %ptr, i16 %nandend) {			define i16 @test_atomic_nand_i16(i16* %ptr, i16 %nandend) {
	; CHECK-LABEL: @test_atomic_nand_i16			; CHECK-LABEL: @test_atomic_nand_i16
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i16(i16* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16
	; CHECK: [[NEWVAL_TMP:%.*]] = and i16 [[OLDVAL]], %nandend			; CHECK: [[NEWVAL_TMP:%.*]] = and i16 [[OLDVAL]], %nandend
	; CHECK: [[NEWVAL:%.*]] = xor i16 [[NEWVAL_TMP]], -1			; CHECK: [[NEWVAL:%.*]] = xor i16 [[NEWVAL_TMP]], -1
	; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i16(i32 [[NEWVAL32]], i16* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i16 [[OLDVAL]]			; CHECK: ret i16 [[OLDVAL]]
	%res = atomicrmw nand i16* %ptr, i16 %nandend seq_cst			%res = atomicrmw nand i16* %ptr, i16 %nandend seq_cst
	ret i16 %res			ret i16 %res
	}			}

	define i64 @test_atomic_or_i64(i64* %ptr, i64 %orend) {			define i64 @test_atomic_or_i64(i64* %ptr, i64 %orend) {
	(11 lines not shown)
	; CHECK: [[OLDVAL:%.*]] = or i64 [[LO64]], [[HI64]]			; CHECK: [[OLDVAL:%.*]] = or i64 [[LO64]], [[HI64]]
	; CHECK: [[NEWVAL:%.*]] = or i64 [[OLDVAL]], %orend			; CHECK: [[NEWVAL:%.*]] = or i64 [[OLDVAL]], %orend
	; CHECK: [[NEWLO:%.*]] = trunc i64 [[NEWVAL]] to i32			; CHECK: [[NEWLO:%.*]] = trunc i64 [[NEWVAL]] to i32
	; CHECK: [[NEWHI_TMP:%.*]] = lshr i64 [[NEWVAL]], 32			; CHECK: [[NEWHI_TMP:%.*]] = lshr i64 [[NEWVAL]], 32
	; CHECK: [[NEWHI:%.*]] = trunc i64 [[NEWHI_TMP]] to i32			; CHECK: [[NEWHI:%.*]] = trunc i64 [[NEWHI_TMP]] to i32
	; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*			; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strexd(i32 [[NEWLO]], i32 [[NEWHI]], i8* [[PTR8]])			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strexd(i32 [[NEWLO]], i32 [[NEWHI]], i8* [[PTR8]])
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i64 [[OLDVAL]]			; CHECK: ret i64 [[OLDVAL]]
	%res = atomicrmw or i64* %ptr, i64 %orend seq_cst			%res = atomicrmw or i64* %ptr, i64 %orend seq_cst
	ret i64 %res			ret i64 %res
	}			}

	define i8 @test_atomic_xor_i8(i8* %ptr, i8 %xorend) {			define i8 @test_atomic_xor_i8(i8* %ptr, i8 %xorend) {
	; CHECK-LABEL: @test_atomic_xor_i8			; CHECK-LABEL: @test_atomic_xor_i8
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[NEWVAL:%.*]] = xor i8 [[OLDVAL]], %xorend			; CHECK: [[NEWVAL:%.*]] = xor i8 [[OLDVAL]], %xorend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw xor i8* %ptr, i8 %xorend seq_cst			%res = atomicrmw xor i8* %ptr, i8 %xorend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_atomic_max_i8(i8* %ptr, i8 %maxend) {			define i8 @test_atomic_max_i8(i8* %ptr, i8 %maxend) {
	; CHECK-LABEL: @test_atomic_max_i8			; CHECK-LABEL: @test_atomic_max_i8
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[WANT_OLD:%.*]] = icmp sgt i8 [[OLDVAL]], %maxend			; CHECK: [[WANT_OLD:%.*]] = icmp sgt i8 [[OLDVAL]], %maxend
	; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %maxend			; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %maxend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw max i8* %ptr, i8 %maxend seq_cst			%res = atomicrmw max i8* %ptr, i8 %maxend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_atomic_min_i8(i8* %ptr, i8 %minend) {			define i8 @test_atomic_min_i8(i8* %ptr, i8 %minend) {
	; CHECK-LABEL: @test_atomic_min_i8			; CHECK-LABEL: @test_atomic_min_i8
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[WANT_OLD:%.*]] = icmp sle i8 [[OLDVAL]], %minend			; CHECK: [[WANT_OLD:%.*]] = icmp sle i8 [[OLDVAL]], %minend
	; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %minend			; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %minend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw min i8* %ptr, i8 %minend seq_cst			%res = atomicrmw min i8* %ptr, i8 %minend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_atomic_umax_i8(i8* %ptr, i8 %umaxend) {			define i8 @test_atomic_umax_i8(i8* %ptr, i8 %umaxend) {
	; CHECK-LABEL: @test_atomic_umax_i8			; CHECK-LABEL: @test_atomic_umax_i8
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[WANT_OLD:%.*]] = icmp ugt i8 [[OLDVAL]], %umaxend			; CHECK: [[WANT_OLD:%.*]] = icmp ugt i8 [[OLDVAL]], %umaxend
	; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %umaxend			; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %umaxend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw umax i8* %ptr, i8 %umaxend seq_cst			%res = atomicrmw umax i8* %ptr, i8 %umaxend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_atomic_umin_i8(i8* %ptr, i8 %uminend) {			define i8 @test_atomic_umin_i8(i8* %ptr, i8 %uminend) {
	; CHECK-LABEL: @test_atomic_umin_i8			; CHECK-LABEL: @test_atomic_umin_i8
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[WANT_OLD:%.*]] = icmp ule i8 [[OLDVAL]], %uminend			; CHECK: [[WANT_OLD:%.*]] = icmp ule i8 [[OLDVAL]], %uminend
	; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %uminend			; CHECK: [[NEWVAL:%.*]] = select i1 [[WANT_OLD]], i8 [[OLDVAL]], i8 %uminend
	; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK: call void @llvm.arm.dmb(i32 11)			; CHECK: call void @llvm.arm.dmb(i32 11)
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw umin i8* %ptr, i8 %uminend seq_cst			%res = atomicrmw umin i8* %ptr, i8 %uminend seq_cst
	ret i8 %res			ret i8 %res
	}			}

	define i8 @test_cmpxchg_i8_seqcst_seqcst(i8* %ptr, i8 %desired, i8 %newval) {			define i8 @test_cmpxchg_i8_seqcst_seqcst(i8* %ptr, i8 %desired, i8 %newval) {
	(217 lines not shown)

test/Transforms/AtomicExpand/ARM/atomic-expansion-v8.ll

	; RUN: opt -S -o - -mtriple=armv8-linux-gnueabihf -atomic-expand %s -codegen-opt-level=1 | FileCheck %s			; RUN: opt -S -o - -mtriple=armv8-linux-gnueabihf -atomic-expand %s -codegen-opt-level=1 | FileCheck %s

	define i8 @test_atomic_xchg_i8(i8* %ptr, i8 %xchgend) {			define i8 @test_atomic_xchg_i8(i8* %ptr, i8 %xchgend) {
	; CHECK-LABEL: @test_atomic_xchg_i8			; CHECK-LABEL: @test_atomic_xchg_i8
	; CHECK-NOT: fence			; CHECK-NOT: fence
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldrex.p0i8(i8* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i8
	; CHECK: [[NEWVAL32:%.*]] = zext i8 %xchgend to i32			; CHECK: [[NEWVAL32:%.*]] = zext i8 %xchgend to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i8(i32 [[NEWVAL32]], i8* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK-NOT: fence			; CHECK-NOT: fence
	; CHECK: ret i8 [[OLDVAL]]			; CHECK: ret i8 [[OLDVAL]]
	%res = atomicrmw xchg i8* %ptr, i8 %xchgend monotonic			%res = atomicrmw xchg i8* %ptr, i8 %xchgend monotonic
	ret i8 %res			ret i8 %res
	}			}

	define i16 @test_atomic_add_i16(i16* %ptr, i16 %addend) {			define i16 @test_atomic_add_i16(i16* %ptr, i16 %addend) {
	; CHECK-LABEL: @test_atomic_add_i16			; CHECK-LABEL: @test_atomic_add_i16
	; CHECK-NOT: fence			; CHECK-NOT: fence
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldaex.p0i16(i16* %ptr)			; CHECK: [[OLDVAL32:%.*]] = call i32 @llvm.arm.ldaex.p0i16(i16* %ptr)
	; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16			; CHECK: [[OLDVAL:%.*]] = trunc i32 [[OLDVAL32]] to i16
	; CHECK: [[NEWVAL:%.*]] = add i16 [[OLDVAL]], %addend			; CHECK: [[NEWVAL:%.*]] = add i16 [[OLDVAL]], %addend
	; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32			; CHECK: [[NEWVAL32:%.*]] = zext i16 [[NEWVAL]] to i32
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.stlex.p0i16(i32 [[NEWVAL32]], i16* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.stlex.p0i16(i32 [[NEWVAL32]], i16* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK-NOT: fence			; CHECK-NOT: fence
	; CHECK: ret i16 [[OLDVAL]]			; CHECK: ret i16 [[OLDVAL]]
	%res = atomicrmw add i16* %ptr, i16 %addend seq_cst			%res = atomicrmw add i16* %ptr, i16 %addend seq_cst
	ret i16 %res			ret i16 %res
	}			}

	define i32 @test_atomic_sub_i32(i32* %ptr, i32 %subend) {			define i32 @test_atomic_sub_i32(i32* %ptr, i32 %subend) {
	; CHECK-LABEL: @test_atomic_sub_i32			; CHECK-LABEL: @test_atomic_sub_i32
	; CHECK-NOT: fence			; CHECK-NOT: fence
	; CHECK: br label %[[LOOP:.*]]			; CHECK: br label %[[LOOP:.*]]
	; CHECK: [[LOOP]]:			; CHECK: [[LOOP]]:
	; CHECK: [[OLDVAL:%.*]] = call i32 @llvm.arm.ldaex.p0i32(i32* %ptr)			; CHECK: [[OLDVAL:%.*]] = call i32 @llvm.arm.ldaex.p0i32(i32* %ptr)
	; CHECK: [[NEWVAL:%.*]] = sub i32 [[OLDVAL]], %subend			; CHECK: [[NEWVAL:%.*]] = sub i32 [[OLDVAL]], %subend
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 [[NEWVAL]], i32* %ptr)			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.strex.p0i32(i32 [[NEWVAL]], i32* %ptr)
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK-NOT: fence			; CHECK-NOT: fence
	; CHECK: ret i32 [[OLDVAL]]			; CHECK: ret i32 [[OLDVAL]]
	%res = atomicrmw sub i32* %ptr, i32 %subend acquire			%res = atomicrmw sub i32* %ptr, i32 %subend acquire
	ret i32 %res			ret i32 %res
	}			}

	define i64 @test_atomic_or_i64(i64* %ptr, i64 %orend) {			define i64 @test_atomic_or_i64(i64* %ptr, i64 %orend) {
	(11 lines not shown)
	; CHECK: [[OLDVAL:%.*]] = or i64 [[LO64]], [[HI64]]			; CHECK: [[OLDVAL:%.*]] = or i64 [[LO64]], [[HI64]]
	; CHECK: [[NEWVAL:%.*]] = or i64 [[OLDVAL]], %orend			; CHECK: [[NEWVAL:%.*]] = or i64 [[OLDVAL]], %orend
	; CHECK: [[NEWLO:%.*]] = trunc i64 [[NEWVAL]] to i32			; CHECK: [[NEWLO:%.*]] = trunc i64 [[NEWVAL]] to i32
	; CHECK: [[NEWHI_TMP:%.*]] = lshr i64 [[NEWVAL]], 32			; CHECK: [[NEWHI_TMP:%.*]] = lshr i64 [[NEWVAL]], 32
	; CHECK: [[NEWHI:%.*]] = trunc i64 [[NEWHI_TMP]] to i32			; CHECK: [[NEWHI:%.*]] = trunc i64 [[NEWHI_TMP]] to i32
	; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*			; CHECK: [[PTR8:%.*]] = bitcast i64* %ptr to i8*
	; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.stlexd(i32 [[NEWLO]], i32 [[NEWHI]], i8* [[PTR8]])			; CHECK: [[TRYAGAIN:%.*]] = call i32 @llvm.arm.stlexd(i32 [[NEWLO]], i32 [[NEWHI]], i8* [[PTR8]])
	; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0			; CHECK: [[TST:%.*]] = icmp ne i32 [[TRYAGAIN]], 0
	; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:.*]]			; CHECK: br i1 [[TST]], label %[[LOOP]], label %[[END:[^,]*]]
	; CHECK: [[END]]:			; CHECK: [[END]]:
	; CHECK-NOT: fence			; CHECK-NOT: fence
	; CHECK: ret i64 [[OLDVAL]]			; CHECK: ret i64 [[OLDVAL]]
	%res = atomicrmw or i64* %ptr, i64 %orend seq_cst			%res = atomicrmw or i64* %ptr, i64 %orend seq_cst
	ret i64 %res			ret i64 %res
	}			}

	define i8 @test_cmpxchg_i8_seqcst_seqcst(i8* %ptr, i8 %desired, i8 %newval) {			define i8 @test_cmpxchg_i8_seqcst_seqcst(i8* %ptr, i8 %desired, i8 %newval) {
	(157 lines not shown)

Revision Contents

Diff 56713

lib/CodeGen/AtomicExpandPass.cpp

lib/CodeGen/MachineBlockPlacement.cpp

lib/Target/SystemZ/SystemZISelLowering.cpp

test/CodeGen/AArch64/swifterror.ll

test/CodeGen/AMDGPU/valu-i1.ll

test/CodeGen/ARM/code-placement.ll

test/CodeGen/ARM/swifterror.ll

test/CodeGen/SystemZ/loop-01.ll

test/CodeGen/SystemZ/swifterror.ll

test/CodeGen/Thumb2/2010-02-11-phi-cycle.ll

test/CodeGen/X86/block-placement.ll

test/CodeGen/X86/code_placement_cold_loop_blocks.ll

test/CodeGen/X86/code_placement_ignore_succ_in_inner_loop.ll

test/CodeGen/X86/code_placement_loop_rotation2.ll

test/CodeGen/X86/compact-unwind.ll

test/CodeGen/X86/licm-dominance.ll

test/CodeGen/X86/mbp-false-cfg-break.ll

test/CodeGen/X86/swifterror.ll

test/Transforms/AtomicExpand/ARM/atomic-expansion-v7.ll

test/Transforms/AtomicExpand/ARM/atomic-expansion-v8.ll
