This is an archive of the discontinued LLVM Phabricator instance.


Differential D86864

[MachineSinking] sink more profitable loads
ClosedPublic

Authored by shchenz on Aug 31 2020, 2:11 AM.


Details

Reviewers

MatzeB
echristo
efriedma
qcolombet

Group Reviewers

Restricted Project

Commits

rG24a31922ce2a: [MachineSink] sink more profitable loads

Summary

Instead of conservatively setting Store to true, determine Store by analysis in the cases where MI.parent() dominates SuccToSinkTo and SuccToSinkTo post-dominates MI.parent().

This allows more profitable loads to be sunk.
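In short, the change concerns the `Store` flag that `SinkInstruction` passes to `MachineInstr::isSafeToMove` when sinking across a critical edge. The following is a minimal, hypothetical model, not LLVM code: `ToyInstr`, `isSafeToMoveModel`, `canSinkOld`, and `canSinkNew` are made-up names. The model keeps only the load rule of `isSafeToMove` (a non-invariant load may move only if no store was seen) and ignores its other checks; the real API also takes the flag by reference (`bool &SawStore`).

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified stand-in for a MachineInstr.
struct ToyInstr {
  bool MayLoad;
  bool IsInvariantLoad; // e.g. a dereferenceable load from constant memory
};

// Simplified model of the load rule in MachineInstr::isSafeToMove:
// a real (non-invariant) load is only safe to move if no store was seen.
bool isSafeToMoveModel(const ToyInstr &MI, bool SawStore) {
  if (MI.MayLoad && !MI.IsInvariantLoad)
    return !SawStore;
  return true;
}

// Before the patch: across a critical edge, Store was always assumed true.
bool canSinkOld(const ToyInstr &MI) {
  bool Store = true;
  return isSafeToMoveModel(MI, Store);
}

// Hypothetical callback standing in for hasStoreBetween(ParentBlock,
// SuccToSinkTo, MI).
using StoreQuery = std::function<bool(const ToyInstr &)>;

// After the patch: for loads, ask whether a store may intervene.
bool canSinkNew(const ToyInstr &MI, const StoreQuery &HasStoreBetween) {
  bool Store = MI.MayLoad ? HasStoreBetween(MI) : true;
  return isSafeToMoveModel(MI, Store);
}
```

In this model, a load could never be sunk across a critical edge before the change, while afterwards it can be whenever the store query answers false.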

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

shchenz created this revision. Aug 31 2020, 2:11 AM

Herald added a project: Restricted Project. Aug 31 2020, 2:11 AM

Herald added subscribers: llvm-commits, danielkiss, luismarques and 23 others.

shchenz requested review of this revision. Aug 31 2020, 2:11 AM

Herald added a subscriber: MaskRay. Aug 31 2020, 2:11 AM

Harbormaster completed remote builds in B70074: Diff 288903. Aug 31 2020, 2:52 AM

shchenz edited the summary of this revision. Aug 31 2020, 5:06 AM

Handle further cases to sink profitable loads.

shchenz mentioned this in D86925: [MachineSink] add one more profitable pattern for sinking. Sep 1 2020, 3:34 AM

Harbormaster completed remote builds in B70205: Diff 289112. Sep 1 2020, 3:56 AM

Do we need some threshold to limit the analysis? I'm worried this could get very expensive.

llvm/lib/CodeGen/MachineSink.cpp
942	Are the post-dominance checks necessary? I'm not sure what invariants you're trying to enforce.

1: Aliasing cannot be checked for call instructions with mayAlias.
2: Add one more cache to save compile time.

In D86864#2250785, @efriedma wrote:

> Do we need some threshold to limit the analysis? I'm worried this could get very expensive.

I added a new cache to save compile time. Could you be more specific about how to add the threshold? Thanks.

llvm/lib/CodeGen/MachineSink.cpp
942	They are not necessary. I added the post-dominance checks to make it easy to find all blocks on the paths from block From to To. With these checks, it is easy to decide whether a block BB (reachable from From) is on a path from From to To: just check `PDT->dominates(To, BB)`.
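To illustrate the invariant under discussion, here is a small sketch on a toy CFG, assuming a single entry (block 0), a single exit, and at most 64 blocks. `ToyCFG`, `computeDom`, `isStraightLine`, and `blocksBetween` are hypothetical names, and dominators are computed with the textbook iterative set-intersection algorithm rather than with LLVM's `MachineDominatorTree`/`MachinePostDominatorTree`. `blocksBetween` mirrors the walk described above: visit blocks reachable from `From`, skip `From` and `To`, and keep those post-dominated by `To`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy CFG: blocks 0..N-1, Succs[B] lists B's successors.
// Assumes block 0 is the unique entry, Exit the unique exit, N <= 64.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
  int Exit;
};

using Mask = std::uint64_t; // bit B set <=> block B is in the set

static bool contains(Mask M, int B) { return ((M >> B) & 1) != 0; }

// Iterative dataflow: Dom[B] = {B} U intersection of Dom[P] over preds P.
// With Reverse = true the edges are flipped and the walk starts at Exit,
// which yields post-dominator sets instead.
static std::vector<Mask> computeDom(const ToyCFG &G, bool Reverse) {
  const int N = static_cast<int>(G.Succs.size());
  std::vector<std::vector<int>> Preds(N);
  for (int B = 0; B < N; ++B)
    for (int S : G.Succs[B]) {
      if (Reverse)
        Preds[B].push_back(S); // in the reversed graph, S precedes B
      else
        Preds[S].push_back(B);
    }
  const int Start = Reverse ? G.Exit : 0;
  const Mask All = N == 64 ? ~Mask(0) : (Mask(1) << N) - 1;
  std::vector<Mask> Dom(N, All);
  Dom[Start] = Mask(1) << Start;
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (int B = 0; B < N; ++B) {
      if (B == Start)
        continue;
      Mask M = All;
      for (int P : Preds[B])
        M &= Dom[P];
      M |= Mask(1) << B;
      if (M != Dom[B]) {
        Dom[B] = M;
        Changed = true;
      }
    }
  }
  return Dom;
}

// The patch's precondition: From dominates To and To post-dominates From.
bool isStraightLine(const ToyCFG &G, int From, int To) {
  const std::vector<Mask> Dom = computeDom(G, /*Reverse=*/false);
  const std::vector<Mask> PDom = computeDom(G, /*Reverse=*/true);
  return contains(Dom[To], From) && contains(PDom[From], To);
}

// Blocks reachable from From (excluding From and To) that To post-dominates,
// i.e. the blocks selected by the PDT->dominates(To, BB) test.
std::vector<int> blocksBetween(const ToyCFG &G, int From, int To) {
  const std::vector<Mask> PDom = computeDom(G, /*Reverse=*/true);
  std::vector<bool> Seen(G.Succs.size(), false);
  std::vector<int> Stack{From};
  std::vector<int> Result;
  Seen[From] = true;
  while (!Stack.empty()) {
    const int BB = Stack.back();
    Stack.pop_back();
    if (BB != From && BB != To && contains(PDom[BB], To))
      Result.push_back(BB);
    for (int S : G.Succs[BB])
      if (!Seen[S]) {
        Seen[S] = true;
        Stack.push_back(S);
      }
  }
  std::sort(Result.begin(), Result.end()); // deterministic order
  return Result;
}
```

On the diamond 0->{1,2}, 1->3, 2->3, 3->4, the pair (0, 3) is straight-line and the blocks between them are 1 and 2; if 2 instead jumps straight to the exit, block 3 no longer post-dominates block 0 and, per the patch, hasStoreBetween would conservatively return true.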

lenary removed a subscriber: lenary. Sep 2 2020, 3:57 AM

Harbormaster completed remote builds in B70359: Diff 289389. Sep 2 2020, 4:17 AM

gentle ping

Hi @efriedma,

Given you've already taken a look, can you do the review or do you want me to step in?

Cheers,
-Quentin

I'd appreciate it if you'd step in, Quentin. I have a backlog of reviews, and I still haven't quite wrapped my head around the use of post-dominance here.

@efriedma @qcolombet haha, thanks for your reviews. Please take your time. ^-^ I want to keep the alias check simple (find all basic blocks on all paths from block From to block To), so I require that From dominates To and To post-dominates From. This means we still miss some cases in which a load could be sunk, but it keeps the complexity low.

gentle ping ^^

Hi,

Looks reasonable but I'd like to have an idea of the compile time impact of this patch.

Make sure to fix the Lint comments as well.

Cheers,
-Quentin

llvm/lib/CodeGen/MachineSink.cpp
347	Would it make sense to preserve the cache as long as we don't move stores, calls and loads with ordered memory ref? I'm worried about the compile time impact of this patch so I am wondering if we want to be smarter about invalidating the cache.
965	While we traverse BB, should we create an entry for the pair (To, BB)? Put differently, what is the compile-time impact of this patch, and do we need to do more to avoid recomputation?

1: Clear the cache just before the function returns.
2: Cache the handled BBs while calling depth_first() for a block.
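The two caches can be pictured with the toy model below (hypothetical; not the patch's code). A `HasStoreCache`-like map keeps answers that do not depend on the load being sunk (here only "no store at all"), while a `StoreInstrCache`-like map keeps the stores found between a block pair, so a later load only re-runs the alias checks instead of re-walking the CFG. `Access`, the interval-overlap `mayAlias`, and `StoreBetweenCache` are stand-ins for `MachineInstr::mayAlias` and the real maps; calls and ordered memory references are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical memory access covering bytes [Begin, End).
struct Access {
  int Begin;
  int End;
};

// Stand-in for MachineInstr::mayAlias: two accesses alias if they overlap.
static bool mayAlias(const Access &A, const Access &B) {
  return A.Begin < B.End && B.Begin < A.End;
}

class StoreBetweenCache {
public:
  using BlockPair = std::pair<int, int>; // (From, To)

  int RegionWalks = 0; // how often the blocks between From and To were scanned

  // Stores[BB] lists the stores in block BB; InRegion lists the blocks
  // between From and To (as found by the dominance-based walk).
  bool hasStoreBetween(BlockPair Key,
                       const std::vector<std::vector<Access>> &Stores,
                       const std::vector<int> &InRegion, const Access &Load) {
    // Load-independent answer cached earlier (here: "no store at all").
    auto H = HasStoreCache.find(Key);
    if (H != HasStoreCache.end())
      return H->second;

    // Stores already collected: only redo the alias checks for this load.
    auto S = StoreInstrCache.find(Key);
    if (S != StoreInstrCache.end())
      return std::any_of(S->second.begin(), S->second.end(),
                         [&](const Access &St) { return mayAlias(St, Load); });

    ++RegionWalks;
    bool SawStore = false;
    bool HasAliasedStore = false;
    for (int BB : InRegion)
      for (const Access &St : Stores[BB]) {
        SawStore = true;
        if (mayAlias(St, Load))
          HasAliasedStore = true;
        StoreInstrCache[Key].push_back(St);
      }
    if (!SawStore)
      HasStoreCache[Key] = false;
    return HasAliasedStore;
  }

private:
  std::map<BlockPair, bool> HasStoreCache;
  std::map<BlockPair, std::vector<Access>> StoreInstrCache;
};
```

A second load queried for the same block pair is answered from the collected stores without another region walk, which is where the compile-time saving would come from in this model.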

Thanks very much for your review, @qcolombet. Updated accordingly. I collected some compile-time numbers for the LLVM test-suite; no obvious degradation was found.

llvm/lib/CodeGen/MachineSink.cpp

347

Good idea. But it seems we still need to clear the cache before returning from MachineSinking::runOnMachineFunction: the MachineSink instance is shared across calls to MachineSinking::runOnMachineFunction for different functions, and otherwise I hit some weird memory issues.
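One plausible way to see why the caches must be emptied at the end of each `runOnMachineFunction` call: the pass object is reused across functions, and the caches are keyed by block pointers, which may be reused once a function's blocks are freed (the exact memory issue is not described in the thread). The hypothetical `ToySinkPass` below imitates that key reuse with per-function block IDs that restart at 0.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-in for a pass object that, like MachineSinking, is
// constructed once and then run on many functions. Block IDs restart at 0
// in every function, imitating block pointers that may be reused after the
// previous function's blocks are freed.
class ToySinkPass {
public:
  explicit ToySinkPass(bool Clear) : ClearAtEnd(Clear) {}

  // Simulates one runOnMachineFunction call that queries the block pair
  // (0, 1). ActualHasStore is the correct answer for this function; the
  // cache is consulted first, as in hasStoreBetween.
  bool runOnFunction(bool ActualHasStore) {
    const std::pair<int, int> Key{0, 1};
    auto It = HasStoreCache.find(Key);
    const bool Answer = It != HasStoreCache.end() ? It->second : ActualHasStore;
    HasStoreCache[Key] = Answer;
    if (ClearAtEnd)
      HasStoreCache.clear(); // mirrors the clear() calls added by the patch
    return Answer;
  }

private:
  bool ClearAtEnd;
  std::map<std::pair<int, int>, bool> HasStoreCache;
};
```

Without clearing, the second function inherits the first function's "no store" answer, which in the real pass could let a load be sunk past a store.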

965

I collected some data on PowerPC; the compile-time difference in the LLVM test-suite does not look significant. The tool hides tests with small differences; among those, some tests take 10s and some take more than 30s, but none shows a big regression. The biggest differences are 9.6% and 7.4%, on tests whose compile time is around 0.1s, and these two degradations could not be reproduced in other runs.

```
./utils/compare.py -m compile_time base.json fix.json
Tests: 309
Metric: compile_time

Program                                      base   fix   diff
 test-suite...enchmarks/Stanford/Queens.test              9.6%
 test-suite...d-warshall/floyd-warshall.test              7.4%
 test-suite...ImageProcessing/Blur/blur.test             -4.7%
 test-suite...SubsetBRawLoops/lcalsBRaw.test              2.7%
 test-suite...math/automotive-basicmath.test             -2.5%
 test-suite.../Benchmarks/Stanford/Perm.test             -2.4%
 test-suite...rks/tramp3d-v4/tramp3d-v4.test             -2.2%
 test-suite...ow-dbl/GlobalDataFlow-dbl.test             -2.2%
 test-suite...enchmarks/Stanford/RealMM.test             -2.0%
 test-suite...ctions-flt/Reductions-flt.test             -1.8%
 test-suite...t/StatementReordering-flt.test             -1.7%
 test-suite.../Trimaran/enc-rc4/enc-rc4.test             -1.7%
 test-suite...l/StatementReordering-dbl.test              1.6%
 test-suite...pansion-dbl/Expansion-dbl.test             -1.4%
 test-suite...bl/IndirectAddressing-dbl.test             -1.2%
 Geomean difference                                       nan%

             base         fix        diff
count  309.000000  309.000000  277.000000
mean     2.696964    2.685603   -0.003232
std      6.283106    6.251353    0.009456
min      0.000000    0.000000   -0.047321
25%      0.077944    0.077862   -0.005599
50%      0.291912    0.289780   -0.003404
75%      2.475318    2.478657   -0.001450
max     64.257411   64.088470    0.095601
```
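A likely explanation for the `nan%` geomean, offered as an assumption about how such summaries are typically computed rather than a reading of `compare.py`: some tests report a base compile time of 0 (the `min` row is 0.000000), and a fix/base ratio of 0/0 is NaN, which propagates through a log-space geometric mean. The `diff` count of 277 out of 309 tests would be consistent with 32 undefined ratios. A minimal sketch of the effect, with a hypothetical `geomeanRatio` helper:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of the per-test ratios Fix[i] / Base[i], computed in log
// space. With SkipUndefined == false, a 0/0 ratio (NaN) poisons the result.
double geomeanRatio(const std::vector<double> &Base,
                    const std::vector<double> &Fix, bool SkipUndefined) {
  double LogSum = 0.0;
  std::size_t Count = 0;
  for (std::size_t I = 0; I < Base.size() && I < Fix.size(); ++I) {
    const double Ratio = Fix[I] / Base[I]; // 0.0 / 0.0 yields NaN (IEEE 754)
    if (SkipUndefined && !(std::isfinite(Ratio) && Ratio > 0.0))
      continue; // drop tests whose ratio is undefined or non-positive
    LogSum += std::log(Ratio);
    ++Count;
  }
  return Count ? std::exp(LogSum / static_cast<double>(Count)) : std::nan("");
}
```

Skipping the undefined ratios yields a finite geomean over the remaining tests; whether that is the right policy for a compile-time report is a separate judgment call.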

Harbormaster completed remote builds in B75995: Diff 299890. Oct 22 2020, 2:19 AM

1: address Lint suggestion

Harbormaster completed remote builds in B75997: Diff 299903. Oct 22 2020, 3:16 AM

Ah, I used @nikic's perf-testing tool for this patch on the X86 target(?). (Thanks for this great tool ^-^) Here are the compile-time results: https://llvm-compile-time-tracker.com/compare.php?from=2d25004a137223a02aa06e8bfd512a648f3b3f94&to=abf39366779bed8b9f2d276cd199c3b78471e3b1&stat=instructions

The geomean degrades by about 0.02%, with no obvious outliers. I guess the compile-time increase on these benchmarks should be acceptable?

> I guess the compile-time increase on these benchmarks should be acceptable?

Yes, that's fine by me.

llvm/lib/CodeGen/MachineSink.cpp
347	Yeah, that's expected.

This revision is now accepted and ready to land. Oct 27 2020, 9:15 AM

This revision was landed with ongoing or failed builds. Nov 1 2020, 6:16 PM

Closed by commit rG24a31922ce2a: [MachineSink] sink more profitable loads (authored by shchenz).

This revision was automatically updated to reflect the committed changes.

shchenz added a commit: rG24a31922ce2a: [MachineSink] sink more profitable loads.

Herald added subscribers: frasercrmck, pengfei. Nov 1 2020, 6:16 PM

shchenz mentioned this in D88126: [Machinesink] add more profitable pattern if target bb register pressure is not too high. Nov 6 2020, 5:32 PM

LiuChen3 added a subscriber: LiuChen3. Dec 22 2020, 4:56 PM

Hi @shchenz, several of our OpenCL benchmarks show a large compile-time regression.
For a single function, the time spent in the Machine code sinking pass increased from 6.0711s to 366.5713s.
Given your algorithm, this patch will evidently increase compile time in some cases.

In D86864#2469289, @LiuChen3 wrote:

> Hi @shchenz, several of our OpenCL benchmarks show a large compile-time regression.
> For a single function, the time spent in the Machine code sinking pass increased from 6.0711s to 366.5713s.
> Given your algorithm, this patch will evidently increase compile time in some cases.

Yes, we were aware that this patch might introduce compile-time degradations. As you can see in the previous comments, I already tested compile time on X86. Unfortunately, the tested benchmarks did not expose any regression.

Could you please send me the regressing function/IR, so I can look into how to fix it? Thanks.

LuoYuanke added a subscriber: LuoYuanke. Dec 22 2020, 7:29 PM

> Yes, we were aware that this patch might introduce compile-time degradations. As you can see in the previous comments, I already tested compile time on X86. Unfortunately, the tested benchmarks did not expose any regression.
>
> Could you please send me the regressing function/IR, so I can look into how to fix it? Thanks.

I'm sorry, but I can't provide the case to you directly. I am trying to make a small reproducer but have run into some problems.
However, could you add an option to control it?
Thanks.

In D86864#2469576, @LiuChen3 wrote:

> I'm sorry, but I can't provide the case to you directly. I am trying to make a small reproducer but have run into some problems.
> However, could you add an option to control it?
> Thanks.

I can certainly do that, but I think the most reasonable solution would be to fix the compile-time issue itself. Since the compile-time tests I ran before did not expose any regression, your test case must be somewhat special. Could you find out what is special about it? For example, does the function have too many blocks, or do some blocks in the function have too many instructions? Thanks.

> I can certainly do that, but I think the most reasonable solution would be to fix the compile-time issue itself. Since the compile-time tests I ran before did not expose any regression, your test case must be somewhat special. Could you find out what is special about it? For example, does the function have too many blocks, or do some blocks in the function have too many instructions? Thanks.

I think the compile-time increase is because the function has too many instructions and blocks; it has thousands of lines of instructions. Could you add a limit on the number of instructions or the number of blocks, so that the check for 'store' can end early?

In D86864#2472369, @LiuChen3 wrote:

> I think the compile-time increase is because the function has too many instructions and blocks; it has thousands of lines of instructions. Could you add a limit on the number of instructions or the number of blocks, so that the check for 'store' can end early?

Could you please check whether the following change solves your issue?

```diff
diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
index 0abdf89..8ca3520 100644
--- a/llvm/lib/CodeGen/MachineSink.cpp
+++ b/llvm/lib/CodeGen/MachineSink.cpp
@@ -79,6 +79,12 @@ static cl::opt<unsigned> SplitEdgeProbabilityThreshold(
         "splitted critical edge"),
     cl::init(40), cl::Hidden);
 
+static cl::opt<unsigned> SinkLoadInstsPerBlockThreshold(
+    "machine-sink-load-instrs-threshold",
+    cl::desc("Do not try to find alias store for a load if there is a in-path "
+             "block whose instruction number is higher than this threshold."),
+    cl::init(2000), cl::Hidden);
+
 STATISTIC(NumSunk,      "Number of machine instructions sunk");
 STATISTIC(NumSplit,     "Number of critical edges split");
 STATISTIC(NumCoalesces, "Number of copies coalesced");
@@ -1036,6 +1042,12 @@ bool MachineSinking::hasStoreBetween(MachineBasicBlock *From,
     HandledBlocks.insert(BB);
     // To post dominates BB, it must be a path from block From.
     if (PDT->dominates(To, BB)) {
+      // If this BB is too big, stop searching to save compiling time.
+      if (BB->size() > SinkLoadInstsPerBlockThreshold) {
+        HasStoreCache[BlockPair] = true;
+        return true;
+      }
+
       for (MachineInstr &I : *BB) {
         // Treat as alias conservatively for a call or an ordered memory
         // operation.
```

In D86864#2472375, @shchenz wrote:
> Could you please check whether the following change solves your issue?

Unfortunately, this patch doesn't work. I don't think the number of blocks is necessarily related to the number of instructions.
For one function, with your default threshold, the time spent in Machine code sinking is 11.2464s, no obvious difference from before. With the threshold set to 1, the time spent in Machine code sinking is reduced to 0.6937s.
I think it would be better if you could limit the number of MIs.
Anyway, this patch gives us an option. Thanks for your help.

In D86864#2472387, @LiuChen3 wrote:
> Unfortunately, this patch doesn't work. I don't think the number of blocks is necessarily related to the number of instructions.
> For one function, with your default threshold, the time spent in Machine code sinking is 11.2464s, no obvious difference from before. With the threshold set to 1, the time spent in Machine code sinking is reduced to 0.6937s.
> I think it would be better if you could limit the number of MIs.
> Anyway, this patch gives us an option. Thanks for your help.

The above threshold is on the number of MIs: BB->size() returns the number of instructions in BB. I committed 31c2b93d83f63ce7f9bb4977f58de2e00bf18e0f to further reduce compile time. You can give it a try.

Would it be better to do the check before 'if (PDT->dominates(To, BB)) {'?

In D86864#2472470, @steven.zhang wrote:

> Would it be better to do the check before 'if (PDT->dominates(To, BB)) {'?

I don't think so. If there is a huge block that is reachable from block From but is not dominated by From and cannot flow to To, we would mark {From, To} as hasStore, which is wrong: we should only care about the blocks that are dominated by From and can flow to To.
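The argument about where to place the size check can be made concrete with a toy model (hypothetical names; stores are reduced to a per-block flag and alias analysis is ignored). `ToyBlock::InRegion` stands for `PDT->dominates(To, BB)`, and the threshold of 2000 matches the default of the proposed `machine-sink-load-instrs-threshold` option. Checking the size first lets an off-path block, which can never hold a store between `From` and `To`, force the conservative answer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of a block reachable from From.
struct ToyBlock {
  unsigned Size; // number of instructions, like BB->size()
  bool HasStore; // contains a store (aliasing ignored in this toy)
  bool InRegion; // stands for PDT->dominates(To, BB)
};

// Returns true when a store between From and To must be assumed.
// CheckSizeFirst chooses where the size threshold is tested.
bool hasStoreBetweenToy(const std::vector<ToyBlock> &Reachable,
                        unsigned Threshold, bool CheckSizeFirst) {
  for (const ToyBlock &BB : Reachable) {
    if (CheckSizeFirst && BB.Size > Threshold)
      return true; // gives up even on blocks that cannot affect the answer
    if (!BB.InRegion)
      continue; // not between From and To: irrelevant to the load
    if (!CheckSizeFirst && BB.Size > Threshold)
      return true; // in-region block too big to scan: be conservative
    if (BB.HasStore)
      return true;
  }
  return false;
}
```

With a small in-region block and a 5000-instruction off-path block, placing the check inside the in-region branch (as in the proposed diff) still lets the load sink, while checking first reports a spurious store.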

> The above threshold is on the number of MIs: BB->size() returns the number of instructions in BB. I committed 31c2b93d83f63ce7f9bb4977f58de2e00bf18e0f to further reduce compile time. You can give it a try.

Thanks. It works for our tests.

shchenz mentioned this in D120330: [MachineSink] Fix CFG walk in clobber check (PR53990). Feb 23 2022, 1:28 AM

nikic mentioned this in D120800: [MachineSink] Disable if there are any irreducible cycles. Mar 3 2022, 4:10 AM

Revision Contents

llvm/lib/CodeGen/MachineSink.cpp (84 lines)
llvm/test/CodeGen/RISCV/select-optimize-multiple.ll (106 lines)
llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll (50 lines)
llvm/test/CodeGen/X86/MachineSink-eflags.ll (27 lines)
llvm/test/CodeGen/X86/avx2-masked-gather.ll (128 lines)
llvm/test/CodeGen/X86/cmovcmov.ll (8 lines)
llvm/test/CodeGen/X86/select.ll (54 lines)
llvm/test/CodeGen/X86/vec_int_to_fp.ll (70 lines)

Diff 302198

llvm/lib/CodeGen/MachineSink.cpp

```diff
@@ ... @@ class MachineSinking : public MachineFunctionPass {
     /// Record of DBG_VALUE uses of vregs in a block, so that we can identify
     /// debug instructions to sink.
     SmallDenseMap<unsigned, TinyPtrVector<SeenDbgUser>> SeenDbgUsers;
 
     /// Record of debug variables that have had their locations set in the
     /// current block.
     DenseSet<DebugVariable> SeenDbgVars;
 
+    std::map<std::pair<MachineBasicBlock *, MachineBasicBlock *>, bool>
+        HasStoreCache;
+    std::map<std::pair<MachineBasicBlock *, MachineBasicBlock *>,
+             std::vector<MachineInstr *>>
+        StoreInstrCache;
+
   public:
     static char ID; // Pass identification
 
     MachineSinking() : MachineFunctionPass(ID) {
       initializeMachineSinkingPass(*PassRegistry::getPassRegistry());
     }
 
     bool runOnMachineFunction(MachineFunction &MF) override;
```
```diff
@@ ... @@ namespace {
 
   private:
     bool ProcessBlock(MachineBasicBlock &MBB);
     void ProcessDbgInst(MachineInstr &MI);
     bool isWorthBreakingCriticalEdge(MachineInstr &MI,
                                      MachineBasicBlock *From,
                                      MachineBasicBlock *To);
 
+    bool hasStoreBetween(MachineBasicBlock *From, MachineBasicBlock *To,
+                         MachineInstr &MI);
+
     /// Postpone the splitting of the given critical
     /// edge (\p From, \p To).
     ///
     /// We do not split the edges on the fly. Indeed, this invalidates
     /// the dominance information and thus triggers a lot of updates
     /// of that information underneath.
     /// Instead, we postpone all the splits after each iteration of
     /// the main loop. That way, the information is at least valid
```
```diff
@@ ... @@ bool MachineSinking::runOnMachineFunction(MachineFunction &MF) {
   bool EverMadeChange = false;
 
   while (true) {
     bool MadeChange = false;
 
     // Process all basic blocks.
     CEBCandidates.clear();
     ToSplit.clear();
     for (auto &MBB: MF)
       MadeChange |= ProcessBlock(MBB);
 
     // If we have anything we marked as toSplit, split it now.
     for (auto &Pair : ToSplit) {
       auto NewSucc = Pair.first->SplitCriticalEdge(Pair.second, *this);
       if (NewSucc != nullptr) {
         LLVM_DEBUG(dbgs() << " *** Splitting critical edge: "
                           << printMBBReference(*Pair.first) << " -- "
                           << printMBBReference(*NewSucc) << " -- "
                           << printMBBReference(*Pair.second) << '\n');
         if (MBFI)
           MBFI->onEdgeSplit(Pair.first, NewSucc, *MBPI);
 
         MadeChange = true;
         ++NumSplit;
       } else
         LLVM_DEBUG(dbgs() << " *** Not legal to break critical edge\n");
     }
     // If this iteration over the code changed anything, keep iterating.
     if (!MadeChange) break;
     EverMadeChange = true;
   }
 
+  HasStoreCache.clear();
+  StoreInstrCache.clear();
+
   // Now clear any kill flags for recorded registers.
   for (auto I : RegsToClearKillFlags)
     MRI->clearKillFlags(I);
   RegsToClearKillFlags.clear();
 
   return EverMadeChange;
 }
```

```diff
@@ ... @@ for (SmallVectorImpl<MachineInstr *>::iterator DBI = DbgValuesToSink.begin(),
     MachineInstr *NewDbgMI = DbgMI->getMF()->CloneMachineInstr(*DBI);
     SuccToSinkTo.insert(InsertPos, NewDbgMI);
 
     if (!attemptDebugCopyProp(MI, *DbgMI))
       DbgMI->setDebugValueUndef();
   }
 }
 
+/// hasStoreBetween - check if there is store betweeen straight line blocks From
+/// and To.
+bool MachineSinking::hasStoreBetween(MachineBasicBlock *From,
+                                     MachineBasicBlock *To, MachineInstr &MI) {
+  // Make sure From and To are in straight line which means From dominates To
+  // and To post dominates From.
+  if (!DT->dominates(From, To) || !PDT->dominates(To, From))
+    return true;
+
+  auto BlockPair = std::make_pair(From, To);
+
+  // Does these two blocks pair be queried before and have a definite cached
+  // result?
+  if (HasStoreCache.find(BlockPair) != HasStoreCache.end())
+    return HasStoreCache[BlockPair];
+
+  if (StoreInstrCache.find(BlockPair) != StoreInstrCache.end())
+    return std::any_of(
+        StoreInstrCache[BlockPair].begin(), StoreInstrCache[BlockPair].end(),
+        [&](MachineInstr *I) { return I->mayAlias(AA, MI, false); });
+
+  bool SawStore = false;
+  bool HasAliasedStore = false;
+  DenseSet<MachineBasicBlock *> HandledBlocks;
+  // Go through all reachable blocks from From.
+  for (MachineBasicBlock *BB : depth_first(From)) {
+    // We insert the instruction at the start of block To, so no need to worry
+    // about stores inside To.
+    // Store in block From should be already considered when just enter function
+    // SinkInstruction.
+    if (BB == To || BB == From)
+      continue;
+
+    // We already handle this BB in previous iteration.
+    if (HandledBlocks.count(BB))
+      continue;
+
+    HandledBlocks.insert(BB);
+    // To post dominates BB, it must be a path from block From.
+    if (PDT->dominates(To, BB)) {
+      for (MachineInstr &I : *BB) {
+        // Treat as alias conservatively for a call or an ordered memory
+        // operation.
+        if (I.isCall() || I.hasOrderedMemoryRef()) {
+          HasStoreCache[BlockPair] = true;
+          return true;
+        }
+
+        if (I.mayStore()) {
+          SawStore = true;
+          // We still have chance to sink MI if all stores between are not
+          // aliased to MI.
+          // Cache all store instructions, so that we don't need to go through
+          // all From reachable blocks for next load instruction.
+          if (I.mayAlias(AA, MI, false))
+            HasAliasedStore = true;
+          StoreInstrCache[BlockPair].push_back(&I);
+        }
+      }
+    }
+  }
+  // If there is no store at all, cache the result.
+  if (!SawStore)
+    HasStoreCache[BlockPair] = false;
+  return HasAliasedStore;
+}
+
 /// SinkInstruction - Determine whether it is safe to sink the specified machine
 /// instruction out of its current block into a successor.
 bool MachineSinking::SinkInstruction(MachineInstr &MI, bool &SawStore,
                                      AllSuccsCache &AllSuccessors) {
   // Don't sink instructions that the target prefers not to sink.
   if (!TII->shouldSink(MI))
     return false;
 
```
[44 lines not shown]	bool MachineSinking::SinkInstruction(MachineInstr &MI, bool &SawStore,
LLVM_DEBUG(dbgs() << "Sink instr " << MI << "\tinto block " << *SuccToSinkTo);		LLVM_DEBUG(dbgs() << "Sink instr " << MI << "\tinto block " << *SuccToSinkTo);

// If the block has multiple predecessors, this is a critical edge.		// If the block has multiple predecessors, this is a critical edge.
// Decide if we can sink along it or need to break the edge.		// Decide if we can sink along it or need to break the edge.
if (SuccToSinkTo->pred_size() > 1) {		if (SuccToSinkTo->pred_size() > 1) {
// We cannot sink a load across a critical edge - there may be stores in		// We cannot sink a load across a critical edge - there may be stores in
// other code paths.		// other code paths.
bool TryBreak = false;		bool TryBreak = false;
bool store = true;		bool Store =
if (!MI.isSafeToMove(AA, store)) {		MI.mayLoad() ? hasStoreBetween(ParentBlock, SuccToSinkTo, MI) : true;
		if (!MI.isSafeToMove(AA, Store)) {
LLVM_DEBUG(dbgs() << " *** NOTE: Won't sink load along critical edge.\n");		LLVM_DEBUG(dbgs() << " *** NOTE: Won't sink load along critical edge.\n");
TryBreak = true;		TryBreak = true;
}		}

// We don't want to sink across a critical edge if we don't dominate the		// We don't want to sink across a critical edge if we don't dominate the
// successor. We could be introducing calculations to new code paths.		// successor. We could be introducing calculations to new code paths.
if (!TryBreak && !DT->dominates(ParentBlock, SuccToSinkTo)) {		if (!TryBreak && !DT->dominates(ParentBlock, SuccToSinkTo)) {
LLVM_DEBUG(dbgs() << " *** NOTE: Critical edge found\n");		LLVM_DEBUG(dbgs() << " *** NOTE: Critical edge found\n");
[460 lines not shown]
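The functional change in the hunk above is the value passed as `SawStore` to `isSafeToMove`.

- Before the patch, it was `true` unconditionally. That sent every load that is not a dereferenceable invariant load down the `TryBreak` path.
- Now, for loads, it is the result of `hasStoreBetween(ParentBlock, SuccToSinkTo, MI)`.

The sketch below is a simplified model of that decision under stated assumptions. `InstrProps`, `isSafeToMoveModel` and `mustTryBreakCriticalEdge` are hypothetical names. The model keeps only the memory-related checks of `MachineInstr::isSafeToMove`.

```cpp
#include <cassert>

// Hypothetical summary of the properties of MI that matter here.
struct InstrProps {
  bool MayLoad = false;
  bool MayStore = false;
  bool IsCall = false;
  bool HasOrderedMemRef = false;
  bool IsInvariantLoad = false; // stands in for isDereferenceableInvariantLoad
};

// Simplified model of the memory-related checks in MachineInstr::isSafeToMove.
// The real function also rejects, among others, PHIs, terminators and
// instructions with unmodeled side effects.
bool isSafeToMoveModel(const InstrProps &MI, bool &SawStore) {
  if (MI.MayStore || MI.IsCall || (MI.MayLoad && MI.HasOrderedMemRef)) {
    SawStore = true;
    return false;
  }
  // A real load can only move if no store may write its memory in between.
  if (MI.MayLoad && !MI.IsInvariantLoad)
    return !SawStore;
  return true;
}

// Hypothetical helper mirroring the patched hunk: before the patch Store was
// always true; now a load uses the result of hasStoreBetween (passed in here
// as StoreBetween). Returns the value the hunk assigns to TryBreak.
bool mustTryBreakCriticalEdge(const InstrProps &MI, bool StoreBetween) {
  bool Store = MI.MayLoad ? StoreBetween : true;
  return !isSafeToMoveModel(MI, Store);
}
```

In this model, keeping `Store` as `true` for instructions that do not load is harmless, because the flag is only consulted for loads.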

llvm/test/CodeGen/RISCV/select-optimize-multiple.ll

[34 lines not shown]	entry:
%cond = select i1 %cmp, i64 %b, i64 %c		%cond = select i1 %cmp, i64 %b, i64 %c
ret i64 %cond		ret i64 %cond
}		}

define i128 @cmovcc128(i64 signext %a, i128 %b, i128 %c) nounwind {		define i128 @cmovcc128(i64 signext %a, i128 %b, i128 %c) nounwind {
; RV32I-LABEL: cmovcc128:		; RV32I-LABEL: cmovcc128:
; RV32I: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32I-NEXT: xori a1, a1, 123		; RV32I-NEXT: xori a1, a1, 123
; RV32I-NEXT: or a2, a1, a2		; RV32I-NEXT: or a1, a1, a2
; RV32I-NEXT: mv a1, a3		; RV32I-NEXT: mv a2, a3
; RV32I-NEXT: beqz a2, .LBB1_2		; RV32I-NEXT: beqz a1, .LBB1_2
; RV32I-NEXT: # %bb.1: # %entry		; RV32I-NEXT: # %bb.1: # %entry
; RV32I-NEXT: mv a1, a4		; RV32I-NEXT: mv a2, a4
; RV32I-NEXT: .LBB1_2: # %entry		; RV32I-NEXT: .LBB1_2: # %entry
; RV32I-NEXT: lw a6, 0(a1)		; RV32I-NEXT: beqz a1, .LBB1_5
; RV32I-NEXT: beqz a2, .LBB1_6
; RV32I-NEXT: # %bb.3: # %entry		; RV32I-NEXT: # %bb.3: # %entry
; RV32I-NEXT: addi a1, a4, 4		; RV32I-NEXT: addi a7, a4, 4
; RV32I-NEXT: lw a5, 0(a1)		; RV32I-NEXT: bnez a1, .LBB1_6
; RV32I-NEXT: bnez a2, .LBB1_7
; RV32I-NEXT: .LBB1_4:		; RV32I-NEXT: .LBB1_4:
; RV32I-NEXT: addi a1, a3, 8		; RV32I-NEXT: addi a5, a3, 8
; RV32I-NEXT: lw a1, 0(a1)		; RV32I-NEXT: j .LBB1_7
; RV32I-NEXT: bnez a2, .LBB1_8
; RV32I-NEXT: .LBB1_5:		; RV32I-NEXT: .LBB1_5:
; RV32I-NEXT: addi a2, a3, 12		; RV32I-NEXT: addi a7, a3, 4
; RV32I-NEXT: j .LBB1_9		; RV32I-NEXT: beqz a1, .LBB1_4
; RV32I-NEXT: .LBB1_6:		; RV32I-NEXT: .LBB1_6: # %entry
; RV32I-NEXT: addi a1, a3, 4		; RV32I-NEXT: addi a5, a4, 8
; RV32I-NEXT: lw a5, 0(a1)
; RV32I-NEXT: beqz a2, .LBB1_4
; RV32I-NEXT: .LBB1_7: # %entry		; RV32I-NEXT: .LBB1_7: # %entry
; RV32I-NEXT: addi a1, a4, 8		; RV32I-NEXT: lw a6, 0(a2)
		; RV32I-NEXT: lw a7, 0(a7)
		; RV32I-NEXT: lw a2, 0(a5)
		; RV32I-NEXT: beqz a1, .LBB1_9
		; RV32I-NEXT: # %bb.8: # %entry
		; RV32I-NEXT: addi a1, a4, 12
		; RV32I-NEXT: j .LBB1_10
		; RV32I-NEXT: .LBB1_9:
		; RV32I-NEXT: addi a1, a3, 12
		; RV32I-NEXT: .LBB1_10: # %entry
; RV32I-NEXT: lw a1, 0(a1)		; RV32I-NEXT: lw a1, 0(a1)
; RV32I-NEXT: beqz a2, .LBB1_5		; RV32I-NEXT: sw a1, 12(a0)
; RV32I-NEXT: .LBB1_8: # %entry		; RV32I-NEXT: sw a2, 8(a0)
; RV32I-NEXT: addi a2, a4, 12		; RV32I-NEXT: sw a7, 4(a0)
; RV32I-NEXT: .LBB1_9: # %entry
; RV32I-NEXT: lw a2, 0(a2)
; RV32I-NEXT: sw a2, 12(a0)
; RV32I-NEXT: sw a1, 8(a0)
; RV32I-NEXT: sw a5, 4(a0)
; RV32I-NEXT: sw a6, 0(a0)		; RV32I-NEXT: sw a6, 0(a0)
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: cmovcc128:		; RV64I-LABEL: cmovcc128:
; RV64I: # %bb.0: # %entry		; RV64I: # %bb.0: # %entry
; RV64I-NEXT: addi a5, zero, 123		; RV64I-NEXT: addi a5, zero, 123
; RV64I-NEXT: beq a0, a5, .LBB1_2		; RV64I-NEXT: beq a0, a5, .LBB1_2
; RV64I-NEXT: # %bb.1: # %entry		; RV64I-NEXT: # %bb.1: # %entry
[34 lines not shown]
entry:		entry:
%cond = select i1 %a, i64 %b, i64 %c		%cond = select i1 %a, i64 %b, i64 %c
ret i64 %cond		ret i64 %cond
}		}

define i128 @cmov128(i1 %a, i128 %b, i128 %c) nounwind {		define i128 @cmov128(i1 %a, i128 %b, i128 %c) nounwind {
; RV32I-LABEL: cmov128:		; RV32I-LABEL: cmov128:
; RV32I: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32I-NEXT: andi a4, a1, 1		; RV32I-NEXT: andi a1, a1, 1
; RV32I-NEXT: mv a1, a2		; RV32I-NEXT: mv a4, a2
; RV32I-NEXT: bnez a4, .LBB3_2		; RV32I-NEXT: bnez a1, .LBB3_2
; RV32I-NEXT: # %bb.1: # %entry		; RV32I-NEXT: # %bb.1: # %entry
; RV32I-NEXT: mv a1, a3		; RV32I-NEXT: mv a4, a3
; RV32I-NEXT: .LBB3_2: # %entry		; RV32I-NEXT: .LBB3_2: # %entry
; RV32I-NEXT: lw a6, 0(a1)		; RV32I-NEXT: bnez a1, .LBB3_5
; RV32I-NEXT: bnez a4, .LBB3_6
; RV32I-NEXT: # %bb.3: # %entry		; RV32I-NEXT: # %bb.3: # %entry
; RV32I-NEXT: addi a1, a3, 4		; RV32I-NEXT: addi a7, a3, 4
; RV32I-NEXT: lw a5, 0(a1)		; RV32I-NEXT: beqz a1, .LBB3_6
; RV32I-NEXT: beqz a4, .LBB3_7
; RV32I-NEXT: .LBB3_4:		; RV32I-NEXT: .LBB3_4:
; RV32I-NEXT: addi a1, a2, 8		; RV32I-NEXT: addi a5, a2, 8
; RV32I-NEXT: lw a1, 0(a1)		; RV32I-NEXT: j .LBB3_7
; RV32I-NEXT: beqz a4, .LBB3_8
; RV32I-NEXT: .LBB3_5:		; RV32I-NEXT: .LBB3_5:
; RV32I-NEXT: addi a2, a2, 12		; RV32I-NEXT: addi a7, a2, 4
; RV32I-NEXT: j .LBB3_9		; RV32I-NEXT: bnez a1, .LBB3_4
; RV32I-NEXT: .LBB3_6:		; RV32I-NEXT: .LBB3_6: # %entry
; RV32I-NEXT: addi a1, a2, 4		; RV32I-NEXT: addi a5, a3, 8
; RV32I-NEXT: lw a5, 0(a1)
; RV32I-NEXT: bnez a4, .LBB3_4
; RV32I-NEXT: .LBB3_7: # %entry		; RV32I-NEXT: .LBB3_7: # %entry
; RV32I-NEXT: addi a1, a3, 8		; RV32I-NEXT: lw a6, 0(a4)
		; RV32I-NEXT: lw a7, 0(a7)
		; RV32I-NEXT: lw a4, 0(a5)
		; RV32I-NEXT: bnez a1, .LBB3_9
		; RV32I-NEXT: # %bb.8: # %entry
		; RV32I-NEXT: addi a1, a3, 12
		; RV32I-NEXT: j .LBB3_10
		; RV32I-NEXT: .LBB3_9:
		; RV32I-NEXT: addi a1, a2, 12
		; RV32I-NEXT: .LBB3_10: # %entry
; RV32I-NEXT: lw a1, 0(a1)		; RV32I-NEXT: lw a1, 0(a1)
; RV32I-NEXT: bnez a4, .LBB3_5		; RV32I-NEXT: sw a1, 12(a0)
; RV32I-NEXT: .LBB3_8: # %entry		; RV32I-NEXT: sw a4, 8(a0)
; RV32I-NEXT: addi a2, a3, 12		; RV32I-NEXT: sw a7, 4(a0)
; RV32I-NEXT: .LBB3_9: # %entry
; RV32I-NEXT: lw a2, 0(a2)
; RV32I-NEXT: sw a2, 12(a0)
; RV32I-NEXT: sw a1, 8(a0)
; RV32I-NEXT: sw a5, 4(a0)
; RV32I-NEXT: sw a6, 0(a0)		; RV32I-NEXT: sw a6, 0(a0)
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: cmov128:		; RV64I-LABEL: cmov128:
; RV64I: # %bb.0: # %entry		; RV64I: # %bb.0: # %entry
; RV64I-NEXT: andi a5, a0, 1		; RV64I-NEXT: andi a5, a0, 1
; RV64I-NEXT: mv a0, a1		; RV64I-NEXT: mv a0, a1
; RV64I-NEXT: bnez a5, .LBB3_2		; RV64I-NEXT: bnez a5, .LBB3_2
[181 lines not shown]

llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s
	; PR1103			; PR1103

	target datalayout = "e-p:64:64"			target datalayout = "e-p:64:64"
	@i6000 = global [128 x i64] zeroinitializer, align 16			@i6000 = global [128 x i64] zeroinitializer, align 16


	define void @foo(i32* %a0, i32* %a1, i32* %a2, i32* %a3, i32* %a4, i32* %a5) {			define void @foo(i32* %a0, i32* %a1, i32* %a2, i32* %a3, i32* %a4, i32* %a5) {
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: # %bb.0: # %b			; CHECK: # %bb.0: # %b
	; CHECK-NEXT: pushq %rbp			; CHECK-NEXT: pushq %rbp
	; CHECK-NEXT: .cfi_def_cfa_offset 16			; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: .cfi_offset %rbp, -16			; CHECK-NEXT: .cfi_offset %rbp, -16
	; CHECK-NEXT: movq %rsp, %rbp			; CHECK-NEXT: movq %rsp, %rbp
	; CHECK-NEXT: .cfi_def_cfa_register %rbp			; CHECK-NEXT: .cfi_def_cfa_register %rbp
	; CHECK-NEXT: movslq (%rdi), %rax			; CHECK-NEXT: movslq (%rdi), %rdi
	; CHECK-NEXT: movslq (%rsi), %r8			; CHECK-NEXT: movslq (%rsi), %r8
	; CHECK-NEXT: movslq (%rdx), %r10			; CHECK-NEXT: movslq (%rdx), %r10
	; CHECK-NEXT: movl (%rcx), %edi			; CHECK-NEXT: movl (%rcx), %esi
	; CHECK-NEXT: movslq (%r9), %rcx			; CHECK-NEXT: movq %rsp, %rcx
	; CHECK-NEXT: movq %rsp, %rdx			; CHECK-NEXT: subl %edi, %r8d
	; CHECK-NEXT: subl %eax, %r8d			; CHECK-NEXT: movslq %r8d, %rdx
	; CHECK-NEXT: movslq %r8d, %rsi
	; CHECK-NEXT: js .LBB0_1			; CHECK-NEXT: js .LBB0_1
	; CHECK-NEXT: # %bb.11: # %b63			; CHECK-NEXT: # %bb.11: # %b63
	; CHECK-NEXT: testq %rsi, %rsi			; CHECK-NEXT: testq %rdx, %rdx
	; CHECK-NEXT: js .LBB0_14			; CHECK-NEXT: js .LBB0_14
	; CHECK-NEXT: # %bb.12:			; CHECK-NEXT: # %bb.12:
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %edi, %edi
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_13: # %a25b			; CHECK-NEXT: .LBB0_13: # %a25b
	; CHECK-NEXT: # =>This Inner Loop Header: Depth=1			; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_13			; CHECK-NEXT: je .LBB0_13
	; CHECK-NEXT: .LBB0_14: # %b85			; CHECK-NEXT: .LBB0_14: # %b85
	; CHECK-NEXT: movb $1, %al			; CHECK-NEXT: movb $1, %al
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al
	; CHECK-NEXT: jne .LBB0_1			; CHECK-NEXT: jne .LBB0_1
	; CHECK-NEXT: # %bb.15:			; CHECK-NEXT: # %bb.15:
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %edi, %edi
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_16: # %a25b140			; CHECK-NEXT: .LBB0_16: # %a25b140
	; CHECK-NEXT: # =>This Inner Loop Header: Depth=1			; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_16			; CHECK-NEXT: je .LBB0_16
	; CHECK-NEXT: .LBB0_1: # %a29b			; CHECK-NEXT: .LBB0_1: # %a29b
	; CHECK-NEXT: cmpl %r10d, %edi			; CHECK-NEXT: cmpl %r10d, %esi
	; CHECK-NEXT: js .LBB0_10			; CHECK-NEXT: js .LBB0_10
	; CHECK-NEXT: # %bb.2: # %b158			; CHECK-NEXT: # %bb.2: # %b158
				; CHECK-NEXT: movslq (%r9), %rsi
	; CHECK-NEXT: xorl %edi, %edi			; CHECK-NEXT: xorl %edi, %edi
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: movb $1, %r10b			; CHECK-NEXT: movb $1, %r10b
	; CHECK-NEXT: jmp .LBB0_3			; CHECK-NEXT: jmp .LBB0_3
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_9: # %b1606			; CHECK-NEXT: .LBB0_9: # %b1606
	; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: testb %dil, %dil			; CHECK-NEXT: testb %dil, %dil
	[13 lines not shown]
	; CHECK-NEXT: # Child Loop BB0_39 Depth 3			; CHECK-NEXT: # Child Loop BB0_39 Depth 3
	; CHECK-NEXT: # Child Loop BB0_33 Depth 3			; CHECK-NEXT: # Child Loop BB0_33 Depth 3
	; CHECK-NEXT: # Child Loop BB0_34 Depth 2			; CHECK-NEXT: # Child Loop BB0_34 Depth 2
	; CHECK-NEXT: # Child Loop BB0_36 Depth 2			; CHECK-NEXT: # Child Loop BB0_36 Depth 2
	; CHECK-NEXT: testl %r8d, %r8d			; CHECK-NEXT: testl %r8d, %r8d
	; CHECK-NEXT: js .LBB0_4			; CHECK-NEXT: js .LBB0_4
	; CHECK-NEXT: # %bb.17: # %b179			; CHECK-NEXT: # %bb.17: # %b179
	; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: testq %rsi, %rsi			; CHECK-NEXT: testq %rdx, %rdx
	; CHECK-NEXT: js .LBB0_18			; CHECK-NEXT: js .LBB0_18
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_37: # %a30b			; CHECK-NEXT: .LBB0_37: # %a30b
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # => This Inner Loop Header: Depth=2			; CHECK-NEXT: # => This Inner Loop Header: Depth=2
	; CHECK-NEXT: testb %dil, %dil			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_37			; CHECK-NEXT: je .LBB0_37
	; CHECK-NEXT: .LBB0_18: # %b188			; CHECK-NEXT: .LBB0_18: # %b188
	; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: testb %r10b, %r10b			; CHECK-NEXT: testb %r10b, %r10b
	; CHECK-NEXT: jne .LBB0_4			; CHECK-NEXT: jne .LBB0_4
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_19: # %a30b294			; CHECK-NEXT: .LBB0_19: # %a30b294
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # => This Inner Loop Header: Depth=2			; CHECK-NEXT: # => This Inner Loop Header: Depth=2
	; CHECK-NEXT: testb %dil, %dil			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_19			; CHECK-NEXT: je .LBB0_19
	; CHECK-NEXT: .LBB0_4: # %a33b			; CHECK-NEXT: .LBB0_4: # %a33b
	; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: movl %ecx, %eax			; CHECK-NEXT: movl %esi, %eax
	; CHECK-NEXT: orl %r8d, %eax			; CHECK-NEXT: orl %r8d, %eax
	; CHECK-NEXT: movl %eax, %r9d			; CHECK-NEXT: movl %eax, %r9d
	; CHECK-NEXT: shrl $31, %r9d			; CHECK-NEXT: shrl $31, %r9d
	; CHECK-NEXT: testl %eax, %eax			; CHECK-NEXT: testl %eax, %eax
	; CHECK-NEXT: jns .LBB0_20			; CHECK-NEXT: jns .LBB0_20
	; CHECK-NEXT: .LBB0_5: # %a50b			; CHECK-NEXT: .LBB0_5: # %a50b
	; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: movl %r8d, %eax			; CHECK-NEXT: movl %r8d, %eax
	; CHECK-NEXT: orl %ecx, %eax			; CHECK-NEXT: orl %esi, %eax
	; CHECK-NEXT: movl %eax, %r11d			; CHECK-NEXT: movl %eax, %r11d
	; CHECK-NEXT: shrl $31, %r11d			; CHECK-NEXT: shrl $31, %r11d
	; CHECK-NEXT: testl %eax, %eax			; CHECK-NEXT: testl %eax, %eax
	; CHECK-NEXT: jns .LBB0_26			; CHECK-NEXT: jns .LBB0_26
	; CHECK-NEXT: .LBB0_6: # %a57b			; CHECK-NEXT: .LBB0_6: # %a57b
	; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: testb %r9b, %r9b			; CHECK-NEXT: testb %r9b, %r9b
	; CHECK-NEXT: je .LBB0_30			; CHECK-NEXT: je .LBB0_30
	[33 lines not shown]
	; CHECK-NEXT: .LBB0_22: # %b463			; CHECK-NEXT: .LBB0_22: # %b463
	; CHECK-NEXT: # in Loop: Header=BB0_20 Depth=2			; CHECK-NEXT: # in Loop: Header=BB0_20 Depth=2
	; CHECK-NEXT: testb %dil, %dil			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_23			; CHECK-NEXT: je .LBB0_23
	; CHECK-NEXT: .LBB0_20: # %b341			; CHECK-NEXT: .LBB0_20: # %b341
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # => This Loop Header: Depth=2			; CHECK-NEXT: # => This Loop Header: Depth=2
	; CHECK-NEXT: # Child Loop BB0_21 Depth 3			; CHECK-NEXT: # Child Loop BB0_21 Depth 3
	; CHECK-NEXT: testq %rcx, %rcx			; CHECK-NEXT: testq %rsi, %rsi
	; CHECK-NEXT: js .LBB0_22			; CHECK-NEXT: js .LBB0_22
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_21: # %a35b			; CHECK-NEXT: .LBB0_21: # %a35b
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # Parent Loop BB0_20 Depth=2			; CHECK-NEXT: # Parent Loop BB0_20 Depth=2
	; CHECK-NEXT: # => This Inner Loop Header: Depth=3			; CHECK-NEXT: # => This Inner Loop Header: Depth=3
	; CHECK-NEXT: testb %dil, %dil			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_21			; CHECK-NEXT: je .LBB0_21
	; CHECK-NEXT: jmp .LBB0_22			; CHECK-NEXT: jmp .LBB0_22
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_28: # %b1016			; CHECK-NEXT: .LBB0_28: # %b1016
	; CHECK-NEXT: # in Loop: Header=BB0_26 Depth=2			; CHECK-NEXT: # in Loop: Header=BB0_26 Depth=2
	; CHECK-NEXT: testq %rcx, %rcx			; CHECK-NEXT: testq %rsi, %rsi
	; CHECK-NEXT: jle .LBB0_6			; CHECK-NEXT: jle .LBB0_6
	; CHECK-NEXT: .LBB0_26: # %b858			; CHECK-NEXT: .LBB0_26: # %b858
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # => This Loop Header: Depth=2			; CHECK-NEXT: # => This Loop Header: Depth=2
	; CHECK-NEXT: # Child Loop BB0_38 Depth 3			; CHECK-NEXT: # Child Loop BB0_38 Depth 3
	; CHECK-NEXT: # Child Loop BB0_29 Depth 3			; CHECK-NEXT: # Child Loop BB0_29 Depth 3
	; CHECK-NEXT: testq %rsi, %rsi			; CHECK-NEXT: testq %rdx, %rdx
	; CHECK-NEXT: js .LBB0_27			; CHECK-NEXT: js .LBB0_27
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_38: # %a53b			; CHECK-NEXT: .LBB0_38: # %a53b
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # Parent Loop BB0_26 Depth=2			; CHECK-NEXT: # Parent Loop BB0_26 Depth=2
	; CHECK-NEXT: # => This Inner Loop Header: Depth=3			; CHECK-NEXT: # => This Inner Loop Header: Depth=3
	; CHECK-NEXT: testb %dil, %dil			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_38			; CHECK-NEXT: je .LBB0_38
	; CHECK-NEXT: .LBB0_27: # %b879			; CHECK-NEXT: .LBB0_27: # %b879
	; CHECK-NEXT: # in Loop: Header=BB0_26 Depth=2			; CHECK-NEXT: # in Loop: Header=BB0_26 Depth=2
	; CHECK-NEXT: testb %r10b, %r10b			; CHECK-NEXT: testb %r10b, %r10b
	; CHECK-NEXT: jne .LBB0_28			; CHECK-NEXT: jne .LBB0_28
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_29: # %a53b1019			; CHECK-NEXT: .LBB0_29: # %a53b1019
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # Parent Loop BB0_26 Depth=2			; CHECK-NEXT: # Parent Loop BB0_26 Depth=2
	; CHECK-NEXT: # => This Inner Loop Header: Depth=3			; CHECK-NEXT: # => This Inner Loop Header: Depth=3
	; CHECK-NEXT: testq %rsi, %rsi			; CHECK-NEXT: testq %rdx, %rdx
	; CHECK-NEXT: jle .LBB0_29			; CHECK-NEXT: jle .LBB0_29
	; CHECK-NEXT: jmp .LBB0_28			; CHECK-NEXT: jmp .LBB0_28
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_32: # %b1263			; CHECK-NEXT: .LBB0_32: # %b1263
	; CHECK-NEXT: # in Loop: Header=BB0_30 Depth=2			; CHECK-NEXT: # in Loop: Header=BB0_30 Depth=2
	; CHECK-NEXT: testq %rsi, %rsi			; CHECK-NEXT: testq %rdx, %rdx
	; CHECK-NEXT: jle .LBB0_7			; CHECK-NEXT: jle .LBB0_7
	; CHECK-NEXT: .LBB0_30: # %b1117			; CHECK-NEXT: .LBB0_30: # %b1117
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # => This Loop Header: Depth=2			; CHECK-NEXT: # => This Loop Header: Depth=2
	; CHECK-NEXT: # Child Loop BB0_39 Depth 3			; CHECK-NEXT: # Child Loop BB0_39 Depth 3
	; CHECK-NEXT: # Child Loop BB0_33 Depth 3			; CHECK-NEXT: # Child Loop BB0_33 Depth 3
	; CHECK-NEXT: testq %rcx, %rcx			; CHECK-NEXT: testq %rsi, %rsi
	; CHECK-NEXT: js .LBB0_31			; CHECK-NEXT: js .LBB0_31
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_39: # %a63b			; CHECK-NEXT: .LBB0_39: # %a63b
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # Parent Loop BB0_30 Depth=2			; CHECK-NEXT: # Parent Loop BB0_30 Depth=2
	; CHECK-NEXT: # => This Inner Loop Header: Depth=3			; CHECK-NEXT: # => This Inner Loop Header: Depth=3
	; CHECK-NEXT: testq %rcx, %rcx			; CHECK-NEXT: testq %rsi, %rsi
	; CHECK-NEXT: jle .LBB0_39			; CHECK-NEXT: jle .LBB0_39
	; CHECK-NEXT: .LBB0_31: # %b1139			; CHECK-NEXT: .LBB0_31: # %b1139
	; CHECK-NEXT: # in Loop: Header=BB0_30 Depth=2			; CHECK-NEXT: # in Loop: Header=BB0_30 Depth=2
	; CHECK-NEXT: testq %rcx, %rcx			; CHECK-NEXT: testq %rsi, %rsi
	; CHECK-NEXT: jle .LBB0_32			; CHECK-NEXT: jle .LBB0_32
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_33: # %a63b1266			; CHECK-NEXT: .LBB0_33: # %a63b1266
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # Parent Loop BB0_30 Depth=2			; CHECK-NEXT: # Parent Loop BB0_30 Depth=2
	; CHECK-NEXT: # => This Inner Loop Header: Depth=3			; CHECK-NEXT: # => This Inner Loop Header: Depth=3
	; CHECK-NEXT: testq %rcx, %rcx			; CHECK-NEXT: testq %rsi, %rsi
	; CHECK-NEXT: jle .LBB0_33			; CHECK-NEXT: jle .LBB0_33
	; CHECK-NEXT: jmp .LBB0_32			; CHECK-NEXT: jmp .LBB0_32
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_25: # %b712			; CHECK-NEXT: .LBB0_25: # %b712
	; CHECK-NEXT: # in Loop: Header=BB0_23 Depth=2			; CHECK-NEXT: # in Loop: Header=BB0_23 Depth=2
	; CHECK-NEXT: testb %dil, %dil			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_5			; CHECK-NEXT: je .LBB0_5
	; CHECK-NEXT: .LBB0_23: # %b535			; CHECK-NEXT: .LBB0_23: # %b535
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # => This Loop Header: Depth=2			; CHECK-NEXT: # => This Loop Header: Depth=2
	; CHECK-NEXT: # Child Loop BB0_24 Depth 3			; CHECK-NEXT: # Child Loop BB0_24 Depth 3
	; CHECK-NEXT: testq %rsi, %rsi			; CHECK-NEXT: testq %rdx, %rdx
	; CHECK-NEXT: js .LBB0_25			; CHECK-NEXT: js .LBB0_25
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: .LBB0_24: # %a45b			; CHECK-NEXT: .LBB0_24: # %a45b
	; CHECK-NEXT: # Parent Loop BB0_3 Depth=1			; CHECK-NEXT: # Parent Loop BB0_3 Depth=1
	; CHECK-NEXT: # Parent Loop BB0_23 Depth=2			; CHECK-NEXT: # Parent Loop BB0_23 Depth=2
	; CHECK-NEXT: # => This Inner Loop Header: Depth=3			; CHECK-NEXT: # => This Inner Loop Header: Depth=3
	; CHECK-NEXT: testb %dil, %dil			; CHECK-NEXT: testb %dil, %dil
	; CHECK-NEXT: je .LBB0_24			; CHECK-NEXT: je .LBB0_24
	[458 lines not shown]

llvm/test/CodeGen/X86/MachineSink-eflags.ll

	[10 lines not shown]
	%4 = type <{ %5, i8, i32, i32, [4 x i64], [4 x i64], [4 x i64], [4 x i64], [4 x i64] }>			%4 = type <{ %5, i8, i32, i32, [4 x i64], [4 x i64], [4 x i64], [4 x i64], [4 x i64] }>
	%5 = type <{ void (i32), i8, i32 (i8, ...) }>			%5 = type <{ void (i32), i8, i32 (i8, ...) }>

	define void @foo(i8* nocapture %_stubArgs) nounwind {			define void @foo(i8* nocapture %_stubArgs) nounwind {
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: subq $152, %rsp			; CHECK-NEXT: subq $152, %rsp
	; CHECK-NEXT: movq 48(%rdi), %rax			; CHECK-NEXT: movq 48(%rdi), %rax
	; CHECK-NEXT: movl 64(%rdi), %edx			; CHECK-NEXT: movl 64(%rdi), %ecx
	; CHECK-NEXT: movl $200, %esi			; CHECK-NEXT: movl $200, %esi
	; CHECK-NEXT: addl 68(%rdi), %esi			; CHECK-NEXT: addl 68(%rdi), %esi
	; CHECK-NEXT: imull $46, %edx, %ecx			; CHECK-NEXT: imull $46, %ecx, %edx
	; CHECK-NEXT: addq %rsi, %rcx
	; CHECK-NEXT: shlq $4, %rcx
	; CHECK-NEXT: imull $47, %edx, %edx
	; CHECK-NEXT: addq %rsi, %rdx			; CHECK-NEXT: addq %rsi, %rdx
	; CHECK-NEXT: shlq $4, %rdx			; CHECK-NEXT: shlq $4, %rdx
	; CHECK-NEXT: movaps (%rax,%rdx), %xmm0			; CHECK-NEXT: imull $47, %ecx, %ecx
				; CHECK-NEXT: addq %rsi, %rcx
				; CHECK-NEXT: shlq $4, %rcx
	; CHECK-NEXT: cmpl $0, (%rdi)			; CHECK-NEXT: cmpl $0, (%rdi)
	; CHECK-NEXT: jne .LBB0_1			; CHECK-NEXT: jne .LBB0_1
	; CHECK-NEXT: # %bb.2: # %entry			; CHECK-NEXT: # %bb.2: # %entry
	; CHECK-NEXT: xorps %xmm1, %xmm1			; CHECK-NEXT: xorps %xmm0, %xmm0
	; CHECK-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: jmp .LBB0_3
	; CHECK-NEXT: je .LBB0_4
	; CHECK-NEXT: jmp .LBB0_5
	; CHECK-NEXT: .LBB0_1:			; CHECK-NEXT: .LBB0_1:
				; CHECK-NEXT: movaps (%rax,%rdx), %xmm0
				; CHECK-NEXT: .LBB0_3: # %entry
	; CHECK-NEXT: movaps (%rax,%rcx), %xmm1			; CHECK-NEXT: movaps (%rax,%rcx), %xmm1
	; CHECK-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: jne .LBB0_5			; CHECK-NEXT: jne .LBB0_5
	; CHECK-NEXT: .LBB0_4: # %entry			; CHECK-NEXT: # %bb.4: # %entry
	; CHECK-NEXT: xorps %xmm0, %xmm0			; CHECK-NEXT: xorps %xmm1, %xmm1
	; CHECK-NEXT: .LBB0_5: # %entry			; CHECK-NEXT: .LBB0_5: # %entry
	; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)			; CHECK-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)
	; CHECK-NEXT: addq $152, %rsp			; CHECK-NEXT: addq $152, %rsp
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%i0 = alloca i8*, align 8			%i0 = alloca i8*, align 8
	%i2 = alloca i8*, align 8			%i2 = alloca i8*, align 8
	%b.i = alloca [16 x <2 x double>], align 16			%b.i = alloca [16 x <2 x double>], align 16
	%conv = bitcast i8* %_stubArgs to i32*			%conv = bitcast i8* %_stubArgs to i32*
	%tmp1 = load i32, i32* %conv, align 4			%tmp1 = load i32, i32* %conv, align 4
	[53 lines not shown]

llvm/test/CodeGen/X86/avx2-masked-gather.ll

	[352 lines not shown]
	; X64-NEXT: vextracti128 $1, %ymm0, %xmm5			; X64-NEXT: vextracti128 $1, %ymm0, %xmm5
	; X64-NEXT: vpgatherqd %xmm5, (,%ymm3), %xmm4			; X64-NEXT: vpgatherqd %xmm5, (,%ymm3), %xmm4
	; X64-NEXT: vpgatherqd %xmm0, (,%ymm2), %xmm1			; X64-NEXT: vpgatherqd %xmm0, (,%ymm2), %xmm1
	; X64-NEXT: vinserti128 $1, %xmm4, %ymm1, %ymm0			; X64-NEXT: vinserti128 $1, %xmm4, %ymm1, %ymm0
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; NOGATHER-LABEL: masked_gather_v8i32:			; NOGATHER-LABEL: masked_gather_v8i32:
	; NOGATHER: # %bb.0: # %entry			; NOGATHER: # %bb.0: # %entry
	; NOGATHER-NEXT: vmovdqa (%rdi), %ymm3			; NOGATHER-NEXT: vmovdqa (%rdi), %ymm2
	; NOGATHER-NEXT: vmovdqa 32(%rdi), %ymm2
	; NOGATHER-NEXT: vpsllw $15, %xmm0, %xmm0			; NOGATHER-NEXT: vpsllw $15, %xmm0, %xmm0
	; NOGATHER-NEXT: vpacksswb %xmm0, %xmm0, %xmm0			; NOGATHER-NEXT: vpacksswb %xmm0, %xmm0, %xmm0
	; NOGATHER-NEXT: vpmovmskb %xmm0, %eax			; NOGATHER-NEXT: vpmovmskb %xmm0, %eax
	; NOGATHER-NEXT: testb $1, %al			; NOGATHER-NEXT: testb $1, %al
	; NOGATHER-NEXT: je .LBB6_2			; NOGATHER-NEXT: je .LBB6_2
	; NOGATHER-NEXT: # %bb.1: # %cond.load			; NOGATHER-NEXT: # %bb.1: # %cond.load
	; NOGATHER-NEXT: vmovq %xmm3, %rcx			; NOGATHER-NEXT: vmovq %xmm2, %rcx
	; NOGATHER-NEXT: vpinsrd $0, (%rcx), %xmm1, %xmm0			; NOGATHER-NEXT: vpinsrd $0, (%rcx), %xmm1, %xmm0
	; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]			; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: .LBB6_2: # %else			; NOGATHER-NEXT: .LBB6_2: # %else
	; NOGATHER-NEXT: testb $2, %al			; NOGATHER-NEXT: testb $2, %al
	; NOGATHER-NEXT: je .LBB6_4			; NOGATHER-NEXT: je .LBB6_4
	; NOGATHER-NEXT: # %bb.3: # %cond.load1			; NOGATHER-NEXT: # %bb.3: # %cond.load1
	; NOGATHER-NEXT: vpextrq $1, %xmm3, %rcx			; NOGATHER-NEXT: vpextrq $1, %xmm2, %rcx
	; NOGATHER-NEXT: vpinsrd $1, (%rcx), %xmm1, %xmm0			; NOGATHER-NEXT: vpinsrd $1, (%rcx), %xmm1, %xmm0
	; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]			; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: .LBB6_4: # %else2			; NOGATHER-NEXT: .LBB6_4: # %else2
	; NOGATHER-NEXT: vextractf128 $1, %ymm3, %xmm0			; NOGATHER-NEXT: vextractf128 $1, %ymm2, %xmm0
	; NOGATHER-NEXT: testb $4, %al			; NOGATHER-NEXT: testb $4, %al
	; NOGATHER-NEXT: jne .LBB6_5			; NOGATHER-NEXT: je .LBB6_6
	; NOGATHER-NEXT: # %bb.6: # %else5			; NOGATHER-NEXT: # %bb.5: # %cond.load4
				; NOGATHER-NEXT: vmovq %xmm0, %rcx
				; NOGATHER-NEXT: vpinsrd $2, (%rcx), %xmm1, %xmm2
				; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm2[0,1,2,3],ymm1[4,5,6,7]
				; NOGATHER-NEXT: .LBB6_6: # %else5
	; NOGATHER-NEXT: testb $8, %al			; NOGATHER-NEXT: testb $8, %al
	; NOGATHER-NEXT: jne .LBB6_7			; NOGATHER-NEXT: je .LBB6_8
				; NOGATHER-NEXT: # %bb.7: # %cond.load7
				; NOGATHER-NEXT: vpextrq $1, %xmm0, %rcx
				; NOGATHER-NEXT: vpinsrd $3, (%rcx), %xmm1, %xmm0
				; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: .LBB6_8: # %else8			; NOGATHER-NEXT: .LBB6_8: # %else8
				; NOGATHER-NEXT: vmovdqa 32(%rdi), %ymm0
	; NOGATHER-NEXT: testb $16, %al			; NOGATHER-NEXT: testb $16, %al
	; NOGATHER-NEXT: jne .LBB6_9			; NOGATHER-NEXT: je .LBB6_10
				; NOGATHER-NEXT: # %bb.9: # %cond.load10
				; NOGATHER-NEXT: vmovq %xmm0, %rcx
				; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm2
				; NOGATHER-NEXT: vpinsrd $0, (%rcx), %xmm2, %xmm2
				; NOGATHER-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; NOGATHER-NEXT: .LBB6_10: # %else11			; NOGATHER-NEXT: .LBB6_10: # %else11
	; NOGATHER-NEXT: testb $32, %al			; NOGATHER-NEXT: testb $32, %al
	; NOGATHER-NEXT: je .LBB6_12			; NOGATHER-NEXT: je .LBB6_12
	; NOGATHER-NEXT: .LBB6_11: # %cond.load13			; NOGATHER-NEXT: # %bb.11: # %cond.load13
	; NOGATHER-NEXT: vpextrq $1, %xmm2, %rcx			; NOGATHER-NEXT: vpextrq $1, %xmm0, %rcx
	; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm0			; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm2
	; NOGATHER-NEXT: vpinsrd $1, (%rcx), %xmm0, %xmm0			; NOGATHER-NEXT: vpinsrd $1, (%rcx), %xmm2, %xmm2
	; NOGATHER-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm1			; NOGATHER-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; NOGATHER-NEXT: .LBB6_12: # %else14			; NOGATHER-NEXT: .LBB6_12: # %else14
	; NOGATHER-NEXT: vextractf128 $1, %ymm2, %xmm0			; NOGATHER-NEXT: vextractf128 $1, %ymm0, %xmm0
	; NOGATHER-NEXT: testb $64, %al			; NOGATHER-NEXT: testb $64, %al
	; NOGATHER-NEXT: jne .LBB6_13			; NOGATHER-NEXT: jne .LBB6_13
	; NOGATHER-NEXT: # %bb.14: # %else17			; NOGATHER-NEXT: # %bb.14: # %else17
	; NOGATHER-NEXT: testb $-128, %al			; NOGATHER-NEXT: testb $-128, %al
	; NOGATHER-NEXT: jne .LBB6_15			; NOGATHER-NEXT: jne .LBB6_15
	; NOGATHER-NEXT: .LBB6_16: # %else20			; NOGATHER-NEXT: .LBB6_16: # %else20
	; NOGATHER-NEXT: vmovaps %ymm1, %ymm0			; NOGATHER-NEXT: vmovaps %ymm1, %ymm0
	; NOGATHER-NEXT: retq			; NOGATHER-NEXT: retq
	; NOGATHER-NEXT: .LBB6_5: # %cond.load4
	; NOGATHER-NEXT: vmovq %xmm0, %rcx
	; NOGATHER-NEXT: vpinsrd $2, (%rcx), %xmm1, %xmm3
	; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm3[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: testb $8, %al
	; NOGATHER-NEXT: je .LBB6_8
	; NOGATHER-NEXT: .LBB6_7: # %cond.load7
	; NOGATHER-NEXT: vpextrq $1, %xmm0, %rcx
	; NOGATHER-NEXT: vpinsrd $3, (%rcx), %xmm1, %xmm0
	; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: testb $16, %al
	; NOGATHER-NEXT: je .LBB6_10
	; NOGATHER-NEXT: .LBB6_9: # %cond.load10
	; NOGATHER-NEXT: vmovq %xmm2, %rcx
	; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm0
	; NOGATHER-NEXT: vpinsrd $0, (%rcx), %xmm0, %xmm0
	; NOGATHER-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm1
	; NOGATHER-NEXT: testb $32, %al
	; NOGATHER-NEXT: jne .LBB6_11
	; NOGATHER-NEXT: jmp .LBB6_12
	; NOGATHER-NEXT: .LBB6_13: # %cond.load16			; NOGATHER-NEXT: .LBB6_13: # %cond.load16
	; NOGATHER-NEXT: vmovq %xmm0, %rcx			; NOGATHER-NEXT: vmovq %xmm0, %rcx
	; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm2			; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm2
	; NOGATHER-NEXT: vpinsrd $2, (%rcx), %xmm2, %xmm2			; NOGATHER-NEXT: vpinsrd $2, (%rcx), %xmm2, %xmm2
	; NOGATHER-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1			; NOGATHER-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; NOGATHER-NEXT: testb $-128, %al			; NOGATHER-NEXT: testb $-128, %al
	; NOGATHER-NEXT: je .LBB6_16			; NOGATHER-NEXT: je .LBB6_16
	; NOGATHER-NEXT: .LBB6_15: # %cond.load19			; NOGATHER-NEXT: .LBB6_15: # %cond.load19
	[32 lines not shown]
	; X64-NEXT: vextracti128 $1, %ymm0, %xmm5			; X64-NEXT: vextracti128 $1, %ymm0, %xmm5
	; X64-NEXT: vgatherqps %xmm5, (,%ymm3), %xmm4			; X64-NEXT: vgatherqps %xmm5, (,%ymm3), %xmm4
	; X64-NEXT: vgatherqps %xmm0, (,%ymm2), %xmm1			; X64-NEXT: vgatherqps %xmm0, (,%ymm2), %xmm1
	; X64-NEXT: vinsertf128 $1, %xmm4, %ymm1, %ymm0			; X64-NEXT: vinsertf128 $1, %xmm4, %ymm1, %ymm0
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; NOGATHER-LABEL: masked_gather_v8float:			; NOGATHER-LABEL: masked_gather_v8float:
	; NOGATHER: # %bb.0: # %entry			; NOGATHER: # %bb.0: # %entry
	; NOGATHER-NEXT: vmovdqa (%rdi), %ymm3			; NOGATHER-NEXT: vmovdqa (%rdi), %ymm2
	; NOGATHER-NEXT: vmovdqa 32(%rdi), %ymm2
	; NOGATHER-NEXT: vpsllw $15, %xmm0, %xmm0			; NOGATHER-NEXT: vpsllw $15, %xmm0, %xmm0
	; NOGATHER-NEXT: vpacksswb %xmm0, %xmm0, %xmm0			; NOGATHER-NEXT: vpacksswb %xmm0, %xmm0, %xmm0
	; NOGATHER-NEXT: vpmovmskb %xmm0, %eax			; NOGATHER-NEXT: vpmovmskb %xmm0, %eax
	; NOGATHER-NEXT: testb $1, %al			; NOGATHER-NEXT: testb $1, %al
	; NOGATHER-NEXT: je .LBB7_2			; NOGATHER-NEXT: je .LBB7_2
	; NOGATHER-NEXT: # %bb.1: # %cond.load			; NOGATHER-NEXT: # %bb.1: # %cond.load
	; NOGATHER-NEXT: vmovq %xmm3, %rcx			; NOGATHER-NEXT: vmovq %xmm2, %rcx
	; NOGATHER-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; NOGATHER-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0],ymm1[1,2,3,4,5,6,7]			; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0],ymm1[1,2,3,4,5,6,7]
	; NOGATHER-NEXT: .LBB7_2: # %else			; NOGATHER-NEXT: .LBB7_2: # %else
	; NOGATHER-NEXT: testb $2, %al			; NOGATHER-NEXT: testb $2, %al
	; NOGATHER-NEXT: je .LBB7_4			; NOGATHER-NEXT: je .LBB7_4
	; NOGATHER-NEXT: # %bb.3: # %cond.load1			; NOGATHER-NEXT: # %bb.3: # %cond.load1
	; NOGATHER-NEXT: vpextrq $1, %xmm3, %rcx			; NOGATHER-NEXT: vpextrq $1, %xmm2, %rcx
	; NOGATHER-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0],mem[0],xmm1[2,3]			; NOGATHER-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0],mem[0],xmm1[2,3]
	; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]			; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: .LBB7_4: # %else2			; NOGATHER-NEXT: .LBB7_4: # %else2
	; NOGATHER-NEXT: vextractf128 $1, %ymm3, %xmm0			; NOGATHER-NEXT: vextractf128 $1, %ymm2, %xmm0
	; NOGATHER-NEXT: testb $4, %al			; NOGATHER-NEXT: testb $4, %al
	; NOGATHER-NEXT: jne .LBB7_5			; NOGATHER-NEXT: je .LBB7_6
	; NOGATHER-NEXT: # %bb.6: # %else5			; NOGATHER-NEXT: # %bb.5: # %cond.load4
				; NOGATHER-NEXT: vmovq %xmm0, %rcx
				; NOGATHER-NEXT: vinsertps {{.*#+}} xmm2 = xmm1[0,1],mem[0],xmm1[3]
				; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm2[0,1,2,3],ymm1[4,5,6,7]
				; NOGATHER-NEXT: .LBB7_6: # %else5
	; NOGATHER-NEXT: testb $8, %al			; NOGATHER-NEXT: testb $8, %al
	; NOGATHER-NEXT: jne .LBB7_7			; NOGATHER-NEXT: je .LBB7_8
				; NOGATHER-NEXT: # %bb.7: # %cond.load7
				; NOGATHER-NEXT: vpextrq $1, %xmm0, %rcx
				; NOGATHER-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],mem[0]
				; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: .LBB7_8: # %else8			; NOGATHER-NEXT: .LBB7_8: # %else8
				; NOGATHER-NEXT: vmovdqa 32(%rdi), %ymm0
	; NOGATHER-NEXT: testb $16, %al			; NOGATHER-NEXT: testb $16, %al
	; NOGATHER-NEXT: jne .LBB7_9			; NOGATHER-NEXT: je .LBB7_10
				; NOGATHER-NEXT: # %bb.9: # %cond.load10
				; NOGATHER-NEXT: vmovq %xmm0, %rcx
				; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm2
				; NOGATHER-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
				; NOGATHER-NEXT: vblendps {{.*#+}} xmm2 = xmm3[0],xmm2[1,2,3]
				; NOGATHER-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; NOGATHER-NEXT: .LBB7_10: # %else11			; NOGATHER-NEXT: .LBB7_10: # %else11
	; NOGATHER-NEXT: testb $32, %al			; NOGATHER-NEXT: testb $32, %al
	; NOGATHER-NEXT: je .LBB7_12			; NOGATHER-NEXT: je .LBB7_12
	; NOGATHER-NEXT: .LBB7_11: # %cond.load13			; NOGATHER-NEXT: # %bb.11: # %cond.load13
	; NOGATHER-NEXT: vpextrq $1, %xmm2, %rcx			; NOGATHER-NEXT: vpextrq $1, %xmm0, %rcx
	; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm0			; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm2
	; NOGATHER-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]			; NOGATHER-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0],mem[0],xmm2[2,3]
	; NOGATHER-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm1			; NOGATHER-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; NOGATHER-NEXT: .LBB7_12: # %else14			; NOGATHER-NEXT: .LBB7_12: # %else14
	; NOGATHER-NEXT: vextractf128 $1, %ymm2, %xmm0			; NOGATHER-NEXT: vextractf128 $1, %ymm0, %xmm0
	; NOGATHER-NEXT: testb $64, %al			; NOGATHER-NEXT: testb $64, %al
	; NOGATHER-NEXT: jne .LBB7_13			; NOGATHER-NEXT: jne .LBB7_13
	; NOGATHER-NEXT: # %bb.14: # %else17			; NOGATHER-NEXT: # %bb.14: # %else17
	; NOGATHER-NEXT: testb $-128, %al			; NOGATHER-NEXT: testb $-128, %al
	; NOGATHER-NEXT: jne .LBB7_15			; NOGATHER-NEXT: jne .LBB7_15
	; NOGATHER-NEXT: .LBB7_16: # %else20			; NOGATHER-NEXT: .LBB7_16: # %else20
	; NOGATHER-NEXT: vmovaps %ymm1, %ymm0			; NOGATHER-NEXT: vmovaps %ymm1, %ymm0
	; NOGATHER-NEXT: retq			; NOGATHER-NEXT: retq
	; NOGATHER-NEXT: .LBB7_5: # %cond.load4
	; NOGATHER-NEXT: vmovq %xmm0, %rcx
	; NOGATHER-NEXT: vinsertps {{.*#+}} xmm3 = xmm1[0,1],mem[0],xmm1[3]
	; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm3[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: testb $8, %al
	; NOGATHER-NEXT: je .LBB7_8
	; NOGATHER-NEXT: .LBB7_7: # %cond.load7
	; NOGATHER-NEXT: vpextrq $1, %xmm0, %rcx
	; NOGATHER-NEXT: vinsertps {{.*#+}} xmm0 = xmm1[0,1,2],mem[0]
	; NOGATHER-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],ymm1[4,5,6,7]
	; NOGATHER-NEXT: testb $16, %al
	; NOGATHER-NEXT: je .LBB7_10
	; NOGATHER-NEXT: .LBB7_9: # %cond.load10
	; NOGATHER-NEXT: vmovq %xmm2, %rcx
	; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm0
	; NOGATHER-NEXT: vmovd {{.*#+}} xmm3 = mem[0],zero,zero,zero
	; NOGATHER-NEXT: vpblendw {{.*#+}} xmm0 = xmm3[0,1],xmm0[2,3,4,5,6,7]
	; NOGATHER-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm1
	; NOGATHER-NEXT: testb $32, %al
	; NOGATHER-NEXT: jne .LBB7_11
	; NOGATHER-NEXT: jmp .LBB7_12
	; NOGATHER-NEXT: .LBB7_13: # %cond.load16			; NOGATHER-NEXT: .LBB7_13: # %cond.load16
	; NOGATHER-NEXT: vmovq %xmm0, %rcx			; NOGATHER-NEXT: vmovq %xmm0, %rcx
	; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm2			; NOGATHER-NEXT: vextractf128 $1, %ymm1, %xmm2
	; NOGATHER-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],mem[0],xmm2[3]			; NOGATHER-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1],mem[0],xmm2[3]
	; NOGATHER-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1			; NOGATHER-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; NOGATHER-NEXT: testb $-128, %al			; NOGATHER-NEXT: testb $-128, %al
	; NOGATHER-NEXT: je .LBB7_16			; NOGATHER-NEXT: je .LBB7_16
	; NOGATHER-NEXT: .LBB7_15: # %cond.load19			; NOGATHER-NEXT: .LBB7_15: # %cond.load19

llvm/test/CodeGen/X86/cmovcmov.ll

	; NOCMOV-NEXT: pushl %edi			; NOCMOV-NEXT: pushl %edi
	; NOCMOV-NEXT: pushl %esi			; NOCMOV-NEXT: pushl %esi
	; NOCMOV-NEXT: flds {{[0-9]+}}(%esp)			; NOCMOV-NEXT: flds {{[0-9]+}}(%esp)
	; NOCMOV-NEXT: flds {{[0-9]+}}(%esp)			; NOCMOV-NEXT: flds {{[0-9]+}}(%esp)
	; NOCMOV-NEXT: fucompp			; NOCMOV-NEXT: fucompp
	; NOCMOV-NEXT: fnstsw %ax			; NOCMOV-NEXT: fnstsw %ax
	; NOCMOV-NEXT: # kill: def $ah killed $ah killed $ax			; NOCMOV-NEXT: # kill: def $ah killed $ah killed $ax
	; NOCMOV-NEXT: sahf			; NOCMOV-NEXT: sahf
	; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %eax			; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %ecx
	; NOCMOV-NEXT: jne .LBB4_3			; NOCMOV-NEXT: jne .LBB4_3
	; NOCMOV-NEXT: # %bb.1: # %entry			; NOCMOV-NEXT: # %bb.1: # %entry
	; NOCMOV-NEXT: jp .LBB4_3			; NOCMOV-NEXT: jp .LBB4_3
	; NOCMOV-NEXT: # %bb.2: # %entry			; NOCMOV-NEXT: # %bb.2: # %entry
	; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %eax			; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %ecx
	; NOCMOV-NEXT: .LBB4_3: # %entry			; NOCMOV-NEXT: .LBB4_3: # %entry
	; NOCMOV-NEXT: movl (%eax), %ecx
	; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %edx			; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %edx
	; NOCMOV-NEXT: jne .LBB4_6			; NOCMOV-NEXT: jne .LBB4_6
	; NOCMOV-NEXT: # %bb.4: # %entry			; NOCMOV-NEXT: # %bb.4: # %entry
	; NOCMOV-NEXT: jp .LBB4_6			; NOCMOV-NEXT: jp .LBB4_6
	; NOCMOV-NEXT: # %bb.5: # %entry			; NOCMOV-NEXT: # %bb.5: # %entry
	; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %edx			; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %edx
	; NOCMOV-NEXT: .LBB4_6: # %entry			; NOCMOV-NEXT: .LBB4_6: # %entry
	; NOCMOV-NEXT: movl {{[0-9]+}}(%esp), %eax			; NOCMOV-NEXT: movl {{[0-9]+}}(%esp), %eax
	; NOCMOV-NEXT: movl (%edx), %edx
	; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %esi			; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %esi
	; NOCMOV-NEXT: jne .LBB4_9			; NOCMOV-NEXT: jne .LBB4_9
	; NOCMOV-NEXT: # %bb.7: # %entry			; NOCMOV-NEXT: # %bb.7: # %entry
	; NOCMOV-NEXT: jp .LBB4_9			; NOCMOV-NEXT: jp .LBB4_9
	; NOCMOV-NEXT: # %bb.8: # %entry			; NOCMOV-NEXT: # %bb.8: # %entry
	; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %esi			; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %esi
	; NOCMOV-NEXT: .LBB4_9: # %entry			; NOCMOV-NEXT: .LBB4_9: # %entry
				; NOCMOV-NEXT: movl (%ecx), %ecx
				; NOCMOV-NEXT: movl (%edx), %edx
	; NOCMOV-NEXT: movl (%esi), %esi			; NOCMOV-NEXT: movl (%esi), %esi
	; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %edi			; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %edi
	; NOCMOV-NEXT: jne .LBB4_12			; NOCMOV-NEXT: jne .LBB4_12
	; NOCMOV-NEXT: # %bb.10: # %entry			; NOCMOV-NEXT: # %bb.10: # %entry
	; NOCMOV-NEXT: jp .LBB4_12			; NOCMOV-NEXT: jp .LBB4_12
	; NOCMOV-NEXT: # %bb.11: # %entry			; NOCMOV-NEXT: # %bb.11: # %entry
	; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %edi			; NOCMOV-NEXT: leal {{[0-9]+}}(%esp), %edi
	; NOCMOV-NEXT: .LBB4_12: # %entry			; NOCMOV-NEXT: .LBB4_12: # %entry

llvm/test/CodeGen/X86/select.ll

	; MCU: # %bb.0:			; MCU: # %bb.0:
	; MCU-NEXT: pushl %ebp			; MCU-NEXT: pushl %ebp
	; MCU-NEXT: pushl %ebx			; MCU-NEXT: pushl %ebx
	; MCU-NEXT: pushl %edi			; MCU-NEXT: pushl %edi
	; MCU-NEXT: pushl %esi			; MCU-NEXT: pushl %esi
	; MCU-NEXT: testb $1, %al			; MCU-NEXT: testb $1, %al
	; MCU-NEXT: jne .LBB7_1			; MCU-NEXT: jne .LBB7_1
	; MCU-NEXT: # %bb.2:			; MCU-NEXT: # %bb.2:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %eax			; MCU-NEXT: leal {{[0-9]+}}(%esp), %edi
	; MCU-NEXT: movl (%eax), %eax
	; MCU-NEXT: je .LBB7_5			; MCU-NEXT: je .LBB7_5
	; MCU-NEXT: .LBB7_4:			; MCU-NEXT: .LBB7_4:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %ecx			; MCU-NEXT: leal {{[0-9]+}}(%esp), %ecx
	; MCU-NEXT: movl (%ecx), %ecx
	; MCU-NEXT: je .LBB7_8			; MCU-NEXT: je .LBB7_8
	; MCU-NEXT: .LBB7_7:			; MCU-NEXT: .LBB7_7:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %esi			; MCU-NEXT: leal {{[0-9]+}}(%esp), %esi
	; MCU-NEXT: movl (%esi), %esi
	; MCU-NEXT: je .LBB7_11			; MCU-NEXT: je .LBB7_11
	; MCU-NEXT: .LBB7_10:			; MCU-NEXT: .LBB7_10:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %edi			; MCU-NEXT: leal {{[0-9]+}}(%esp), %ebp
	; MCU-NEXT: movl (%edi), %edi
	; MCU-NEXT: je .LBB7_14			; MCU-NEXT: je .LBB7_14
	; MCU-NEXT: .LBB7_13:			; MCU-NEXT: .LBB7_13:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %ebx
	; MCU-NEXT: movl (%ebx), %ebx
	; MCU-NEXT: je .LBB7_17
	; MCU-NEXT: .LBB7_16:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %ebp
	; MCU-NEXT: jmp .LBB7_18
	; MCU-NEXT: .LBB7_1:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %eax			; MCU-NEXT: leal {{[0-9]+}}(%esp), %eax
	; MCU-NEXT: movl (%eax), %eax			; MCU-NEXT: jmp .LBB7_15
				; MCU-NEXT: .LBB7_1:
				; MCU-NEXT: leal {{[0-9]+}}(%esp), %edi
	; MCU-NEXT: jne .LBB7_4			; MCU-NEXT: jne .LBB7_4
	; MCU-NEXT: .LBB7_5:			; MCU-NEXT: .LBB7_5:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %ecx			; MCU-NEXT: leal {{[0-9]+}}(%esp), %ecx
	; MCU-NEXT: movl (%ecx), %ecx
	; MCU-NEXT: jne .LBB7_7			; MCU-NEXT: jne .LBB7_7
	; MCU-NEXT: .LBB7_8:			; MCU-NEXT: .LBB7_8:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %esi			; MCU-NEXT: leal {{[0-9]+}}(%esp), %esi
	; MCU-NEXT: movl (%esi), %esi
	; MCU-NEXT: jne .LBB7_10			; MCU-NEXT: jne .LBB7_10
	; MCU-NEXT: .LBB7_11:			; MCU-NEXT: .LBB7_11:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %edi			; MCU-NEXT: leal {{[0-9]+}}(%esp), %ebp
	; MCU-NEXT: movl (%edi), %edi
	; MCU-NEXT: jne .LBB7_13			; MCU-NEXT: jne .LBB7_13
	; MCU-NEXT: .LBB7_14:			; MCU-NEXT: .LBB7_14:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %ebx			; MCU-NEXT: leal {{[0-9]+}}(%esp), %eax
	; MCU-NEXT: movl (%ebx), %ebx			; MCU-NEXT: .LBB7_15:
				; MCU-NEXT: movl (%edi), %ebx
				; MCU-NEXT: movl (%ecx), %edi
				; MCU-NEXT: movl (%esi), %esi
				; MCU-NEXT: movl (%ebp), %ecx
				; MCU-NEXT: movl (%eax), %eax
	; MCU-NEXT: jne .LBB7_16			; MCU-NEXT: jne .LBB7_16
	; MCU-NEXT: .LBB7_17:			; MCU-NEXT: # %bb.17:
				; MCU-NEXT: leal {{[0-9]+}}(%esp), %ebp
				; MCU-NEXT: jmp .LBB7_18
				; MCU-NEXT: .LBB7_16:
	; MCU-NEXT: leal {{[0-9]+}}(%esp), %ebp			; MCU-NEXT: leal {{[0-9]+}}(%esp), %ebp
	; MCU-NEXT: .LBB7_18:			; MCU-NEXT: .LBB7_18:
	; MCU-NEXT: movl (%ebp), %ebp			; MCU-NEXT: movl (%ebp), %ebp
	; MCU-NEXT: decl %ebp			; MCU-NEXT: decl %ebp
	; MCU-NEXT: decl %ebx
	; MCU-NEXT: decl %edi
	; MCU-NEXT: decl %esi
	; MCU-NEXT: decl %ecx
	; MCU-NEXT: decl %eax			; MCU-NEXT: decl %eax
	; MCU-NEXT: movl %eax, 20(%edx)			; MCU-NEXT: decl %ecx
	; MCU-NEXT: movl %ecx, 16(%edx)			; MCU-NEXT: decl %esi
				; MCU-NEXT: decl %edi
				; MCU-NEXT: decl %ebx
				; MCU-NEXT: movl %ebx, 20(%edx)
				; MCU-NEXT: movl %edi, 16(%edx)
	; MCU-NEXT: movl %esi, 12(%edx)			; MCU-NEXT: movl %esi, 12(%edx)
	; MCU-NEXT: movl %edi, 8(%edx)			; MCU-NEXT: movl %ecx, 8(%edx)
	; MCU-NEXT: movl %ebx, 4(%edx)			; MCU-NEXT: movl %eax, 4(%edx)
	; MCU-NEXT: movl %ebp, (%edx)			; MCU-NEXT: movl %ebp, (%edx)
	; MCU-NEXT: popl %esi			; MCU-NEXT: popl %esi
	; MCU-NEXT: popl %edi			; MCU-NEXT: popl %edi
	; MCU-NEXT: popl %ebx			; MCU-NEXT: popl %ebx
	; MCU-NEXT: popl %ebp			; MCU-NEXT: popl %ebp
	; MCU-NEXT: retl			; MCU-NEXT: retl
	%x = select i1 %c, <6 x i32> %src1, <6 x i32> %src2			%x = select i1 %c, <6 x i32> %src1, <6 x i32> %src2
	%val = sub <6 x i32> %x, < i32 1, i32 1, i32 1, i32 1, i32 1, i32 1 >			%val = sub <6 x i32> %x, < i32 1, i32 1, i32 1, i32 1, i32 1, i32 1 >

llvm/test/CodeGen/X86/vec_int_to_fp.ll


;		;
; Load Unsigned Integer to Float		; Load Unsigned Integer to Float
;		;

define <4 x float> @uitofp_load_4i64_to_4f32(<4 x i64> *%a) {		define <4 x float> @uitofp_load_4i64_to_4f32(<4 x i64> *%a) {
; SSE2-LABEL: uitofp_load_4i64_to_4f32:		; SSE2-LABEL: uitofp_load_4i64_to_4f32:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: movdqa (%rdi), %xmm2
; SSE2-NEXT: movdqa 16(%rdi), %xmm0		; SSE2-NEXT: movdqa 16(%rdi), %xmm0
; SSE2-NEXT: movq %xmm0, %rax		; SSE2-NEXT: movq %xmm0, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB83_1		; SSE2-NEXT: js .LBB83_1
; SSE2-NEXT: # %bb.2:		; SSE2-NEXT: # %bb.2:
; SSE2-NEXT: cvtsi2ss %rax, %xmm1		; SSE2-NEXT: cvtsi2ss %rax, %xmm1
; SSE2-NEXT: jmp .LBB83_3		; SSE2-NEXT: jmp .LBB83_3
; SSE2-NEXT: .LBB83_1:		; SSE2-NEXT: .LBB83_1:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: cvtsi2ss %rax, %xmm1		; SSE2-NEXT: cvtsi2ss %rax, %xmm1
; SSE2-NEXT: addss %xmm1, %xmm1		; SSE2-NEXT: addss %xmm1, %xmm1
; SSE2-NEXT: .LBB83_3:		; SSE2-NEXT: .LBB83_3:
		; SSE2-NEXT: movdqa (%rdi), %xmm2
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
; SSE2-NEXT: movq %xmm0, %rax		; SSE2-NEXT: movq %xmm0, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB83_4		; SSE2-NEXT: js .LBB83_4
; SSE2-NEXT: # %bb.5:		; SSE2-NEXT: # %bb.5:
; SSE2-NEXT: cvtsi2ss %rax, %xmm3		; SSE2-NEXT: cvtsi2ss %rax, %xmm3
; SSE2-NEXT: jmp .LBB83_6		; SSE2-NEXT: jmp .LBB83_6
; SSE2-NEXT: .LBB83_4:		; SSE2-NEXT: .LBB83_4:
; AVX-NEXT: retq
%ld = load <4 x i8>, <4 x i8> *%a		%ld = load <4 x i8>, <4 x i8> *%a
%cvt = uitofp <4 x i8> %ld to <4 x float>		%cvt = uitofp <4 x i8> %ld to <4 x float>
ret <4 x float> %cvt		ret <4 x float> %cvt
}		}

define <8 x float> @uitofp_load_8i64_to_8f32(<8 x i64> *%a) {		define <8 x float> @uitofp_load_8i64_to_8f32(<8 x i64> *%a) {
; SSE2-LABEL: uitofp_load_8i64_to_8f32:		; SSE2-LABEL: uitofp_load_8i64_to_8f32:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: movdqa (%rdi), %xmm5
; SSE2-NEXT: movdqa 16(%rdi), %xmm0		; SSE2-NEXT: movdqa 16(%rdi), %xmm0
; SSE2-NEXT: movdqa 32(%rdi), %xmm2
; SSE2-NEXT: movdqa 48(%rdi), %xmm1
; SSE2-NEXT: movq %xmm0, %rax		; SSE2-NEXT: movq %xmm0, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB87_1		; SSE2-NEXT: js .LBB87_1
; SSE2-NEXT: # %bb.2:		; SSE2-NEXT: # %bb.2:
; SSE2-NEXT: cvtsi2ss %rax, %xmm3		; SSE2-NEXT: cvtsi2ss %rax, %xmm2
; SSE2-NEXT: jmp .LBB87_3		; SSE2-NEXT: jmp .LBB87_3
; SSE2-NEXT: .LBB87_1:		; SSE2-NEXT: .LBB87_1:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: cvtsi2ss %rax, %xmm3		; SSE2-NEXT: cvtsi2ss %rax, %xmm2
; SSE2-NEXT: addss %xmm3, %xmm3		; SSE2-NEXT: addss %xmm2, %xmm2
; SSE2-NEXT: .LBB87_3:		; SSE2-NEXT: .LBB87_3:
		; SSE2-NEXT: movdqa (%rdi), %xmm3
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
; SSE2-NEXT: movq %xmm0, %rax		; SSE2-NEXT: movq %xmm0, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB87_4		; SSE2-NEXT: js .LBB87_4
; SSE2-NEXT: # %bb.5:		; SSE2-NEXT: # %bb.5:
; SSE2-NEXT: cvtsi2ss %rax, %xmm4		; SSE2-NEXT: cvtsi2ss %rax, %xmm1
; SSE2-NEXT: jmp .LBB87_6		; SSE2-NEXT: jmp .LBB87_6
; SSE2-NEXT: .LBB87_4:		; SSE2-NEXT: .LBB87_4:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: cvtsi2ss %rax, %xmm4		; SSE2-NEXT: cvtsi2ss %rax, %xmm1
; SSE2-NEXT: addss %xmm4, %xmm4		; SSE2-NEXT: addss %xmm1, %xmm1
; SSE2-NEXT: .LBB87_6:		; SSE2-NEXT: .LBB87_6:
; SSE2-NEXT: movq %xmm5, %rax		; SSE2-NEXT: movq %xmm3, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB87_7		; SSE2-NEXT: js .LBB87_7
; SSE2-NEXT: # %bb.8:		; SSE2-NEXT: # %bb.8:
; SSE2-NEXT: xorps %xmm0, %xmm0		; SSE2-NEXT: xorps %xmm0, %xmm0
; SSE2-NEXT: cvtsi2ss %rax, %xmm0		; SSE2-NEXT: cvtsi2ss %rax, %xmm0
; SSE2-NEXT: jmp .LBB87_9		; SSE2-NEXT: jmp .LBB87_9
; SSE2-NEXT: .LBB87_7:		; SSE2-NEXT: .LBB87_7:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: xorps %xmm0, %xmm0		; SSE2-NEXT: xorps %xmm0, %xmm0
; SSE2-NEXT: cvtsi2ss %rax, %xmm0		; SSE2-NEXT: cvtsi2ss %rax, %xmm0
; SSE2-NEXT: addss %xmm0, %xmm0		; SSE2-NEXT: addss %xmm0, %xmm0
; SSE2-NEXT: .LBB87_9:		; SSE2-NEXT: .LBB87_9:
; SSE2-NEXT: pshufd {{.*#+}} xmm5 = xmm5[2,3,2,3]		; SSE2-NEXT: movdqa 48(%rdi), %xmm6
; SSE2-NEXT: movq %xmm5, %rax		; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm3[2,3,2,3]
		; SSE2-NEXT: movq %xmm3, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB87_10		; SSE2-NEXT: js .LBB87_10
; SSE2-NEXT: # %bb.11:		; SSE2-NEXT: # %bb.11:
; SSE2-NEXT: cvtsi2ss %rax, %xmm6		; SSE2-NEXT: cvtsi2ss %rax, %xmm4
; SSE2-NEXT: jmp .LBB87_12		; SSE2-NEXT: jmp .LBB87_12
; SSE2-NEXT: .LBB87_10:		; SSE2-NEXT: .LBB87_10:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: cvtsi2ss %rax, %xmm6		; SSE2-NEXT: cvtsi2ss %rax, %xmm4
; SSE2-NEXT: addss %xmm6, %xmm6		; SSE2-NEXT: addss %xmm4, %xmm4
; SSE2-NEXT: .LBB87_12:		; SSE2-NEXT: .LBB87_12:
; SSE2-NEXT: movq %xmm1, %rax		; SSE2-NEXT: movq %xmm6, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB87_13		; SSE2-NEXT: js .LBB87_13
; SSE2-NEXT: # %bb.14:		; SSE2-NEXT: # %bb.14:
; SSE2-NEXT: xorps %xmm5, %xmm5		; SSE2-NEXT: xorps %xmm3, %xmm3
; SSE2-NEXT: cvtsi2ss %rax, %xmm5		; SSE2-NEXT: cvtsi2ss %rax, %xmm3
; SSE2-NEXT: jmp .LBB87_15		; SSE2-NEXT: jmp .LBB87_15
; SSE2-NEXT: .LBB87_13:		; SSE2-NEXT: .LBB87_13:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: xorps %xmm5, %xmm5		; SSE2-NEXT: xorps %xmm3, %xmm3
; SSE2-NEXT: cvtsi2ss %rax, %xmm5		; SSE2-NEXT: cvtsi2ss %rax, %xmm3
; SSE2-NEXT: addss %xmm5, %xmm5		; SSE2-NEXT: addss %xmm3, %xmm3
; SSE2-NEXT: .LBB87_15:		; SSE2-NEXT: .LBB87_15:
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]		; SSE2-NEXT: movdqa 32(%rdi), %xmm5
; SSE2-NEXT: movq %xmm1, %rax		; SSE2-NEXT: pshufd {{.*#+}} xmm6 = xmm6[2,3,2,3]
		; SSE2-NEXT: movq %xmm6, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB87_16		; SSE2-NEXT: js .LBB87_16
; SSE2-NEXT: # %bb.17:		; SSE2-NEXT: # %bb.17:
; SSE2-NEXT: cvtsi2ss %rax, %xmm7		; SSE2-NEXT: xorps %xmm6, %xmm6
		; SSE2-NEXT: cvtsi2ss %rax, %xmm6
; SSE2-NEXT: jmp .LBB87_18		; SSE2-NEXT: jmp .LBB87_18
; SSE2-NEXT: .LBB87_16:		; SSE2-NEXT: .LBB87_16:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: cvtsi2ss %rax, %xmm7		; SSE2-NEXT: xorps %xmm6, %xmm6
; SSE2-NEXT: addss %xmm7, %xmm7		; SSE2-NEXT: cvtsi2ss %rax, %xmm6
		; SSE2-NEXT: addss %xmm6, %xmm6
; SSE2-NEXT: .LBB87_18:		; SSE2-NEXT: .LBB87_18:
; SSE2-NEXT: unpcklps {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1]		; SSE2-NEXT: unpcklps {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm6[0],xmm0[1],xmm6[1]		; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1]
; SSE2-NEXT: movq %xmm2, %rax		; SSE2-NEXT: movq %xmm5, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB87_19		; SSE2-NEXT: js .LBB87_19
; SSE2-NEXT: # %bb.20:		; SSE2-NEXT: # %bb.20:
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: cvtsi2ss %rax, %xmm1		; SSE2-NEXT: cvtsi2ss %rax, %xmm1
; SSE2-NEXT: jmp .LBB87_21		; SSE2-NEXT: jmp .LBB87_21
; SSE2-NEXT: .LBB87_19:		; SSE2-NEXT: .LBB87_19:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: xorps %xmm1, %xmm1		; SSE2-NEXT: xorps %xmm1, %xmm1
; SSE2-NEXT: cvtsi2ss %rax, %xmm1		; SSE2-NEXT: cvtsi2ss %rax, %xmm1
; SSE2-NEXT: addss %xmm1, %xmm1		; SSE2-NEXT: addss %xmm1, %xmm1
; SSE2-NEXT: .LBB87_21:		; SSE2-NEXT: .LBB87_21:
; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm3[0]		; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0]
; SSE2-NEXT: unpcklps {{.*#+}} xmm5 = xmm5[0],xmm7[0],xmm5[1],xmm7[1]		; SSE2-NEXT: unpcklps {{.*#+}} xmm3 = xmm3[0],xmm6[0],xmm3[1],xmm6[1]
; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[2,3,2,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm5[2,3,2,3]
; SSE2-NEXT: movq %xmm2, %rax		; SSE2-NEXT: movq %xmm2, %rax
; SSE2-NEXT: testq %rax, %rax		; SSE2-NEXT: testq %rax, %rax
; SSE2-NEXT: js .LBB87_22		; SSE2-NEXT: js .LBB87_22
; SSE2-NEXT: # %bb.23:		; SSE2-NEXT: # %bb.23:
; SSE2-NEXT: xorps %xmm2, %xmm2		; SSE2-NEXT: xorps %xmm2, %xmm2
; SSE2-NEXT: cvtsi2ss %rax, %xmm2		; SSE2-NEXT: cvtsi2ss %rax, %xmm2
; SSE2-NEXT: jmp .LBB87_24		; SSE2-NEXT: jmp .LBB87_24
; SSE2-NEXT: .LBB87_22:		; SSE2-NEXT: .LBB87_22:
; SSE2-NEXT: movq %rax, %rcx		; SSE2-NEXT: movq %rax, %rcx
; SSE2-NEXT: shrq %rcx		; SSE2-NEXT: shrq %rcx
; SSE2-NEXT: andl $1, %eax		; SSE2-NEXT: andl $1, %eax
; SSE2-NEXT: orq %rcx, %rax		; SSE2-NEXT: orq %rcx, %rax
; SSE2-NEXT: xorps %xmm2, %xmm2		; SSE2-NEXT: xorps %xmm2, %xmm2
; SSE2-NEXT: cvtsi2ss %rax, %xmm2		; SSE2-NEXT: cvtsi2ss %rax, %xmm2
; SSE2-NEXT: addss %xmm2, %xmm2		; SSE2-NEXT: addss %xmm2, %xmm2
; SSE2-NEXT: .LBB87_24:		; SSE2-NEXT: .LBB87_24:
; SSE2-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]		; SSE2-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
; SSE2-NEXT: movlhps {{.*#+}} xmm1 = xmm1[0],xmm5[0]		; SSE2-NEXT: movlhps {{.*#+}} xmm1 = xmm1[0],xmm3[0]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: uitofp_load_8i64_to_8f32:		; SSE41-LABEL: uitofp_load_8i64_to_8f32:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: movdqa (%rdi), %xmm4		; SSE41-NEXT: movdqa (%rdi), %xmm4
; SSE41-NEXT: movdqa 16(%rdi), %xmm5		; SSE41-NEXT: movdqa 16(%rdi), %xmm5
; SSE41-NEXT: movdqa 32(%rdi), %xmm6		; SSE41-NEXT: movdqa 32(%rdi), %xmm6
; SSE41-NEXT: movdqa 48(%rdi), %xmm2		; SSE41-NEXT: movdqa 48(%rdi), %xmm2

Revision Contents

Diff 302198

llvm/lib/CodeGen/MachineSink.cpp

llvm/test/CodeGen/RISCV/select-optimize-multiple.ll

llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll

llvm/test/CodeGen/X86/MachineSink-eflags.ll

llvm/test/CodeGen/X86/avx2-masked-gather.ll

llvm/test/CodeGen/X86/cmovcmov.ll

llvm/test/CodeGen/X86/select.ll

llvm/test/CodeGen/X86/vec_int_to_fp.ll
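The test updates in this revision show loads moving later in the control flow. For example, in cmovcmov.ll `movl (%ecx), %ecx` now appears after `.LBB4_9` instead of right after `.LBB4_3`. Below is a minimal sketch of one conservative legality condition behind that kind of move. It assumes a toy CFG, and the names `ToyBlock`, `collectBetween` and `storeOnPathBetween` are hypothetical; they are not LLVM's MachineBasicBlock API or the patch's actual code. The setup is that `from` dominates `to` and `to` post-dominates `from`. In that case, every block reachable from `from` without passing through `to` lies on some `from`→`to` path. A load can then be moved only if none of those blocks may write memory.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy CFG node: successor indices plus a conservative
// "may write memory" flag. Not LLVM's MachineBasicBlock API.
struct ToyBlock {
  std::vector<std::size_t> succs;
  bool mayStore = false;
};

// Collect every block reachable from `cur` along forward edges without
// passing through `to`. `seen` also guards against looping forever on cycles.
static void collectBetween(const std::vector<ToyBlock> &cfg, std::size_t cur,
                           std::size_t to, std::set<std::size_t> &seen) {
  for (std::size_t s : cfg[cur].succs) {
    if (s == to || !seen.insert(s).second)
      continue;
    collectBetween(cfg, s, to, seen);
  }
}

// Returns true if any block between `from` and `to` may store.
// Assumes `from` dominates `to` and `to` post-dominates `from`.
// Stores inside `from` itself (after the load) and inside `to` (before the
// insertion point) are deliberately NOT examined here.
bool storeOnPathBetween(const std::vector<ToyBlock> &cfg, std::size_t from,
                        std::size_t to) {
  assert(from < cfg.size() && to < cfg.size());
  std::set<std::size_t> between;
  collectBetween(cfg, from, to, between);
  for (std::size_t b : between)
    if (cfg[b].mayStore)
      return true;
  return false;
}
```

Take a diamond 0 → {1, 2} → 3. Marking block 2 as storing makes `storeOnPathBetween(cfg, 0, 3)` return true, so the load would have to stay put. A store in block 3 itself is not counted. The sketch also does no alias analysis and puts no cap on how many blocks it visits. A real pass would need to handle both.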
