
[DAGCombine] Do several rounds of combine.
Needs ReviewPublic

Authored by deadalnix on May 26 2017, 1:25 AM.

Details

Summary

DAGCombine does one pass over all nodes and then gives up. This is suboptimal because one node's combine can expose a combine for another node.

Because some of the combines create infinite loops, we limit things to 3 rounds. Ideally, we'd like to continue combining as long as something is combinable, but that would require removing the combines that do and undo each other, so in the meantime, capping at 3 rounds seems like a good tradeoff.

Event Timeline

deadalnix created this revision.May 26 2017, 1:25 AM
deadalnix updated this revision to Diff 100373.May 26 2017, 1:34 AM

Improve checks in constant_sextload_v8i16_to_v8i32 .

RKSimon edited reviewers, added: efriedma; removed: eli.friedman.May 26 2017, 2:27 AM
RKSimon added a subscriber: RKSimon.

Effects on performance? How many of these cases are just where multiple nodes were created and not added to the worklist?

@RKSimon Most of these cases aren't due to nodes not being added to the worklist, but to patterns that are somewhat deep - such as anything depending on KnownBits. Consider the following DAG:

C
|
B
|
A

Now imagine node A is visited and no combine was found. Then B and C are visited and C is transformed into D:

D
|
B
|
A

In this scenario, A was already visited and isn't added back to the worklist; only B is. We could recursively add uses of uses to the worklist, but it turns out this is not very efficient, as it requires adding half of the DAG back to the worklist on average every time a combine is done. Plus, it wouldn't catch combines based on getNodeIfExists and various other cases. Simply adding direct uses catches the most common cases, and having an extra pass over all nodes catches everything else, and is more economical as long as more than 2 combines are done (on average).

As for performance (I assume you meant compile time impact), here are the times I get for a test suite run:

Without this patch:

real    2m41.665s
user    26m33.900s
sys     2m44.728s

With this patch:

real    2m42.729s
user    26m45.772s
sys     2m42.796s

It doesn't look like the impact is that significant, and it seems worth it to me.

filcab added a subscriber: filcab.May 26 2017, 6:06 AM

As for performance (I assume you meant compile time impact), here are the times I get for a test suite run:

Without this patch:

real    2m41.665s
user    26m33.900s
sys     2m44.728s

With this patch:

real    2m42.729s
user    26m45.772s
sys     2m42.796s

It doesn't look like the impact is that significant, and it seems worth it to me.

I think for things like this, a run of llc on an LTO'd clang .bc would be better to show if there's a problem.

Thank you,
Filipe

spatel added a subscriber: spatel.May 26 2017, 7:23 AM

I usually do not work with clang. Do you have instructions I can follow to get that .bc file?

I usually do not work with clang. Do you have instructions I can follow to get that .bc file?

You need to build llvm+clang as mentioned in the getting started guide, but set LLVM_ENABLE_LTO=ON in cmake.
clang was an example of "a large code-base" with LTO. I'm ok with timings from other big programs (but clang is usually easier to compare with other people).

I'm getting a bunch of

/usr/bin/ranlib: TypeDatabaseVisitor.cpp.o: plugin needed to handle lto object

when doing so.

Alright, so I ended up being able to create an LTO build of clang. I'm not sure how to get the .bc file to do the benchmarking.

So, on the full clang .bc, post-optimization:

Without this patch:

real    9m26.373s
user    9m24.256s
sys     0m1.948s

With this patch:

real    9m44.870s
user    9m42.484s
sys     0m2.228s
deadalnix updated this revision to Diff 101028.Jun 1 2017, 8:13 AM

Rebase, fix merge conflicts.

niravd added a subscriber: niravd.Jun 1 2017, 9:42 AM
arsenm added inline comments.Jun 1 2017, 11:52 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1315–1318

Spelling

Can we get the same optimization results if we run the DAG combiner pass multiple times instead of iterating internally within the pass?
If so, I think it is better to run the pass, without the internal iteration, multiple times, providing more flexibility to control the tradeoff between code quality and compilation time (e.g. based on optimization level).

davide added a subscriber: davide.Jun 1 2017, 1:48 PM

If I read these numbers correctly, this makes instruction selection ~4-5% slower on large testcases (in your case, an LTO build of clang).
This is quite a bit, and it requires further justification (i.e. it needs to be backed by the performance improvement we get on these testcases for the additional compile time we pay).

@inouehrs That wouldn't be the same, as this will bail when no more combines are found.
@davide It's more like 3% as far as I can tell. The sad truth here, looking into it, is that there are a lot of combines that do and undo themselves, and most of the perf hit comes from there. These transforms are the very reason I limited the number of iterations to begin with.

As far as benefits go, it's very helpful for code that's legalized. I've been caring about this a lot lately, because I have workloads that involve a lot of cryptography, and the gains are pretty substantial. In addition, I have various transforms that I haven't published yet because they simply cannot kick in reliably without the mechanism in this diff. Any pattern that matches more than 2 levels deep suffers from not having this patch.

One thing that could be done is to only enable this when optimizations are on. We can then weed out the cases that loop over time, and enable it consistently once there are not so many of them anymore.

davide added a comment.Jun 1 2017, 2:48 PM

@inouehrs That wouldn't be the same, as this will bail when no more combines are found.
@davide It's more like 3% as far as I can tell. The sad truth here, looking into it, is that there are a lot of combines that do and undo themselves, and most of the perf hit comes from there. These transforms are the very reason I limited the number of iterations to begin with.

As far as benefits go, it's very helpful for code that's legalized. I've been caring about this a lot lately, because I have workloads that involve a lot of cryptography, and the gains are pretty substantial. In addition, I have various transforms that I haven't published yet because they simply cannot kick in reliably without the mechanism in this diff. Any pattern that matches more than 2 levels deep suffers from not having this patch.

One thing that could be done is to only enable this when optimizations are on. We can then weed out the cases that loop over time, and enable it consistently once there are not so many of them anymore.

This wouldn't actually help the worst case, i.e. LTO, where optimizations are almost always on.
I think the impact is still quite significant, and we should have numbers before trying to be more aggressive, as SelectionDAG is already very expensive.

niravd added a comment.Jun 1 2017, 8:23 PM

It sounds like the underlying cause of this is related to the conversation on diamond nodes you started on llvm-dev. Reading your description, I think I understand the issue better, and I believe I have a solution that should fix the underlying issue without needing to loop over all nodes multiple times.

The problem with optimizations on a deeper DAG is that we cannot leverage the fact that we add both changed/new nodes and their users to guarantee that we consider the key node (i.e. the one that triggers the optimization). You can work around this by checking for the optimization from all nodes you match on. This increases the number of checks, but much less than revisiting all nodes. You should be able to avoid doing the check off of each node you match against and only check alternating layers (which might let you do the optimization check only at the fork and join points of a diamond shape).

RKSimon edited edge metadata.Aug 1 2017, 10:36 AM

@niravd Did you look into your alternative approach any further? @deadalnix has updated D33840, which was supposed to help reduce the performance impact, but the results are mild at best.

deadalnix marked an inline comment as done.Aug 9 2017, 2:32 PM

@RKSimon I'll find a way to make this fast, or find an alternative such as activating it only in some specific situations. In addition to solving my specific problem, it seems to improve numerous other things, especially in the AMDGPU backend. In any case, I think D33840 is a good thing and we should proceed with it.

deadalnix updated this revision to Diff 183737.Jan 26 2019, 5:42 PM

I'd like to resurrect this diff.

To me, it seems like the proper thing to do, at least at some optimisation levels. As long as there are combines that we know how to do, we should be doing them.

I see a lot of people adding more and more clever patterns over time that are just not necessary, as they are combinations of simpler patterns. This is a losing battle anyway, because there is a combinatorial explosion. In addition, many patterns are just not very useful in isolation, but are useful for putting things into a canonical form that can be picked up later on. These types of transformations are not beneficial right now unless they are very simple.

I was able to write various patches that leverage this canonicalisation mechanism and get great results for the use cases I'm interested in (mostly cryptography, which involves a lot of big integer manipulation). I'm sure people interested in the performance of other types of code will find it beneficial as well. Ultimately, I could do without this patch, but the alternative boils down to doing something similar in specific cases - the ones I care about - instead of all cases, which seems like a big missed opportunity.

The results are better in numerous cases, terrific in some, and there are a few regressions in terms of codegen quality. I'm happy to work on these regressions, but I'd like to ensure I'm not wasting my time if this patch has no chance of getting in.

TL;DR: Not doing this creates work and complexity, in addition to making some optimisations prohibitively complex. I think we should do this, at least at higher optimisation levels such as O2/O3.

craig.topper added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1362

Don't you need to reset CombineNodes on each iteration?

Have we looked into visiting the nodes bottom up instead of top down as is currently done? It would require an explicit topological sort instead of using whatever order the previous legalizer step left us with.

deadalnix marked an inline comment as done.Jan 26 2019, 8:07 PM

I'm not sure how processing nodes bottom up really helps. Problems arise when you want to use patterns of depth > 2, because then nodes beyond the direct parent/child are not processed again, even though such a pattern may now be available. It seems to me that both top-down and bottom-up approaches would suffer from the same problem, but maybe there is something I'm missing.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1362

Both would be correct, but the semantics obviously differ. Let me investigate this. Good catch.

Currently we largely visit nodes before their operands. So when we simplify the operands several layers down, we don't revisit the later nodes to match patterns. But if we visited the operands first, then the first visit of the later nodes would occur afterwards, so we wouldn't need to revisit them.

deadalnix marked an inline comment as done.Jan 27 2019, 6:41 AM

After thinking more about this, I do not think going bottom up is a good idea. All patterns match a node plus its operands, and so benefit from the operands having been combined already. I do not think changing all the patterns to match uses rather than operands is a good idea: it is a ton of work, and it is unclear there is any benefit at all.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1362

So I was playing with resetting/not resetting this, and even removing it altogether.

The first thing worth noticing is that the intent here is very similar, but more limited in scope. However, results sometimes differ depending on whether this is executed or not. That is because various patterns in there depend on execution order - something I've noticed before, for instance in D41235.

I do not think we need to reset it, as all existing nodes are inserted into the worklist at the start, so this ends up only adding nodes to the worklist that have been created by a preceding successful combine. I noticed zero codegen difference when resetting the set between iterations, so resetting seems to just create more work without any benefit.

For bottom up, I was referring to the order of the nodes in the initial worklist, not the patterns themselves. Right now we visit the last instruction in the basic block first. What if we visited nodes with no operands first and the last instruction last? That would match what IR instcombine does.

I get it now. We had mismatched interpretations of which side is up and which side is down. If this is indeed the order in which nodes are processed, then it would be beneficial to change it.

xbolva00 added inline comments.
test/CodeGen/X86/mmx-cvt.ll
271 ↗(On Diff #183737)

Weird instruction increase

test/CodeGen/X86/not-and-simplify.ll
22 ↗(On Diff #183737)

Bug?

RKSimon added inline comments.Jan 27 2019, 10:28 AM
test/CodeGen/X86/not-and-simplify.ll
22 ↗(On Diff #183737)

Alive says its ok: https://rise4fun.com/Alive/MDFW

And it replaces a load with a rematerializable constant

deadalnix marked 5 inline comments as done.Jan 27 2019, 6:51 PM
deadalnix added inline comments.
test/CodeGen/X86/mmx-cvt.ll
271 ↗(On Diff #183737)

Yes. It looks like there are a few regressions, even though overall codegen looks better. I'm happy to investigate them, but I'd like to know whether this is likely to go forward in principle before investing too much effort that would be wasted.

test/CodeGen/X86/not-and-simplify.ll
22 ↗(On Diff #183737)

I assume this is an improvement, right?

test/CodeGen/X86/shift-double-x86_64.ll
16 ↗(On Diff #183737)

I'm not sure what is going on here, I assume there is a bug somewhere.

test/CodeGen/X86/shift-double.ll
300 ↗(On Diff #183737)

ditto

test/CodeGen/X86/unfold-masked-merge-scalar-constmask-innerouter.ll
40 ↗(On Diff #183737)

Is there any difference between these two in terms of codegen?

Have you done any compile time measurements of this? InstCombine already gets blamed for being a compile time problem. I'm worried about repeating that criticism with DAG combine.

Also do you have examples of the kinds of things that you're seeing in your workloads? Would be good to have test cases for those so we can understand them and so they don't regress in the future if we go forward with this.

Also why is only X86 showing changes from this?

My concern about this is that DAGCombine is relatively expensive, and doing it 3 times will make a nontrivial difference in large vector compute blocks. At the very minimum we'd want to disable the most expensive merge operations for most of the rounds (store merge and maybe some vector combines).

The only reason this is necessary is that some transforms rely on looking deeper than their operands to decide if they're valid, which means their triggering condition may change without them being put on the worklist. Last time this was being discussed, I suggested changing the key node in such transforms so they were either the earliest node or a user of the earliest node, so DAG changes would happen before the transform was no longer considered.

Maybe we could do the dual and more aggressively add user nodes to the worklist on a combine. I've looked at a few test cases on the rebased patch, and an initial glance at the debug trace seems to indicate both are due to a change that enables a SimplifyDemandedBits-based combine from a node a few steps deep. If we could add further descendants in those cases (or alternatively recenter the SimplifyDemandedBits combines on computationally earlier nodes), we may be able to fold this back down to one pass with only marginal additional work.

Hi @craig.topper,

First, about the kind of code I'm trying to get better codegen for: it's mostly large integer manipulation. I already added a fair amount of reduced test cases in addcarry.ll/subcarry.ll. I'm at a stage where the patterns I have to work with are somewhat deep; see D57302 for an example. These patterns do not do anything useful if other transforms cannot pick up from where they left off.

Hi @niravd ,

I wanted to collect some performance data today, but I ran into problems generating a large .bc file of an existing program, as LTO seems to be broken on my end for some reason. I did explore the idea of adding more descendants; for instance, sext/zext seems valuable to punch through. But it seems to me you would want to add more and more of these over time, and you'd end up with a complicated version of what we have here that misses opportunities.

You raise a good point though. This should only run iteration n if iteration n - 1 actually changed the DAG. You'd expect that the code shouldn't run 3 times that often, but in practice it does, because there are a lot of transforms that do A -> B -> A. This is why I limited it to 3 and not anything more. I do think that investigating these and removing them over time is probably preferable.

I suggested changing the key node in such transforms so they were either the earliest node or a user of the earliest node, so DAG changes would happen before the transform was no longer considered.

I do not think this is a very realistic path forward, as there are numerous transforms looking 2+ levels deep. As you rightly point out, anything SimplifyDemandedBits-based does, for instance.

I think as a first step we can hide this behavior behind a flag that defaults to not doing the transform, until we can tune things a bit more and figure out in which cases we want to do this. Would that be acceptable to you?

Hi @craig.topper,

First, about the kind of code I'm trying to get better codegen for: it's mostly large integer manipulation. I already added a fair amount of reduced test cases in addcarry.ll/subcarry.ll. I'm at a stage where the patterns I have to work with are somewhat deep; see D57302 for an example. These patterns do not do anything useful if other transforms cannot pick up from where they left off.

I don't see changes to addcarry.ll and subborrow.ll in this patch. So do we not have test cases from your workloads that show the benefit of this patch?

Are there non-X86 changes from this patch as well that haven't been captured here? Or is X86 somehow the only target affected by this?

I don't see changes to addcarry.ll and subborrow.ll in this patch. So do we not have test cases from your workloads that show the benefit of this patch?

That is because I have other transforms that have no effect without this patch. To be able to do anything more than what's already done, I need to linearize carry propagation as in D57302. Then I can do various transforms such as D57317 (I have others to submit). Without reworking the carry propagation, there is no hope of getting nice chains of adc (or whatever the equivalent is on the target), and without this, breaking diamond propagation doesn't work reliably, as the patterns are too deep.

I can submit other patches, but at this time it looks like it would only clutter the review queue, as they'd all depend on D57302, which doesn't work reliably without this one. As mentioned earlier, punching through zext/sext and other ops that often find themselves on the path of carries works as well for me, but it seems like a missed opportunity considering what we get out of SimplifyDemandedBits and the like.

Are there non-X86 changes from this patch as well that haven't been captured here? Or is X86 somehow the only target affected by this?

There are other changes, most of them in the AMDGPU backend. I will get them sorted out before committing anything, but I would like us to decide on a path forward so I can avoid maintaining them manually for a long time. The x86 ones are easy to maintain thanks to utils/update_llc_test_checks.py.

I was thinking about ways to reduce the overhead created by this change. I came up with D57367, which is an alternative that focuses on nodes likely to benefit from the change instead of the whole DAG. It misses several opportunities that exist in this patch, but it seems to be a tradeoff worth making.

nikic added a subscriber: nikic.Feb 10 2019, 11:13 AM
Herald added a project: Restricted Project.Feb 10 2019, 11:13 AM