This is an archive of the discontinued LLVM Phabricator instance.

[MachineCopyPropagation] Extend pass to do COPY source forwarding
ClosedPublic

Authored by gberry on Mar 8 2017, 11:15 AM.

Details

Reviewers

qcolombet
jonpa
MatzeB
javed.absar

Commits

rG87f8d25150fa: [MachineCopyPropagation] Extend pass to do COPY source forwarding
rL311038: [MachineCopyPropagation] Extend pass to do COPY source forwarding

Summary

This change extends MachineCopyPropagation to do COPY source forwarding.

This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
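
As a rough illustration of what "COPY source forwarding" means, the sketch below models one basic block as a list of toy instructions and rewrites uses of a COPY's destination to read the COPY's source, provided neither register has been redefined in between. The `Inst` struct and `forwardCopySources` function are hypothetical names made up for this illustration; they are not the pass's data structures, and the sketch ignores register classes, subregisters, implicit operands, and LiveIntervals updates.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy instruction (hypothetical model, not LLVM's MachineInstr):
// one optional def, a list of used registers, and an opcode name.
// Register number 0 means "no def".
struct Inst {
  std::string Opcode;
  unsigned Def;
  std::vector<unsigned> Uses;
  bool isCopy() const { return Opcode == "COPY" && Uses.size() == 1; }
};

// Forward COPY sources to later uses of the COPY destination within a single
// block. Returns the number of operands that were rewritten.
unsigned forwardCopySources(std::vector<Inst> &Block) {
  std::map<unsigned, unsigned> AvailCopy; // Dst -> Src for still-valid COPYs
  unsigned NumForwarded = 0;
  for (Inst &I : Block) {
    // Uses read the values that were live before this instruction's def,
    // so rewrite them first.
    for (unsigned &U : I.Uses) {
      auto It = AvailCopy.find(U);
      if (It != AvailCopy.end()) {
        U = It->second;
        ++NumForwarded;
      }
    }
    if (I.Def == 0)
      continue;
    // A def invalidates every tracked COPY whose source or destination it
    // overwrites.
    for (auto It = AvailCopy.begin(); It != AvailCopy.end();) {
      if (It->first == I.Def || It->second == I.Def)
        It = AvailCopy.erase(It);
      else
        ++It;
    }
    // Start tracking this COPY (self-copies carry no information).
    if (I.isCopy() && I.Uses[0] != I.Def)
      AvailCopy[I.Def] = I.Uses[0];
  }
  return NumForwarded;
}
```

On the sequence `%1 = COPY %10; %2 = ADD %1, %1; %10 = MOV; %3 = SUB %1`, this sketch rewrites both ADD operands to `%10` and leaves the SUB alone, because `%10` is redefined before the SUB is reached.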

Diff Detail

Build Status

Buildable 7660
Build 7660: arc lint + arc unit

Event Timeline

gberry created this revision. Mar 8 2017, 11:15 AM

Herald added a reviewer: javed.absar. Mar 8 2017, 11:15 AM

Herald added subscribers: mgorny, nhaehnle, nemanjai, jyknight.

High-level question: Why does the register allocator not do this on its own?

test/CodeGen/PowerPC/fma-mutate.ll
17–18	Is this an improvement?

In D30751#695783, @hfinkel wrote:

High-level question: Why does the register allocator not do this on its own?

I'm not sure I can fully answer that question, but doing this during register allocation could have the down-side of extending live ranges resulting in a worse allocation. Doing it in a pass just after register allocation allows it to be more conservative/opportunistic and not impact e.g. the amount of spilling.

If instead you're asking, why not just append this code at the end of register allocation, the only reason is based on previous feedback from Quentin that this should be a separate pass.

test/CodeGen/PowerPC/fma-mutate.ll
17–18	The fmr is not new, I just added it to get the second register number. Here are the full diffs before/after this change for this test:

 	fmr 3, 1
 	addi 3, 3, .LCPI0_0@toc@l
 	lfs 2, 0(3)
-	xsnmsubadp 3, 2, 3
+	xsnmsubadp 3, 2, 1
 	xsmuldp 4, 0, 0
 	xsmaddmdp 4, 3, 2
 	xsmuldp 0, 0, 4

arsenm added a subscriber: arsenm. Mar 8 2017, 12:13 PM

arsenm added inline comments.

lib/CodeGen/MachineCopyForwarding.cpp
110 ↗	(On Diff #91056)	Use DEBUG_TYPE instead of repeating

arsenm added inline comments. Mar 8 2017, 12:13 PM

lib/CodeGen/MachineCopyForwarding.cpp
57 ↗	(On Diff #91056)	I think having this more exactly match the pass name is better. machine-copy-forwarding

In D30751#695793, @gberry wrote:

In D30751#695783, @hfinkel wrote:

High-level question: Why does the register allocator not do this on its own?

I'm not sure I can fully answer that question, but doing this during register allocation could have the down-side of extending live ranges resulting in a worse allocation. Doing it in a pass just after register allocation allows it to be more conservative/opportunistic and not impact e.g. the amount of spilling.

Okay. If it only makes sense as a fallback strategy, not something that would otherwise affect how the allocation is performed, this seems like the logical way to do it.

If instead you're asking, why not just append this code at the end of register allocation, the only reason is based on previous feedback from Quentin that this should be a separate pass.

Makes sense.

test/CodeGen/PowerPC/fma-mutate.ll
17–18	Okay, thanks!

Hi Geoff,

Two questions:

1. Why is this useful?
2. If useful, why do we end up with this pattern?

For #1, I can think of giving more freedom to the post RA scheduler or eliminating the copy. If it is solely for post RA scheduling, I believe the scheduler should be able to do that. If it is to remove the copy, why do we have it in the first place?

Cheers,
-Quentin

Hi Geoff:

Thanks for this pass. If I understand correctly, its primary objective is to reduce register pressure?
I see only a few cases for ARM targets. Any particular reason why more cases don't benefit from your optimisation?

test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll
7–10	Would it be better to rewrite these as MIR tests?
test/CodeGen/AArch64/neg-imm.ll
9–10	Would it be better to add a new, separate test file instead of changing the purpose of this one?

@arsenm I've addressed your comments in my working copy.
@qcolombet @javed.absar I'll address your comments/questions soon

@javed.absar The purpose of this pass is not to reduce register pressure (since it is run just after register allocation), but to allow more scheduling flexibility and to a lesser degree to remove some redundant COPYs. I'll elaborate on this in my response to Quentin.
As for your question about why more ARM tests aren't affected, I don't have a good answer, but my guess would be that there are just more X86 lit test cases, both in general and in the number that are sensitive to changes in register allocation.

test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll

7–10

I'm not sure how that would help. In this test, similar to the one Hal asked about before, the newly checked 'mov's aren't new, I just needed to add them to get the new register numbers. Here are the full diffs of the generated code for this test case:

 _t:                                     ; @t
 ; BB#0:                                 ; %entry
 	stp	x20, x19, [sp, #-32]!   ; 8-byte Folded Spill
 	stp	x29, x30, [sp, #16]     ; 8-byte Folded Spill
 	mov	 x19, x3
 	mov	 x20, x2
-	mov	 x0, x20
-	mov	 x1, x19
+	mov	 x0, x2
+	mov	 x1, x3
 	bl	_foo
 	mov	 x0, x20
 	mov	 x1, x19
 	bl	_foo

test/CodeGen/AArch64/neg-imm.ll

9–10

Again, I'm not trying to change the purpose of this test. My change just caused things to be scheduled slightly differently. The test is still checking that the condition is computed by a 'subs' feeding a 'csel'. Here are the full diffs:

test:                                   // @test
 	str	x20, [sp, #-32]!        // 8-byte Folded Spill
 	stp	x19, x30, [sp, #16]     // 8-byte Folded Spill
+	subs	w8, w0, #1              // =1
 	mov	 w19, w0
-	subs	w8, w19, #1             // =1
 	csel	w20, wzr, w8, lt
 .LBB0_1:                                // %for.body
                                         // =>This Inner Loop Header: Depth=1
 	cmp		w19, w20
 	b.eq	.LBB0_3
 // BB#2:                                // %if.then3
                                         //   in Loop: Header=BB0_1 Depth=1
 	mov	 w0, w20
 	bl	foo
 .LBB0_3:                                // %for.inc
                                         //   in Loop: Header=BB0_1 Depth=1
 	cmp		w20, w19
 	add	w20, w20, #1            // =1
 	b.le	.LBB0_1
 // BB#4:                                // %for.cond.cleanup
 	ldp	x19, x30, [sp, #16]     // 8-byte Folded Reload
 	ldr	x20, [sp], #32          // 8-byte Folded Reload
 	ret

@qcolombet This change is useful primarily to increase scheduling flexibility and reduce the critical path, though it also makes some COPYs unnecessary, leading to their removal. Handling just the former in the scheduler is a possibility, but it would have the drawback of not providing a benefit to OoO cores that don't do post-RA scheduling.

As to your question of why we end up with this pattern, I looked at some cases where we end up removing COPYs and saw two main causes for this:

virtual registers are not getting coalesced before/during RA because there is a mismatch in register-class (e.g. the source reg class is a subset of the dest reg class).
only a partial segment of a complex live range has a COPY that becomes dead.

In terms of overall compiler complexity, I suspect that this pass could be extended a bit (to handle subreg COPYs for example) and make the current MachineCopyPropagation unnecessary. There may be problems doing this related to phase ordering or different optimization level pass pipelines though, I haven't investigated it thoroughly.

I will also note FWIW that changes doing something similar to this have come up at least twice before: D21455 and D20531

This change is useful primarily to increase scheduling flexibility and reduce the critical path, though it also makes some COPYs unnecessary, leading to their removal. Handling just the former in the scheduler is a possibility, but it would have the drawback of not providing a benefit to OoO cores that don't do post-RA scheduling.

I may be wrong but I wouldn't expect OoO cores to be that affected by such change.

As to your question of why we end up with this pattern, I looked at some cases where we end up removing COPYs and saw two main causes for this:

virtual registers are not getting coalesced before/during RA because there is a mismatch in register-class (e.g. the source reg class is a subset of the dest reg class).

That's strange: as long as the register classes are a subset of one another, the coalescing should still be possible. If there is no intersection, then the coalescing is not possible at all.
Could you share some test cases?

only a partial segment of a complex live range has a COPY that becomes dead.

In terms of overall compiler complexity, I suspect that this pass could be extended a bit (to handle subreg COPYs for example) and make the current MachineCopyPropagation unnecessary.

If we could merge the MachineCopyPropagation logic in here, then that would be a no-brainer for the goodness of this approach.

Cheers,
-Quentin

I've taken a new approach with this change: extending the existing
MachineCopyPropagation pass instead of making a new pass. This makes
the patch quite a bit simpler at the expense of making
MachineCopyPropagation a little more complicated (by having two
modes).

There are two AMDGPU lit test cases that I'm not sure about (marked
with XXXGCB) that I would appreciate someone more familiar with that
target to make sure they are reasonable.

To answer Quentin's original questions/comments:

I have at least one example of an OoO core that does benefit from this change (and specifically benefits even if no COPYs are removed, only forwarded).

I did some more investigating into why there are COPYs that can be forwarded/removed just after register allocation at all and the case that came up every time I looked deeper was COPYs that were inserted during RegAlloc Greedy (presumably as part of live range splitting?) that looked something like this (from aarch64 MultiSource/Benchmarks/MiBench/consumer-jpeg/jdphuff.c:decode_mcu_AC_refine)

  # After Greedy Register Allocator:
  9008B	BB#62: derived from LLVM BB %if.end169
  	    Predecessors according to CFG: BB#45 BB#93
  9056B		%vreg236:sub_32<def,read-undef> = SUBWrr %vreg236:sub_32, %vreg46; GPR64common:%vreg236 GPR32common:%vreg46
  9104B		%vreg215<def> = ASRVXr %vreg43, %vreg236; GPR64:%vreg215,%vreg43 GPR64common:%vreg236
  9128B		%vreg426<def> = COPY %vreg425; GPR32common:%vreg426,%vreg425
  9136B		%vreg217<def> = SUBWri %vreg426, 1, 0; GPR32common:%vreg217,%vreg426
  9152B		%vreg218<def> = ANDWrr %vreg217, %vreg215:sub_32; GPR32:%vreg218 GPR32common:%vreg217 GPR64:%vreg215
  9168B		%vreg426<def> = ADDWrr %vreg218, %vreg426; GPR32common:%vreg426 GPR32:%vreg218
  9200B		CBZW %vreg426, <BB#63>; GPR32common:%vreg426
	    Successors according to CFG: BB#63(0x30000000 / 0x80000000 = 37.50%) BB#94(0x50000000 / 0x80000000 = 62.50%)

Where the COPY added had a small live range and did not end up
getting allocated in such a way that the COPY was a NOP
(i.e. %vreg426 was assigned a different register than %vreg425).
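
To make the effect on this example concrete, here is a small sketch of the "COPY becomes dead once all of its uses are forwarded" step. It assumes, purely for illustration, that both reads of %vreg426 (in SUBWri and ADDWrr) can legally be forwarded to %vreg425; the thread does not confirm that for this particular block. The `Inst` struct and both function names are hypothetical, and liveness is approximated within a single block: a value that reaches the end of the block is conservatively treated as live-out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction (hypothetical model): one optional def (0 = none) and a
// list of used registers.
struct Inst {
  std::string Opcode;
  unsigned Def;
  std::vector<unsigned> Uses;
};

// True if the value defined by Block[Idx] is overwritten later in the block
// without being read first. Values that reach the end of the block are
// conservatively assumed to be live-out.
bool defIsDeadInBlock(const std::vector<Inst> &Block, std::size_t Idx) {
  unsigned Reg = Block[Idx].Def;
  for (std::size_t J = Idx + 1; J < Block.size(); ++J) {
    const Inst &I = Block[J];
    if (std::find(I.Uses.begin(), I.Uses.end(), Reg) != I.Uses.end())
      return false; // the value is still read
    if (I.Def == Reg)
      return true; // redefined before any read
  }
  return false;
}

// Erase COPYs whose destination value is never read. Returns the number of
// COPYs removed.
unsigned removeDeadCopies(std::vector<Inst> &Block) {
  unsigned NumDeleted = 0;
  for (std::size_t I = 0; I < Block.size();) {
    if (Block[I].Opcode == "COPY" && defIsDeadInBlock(Block, I)) {
      Block.erase(Block.begin() + static_cast<std::ptrdiff_t>(I));
      ++NumDeleted;
    } else {
      ++I;
    }
  }
  return NumDeleted;
}
```

With the block above modelled after forwarding (SUBWri and ADDWrr reading %vreg425 instead of %vreg426), the COPY's destination is redefined by ADDWrr before anything reads it, so the sketch deletes the COPY. If the SUBWri still read %vreg426, the COPY would be kept.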

gberry retitled this revision from "[MachineCopyForwarding] Add new pass to do register COPY forwarding at end of register allocation." to "[MachineCopyPropagation] Extend pass to do COPY source forwarding". Apr 10 2017, 1:40 PM

gberry edited the summary of this revision.

Ping?

For the record: This looks very similar to https://reviews.llvm.org/D20531 and https://reviews.llvm.org/D21455. Though the earlier two attempts were somewhat sketchy in terms of correctness as they were renaming physregs and we have no good way in LLVM to detect which ones are legal to rename.

In D30751#740258, @MatzeB wrote:

For the record: This looks very similar to https://reviews.llvm.org/D20531 and https://reviews.llvm.org/D21455. Though the earlier two attempts were somewhat sketchy in terms of correctness as they were renaming physregs and we have no good way in LLVM to detect which ones are legal to rename.

Yep, that's why I added you and @jonpa as reviewers and mentioned the previous changes in my comment on 3/14 :)
I think this version avoids the issues of the previous two attempts by running when we still have virtual reg information.

Ping? @qcolombet have you had a chance to look at this latest version?

qcolombet added inline comments. May 26 2017, 5:45 PM

lib/CodeGen/MachineCopyPropagation.cpp
135	No else.
193	I am wondering if we want to obfuscate that this is really just the same pass with different parameters. I get that the dependencies are also different for the initialization process, but usually we just go with the most constraining one. Is it worth doing differently here?
273	What about FullReg (or getSubReg if you want to invert) instead of ForClobber?
301	physregs don't have subreg. Thus, you could make the code more readable with:
	  if (!TargetRegisterInfo::isVirtualRegister(Reg))
	    return Reg;
	  assert(PreRegRewrite);
	  // Then test over sub reg
350	I found the name of the function confusing. It takes MIs but mention only reg classes.
447	Put /*ForClobber*/ (or the updated name) in front of false.
465	That's invalid per the MachineVerifier

inouehrs mentioned this in D34193: [PowerPC] peephole optimization on use after register copy. Jun 15 2017, 8:23 AM

I have implemented mostly the same optimization in the PowerPC backend (independently of this patch) and submitted it as https://reviews.llvm.org/D34193.
After I became aware of this patch through a comment from @MatzeB, I abandoned my patch because this one is platform neutral and so has wider coverage.
Since I see performance improvements in SPECCPU with my patch, I hope this patch gives similar performance gains.

lib/CodeGen/MachineCopyPropagation.cpp
619	Do we need to execute forwardUses after virtual register rewriting again?
692	Ditto. Do we need this after virtual register rewriting again?

FWIW, I'm hoping to get back to this change and process the feedback I've received in the next week or so.

I've fixed some of the comments and have questions about the others.
I also re-based and fixed an issue with instructions that have wider register implicit uses that are implicitly tied to other operands. The added code is the check in forwardUses() that references AMDGPU in the preceding comment.

lib/CodeGen/MachineCopyPropagation.cpp
193	I took a look at this, and the main problem that comes up when doing it as one pass is the fact that you wouldn't have separate pass IDs that can be used to disable the later run of this pass but not the earlier one (e.g. as is done by NVPTX and WebAssembly targets). I also think it might make it hard to run the pass in isolation via llc. If you think these issues are surmountable with the single pass approach, let me know and I'll take another look.
350	How about isForwardableRegClassCopy? I also renamed the second parameter to 'UseI'.
447	I refactored this to put the 'FullReg' parameter in the function name instead to be more clear.
465	Can you elaborate on this? I'm not sure what you are referring to as being invalid.
619	There are additional forwarding opportunities exposed later in the pipeline by e.g. tail-merging and tail-duplication that are caught by this second run.

Rebased and addressed some review comments

Herald added subscribers: eraman, sdardis. Jun 27 2017, 10:44 AM

Ping?

qcolombet added inline comments. Jul 21 2017, 2:48 PM

lib/CodeGen/MachineCopyPropagation.cpp
193	Thanks for double checking. I'm fine with the added complexity given it has an explanation. Please add comment explaining that in the code.
349	Could you add an example illustrating that?
465	Sorry. Physical registers don't have subreg indices. The machine verifier checks that.

guyblank added a subscriber: guyblank. Jul 23 2017, 10:00 PM

gberry marked 2 inline comments as done. Jul 31 2017, 1:43 PM

gberry added inline comments.

lib/CodeGen/MachineCopyPropagation.cpp
465	In this case, the MOUse may not be a physical register. We are replacing a possibly virtual reg (MOUse) with a physical reg (NewUseReg), so we need to make sure to translate the subreg on MOUse. I've added a comment to hopefully clarify this a bit.
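
The subreg translation described in this reply can be sketched with a toy register table. Everything below is a hypothetical stand-in: the register numbering, the `Operand` struct, `lookupSubReg`, and `rewriteUseToPhysReg` are invented for illustration and only approximate the idea that a virtual-register operand may carry a subregister index while a physical-register operand must name the subregister directly.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical register and subregister-index numbering, for illustration only.
enum : unsigned { NoReg = 0, X8 = 1, W8 = 2, X9 = 3, W9 = 4 };
enum : unsigned { NoSubIdx = 0, sub_32 = 1 };

// Toy operand: a register plus an optional subregister index.
struct Operand {
  unsigned Reg;
  unsigned SubIdx;
};

// Toy stand-in for a target's subregister table:
// (physical super-register, subregister index) -> physical subregister.
unsigned lookupSubReg(unsigned PhysReg, unsigned SubIdx) {
  static const std::map<std::pair<unsigned, unsigned>, unsigned> Table = {
      {{X8, sub_32}, W8}, {{X9, sub_32}, W9}};
  auto It = Table.find(std::make_pair(PhysReg, SubIdx));
  return It == Table.end() ? NoReg : It->second;
}

// Replace a use operand (possibly a virtual register with a subreg index) by
// a physical register. Physical operands may not carry a subreg index, so the
// index is folded into the register itself. Returns false and leaves the
// operand untouched if the physical register has no such subregister.
bool rewriteUseToPhysReg(Operand &MOUse, unsigned NewUseReg) {
  unsigned Reg = NewUseReg;
  if (MOUse.SubIdx != NoSubIdx) {
    Reg = lookupSubReg(NewUseReg, MOUse.SubIdx);
    if (Reg == NoReg)
      return false;
  }
  MOUse.Reg = Reg;
  MOUse.SubIdx = NoSubIdx;
  return true;
}
```

In this sketch, a use written as a virtual register with `sub_32` that is forwarded to `X8` ends up as a plain `W8` operand with no subregister index, which is the form the machine verifier expects for physical registers according to the earlier comments in this thread.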

Address Quentin's comments

Harbormaster completed remote builds in B8793: Diff 108995. Jul 31 2017, 1:46 PM

Herald added subscribers: fedor.sergeev, aheejin, dschuff, jfb. Jul 31 2017, 1:46 PM

guyblank added a subscriber: myatsina. Aug 3 2017, 12:15 AM

qcolombet requested changes to this revision. Aug 3 2017, 2:47 PM

qcolombet added inline comments.

lib/CodeGen/MachineCopyPropagation.cpp
350	Works with better comment on top. See next comment.
362	No bracket
387	I think this example would do great in the comment of the function itself, with a note that Copy is the first copy and UseI the second one. Of course, we need to document the non-copy UseI case as well :)
391	I'd add an assert that UseI.getOperand(1) == DstReg
395	Add a comment on what the relation is between CopyDstReg, CopySrcReg and UseMI, and assert on that. E.g., UseMI is a copy that uses CopyDstReg. CopyDstReg is going to be replaced by CopySrcReg.
396	Move that closer to its first use.
397	Unless I am missing something, I don't see the extend to use call for CopySrcReg.
424	Add a comment on what this is doing.
435	I would add: if (!UseReg) continue;
444	I would avoid forwarding on physical registers, period.
465	Ah right, I missed the New in the name of the check.
516	We don't need this check for virtual reg, right?
563	It would help readability if there were a helper function for the different regclass/register fixing for the respective physreg and virtreg cases. E.g., a helper with: if NewUseReg is a physreg, then the block at line 485, else the block at line 541. Then the code would look like:
	  Check if replacing is possible between MOUse and NewUseReg
	  [...]
	  isForwardXXX
	  [...]
	  Adapt NewUseReg to whatever constraints are carried by MOUse
	  fix(NewUseReg, NewUseSubReg); // <--- This calls your helper function
test/CodeGen/AArch64/flags-multiuse.ll
1	Why do we need to change the run line here?

This revision now requires changes to proceed. Aug 3 2017, 2:47 PM

gberry marked 12 inline comments as done. Aug 11 2017, 2:33 PM

gberry added inline comments.

lib/CodeGen/MachineCopyPropagation.cpp
397	That was being done before the call to this function. I've moved it into this function in my latest revision.
444	I have seen some benefit to forwarding physical registers, mostly in cases where block boundaries are changed between RA time and the second run of this pass. This mostly seems to happen when tail merge/tail duplication changes which uses COPYs are exposed to in the same block. This pass won't remove any physical register writes, so this seems relatively safe, other than the caveat in the comment above.
516	No, I don't think so. I added an early exit for this case to hasImplicitOverlap.
563	I've factored out quite a bit of code from this function, let me know what you think.
test/CodeGen/AArch64/flags-multiuse.ll
1	Turning off the post-RA scheduler kept the checked instructions in the same order. I've just re-arranged the checks now.

Update to address Quentin's comments

Harbormaster completed remote builds in B9247: Diff 110815. Aug 11 2017, 2:35 PM

Thanks for your patience Geoff.

LGTM

This revision is now accepted and ready to land. Aug 14 2017, 9:28 AM

Closed by commit rL311038: [MachineCopyPropagation] Extend pass to do COPY source forwarding (authored by gberry). Aug 16 2017, 1:51 PM

This revision was automatically updated to reflect the committed changes.

gberry mentioned this in D37164: [ARM] Fix bug in ARMLoadStoreOptimizer when kill flags are missing. Aug 25 2017, 1:49 PM

gberry mentioned this in rL311907: [ARM] Fix bug in ARMLoadStoreOptimizer when kill flags are missing. Aug 28 2017, 12:07 PM

MatzeB mentioned this in D39536: [PowerPC] Eliminate redundant register copys after register allocation. Nov 15 2017, 10:32 AM

gberry mentioned this in D41835: [MachineCopyPropagation] Extend pass to do COPY source forwarding. Jan 8 2018, 1:15 PM

gberry mentioned this in rL323991: [MachineCopyPropagation] Extend pass to do COPY source forwarding. Feb 1 2018, 10:56 AM

Revision Contents

Path	Size
include/llvm/CodeGen/
	Passes.h	5 lines
include/llvm/
	InitializePasses.h	1 line
lib/CodeGen/
	CodeGen.cpp	1 line
	MachineCopyPropagation.cpp	470 lines
	TargetPassConfig.cpp	9 lines
test/CodeGen/AArch64/
	aarch64-fold-lslfast.ll	9 lines
	arm64-AdvSIMD-Scalar.ll	16 lines
	arm64-zero-cycle-regmov.ll	6 lines
	f16-instructions.ll	2 lines
	flags-multiuse.ll	8 lines
	merge-store-dependency.ll	3 lines
	neg-imm.ll	4 lines
test/CodeGen/AMDGPU/
	attr-amdgpu-flat-work-group-size.ll	2 lines
	attr-amdgpu-waves-per-eu.ll	2 lines
	mubuf-offset-private.ll	34 lines
	multilevel-break.ll	2 lines
	private-access-no-objects.ll	27 lines
	ret.ll	16 lines
	scratch-simple.ll	44 lines
	vgpr-spill-emergency-stack-slot-compute.ll	8 lines
test/CodeGen/ARM/
	atomic-op.ll	6 lines
	swifterror.ll	2 lines
test/CodeGen/Mips/llvm-ir/
	sub.ll	2 lines
test/CodeGen/PowerPC/
	fma-mutate.ll	3 lines
	inlineasm-i64-reg.ll	2 lines
	tail-dup-layout.ll	2 lines
test/CodeGen/SPARC/
	32abi.ll	4 lines
	atomics.ll	5 lines
test/CodeGen/Thumb/
	thumb-shrink-wrapping.ll	2 lines
test/CodeGen/X86/
	2006-03-01-InstrSchedBug.ll	2 lines
	arg-copy-elide.ll	2 lines
	avg.ll	3 lines
	avx512-bugfix-25270.ll	4 lines
	avx512-calling-conv.ll	2 lines
	avx512-mask-op.ll	24 lines
	avx512bw-intrinsics-upgrade.ll	8 lines
	buildvec-insertvec.ll	2 lines
	combine-fcopysign.ll	8 lines
	complex-fastmath.ll	10 lines
	divide-by-constant.ll	2 lines
		8 lines
		8 lines
		4 lines
		12 lines
		4 lines
	inline-asm-fpstack.ll	3 lines
	ipra-local-linkage.ll	2 lines
		2 lines
		107 lines
		19 lines
		2 lines
		46 lines
		2 lines
		2 lines
		10 lines
		2 lines
		3 lines
	shrink-wrap-chkstk.ll	2 lines
	sqrt-fastmath.ll	8 lines
	sse-scalar-fp-arith.ll	12 lines
	sse1.ll	4 lines
	sse3-avx-addsub-2.ll	4 lines
	statepoint-live-in.ll	6 lines
	statepoint-stack-usage.ll	6 lines
		26 lines
		4 lines
		2 lines
		8 lines
		2 lines
	vector-idiv-sdiv-128.ll	10 lines
	vector-idiv-udiv-128.ll	2 lines
	vector-rotate-128.ll	16 lines
	vector-sext.ll	20 lines
	vector-shift-ashr-128.ll	2 lines
	vector-shift-lshr-128.ll	8 lines
	vector-shift-shl-128.ll	14 lines
	vector-shuffle-combining.ll	2 lines
		2 lines
		14 lines
		44 lines
		2 lines
		4 lines
	x86-shrink-wrap-unwind.ll	6 lines
	x86-shrink-wrapping.ll	4 lines

Diff 104212

include/llvm/CodeGen/Passes.h

@@ /// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.
 extern char &ImplicitNullChecksID;

 /// MachineLICM - This pass performs LICM on machine instructions.
 extern char &MachineLICMID;

 /// MachineSinking - This pass performs sinking on machine instructions.
 extern char &MachineSinkingID;

+/// MachineCopyPropagationPreRegRewrite - This pass performs copy propagation
+/// on machine instructions after register allocation but before virtual
+/// register re-writing..
+extern char &MachineCopyPropagationPreRegRewriteID;
+
 /// MachineCopyPropagation - This pass performs copy propagation on
 /// machine instructions.
 extern char &MachineCopyPropagationID;

 /// PeepholeOptimizer - This pass performs peephole optimizations -
 /// like extension and comparison eliminations.
 extern char &PeepholeOptimizerID;

include/llvm/InitializePasses.h

 void initializeMIRPrintingPassPass(PassRegistry&);
 void initializeMachineBlockFrequencyInfoPass(PassRegistry&);
 void initializeMachineBlockPlacementPass(PassRegistry&);
 void initializeMachineBlockPlacementStatsPass(PassRegistry&);
 void initializeMachineBranchProbabilityInfoPass(PassRegistry&);
 void initializeMachineCSEPass(PassRegistry&);
 void initializeMachineCombinerPass(PassRegistry&);
 void initializeMachineCopyPropagationPass(PassRegistry&);
+void initializeMachineCopyPropagationPreRegRewritePass(PassRegistry&);
 void initializeMachineDominanceFrontierPass(PassRegistry&);
 void initializeMachineDominatorTreePass(PassRegistry&);
 void initializeMachineFunctionPrinterPassPass(PassRegistry&);
 void initializeMachineLICMPass(PassRegistry&);
 void initializeMachineLoopInfoPass(PassRegistry&);
 void initializeMachineModuleInfoPass(PassRegistry&);
 void initializeMachineOptimizationRemarkEmitterPassPass(PassRegistry&);
 void initializeMachineOutlinerPass(PassRegistry&);

lib/CodeGen/CodeGen.cpp

@@ void llvm::initializeCodeGen(PassRegistry &Registry) {
   initializeLocalStackSlotPassPass(Registry);
   initializeLowerIntrinsicsPass(Registry);
   initializeMachineBlockFrequencyInfoPass(Registry);
   initializeMachineBlockPlacementPass(Registry);
   initializeMachineBlockPlacementStatsPass(Registry);
   initializeMachineCSEPass(Registry);
   initializeMachineCombinerPass(Registry);
   initializeMachineCopyPropagationPass(Registry);
+  initializeMachineCopyPropagationPreRegRewritePass(Registry);
   initializeMachineDominatorTreePass(Registry);
   initializeMachineFunctionPrinterPassPass(Registry);
   initializeMachineLICMPass(Registry);
   initializeMachineLoopInfoPass(Registry);
   initializeMachineModuleInfoPass(Registry);
   initializeMachineOptimizationRemarkEmitterPassPass(Registry);
   initializeMachineOutlinerPass(Registry);
   initializeMachinePipelinerPass(Registry);

lib/CodeGen/MachineCopyPropagation.cpp

//===- MachineCopyPropagation.cpp - Machine Copy Propagation Pass ---------===//		//===- MachineCopyPropagation.cpp - Machine Copy Propagation Pass ---------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This is an extremely simple MachineInstr-level copy propagation pass.		// This is a simple MachineInstr-level copy forwarding pass. It may be run at
		// two places in the codegen pipeline:
		// - After register allocation but before virtual registers have been remapped
		// to physical registers.
		// - After physical register remapping.
		//
		// The optimizations done vary slightly based on whether virtual registers are
		// still present. In both cases, this pass forwards the source of COPYs to the
		// users of their destinations when doing so is legal. For example:
		//
		// %vreg1 = COPY %vreg0
		// ...
		// ... = OP %vreg1
		//
		// If
		// - the physical register assigned to %vreg0 has not been clobbered by the
		// time of the use of %vreg1
		// - the register class constraints are satisfied
		// - the COPY def is the only value that reaches OP
		// then this pass replaces the above with:
		//
		// %vreg1 = COPY %vreg0
		// ...
		// ... = OP %vreg0
		//
		// and updates the relevant state required by VirtRegMap (e.g. LiveIntervals).
		// COPYs whose LiveIntervals become dead as a result of this forwarding (i.e. if
		// all uses of %vreg1 are changed to %vreg0) are removed.
		//
		// When being run with only physical registers, this pass will also remove some
		// redundant COPYs. For example:
		//
		// %R1 = COPY %R0
		// ... // No clobber of %R1
		// %R0 = COPY %R1 <<< Removed
		//
		// or
		//
		// %R1 = COPY %R0
		// ... // No clobber of %R0
		// %R1 = COPY %R0 <<< Removed
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "LiveDebugVariables.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
		#include "llvm/CodeGen/LiveRangeEdit.h"
		#include "llvm/CodeGen/LiveStackAnalysis.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
		#include "llvm/CodeGen/VirtRegMap.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetRegisterInfo.h"		#include "llvm/Target/TargetRegisterInfo.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "machine-cp"		#define DEBUG_TYPE "machine-cp"

STATISTIC(NumDeletes, "Number of dead copies deleted");		STATISTIC(NumDeletes, "Number of dead copies deleted");
		STATISTIC(NumCopyForwards, "Number of copy uses forwarded");

namespace {		namespace {
typedef SmallVector<unsigned, 4> RegList;		typedef SmallVector<unsigned, 4> RegList;
typedef DenseMap<unsigned, RegList> SourceMap;		typedef DenseMap<unsigned, RegList> SourceMap;
typedef DenseMap<unsigned, MachineInstr*> Reg2MIMap;		typedef DenseMap<unsigned, MachineInstr*> Reg2MIMap;

class MachineCopyPropagation : public MachineFunctionPass {		class MachineCopyPropagation : public MachineFunctionPass,
		private LiveRangeEdit::Delegate {
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
		MachineFunction *MF;
		SlotIndexes *Indexes;
		LiveIntervals *LIS;
		const VirtRegMap *VRM;
		// True if this pass is being run before virtual registers are remapped to
		// physical ones.
		bool PreRegRewrite;
		bool NoSubRegLiveness;

		protected:
		MachineCopyPropagation(char &ID, bool PreRegRewrite)
		: MachineFunctionPass(ID), PreRegRewrite(PreRegRewrite) {}

public:		public:
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
MachineCopyPropagation() : MachineFunctionPass(ID) {		MachineCopyPropagation() : MachineCopyPropagation(ID, false) {
initializeMachineCopyPropagationPass(*PassRegistry::getPassRegistry());		initializeMachineCopyPropagationPass(*PassRegistry::getPassRegistry());
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
		if (PreRegRewrite) {
		AU.addRequired<SlotIndexes>();
		AU.addPreserved<SlotIndexes>();
		AU.addRequired<LiveIntervals>();
		AU.addPreserved<LiveIntervals>();
		AU.addRequired<VirtRegMap>();
		AU.addPreserved<VirtRegMap>();
		AU.addPreserved<LiveDebugVariables>();
		AU.addPreserved<LiveStacks>();
		}
AU.setPreservesCFG();		AU.setPreservesCFG();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

MachineFunctionProperties getRequiredProperties() const override {		MachineFunctionProperties getRequiredProperties() const override {
		if (PreRegRewrite)
		return MachineFunctionProperties()
		.set(MachineFunctionProperties::Property::NoPHIs)
		.set(MachineFunctionProperties::Property::TracksLiveness);
return MachineFunctionProperties().set(		return MachineFunctionProperties().set(
MachineFunctionProperties::Property::NoVRegs);		MachineFunctionProperties::Property::NoVRegs);
}		}

private:		private:
void ClobberRegister(unsigned Reg);		void ClobberRegister(unsigned Reg);
		qcolombet: No else.
void ReadRegister(unsigned Reg);		void ReadRegister(unsigned Reg);
void CopyPropagateBlock(MachineBasicBlock &MBB);		void CopyPropagateBlock(MachineBasicBlock &MBB);
bool eraseIfRedundant(MachineInstr &Copy, unsigned Src, unsigned Def);		bool eraseIfRedundant(MachineInstr &Copy, unsigned Src, unsigned Def);
		unsigned getPhysReg(const MachineOperand &Opnd, bool FullReg);
		unsigned getPhysReg(const MachineOperand &Opnd) {
		return getPhysReg(Opnd, false);
		}
		unsigned getFullPhysReg(const MachineOperand &Opnd) {
		return getPhysReg(Opnd, true);
		}
		void forwardUses(MachineInstr &MI);
		bool isForwardableRegClassCopy(MachineInstr &Copy, MachineInstr &UseI);
		void updateForwardedCopyLiveInterval(unsigned CopyDstReg,
		unsigned CopySrcReg,
		const MachineInstr &UseMI);
		/// LiveRangeEdit callback for eliminateDeadDefs().
		void LRE_WillEraseInstruction(MachineInstr *MI) override;

/// Candidates for deletion.		/// Candidates for deletion.
SmallSetVector<MachineInstr*, 8> MaybeDeadCopies;		SmallSetVector<MachineInstr*, 8> MaybeDeadCopies;
/// Def -> available copies map.		/// Def -> available copies map.
Reg2MIMap AvailCopyMap;		Reg2MIMap AvailCopyMap;
/// Def -> copies map.		/// Def -> copies map.
Reg2MIMap CopyMap;		Reg2MIMap CopyMap;
/// Src -> Def map		/// Src -> Def map
SourceMap SrcMap;		SourceMap SrcMap;
bool Changed;		bool Changed;
};		};

		class MachineCopyPropagationPreRegRewrite : public MachineCopyPropagation {
		public:
		static char ID; // Pass identification, replacement for typeid
		MachineCopyPropagationPreRegRewrite()
		: MachineCopyPropagation(ID, true) {
		initializeMachineCopyPropagationPreRegRewritePass(*PassRegistry::getPassRegistry());
		}
		};
}		}
char MachineCopyPropagation::ID = 0;		char MachineCopyPropagation::ID = 0;
char &llvm::MachineCopyPropagationID = MachineCopyPropagation::ID;		char &llvm::MachineCopyPropagationID = MachineCopyPropagation::ID;

INITIALIZE_PASS(MachineCopyPropagation, DEBUG_TYPE,		INITIALIZE_PASS(MachineCopyPropagation, DEBUG_TYPE,
"Machine Copy Propagation Pass", false, false)		"Machine Copy Propagation Pass", false, false)

		char MachineCopyPropagationPreRegRewrite::ID = 0;
		char &llvm::MachineCopyPropagationPreRegRewriteID = MachineCopyPropagationPreRegRewrite::ID;

		INITIALIZE_PASS_BEGIN(MachineCopyPropagationPreRegRewrite,
		"machine-cp-prerewrite",
		"Machine Copy Propagation Pre-Register Rewrite Pass",
		false, false)
		INITIALIZE_PASS_DEPENDENCY(SlotIndexes)
		INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
		INITIALIZE_PASS_DEPENDENCY(VirtRegMap)
		INITIALIZE_PASS_END(MachineCopyPropagationPreRegRewrite,
		"machine-cp-prerewrite",
		"Machine Copy Propagation Pre-Register Rewrite Pass", false,
		false)
		qcolombet: I am wondering if we want to obfuscate that this is really just the same pass with different parameters. I get that the dependencies are also different for the initialization process, but usually we just go with the most constraining one. Is it worth doing differently here?
		gberry: I took a look at this, and the main problem that comes up when doing it as one pass is the fact that you wouldn't have separate pass IDs that can be used to disable the later run of this pass but not the earlier one (e.g. as is done by NVPTX and WebAssembly targets). I also think it might make it hard to run the pass in isolation via llc. If you think these issues are surmountable with the single pass approach, let me know and I'll take another look.
		qcolombet: Thanks for double checking. I'm fine with the added complexity given it has an explanation. Please add a comment explaining that in the code.
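The point about separate pass IDs can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `ToyPass`, `ToyCopyProp`, `ToyCopyPropPreRegRewrite` and `isDisabled` are hypothetical names and do not use the LLVM pass infrastructure. It shows that two classes sharing one implementation can still be told apart by the address of their own static `ID`, so one run can be disabled without affecting the other.

```cpp
#include <cassert>

// Toy stand-in for a pass base class (hypothetical, not MachineFunctionPass).
// A pass is identified by the address of a static char, as in the patch.
class ToyPass {
  const void *PassID;

public:
  explicit ToyPass(const char &ID) : PassID(&ID) {}
  const void *getPassID() const { return PassID; }
};

// Shared implementation, parameterized by PreRegRewrite.
class ToyCopyProp : public ToyPass {
  bool PreRegRewrite;

protected:
  ToyCopyProp(const char &ID, bool PreRegRewrite)
      : ToyPass(ID), PreRegRewrite(PreRegRewrite) {}

public:
  static char ID;
  ToyCopyProp() : ToyCopyProp(ID, false) {}
  bool isPreRegRewrite() const { return PreRegRewrite; }
};

// Same code, but with its own ID so it can be scheduled or disabled separately.
class ToyCopyPropPreRegRewrite : public ToyCopyProp {
public:
  static char ID;
  ToyCopyPropPreRegRewrite() : ToyCopyProp(ID, true) {}
};

char ToyCopyProp::ID = 0;
char ToyCopyPropPreRegRewrite::ID = 0;

// Hypothetical "disable pass by ID" check, as a target might perform.
bool isDisabled(const ToyPass &P, const void *DisabledID) {
  return P.getPassID() == DisabledID;
}
```

With a single class and a single ID, the `isDisabled` check could not distinguish the early run from the late one, which is the concern raised in the reply above.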

/// Remove any entry in \p Map where the register is a subregister or equal to		/// Remove any entry in \p Map where the register is a subregister or equal to
/// a register contained in \p Regs.		/// a register contained in \p Regs.
static void removeRegsFromMap(Reg2MIMap &Map, const RegList &Regs,		static void removeRegsFromMap(Reg2MIMap &Map, const RegList &Regs,
const TargetRegisterInfo &TRI) {		const TargetRegisterInfo &TRI) {
for (unsigned Reg : Regs) {		for (unsigned Reg : Regs) {
// Source of copy is no longer available for propagation.		// Source of copy is no longer available for propagation.
for (MCSubRegIterator SR(Reg, &TRI, true); SR.isValid(); ++SR)		for (MCSubRegIterator SR(Reg, &TRI, true); SR.isValid(); ++SR)
Map.erase(*SR);		Map.erase(*SR);
... (24 unchanged lines not shown) ...	for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {
if (SI != SrcMap.end()) {		if (SI != SrcMap.end()) {
removeRegsFromMap(AvailCopyMap, SI->second, *TRI);		removeRegsFromMap(AvailCopyMap, SI->second, *TRI);
SrcMap.erase(SI);		SrcMap.erase(SI);
}		}
}		}
}		}

void MachineCopyPropagation::ReadRegister(unsigned Reg) {		void MachineCopyPropagation::ReadRegister(unsigned Reg) {
		// We don't track MaybeDeadCopies when running pre-VirtRegRewriter.
		if (PreRegRewrite)
		return;

// If 'Reg' is defined by a copy, the copy is no longer a candidate		// If 'Reg' is defined by a copy, the copy is no longer a candidate
// for elimination.		// for elimination.
for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {		for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {
Reg2MIMap::iterator CI = CopyMap.find(*AI);		Reg2MIMap::iterator CI = CopyMap.find(*AI);
if (CI != CopyMap.end()) {		if (CI != CopyMap.end()) {
DEBUG(dbgs() << "MCP: Copy is used - not dead: "; CI->second->dump());		DEBUG(dbgs() << "MCP: Copy is used - not dead: "; CI->second->dump());
MaybeDeadCopies.remove(CI->second);		MaybeDeadCopies.remove(CI->second);
}		}
... (15 unchanged lines not shown) ...	if (Src == PreviousSrc) {
return true;		return true;
}		}
if (!TRI->isSubRegister(PreviousSrc, Src))		if (!TRI->isSubRegister(PreviousSrc, Src))
return false;		return false;
unsigned SubIdx = TRI->getSubRegIndex(PreviousSrc, Src);		unsigned SubIdx = TRI->getSubRegIndex(PreviousSrc, Src);
return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);		return SubIdx == TRI->getSubRegIndex(PreviousDef, Def);
}		}

		/// Return the physical register assigned to Opnd if it is a virtual register,
		/// otherwise just return the physical reg from the operand itself.
		///
		/// The 'FullReg' parameter specifies whether we want the full physical
		qcolombet: What about FullReg (or getSubReg if you want to invert) instead of ForClobber?
		/// register assigned to the virtual register ignoring subregs or not. If we
		/// aren't tracking sub-reg liveness then we need to use this to be more
		/// conservative with clobbers by killing all super reg and their sub reg COPYs
		/// as well. This is to prevent COPY forwarding in cases like the following:
		///
		/// %vreg2 = COPY %vreg1:sub1
		/// %vreg3 = COPY %vreg1:sub0
		/// ... = OP1 %vreg2
		/// ... = OP2 %vreg3
		///
		/// After forwarding %vreg2 (assuming this is the last use of %vreg1) and
		/// VirtRegRewriter adding kill markers we have:
		///
		/// %vreg3 = COPY %vreg1:sub0
		/// ... = OP1 %vreg1:sub1<kill>
		/// ... = OP2 %vreg3
		///
		/// If %vreg3 is assigned to a sub-reg of %vreg1, then after rewriting we have:
		///
		/// ... = OP1 R0:sub1, R0<imp-use,kill>
		/// ... = OP2 R0:sub0
		///
		/// and the use of R0 by OP2 will not have a valid definition.
		unsigned MachineCopyPropagation::getPhysReg(const MachineOperand &Opnd,
		bool FullReg) {
		unsigned Reg = Opnd.getReg();
		// Physical registers cannot have subregs.
		if (!TargetRegisterInfo::isVirtualRegister(Reg))
		qcolombet: physregs don't have subreg. Thus, you could make the code more readable with: if (!TargetRegisterInfo::isVirtualRegister(Reg)) return Reg; assert(PreRegRewrite); // Then test over sub reg
		return Reg;

		assert(PreRegRewrite && "Unexpected virtual register encountered");
		Reg = VRM->getPhys(Reg);
		unsigned SubReg = Opnd.getSubReg();
		if (SubReg && !(FullReg && NoSubRegLiveness))
		Reg = TRI->getSubReg(Reg, SubReg);
		return Reg;
		}
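The way `FullReg` and `NoSubRegLiveness` interact in `getPhysReg` can be tabulated with a small stand-alone model. This is a sketch only: `ToyOperand`, `toySubReg` and its numeric encoding are hypothetical stand-ins for the `VirtRegMap` and `TargetRegisterInfo` lookups, and the operand is assumed to already carry its assigned physical register.

```cpp
#include <cassert>

// Hypothetical operand: the physical register already assigned to the vreg,
// plus a sub-register index (0 means "no subreg").
struct ToyOperand {
  unsigned AssignedPhysReg;
  unsigned SubIdx;
};

// Hypothetical encoding of "sub-register SubIdx of PhysReg". A real target
// would ask TargetRegisterInfo instead.
unsigned toySubReg(unsigned PhysReg, unsigned SubIdx) {
  assert(SubIdx != 0 && "caller must pass a real sub-register index");
  return PhysReg * 16 + SubIdx;
}

// Mirrors the condition in the patch: the sub-register is resolved unless the
// caller asked for the full register AND sub-register liveness is not tracked.
unsigned toyGetPhysReg(const ToyOperand &Op, bool FullReg,
                       bool NoSubRegLiveness) {
  unsigned Reg = Op.AssignedPhysReg;
  if (Op.SubIdx && !(FullReg && NoSubRegLiveness))
    Reg = toySubReg(Reg, Op.SubIdx);
  return Reg;
}
```

Only the combination `FullReg == true` and `NoSubRegLiveness == true` returns the whole register for an operand with a subreg index, which is the conservative clobber behaviour the comment describes.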

/// Remove instruction \p Copy if there exists a previous copy that copies the		/// Remove instruction \p Copy if there exists a previous copy that copies the
/// register \p Src to the register \p Def; This may happen indirectly by		/// register \p Src to the register \p Def; This may happen indirectly by
/// copying the super registers.		/// copying the super registers.
bool MachineCopyPropagation::eraseIfRedundant(MachineInstr &Copy, unsigned Src,		bool MachineCopyPropagation::eraseIfRedundant(MachineInstr &Copy, unsigned Src,
unsigned Def) {		unsigned Def) {
// Avoid eliminating a copy from/to reserved registers as we cannot predict		// Avoid eliminating a copy from/to reserved registers as we cannot predict
// the value (Example: The sparc zero register is writable but stays zero).		// the value (Example: The sparc zero register is writable but stays zero).
if (MRI->isReserved(Src) || MRI->isReserved(Def))		if (MRI->isReserved(Src) || MRI->isReserved(Def))
... (21 unchanged lines not shown) ...	for (MachineInstr &MI :
MI.clearRegisterKills(CopyDef, TRI);		MI.clearRegisterKills(CopyDef, TRI);

Copy.eraseFromParent();		Copy.eraseFromParent();
Changed = true;		Changed = true;
++NumDeletes;		++NumDeletes;
return true;		return true;
}		}

		// Only forward cross-class COPYs into other reversed cross-class COPYs.
		qcolombet: Could you add an example illustrating that?
		bool MachineCopyPropagation::isForwardableRegClassCopy(MachineInstr &Copy,
		qcolombet: I found the name of the function confusing. It takes MIs but mentions only reg classes.
		gberry: How about isForwardableRegClassCopy? I also renamed the second parameter to 'UseI'.
		qcolombet: Works with better comment on top. See next comment.
		MachineInstr &UseI) {
		auto isCross = [&](const MachineOperand &Dst, const MachineOperand &Src) {
		unsigned DstReg = Dst.getReg();
		unsigned SrcPhysReg = getPhysReg(Src);
		const TargetRegisterClass *DstRC;
		if (TargetRegisterInfo::isVirtualRegister(DstReg)) {
		DstRC = MRI->getRegClass(DstReg);
		unsigned DstSubReg = Dst.getSubReg();
		if (DstSubReg)
		SrcPhysReg = TRI->getMatchingSuperReg(SrcPhysReg, DstSubReg, DstRC);
		} else {
		DstRC = TRI->getMinimalPhysRegClass(DstReg);
		qcolombet: No bracket
		}

		return !DstRC->contains(SrcPhysReg);
		};

		const MachineOperand &CopyDst = Copy.getOperand(0);
		const MachineOperand &CopySrc = Copy.getOperand(1);

		if (!isCross(CopyDst, CopySrc))
		return true;

		if (!UseI.isCopy())
		return false;

		return !isCross(UseI.getOperand(0), CopySrc);
		}
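The rule implemented by `isForwardableRegClassCopy` can be sketched with register classes modelled as plain sets of physical register numbers. Everything here is an assumption made for illustration: `ToyRegClass`, `toyIsCross` and `toyIsForwardable` are hypothetical names, and sub-register handling is left out entirely.

```cpp
#include <cassert>
#include <set>

// Hypothetical register class: just the set of physical registers it contains.
using ToyRegClass = std::set<unsigned>;

// A copy is "cross-class" when the source's physical register is not a member
// of the destination's register class.
bool toyIsCross(const ToyRegClass &DstRC, unsigned SrcPhysReg) {
  return DstRC.count(SrcPhysReg) == 0;
}

// Copy:  CopyDst(CopyDstRC) = COPY CopySrc(CopySrcPhysReg)
// UseI:  the instruction whose operand would be rewritten to CopySrc.
// A same-class copy is always forwardable. A cross-class copy is forwardable
// only into another COPY whose destination class contains the original source,
// i.e. a copy that goes back across the class boundary.
bool toyIsForwardable(const ToyRegClass &CopyDstRC, unsigned CopySrcPhysReg,
                      bool UseIsCopy, const ToyRegClass &UseDstRC) {
  if (!toyIsCross(CopyDstRC, CopySrcPhysReg))
    return true;
  if (!UseIsCopy)
    return false;
  return !toyIsCross(UseDstRC, CopySrcPhysReg);
}
```

For example, with a general-purpose class `{0, 1, 2}` and a floating-point class `{10, 11}`, a copy from register 1 into the floating-point class is cross-class; it would be forwarded into a later copy back to the general-purpose class, but not into an arbitrary floating-point instruction.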

		void MachineCopyPropagation::updateForwardedCopyLiveInterval(
		unsigned CopyDstReg, unsigned CopySrcReg, const MachineInstr &UseMI) {
		SmallVector<MachineInstr *, 4> DeadInsts;
		LiveInterval &LI = LIS->getInterval(CopyDstReg);

		// Can happen for undef uses.
		if (LI.empty())
		return;
		qcolombet: I think this example would do great on the comment of the function itself with a note that Copy is the first copy and UseI the second one. Of course, we need to document the non-copy UseI case as well :)

		SlotIndex UseIndex = Indexes->getInstructionIndex(UseMI);
		const LiveRange::Segment *UseSeg = LI.getSegmentContaining(UseIndex);

		qcolombet: I'd add an assert that UseI.getOperand(1) == DstReg
		// Only shrink if forwarded use is the end of a segment.
		if (UseSeg->end != UseIndex.getRegSlot())
		return;

		qcolombet: Add a comment on the relation between CopyDstReg, CopySrcReg and UseMI, and assert on that. E.g., UseMI is a copy that uses CopyDstReg. CopyDstReg is going to be replaced by CopySrcReg.
		LIS->shrinkToUses(&LI, &DeadInsts);
		qcolombet: Move that closer to its first use.
		if (!DeadInsts.empty()) {
		qcolombet: Unless I am missing something, I don't see the extend to use call for CopySrcReg.
		gberry: That was being done before the call to this function. I've moved it into this function in my latest revision.
		SmallVector<unsigned, 8> NewRegs;
		LiveRangeEdit(nullptr, NewRegs, *MF, *LIS,
		nullptr, this).eliminateDeadDefs(DeadInsts);
		}
		}

		void MachineCopyPropagation::LRE_WillEraseInstruction(MachineInstr *MI) {
		// Remove this COPY from further consideration for forwarding.
		ClobberRegister(getFullPhysReg(MI->getOperand(0)));
		Changed = true;
		}

		void MachineCopyPropagation::forwardUses(MachineInstr &MI) {
		if (AvailCopyMap.empty())
		return;

		// Look for non-tied explicit vreg uses that have an active COPY
		// instruction that defines the physical register allocated to them.
		// Replace the vreg with the source of the active COPY.
		for (MachineOperand &MOUse : MI.explicit_uses()) {
		if (!MOUse.isReg() || MOUse.isTied())
		continue;

		unsigned UseReg = MOUse.getReg();

		if (TargetRegisterInfo::isVirtualRegister(UseReg))
		UseReg = VRM->getPhys(UseReg);
		qcolombet: Add a comment on what this is doing.
		else if (MI.isCall() || MI.isReturn() || MI.isInlineAsm() ||
		MI.hasUnmodeledSideEffects() || MI.isDebugValue() || MI.isKill())
		// Some instructions seem to have ABI uses e.g. not marked as
		// implicit, which can lead to forwarding them when we shouldn't, so
		// restrict the types of instructions we forward physical regs into.
		continue;

		// Don't forward COPYs via non-allocatable regs since they can have
		// non-standard semantics.
		if (!MRI->isAllocatable(UseReg))
		continue;
		qcolombet: I would add: if (!UseReg) continue;

		auto CI = AvailCopyMap.find(UseReg);
		if (CI == AvailCopyMap.end())
		continue;

		MachineInstr &Copy = *CI->second;
		MachineOperand &CopyDst = Copy.getOperand(0);
		MachineOperand &CopySrc = Copy.getOperand(1);
		unsigned NewUseReg = CopySrc.getReg();
		qcolombet: I would avoid forwarding on physical registers, period.
		gberry: I have seen some benefit to forwarding physical registers, mostly in cases where block boundaries are changed between RA time and the second run of this pass. This mostly seems to happen when tail merge/tail duplication changes which uses COPYs are exposed to in the same block. This pass won't remove any physical register writes, so this seems relatively safe, other than the caveat in the comment above.

		// Don't forward COPYs that are already NOPs due to register assignment.
		if (getPhysReg(CopyDst) == getPhysReg(CopySrc))
		qcolombet: Put /*ForClobber*/ (or the updated name) in front of false.
		gberry: I refactored this to put the 'FullReg' parameter in the function name instead to be more clear.
		continue;

		// FIXME: Don't handle partial uses of wider COPYs yet.
		if (CopyDst.getSubReg() != 0 || UseReg != getPhysReg(CopyDst))
		continue;

		// Don't forward COPYs of non-allocatable regs unless they are constant.
		if (TargetRegisterInfo::isPhysicalRegister(NewUseReg) &&
		!MRI->isAllocatable(NewUseReg) && !MRI->isConstantPhysReg(NewUseReg))
		continue;

		if (!isForwardableRegClassCopy(Copy, MI))
		continue;

		unsigned NewUseSubReg;
		if (TargetRegisterInfo::isPhysicalRegister(NewUseReg)) {
		if (MOUse.getSubReg())
		NewUseReg = TRI->getSubReg(NewUseReg, MOUse.getSubReg());
		qcolombet: That's invalid per the MachineVerifier
		gberry: Can you elaborate on this? I'm not sure what you are referring to as being invalid.
		qcolombet: Sorry. Physical registers don't have subreg indices. The machine verifier checks that.
		gberry: In this case, the MOUse may not be a physical register. We are replacing a possibly virtual reg (MOUse) with a physical reg (NewUseReg), so we need to make sure to translate the subreg on MOUse. I've added a comment to hopefully clarify this a bit.
		qcolombet: Ah right, I missed the New in the name of the check.
		// If the original use subreg isn't valid on the new src reg, we can't
		// forward it here.
		if (!NewUseReg)
		continue;
		NewUseSubReg = 0;
		} else {
		// %v1 = COPY %v2:sub1
		// USE %v1:sub2
		// The new use is %v2:sub1:sub2
		NewUseSubReg = TRI->composeSubRegIndices(CopySrc.getSubReg(),
		MOUse.getSubReg());
		// Check that NewUseSubReg is valid on NewUseReg
		if (NewUseSubReg && !TRI->getSubClassWithSubReg(
		MRI->getRegClass(NewUseReg), NewUseSubReg))
		continue;
		}

		// Skip instructions that have implicit uses that overlap with the register
		// being replaced, since these can sometimes be implicitly tied to other
		// operands. For example, on AMDGPU:
		//
		// V_MOVRELS_B32_e32 %VGPR2, %M0<imp-use>, %EXEC<imp-use>, %VGPR2_VGPR3_VGPR4_VGPR5<imp-use>
		//
		// the %VGPR2 is implicitly tied to the larger reg operand, but we have no
		// way of knowing we need to update the latter when updating the former.
		auto hasImplicitOverlap = [this](const MachineOperand &Use,
		const MachineInstr &MI) {
		for (const MachineOperand &MIUse : MI.uses())
		if (&MIUse != &Use && MIUse.isReg() && MIUse.isImplicit() &&
		TRI->regsOverlap(Use.getReg(), MIUse.getReg()))
		return true;
		return false;
		};
		if (hasImplicitOverlap(MOUse, MI))
		continue;

		DEBUG(dbgs() << "MCP: Replacing "
		<< PrintReg(MOUse.getReg(), TRI, MOUse.getSubReg())
		<< " with "
		<< PrintReg(NewUseReg, TRI, CopySrc.getSubReg())
		<< " in: " << MI;);

		// Narrow the register class of the forwarded vreg so it matches any
		// instruction constraints.
		//
		// If we are forwarding
		// A:RCA = COPY B:RCB
		// into
		// ... = OP A:RCA
		// then we need to narrow the register class of B so that it is a subclass
		// of RCA so that it meets the instruction register class constraints.
		qcolombet: We don't need this check for virtual reg, right?
		gberry: No, I don't think so. I added an early exit for this case to hasImplicitOverlap.
		if (TargetRegisterInfo::isVirtualRegister(NewUseReg)) {
		// Make sure the virtual reg class allows the subreg.
		if (NewUseSubReg) {
		const TargetRegisterClass *CurUseRC = MRI->getRegClass(NewUseReg);
		const TargetRegisterClass *NewUseRC =
		TRI->getSubClassWithSubReg(CurUseRC, NewUseSubReg);
		if (CurUseRC != NewUseRC) {
		DEBUG(dbgs() << "MCP: Setting regclass of "
		<< PrintReg(NewUseReg, TRI) << " to "
		<< TRI->getRegClassName(NewUseRC) << "\n");
		MRI->setRegClass(NewUseReg, NewUseRC);
		}
		}

		unsigned MOUseOpNo = &MOUse - &MI.getOperand(0);
		const TargetRegisterClass *InstRC =
		TII->getRegClass(MI.getDesc(), MOUseOpNo, TRI, *MF);
		if (InstRC) {
		const TargetRegisterClass *CurUseRC = MRI->getRegClass(NewUseReg);
		if (NewUseSubReg)
		InstRC = TRI->getMatchingSuperRegClass(CurUseRC, InstRC, NewUseSubReg);
		if (!InstRC->hasSubClassEq(CurUseRC)) {
		const TargetRegisterClass *NewUseRC =
		TRI->getCommonSubClass(InstRC, CurUseRC);
		DEBUG(dbgs() << "MCP: Setting regclass of "
		<< PrintReg(NewUseReg, TRI) << " to "
		<< TRI->getRegClassName(NewUseRC) << "\n");
		MRI->setRegClass(NewUseReg, NewUseRC);
		}
		}
		}

		unsigned OrigUseReg = MOUse.getReg();
		MOUse.setReg(NewUseReg);
		MOUse.setSubReg(NewUseSubReg);

		DEBUG(dbgs() << "MCP: After replacement: " << MI << "\n");

		if (PreRegRewrite) {
		// Extend live range starting from COPY early-clobber slot, since that
		// is where the original src live range ends.
		SlotIndex CopyUseIdx =
		Indexes->getInstructionIndex(Copy).getRegSlot(true /*=EarlyClobber*/);
		SlotIndex UseIdx = Indexes->getInstructionIndex(MI).getRegSlot();
		if (TargetRegisterInfo::isVirtualRegister(NewUseReg)) {
		LiveInterval &LI = LIS->getInterval(NewUseReg);
		LI.extendInBlock(CopyUseIdx, UseIdx);
		qcolombet: It would help the readability if there were helper functions for the different regclass/register fixing for the respective physreg and virtreg cases. E.g., a helper with: if NewUseReg is a physreg then block line 485, else block line 541. Then the code would look like: Check if replacing is possible between MOUse and NewUseReg [...] isForwardXXX [...] Adapt NewUseReg to whatever constraints are carried by MOUse fix(NewUseReg, NewUseSubReg); // <--- This calls your helper function
		gberry: I've factored out quite a bit of code from this function, let me know what you think.
		LaneBitmask UseMask = TRI->getSubRegIndexLaneMask(NewUseSubReg);
		for (auto &S : LI.subranges())
		if ((S.LaneMask & UseMask).any() && S.find(CopyUseIdx))
		S.extendInBlock(CopyUseIdx, UseIdx);
		} else {
		assert(NewUseSubReg == 0 && "Unexpected subreg on physical register!");
		for (MCRegUnitIterator UI(NewUseReg, TRI); UI.isValid(); ++UI) {
		LiveRange &LR = LIS->getRegUnit(*UI);
		LR.extendInBlock(CopyUseIdx, UseIdx);
		}
		}

		if (TargetRegisterInfo::isVirtualRegister(OrigUseReg))
		updateForwardedCopyLiveInterval(OrigUseReg, NewUseReg, MI);
		} else {
		for (MachineInstr &KMI :
		make_range(Copy.getIterator(), std::next(MI.getIterator())))
		KMI.clearRegisterKills(NewUseReg, TRI);
		}

		++NumCopyForwards;
		Changed = true;
		}
		}
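The overall effect of `forwardUses` within a basic block can be approximated by a much simpler model. The following is a minimal sketch, not the patch's algorithm: `ToyInst` and `toyForwardCopies` are hypothetical, registers are plain strings, and register classes, sub-registers, tied or implicit operands, and live-interval updates are all ignored. It keeps a map from copy destination to copy source, rewrites uses through that map, and drops an entry as soon as either register of the copy is redefined.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical instruction: an opcode, at most one def, and a list of uses.
struct ToyInst {
  std::string Op;
  std::string Def; // empty if the instruction defines nothing
  std::vector<std::string> Uses;
};

// Rewrites uses of a COPY destination to the COPY source while the copy is
// still "available" (neither register has been redefined since the copy).
// Returns the number of uses that were forwarded.
int toyForwardCopies(std::vector<ToyInst> &Block) {
  std::map<std::string, std::string> Avail; // copy dst -> copy src
  int NumForwarded = 0;
  for (ToyInst &I : Block) {
    // Forward uses first, including the source operand of a COPY itself.
    for (std::string &U : I.Uses) {
      auto It = Avail.find(U);
      if (It != Avail.end()) {
        U = It->second;
        ++NumForwarded;
      }
    }
    if (I.Def.empty())
      continue;
    // A def clobbers every available copy that reads or writes that register.
    for (auto It = Avail.begin(); It != Avail.end();) {
      if (It->first == I.Def || It->second == I.Def)
        It = Avail.erase(It);
      else
        ++It;
    }
    // Record the new copy unless it is a no-op.
    if (I.Op == "COPY" && I.Uses.size() == 1 && I.Uses[0] != I.Def)
      Avail[I.Def] = I.Uses[0];
  }
  return NumForwarded;
}
```

In this model a sequence such as `b = COPY a; c = ADD b, b` becomes `c = ADD a, a`, after which the copy into `b` may become dead; a later redefinition of `a` stops any further forwarding of `b`.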

void MachineCopyPropagation::CopyPropagateBlock(MachineBasicBlock &MBB) {		void MachineCopyPropagation::CopyPropagateBlock(MachineBasicBlock &MBB) {
DEBUG(dbgs() << "MCP: CopyPropagateBlock " << MBB.getName() << "\n");		DEBUG(dbgs() << "MCP: CopyPropagateBlock " << MBB.getName() << "\n");

for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end(); I != E; ) {		for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end(); I != E; ) {
MachineInstr *MI = &*I;		MachineInstr *MI = &*I;
++I;		++I;

if (MI->isCopy()) {		if (MI->isCopy()) {
unsigned Def = MI->getOperand(0).getReg();		unsigned Def = getPhysReg(MI->getOperand(0));
unsigned Src = MI->getOperand(1).getReg();		unsigned Src = getPhysReg(MI->getOperand(1));

assert(!TargetRegisterInfo::isVirtualRegister(Def) &&
!TargetRegisterInfo::isVirtualRegister(Src) &&
"MachineCopyPropagation should be run after register allocation!");

// The two copies cancel out and the source of the first copy		// The two copies cancel out and the source of the first copy
// hasn't been overridden, eliminate the second one. e.g.		// hasn't been overridden, eliminate the second one. e.g.
// %ECX<def> = COPY %EAX		// %ECX<def> = COPY %EAX
// ... nothing clobbered EAX.		// ... nothing clobbered EAX.
// %EAX<def> = COPY %ECX		// %EAX<def> = COPY %ECX
// =>		// =>
// %ECX<def> = COPY %EAX		// %ECX<def> = COPY %EAX
//		//
// or		// or
//		//
// %ECX<def> = COPY %EAX		// %ECX<def> = COPY %EAX
// ... nothing clobbered EAX.		// ... nothing clobbered EAX.
// %ECX<def> = COPY %EAX		// %ECX<def> = COPY %EAX
// =>		// =>
// %ECX<def> = COPY %EAX		// %ECX<def> = COPY %EAX
		if (!PreRegRewrite)
if (eraseIfRedundant(*MI, Def, Src) || eraseIfRedundant(*MI, Src, Def))		if (eraseIfRedundant(*MI, Def, Src) || eraseIfRedundant(*MI, Src, Def))
continue;		continue;

		forwardUses(*MI);
		inouehrs: Do we need to execute forwardUses after virtual register rewriting again?
		gberry: There are additional forwarding opportunities exposed later in the pipeline by e.g. tail-merging and tail-duplication that are caught by this second run.

		// Src may have been changed by forwardUses()
		Src = getPhysReg(MI->getOperand(1));
		unsigned DefClobber = getFullPhysReg(MI->getOperand(0));
		unsigned SrcClobber = getFullPhysReg(MI->getOperand(1));

// If Src is defined by a previous copy, the previous copy cannot be		// If Src is defined by a previous copy, the previous copy cannot be
// eliminated.		// eliminated.
ReadRegister(Src);		ReadRegister(Src);
for (const MachineOperand &MO : MI->implicit_operands()) {		for (const MachineOperand &MO : MI->implicit_operands()) {
if (!MO.isReg() || !MO.readsReg())		if (!MO.isReg() || !MO.readsReg())
continue;		continue;
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (!Reg)		if (!Reg)
continue;		continue;
ReadRegister(Reg);		ReadRegister(Reg);
}		}

DEBUG(dbgs() << "MCP: Copy is a deletion candidate: "; MI->dump());		DEBUG(dbgs() << "MCP: Copy is a deletion candidate: "; MI->dump());

// Copy is now a candidate for deletion.		// Copy is now a candidate for deletion.
if (!MRI->isReserved(Def))		// Only look for dead COPYs if we're not running just before
		// VirtRegRewriter, since presumably these COPYs will have already been
		// removed.
		if (!PreRegRewrite && !MRI->isReserved(Def))
MaybeDeadCopies.insert(MI);		MaybeDeadCopies.insert(MI);

// If 'Def' is previously source of another copy, then this earlier copy's		// If 'Def' is previously source of another copy, then this earlier copy's
// source is no longer available. e.g.		// source is no longer available. e.g.
// %xmm9<def> = copy %xmm2		// %xmm9<def> = copy %xmm2
// ...		// ...
// %xmm2<def> = copy %xmm0		// %xmm2<def> = copy %xmm0
// ...		// ...
// %xmm2<def> = copy %xmm9		// %xmm2<def> = copy %xmm9
ClobberRegister(Def);		ClobberRegister(DefClobber);
for (const MachineOperand &MO : MI->implicit_operands()) {		for (const MachineOperand &MO : MI->implicit_operands()) {
if (!MO.isReg() || !MO.isDef())		if (!MO.isReg() || !MO.isDef())
continue;		continue;
unsigned Reg = MO.getReg();		unsigned Reg = getFullPhysReg(MO);
if (!Reg)		if (!Reg)
continue;		continue;
ClobberRegister(Reg);		ClobberRegister(Reg);
}		}

// Remember Def is defined by the copy.		// Remember Def is defined by the copy.
for (MCSubRegIterator SR(Def, TRI, /*IncludeSelf=*/true); SR.isValid();		for (MCSubRegIterator SR(Def, TRI, /*IncludeSelf=*/true); SR.isValid();
++SR) {		++SR) {
CopyMap[*SR] = MI;		CopyMap[*SR] = MI;
AvailCopyMap[*SR] = MI;		AvailCopyMap[*SR] = MI;
}		}

// Remember source that's copied to Def. Once it's clobbered, then		// Remember source that's copied to Def. Once it's clobbered, then
// it's no longer available for copy propagation.		// it's no longer available for copy propagation.
RegList &DestList = SrcMap[Src];		RegList &DestList = SrcMap[SrcClobber];
if (!is_contained(DestList, Def))		if (!is_contained(DestList, DefClobber))
DestList.push_back(Def);		DestList.push_back(DefClobber);

continue;		continue;
}		}

		// Clobber any earlyclobber regs first.
		for (const MachineOperand &MO : MI->operands())
		if (MO.isReg() && MO.isEarlyClobber()) {
		unsigned Reg = getFullPhysReg(MO);
		// If we have a tied earlyclobber, that means it is also read by this
		// instruction, so we need to make sure we don't remove it as dead
		// later.
		if (MO.isTied())
		ReadRegister(Reg);
		ClobberRegister(Reg);
		}

		forwardUses(*MI);
		inouehrs: Ditto. Do we need this after virtual register rewriting again?

// Not a copy.		// Not a copy.
SmallVector<unsigned, 2> Defs;		SmallVector<unsigned, 2> Defs;
const MachineOperand *RegMask = nullptr;		const MachineOperand *RegMask = nullptr;
for (const MachineOperand &MO : MI->operands()) {		for (const MachineOperand &MO : MI->operands()) {
if (MO.isRegMask())		if (MO.isRegMask())
RegMask = &MO;		RegMask = &MO;
if (!MO.isReg())		if (!MO.isReg())
continue;		continue;
unsigned Reg = MO.getReg();		unsigned Reg = getFullPhysReg(MO);
if (!Reg)		if (!Reg)
continue;		continue;

assert(!TargetRegisterInfo::isVirtualRegister(Reg) &&		if (MO.isDef() && !MO.isEarlyClobber()) {
"MachineCopyPropagation should be run after register allocation!");

if (MO.isDef()) {
Defs.push_back(Reg);		Defs.push_back(Reg);
continue;		continue;
} else if (MO.readsReg())		} else if (MO.readsReg())
ReadRegister(Reg);		ReadRegister(Reg);
}		}

// The instruction has a register mask operand which means that it clobbers		// The instruction has a register mask operand which means that it clobbers
// a large set of registers. Treat clobbered registers the same way as		// a large set of registers. Treat clobbered registers the same way as
Show All 40 Lines	for (unsigned Reg : Defs)
ClobberRegister(Reg);		ClobberRegister(Reg);
}		}

// If MBB doesn't have successors, delete the copies whose defs are not used.		// If MBB doesn't have successors, delete the copies whose defs are not used.
// If MBB does have successors, then conservative assume the defs are live-out		// If MBB does have successors, then conservative assume the defs are live-out
// since we don't want to trust live-in lists.		// since we don't want to trust live-in lists.
if (MBB.succ_empty()) {		if (MBB.succ_empty()) {
for (MachineInstr *MaybeDead : MaybeDeadCopies) {		for (MachineInstr *MaybeDead : MaybeDeadCopies) {
		DEBUG(dbgs() << "MCP: Removing copy due to no live-out succ: ";
		MaybeDead->dump());
assert(!MRI->isReserved(MaybeDead->getOperand(0).getReg()));		assert(!MRI->isReserved(MaybeDead->getOperand(0).getReg()));
MaybeDead->eraseFromParent();		MaybeDead->eraseFromParent();
Changed = true;		Changed = true;
++NumDeletes;		++NumDeletes;
}		}
}		}

MaybeDeadCopies.clear();		MaybeDeadCopies.clear();
AvailCopyMap.clear();		AvailCopyMap.clear();
CopyMap.clear();		CopyMap.clear();
SrcMap.clear();		SrcMap.clear();
}		}

bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF) {		bool MachineCopyPropagation::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(*MF.getFunction()))		if (skipFunction(*MF.getFunction()))
return false;		return false;

Changed = false;		Changed = false;

TRI = MF.getSubtarget().getRegisterInfo();		TRI = MF.getSubtarget().getRegisterInfo();
TII = MF.getSubtarget().getInstrInfo();		TII = MF.getSubtarget().getInstrInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
		this->MF = &MF;
		if (PreRegRewrite) {
		Indexes = &getAnalysis<SlotIndexes>();
		LIS = &getAnalysis<LiveIntervals>();
		VRM = &getAnalysis<VirtRegMap>();
		}
		NoSubRegLiveness = !MRI->subRegLivenessEnabled();

for (MachineBasicBlock &MBB : MF)		for (MachineBasicBlock &MBB : MF)
CopyPropagateBlock(MBB);		CopyPropagateBlock(MBB);

return Changed;		return Changed;
}		}
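
As a rough orientation for readers of this diff, the effect of COPY source forwarding on a single block can be sketched with a toy model. This is not the patch's code: `Inst`, `clobber`, and `forwardCopySources` are hypothetical names, registers are plain strings, and the sketch ignores sub-registers, reserved registers, register-class constraints, implicit operands, and the pre-rewrite virtual-register handling that the real pass has to deal with.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy instruction: a list of defined registers and a list of used registers.
// (Hypothetical type for illustration only; not LLVM's MachineInstr.)
struct Inst {
  std::string Opcode;
  std::vector<std::string> Defs;
  std::vector<std::string> Uses;
  bool isCopy() const { return Opcode == "COPY"; }
};

// Forget every recorded copy that reads or writes Reg.
static void clobber(std::map<std::string, std::string> &Avail,
                    const std::string &Reg) {
  for (auto It = Avail.begin(); It != Avail.end();) {
    if (It->first == Reg || It->second == Reg)
      It = Avail.erase(It);
    else
      ++It;
  }
}

// Rewrite each use of a COPY destination to the COPY source, as long as
// neither register has been redefined since the COPY.
// Returns the number of operands rewritten.
int forwardCopySources(std::vector<Inst> &Block) {
  std::map<std::string, std::string> Avail; // copy dest -> copy source
  int NumForwarded = 0;
  for (Inst &I : Block) {
    // Uses are read before this instruction's defs take effect.
    for (std::string &Use : I.Uses) {
      auto It = Avail.find(Use);
      if (It != Avail.end()) {
        Use = It->second;
        ++NumForwarded;
      }
    }
    // Any def invalidates copies that mention the defined register.
    for (const std::string &Def : I.Defs)
      clobber(Avail, Def);
    // Record the (possibly forwarded) copy for later uses.
    if (I.isCopy() && I.Defs.size() == 1 && I.Uses.size() == 1 &&
        I.Uses[0] != I.Defs[0])
      Avail[I.Defs[0]] = I.Uses[0];
  }
  return NumForwarded;
}
```

Fed the seven instructions from the arm64-zero-cycle-regmov.ll discussion, with the call modeled as a clobber of x0 through x3, the sketch rewrites the `mov x0, x20` and `mov x1, x19` before the first call to read x2 and x3. It leaves the copies after the call alone, because the call invalidated the recorded sources.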

lib/CodeGen/TargetPassConfig.cpp

Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
static cl::opt<bool> DisableLSR("disable-lsr", cl::Hidden,		static cl::opt<bool> DisableLSR("disable-lsr", cl::Hidden,
cl::desc("Disable Loop Strength Reduction Pass"));		cl::desc("Disable Loop Strength Reduction Pass"));
static cl::opt<bool> DisableConstantHoisting("disable-constant-hoisting",		static cl::opt<bool> DisableConstantHoisting("disable-constant-hoisting",
cl::Hidden, cl::desc("Disable ConstantHoisting"));		cl::Hidden, cl::desc("Disable ConstantHoisting"));
static cl::opt<bool> DisableCGP("disable-cgp", cl::Hidden,		static cl::opt<bool> DisableCGP("disable-cgp", cl::Hidden,
cl::desc("Disable Codegen Prepare"));		cl::desc("Disable Codegen Prepare"));
static cl::opt<bool> DisableCopyProp("disable-copyprop", cl::Hidden,		static cl::opt<bool> DisableCopyProp("disable-copyprop", cl::Hidden,
cl::desc("Disable Copy Propagation pass"));		cl::desc("Disable Copy Propagation pass"));
		static cl::opt<bool> DisableCopyPropPreRegRewrite("disable-copyprop-prerewrite", cl::Hidden,
		cl::desc("Disable Copy Propagation Pre-Register Re-write pass"));
static cl::opt<bool> DisablePartialLibcallInlining("disable-partial-libcall-inlining",		static cl::opt<bool> DisablePartialLibcallInlining("disable-partial-libcall-inlining",
cl::Hidden, cl::desc("Disable Partial Libcall Inlining"));		cl::Hidden, cl::desc("Disable Partial Libcall Inlining"));
static cl::opt<bool> EnableImplicitNullChecks(		static cl::opt<bool> EnableImplicitNullChecks(
"enable-implicit-null-checks",		"enable-implicit-null-checks",
cl::desc("Fold null checks into faulting memory operations"),		cl::desc("Fold null checks into faulting memory operations"),
cl::init(false));		cl::init(false));
static cl::opt<bool> PrintLSR("print-lsr-output", cl::Hidden,		static cl::opt<bool> PrintLSR("print-lsr-output", cl::Hidden,
cl::desc("Print LLVM IR produced by the loop-reduce pass"));		cl::desc("Print LLVM IR produced by the loop-reduce pass"));
▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines	if (StandardID == &TargetPassConfig::PostRAMachineLICMID)
return applyDisable(TargetID, DisablePostRAMachineLICM);		return applyDisable(TargetID, DisablePostRAMachineLICM);

if (StandardID == &MachineSinkingID)		if (StandardID == &MachineSinkingID)
return applyDisable(TargetID, DisableMachineSink);		return applyDisable(TargetID, DisableMachineSink);

if (StandardID == &MachineCopyPropagationID)		if (StandardID == &MachineCopyPropagationID)
return applyDisable(TargetID, DisableCopyProp);		return applyDisable(TargetID, DisableCopyProp);

		if (StandardID == &MachineCopyPropagationPreRegRewriteID)
		return applyDisable(TargetID, DisableCopyPropPreRegRewrite);

return TargetID;		return TargetID;
}		}

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
/// TargetPassConfig		/// TargetPassConfig
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//

INITIALIZE_PASS(TargetPassConfig, "targetpassconfig",		INITIALIZE_PASS(TargetPassConfig, "targetpassconfig",
▲ Show 20 Lines • Show All 731 Lines • ▼ Show 20 Lines	void TargetPassConfig::addOptimizedRegAlloc(FunctionPass *RegAllocPass) {

if (RegAllocPass) {		if (RegAllocPass) {
// Add the selected register allocation pass.		// Add the selected register allocation pass.
addPass(RegAllocPass);		addPass(RegAllocPass);

// Allow targets to change the register assignments before rewriting.		// Allow targets to change the register assignments before rewriting.
addPreRewrite();		addPreRewrite();

		// Copy propagate to forward register uses and try to eliminate COPYs that
		// were not coalesced.
		addPass(&MachineCopyPropagationPreRegRewriteID);

// Finally rewrite virtual registers.		// Finally rewrite virtual registers.
addPass(&VirtRegRewriterID);		addPass(&VirtRegRewriterID);

// Perform stack slot coloring and post-ra machine LICM.		// Perform stack slot coloring and post-ra machine LICM.
//		//
// FIXME: Re-enable coloring with register when it's capable of adding		// FIXME: Re-enable coloring with register when it's capable of adding
// kill markers.		// kill markers.
addPass(&StackSlotColoringID);		addPass(&StackSlotColoringID);
▲ Show 20 Lines • Show All 58 Lines • Show Last 20 Lines

test/CodeGen/AArch64/aarch64-fold-lslfast.ll

	; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+lsl-fast | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+lsl-fast | FileCheck %s

	%struct.a = type [256 x i16]			%struct.a = type [256 x i16]
	%struct.b = type [256 x i32]			%struct.b = type [256 x i32]
	%struct.c = type [256 x i64]			%struct.c = type [256 x i64]

	declare void @foo()			declare void @foo()
	define i16 @halfword(%struct.a* %ctx, i32 %xor72) nounwind {			define i16 @halfword(%struct.a* %ctx, i32 %xor72) nounwind {
	; CHECK-LABEL: halfword:			; CHECK-LABEL: halfword:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
	; CHECK: ldrh [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl #1]			; CHECK: ldrh [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl #1]
	; CHECK: strh [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #1]			; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
				; CHECK: strh [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #1]
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.a, %struct.a* %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.a, %struct.a* %ctx, i64 0, i64 %idxprom83
	%result = load i16, i16* %arrayidx86, align 2			%result = load i16, i16* %arrayidx86, align 2
	call void @foo()			call void @foo()
	store i16 %result, i16* %arrayidx86, align 2			store i16 %result, i16* %arrayidx86, align 2
	ret i16 %result			ret i16 %result
	}			}

	define i32 @word(%struct.b* %ctx, i32 %xor72) nounwind {			define i32 @word(%struct.b* %ctx, i32 %xor72) nounwind {
	; CHECK-LABEL: word:			; CHECK-LABEL: word:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
	; CHECK: ldr [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl #2]			; CHECK: ldr [[REG1:w[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl #2]
	; CHECK: str [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #2]			; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
				; CHECK: str [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #2]
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.b, %struct.b* %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.b, %struct.b* %ctx, i64 0, i64 %idxprom83
	%result = load i32, i32* %arrayidx86, align 4			%result = load i32, i32* %arrayidx86, align 4
	call void @foo()			call void @foo()
	store i32 %result, i32* %arrayidx86, align 4			store i32 %result, i32* %arrayidx86, align 4
	ret i32 %result			ret i32 %result
	}			}

	define i64 @doubleword(%struct.c* %ctx, i32 %xor72) nounwind {			define i64 @doubleword(%struct.c* %ctx, i32 %xor72) nounwind {
	; CHECK-LABEL: doubleword:			; CHECK-LABEL: doubleword:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8
	; CHECK: ldr [[REG1:x[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl #3]			; CHECK: ldr [[REG1:x[0-9]+]], [{{.*}}[[REG2:x[0-9]+]], [[REG]], lsl #3]
	; CHECK: str [[REG1]], [{{.*}}[[REG2]], [[REG]], lsl #3]			; CHECK: mov [[REG3:x[0-9]+]], [[REG2]]
				; CHECK: str [[REG1]], [{{.*}}[[REG3]], [[REG]], lsl #3]
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.c, %struct.c* %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.c, %struct.c* %ctx, i64 0, i64 %idxprom83
	%result = load i64, i64* %arrayidx86, align 8			%result = load i64, i64* %arrayidx86, align 8
	call void @foo()			call void @foo()
	store i64 %result, i64* %arrayidx86, align 8			store i64 %result, i64* %arrayidx86, align 8
	ret i64 %result			ret i64 %result
	Show All 24 Lines

test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll

	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-eabi -aarch64-neon-syntax=apple -aarch64-enable-simd-scalar=true -asm-verbose=false -disable-adv-copy-opt=true | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-NOOPT			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-eabi -aarch64-neon-syntax=apple -aarch64-enable-simd-scalar=true -asm-verbose=false -disable-adv-copy-opt=true | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-NOOPT
	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-eabi -aarch64-neon-syntax=apple -aarch64-enable-simd-scalar=true -asm-verbose=false -disable-adv-copy-opt=false | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-OPT			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-eabi -aarch64-neon-syntax=apple -aarch64-enable-simd-scalar=true -asm-verbose=false -disable-adv-copy-opt=false | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-OPT
	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-eabi -aarch64-neon-syntax=generic -aarch64-enable-simd-scalar=true -asm-verbose=false -disable-adv-copy-opt=true | FileCheck %s -check-prefix=GENERIC -check-prefix=GENERIC-NOOPT			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-eabi -aarch64-neon-syntax=generic -aarch64-enable-simd-scalar=true -asm-verbose=false -disable-adv-copy-opt=true | FileCheck %s -check-prefix=GENERIC -check-prefix=GENERIC-NOOPT
	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-eabi -aarch64-neon-syntax=generic -aarch64-enable-simd-scalar=true -asm-verbose=false -disable-adv-copy-opt=false | FileCheck %s -check-prefix=GENERIC -check-prefix=GENERIC-OPT			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-eabi -aarch64-neon-syntax=generic -aarch64-enable-simd-scalar=true -asm-verbose=false -disable-adv-copy-opt=false | FileCheck %s -check-prefix=GENERIC -check-prefix=GENERIC-OPT

	define <2 x i64> @bar(<2 x i64> %a, <2 x i64> %b) nounwind readnone {			define <2 x i64> @bar(<2 x i64> %a, <2 x i64> %b) nounwind readnone {
	; CHECK-LABEL: bar:			; CHECK-LABEL: bar:
	; CHECK: add.2d v[[REG:[0-9]+]], v0, v1			; CHECK: add.2d v[[REG:[0-9]+]], v0, v1
	; CHECK: add d[[REG3:[0-9]+]], d[[REG]], d1			; CHECK: add d[[REG3:[0-9]+]], d[[REG]], d1
	; CHECK: sub d[[REG2:[0-9]+]], d[[REG]], d1			; CHECK: sub d[[REG2:[0-9]+]], d[[REG]], d1
	; Without advanced copy optimization, we end up with cross register			; CHECK-NOT: fmov
	; banks copies that cannot be coalesced.
	; CHECK-NOOPT: fmov [[COPY_REG3:x[0-9]+]], d[[REG3]]
	; With advanced copy optimization, we end up with just one copy
	; to insert the computed high part into the V register.
	; CHECK-OPT-NOT: fmov
	; CHECK: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]			; CHECK: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]
	; CHECK-NOOPT: fmov d0, [[COPY_REG3]]			; CHECK-NOT: fmov
	; CHECK-OPT-NOT: fmov
	; CHECK: ins.d v0[1], [[COPY_REG2]]			; CHECK: ins.d v0[1], [[COPY_REG2]]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	;			;
	; GENERIC-LABEL: bar:			; GENERIC-LABEL: bar:
	; GENERIC: add v[[REG:[0-9]+]].2d, v0.2d, v1.2d			; GENERIC: add v[[REG:[0-9]+]].2d, v0.2d, v1.2d
	; GENERIC: add d[[REG3:[0-9]+]], d[[REG]], d1			; GENERIC: add d[[REG3:[0-9]+]], d[[REG]], d1
	; GENERIC: sub d[[REG2:[0-9]+]], d[[REG]], d1			; GENERIC: sub d[[REG2:[0-9]+]], d[[REG]], d1
	; GENERIC-NOOPT: fmov [[COPY_REG3:x[0-9]+]], d[[REG3]]			; GENERIC-NOT: fmov
	; GENERIC-OPT-NOT: fmov
	; GENERIC: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]			; GENERIC: fmov [[COPY_REG2:x[0-9]+]], d[[REG2]]
	; GENERIC-NOOPT: fmov d0, [[COPY_REG3]]			; GENERIC-NOT: fmov
	; GENERIC-OPT-NOT: fmov
	; GENERIC: ins v0.d[1], [[COPY_REG2]]			; GENERIC: ins v0.d[1], [[COPY_REG2]]
	; GENERIC-NEXT: ret			; GENERIC-NEXT: ret
	%add = add <2 x i64> %a, %b			%add = add <2 x i64> %a, %b
	%vgetq_lane = extractelement <2 x i64> %add, i32 0			%vgetq_lane = extractelement <2 x i64> %add, i32 0
	%vgetq_lane2 = extractelement <2 x i64> %b, i32 0			%vgetq_lane2 = extractelement <2 x i64> %b, i32 0
	%add3 = add i64 %vgetq_lane, %vgetq_lane2			%add3 = add i64 %vgetq_lane, %vgetq_lane2
	%sub = sub i64 %vgetq_lane, %vgetq_lane2			%sub = sub i64 %vgetq_lane, %vgetq_lane2
	%vecinit = insertelement <2 x i64> undef, i64 %add3, i32 0			%vecinit = insertelement <2 x i64> undef, i64 %add3, i32 0
	▲ Show 20 Lines • Show All 90 Lines • Show Last 20 Lines

test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll

	; RUN: llc -mtriple=arm64-apple-ios -mcpu=cyclone < %s | FileCheck %s			; RUN: llc -mtriple=arm64-apple-ios -mcpu=cyclone < %s | FileCheck %s
	; rdar://12254953			; rdar://12254953

	define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) nounwind ssp {			define i32 @t(i32 %a, i32 %b, i32 %c, i32 %d) nounwind ssp {
	entry:			entry:
	; CHECK-LABEL: t:			; CHECK-LABEL: t:
	; CHECK: mov x0, [[REG1:x[0-9]+]]			; CHECK: mov [[REG2:x[0-9]+]], x3
	; CHECK: mov x1, [[REG2:x[0-9]+]]			; CHECK: mov [[REG1:x[0-9]+]], x2
				; CHECK: mov x0, x2
				; CHECK: mov x1, x3
javed.absar: Would it be better to rewrite these as MIR tests?

gberry (author): I'm not sure how that would help. In this test, similar to the one Hal asked about before, the newly checked 'mov's aren't new, I just needed to add them to get the new register numbers. Here are the full diffs of the generated code for this test case:

    _t:                                  ; @t
    ; BB#0:                              ; %entry
        stp x20, x19, [sp, #-32]!        ; 8-byte Folded Spill
        stp x29, x30, [sp, #16]          ; 8-byte Folded Spill
        mov x19, x3
        mov x20, x2
    -   mov x0, x20
    -   mov x1, x19
    +   mov x0, x2
    +   mov x1, x3
        bl _foo
        mov x0, x20
        mov x1, x19
        bl _foo

	; CHECK: bl _foo			; CHECK: bl _foo
	; CHECK: mov x0, [[REG1]]			; CHECK: mov x0, [[REG1]]
	; CHECK: mov x1, [[REG2]]			; CHECK: mov x1, [[REG2]]
	%call = call i32 @foo(i32 %c, i32 %d) nounwind			%call = call i32 @foo(i32 %c, i32 %d) nounwind
	%call1 = call i32 @foo(i32 %c, i32 %d) nounwind			%call1 = call i32 @foo(i32 %c, i32 %d) nounwind
	unreachable			unreachable
	}			}

	declare i32 @foo(i32, i32)			declare i32 @foo(i32, i32)

test/CodeGen/AArch64/f16-instructions.ll

Show First 20 Lines • Show All 344 Lines • ▼ Show 20 Lines	then:
ret void		ret void
else:		else:
store i32 0, i32* %p2		store i32 0, i32* %p2
ret void		ret void
}		}

; CHECK-LABEL: test_phi:		; CHECK-LABEL: test_phi:
; CHECK: mov x[[PTR:[0-9]+]], x0		; CHECK: mov x[[PTR:[0-9]+]], x0
; CHECK: ldr h[[AB:[0-9]+]], [x[[PTR]]]		; CHECK: ldr h[[AB:[0-9]+]], [x0]
; CHECK: [[LOOP:LBB[0-9_]+]]:		; CHECK: [[LOOP:LBB[0-9_]+]]:
; CHECK: mov.16b v[[R:[0-9]+]], v[[AB]]		; CHECK: mov.16b v[[R:[0-9]+]], v[[AB]]
; CHECK: ldr h[[AB]], [x[[PTR]]]		; CHECK: ldr h[[AB]], [x[[PTR]]]
; CHECK: mov x0, x[[PTR]]		; CHECK: mov x0, x[[PTR]]
; CHECK: bl {{_?}}test_dummy		; CHECK: bl {{_?}}test_dummy
; CHECK: mov.16b v0, v[[R]]		; CHECK: mov.16b v0, v[[R]]
; CHECK: ret		; CHECK: ret
define half @test_phi(half* %p1) #0 {		define half @test_phi(half* %p1) #0 {
▲ Show 20 Lines • Show All 487 Lines • Show Last 20 Lines

test/CodeGen/AArch64/flags-multiuse.ll

	; RUN: llc -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 -verify-machineinstrs -o - %s | FileCheck %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -aarch64-enable-atomic-cfg-tidy=0 -disable-post-ra -verify-machineinstrs -o - %s | FileCheck %s
qcolombet (marked Done): Why do we need to change the run line here?
gberry (author): Turning off the post-RA scheduler kept the checked instructions in the same order. I've just re-arranged the checks now.

	; LLVM should be able to cope with multiple uses of the same flag-setting			; LLVM should be able to cope with multiple uses of the same flag-setting
	; instruction at different points of a routine. Either by rematerializing the			; instruction at different points of a routine. Either by rematerializing the
	; compare or by saving and restoring the flag register.			; compare or by saving and restoring the flag register.

	declare void @bar()			declare void @bar()

	@var = global i32 0			@var = global i32 0

	define i32 @test_multiflag(i32 %n, i32 %m, i32 %o) {			define i32 @test_multiflag(i32 %n, i32 %m, i32 %o) {
	; CHECK-LABEL: test_multiflag:			; CHECK-LABEL: test_multiflag:

	%test = icmp ne i32 %n, %m			%test = icmp ne i32 %n, %m
	; CHECK: cmp [[LHS:w[0-9]+]], [[RHS:w[0-9]+]]			; CHECK: mov [[RHSCOPY:w[0-9]+]], [[RHS:w[0-9]+]]
				; CHECK: mov [[LHSCOPY:w[0-9]+]], [[LHS:w[0-9]+]]
				; CHECK: cmp [[LHS]], [[RHS]]

	%val = zext i1 %test to i32			%val = zext i1 %test to i32
	; CHECK: cset {{[xw][0-9]+}}, ne			; CHECK: cset {{[xw][0-9]+}}, ne

	store i32 %val, i32* @var			store i32 %val, i32* @var

	call void @bar()			call void @bar()
	; CHECK: bl bar			; CHECK: bl bar

	; Currently, the comparison is emitted again. An MSR/MRS pair would also be			; Currently, the comparison is emitted again. An MSR/MRS pair would also be
	; acceptable, but assuming the call preserves NZCV is not.			; acceptable, but assuming the call preserves NZCV is not.
	br i1 %test, label %iftrue, label %iffalse			br i1 %test, label %iftrue, label %iffalse
	; CHECK: cmp [[LHS]], [[RHS]]			; CHECK: cmp [[LHSCOPY]], [[RHSCOPY]]
	; CHECK: b.eq			; CHECK: b.eq

	iftrue:			iftrue:
	ret i32 42			ret i32 42
	iffalse:			iffalse:
	ret i32 0			ret i32 0
	}			}

test/CodeGen/AArch64/merge-store-dependency.ll

	; RUN: llc < %s -mcpu cortex-a53 -mtriple=aarch64-eabi | FileCheck %s --check-prefix=A53			; RUN: llc < %s -mcpu cortex-a53 -mtriple=aarch64-eabi | FileCheck %s --check-prefix=A53

	; PR26827 - Merge stores causes wrong dependency.			; PR26827 - Merge stores causes wrong dependency.
	%struct1 = type { %struct1, %struct1, i32, i32, i16, i16, void (i32, i32, i8), i8* }			%struct1 = type { %struct1, %struct1, i32, i32, i16, i16, void (i32, i32, i8), i8* }
	@gv0 = internal unnamed_addr global i32 0, align 4			@gv0 = internal unnamed_addr global i32 0, align 4
	@gv1 = internal unnamed_addr global %struct1** null, align 8			@gv1 = internal unnamed_addr global %struct1** null, align 8

	define void @test(%struct1* %fde, i32 %fd, void (i32, i32, i8) %func, i8* %arg) {			define void @test(%struct1* %fde, i32 %fd, void (i32, i32, i8) %func, i8* %arg) {
	;CHECK-LABEL: test			;CHECK-LABEL: test
	entry:			entry:
	; A53: mov [[DATA:w[0-9]+]], w1
	; A53: str q{{[0-9]+}}, {{.*}}			; A53: str q{{[0-9]+}}, {{.*}}
	; A53: str q{{[0-9]+}}, {{.*}}			; A53: str q{{[0-9]+}}, {{.*}}
	; A53: str [[DATA]], {{.*}}			; A53: str w1, {{.*}}

	%0 = bitcast %struct1* %fde to i8*			%0 = bitcast %struct1* %fde to i8*
	tail call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 40, i32 8, i1 false)			tail call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 40, i32 8, i1 false)
	%state = getelementptr inbounds %struct1, %struct1* %fde, i64 0, i32 4			%state = getelementptr inbounds %struct1, %struct1* %fde, i64 0, i32 4
	store i16 256, i16* %state, align 8			store i16 256, i16* %state, align 8
	%fd1 = getelementptr inbounds %struct1, %struct1* %fde, i64 0, i32 2			%fd1 = getelementptr inbounds %struct1, %struct1* %fde, i64 0, i32 2
	store i32 %fd, i32* %fd1, align 8			store i32 %fd, i32* %fd1, align 8
	%force_eof = getelementptr inbounds %struct1, %struct1* %fde, i64 0, i32 3			%force_eof = getelementptr inbounds %struct1, %struct1* %fde, i64 0, i32 3
	▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

test/CodeGen/AArch64/neg-imm.ll

	; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs -o - %s | FileCheck %s			; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs -o - %s | FileCheck %s
	; LSR used to pick a sub-optimal solution due to the target responding			; LSR used to pick a sub-optimal solution due to the target responding
	; conservatively to isLegalAddImmediate for negative values.			; conservatively to isLegalAddImmediate for negative values.

	declare void @foo(i32)			declare void @foo(i32)

	define void @test(i32 %px) {			define void @test(i32 %px) {
	; CHECK_LABEL: test:			; CHECK_LABEL: test:
	; CHECK_LABEL: %entry			; CHECK_LABEL: %entry
	; CHECK: subs			; CHECK: subs [[REG0:w[0-9]+]],
javed.absar: Would it be better to add a new, separate test file instead of changing the purpose of this one?

gberry (author): Again, I'm not trying to change the purpose of this test. My change just caused things to be scheduled slightly differently. The test is still checking that the condition is computed by a 'subs' feeding a 'csel'. Here are the full diffs:

    test:                                // @test
        str x20, [sp, #-32]!             // 8-byte Folded Spill
        stp x19, x30, [sp, #16]          // 8-byte Folded Spill
    +   subs w8, w0, #1                  // =1
        mov w19, w0
    -   subs w8, w19, #1                 // =1
        csel w20, wzr, w8, lt
    .LBB0_1:                             // %for.body
                                         // =>This Inner Loop Header: Depth=1
        cmp w19, w20
        b.eq .LBB0_3
    // BB#2:                             // %if.then3
                                         // in Loop: Header=BB0_1 Depth=1
        mov w0, w20
        bl foo
    .LBB0_3:                             // %for.inc
                                         // in Loop: Header=BB0_1 Depth=1
        cmp w20, w19
        add w20, w20, #1                 // =1
        b.le .LBB0_1
    // BB#4:                             // %for.cond.cleanup
        ldp x19, x30, [sp, #16]          // 8-byte Folded Reload
        ldr x20, [sp], #32               // 8-byte Folded Reload
        ret

	; CHECK-NEXT: csel			; CHECK: csel {{w[0-9]+}}, wzr, [[REG0]]
	entry:			entry:
	%sub = add nsw i32 %px, -1			%sub = add nsw i32 %px, -1
	%cmp = icmp slt i32 %px, 1			%cmp = icmp slt i32 %px, 1
	%.sub = select i1 %cmp, i32 0, i32 %sub			%.sub = select i1 %cmp, i32 0, i32 %sub
	br label %for.body			br label %for.body

	for.body:			for.body:
	; CHECK_LABEL: %for.body			; CHECK_LABEL: %for.body
	Show All 27 Lines

test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size.ll

	Show All 30 Lines
	entry:			entry:
	ret void			ret void
	}			}
	attributes #2 = {"amdgpu-flat-work-group-size"="128,128"}			attributes #2 = {"amdgpu-flat-work-group-size"="128,128"}

	; CHECK-LABEL: {{^}}min_1024_max_2048			; CHECK-LABEL: {{^}}min_1024_max_2048
	; CHECK: SGPRBlocks: 1			; CHECK: SGPRBlocks: 1
	; CHECK: VGPRBlocks: 7			; CHECK: VGPRBlocks: 7
	; CHECK: NumSGPRsForWavesPerEU: 13			; CHECK: NumSGPRsForWavesPerEU: 12
	; CHECK: NumVGPRsForWavesPerEU: 32			; CHECK: NumVGPRsForWavesPerEU: 32
	@var = addrspace(1) global float 0.0			@var = addrspace(1) global float 0.0
	define amdgpu_kernel void @min_1024_max_2048() #3 {			define amdgpu_kernel void @min_1024_max_2048() #3 {
	%val0 = load volatile float, float addrspace(1)* @var			%val0 = load volatile float, float addrspace(1)* @var
	%val1 = load volatile float, float addrspace(1)* @var			%val1 = load volatile float, float addrspace(1)* @var
	%val2 = load volatile float, float addrspace(1)* @var			%val2 = load volatile float, float addrspace(1)* @var
	%val3 = load volatile float, float addrspace(1)* @var			%val3 = load volatile float, float addrspace(1)* @var
	%val4 = load volatile float, float addrspace(1)* @var			%val4 = load volatile float, float addrspace(1)* @var
	▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines

test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll

	Show First 20 Lines • Show All 112 Lines • ▼ Show 20 Lines
	attributes #8 = {"amdgpu-waves-per-eu"="5,10"}			attributes #8 = {"amdgpu-waves-per-eu"="5,10"}

	@var = addrspace(1) global float 0.0			@var = addrspace(1) global float 0.0

	; Exactly 10 waves per execution unit.			; Exactly 10 waves per execution unit.
	; CHECK-LABEL: {{^}}exactly_10:			; CHECK-LABEL: {{^}}exactly_10:
	; CHECK: SGPRBlocks: 1			; CHECK: SGPRBlocks: 1
	; CHECK: VGPRBlocks: 5			; CHECK: VGPRBlocks: 5
	; CHECK: NumSGPRsForWavesPerEU: 13			; CHECK: NumSGPRsForWavesPerEU: 12
	; CHECK: NumVGPRsForWavesPerEU: 24			; CHECK: NumVGPRsForWavesPerEU: 24
	define amdgpu_kernel void @exactly_10() #9 {			define amdgpu_kernel void @exactly_10() #9 {
	%val0 = load volatile float, float addrspace(1)* @var			%val0 = load volatile float, float addrspace(1)* @var
	%val1 = load volatile float, float addrspace(1)* @var			%val1 = load volatile float, float addrspace(1)* @var
	%val2 = load volatile float, float addrspace(1)* @var			%val2 = load volatile float, float addrspace(1)* @var
	%val3 = load volatile float, float addrspace(1)* @var			%val3 = load volatile float, float addrspace(1)* @var
	%val4 = load volatile float, float addrspace(1)* @var			%val4 = load volatile float, float addrspace(1)* @var
	%val5 = load volatile float, float addrspace(1)* @var			%val5 = load volatile float, float addrspace(1)* @var
	▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines

test/CodeGen/AMDGPU/mubuf-offset-private.ll

	; RUN: llc -march=amdgcn -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s			; RUN: llc -march=amdgcn -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s

	; Test addressing modes when the scratch base is not a frame index.			; Test addressing modes when the scratch base is not a frame index.

	; GCN-LABEL: {{^}}store_private_offset_i8:			; GCN-LABEL: {{^}}store_private_offset_i8:
	; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @store_private_offset_i8() #0 {			define amdgpu_kernel void @store_private_offset_i8() #0 {
	store volatile i8 5, i8* inttoptr (i32 8 to i8*)			store volatile i8 5, i8* inttoptr (i32 8 to i8*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i16:			; GCN-LABEL: {{^}}store_private_offset_i16:
	; GCN: buffer_store_short v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_store_short v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @store_private_offset_i16() #0 {			define amdgpu_kernel void @store_private_offset_i16() #0 {
	store volatile i16 5, i16* inttoptr (i32 8 to i16*)			store volatile i16 5, i16* inttoptr (i32 8 to i16*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i32:			; GCN-LABEL: {{^}}store_private_offset_i32:
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @store_private_offset_i32() #0 {			define amdgpu_kernel void @store_private_offset_i32() #0 {
	store volatile i32 5, i32* inttoptr (i32 8 to i32*)			store volatile i32 5, i32* inttoptr (i32 8 to i32*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_v2i32:			; GCN-LABEL: {{^}}store_private_offset_v2i32:
	; GCN: buffer_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s8 offset:8			; GCN: buffer_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @store_private_offset_v2i32() #0 {			define amdgpu_kernel void @store_private_offset_v2i32() #0 {
	store volatile <2 x i32> <i32 5, i32 10>, <2 x i32>* inttoptr (i32 8 to <2 x i32>*)			store volatile <2 x i32> <i32 5, i32 10>, <2 x i32>* inttoptr (i32 8 to <2 x i32>*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_v4i32:			; GCN-LABEL: {{^}}store_private_offset_v4i32:
	; GCN: buffer_store_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s8 offset:8			; GCN: buffer_store_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @store_private_offset_v4i32() #0 {			define amdgpu_kernel void @store_private_offset_v4i32() #0 {
	store volatile <4 x i32> <i32 5, i32 10, i32 15, i32 0>, <4 x i32>* inttoptr (i32 8 to <4 x i32>*)			store volatile <4 x i32> <i32 5, i32 10, i32 15, i32 0>, <4 x i32>* inttoptr (i32 8 to <4 x i32>*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i8:			; GCN-LABEL: {{^}}load_private_offset_i8:
	; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @load_private_offset_i8() #0 {			define amdgpu_kernel void @load_private_offset_i8() #0 {
	%load = load volatile i8, i8* inttoptr (i32 8 to i8*)			%load = load volatile i8, i8* inttoptr (i32 8 to i8*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}sextload_private_offset_i8:			; GCN-LABEL: {{^}}sextload_private_offset_i8:
	; GCN: buffer_load_sbyte v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_sbyte v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @sextload_private_offset_i8(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @sextload_private_offset_i8(i32 addrspace(1)* %out) #0 {
	%load = load volatile i8, i8* inttoptr (i32 8 to i8*)			%load = load volatile i8, i8* inttoptr (i32 8 to i8*)
	%sextload = sext i8 %load to i32			%sextload = sext i8 %load to i32
	store i32 %sextload, i32 addrspace(1)* undef			store i32 %sextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}zextload_private_offset_i8:			; GCN-LABEL: {{^}}zextload_private_offset_i8:
	; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @zextload_private_offset_i8(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @zextload_private_offset_i8(i32 addrspace(1)* %out) #0 {
	%load = load volatile i8, i8* inttoptr (i32 8 to i8*)			%load = load volatile i8, i8* inttoptr (i32 8 to i8*)
	%zextload = zext i8 %load to i32			%zextload = zext i8 %load to i32
	store i32 %zextload, i32 addrspace(1)* undef			store i32 %zextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i16:			; GCN-LABEL: {{^}}load_private_offset_i16:
	; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @load_private_offset_i16() #0 {			define amdgpu_kernel void @load_private_offset_i16() #0 {
	%load = load volatile i16, i16* inttoptr (i32 8 to i16*)			%load = load volatile i16, i16* inttoptr (i32 8 to i16*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}sextload_private_offset_i16:			; GCN-LABEL: {{^}}sextload_private_offset_i16:
	; GCN: buffer_load_sshort v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_sshort v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @sextload_private_offset_i16(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @sextload_private_offset_i16(i32 addrspace(1)* %out) #0 {
	%load = load volatile i16, i16* inttoptr (i32 8 to i16*)			%load = load volatile i16, i16* inttoptr (i32 8 to i16*)
	%sextload = sext i16 %load to i32			%sextload = sext i16 %load to i32
	store i32 %sextload, i32 addrspace(1)* undef			store i32 %sextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}zextload_private_offset_i16:			; GCN-LABEL: {{^}}zextload_private_offset_i16:
	; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @zextload_private_offset_i16(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @zextload_private_offset_i16(i32 addrspace(1)* %out) #0 {
	%load = load volatile i16, i16* inttoptr (i32 8 to i16*)			%load = load volatile i16, i16* inttoptr (i32 8 to i16*)
	%zextload = zext i16 %load to i32			%zextload = zext i16 %load to i32
	store i32 %zextload, i32 addrspace(1)* undef			store i32 %zextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i32:			; GCN-LABEL: {{^}}load_private_offset_i32:
	; GCN: buffer_load_dword v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_dword v{{[0-9]+}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @load_private_offset_i32() #0 {			define amdgpu_kernel void @load_private_offset_i32() #0 {
	%load = load volatile i32, i32* inttoptr (i32 8 to i32*)			%load = load volatile i32, i32* inttoptr (i32 8 to i32*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_v2i32:			; GCN-LABEL: {{^}}load_private_offset_v2i32:
	; GCN: buffer_load_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @load_private_offset_v2i32() #0 {			define amdgpu_kernel void @load_private_offset_v2i32() #0 {
	%load = load volatile <2 x i32>, <2 x i32>* inttoptr (i32 8 to <2 x i32>*)			%load = load volatile <2 x i32>, <2 x i32>* inttoptr (i32 8 to <2 x i32>*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_v4i32:			; GCN-LABEL: {{^}}load_private_offset_v4i32:
	; GCN: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s3 offset:8
	define amdgpu_kernel void @load_private_offset_v4i32() #0 {			define amdgpu_kernel void @load_private_offset_v4i32() #0 {
	%load = load volatile <4 x i32>, <4 x i32>* inttoptr (i32 8 to <4 x i32>*)			%load = load volatile <4 x i32>, <4 x i32>* inttoptr (i32 8 to <4 x i32>*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset:
	; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], s8 offset:4095			; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], s3 offset:4095
	define amdgpu_kernel void @store_private_offset_i8_max_offset() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset() #0 {
	store volatile i8 5, i8* inttoptr (i32 4095 to i8*)			store volatile i8 5, i8* inttoptr (i32 4095 to i8*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus1:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus1:
	; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000			; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000
	; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], s8 offen{{$}}			; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], s3 offen{{$}}
	define amdgpu_kernel void @store_private_offset_i8_max_offset_plus1() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset_plus1() #0 {
	store volatile i8 5, i8* inttoptr (i32 4096 to i8*)			store volatile i8 5, i8* inttoptr (i32 4096 to i8*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus2:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus2:
	; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000			; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000
	; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], s8 offen offset:1{{$}}			; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], s3 offen offset:1{{$}}
	define amdgpu_kernel void @store_private_offset_i8_max_offset_plus2() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset_plus2() #0 {
	store volatile i8 5, i8* inttoptr (i32 4097 to i8*)			store volatile i8 5, i8* inttoptr (i32 4097 to i8*)
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
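
The three `_max_offset` checks above imply a split between the immediate `offset:` field and a VGPR offset: 4095 is still encoded as an immediate, 4096 becomes `0x1000` in a register with `offen` and no immediate, and 4097 becomes `0x1000` plus `offset:1`. The following is a minimal Python sketch of that split, assuming a 12-bit immediate field. The name `split_private_offset` is hypothetical, and the code only mirrors the CHECK lines; it is not the AMDGPU backend's selection logic.

```python
MAX_IMM_OFFSET = 4095  # assumption: largest value seen in an `offset:` field above


def split_private_offset(addr):
    """Return (register_part, immediate_part) for a constant private address.

    register_part is None when the whole address fits in the immediate,
    which corresponds to the `off` form in the CHECK lines.
    """
    if addr < 0:
        raise ValueError("address must be non-negative")
    if addr <= MAX_IMM_OFFSET:
        return None, addr
    # Keep the low 12 bits in the immediate and move the rest to a register.
    return addr & ~MAX_IMM_OFFSET, addr & MAX_IMM_OFFSET
```

Under these assumptions, address 8 stays entirely in the immediate, while 4097 is split into a register value of 0x1000 and an immediate of 1, matching the `_plus2` check.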

test/CodeGen/AMDGPU/multilevel-break.ll


	; GCN-LABEL: {{^}}multi_if_break_loop:			; GCN-LABEL: {{^}}multi_if_break_loop:
	; GCN: s_mov_b64 [[BREAK_REG:s\[[0-9]+:[0-9]+\]]], 0{{$}}			; GCN: s_mov_b64 [[BREAK_REG:s\[[0-9]+:[0-9]+\]]], 0{{$}}

	; GCN: [[LOOP:BB[0-9]+_[0-9]+]]: ; %bb1{{$}}			; GCN: [[LOOP:BB[0-9]+_[0-9]+]]: ; %bb1{{$}}

	; Uses a copy instead of an or			; Uses a copy instead of an or
	; GCN: s_mov_b64 [[COPY:s\[[0-9]+:[0-9]+\]]], [[BREAK_REG]]			; GCN: s_mov_b64 [[COPY:s\[[0-9]+:[0-9]+\]]], [[BREAK_REG]]
	; GCN: s_or_b64 [[BREAK_REG]], exec, [[COPY]]			; GCN: s_or_b64 [[BREAK_REG]], exec, [[BREAK_REG]]
	define amdgpu_kernel void @multi_if_break_loop(i32 %arg) #0 {			define amdgpu_kernel void @multi_if_break_loop(i32 %arg) #0 {
	bb:			bb:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%tmp = sub i32 %id, %arg			%tmp = sub i32 %id, %arg
	br label %bb1			br label %bb1

	bb1:			bb1:
	%lsr.iv = phi i32 [ undef, %bb ], [ %lsr.iv.next, %case0 ], [ %lsr.iv.next, %case1 ]			%lsr.iv = phi i32 [ undef, %bb ], [ %lsr.iv.next, %case0 ], [ %lsr.iv.next, %case1 ]

test/CodeGen/AMDGPU/private-access-no-objects.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=CI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=CI -check-prefix=OPT %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=iceland -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=iceland -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPTICELAND %s
	; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=OPTNONE %s			; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=OPTNONE %s

	; There are no stack objects, but still a private memory access. The			; There are no stack objects, but still a private memory access. The
	; private access registers need to be correctly initialized anyway, and			; private access registers need to be correctly initialized anyway, and
	; shifted down to the end of the used registers.			; shifted down to the end of the used registers.

	; GCN-LABEL: {{^}}store_to_undef:			; GCN-LABEL: {{^}}store_to_undef:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s7 offen{{$}}
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; The -mcpu=iceland case doesn't copy-propagate the same as the other two opt cases because the temp registers %SGPR88_SGPR89_SGPR90_SGPR91 and %SGPR93 are marked as non-allocatable by this subtarget.
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s7{{$}}			; OPTICELAND-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offen{{$}}			; OPTICELAND-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
				; OPTICELAND-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s7{{$}}
				; OPTICELAND: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offen{{$}}

	; -O0 should assume spilling, so the input scratch resource descriptor			; -O0 should assume spilling, so the input scratch resource descriptor
	; -should be used directly without any copies.			; -should be used directly without any copies.

	; OPTNONE-NOT: s_mov_b32			; OPTNONE-NOT: s_mov_b32
	; OPTNONE: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s7 offen{{$}}			; OPTNONE: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s7 offen{{$}}
	define amdgpu_kernel void @store_to_undef() #0 {			define amdgpu_kernel void @store_to_undef() #0 {
	store volatile i32 0, i32* undef			store volatile i32 0, i32* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_to_inttoptr:			; GCN-LABEL: {{^}}store_to_inttoptr:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s7 offset:124{{$}}
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s7{{$}}
	; OPT: buffer_store_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offset:124{{$}}
	define amdgpu_kernel void @store_to_inttoptr() #0 {			define amdgpu_kernel void @store_to_inttoptr() #0 {
	store volatile i32 0, i32* inttoptr (i32 124 to i32*)			store volatile i32 0, i32* inttoptr (i32 124 to i32*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_from_undef:			; GCN-LABEL: {{^}}load_from_undef:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s7 offen{{$}}
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s7{{$}}
	; OPT: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offen{{$}}
	define amdgpu_kernel void @load_from_undef() #0 {			define amdgpu_kernel void @load_from_undef() #0 {
	%ld = load volatile i32, i32* undef			%ld = load volatile i32, i32* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_from_inttoptr:			; GCN-LABEL: {{^}}load_from_inttoptr:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s7 offset:124{{$}}
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s7{{$}}
	; OPT: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offset:124{{$}}
	define amdgpu_kernel void @load_from_inttoptr() #0 {			define amdgpu_kernel void @load_from_inttoptr() #0 {
	%ld = load volatile i32, i32* inttoptr (i32 124 to i32*)			%ld = load volatile i32, i32* inttoptr (i32 124 to i32*)
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
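
The OPT checks in this file change from matching `s_mov` copies followed by a buffer access on the copied registers to matching a buffer access that reads `s[0:3]` and `s7` directly. That is the effect COPY source forwarding is meant to have: uses of a copy's destination are rewritten to the copy's source, after which the copy is dead. The following is a minimal Python sketch of the idea on a toy instruction list. The tuple encoding, the function name `forward_copies`, the `STORE` opcode, and the register names `s[8:11]` and `s12` are all hypothetical; the sketch ignores register classes, sub-registers, and the other legality checks a real pass needs, and it assumes nothing is live out of the block.

```python
def forward_copies(block):
    """Forward COPY sources into later uses within one basic block.

    Each instruction is a tuple (opcode, dest, sources); dest may be None.
    Returns a new list with forwarded operands and dead copies removed.
    """
    available = {}  # copy destination -> source that still holds the same value
    rewritten = []
    for opcode, dest, sources in block:
        # Replace every use of a copy destination with the copy's source.
        new_sources = tuple(available.get(s, s) for s in sources)
        if dest is not None:
            # Writing `dest` invalidates copies that read from or wrote to it.
            available = {d: s for d, s in available.items()
                         if d != dest and s != dest}
        if opcode == "COPY" and new_sources[0] != dest:
            available[dest] = new_sources[0]
        rewritten.append((opcode, dest, new_sources))

    # Backward sweep: drop a COPY whose destination is never read afterwards.
    result = []
    live = set()
    for opcode, dest, sources in reversed(rewritten):
        if opcode == "COPY" and dest not in live:
            continue
        live.discard(dest)
        live.update(sources)
        result.append((opcode, dest, sources))
    result.reverse()
    return result


block = [
    ("COPY", "s[8:11]", ("s[0:3]",)),
    ("COPY", "s12", ("s7",)),
    ("STORE", None, ("v0", "s[8:11]", "s12")),
]
forwarded = forward_copies(block)
```

In this toy block both copies disappear and the store reads `s[0:3]` and `s7` directly. If an instruction between a copy and its use overwrote the copy's source, the mapping would be invalidated and the copy kept.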

test/CodeGen/AMDGPU/ret.ll

	; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; GCN-LABEL: {{^}}vgpr:			; GCN-LABEL: {{^}}vgpr:
	; GCN: v_mov_b32_e32 v1, v0			; GCN-DAG: v_mov_b32_e32 v1, v0
	; GCN-DAG: v_add_f32_e32 v0, 1.0, v1			; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm
	; GCN-DAG: exp mrt0 v1, v1, v1, v1 done vm
	; GCN: s_waitcnt expcnt(0)			; GCN: s_waitcnt expcnt(0)
				; GCN: v_add_f32_e32 v0, 1.0, v0
	; GCN-NOT: s_endpgm			; GCN-NOT: s_endpgm
	define amdgpu_vs { float, float } @vgpr([9 x <16 x i8>] addrspace(2)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {			define amdgpu_vs { float, float } @vgpr([9 x <16 x i8>] addrspace(2)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
	bb:			bb:
	call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %arg3, float %arg3, float %arg3, float %arg3, i1 true, i1 true) #0			call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %arg3, float %arg3, float %arg3, float %arg3, i1 true, i1 true) #0
	%x = fadd float %arg3, 1.000000e+00			%x = fadd float %arg3, 1.000000e+00
	%a = insertvalue { float, float } undef, float %x, 0			%a = insertvalue { float, float } undef, float %x, 0
	%b = insertvalue { float, float } %a, float %arg3, 1			%b = insertvalue { float, float } %a, float %arg3, 1
	ret { float, float } %b			ret { float, float } %b
	; GCN-NOT: s_endpgm			; GCN-NOT: s_endpgm
	define amdgpu_vs { i32, i32, i32, i32 } @sgpr_literal([9 x <16 x i8>] addrspace(2)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {			define amdgpu_vs { i32, i32, i32, i32 } @sgpr_literal([9 x <16 x i8>] addrspace(2)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
	bb:			bb:
	%x = add i32 %arg2, 2			%x = add i32 %arg2, 2
	ret { i32, i32, i32, i32 } { i32 5, i32 6, i32 7, i32 8 }			ret { i32, i32, i32, i32 } { i32 5, i32 6, i32 7, i32 8 }
	}			}

	; GCN-LABEL: {{^}}both:			; GCN-LABEL: {{^}}both:
	; GCN: v_mov_b32_e32 v1, v0			; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm
	; GCN-DAG: exp mrt0 v1, v1, v1, v1 done vm			; GCN-DAG: v_mov_b32_e32 v1, v0
	; GCN-DAG: v_add_f32_e32 v0, 1.0, v1
	; GCN-DAG: s_add_i32 s0, s3, 2
	; GCN-DAG: s_mov_b32 s1, s2			; GCN-DAG: s_mov_b32 s1, s2
	; GCN: s_mov_b32 s2, s3
	; GCN: s_waitcnt expcnt(0)			; GCN: s_waitcnt expcnt(0)
				; GCN: v_add_f32_e32 v0, 1.0, v0
				; GCN-DAG: s_add_i32 s0, s3, 2
				; GCN-DAG: s_mov_b32 s2, s3
	; GCN-NOT: s_endpgm			; GCN-NOT: s_endpgm
	define amdgpu_vs { float, i32, float, i32, i32 } @both([9 x <16 x i8>] addrspace(2)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {			define amdgpu_vs { float, i32, float, i32, i32 } @both([9 x <16 x i8>] addrspace(2)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
	bb:			bb:
	call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %arg3, float %arg3, float %arg3, float %arg3, i1 true, i1 true) #0			call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %arg3, float %arg3, float %arg3, float %arg3, i1 true, i1 true) #0
	%v = fadd float %arg3, 1.000000e+00			%v = fadd float %arg3, 1.000000e+00
	%s = add i32 %arg2, 2			%s = add i32 %arg2, 2
	%a0 = insertvalue { float, i32, float, i32, i32 } undef, float %v, 0			%a0 = insertvalue { float, i32, float, i32, i32 } undef, float %v, 0
	%a1 = insertvalue { float, i32, float, i32, i32 } %a0, i32 %s, 1			%a1 = insertvalue { float, i32, float, i32, i32 } %a0, i32 %s, 1

test/CodeGen/AMDGPU/scratch-simple.ll

; RUN: llc -march=amdgcn -mcpu=verde -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck --check-prefix=GCN --check-prefix=SI %s		; RUN: llc -march=amdgcn -mcpu=verde -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck --check-prefix=GCN --check-prefix=SI %s
; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck --check-prefix=GCN --check-prefix=SI %s		; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck --check-prefix=GCN --check-prefix=TONGA %s
; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck --check-prefix=GCN --check-prefix=GFX9 %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck --check-prefix=GCN --check-prefix=GFX9 %s

; This used to fail due to a v_add_i32 instruction with an illegal immediate		; This used to fail due to a v_add_i32 instruction with an illegal immediate
; operand that was created during Local Stack Slot Allocation. Test case derived		; operand that was created during Local Stack Slot Allocation. Test case derived
; from https://bugs.freedesktop.org/show_bug.cgi?id=96602		; from https://bugs.freedesktop.org/show_bug.cgi?id=96602
;		;
; GCN-LABEL: {{^}}ps_main:		; GCN-LABEL: {{^}}ps_main:

define amdgpu_cs float @cs_main(i32 %idx) {
%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx		%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx		%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
%r = fadd float %v1, %v2		%r = fadd float %v1, %v2
ret float %r		ret float %r
}		}

; GCN-LABEL: {{^}}hs_main:		; GCN-LABEL: {{^}}hs_main:
; SI: s_mov_b32 [[SWO:s[0-9]+]], s0		; SI: s_mov_b32 [[SWO:s[0-9]+]], s0
; GFX9: s_mov_b32 [[SWO:s[0-9]+]], s5		; TONGA: s_mov_b32 [[SWO:s[0-9]+]], s0
; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen		; SI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen		; SI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; TONGA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; TONGA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; GFX9: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
		; GFX9: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
define amdgpu_hs float @hs_main(i32 %idx) {		define amdgpu_hs float @hs_main(i32 %idx) {
%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx		%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx		%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
%r = fadd float %v1, %v2		%r = fadd float %v1, %v2
ret float %r		ret float %r
}		}

; GCN-LABEL: {{^}}gs_main:		; GCN-LABEL: {{^}}gs_main:
; SI: s_mov_b32 [[SWO:s[0-9]+]], s0		; SI: s_mov_b32 [[SWO:s[0-9]+]], s0
; GFX9: s_mov_b32 [[SWO:s[0-9]+]], s5		; TONGA: s_mov_b32 [[SWO:s[0-9]+]], s0
; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen		; SI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen		; SI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; TONGA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; TONGA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; GFX9: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
		; GFX9: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
define amdgpu_gs float @gs_main(i32 %idx) {		define amdgpu_gs float @gs_main(i32 %idx) {
%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx		%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx		%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
%r = fadd float %v1, %v2		%r = fadd float %v1, %v2
ret float %r		ret float %r
}		}

; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:		; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:
; SI: s_mov_b32 [[SWO:s[0-9]+]], s6		; TONGA: s_mov_b32 [[SWO:s[0-9]+]], s6
; GFX9: s_mov_b32 [[SWO:s[0-9]+]], s5		; SI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen
; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen		; SI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen
; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen		; TONGA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; TONGA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; GFX9: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
		; GFX9: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
; GCN: s_mov_b32 s2, s5		; GCN: s_mov_b32 s2, s5
define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {		define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx		%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx		%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
%f = fadd float %v1, %v2		%f = fadd float %v1, %v2
%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2		%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3		%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
ret <{i32, i32, i32, float}> %r2		ret <{i32, i32, i32, float}> %r2
}		}

; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:		; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:
; SI: s_mov_b32 [[SWO:s[0-9]+]], s6		; TONGA: s_mov_b32 [[SWO:s[0-9]+]], s6
; GFX9: s_mov_b32 [[SWO:s[0-9]+]], s5		; SI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen
; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen		; SI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen
; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen		; TONGA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; TONGA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, [[SWO]] offen
		; GFX9: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
		; GFX9: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
; GCN: s_mov_b32 s2, s5		; GCN: s_mov_b32 s2, s5
define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {		define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx		%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx		%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
%f = fadd float %v1, %v2		%f = fadd float %v1, %v2
%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2		%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3		%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
ret <{i32, i32, i32, float}> %r2		ret <{i32, i32, i32, float}> %r2
}		}

test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll

	; GCNMESA-DAG: s_mov_b32 s16, s3			; GCNMESA-DAG: s_mov_b32 s16, s3
	; GCNMESA-DAG: s_mov_b32 s12, SCRATCH_RSRC_DWORD0			; GCNMESA-DAG: s_mov_b32 s12, SCRATCH_RSRC_DWORD0
	; GCNMESA-DAG: s_mov_b32 s13, SCRATCH_RSRC_DWORD1			; GCNMESA-DAG: s_mov_b32 s13, SCRATCH_RSRC_DWORD1
	; GCNMESA-DAG: s_mov_b32 s14, -1			; GCNMESA-DAG: s_mov_b32 s14, -1
	; SIMESA-DAG: s_mov_b32 s15, 0xe8f000			; SIMESA-DAG: s_mov_b32 s15, 0xe8f000
	; VIMESA-DAG: s_mov_b32 s15, 0xe80000			; VIMESA-DAG: s_mov_b32 s15, 0xe80000
	; GFX9MESA-DAG: s_mov_b32 s15, 0xe00000			; GFX9MESA-DAG: s_mov_b32 s15, 0xe00000

				; The base register will get copy propagated for some sub-targets but not others depending on whether the COPY src gets clobbered before the use in the store instruction.
	; GCN: buffer_store_dword {{v[0-9]+}}, off, s[12:15], s16 offset:{{[0-9]+}} ; 4-byte Folded Spill			; SIMESA: buffer_store_dword {{v[0-9]+}}, off, s[12:15], s16 offset:{{[0-9]+}} ; 4-byte Folded Spill
				; VIMESA: buffer_store_dword {{v[0-9]+}}, off, s[12:15], s16 offset:{{[0-9]+}} ; 4-byte Folded Spill
				; GFX9MESA: buffer_store_dword {{v[0-9]+}}, off, s[12:15], s16 offset:{{[0-9]+}} ; 4-byte Folded Spill
				; CIHSA: buffer_store_dword {{v[0-9]+}}, off, s[12:15], s7 offset:{{[0-9]+}} ; 4-byte Folded Spill
				; VIHSA: buffer_store_dword {{v[0-9]+}}, off, s[12:15], s7 offset:{{[0-9]+}} ; 4-byte Folded Spill

	; GCN: buffer_store_dword {{v[0-9]}}, off, s[12:15], s16 offset:{{[0-9]+}}			; GCN: buffer_store_dword {{v[0-9]}}, off, s[12:15], s16 offset:{{[0-9]+}}
	; GCN: buffer_store_dword {{v[0-9]}}, off, s[12:15], s16 offset:{{[0-9]+}}			; GCN: buffer_store_dword {{v[0-9]}}, off, s[12:15], s16 offset:{{[0-9]+}}
	; GCN: buffer_store_dword {{v[0-9]}}, off, s[12:15], s16 offset:{{[0-9]+}}			; GCN: buffer_store_dword {{v[0-9]}}, off, s[12:15], s16 offset:{{[0-9]+}}
	; GCN: buffer_store_dword {{v[0-9]}}, off, s[12:15], s16 offset:{{[0-9]+}}			; GCN: buffer_store_dword {{v[0-9]}}, off, s[12:15], s16 offset:{{[0-9]+}}

	; GCN: buffer_load_dword {{v[0-9]+}}, off, s[12:15], s16 offset:{{[0-9]+}}			; GCN: buffer_load_dword {{v[0-9]+}}, off, s[12:15], s16 offset:{{[0-9]+}}
	; GCN: buffer_load_dword {{v[0-9]+}}, off, s[12:15], s16 offset:{{[0-9]+}}			; GCN: buffer_load_dword {{v[0-9]+}}, off, s[12:15], s16 offset:{{[0-9]+}}

test/CodeGen/ARM/atomic-op.ll

ret i32 %0		ret i32 %0
}		}

define i32 @test_cmpxchg_fail_order(i32 *%addr, i32 %desired, i32 %new) {		define i32 @test_cmpxchg_fail_order(i32 *%addr, i32 %desired, i32 %new) {
; CHECK-LABEL: test_cmpxchg_fail_order:		; CHECK-LABEL: test_cmpxchg_fail_order:

%pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst monotonic		%pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst monotonic
%oldval = extractvalue { i32, i1 } %pair, 0		%oldval = extractvalue { i32, i1 } %pair, 0
; CHECK-ARMV7: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]		; CHECK-ARMV7: mov r[[ADDR:[0-9]+]], r0
		; CHECK-ARMV7: ldrex [[OLDVAL:r[0-9]+]], [r0]
; CHECK-ARMV7: cmp [[OLDVAL]], r1		; CHECK-ARMV7: cmp [[OLDVAL]], r1
; CHECK-ARMV7: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]		; CHECK-ARMV7: bne [[FAIL_BB:\.?LBB[0-9]+_[0-9]+]]
; CHECK-ARMV7: dmb ish		; CHECK-ARMV7: dmb ish
; CHECK-ARMV7: [[LOOP_BB:\.?LBB.*]]:		; CHECK-ARMV7: [[LOOP_BB:\.?LBB.*]]:
; CHECK-ARMV7: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]		; CHECK-ARMV7: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]
; CHECK-ARMV7: cmp [[SUCCESS]], #0		; CHECK-ARMV7: cmp [[SUCCESS]], #0
; CHECK-ARMV7: beq [[SUCCESS_BB:\.?LBB.*]]		; CHECK-ARMV7: beq [[SUCCESS_BB:\.?LBB.*]]
; CHECK-ARMV7: ldrex [[OLDVAL]], [r[[ADDR]]]		; CHECK-ARMV7: ldrex [[OLDVAL]], [r[[ADDR]]]
; CHECK-ARMV7: cmp [[OLDVAL]], r1		; CHECK-ARMV7: cmp [[OLDVAL]], r1
; CHECK-ARMV7: beq [[LOOP_BB]]		; CHECK-ARMV7: beq [[LOOP_BB]]
; CHECK-ARMV7: [[FAIL_BB]]:		; CHECK-ARMV7: [[FAIL_BB]]:
; CHECK-ARMV7: clrex		; CHECK-ARMV7: clrex
; CHECK-ARMV7: bx lr		; CHECK-ARMV7: bx lr
; CHECK-ARMV7: [[SUCCESS_BB]]:		; CHECK-ARMV7: [[SUCCESS_BB]]:
; CHECK-ARMV7: dmb ish		; CHECK-ARMV7: dmb ish
; CHECK-ARMV7: bx lr		; CHECK-ARMV7: bx lr

; CHECK-T2: ldrex [[OLDVAL:r[0-9]+]], [r[[ADDR:[0-9]+]]]		; CHECK-T2: mov r[[ADDR:[0-9]+]], r0
		; CHECK-T2: ldrex [[OLDVAL:r[0-9]+]], [r0]
; CHECK-T2: cmp [[OLDVAL]], r1		; CHECK-T2: cmp [[OLDVAL]], r1
; CHECK-T2: bne [[FAIL_BB:\.?LBB.*]]		; CHECK-T2: bne [[FAIL_BB:\.?LBB.*]]
; CHECK-T2: dmb ish		; CHECK-T2: dmb ish
; CHECK-T2: [[LOOP_BB:\.?LBB.*]]:		; CHECK-T2: [[LOOP_BB:\.?LBB.*]]:
; CHECK-T2: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]		; CHECK-T2: strex [[SUCCESS:r[0-9]+]], r2, [r[[ADDR]]]
; CHECK-T2: cmp [[SUCCESS]], #0		; CHECK-T2: cmp [[SUCCESS]], #0
; CHECK-T2: dmbeq ish		; CHECK-T2: dmbeq ish
; CHECK-T2: bxeq lr		; CHECK-T2: bxeq lr

test/CodeGen/ARM/swifterror.ll

	; CHECK-APPLE-LABEL: foo_loop:			; CHECK-APPLE-LABEL: foo_loop:
	; CHECK-APPLE: mov [[CODE:r[0-9]+]], r0			; CHECK-APPLE: mov [[CODE:r[0-9]+]], r0
	; swifterror is kept in a register			; swifterror is kept in a register
	; CHECK-APPLE: mov [[ID:r[0-9]+]], r8			; CHECK-APPLE: mov [[ID:r[0-9]+]], r8
	; CHECK-APPLE: cmp [[CODE]], #0			; CHECK-APPLE: cmp [[CODE]], #0
	; CHECK-APPLE: beq			; CHECK-APPLE: beq
	; CHECK-APPLE: mov r0, #16			; CHECK-APPLE: mov r0, #16
	; CHECK-APPLE: malloc			; CHECK-APPLE: malloc
	; CHECK-APPLE: strb r{{.}}, [{{.}}[[ID]], #8]			; CHECK-APPLE: strb r{{.*}}, [r0, #8]
	; CHECK-APPLE: ble			; CHECK-APPLE: ble
	; CHECK-APPLE: mov r8, [[ID]]			; CHECK-APPLE: mov r8, [[ID]]

	; CHECK-O0-LABEL: foo_loop:			; CHECK-O0-LABEL: foo_loop:
	; CHECK-O0: mov r{{.*}}, r8			; CHECK-O0: mov r{{.*}}, r8
	; CHECK-O0: cmp r{{.*}}, #0			; CHECK-O0: cmp r{{.*}}, #0
	; CHECK-O0: beq			; CHECK-O0: beq
	; CHECK-O0-DAG: movw r{{.*}}, #1			; CHECK-O0-DAG: movw r{{.*}}, #1

test/CodeGen/Mips/llvm-ir/sub.ll

	; MMR3: subu16 $[[T16:[0-9]+]], $[[T15]], $[[T10]]			; MMR3: subu16 $[[T16:[0-9]+]], $[[T15]], $[[T10]]
	; MMR3: subu16 $[[T17:[0-9]+]], $6, $[[T1]]			; MMR3: subu16 $[[T17:[0-9]+]], $6, $[[T1]]
	; MMR3: subu16 $[[T18:[0-9]+]], $[[T17]], $7			; MMR3: subu16 $[[T18:[0-9]+]], $[[T17]], $7
	; MMR3: lw $[[T19:[0-9]+]], 8($sp)			; MMR3: lw $[[T19:[0-9]+]], 8($sp)
	; MMR3: lw $[[T20:[0-9]+]], 0($sp)			; MMR3: lw $[[T20:[0-9]+]], 0($sp)
	; MMR3: subu16 $5, $[[T19]], $[[T20]]			; MMR3: subu16 $5, $[[T19]], $[[T20]]

	; MMR6: move $[[T0:[0-9]+]], $7			; MMR6: move $[[T0:[0-9]+]], $7
	; MMR6: sw $[[T0]], 8($sp)			; MMR6: sw $7, 8($sp)
	; MMR6: move $[[T1:[0-9]+]], $5			; MMR6: move $[[T1:[0-9]+]], $5
	; MMR6: sw $4, 12($sp)			; MMR6: sw $4, 12($sp)
	; MMR6: lw $[[T2:[0-9]+]], 48($sp)			; MMR6: lw $[[T2:[0-9]+]], 48($sp)
	; MMR6: sltu $[[T3:[0-9]+]], $6, $[[T2]]			; MMR6: sltu $[[T3:[0-9]+]], $6, $[[T2]]
	; MMR6: xor $[[T4:[0-9]+]], $6, $[[T2]]			; MMR6: xor $[[T4:[0-9]+]], $6, $[[T2]]
	; MMR6: sltiu $[[T5:[0-9]+]], $[[T4]], 1			; MMR6: sltiu $[[T5:[0-9]+]], $[[T4]], 1
	; MMR6: seleqz $[[T6:[0-9]+]], $[[T3]], $[[T5]]			; MMR6: seleqz $[[T6:[0-9]+]], $[[T3]], $[[T5]]
	; MMR6: lw $[[T7:[0-9]+]], 52($sp)			; MMR6: lw $[[T7:[0-9]+]], 52($sp)

test/CodeGen/PowerPC/fma-mutate.ll

	; Test several VSX FMA mutation opportunities. The first one isn't a			; Test several VSX FMA mutation opportunities. The first one isn't a
	; reasonable transformation because the killed product register is the			; reasonable transformation because the killed product register is the
	; same as the FMA target register. The second one is legal. The third			; same as the FMA target register. The second one is legal. The third
	; one doesn't fit the feeding-copy pattern.			; one doesn't fit the feeding-copy pattern.

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s
	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare double @llvm.sqrt.f64(double)			declare double @llvm.sqrt.f64(double)

	define double @foo3(double %a) nounwind {			define double @foo3(double %a) nounwind {
	%r = call double @llvm.sqrt.f64(double %a)			%r = call double @llvm.sqrt.f64(double %a)
	ret double %r			ret double %r

	; CHECK: @foo3			; CHECK: @foo3
	; CHECK: xsnmsubadp [[REG:[0-9]+]], {{[0-9]+}}, [[REG]]			; CHECK: fmr [[REG:[0-9]+]], [[REG2:[0-9]+]]
				; CHECK: xsnmsubadp [[REG]], {{[0-9]+}}, [[REG2]]
				hfinkel: Is this an improvement?
				gberry (author): The fmr is not new, I just added it to get the second register number. Here are the full diffs before/after this change for this test:
				  fmr 3, 1
				  addi 3, 3, .LCPI0_0@toc@l
				  lfs 2, 0(3)
				- xsnmsubadp 3, 2, 3
				+ xsnmsubadp 3, 2, 1
				  xsmuldp 4, 0, 0
				  xsmaddmdp 4, 3, 2
				  xsmuldp 0, 0, 4
				hfinkel: Okay, thanks!
	; CHECK: xsmaddmdp			; CHECK: xsmaddmdp
	; CHECK: xsmaddadp			; CHECK: xsmaddadp
	}			}

test/CodeGen/PowerPC/inlineasm-i64-reg.ll

entry:		entry:
%conv.i = trunc i64 %asmresult1.i to i32		%conv.i = trunc i64 %asmresult1.i to i32
%cmp = icmp eq i32 %conv.i, 0		%cmp = icmp eq i32 %conv.i, 0
br i1 %cmp, label %if.then, label %if.end		br i1 %cmp, label %if.then, label %if.end

; CHECK-LABEL: @main		; CHECK-LABEL: @main

; CHECK-DAG: mr [[REG:[0-9]+]], 3		; CHECK-DAG: mr [[REG:[0-9]+]], 3
; CHECK-DAG: li 0, 1076		; CHECK-DAG: li 0, 1076
; CHECK: stw [[REG]],		; CHECK-DAG: stw 3,

; CHECK: #APP		; CHECK: #APP
; CHECK: sc		; CHECK: sc
; CHECK: #NO_APP		; CHECK: #NO_APP

; CHECK: cmpwi {{[0-9]+}}, [[REG]], 1		; CHECK: cmpwi {{[0-9]+}}, [[REG]], 1

; CHECK: blr		; CHECK: blr

test/CodeGen/PowerPC/tail-dup-layout.ll

	; so optional1 includes a copy of test2 at the end, and branches			; so optional1 includes a copy of test2 at the end, and branches
	; to test3 (at the top) or falls through to optional 2.			; to test3 (at the top) or falls through to optional 2.
	; The CHECK statements check for the whole string of tests			; The CHECK statements check for the whole string of tests
	; and then check that the correct test has been duplicated into the end of			; and then check that the correct test has been duplicated into the end of
	; the optional blocks and that the optional blocks are in the correct order.			; the optional blocks and that the optional blocks are in the correct order.
	;CHECK-LABEL: straight_test:			;CHECK-LABEL: straight_test:
	; test1 may have been merged with entry			; test1 may have been merged with entry
	;CHECK: mr [[TAGREG:[0-9]+]], 3			;CHECK: mr [[TAGREG:[0-9]+]], 3
	;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1			;CHECK: andi. {{[0-9]+}}, [[TAGREG:[0-9]+]], 1
	;CHECK-NEXT: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]			;CHECK-NEXT: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: # %test2			;CHECK-NEXT: # %test2
	;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30			;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
	;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3			;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3
	;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29			;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
	;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: .[[TEST4LABEL:[_0-9A-Za-z]+]]: # %test4			;CHECK-NEXT: .[[TEST4LABEL:[_0-9A-Za-z]+]]: # %test4

test/CodeGen/SPARC/32abi.ll

	; CHECK-LABEL: call_floatarg:			; CHECK-LABEL: call_floatarg:
	; HARD: save %sp, -112, %sp			; HARD: save %sp, -112, %sp
	; HARD: mov %i2, %o1			; HARD: mov %i2, %o1
	; HARD-NEXT: mov %i1, %o0			; HARD-NEXT: mov %i1, %o0
	; HARD-NEXT: st %i0, [%sp+104]			; HARD-NEXT: st %i0, [%sp+104]
	; HARD-NEXT: std %o0, [%sp+96]			; HARD-NEXT: std %o0, [%sp+96]
	; HARD-NEXT: st %o1, [%sp+92]			; HARD-NEXT: st %o1, [%sp+92]
	; HARD-NEXT: mov %i0, %o2			; HARD-NEXT: mov %i0, %o2
	; HARD-NEXT: mov %o0, %o3			; HARD-NEXT: mov %i1, %o3
	; HARD-NEXT: mov %o1, %o4			; HARD-NEXT: mov %o1, %o4
	; HARD-NEXT: mov %o0, %o5			; HARD-NEXT: mov %i1, %o5
	; HARD-NEXT: call floatarg			; HARD-NEXT: call floatarg
	; HARD: std %f0, [%i4]			; HARD: std %f0, [%i4]
	; SOFT: st %i0, [%sp+104]			; SOFT: st %i0, [%sp+104]
	; SOFT-NEXT: st %i2, [%sp+100]			; SOFT-NEXT: st %i2, [%sp+100]
	; SOFT-NEXT: st %i1, [%sp+96]			; SOFT-NEXT: st %i1, [%sp+96]
	; SOFT-NEXT: st %i2, [%sp+92]			; SOFT-NEXT: st %i2, [%sp+92]
	; SOFT-NEXT: mov %i1, %o0			; SOFT-NEXT: mov %i1, %o0
	; SOFT-NEXT: mov %i2, %o1			; SOFT-NEXT: mov %i2, %o1

test/CodeGen/SPARC/atomics.ll

	define zeroext i16 @test_load_sub_i16(i16* %p, i16 zeroext %v) {			define zeroext i16 @test_load_sub_i16(i16* %p, i16 zeroext %v) {
	entry:			entry:
	%0 = atomicrmw sub i16* %p, i16 %v seq_cst			%0 = atomicrmw sub i16* %p, i16 %v seq_cst
	ret i16 %0			ret i16 %0
	}			}

	; CHECK-LABEL: test_load_add_i32			; CHECK-LABEL: test_load_add_i32
	; CHECK: membar			; CHECK: membar
	; CHECK: add [[V:%[gilo][0-7]]], %o1, [[U:%[gilo][0-7]]]			; CHECK: mov [[U:%[gilo][0-7]]], [[V:%[gilo][0-7]]]
	; CHECK: cas [%o0], [[V]], [[U]]			; CHECK: add [[U:%[gilo][0-7]]], %o1, [[V2:%[gilo][0-7]]]
				; CHECK: cas [%o0], [[V]], [[V2]]
	; CHECK: membar			; CHECK: membar
	define zeroext i32 @test_load_add_i32(i32* %p, i32 zeroext %v) {			define zeroext i32 @test_load_add_i32(i32* %p, i32 zeroext %v) {
	entry:			entry:
	%0 = atomicrmw add i32* %p, i32 %v seq_cst			%0 = atomicrmw add i32* %p, i32 %v seq_cst
	ret i32 %0			ret i32 %0
	}			}

	; CHECK-LABEL: test_load_sub_64			; CHECK-LABEL: test_load_sub_64

test/CodeGen/Thumb/thumb-shrink-wrapping.ll

if.end:		if.end:
ret i32 42		ret i32 42
}		}

declare void @abort() #0		declare void @abort() #0

define i32 @b_to_bx(i32 %value) {		define i32 @b_to_bx(i32 %value) {
; CHECK-LABEL: b_to_bx:		; CHECK-LABEL: b_to_bx:
; DISABLE: push {r7, lr}		; DISABLE: push {r7, lr}
; CHECK: cmp r1, #49		; CHECK: cmp r0, #49
; CHECK-NEXT: bgt [[ELSE_LABEL:LBB[0-9_]+]]		; CHECK-NEXT: bgt [[ELSE_LABEL:LBB[0-9_]+]]
; ENABLE: push {r7, lr}		; ENABLE: push {r7, lr}

; CHECK: bl		; CHECK: bl
; DISABLE-V5-NEXT: pop {r7, pc}		; DISABLE-V5-NEXT: pop {r7, pc}
; DISABLE-V4T-NEXT: b [[END_LABEL:LBB[0-9_]+]]		; DISABLE-V4T-NEXT: b [[END_LABEL:LBB[0-9_]+]]

; ENABLE-V5-NEXT: pop {r7, pc}		; ENABLE-V5-NEXT: pop {r7, pc}

test/CodeGen/X86/2006-03-01-InstrSchedBug.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s			; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s

	define i32 @f(i32 %a, i32 %b) {			define i32 @f(i32 %a, i32 %b) {
	; CHECK-LABEL: f:			; CHECK-LABEL: f:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movl %ecx, %edx			; CHECK-NEXT: movl %ecx, %edx
	; CHECK-NEXT: imull %edx, %edx			; CHECK-NEXT: imull %ecx, %edx
	; CHECK-NEXT: imull %eax, %ecx			; CHECK-NEXT: imull %eax, %ecx
	; CHECK-NEXT: imull %eax, %eax			; CHECK-NEXT: imull %eax, %eax
	; CHECK-NEXT: addl %edx, %eax			; CHECK-NEXT: addl %edx, %eax
	; CHECK-NEXT: leal (%eax,%ecx,2), %eax			; CHECK-NEXT: leal (%eax,%ecx,2), %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%tmp.2 = mul i32 %a, %a			%tmp.2 = mul i32 %a, %a
	%tmp.5 = shl i32 %a, 1			%tmp.5 = shl i32 %a, 1
	%tmp.6 = mul i32 %tmp.5, %b			%tmp.6 = mul i32 %tmp.5, %b
	%tmp.10 = mul i32 %b, %b			%tmp.10 = mul i32 %b, %b
	%tmp.7 = add i32 %tmp.10, %tmp.2			%tmp.7 = add i32 %tmp.10, %tmp.2
	%tmp.11 = add i32 %tmp.7, %tmp.6			%tmp.11 = add i32 %tmp.7, %tmp.6
	ret i32 %tmp.11			ret i32 %tmp.11
	}			}

test/CodeGen/X86/arg-copy-elide.ll

	}			}

	; CHECK-LABEL: _fastcc_split_i64:			; CHECK-LABEL: _fastcc_split_i64:
	; CHECK: pushl %ebp			; CHECK: pushl %ebp
	; CHECK: movl %esp, %ebp			; CHECK: movl %esp, %ebp
	; CHECK-DAG: movl %edx, %[[r1:[^ ]*]]			; CHECK-DAG: movl %edx, %[[r1:[^ ]*]]
	; CHECK-DAG: movl 8(%ebp), %[[r2:[^ ]*]]			; CHECK-DAG: movl 8(%ebp), %[[r2:[^ ]*]]
	; CHECK-DAG: movl %[[r2]], 4(%esp)			; CHECK-DAG: movl %[[r2]], 4(%esp)
	; CHECK-DAG: movl %[[r1]], (%esp)			; CHECK-DAG: movl %edx, (%esp)
	; CHECK: movl %esp, %[[reg:[^ ]*]]			; CHECK: movl %esp, %[[reg:[^ ]*]]
	; CHECK: pushl %[[reg]]			; CHECK: pushl %[[reg]]
	; CHECK: calll _addrof_i64			; CHECK: calll _addrof_i64
	; CHECK: popl %ebp			; CHECK: popl %ebp
	; CHECK: retl			; CHECK: retl


	; We can't copy elide when it would reduce the user requested alignment.			; We can't copy elide when it would reduce the user requested alignment.

test/CodeGen/X86/avg.ll

	; SSE2-NEXT: packuswb %xmm1, %xmm10			; SSE2-NEXT: packuswb %xmm1, %xmm10
	; SSE2-NEXT: psrld $1, %xmm2			; SSE2-NEXT: psrld $1, %xmm2
	; SSE2-NEXT: movdqa -{{[0-9]+}}(%rsp), %xmm1 # 16-byte Reload			; SSE2-NEXT: movdqa -{{[0-9]+}}(%rsp), %xmm1 # 16-byte Reload
	; SSE2-NEXT: psrld $1, %xmm1			; SSE2-NEXT: psrld $1, %xmm1
	; SSE2-NEXT: pand %xmm0, %xmm1			; SSE2-NEXT: pand %xmm0, %xmm1
	; SSE2-NEXT: pand %xmm0, %xmm2			; SSE2-NEXT: pand %xmm0, %xmm2
	; SSE2-NEXT: packuswb %xmm1, %xmm2			; SSE2-NEXT: packuswb %xmm1, %xmm2
	; SSE2-NEXT: packuswb %xmm10, %xmm2			; SSE2-NEXT: packuswb %xmm10, %xmm2
	; SSE2-NEXT: movdqa %xmm2, %xmm1
	; SSE2-NEXT: psrld $1, %xmm4			; SSE2-NEXT: psrld $1, %xmm4
	; SSE2-NEXT: psrld $1, %xmm12			; SSE2-NEXT: psrld $1, %xmm12
	; SSE2-NEXT: pand %xmm0, %xmm12			; SSE2-NEXT: pand %xmm0, %xmm12
	; SSE2-NEXT: pand %xmm0, %xmm4			; SSE2-NEXT: pand %xmm0, %xmm4
	; SSE2-NEXT: packuswb %xmm12, %xmm4			; SSE2-NEXT: packuswb %xmm12, %xmm4
	; SSE2-NEXT: psrld $1, %xmm13			; SSE2-NEXT: psrld $1, %xmm13
	; SSE2-NEXT: psrld $1, %xmm15			; SSE2-NEXT: psrld $1, %xmm15
	; SSE2-NEXT: pand %xmm0, %xmm15			; SSE2-NEXT: pand %xmm0, %xmm15
	; SSE2-NEXT: psrld $1, %xmm5			; SSE2-NEXT: psrld $1, %xmm5
	; SSE2-NEXT: pand %xmm0, %xmm5			; SSE2-NEXT: pand %xmm0, %xmm5
	; SSE2-NEXT: pand %xmm0, %xmm7			; SSE2-NEXT: pand %xmm0, %xmm7
	; SSE2-NEXT: packuswb %xmm5, %xmm7			; SSE2-NEXT: packuswb %xmm5, %xmm7
	; SSE2-NEXT: packuswb %xmm3, %xmm7			; SSE2-NEXT: packuswb %xmm3, %xmm7
	; SSE2-NEXT: movdqu %xmm7, (%rax)			; SSE2-NEXT: movdqu %xmm7, (%rax)
	; SSE2-NEXT: movdqu %xmm11, (%rax)			; SSE2-NEXT: movdqu %xmm11, (%rax)
	; SSE2-NEXT: movdqu %xmm13, (%rax)			; SSE2-NEXT: movdqu %xmm13, (%rax)
	; SSE2-NEXT: movdqu %xmm1, (%rax)			; SSE2-NEXT: movdqu %xmm2, (%rax)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX1-LABEL: avg_v64i8:			; AVX1-LABEL: avg_v64i8:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: subq $24, %rsp			; AVX1-NEXT: subq $24, %rsp
	; AVX1-NEXT: .Lcfi0:			; AVX1-NEXT: .Lcfi0:
	; AVX1-NEXT: .cfi_def_cfa_offset 32			; AVX1-NEXT: .cfi_def_cfa_offset 32
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero

test/CodeGen/X86/avx512-bugfix-25270.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s

	declare void @Print__512(<16 x i32>) #0			declare void @Print__512(<16 x i32>) #0

	define void @bar__512(<16 x i32>* %var) #0 {			define void @bar__512(<16 x i32>* %var) #0 {
	; CHECK-LABEL: bar__512:			; CHECK-LABEL: bar__512:
	; CHECK: ## BB#0: ## %allocas			; CHECK: ## BB#0: ## %allocas
	; CHECK-NEXT: pushq %rbx			; CHECK-NEXT: pushq %rbx
	; CHECK-NEXT: subq $112, %rsp			; CHECK-NEXT: subq $112, %rsp
	; CHECK-NEXT: movq %rdi, %rbx			; CHECK-NEXT: movq %rdi, %rbx
	; CHECK-NEXT: vmovups (%rbx), %zmm0			; CHECK-NEXT: vmovups (%rdi), %zmm0
	; CHECK-NEXT: vmovups %zmm0, (%rsp) ## 64-byte Spill			; CHECK-NEXT: vmovups %zmm0, (%rsp) ## 64-byte Spill
	; CHECK-NEXT: vbroadcastss {{.*}}(%rip), %zmm1			; CHECK-NEXT: vbroadcastss {{.*}}(%rip), %zmm1
	; CHECK-NEXT: vmovaps %zmm1, (%rbx)			; CHECK-NEXT: vmovaps %zmm1, (%rdi)
	; CHECK-NEXT: callq _Print__512			; CHECK-NEXT: callq _Print__512
	; CHECK-NEXT: vmovups (%rsp), %zmm0 ## 64-byte Reload			; CHECK-NEXT: vmovups (%rsp), %zmm0 ## 64-byte Reload
	; CHECK-NEXT: callq _Print__512			; CHECK-NEXT: callq _Print__512
	; CHECK-NEXT: vbroadcastss {{.*}}(%rip), %zmm0			; CHECK-NEXT: vbroadcastss {{.*}}(%rip), %zmm0
	; CHECK-NEXT: vmovaps %zmm0, (%rbx)			; CHECK-NEXT: vmovaps %zmm0, (%rbx)
	; CHECK-NEXT: addq $112, %rsp			; CHECK-NEXT: addq $112, %rsp
	; CHECK-NEXT: popq %rbx			; CHECK-NEXT: popq %rbx
	; CHECK-NEXT: retq			; CHECK-NEXT: retq

test/CodeGen/X86/avx512-calling-conv.ll

	; KNL_X32-NEXT: .cfi_offset %ebx, -8			; KNL_X32-NEXT: .cfi_offset %ebx, -8
	; KNL_X32-NEXT: movl {{[0-9]+}}(%esp), %esi			; KNL_X32-NEXT: movl {{[0-9]+}}(%esp), %esi
	; KNL_X32-NEXT: movl {{[0-9]+}}(%esp), %edi			; KNL_X32-NEXT: movl {{[0-9]+}}(%esp), %edi
	; KNL_X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; KNL_X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; KNL_X32-NEXT: movl %eax, {{[0-9]+}}(%esp)			; KNL_X32-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; KNL_X32-NEXT: movl %edi, (%esp)			; KNL_X32-NEXT: movl %edi, (%esp)
	; KNL_X32-NEXT: calll _test11			; KNL_X32-NEXT: calll _test11
	; KNL_X32-NEXT: movl %eax, %ebx			; KNL_X32-NEXT: movl %eax, %ebx
	; KNL_X32-NEXT: movzbl %bl, %eax			; KNL_X32-NEXT: movzbl %al, %eax
	; KNL_X32-NEXT: movl %eax, {{[0-9]+}}(%esp)			; KNL_X32-NEXT: movl %eax, {{[0-9]+}}(%esp)
	; KNL_X32-NEXT: movl %esi, {{[0-9]+}}(%esp)			; KNL_X32-NEXT: movl %esi, {{[0-9]+}}(%esp)
	; KNL_X32-NEXT: movl %edi, (%esp)			; KNL_X32-NEXT: movl %edi, (%esp)
	; KNL_X32-NEXT: calll _test10			; KNL_X32-NEXT: calll _test10
	; KNL_X32-NEXT: xorl %ecx, %ecx			; KNL_X32-NEXT: xorl %ecx, %ecx
	; KNL_X32-NEXT: testb $1, %bl			; KNL_X32-NEXT: testb $1, %bl
	; KNL_X32-NEXT: cmovel %ecx, %eax			; KNL_X32-NEXT: cmovel %ecx, %eax
	; KNL_X32-NEXT: addl $16, %esp			; KNL_X32-NEXT: addl $16, %esp

test/CodeGen/X86/avx512-mask-op.ll


	define <8 x i1> @test18(i8 %a, i16 %y) {			define <8 x i1> @test18(i8 %a, i16 %y) {
	; KNL-LABEL: test18:			; KNL-LABEL: test18:
	; KNL: ## BB#0:			; KNL: ## BB#0:
	; KNL-NEXT: kmovw %edi, %k1			; KNL-NEXT: kmovw %edi, %k1
	; KNL-NEXT: kmovw %esi, %k0			; KNL-NEXT: kmovw %esi, %k0
	; KNL-NEXT: kshiftlw $7, %k0, %k2			; KNL-NEXT: kshiftlw $7, %k0, %k2
	; KNL-NEXT: kshiftrw $15, %k2, %k2			; KNL-NEXT: kshiftrw $15, %k2, %k2
	; KNL-NEXT: kmovw %k2, %eax
	; KNL-NEXT: kshiftlw $6, %k0, %k0			; KNL-NEXT: kshiftlw $6, %k0, %k0
	; KNL-NEXT: kshiftrw $15, %k0, %k0			; KNL-NEXT: kshiftrw $15, %k0, %k0
	; KNL-NEXT: kmovw %k0, %ecx			; KNL-NEXT: kmovw %k0, %ecx
	; KNL-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}			; KNL-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
	; KNL-NEXT: kmovw %ecx, %k1			; KNL-NEXT: kmovw %ecx, %k1
	; KNL-NEXT: vpternlogq $255, %zmm1, %zmm1, %zmm1 {%k1} {z}			; KNL-NEXT: vpternlogq $255, %zmm1, %zmm1, %zmm1 {%k1} {z}
	; KNL-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,5,8,7]			; KNL-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,5,8,7]
	; KNL-NEXT: vpermi2q %zmm1, %zmm0, %zmm2			; KNL-NEXT: vpermi2q %zmm1, %zmm0, %zmm2
	; KNL-NEXT: vpsllq $63, %zmm2, %zmm0			; KNL-NEXT: vpsllq $63, %zmm2, %zmm0
	; KNL-NEXT: vptestmq %zmm0, %zmm0, %k0			; KNL-NEXT: vptestmq %zmm0, %zmm0, %k0
	; KNL-NEXT: kshiftlw $1, %k0, %k0			; KNL-NEXT: kshiftlw $1, %k0, %k0
	; KNL-NEXT: kshiftrw $1, %k0, %k0			; KNL-NEXT: kshiftrw $1, %k0, %k0
	; KNL-NEXT: kmovw %eax, %k1			; KNL-NEXT: kshiftlw $7, %k2, %k1
	; KNL-NEXT: kshiftlw $7, %k1, %k1
	; KNL-NEXT: korw %k1, %k0, %k1			; KNL-NEXT: korw %k1, %k0, %k1
	; KNL-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}			; KNL-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
	; KNL-NEXT: vpmovqw %zmm0, %xmm0			; KNL-NEXT: vpmovqw %zmm0, %xmm0
	; KNL-NEXT: retq			; KNL-NEXT: retq
	;			;
	; SKX-LABEL: test18:			; SKX-LABEL: test18:
	; SKX: ## BB#0:			; SKX: ## BB#0:
	; SKX-NEXT: kmovd %edi, %k0			; SKX-NEXT: kmovd %edi, %k0
	; SKX-NEXT: kmovd %esi, %k1			; SKX-NEXT: kmovd %esi, %k1
	; SKX-NEXT: kshiftlw $7, %k1, %k2			; SKX-NEXT: kshiftlw $7, %k1, %k2
	; SKX-NEXT: kshiftrw $15, %k2, %k2			; SKX-NEXT: kshiftrw $15, %k2, %k2
	; SKX-NEXT: kmovd %k2, %eax
	; SKX-NEXT: kshiftlw $6, %k1, %k1			; SKX-NEXT: kshiftlw $6, %k1, %k1
	; SKX-NEXT: kshiftrw $15, %k1, %k1			; SKX-NEXT: kshiftrw $15, %k1, %k1
	; SKX-NEXT: kmovd %k1, %ecx
	; SKX-NEXT: vpmovm2q %k0, %zmm0			; SKX-NEXT: vpmovm2q %k0, %zmm0
	; SKX-NEXT: kmovd %ecx, %k0			; SKX-NEXT: vpmovm2q %k1, %zmm1
	; SKX-NEXT: vpmovm2q %k0, %zmm1
	; SKX-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,5,8,7]			; SKX-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,5,8,7]
	; SKX-NEXT: vpermi2q %zmm1, %zmm0, %zmm2			; SKX-NEXT: vpermi2q %zmm1, %zmm0, %zmm2
	; SKX-NEXT: vpmovq2m %zmm2, %k0			; SKX-NEXT: vpmovq2m %zmm2, %k0
	; SKX-NEXT: kshiftlb $1, %k0, %k0			; SKX-NEXT: kshiftlb $1, %k0, %k0
	; SKX-NEXT: kshiftrb $1, %k0, %k0			; SKX-NEXT: kshiftrb $1, %k0, %k0
	; SKX-NEXT: kmovd %eax, %k1			; SKX-NEXT: kshiftlb $7, %k2, %k1
	; SKX-NEXT: kshiftlb $7, %k1, %k1
	; SKX-NEXT: korb %k1, %k0, %k0			; SKX-NEXT: korb %k1, %k0, %k0
	; SKX-NEXT: vpmovm2w %k0, %xmm0			; SKX-NEXT: vpmovm2w %k0, %xmm0
	; SKX-NEXT: vzeroupper			; SKX-NEXT: vzeroupper
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; AVX512BW-LABEL: test18:			; AVX512BW-LABEL: test18:
	; AVX512BW: ## BB#0:			; AVX512BW: ## BB#0:
	; AVX512BW-NEXT: kmovd %edi, %k1			; AVX512BW-NEXT: kmovd %edi, %k1
	; AVX512BW-NEXT: kmovd %esi, %k0			; AVX512BW-NEXT: kmovd %esi, %k0
	; AVX512BW-NEXT: kshiftlw $7, %k0, %k2			; AVX512BW-NEXT: kshiftlw $7, %k0, %k2
	; AVX512BW-NEXT: kshiftrw $15, %k2, %k2			; AVX512BW-NEXT: kshiftrw $15, %k2, %k2
	; AVX512BW-NEXT: kmovd %k2, %eax
	; AVX512BW-NEXT: kshiftlw $6, %k0, %k0			; AVX512BW-NEXT: kshiftlw $6, %k0, %k0
	; AVX512BW-NEXT: kshiftrw $15, %k0, %k0			; AVX512BW-NEXT: kshiftrw $15, %k0, %k0
	; AVX512BW-NEXT: kmovd %k0, %ecx			; AVX512BW-NEXT: kmovd %k0, %ecx
	; AVX512BW-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}			; AVX512BW-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
	; AVX512BW-NEXT: kmovd %ecx, %k1			; AVX512BW-NEXT: kmovd %ecx, %k1
	; AVX512BW-NEXT: vpternlogq $255, %zmm1, %zmm1, %zmm1 {%k1} {z}			; AVX512BW-NEXT: vpternlogq $255, %zmm1, %zmm1, %zmm1 {%k1} {z}
	; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,5,8,7]			; AVX512BW-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,5,8,7]
	; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm2			; AVX512BW-NEXT: vpermi2q %zmm1, %zmm0, %zmm2
	; AVX512BW-NEXT: vpsllq $63, %zmm2, %zmm0			; AVX512BW-NEXT: vpsllq $63, %zmm2, %zmm0
	; AVX512BW-NEXT: vptestmq %zmm0, %zmm0, %k0			; AVX512BW-NEXT: vptestmq %zmm0, %zmm0, %k0
	; AVX512BW-NEXT: kshiftlw $1, %k0, %k0			; AVX512BW-NEXT: kshiftlw $1, %k0, %k0
	; AVX512BW-NEXT: kshiftrw $1, %k0, %k0			; AVX512BW-NEXT: kshiftrw $1, %k0, %k0
	; AVX512BW-NEXT: kmovd %eax, %k1			; AVX512BW-NEXT: kshiftlw $7, %k2, %k1
	; AVX512BW-NEXT: kshiftlw $7, %k1, %k1
	; AVX512BW-NEXT: korw %k1, %k0, %k0			; AVX512BW-NEXT: korw %k1, %k0, %k0
	; AVX512BW-NEXT: vpmovm2w %k0, %zmm0			; AVX512BW-NEXT: vpmovm2w %k0, %zmm0
	; AVX512BW-NEXT: ## kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>			; AVX512BW-NEXT: ## kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
	; AVX512BW-NEXT: vzeroupper			; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512DQ-LABEL: test18:			; AVX512DQ-LABEL: test18:
	; AVX512DQ: ## BB#0:			; AVX512DQ: ## BB#0:
	; AVX512DQ-NEXT: kmovw %edi, %k0			; AVX512DQ-NEXT: kmovw %edi, %k0
	; AVX512DQ-NEXT: kmovw %esi, %k1			; AVX512DQ-NEXT: kmovw %esi, %k1
	; AVX512DQ-NEXT: kshiftlw $7, %k1, %k2			; AVX512DQ-NEXT: kshiftlw $7, %k1, %k2
	; AVX512DQ-NEXT: kshiftrw $15, %k2, %k2			; AVX512DQ-NEXT: kshiftrw $15, %k2, %k2
	; AVX512DQ-NEXT: kmovw %k2, %eax
	; AVX512DQ-NEXT: kshiftlw $6, %k1, %k1			; AVX512DQ-NEXT: kshiftlw $6, %k1, %k1
	; AVX512DQ-NEXT: kshiftrw $15, %k1, %k1			; AVX512DQ-NEXT: kshiftrw $15, %k1, %k1
	; AVX512DQ-NEXT: kmovw %k1, %ecx
	; AVX512DQ-NEXT: vpmovm2q %k0, %zmm0			; AVX512DQ-NEXT: vpmovm2q %k0, %zmm0
	; AVX512DQ-NEXT: kmovw %ecx, %k0			; AVX512DQ-NEXT: vpmovm2q %k1, %zmm1
	; AVX512DQ-NEXT: vpmovm2q %k0, %zmm1
	; AVX512DQ-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,5,8,7]			; AVX512DQ-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,5,8,7]
	; AVX512DQ-NEXT: vpermi2q %zmm1, %zmm0, %zmm2			; AVX512DQ-NEXT: vpermi2q %zmm1, %zmm0, %zmm2
	; AVX512DQ-NEXT: vpmovq2m %zmm2, %k0			; AVX512DQ-NEXT: vpmovq2m %zmm2, %k0
	; AVX512DQ-NEXT: kshiftlb $1, %k0, %k0			; AVX512DQ-NEXT: kshiftlb $1, %k0, %k0
	; AVX512DQ-NEXT: kshiftrb $1, %k0, %k0			; AVX512DQ-NEXT: kshiftrb $1, %k0, %k0
	; AVX512DQ-NEXT: kmovw %eax, %k1			; AVX512DQ-NEXT: kshiftlb $7, %k2, %k1
	; AVX512DQ-NEXT: kshiftlb $7, %k1, %k1
	; AVX512DQ-NEXT: korb %k1, %k0, %k0			; AVX512DQ-NEXT: korb %k1, %k0, %k0
	; AVX512DQ-NEXT: vpmovm2q %k0, %zmm0			; AVX512DQ-NEXT: vpmovm2q %k0, %zmm0
	; AVX512DQ-NEXT: vpmovqw %zmm0, %xmm0			; AVX512DQ-NEXT: vpmovqw %zmm0, %xmm0
	; AVX512DQ-NEXT: vzeroupper			; AVX512DQ-NEXT: vzeroupper
	; AVX512DQ-NEXT: retq			; AVX512DQ-NEXT: retq
	%b = bitcast i8 %a to <8 x i1>			%b = bitcast i8 %a to <8 x i1>
	%b1 = bitcast i16 %y to <16 x i1>			%b1 = bitcast i16 %y to <16 x i1>
	%el1 = extractelement <16 x i1>%b1, i32 8			%el1 = extractelement <16 x i1>%b1, i32 8

test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll

	; AVX512F-32-NEXT: vpmovm2b %k1, %zmm2			; AVX512F-32-NEXT: vpmovm2b %k1, %zmm2
	; AVX512F-32-NEXT: vpslldq {{.*#+}} xmm2 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm2[0,1,2]			; AVX512F-32-NEXT: vpslldq {{.*#+}} xmm2 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm2[0,1,2]
	; AVX512F-32-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2			; AVX512F-32-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2
	; AVX512F-32-NEXT: vpmovm2b %k0, %zmm0			; AVX512F-32-NEXT: vpmovm2b %k0, %zmm0
	; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm1 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255]			; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm1 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255]
	; AVX512F-32-NEXT: vpblendvb %ymm1, %ymm0, %ymm2, %ymm2			; AVX512F-32-NEXT: vpblendvb %ymm1, %ymm0, %ymm2, %ymm2
	; AVX512F-32-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm2[0,1,2,3],zmm0[4,5,6,7]			; AVX512F-32-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm2[0,1,2,3],zmm0[4,5,6,7]
	; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0			; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0
	; AVX512F-32-NEXT: movl %esi, %eax			; AVX512F-32-NEXT: movl %ecx, %eax
	; AVX512F-32-NEXT: shrl $30, %eax			; AVX512F-32-NEXT: shrl $30, %eax
	; AVX512F-32-NEXT: kmovd %eax, %k1			; AVX512F-32-NEXT: kmovd %eax, %k1
	; AVX512F-32-NEXT: vpmovm2b %k1, %zmm0			; AVX512F-32-NEXT: vpmovm2b %k1, %zmm0
	; AVX512F-32-NEXT: vpbroadcastw %xmm0, %xmm0			; AVX512F-32-NEXT: vpbroadcastw %xmm0, %xmm0
	; AVX512F-32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm1			; AVX512F-32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm1
	; AVX512F-32-NEXT: vpmovm2b %k0, %zmm0			; AVX512F-32-NEXT: vpmovm2b %k0, %zmm0
	; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm2 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255]			; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm2 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255]
	; AVX512F-32-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm1			; AVX512F-32-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm1
	; AVX512F-32-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[0,1,2,3],zmm0[4,5,6,7]			; AVX512F-32-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[0,1,2,3],zmm0[4,5,6,7]
	; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0			; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0
	; AVX512F-32-NEXT: movl %esi, %eax			; AVX512F-32-NEXT: movl %ecx, %eax
	; AVX512F-32-NEXT: shrl $31, %eax			; AVX512F-32-NEXT: shrl $31, %eax
	; AVX512F-32-NEXT: kmovd %eax, %k1			; AVX512F-32-NEXT: kmovd %eax, %k1
	; AVX512F-32-NEXT: vpmovm2b %k1, %zmm0			; AVX512F-32-NEXT: vpmovm2b %k1, %zmm0
	; AVX512F-32-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0]			; AVX512F-32-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0]
	; AVX512F-32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX512F-32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
	; AVX512F-32-NEXT: vpmovm2b %k0, %zmm1			; AVX512F-32-NEXT: vpmovm2b %k0, %zmm1
	; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm7 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0]			; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm7 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0]
	; AVX512F-32-NEXT: vpblendvb %ymm7, %ymm1, %ymm0, %ymm0			; AVX512F-32-NEXT: vpblendvb %ymm7, %ymm1, %ymm0, %ymm0
	; AVX512F-32-NEXT: vpmovm2b %k1, %zmm2			; AVX512F-32-NEXT: vpmovm2b %k1, %zmm2
	; AVX512F-32-NEXT: vpslldq {{.*#+}} xmm2 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm2[0,1,2]			; AVX512F-32-NEXT: vpslldq {{.*#+}} xmm2 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm2[0,1,2]
	; AVX512F-32-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2			; AVX512F-32-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm2
	; AVX512F-32-NEXT: vpmovm2b %k0, %zmm0			; AVX512F-32-NEXT: vpmovm2b %k0, %zmm0
	; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm1 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255]			; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm1 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255]
	; AVX512F-32-NEXT: vpblendvb %ymm1, %ymm0, %ymm2, %ymm2			; AVX512F-32-NEXT: vpblendvb %ymm1, %ymm0, %ymm2, %ymm2
	; AVX512F-32-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm2[0,1,2,3],zmm0[4,5,6,7]			; AVX512F-32-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm2[0,1,2,3],zmm0[4,5,6,7]
	; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0			; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0
	; AVX512F-32-NEXT: movl %esi, %eax			; AVX512F-32-NEXT: movl %ecx, %eax
	; AVX512F-32-NEXT: shrl $30, %eax			; AVX512F-32-NEXT: shrl $30, %eax
	; AVX512F-32-NEXT: kmovd %eax, %k1			; AVX512F-32-NEXT: kmovd %eax, %k1
	; AVX512F-32-NEXT: vpmovm2b %k1, %zmm0			; AVX512F-32-NEXT: vpmovm2b %k1, %zmm0
	; AVX512F-32-NEXT: vpbroadcastw %xmm0, %xmm0			; AVX512F-32-NEXT: vpbroadcastw %xmm0, %xmm0
	; AVX512F-32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm1			; AVX512F-32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm1
	; AVX512F-32-NEXT: vpmovm2b %k0, %zmm0			; AVX512F-32-NEXT: vpmovm2b %k0, %zmm0
	; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm2 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255]			; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm2 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255]
	; AVX512F-32-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm1			; AVX512F-32-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm1
	; AVX512F-32-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[0,1,2,3],zmm0[4,5,6,7]			; AVX512F-32-NEXT: vshufi64x2 {{.*#+}} zmm0 = zmm1[0,1,2,3],zmm0[4,5,6,7]
	; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0			; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0
	; AVX512F-32-NEXT: movl %esi, %eax			; AVX512F-32-NEXT: movl %ecx, %eax
	; AVX512F-32-NEXT: shrl $31, %eax			; AVX512F-32-NEXT: shrl $31, %eax
	; AVX512F-32-NEXT: kmovd %eax, %k1			; AVX512F-32-NEXT: kmovd %eax, %k1
	; AVX512F-32-NEXT: vpmovm2b %k1, %zmm0			; AVX512F-32-NEXT: vpmovm2b %k1, %zmm0
	; AVX512F-32-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0]			; AVX512F-32-NEXT: vpslldq {{.*#+}} xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0]
	; AVX512F-32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0			; AVX512F-32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
	; AVX512F-32-NEXT: vpmovm2b %k0, %zmm1			; AVX512F-32-NEXT: vpmovm2b %k0, %zmm1
	; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm7 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0]			; AVX512F-32-NEXT: vmovdqa {{.*#+}} ymm7 = [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0]
	; AVX512F-32-NEXT: vpblendvb %ymm7, %ymm1, %ymm0, %ymm0			; AVX512F-32-NEXT: vpblendvb %ymm7, %ymm1, %ymm0, %ymm0

test/CodeGen/X86/buildvec-insertvec.ll


	; Verify that the DAGCombiner doesn't wrongly fold a build_vector into a			; Verify that the DAGCombiner doesn't wrongly fold a build_vector into a
	; blend with a zero vector if the build_vector contains negative zero.			; blend with a zero vector if the build_vector contains negative zero.

	define <4 x float> @test_negative_zero_1(<4 x float> %A) {			define <4 x float> @test_negative_zero_1(<4 x float> %A) {
	; SSE2-LABEL: test_negative_zero_1:			; SSE2-LABEL: test_negative_zero_1:
	; SSE2: # BB#0: # %entry			; SSE2: # BB#0: # %entry
	; SSE2-NEXT: movaps %xmm0, %xmm1			; SSE2-NEXT: movaps %xmm0, %xmm1
	; SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]			; SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]
	; SSE2-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; SSE2-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
	; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]			; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; SSE2-NEXT: xorps %xmm2, %xmm2			; SSE2-NEXT: xorps %xmm2, %xmm2
	; SSE2-NEXT: movss {{.*#+}} xmm2 = xmm1[0],xmm2[1,2,3]			; SSE2-NEXT: movss {{.*#+}} xmm2 = xmm1[0],xmm2[1,2,3]
	; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]			; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: test_negative_zero_1:			; SSE41-LABEL: test_negative_zero_1:
	[500 lines not shown]

test/CodeGen/X86/combine-fcopysign.ll

	[225 lines not shown]
	; copysign(x, fp_extend(y)) -> copysign(x, y)			; copysign(x, fp_extend(y)) -> copysign(x, y)
	define <4 x double> @combine_vec_fcopysign_fpext_sgn(<4 x double> %x, <4 x float> %y) {			define <4 x double> @combine_vec_fcopysign_fpext_sgn(<4 x double> %x, <4 x float> %y) {
	; SSE-LABEL: combine_vec_fcopysign_fpext_sgn:			; SSE-LABEL: combine_vec_fcopysign_fpext_sgn:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movaps %xmm2, %xmm3			; SSE-NEXT: movaps %xmm2, %xmm3
	; SSE-NEXT: cvtss2sd %xmm2, %xmm4			; SSE-NEXT: cvtss2sd %xmm2, %xmm4
	; SSE-NEXT: movshdup {{.*#+}} xmm5 = xmm2[1,1,3,3]			; SSE-NEXT: movshdup {{.*#+}} xmm5 = xmm2[1,1,3,3]
	; SSE-NEXT: movaps %xmm2, %xmm6			; SSE-NEXT: movaps %xmm2, %xmm6
	; SSE-NEXT: movhlps {{.*#+}} xmm6 = xmm6[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm6 = xmm2[1],xmm6[1]
	; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1,2,3]			; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1],xmm2[2,3]
	; SSE-NEXT: movaps {{.*#+}} xmm7			; SSE-NEXT: movaps {{.*#+}} xmm7
	; SSE-NEXT: movaps %xmm0, %xmm2			; SSE-NEXT: movaps %xmm0, %xmm2
	; SSE-NEXT: andps %xmm7, %xmm2			; SSE-NEXT: andps %xmm7, %xmm2
	; SSE-NEXT: movaps {{.*#+}} xmm8 = [-0.000000e+00,-0.000000e+00]			; SSE-NEXT: movaps {{.*#+}} xmm8 = [-0.000000e+00,-0.000000e+00]
	; SSE-NEXT: andps %xmm8, %xmm4			; SSE-NEXT: andps %xmm8, %xmm4
	; SSE-NEXT: orps %xmm4, %xmm2			; SSE-NEXT: orps %xmm4, %xmm2
	; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE-NEXT: andps %xmm7, %xmm0			; SSE-NEXT: andps %xmm7, %xmm0
	; SSE-NEXT: xorps %xmm4, %xmm4			; SSE-NEXT: xorps %xmm4, %xmm4
	; SSE-NEXT: cvtss2sd %xmm5, %xmm4			; SSE-NEXT: cvtss2sd %xmm5, %xmm4
	; SSE-NEXT: andps %xmm8, %xmm4			; SSE-NEXT: andps %xmm8, %xmm4
	; SSE-NEXT: orps %xmm0, %xmm4			; SSE-NEXT: orps %xmm0, %xmm4
	; SSE-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm4[0]			; SSE-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm4[0]
	; SSE-NEXT: movaps %xmm1, %xmm0			; SSE-NEXT: movaps %xmm1, %xmm0
	; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm1[1],xmm0[1]
	; SSE-NEXT: andps %xmm7, %xmm0			; SSE-NEXT: andps %xmm7, %xmm0
	; SSE-NEXT: cvtss2sd %xmm3, %xmm3			; SSE-NEXT: cvtss2sd %xmm3, %xmm3
	; SSE-NEXT: andps %xmm8, %xmm3			; SSE-NEXT: andps %xmm8, %xmm3
	; SSE-NEXT: orps %xmm0, %xmm3			; SSE-NEXT: orps %xmm0, %xmm3
	; SSE-NEXT: andps %xmm7, %xmm1			; SSE-NEXT: andps %xmm7, %xmm1
	; SSE-NEXT: xorps %xmm0, %xmm0			; SSE-NEXT: xorps %xmm0, %xmm0
	; SSE-NEXT: cvtss2sd %xmm6, %xmm0			; SSE-NEXT: cvtss2sd %xmm6, %xmm0
	; SSE-NEXT: andps %xmm8, %xmm0			; SSE-NEXT: andps %xmm8, %xmm0
	[30 lines not shown]
	; SSE-NEXT: movshdup {{.*#+}} xmm6 = xmm3[1,1,3,3]			; SSE-NEXT: movshdup {{.*#+}} xmm6 = xmm3[1,1,3,3]
	; SSE-NEXT: andps %xmm5, %xmm6			; SSE-NEXT: andps %xmm5, %xmm6
	; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]
	; SSE-NEXT: cvtsd2ss %xmm1, %xmm1			; SSE-NEXT: cvtsd2ss %xmm1, %xmm1
	; SSE-NEXT: andps %xmm4, %xmm1			; SSE-NEXT: andps %xmm4, %xmm1
	; SSE-NEXT: orps %xmm6, %xmm1			; SSE-NEXT: orps %xmm6, %xmm1
	; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]			; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; SSE-NEXT: movaps %xmm3, %xmm1			; SSE-NEXT: movaps %xmm3, %xmm1
	; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm3[1],xmm1[1]
	; SSE-NEXT: andps %xmm5, %xmm1			; SSE-NEXT: andps %xmm5, %xmm1
	; SSE-NEXT: xorps %xmm6, %xmm6			; SSE-NEXT: xorps %xmm6, %xmm6
	; SSE-NEXT: cvtsd2ss %xmm2, %xmm6			; SSE-NEXT: cvtsd2ss %xmm2, %xmm6
	; SSE-NEXT: andps %xmm4, %xmm6			; SSE-NEXT: andps %xmm4, %xmm6
	; SSE-NEXT: orps %xmm1, %xmm6			; SSE-NEXT: orps %xmm1, %xmm6
	; SSE-NEXT: insertps {{.*#+}} xmm0 = xmm0[0,1],xmm6[0],xmm0[3]			; SSE-NEXT: insertps {{.*#+}} xmm0 = xmm0[0,1],xmm6[0],xmm0[3]
	; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1,2,3]			; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1,2,3]
	; SSE-NEXT: andps %xmm5, %xmm3			; SSE-NEXT: andps %xmm5, %xmm3
	[26 lines not shown]

test/CodeGen/X86/complex-fastmath.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.2 | FileCheck %s --check-prefix=ALL --check-prefix=SSE		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.2 | FileCheck %s --check-prefix=ALL --check-prefix=SSE
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2,+fma | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=FMA		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2,+fma | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=FMA
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vl | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=FMA		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vl | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=FMA

; PR31866		; PR31866
; complex float complex_square_f32(complex float x) {		; complex float complex_square_f32(complex float x) {
; return x*x;		; return x*x;
; }		; }

define <2 x float> @complex_square_f32(<2 x float>) #0 {		define <2 x float> @complex_square_f32(<2 x float>) #0 {
; SSE-LABEL: complex_square_f32:		; SSE-LABEL: complex_square_f32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]		; SSE-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: addss %xmm2, %xmm2		; SSE-NEXT: addss %xmm0, %xmm2
; SSE-NEXT: mulss %xmm1, %xmm2		; SSE-NEXT: mulss %xmm1, %xmm2
; SSE-NEXT: mulss %xmm0, %xmm0		; SSE-NEXT: mulss %xmm0, %xmm0
; SSE-NEXT: mulss %xmm1, %xmm1		; SSE-NEXT: mulss %xmm1, %xmm1
; SSE-NEXT: subss %xmm1, %xmm0		; SSE-NEXT: subss %xmm1, %xmm0
; SSE-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[2,3]		; SSE-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[2,3]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: complex_square_f32:		; AVX1-LABEL: complex_square_f32:
[27 lines not shown]	; FMA-NEXT: retq
%10 = insertelement <2 x float> %9, float %5, i32 1		%10 = insertelement <2 x float> %9, float %5, i32 1
ret <2 x float> %10		ret <2 x float> %10
}		}

define <2 x double> @complex_square_f64(<2 x double>) #0 {		define <2 x double> @complex_square_f64(<2 x double>) #0 {
; SSE-LABEL: complex_square_f64:		; SSE-LABEL: complex_square_f64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: addsd %xmm2, %xmm2		; SSE-NEXT: addsd %xmm0, %xmm2
; SSE-NEXT: mulsd %xmm1, %xmm2		; SSE-NEXT: mulsd %xmm1, %xmm2
; SSE-NEXT: mulsd %xmm0, %xmm0		; SSE-NEXT: mulsd %xmm0, %xmm0
; SSE-NEXT: mulsd %xmm1, %xmm1		; SSE-NEXT: mulsd %xmm1, %xmm1
; SSE-NEXT: subsd %xmm1, %xmm0		; SSE-NEXT: subsd %xmm1, %xmm0
; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]		; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX1-LABEL: complex_square_f64:		; AVX1-LABEL: complex_square_f64:
[84 lines not shown]	; FMA-NEXT: retq
%14 = insertelement <2 x float> %13, float %9, i32 1		%14 = insertelement <2 x float> %13, float %9, i32 1
ret <2 x float> %14		ret <2 x float> %14
}		}

define <2 x double> @complex_mul_f64(<2 x double>, <2 x double>) #0 {		define <2 x double> @complex_mul_f64(<2 x double>, <2 x double>) #0 {
; SSE-LABEL: complex_mul_f64:		; SSE-LABEL: complex_mul_f64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm2[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm0[1],xmm2[1]
; SSE-NEXT: movaps %xmm1, %xmm3		; SSE-NEXT: movaps %xmm1, %xmm3
; SSE-NEXT: movhlps {{.*#+}} xmm3 = xmm3[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm3 = xmm1[1],xmm3[1]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: movaps %xmm3, %xmm4
; SSE-NEXT: mulsd %xmm0, %xmm4		; SSE-NEXT: mulsd %xmm0, %xmm4
; SSE-NEXT: mulsd %xmm1, %xmm0		; SSE-NEXT: mulsd %xmm1, %xmm0
; SSE-NEXT: mulsd %xmm2, %xmm1		; SSE-NEXT: mulsd %xmm2, %xmm1
; SSE-NEXT: addsd %xmm4, %xmm1		; SSE-NEXT: addsd %xmm4, %xmm1
; SSE-NEXT: mulsd %xmm2, %xmm3		; SSE-NEXT: mulsd %xmm2, %xmm3
; SSE-NEXT: subsd %xmm3, %xmm0		; SSE-NEXT: subsd %xmm3, %xmm0
; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
[41 lines not shown]

test/CodeGen/X86/divide-by-constant.ll

	[312 lines not shown]
	; X32-NEXT: calll __udivdi3			; X32-NEXT: calll __udivdi3
	; X32-NEXT: addl $28, %esp			; X32-NEXT: addl $28, %esp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: PR23590:			; X64-LABEL: PR23590:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movq %rdi, %rcx			; X64-NEXT: movq %rdi, %rcx
	; X64-NEXT: movabsq $6120523590596543007, %rdx # imm = 0x54F077C718E7C21F			; X64-NEXT: movabsq $6120523590596543007, %rdx # imm = 0x54F077C718E7C21F
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: mulq %rdx			; X64-NEXT: mulq %rdx
	; X64-NEXT: shrq $12, %rdx			; X64-NEXT: shrq $12, %rdx
	; X64-NEXT: imulq $12345, %rdx, %rax # imm = 0x3039			; X64-NEXT: imulq $12345, %rdx, %rax # imm = 0x3039
	; X64-NEXT: subq %rax, %rcx			; X64-NEXT: subq %rax, %rcx
	; X64-NEXT: movabsq $2635249153387078803, %rdx # imm = 0x2492492492492493			; X64-NEXT: movabsq $2635249153387078803, %rdx # imm = 0x2492492492492493
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: mulq %rdx			; X64-NEXT: mulq %rdx
	; X64-NEXT: subq %rdx, %rcx			; X64-NEXT: subq %rdx, %rcx
	[9 lines not shown]

test/CodeGen/X86/fmaxnum.ll

[12 lines not shown]
declare <2 x double> @llvm.maxnum.v2f64(<2 x double>, <2 x double>)		declare <2 x double> @llvm.maxnum.v2f64(<2 x double>, <2 x double>)
declare <4 x double> @llvm.maxnum.v4f64(<4 x double>, <4 x double>)		declare <4 x double> @llvm.maxnum.v4f64(<4 x double>, <4 x double>)
declare <8 x double> @llvm.maxnum.v8f64(<8 x double>, <8 x double>)		declare <8 x double> @llvm.maxnum.v8f64(<8 x double>, <8 x double>)

; FIXME: As the vector tests show, the SSE run shouldn't need this many moves.		; FIXME: As the vector tests show, the SSE run shouldn't need this many moves.

; CHECK-LABEL: @test_fmaxf		; CHECK-LABEL: @test_fmaxf
; SSE: movaps %xmm0, %xmm2		; SSE: movaps %xmm0, %xmm2
; SSE-NEXT: cmpunordss %xmm2, %xmm2		; SSE-NEXT: cmpunordss %xmm0, %xmm2
; SSE-NEXT: movaps %xmm2, %xmm3		; SSE-NEXT: movaps %xmm2, %xmm3
; SSE-NEXT: andps %xmm1, %xmm3		; SSE-NEXT: andps %xmm1, %xmm3
; SSE-NEXT: maxss %xmm0, %xmm1		; SSE-NEXT: maxss %xmm0, %xmm1
; SSE-NEXT: andnps %xmm1, %xmm2		; SSE-NEXT: andnps %xmm1, %xmm2
; SSE-NEXT: orps %xmm3, %xmm2		; SSE-NEXT: orps %xmm3, %xmm2
; SSE-NEXT: movaps %xmm2, %xmm0		; SSE-NEXT: movaps %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
[12 lines not shown]	define float @test_fmaxf_minsize(float %x, float %y) minsize {
%z = call float @fmaxf(float %x, float %y) readnone		%z = call float @fmaxf(float %x, float %y) readnone
ret float %z		ret float %z
}		}

; FIXME: As the vector tests show, the SSE run shouldn't need this many moves.		; FIXME: As the vector tests show, the SSE run shouldn't need this many moves.

; CHECK-LABEL: @test_fmax		; CHECK-LABEL: @test_fmax
; SSE: movapd %xmm0, %xmm2		; SSE: movapd %xmm0, %xmm2
; SSE-NEXT: cmpunordsd %xmm2, %xmm2		; SSE-NEXT: cmpunordsd %xmm0, %xmm2
; SSE-NEXT: movapd %xmm2, %xmm3		; SSE-NEXT: movapd %xmm2, %xmm3
; SSE-NEXT: andpd %xmm1, %xmm3		; SSE-NEXT: andpd %xmm1, %xmm3
; SSE-NEXT: maxsd %xmm0, %xmm1		; SSE-NEXT: maxsd %xmm0, %xmm1
; SSE-NEXT: andnpd %xmm1, %xmm2		; SSE-NEXT: andnpd %xmm1, %xmm2
; SSE-NEXT: orpd %xmm3, %xmm2		; SSE-NEXT: orpd %xmm3, %xmm2
; SSE-NEXT: movapd %xmm2, %xmm0		; SSE-NEXT: movapd %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
[10 lines not shown]
; CHECK: callq fmaxl		; CHECK: callq fmaxl
define x86_fp80 @test_fmaxl(x86_fp80 %x, x86_fp80 %y) {		define x86_fp80 @test_fmaxl(x86_fp80 %x, x86_fp80 %y) {
%z = call x86_fp80 @fmaxl(x86_fp80 %x, x86_fp80 %y) readnone		%z = call x86_fp80 @fmaxl(x86_fp80 %x, x86_fp80 %y) readnone
ret x86_fp80 %z		ret x86_fp80 %z
}		}

; CHECK-LABEL: @test_intrinsic_fmaxf		; CHECK-LABEL: @test_intrinsic_fmaxf
; SSE: movaps %xmm0, %xmm2		; SSE: movaps %xmm0, %xmm2
; SSE-NEXT: cmpunordss %xmm2, %xmm2		; SSE-NEXT: cmpunordss %xmm0, %xmm2
; SSE-NEXT: movaps %xmm2, %xmm3		; SSE-NEXT: movaps %xmm2, %xmm3
; SSE-NEXT: andps %xmm1, %xmm3		; SSE-NEXT: andps %xmm1, %xmm3
; SSE-NEXT: maxss %xmm0, %xmm1		; SSE-NEXT: maxss %xmm0, %xmm1
; SSE-NEXT: andnps %xmm1, %xmm2		; SSE-NEXT: andnps %xmm1, %xmm2
; SSE-NEXT: orps %xmm3, %xmm2		; SSE-NEXT: orps %xmm3, %xmm2
; SSE-NEXT: movaps %xmm2, %xmm0		; SSE-NEXT: movaps %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX: vmaxss %xmm0, %xmm1, %xmm2		; AVX: vmaxss %xmm0, %xmm1, %xmm2
; AVX-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0		; AVX-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0		; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
define float @test_intrinsic_fmaxf(float %x, float %y) {		define float @test_intrinsic_fmaxf(float %x, float %y) {
%z = call float @llvm.maxnum.f32(float %x, float %y) readnone		%z = call float @llvm.maxnum.f32(float %x, float %y) readnone
ret float %z		ret float %z
}		}


; CHECK-LABEL: @test_intrinsic_fmax		; CHECK-LABEL: @test_intrinsic_fmax
; SSE: movapd %xmm0, %xmm2		; SSE: movapd %xmm0, %xmm2
; SSE-NEXT: cmpunordsd %xmm2, %xmm2		; SSE-NEXT: cmpunordsd %xmm0, %xmm2
; SSE-NEXT: movapd %xmm2, %xmm3		; SSE-NEXT: movapd %xmm2, %xmm3
; SSE-NEXT: andpd %xmm1, %xmm3		; SSE-NEXT: andpd %xmm1, %xmm3
; SSE-NEXT: maxsd %xmm0, %xmm1		; SSE-NEXT: maxsd %xmm0, %xmm1
; SSE-NEXT: andnpd %xmm1, %xmm2		; SSE-NEXT: andnpd %xmm1, %xmm2
; SSE-NEXT: orpd %xmm3, %xmm2		; SSE-NEXT: orpd %xmm3, %xmm2
; SSE-NEXT: movapd %xmm2, %xmm0		; SSE-NEXT: movapd %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
[133 lines not shown]

test/CodeGen/X86/fminnum.ll

	[12 lines not shown]
	declare <2 x double> @llvm.minnum.v2f64(<2 x double>, <2 x double>)			declare <2 x double> @llvm.minnum.v2f64(<2 x double>, <2 x double>)
	declare <4 x double> @llvm.minnum.v4f64(<4 x double>, <4 x double>)			declare <4 x double> @llvm.minnum.v4f64(<4 x double>, <4 x double>)
	declare <8 x double> @llvm.minnum.v8f64(<8 x double>, <8 x double>)			declare <8 x double> @llvm.minnum.v8f64(<8 x double>, <8 x double>)

	; FIXME: As the vector tests show, the SSE run shouldn't need this many moves.			; FIXME: As the vector tests show, the SSE run shouldn't need this many moves.

	; CHECK-LABEL: @test_fminf			; CHECK-LABEL: @test_fminf
	; SSE: movaps %xmm0, %xmm2			; SSE: movaps %xmm0, %xmm2
	; SSE-NEXT: cmpunordss %xmm2, %xmm2			; SSE-NEXT: cmpunordss %xmm0, %xmm2
	; SSE-NEXT: movaps %xmm2, %xmm3			; SSE-NEXT: movaps %xmm2, %xmm3
	; SSE-NEXT: andps %xmm1, %xmm3			; SSE-NEXT: andps %xmm1, %xmm3
	; SSE-NEXT: minss %xmm0, %xmm1			; SSE-NEXT: minss %xmm0, %xmm1
	; SSE-NEXT: andnps %xmm1, %xmm2			; SSE-NEXT: andnps %xmm1, %xmm2
	; SSE-NEXT: orps %xmm3, %xmm2			; SSE-NEXT: orps %xmm3, %xmm2
	; SSE-NEXT: movaps %xmm2, %xmm0			; SSE-NEXT: movaps %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX: vminss %xmm0, %xmm1, %xmm2			; AVX: vminss %xmm0, %xmm1, %xmm2
	; AVX-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0			; AVX-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0			; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	define float @test_fminf(float %x, float %y) {			define float @test_fminf(float %x, float %y) {
	%z = call float @fminf(float %x, float %y) readnone			%z = call float @fminf(float %x, float %y) readnone
	ret float %z			ret float %z
	}			}

	; FIXME: As the vector tests show, the SSE run shouldn't need this many moves.			; FIXME: As the vector tests show, the SSE run shouldn't need this many moves.

	; CHECK-LABEL: @test_fmin			; CHECK-LABEL: @test_fmin
	; SSE: movapd %xmm0, %xmm2			; SSE: movapd %xmm0, %xmm2
	; SSE-NEXT: cmpunordsd %xmm2, %xmm2			; SSE-NEXT: cmpunordsd %xmm0, %xmm2
	; SSE-NEXT: movapd %xmm2, %xmm3			; SSE-NEXT: movapd %xmm2, %xmm3
	; SSE-NEXT: andpd %xmm1, %xmm3			; SSE-NEXT: andpd %xmm1, %xmm3
	; SSE-NEXT: minsd %xmm0, %xmm1			; SSE-NEXT: minsd %xmm0, %xmm1
	; SSE-NEXT: andnpd %xmm1, %xmm2			; SSE-NEXT: andnpd %xmm1, %xmm2
	; SSE-NEXT: orpd %xmm3, %xmm2			; SSE-NEXT: orpd %xmm3, %xmm2
	; SSE-NEXT: movapd %xmm2, %xmm0			; SSE-NEXT: movapd %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	[10 lines not shown]
	; CHECK: callq fminl			; CHECK: callq fminl
	define x86_fp80 @test_fminl(x86_fp80 %x, x86_fp80 %y) {			define x86_fp80 @test_fminl(x86_fp80 %x, x86_fp80 %y) {
	%z = call x86_fp80 @fminl(x86_fp80 %x, x86_fp80 %y) readnone			%z = call x86_fp80 @fminl(x86_fp80 %x, x86_fp80 %y) readnone
	ret x86_fp80 %z			ret x86_fp80 %z
	}			}

	; CHECK-LABEL: @test_intrinsic_fminf			; CHECK-LABEL: @test_intrinsic_fminf
	; SSE: movaps %xmm0, %xmm2			; SSE: movaps %xmm0, %xmm2
	; SSE-NEXT: cmpunordss %xmm2, %xmm2			; SSE-NEXT: cmpunordss %xmm0, %xmm2
	; SSE-NEXT: movaps %xmm2, %xmm3			; SSE-NEXT: movaps %xmm2, %xmm3
	; SSE-NEXT: andps %xmm1, %xmm3			; SSE-NEXT: andps %xmm1, %xmm3
	; SSE-NEXT: minss %xmm0, %xmm1			; SSE-NEXT: minss %xmm0, %xmm1
	; SSE-NEXT: andnps %xmm1, %xmm2			; SSE-NEXT: andnps %xmm1, %xmm2
	; SSE-NEXT: orps %xmm3, %xmm2			; SSE-NEXT: orps %xmm3, %xmm2
	; SSE-NEXT: movaps %xmm2, %xmm0			; SSE-NEXT: movaps %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX: vminss %xmm0, %xmm1, %xmm2			; AVX: vminss %xmm0, %xmm1, %xmm2
	; AVX-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0			; AVX-NEXT: vcmpunordss %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0			; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	define float @test_intrinsic_fminf(float %x, float %y) {			define float @test_intrinsic_fminf(float %x, float %y) {
	%z = call float @llvm.minnum.f32(float %x, float %y) readnone			%z = call float @llvm.minnum.f32(float %x, float %y) readnone
	ret float %z			ret float %z
	}			}

	; CHECK-LABEL: @test_intrinsic_fmin			; CHECK-LABEL: @test_intrinsic_fmin
	; SSE: movapd %xmm0, %xmm2			; SSE: movapd %xmm0, %xmm2
	; SSE-NEXT: cmpunordsd %xmm2, %xmm2			; SSE-NEXT: cmpunordsd %xmm0, %xmm2
	; SSE-NEXT: movapd %xmm2, %xmm3			; SSE-NEXT: movapd %xmm2, %xmm3
	; SSE-NEXT: andpd %xmm1, %xmm3			; SSE-NEXT: andpd %xmm1, %xmm3
	; SSE-NEXT: minsd %xmm0, %xmm1			; SSE-NEXT: minsd %xmm0, %xmm1
	; SSE-NEXT: andnpd %xmm1, %xmm2			; SSE-NEXT: andnpd %xmm1, %xmm2
	; SSE-NEXT: orpd %xmm3, %xmm2			; SSE-NEXT: orpd %xmm3, %xmm2
	; SSE-NEXT: movapd %xmm2, %xmm0			; SSE-NEXT: movapd %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	[132 lines not shown]

test/CodeGen/X86/fp128-i128.ll

[221 lines not shown]
; df = u.e;		; df = u.e;
; return x + df;		; return x + df;
; }		; }
define fp128 @TestI128_4(fp128 %x) #0 {		define fp128 @TestI128_4(fp128 %x) #0 {
; CHECK-LABEL: TestI128_4:		; CHECK-LABEL: TestI128_4:
; CHECK: # BB#0: # %entry		; CHECK: # BB#0: # %entry
; CHECK-NEXT: subq $40, %rsp		; CHECK-NEXT: subq $40, %rsp
; CHECK-NEXT: movaps %xmm0, %xmm1		; CHECK-NEXT: movaps %xmm0, %xmm1
; CHECK-NEXT: movaps %xmm1, {{[0-9]+}}(%rsp)		; CHECK-NEXT: movaps %xmm0, {{[0-9]+}}(%rsp)
; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rax		; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rax
; CHECK-NEXT: movq %rax, {{[0-9]+}}(%rsp)		; CHECK-NEXT: movq %rax, {{[0-9]+}}(%rsp)
; CHECK-NEXT: movq $0, (%rsp)		; CHECK-NEXT: movq $0, (%rsp)
; CHECK-NEXT: movaps (%rsp), %xmm0		; CHECK-NEXT: movaps (%rsp), %xmm0
; CHECK-NEXT: callq __addtf3		; CHECK-NEXT: callq __addtf3
; CHECK-NEXT: addq $40, %rsp		; CHECK-NEXT: addq $40, %rsp
; CHECK-NEXT: retq		; CHECK-NEXT: retq
entry:		entry:
[31 lines not shown]	entry:
ret void		ret void
}		}

define fp128 @acosl(fp128 %x) #0 {		define fp128 @acosl(fp128 %x) #0 {
; CHECK-LABEL: acosl:		; CHECK-LABEL: acosl:
; CHECK: # BB#0: # %entry		; CHECK: # BB#0: # %entry
; CHECK-NEXT: subq $40, %rsp		; CHECK-NEXT: subq $40, %rsp
; CHECK-NEXT: movaps %xmm0, %xmm1		; CHECK-NEXT: movaps %xmm0, %xmm1
; CHECK-NEXT: movaps %xmm1, {{[0-9]+}}(%rsp)		; CHECK-NEXT: movaps %xmm0, {{[0-9]+}}(%rsp)
; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rax		; CHECK-NEXT: movq {{[0-9]+}}(%rsp), %rax
; CHECK-NEXT: movq %rax, {{[0-9]+}}(%rsp)		; CHECK-NEXT: movq %rax, {{[0-9]+}}(%rsp)
; CHECK-NEXT: movq $0, (%rsp)		; CHECK-NEXT: movq $0, (%rsp)
; CHECK-NEXT: movaps (%rsp), %xmm0		; CHECK-NEXT: movaps (%rsp), %xmm0
; CHECK-NEXT: callq __addtf3		; CHECK-NEXT: callq __addtf3
; CHECK-NEXT: addq $40, %rsp		; CHECK-NEXT: addq $40, %rsp
; CHECK-NEXT: retq		; CHECK-NEXT: retq
entry:		entry:
[109 lines not shown]

test/CodeGen/X86/haddsub-2.ll

[902 lines not shown]	; AVX-NEXT: retq
%vecinit13 = insertelement <4 x i32> %vecinit9, i32 %sub12, i32 3		%vecinit13 = insertelement <4 x i32> %vecinit9, i32 %sub12, i32 3
ret <4 x i32> %vecinit13		ret <4 x i32> %vecinit13
}		}

define <4 x float> @not_a_hsub_2(<4 x float> %A, <4 x float> %B) {		define <4 x float> @not_a_hsub_2(<4 x float> %A, <4 x float> %B) {
; SSE-LABEL: not_a_hsub_2:		; SSE-LABEL: not_a_hsub_2:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm2[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm0[1],xmm2[1]
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1],xmm0[2,3]
; SSE-NEXT: subss %xmm3, %xmm2		; SSE-NEXT: subss %xmm3, %xmm2
; SSE-NEXT: movshdup {{.*#+}} xmm3 = xmm0[1,1,3,3]		; SSE-NEXT: movshdup {{.*#+}} xmm3 = xmm0[1,1,3,3]
; SSE-NEXT: subss %xmm3, %xmm0		; SSE-NEXT: subss %xmm3, %xmm0
; SSE-NEXT: movaps %xmm1, %xmm3		; SSE-NEXT: movaps %xmm1, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1],xmm1[2,3]
; SSE-NEXT: movaps %xmm1, %xmm4		; SSE-NEXT: movaps %xmm1, %xmm4
; SSE-NEXT: movhlps {{.*#+}} xmm4 = xmm4[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm4 = xmm1[1],xmm4[1]
; SSE-NEXT: subss %xmm4, %xmm3		; SSE-NEXT: subss %xmm4, %xmm3
; SSE-NEXT: movshdup {{.*#+}} xmm4 = xmm1[1,1,3,3]		; SSE-NEXT: movshdup {{.*#+}} xmm4 = xmm1[1,1,3,3]
; SSE-NEXT: subss %xmm4, %xmm1		; SSE-NEXT: subss %xmm4, %xmm1
; SSE-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]		; SSE-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]
; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]		; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
[31 lines not shown]	; AVX-NEXT: retq
%vecinit13 = insertelement <4 x float> %vecinit9, float %sub12, i32 2		%vecinit13 = insertelement <4 x float> %vecinit9, float %sub12, i32 2
ret <4 x float> %vecinit13		ret <4 x float> %vecinit13
}		}

define <2 x double> @not_a_hsub_3(<2 x double> %A, <2 x double> %B) {		define <2 x double> @not_a_hsub_3(<2 x double> %A, <2 x double> %B) {
; SSE-LABEL: not_a_hsub_3:		; SSE-LABEL: not_a_hsub_3:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movaps %xmm1, %xmm2		; SSE-NEXT: movaps %xmm1, %xmm2
; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm2[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm1[1],xmm2[1]
; SSE-NEXT: subsd %xmm2, %xmm1		; SSE-NEXT: subsd %xmm2, %xmm1
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm2[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm0[1],xmm2[1]
; SSE-NEXT: subsd %xmm0, %xmm2		; SSE-NEXT: subsd %xmm0, %xmm2
; SSE-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm1[0]		; SSE-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm1[0]
; SSE-NEXT: movapd %xmm2, %xmm0		; SSE-NEXT: movapd %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: not_a_hsub_3:		; AVX-LABEL: not_a_hsub_3:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpermilpd {{.*#+}} xmm2 = xmm1[1,0]		; AVX-NEXT: vpermilpd {{.*#+}} xmm2 = xmm1[1,0]
[490 lines not shown]

test/CodeGen/X86/haddsub-undef.ll

[97 lines not shown]	; AVX-NEXT: retq
%vecinit = insertelement <4 x float> undef, float %add, i32 0		%vecinit = insertelement <4 x float> undef, float %add, i32 0
ret <4 x float> %vecinit		ret <4 x float> %vecinit
}		}

define <2 x double> @test5_undef(<2 x double> %a, <2 x double> %b) {		define <2 x double> @test5_undef(<2 x double> %a, <2 x double> %b) {
; SSE-LABEL: test5_undef:		; SSE-LABEL: test5_undef:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]
; SSE-NEXT: addsd %xmm0, %xmm1		; SSE-NEXT: addsd %xmm0, %xmm1
; SSE-NEXT: movapd %xmm1, %xmm0		; SSE-NEXT: movapd %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: test5_undef:		; AVX-LABEL: test5_undef:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpermilpd {{.*#+}} xmm1 = xmm0[1,0]		; AVX-NEXT: vpermilpd {{.*#+}} xmm1 = xmm0[1,0]
; AVX-NEXT: vaddsd %xmm1, %xmm0, %xmm0		; AVX-NEXT: vaddsd %xmm1, %xmm0, %xmm0
[48 lines not shown]
}		}

define <4 x float> @test8_undef(<4 x float> %a, <4 x float> %b) {		define <4 x float> @test8_undef(<4 x float> %a, <4 x float> %b) {
; SSE-LABEL: test8_undef:		; SSE-LABEL: test8_undef:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]		; SSE-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
; SSE-NEXT: addss %xmm0, %xmm1		; SSE-NEXT: addss %xmm0, %xmm1
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm2[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm0[1],xmm2[1]
; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]
; SSE-NEXT: addss %xmm2, %xmm0		; SSE-NEXT: addss %xmm2, %xmm0
; SSE-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm0[0]		; SSE-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm0[0]
; SSE-NEXT: movapd %xmm1, %xmm0		; SSE-NEXT: movapd %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: test8_undef:		; AVX-LABEL: test8_undef:
; AVX: # BB#0:		; AVX: # BB#0:
[272 lines not shown]

test/CodeGen/X86/inline-asm-fpstack.ll

[124 lines not shown]	return:
ret void		ret void
}		}

; PR4459		; PR4459
; The return value from ceil must be duped before being consumed by asm.		; The return value from ceil must be duped before being consumed by asm.
; CHECK: testPR4459		; CHECK: testPR4459
; CHECK: ceil		; CHECK: ceil
; CHECK: fld %st(0)		; CHECK: fld %st(0)
; CHECK-NOT: fxch		; FIXME: This fxch is redundant.
		; CHECK: fxch %st(1)
; CHECK: fistpl		; CHECK: fistpl
; CHECK-NOT: fxch		; CHECK-NOT: fxch
; CHECK: fstpt		; CHECK: fstpt
; CHECK: test		; CHECK: test
define void @testPR4459(x86_fp80 %a) {		define void @testPR4459(x86_fp80 %a) {
entry:		entry:
%0 = call x86_fp80 @ceil(x86_fp80 %a)		%0 = call x86_fp80 @ceil(x86_fp80 %a)
call void asm sideeffect "fistpl $0", "{st},~{st}"( x86_fp80 %0)		call void asm sideeffect "fistpl $0", "{st},~{st}"( x86_fp80 %0)
[263 lines not shown]

test/CodeGen/X86/ipra-local-linkage.ll

[18 lines not shown]	define void @bar(i32 %X) {
call void asm sideeffect "movl %r12d, $0", "{r15}~{r12}"(i32 %X)		call void asm sideeffect "movl %r12d, $0", "{r15}~{r12}"(i32 %X)
; As R15 is clobbered by foo() when IPRA is enabled value of R15 should be		; As R15 is clobbered by foo() when IPRA is enabled value of R15 should be
; saved if register containing orignal value is also getting clobbered		; saved if register containing orignal value is also getting clobbered
; and reloaded after foo(), here original value is loaded back into R15D after		; and reloaded after foo(), here original value is loaded back into R15D after
; call to foo.		; call to foo.
call void @foo()		call void @foo()
; CHECK-LABEL: bar:		; CHECK-LABEL: bar:
; CHECK: callq foo		; CHECK: callq foo
; CHECK-NEXT: movl %eax, %r15d		; CHECK-NEXT: movl %edi, %r15d
call void asm sideeffect "movl $0, %r12d", "{r15}~{r12}"(i32 %X)		call void asm sideeffect "movl $0, %r12d", "{r15}~{r12}"(i32 %X)
ret void		ret void
}		}

test/CodeGen/X86/localescape.ll

[21 lines not shown]	define void @print_framealloc_from_fp(i8* %fp) {
%b2 = getelementptr i32, i32* %b, i32 1		%b2 = getelementptr i32, i32* %b, i32 1
%b2.val = load i32, i32* %b2		%b2.val = load i32, i32* %b2
call i32 (i8*, ...) @printf(i8* getelementptr ([10 x i8], [10 x i8]* @str, i32 0, i32 0), i32 %b2.val)		call i32 (i8*, ...) @printf(i8* getelementptr ([10 x i8], [10 x i8]* @str, i32 0, i32 0), i32 %b2.val)
ret void		ret void
}		}

; X64-LABEL: print_framealloc_from_fp:		; X64-LABEL: print_framealloc_from_fp:
; X64: movq %rcx, %[[parent_fp:[a-z]+]]		; X64: movq %rcx, %[[parent_fp:[a-z]+]]
; X64: movl .Lalloc_func$frame_escape_0(%[[parent_fp]]), %edx		; X64: movl .Lalloc_func$frame_escape_0(%rcx), %edx
; X64: leaq {{.*}}(%rip), %[[str:[a-z]+]]		; X64: leaq {{.*}}(%rip), %[[str:[a-z]+]]
; X64: movq %[[str]], %rcx		; X64: movq %[[str]], %rcx
; X64: callq printf		; X64: callq printf
; X64: movl .Lalloc_func$frame_escape_1(%[[parent_fp]]), %edx		; X64: movl .Lalloc_func$frame_escape_1(%[[parent_fp]]), %edx
; X64: movq %[[str]], %rcx		; X64: movq %[[str]], %rcx
; X64: callq printf		; X64: callq printf
; X64: movl $42, .Lalloc_func$frame_escape_1(%[[parent_fp]])		; X64: movl $42, .Lalloc_func$frame_escape_1(%[[parent_fp]])
; X64: retq		; X64: retq
[103 lines not shown]

test/CodeGen/X86/mul-i1024.ll

	[153 lines not shown]
	; X32-NEXT: addl $32, %esp			; X32-NEXT: addl $32, %esp
	; X32-NEXT: leal {{[0-9]+}}(%esp), %eax			; X32-NEXT: leal {{[0-9]+}}(%esp), %eax
	; X32-NEXT: pushl %ebx			; X32-NEXT: pushl %ebx
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl %esi			; X32-NEXT: pushl %esi
	; X32-NEXT: movl %esi, %ebx			; X32-NEXT: movl %esi, %ebx
	; X32-NEXT: movl %ebx, {{[0-9]+}}(%esp) # 4-byte Spill			; X32-NEXT: movl %esi, {{[0-9]+}}(%esp) # 4-byte Spill
	; X32-NEXT: pushl %edi			; X32-NEXT: pushl %edi
	; X32-NEXT: movl %edi, {{[0-9]+}}(%esp) # 4-byte Spill			; X32-NEXT: movl %edi, {{[0-9]+}}(%esp) # 4-byte Spill
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: calll __multi3			; X32-NEXT: calll __multi3
	; X32-NEXT: addl $32, %esp			; X32-NEXT: addl $32, %esp
	; X32-NEXT: leal {{[0-9]+}}(%esp), %eax			; X32-NEXT: leal {{[0-9]+}}(%esp), %eax
	... 576 lines not shown ...
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: calll __multi3			; X32-NEXT: calll __multi3
	; X32-NEXT: addl $32, %esp			; X32-NEXT: addl $32, %esp
	; X32-NEXT: leal {{[0-9]+}}(%esp), %eax			; X32-NEXT: leal {{[0-9]+}}(%esp), %eax
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl %edi			; X32-NEXT: pushl %edi
	; X32-NEXT: movl %ebx, %esi			; X32-NEXT: movl %ebx, %esi
	; X32-NEXT: pushl %esi			; X32-NEXT: pushl %ebx
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: calll __multi3			; X32-NEXT: calll __multi3
	; X32-NEXT: addl $32, %esp			; X32-NEXT: addl $32, %esp
	; X32-NEXT: leal {{[0-9]+}}(%esp), %eax			; X32-NEXT: leal {{[0-9]+}}(%esp), %eax
	... 129 lines not shown ...
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: calll __multi3			; X32-NEXT: calll __multi3
	; X32-NEXT: addl $32, %esp			; X32-NEXT: addl $32, %esp
	; X32-NEXT: leal {{[0-9]+}}(%esp), %eax			; X32-NEXT: leal {{[0-9]+}}(%esp), %eax
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl %edi			; X32-NEXT: pushl %edi
	; X32-NEXT: movl %edi, %ebx
	; X32-NEXT: pushl %esi			; X32-NEXT: pushl %esi
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: calll __multi3			; X32-NEXT: calll __multi3
	; X32-NEXT: addl $32, %esp			; X32-NEXT: addl $32, %esp
	; X32-NEXT: leal {{[0-9]+}}(%esp), %eax			; X32-NEXT: leal {{[0-9]+}}(%esp), %eax
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl %ebx			; X32-NEXT: pushl %edi
	; X32-NEXT: pushl %esi			; X32-NEXT: pushl %esi
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: movl {{[0-9]+}}(%esp), %esi # 4-byte Reload			; X32-NEXT: movl {{[0-9]+}}(%esp), %esi # 4-byte Reload
	; X32-NEXT: pushl %esi			; X32-NEXT: pushl %esi
	; X32-NEXT: movl {{[0-9]+}}(%esp), %ebx # 4-byte Reload			; X32-NEXT: movl {{[0-9]+}}(%esp), %ebx # 4-byte Reload
	; X32-NEXT: pushl %ebx			; X32-NEXT: pushl %ebx
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	... 438 lines not shown ...
	; X32-NEXT: pushl %ebx			; X32-NEXT: pushl %ebx
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: calll __multi3			; X32-NEXT: calll __multi3
	; X32-NEXT: addl $32, %esp			; X32-NEXT: addl $32, %esp
	; X32-NEXT: leal {{[0-9]+}}(%esp), %eax			; X32-NEXT: leal {{[0-9]+}}(%esp), %eax
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movl %edi, %ebx
	; X32-NEXT: pushl %ebx			; X32-NEXT: pushl %edi
	; X32-NEXT: movl {{[0-9]+}}(%esp), %esi # 4-byte Reload			; X32-NEXT: movl {{[0-9]+}}(%esp), %esi # 4-byte Reload
	; X32-NEXT: pushl %esi			; X32-NEXT: pushl %esi
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl $0			; X32-NEXT: pushl $0
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload			; X32-NEXT: pushl {{[0-9]+}}(%esp) # 4-byte Folded Reload
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: calll __multi3			; X32-NEXT: calll __multi3
	... 1,060 lines not shown ...
	; X32-NEXT: movl %ecx, {{[0-9]+}}(%esp) # 4-byte Spill			; X32-NEXT: movl %ecx, {{[0-9]+}}(%esp) # 4-byte Spill
	; X32-NEXT: adcl %edx, %ebx			; X32-NEXT: adcl %edx, %ebx
	; X32-NEXT: movl %ebx, {{[0-9]+}}(%esp) # 4-byte Spill			; X32-NEXT: movl %ebx, {{[0-9]+}}(%esp) # 4-byte Spill
	; X32-NEXT: movl {{[0-9]+}}(%esp), %edx # 4-byte Reload			; X32-NEXT: movl {{[0-9]+}}(%esp), %edx # 4-byte Reload
	; X32-NEXT: adcl {{[0-9]+}}(%esp), %edx # 4-byte Folded Reload			; X32-NEXT: adcl {{[0-9]+}}(%esp), %edx # 4-byte Folded Reload
	; X32-NEXT: movl %edx, {{[0-9]+}}(%esp) # 4-byte Spill			; X32-NEXT: movl %edx, {{[0-9]+}}(%esp) # 4-byte Spill
	; X32-NEXT: adcl %edi, %eax			; X32-NEXT: adcl %edi, %eax
	; X32-NEXT: movl %eax, %esi			; X32-NEXT: movl %eax, %esi
	; X32-NEXT: movl %esi, {{[0-9]+}}(%esp) # 4-byte Spill			; X32-NEXT: movl %eax, {{[0-9]+}}(%esp) # 4-byte Spill
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload
	; X32-NEXT: addl %eax, {{[0-9]+}}(%esp) # 4-byte Folded Spill			; X32-NEXT: addl %eax, {{[0-9]+}}(%esp) # 4-byte Folded Spill
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload
	; X32-NEXT: adcl %eax, {{[0-9]+}}(%esp) # 4-byte Folded Spill			; X32-NEXT: adcl %eax, {{[0-9]+}}(%esp) # 4-byte Folded Spill
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload
	; X32-NEXT: adcl %eax, {{[0-9]+}}(%esp) # 4-byte Folded Spill			; X32-NEXT: adcl %eax, {{[0-9]+}}(%esp) # 4-byte Folded Spill
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload
	; X32-NEXT: adcl %eax, {{[0-9]+}}(%esp) # 4-byte Folded Spill			; X32-NEXT: adcl %eax, {{[0-9]+}}(%esp) # 4-byte Folded Spill
	... 1,806 lines not shown ...
	; X64-NEXT: movq %rbp, %rax			; X64-NEXT: movq %rbp, %rax
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %rdi, %rbx			; X64-NEXT: addq %rdi, %rbx
	; X64-NEXT: movq %rdx, %rbp			; X64-NEXT: movq %rdx, %rbp
	; X64-NEXT: adcq $0, %rbp			; X64-NEXT: adcq $0, %rbp
	; X64-NEXT: addq %rcx, %rbx			; X64-NEXT: addq %rcx, %rbx
	; X64-NEXT: movq %rbx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rbx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rcx, %r11
	; X64-NEXT: adcq %rdi, %rbp			; X64-NEXT: adcq %rdi, %rbp
	; X64-NEXT: setb %bl			; X64-NEXT: setb %bl
	; X64-NEXT: movzbl %bl, %ebx			; X64-NEXT: movzbl %bl, %ebx
	; X64-NEXT: addq %rax, %rbp			; X64-NEXT: addq %rax, %rbp
	; X64-NEXT: adcq %rdx, %rbx			; X64-NEXT: adcq %rdx, %rbx
	; X64-NEXT: movq %r9, %rax			; X64-NEXT: movq %r9, %rax
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %r11, %r12			; X64-NEXT: movq %rcx, %r12
	; X64-NEXT: movq %r11, %r8			; X64-NEXT: movq %rcx, %r8
	; X64-NEXT: addq %rax, %r12			; X64-NEXT: addq %rax, %r12
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq %rdi, %r9			; X64-NEXT: movq %rdi, %r9
	; X64-NEXT: movq %r9, (%rsp) # 8-byte Spill			; X64-NEXT: movq %rdi, (%rsp) # 8-byte Spill
	; X64-NEXT: adcq %rdx, %rax			; X64-NEXT: adcq %rdx, %rax
	; X64-NEXT: addq %rbp, %r12			; X64-NEXT: addq %rbp, %r12
	; X64-NEXT: movq %r12, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r12, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq %rbx, %rax			; X64-NEXT: adcq %rbx, %rax
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq (%rsi), %rax			; X64-NEXT: movq (%rsi), %rax
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: xorl %ebp, %ebp			; X64-NEXT: xorl %ebp, %ebp
	... 12 lines not shown ...
	; X64-NEXT: adcq %rcx, %rbp			; X64-NEXT: adcq %rcx, %rbp
	; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: setb %bl			; X64-NEXT: setb %bl
	; X64-NEXT: addq %rax, %rbp			; X64-NEXT: addq %rax, %rbp
	; X64-NEXT: movzbl %bl, %ebx			; X64-NEXT: movzbl %bl, %ebx
	; X64-NEXT: adcq %rdx, %rbx			; X64-NEXT: adcq %rdx, %rbx
	; X64-NEXT: movq 16(%rsi), %rax			; X64-NEXT: movq 16(%rsi), %rax
	; X64-NEXT: movq %rsi, %r13			; X64-NEXT: movq %rsi, %r13
	; X64-NEXT: movq %r13, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %r11			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdi, %r14			; X64-NEXT: movq %rdi, %r14
	; X64-NEXT: addq %rax, %r14			; X64-NEXT: addq %rax, %r14
	; X64-NEXT: movq %rcx, %r11			; X64-NEXT: movq %rcx, %r11
	; X64-NEXT: adcq %rdx, %r11			; X64-NEXT: adcq %rdx, %r11
	; X64-NEXT: addq %rbp, %r14			; X64-NEXT: addq %rbp, %r14
	; X64-NEXT: adcq %rbx, %r11			; X64-NEXT: adcq %rbx, %r11
	; X64-NEXT: movq %r8, %rax			; X64-NEXT: movq %r8, %rax
	; X64-NEXT: movq %r8, %rbp			; X64-NEXT: movq %r8, %rbp
	; X64-NEXT: movq %rbp, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r8, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: addq %rdi, %rax			; X64-NEXT: addq %rdi, %rax
	; X64-NEXT: movq %r9, %rax			; X64-NEXT: movq %r9, %rax
	; X64-NEXT: adcq %rcx, %rax			; X64-NEXT: adcq %rcx, %rax
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq (%r10), %rax			; X64-NEXT: movq (%r10), %rax
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: xorl %r8d, %r8d			; X64-NEXT: xorl %r8d, %r8d
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %rdi, %rax			; X64-NEXT: addq %rdi, %rax
	; X64-NEXT: movq %rdi, %r9			; X64-NEXT: movq %rdx, %rax
	; X64-NEXT: movq %rsi, %rax
	; X64-NEXT: adcq %rcx, %rax			; X64-NEXT: adcq %rcx, %rax
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq 32(%r13), %rax			; X64-NEXT: movq 32(%r13), %rax
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %r8
	; X64-NEXT: xorl %r8d, %r8d			; X64-NEXT: xorl %r8d, %r8d
	; X64-NEXT: movq %rax, %r13			; X64-NEXT: movq %rax, %r13
	; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: movq %rbx, %rcx			; X64-NEXT: movq %rbx, %rcx
	; X64-NEXT: addq %r13, %rax			; X64-NEXT: addq %r13, %rax
	; X64-NEXT: movq %rsi, %rax			; X64-NEXT: movq %rsi, %rax
	; X64-NEXT: adcq %rdx, %rax			; X64-NEXT: adcq %rdx, %rax
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rbp, %rax			; X64-NEXT: movq %rbp, %rax
	; X64-NEXT: addq %r9, %rax			; X64-NEXT: addq %rdi, %rax
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %r9, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdi, %r9
				; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
	; X64-NEXT: adcq %r15, %rax			; X64-NEXT: adcq %r15, %rax
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq %r14, %r12			; X64-NEXT: adcq %r14, %r12
	; X64-NEXT: movq %r12, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r12, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rax # 8-byte Reload
	; X64-NEXT: adcq %r11, %rax			; X64-NEXT: adcq %r11, %rax
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %r11, %rdi			; X64-NEXT: movq %r11, %rdi
	; X64-NEXT: movq 8(%r10), %rax			; X64-NEXT: movq 8(%r10), %rax
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %r10, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r10, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rax, %r11			; X64-NEXT: movq %rax, %r11
	; X64-NEXT: addq %rsi, %r11			; X64-NEXT: addq %rsi, %r11
	; X64-NEXT: movq %rdx, %rbp			; X64-NEXT: movq %rdx, %rbp
	; X64-NEXT: adcq $0, %rbp			; X64-NEXT: adcq $0, %rbp
	; X64-NEXT: addq %rcx, %r11			; X64-NEXT: addq %rbx, %r11
	; X64-NEXT: adcq %rsi, %rbp			; X64-NEXT: adcq %rsi, %rbp
	; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rsi, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: setb %bl			; X64-NEXT: setb %bl
	; X64-NEXT: addq %rax, %rbp			; X64-NEXT: addq %rax, %rbp
	; X64-NEXT: movzbl %bl, %ebx			; X64-NEXT: movzbl %bl, %ebx
	; X64-NEXT: adcq %rdx, %rbx			; X64-NEXT: adcq %rdx, %rbx
	; X64-NEXT: movq 16(%r10), %rax			; X64-NEXT: movq 16(%r10), %rax
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rcx, %r8			; X64-NEXT: movq %rcx, %r8
	; X64-NEXT: addq %rax, %r8			; X64-NEXT: addq %rax, %r8
	; X64-NEXT: movq %rsi, %r10			; X64-NEXT: movq %rsi, %r10
	; X64-NEXT: adcq %rdx, %r10			; X64-NEXT: adcq %rdx, %r10
	; X64-NEXT: addq %rbp, %r8			; X64-NEXT: addq %rbp, %r8
	; X64-NEXT: movq %r8, %rax			; X64-NEXT: movq %r8, %rax
	; X64-NEXT: adcq %rbx, %r10			; X64-NEXT: adcq %rbx, %r10
	; X64-NEXT: movq %rcx, %rdx			; X64-NEXT: movq %rcx, %rdx
	; X64-NEXT: movq %rcx, %r12			; X64-NEXT: movq %rcx, %r12
	; X64-NEXT: movq %r12, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: addq %r9, %rdx			; X64-NEXT: addq %r9, %rdx
	; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %r11, %r8			; X64-NEXT: movq %r11, %r8
	; X64-NEXT: adcq %r8, %r15			; X64-NEXT: adcq %r11, %r15
	; X64-NEXT: movq %r15, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r15, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq %rax, %r14			; X64-NEXT: adcq %rax, %r14
	; X64-NEXT: movq %r14, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r14, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rax, %rcx			; X64-NEXT: movq %rax, %rcx
	; X64-NEXT: adcq %r10, %rdi			; X64-NEXT: adcq %r10, %rdi
	; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi # 8-byte Reload
	; X64-NEXT: movq 40(%rsi), %rax			; X64-NEXT: movq 40(%rsi), %rax
	... 79 lines not shown ...
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %r15 # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %r15 # 8-byte Reload
	; X64-NEXT: addq {{[0-9]+}}(%rsp), %r15 # 8-byte Folded Reload			; X64-NEXT: addq {{[0-9]+}}(%rsp), %r15 # 8-byte Folded Reload
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %r12 # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %r12 # 8-byte Reload
	; X64-NEXT: adcq {{[0-9]+}}(%rsp), %r12 # 8-byte Folded Reload			; X64-NEXT: adcq {{[0-9]+}}(%rsp), %r12 # 8-byte Folded Reload
	; X64-NEXT: addq %rax, %r15			; X64-NEXT: addq %rax, %r15
	; X64-NEXT: adcq %rdx, %r12			; X64-NEXT: adcq %rdx, %r12
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %r10, %rbp			; X64-NEXT: mulq %r10
	; X64-NEXT: mulq %rbp
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi # 8-byte Reload
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: mulq %rbp			; X64-NEXT: mulq %r10
	; X64-NEXT: movq %rdx, %rbp			; X64-NEXT: movq %rdx, %rbp
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %rsi, %rbx			; X64-NEXT: addq %rsi, %rbx
	; X64-NEXT: adcq $0, %rbp			; X64-NEXT: adcq $0, %rbp
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %rcx, %r10			; X64-NEXT: movq %rcx, %r10
	; X64-NEXT: mulq %r11			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	... 10 lines not shown ...
	; X64-NEXT: adcq %rax, %r13			; X64-NEXT: adcq %rax, %r13
	; X64-NEXT: addq -{{[0-9]+}}(%rsp), %rsi # 8-byte Folded Reload			; X64-NEXT: addq -{{[0-9]+}}(%rsp), %rsi # 8-byte Folded Reload
	; X64-NEXT: adcq -{{[0-9]+}}(%rsp), %r13 # 8-byte Folded Reload			; X64-NEXT: adcq -{{[0-9]+}}(%rsp), %r13 # 8-byte Folded Reload
	; X64-NEXT: addq %r9, %rsi			; X64-NEXT: addq %r9, %rsi
	; X64-NEXT: adcq %r8, %r13			; X64-NEXT: adcq %r8, %r13
	; X64-NEXT: adcq $0, %r15			; X64-NEXT: adcq $0, %r15
	; X64-NEXT: adcq $0, %r12			; X64-NEXT: adcq $0, %r12
	; X64-NEXT: movq %r10, %rbx			; X64-NEXT: movq %r10, %rbx
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %r10, %rax
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r11 # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r11 # 8-byte Reload
	; X64-NEXT: mulq %r11			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rax, %r10			; X64-NEXT: movq %rax, %r10
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq %rdi, %r9			; X64-NEXT: movq %rdi, %r9
	; X64-NEXT: mulq %r11			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rdx, %rdi			; X64-NEXT: movq %rdx, %rdi
	; X64-NEXT: movq %rax, %rbp			; X64-NEXT: movq %rax, %rbp
	; X64-NEXT: addq %rcx, %rbp			; X64-NEXT: addq %rcx, %rbp
	; X64-NEXT: adcq $0, %rdi			; X64-NEXT: adcq $0, %rdi
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax # 8-byte Reload
	; X64-NEXT: movq 24(%rax), %rcx			; X64-NEXT: movq 24(%rax), %rcx
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rcx, %rbx			; X64-NEXT: movq %rcx, %rbx
	; X64-NEXT: movq %rbx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rcx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rax, %r8			; X64-NEXT: movq %rax, %r8
	; X64-NEXT: addq %rbp, %r8			; X64-NEXT: addq %rbp, %r8
	; X64-NEXT: adcq %rdi, %rcx			; X64-NEXT: adcq %rdi, %rcx
	; X64-NEXT: setb %dil			; X64-NEXT: setb %dil
	; X64-NEXT: movq %r9, %rax			; X64-NEXT: movq %r9, %rax
	; X64-NEXT: mulq %rbx			; X64-NEXT: mulq %rbx
	; X64-NEXT: addq %rcx, %rax			; X64-NEXT: addq %rcx, %rax
	... 14 lines not shown ...
	; X64-NEXT: adcq $0, %rbp			; X64-NEXT: adcq $0, %rbp
	; X64-NEXT: adcq $0, %rbx			; X64-NEXT: adcq $0, %rbx
	; X64-NEXT: addq %r15, %rbp			; X64-NEXT: addq %r15, %rbp
	; X64-NEXT: adcq %r12, %rbx			; X64-NEXT: adcq %r12, %rbx
	; X64-NEXT: setb %r15b			; X64-NEXT: setb %r15b
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %r11, %rsi			; X64-NEXT: movq %r11, %rsi
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rdx, %r11			; X64-NEXT: movq %rdx, %r11
	; X64-NEXT: movq %rax, %r13			; X64-NEXT: movq %rax, %r13
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r12 # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r12 # 8-byte Reload
	; X64-NEXT: movq %r12, %rax			; X64-NEXT: movq %r12, %rax
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %rsi
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, %rdi			; X64-NEXT: movq %rax, %rdi
	; X64-NEXT: addq %r11, %rdi			; X64-NEXT: addq %r11, %rdi
	... 63 lines not shown ...
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %r8 # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %r8 # 8-byte Reload
	; X64-NEXT: addq {{[0-9]+}}(%rsp), %r8 # 8-byte Folded Reload			; X64-NEXT: addq {{[0-9]+}}(%rsp), %r8 # 8-byte Folded Reload
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %r10 # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %r10 # 8-byte Reload
	; X64-NEXT: adcq {{[0-9]+}}(%rsp), %r10 # 8-byte Folded Reload			; X64-NEXT: adcq {{[0-9]+}}(%rsp), %r10 # 8-byte Folded Reload
	; X64-NEXT: addq %rax, %r8			; X64-NEXT: addq %rax, %r8
	; X64-NEXT: adcq %rdx, %r10			; X64-NEXT: adcq %rdx, %r10
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %r11, %rbp			; X64-NEXT: mulq %r11
	; X64-NEXT: mulq %rbp
	; X64-NEXT: movq %rdx, %rdi			; X64-NEXT: movq %rdx, %rdi
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi # 8-byte Reload
	; X64-NEXT: movq %rsi, %rax			; X64-NEXT: movq %rsi, %rax
	; X64-NEXT: mulq %rbp			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rdx, %rbp			; X64-NEXT: movq %rdx, %rbp
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %rdi, %rbx			; X64-NEXT: addq %rdi, %rbx
	; X64-NEXT: adcq $0, %rbp			; X64-NEXT: adcq $0, %rbp
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %rcx, %r11			; X64-NEXT: movq %rcx, %r11
	; X64-NEXT: mulq %r9			; X64-NEXT: mulq %r9
	; X64-NEXT: movq %rdx, %rdi			; X64-NEXT: movq %rdx, %rdi
	... 113 lines not shown ...
	; X64-NEXT: adcq $0, -{{[0-9]+}}(%rsp) # 8-byte Folded Spill			; X64-NEXT: adcq $0, -{{[0-9]+}}(%rsp) # 8-byte Folded Spill
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi # 8-byte Reload
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, %r14			; X64-NEXT: movq %rax, %r14
	; X64-NEXT: movq %r8, %rbp			; X64-NEXT: movq %r8, %rbp
	; X64-NEXT: movq %rbp, %rax			; X64-NEXT: movq %r8, %rax
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rcx, %r11			; X64-NEXT: movq %rcx, %r11
	; X64-NEXT: movq %rdx, %rbx			; X64-NEXT: movq %rdx, %rbx
	; X64-NEXT: movq %rax, %rcx			; X64-NEXT: movq %rax, %rcx
	; X64-NEXT: addq %rsi, %rcx			; X64-NEXT: addq %rsi, %rcx
	; X64-NEXT: adcq $0, %rbx			; X64-NEXT: adcq $0, %rbx
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdi # 8-byte Reload
	... 43 lines not shown ...
	; X64-NEXT: adcq %rax, %r15			; X64-NEXT: adcq %rax, %r15
	; X64-NEXT: addq {{[0-9]+}}(%rsp), %rbx # 8-byte Folded Reload			; X64-NEXT: addq {{[0-9]+}}(%rsp), %rbx # 8-byte Folded Reload
	; X64-NEXT: adcq {{[0-9]+}}(%rsp), %r15 # 8-byte Folded Reload			; X64-NEXT: adcq {{[0-9]+}}(%rsp), %r15 # 8-byte Folded Reload
	; X64-NEXT: addq %r14, %rbx			; X64-NEXT: addq %r14, %rbx
	; X64-NEXT: adcq %r8, %r15			; X64-NEXT: adcq %r8, %r15
	; X64-NEXT: adcq $0, %r9			; X64-NEXT: adcq $0, %r9
	; X64-NEXT: adcq $0, %r10			; X64-NEXT: adcq $0, %r10
	; X64-NEXT: movq %rbp, %rsi			; X64-NEXT: movq %rbp, %rsi
	; X64-NEXT: movq %rsi, %rax			; X64-NEXT: movq %rbp, %rax
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, %r14			; X64-NEXT: movq %rdx, %r14
	; X64-NEXT: movq %rax, %r12			; X64-NEXT: movq %rax, %r12
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq %rdi, %r8			; X64-NEXT: movq %rdi, %r8
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, %rbp			; X64-NEXT: movq %rdx, %rbp
	... 40 lines not shown ...
	; X64-NEXT: movq %r10, %rax			; X64-NEXT: movq %r10, %rax
	; X64-NEXT: mulq %rdi			; X64-NEXT: mulq %rdi
	; X64-NEXT: movq %rdx, %r15			; X64-NEXT: movq %rdx, %r15
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %r9, %rbx			; X64-NEXT: addq %r9, %rbx
	; X64-NEXT: adcq $0, %r15			; X64-NEXT: adcq $0, %r15
	; X64-NEXT: movq %rbp, %rax			; X64-NEXT: movq %rbp, %rax
	; X64-NEXT: movq %r8, %rdi			; X64-NEXT: movq %r8, %rdi
	; X64-NEXT: movq %rdi, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %rdi			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rdx, %r9			; X64-NEXT: movq %rdx, %r9
	; X64-NEXT: movq %rax, %r8			; X64-NEXT: movq %rax, %r8
	; X64-NEXT: addq %rbx, %r8			; X64-NEXT: addq %rbx, %r8
	; X64-NEXT: adcq %r15, %r9			; X64-NEXT: adcq %r15, %r9
	; X64-NEXT: setb %bl			; X64-NEXT: setb %bl
	; X64-NEXT: movq %r10, %rax			; X64-NEXT: movq %r10, %rax
	; X64-NEXT: mulq %rdi			; X64-NEXT: mulq %rdi
	; X64-NEXT: addq %r9, %rax			; X64-NEXT: addq %r9, %rax
	... 66 lines not shown ...
	; X64-NEXT: addq {{[0-9]+}}(%rsp), %r8 # 8-byte Folded Reload			; X64-NEXT: addq {{[0-9]+}}(%rsp), %r8 # 8-byte Folded Reload
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: adcq {{[0-9]+}}(%rsp), %rcx # 8-byte Folded Reload			; X64-NEXT: adcq {{[0-9]+}}(%rsp), %rcx # 8-byte Folded Reload
	; X64-NEXT: addq %rax, %r8			; X64-NEXT: addq %rax, %r8
	; X64-NEXT: adcq %rdx, %rcx			; X64-NEXT: adcq %rdx, %rcx
	; X64-NEXT: movq %rcx, %r14			; X64-NEXT: movq %rcx, %r14
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %r10, %rdi			; X64-NEXT: mulq %r10
	; X64-NEXT: mulq %rdi
	; X64-NEXT: movq %rdx, %r11			; X64-NEXT: movq %rdx, %r11
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi # 8-byte Reload
	; X64-NEXT: movq %rsi, %rax			; X64-NEXT: movq %rsi, %rax
	; X64-NEXT: mulq %rdi			; X64-NEXT: mulq %r10
	; X64-NEXT: movq %rdx, %rdi			; X64-NEXT: movq %rdx, %rdi
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %r11, %rbx			; X64-NEXT: addq %r11, %rbx
	; X64-NEXT: adcq $0, %rdi			; X64-NEXT: adcq $0, %rdi
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %rcx, %r13			; X64-NEXT: movq %rcx, %r13
	; X64-NEXT: mulq %r9			; X64-NEXT: mulq %r9
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	... 11 lines not shown ...
	; X64-NEXT: addq {{[0-9]+}}(%rsp), %rdi # 8-byte Folded Reload			; X64-NEXT: addq {{[0-9]+}}(%rsp), %rdi # 8-byte Folded Reload
	; X64-NEXT: adcq {{[0-9]+}}(%rsp), %r11 # 8-byte Folded Reload			; X64-NEXT: adcq {{[0-9]+}}(%rsp), %r11 # 8-byte Folded Reload
	; X64-NEXT: addq {{[0-9]+}}(%rsp), %rdi # 8-byte Folded Reload			; X64-NEXT: addq {{[0-9]+}}(%rsp), %rdi # 8-byte Folded Reload
	; X64-NEXT: adcq %r12, %r11			; X64-NEXT: adcq %r12, %r11
	; X64-NEXT: adcq $0, %r8			; X64-NEXT: adcq $0, %r8
	; X64-NEXT: movq %r8, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r8, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq $0, %r14			; X64-NEXT: adcq $0, %r14
	; X64-NEXT: movq %r14, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r14, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %r13, %rbx			; X64-NEXT: movq %r13, %rax
	; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, %r8			; X64-NEXT: movq %rdx, %r8
	; X64-NEXT: movq %rax, %r12			; X64-NEXT: movq %rax, %r12
	; X64-NEXT: movq %rsi, %rax			; X64-NEXT: movq %rsi, %rax
	; X64-NEXT: movq %rsi, %r9			; X64-NEXT: movq %rsi, %r9
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rcx, %r10			; X64-NEXT: movq %rcx, %r10
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, %rcx			; X64-NEXT: movq %rax, %rcx
	; X64-NEXT: addq %r8, %rcx			; X64-NEXT: addq %r8, %rcx
	; X64-NEXT: adcq $0, %rsi			; X64-NEXT: adcq $0, %rsi
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %r13, %rax
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %r13 # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %r13 # 8-byte Reload
	; X64-NEXT: mulq %r13			; X64-NEXT: mulq %r13
	; X64-NEXT: movq %rdx, %rbx			; X64-NEXT: movq %rdx, %rbx
	; X64-NEXT: addq %rcx, %rax			; X64-NEXT: addq %rcx, %rax
	; X64-NEXT: movq %rax, %r8			; X64-NEXT: movq %rax, %r8
	; X64-NEXT: adcq %rsi, %rbx			; X64-NEXT: adcq %rsi, %rbx
	; X64-NEXT: setb %cl			; X64-NEXT: setb %cl
	; X64-NEXT: movq %r9, %rax			; X64-NEXT: movq %r9, %rax
	... 17 lines not shown ...
	; X64-NEXT: adcq $0, %rcx			; X64-NEXT: adcq $0, %rcx
	; X64-NEXT: addq -{{[0-9]+}}(%rsp), %rsi # 8-byte Folded Reload			; X64-NEXT: addq -{{[0-9]+}}(%rsp), %rsi # 8-byte Folded Reload
	; X64-NEXT: movq %rsi, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rsi, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq -{{[0-9]+}}(%rsp), %rcx # 8-byte Folded Reload			; X64-NEXT: adcq -{{[0-9]+}}(%rsp), %rcx # 8-byte Folded Reload
	; X64-NEXT: movq %rcx, (%rsp) # 8-byte Spill			; X64-NEXT: movq %rcx, (%rsp) # 8-byte Spill
	; X64-NEXT: setb -{{[0-9]+}}(%rsp) # 1-byte Folded Spill			; X64-NEXT: setb -{{[0-9]+}}(%rsp) # 1-byte Folded Spill
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rbx # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rbx # 8-byte Reload
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: movq %r10, %rsi			; X64-NEXT: mulq %r10
	; X64-NEXT: mulq %rsi
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r8 # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r8 # 8-byte Reload
	; X64-NEXT: movq %r8, %rax			; X64-NEXT: movq %r8, %rax
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %r10
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, %rdi			; X64-NEXT: movq %rax, %rdi
	; X64-NEXT: addq %rcx, %rdi			; X64-NEXT: addq %rcx, %rdi
	; X64-NEXT: adcq $0, %rsi			; X64-NEXT: adcq $0, %rsi
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: mulq %r9			; X64-NEXT: mulq %r9
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rax, %r10			; X64-NEXT: movq %rax, %r10
	(59 lines not shown)
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, %r8			; X64-NEXT: movq %rax, %r8
	; X64-NEXT: addq %rbx, %r8			; X64-NEXT: addq %rbx, %r8
	; X64-NEXT: adcq %rbp, %rsi			; X64-NEXT: adcq %rbp, %rsi
	; X64-NEXT: setb %bl			; X64-NEXT: setb %bl
	; X64-NEXT: movq %r9, %rax			; X64-NEXT: movq %r9, %rax
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rcx, %r10			; X64-NEXT: movq %rcx, %r10
	; X64-NEXT: movq %r10, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rcx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rax, %rdi			; X64-NEXT: movq %rax, %rdi
	; X64-NEXT: addq %rsi, %rdi			; X64-NEXT: addq %rsi, %rdi
	; X64-NEXT: movzbl %bl, %eax			; X64-NEXT: movzbl %bl, %eax
	; X64-NEXT: adcq %rax, %rcx			; X64-NEXT: adcq %rax, %rcx
	; X64-NEXT: movq %r11, %rax			; X64-NEXT: movq %r11, %rax
	; X64-NEXT: xorl %edx, %edx			; X64-NEXT: xorl %edx, %edx
	; X64-NEXT: mulq %rdx			; X64-NEXT: mulq %rdx
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: movq %rdx, %r14			; X64-NEXT: movq %rdx, %r14
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r12 # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r12 # 8-byte Reload
	; X64-NEXT: addq %rbx, %r12			; X64-NEXT: addq %rax, %r12
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r15 # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %r15 # 8-byte Reload
	; X64-NEXT: adcq %r14, %r15			; X64-NEXT: adcq %rdx, %r15
	; X64-NEXT: addq %rdi, %r12			; X64-NEXT: addq %rdi, %r12
	; X64-NEXT: adcq %rcx, %r15			; X64-NEXT: adcq %rcx, %r15
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %r11, %rsi			; X64-NEXT: movq %r11, %rsi
	; X64-NEXT: movq %rsi, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r11, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rdx, %r11			; X64-NEXT: movq %rdx, %r11
	; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %r9 # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %r9 # 8-byte Reload
	; X64-NEXT: movq %r9, %rax			; X64-NEXT: movq %r9, %rax
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %rsi
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, %rdi			; X64-NEXT: movq %rax, %rdi
	; X64-NEXT: addq %r11, %rdi			; X64-NEXT: addq %r11, %rdi
	(47 lines not shown)
	; X64-NEXT: movzbl %r11b, %eax			; X64-NEXT: movzbl %r11b, %eax
	; X64-NEXT: adcq %rax, %rcx			; X64-NEXT: adcq %rax, %rcx
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: xorl %edx, %edx			; X64-NEXT: xorl %edx, %edx
	; X64-NEXT: mulq %rdx			; X64-NEXT: mulq %rdx
	; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rax, %r9			; X64-NEXT: movq %rax, %r9
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rbp # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rbp # 8-byte Reload
	; X64-NEXT: addq %r9, %rbp			; X64-NEXT: addq %rax, %rbp
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rax # 8-byte Reload
	; X64-NEXT: adcq %rdx, %rax			; X64-NEXT: adcq %rdx, %rax
	; X64-NEXT: addq %rsi, %rbp			; X64-NEXT: addq %rsi, %rbp
	; X64-NEXT: adcq %rcx, %rax			; X64-NEXT: adcq %rcx, %rax
	; X64-NEXT: addq %rbx, %r13			; X64-NEXT: addq %rbx, %r13
	; X64-NEXT: movq %r13, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r13, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq %r14, %r8			; X64-NEXT: adcq %r14, %r8
	; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp) # 8-byte Spill
	(161 lines not shown)
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdx, %r8			; X64-NEXT: movq %rdx, %r8
	; X64-NEXT: movq 88(%rsi), %rax			; X64-NEXT: movq 88(%rsi), %rax
	; X64-NEXT: movq %rsi, %r9			; X64-NEXT: movq %rsi, %r9
	; X64-NEXT: movq %rax, %rsi			; X64-NEXT: movq %rax, %rsi
	; X64-NEXT: movq %rsi, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rcx, %r11			; X64-NEXT: movq %rcx, %r11
	; X64-NEXT: movq %rdx, %rbp			; X64-NEXT: movq %rdx, %rbp
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %r8, %rbx			; X64-NEXT: addq %r8, %rbx
	; X64-NEXT: adcq $0, %rbp			; X64-NEXT: adcq $0, %rbp
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
	(19 lines not shown)
	; X64-NEXT: movq %rax, %rsi			; X64-NEXT: movq %rax, %rsi
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %r12 # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %r12 # 8-byte Reload
	; X64-NEXT: addq %r12, %rsi			; X64-NEXT: addq %r12, %rsi
	; X64-NEXT: movq %rdx, %r10			; X64-NEXT: movq %rdx, %r10
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %r8 # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %r8 # 8-byte Reload
	; X64-NEXT: adcq %r8, %r10			; X64-NEXT: adcq %r8, %r10
	; X64-NEXT: addq %rbx, %rsi			; X64-NEXT: addq %rbx, %rsi
	; X64-NEXT: adcq %rbp, %r10			; X64-NEXT: adcq %rbp, %r10
	; X64-NEXT: movq %r9, %rdi			; X64-NEXT: movq 64(%r9), %r13
	; X64-NEXT: movq 64(%rdi), %r13
	; X64-NEXT: movq %r13, %rax			; X64-NEXT: movq %r13, %rax
	; X64-NEXT: mulq %r11			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq 72(%rdi), %r9			; X64-NEXT: movq 72(%r9), %r9
	; X64-NEXT: movq %r9, %rax			; X64-NEXT: movq %r9, %rax
	; X64-NEXT: mulq %r11			; X64-NEXT: mulq %r11
	; X64-NEXT: movq %rdx, %rbp			; X64-NEXT: movq %rdx, %rbp
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %rcx, %rbx			; X64-NEXT: addq %rcx, %rbx
	; X64-NEXT: adcq $0, %rbp			; X64-NEXT: adcq $0, %rbp
	; X64-NEXT: movq %r13, %rax			; X64-NEXT: movq %r13, %rax
	; X64-NEXT: mulq %r15			; X64-NEXT: mulq %r15
	(11 lines not shown)
	; X64-NEXT: movzbl %r11b, %eax			; X64-NEXT: movzbl %r11b, %eax
	; X64-NEXT: adcq %rax, %rbx			; X64-NEXT: adcq %rax, %rbx
	; X64-NEXT: movq %r13, %rax			; X64-NEXT: movq %r13, %rax
	; X64-NEXT: xorl %ecx, %ecx			; X64-NEXT: xorl %ecx, %ecx
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, %r11			; X64-NEXT: movq %rdx, %r11
	; X64-NEXT: movq %rax, %r15			; X64-NEXT: movq %rax, %r15
	; X64-NEXT: movq %r12, %rcx			; X64-NEXT: movq %r12, %rcx
	; X64-NEXT: addq %r15, %rcx			; X64-NEXT: addq %rax, %rcx
	; X64-NEXT: adcq %r11, %r8			; X64-NEXT: adcq %rdx, %r8
	; X64-NEXT: addq %rbp, %rcx			; X64-NEXT: addq %rbp, %rcx
	; X64-NEXT: adcq %rbx, %r8			; X64-NEXT: adcq %rbx, %r8
	; X64-NEXT: addq -{{[0-9]+}}(%rsp), %rcx # 8-byte Folded Reload			; X64-NEXT: addq -{{[0-9]+}}(%rsp), %rcx # 8-byte Folded Reload
	; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rcx, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq %r14, %r8			; X64-NEXT: adcq %r14, %r8
	; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r8, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq $0, %rsi			; X64-NEXT: adcq $0, %rsi
	; X64-NEXT: adcq $0, %r10			; X64-NEXT: adcq $0, %r10
	(35 lines not shown)
	; X64-NEXT: movq %rbp, {{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rbp, {{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq $0, %r15			; X64-NEXT: adcq $0, %r15
	; X64-NEXT: adcq $0, %r11			; X64-NEXT: adcq $0, %r11
	; X64-NEXT: addq %rsi, %r15			; X64-NEXT: addq %rsi, %r15
	; X64-NEXT: adcq %r10, %r11			; X64-NEXT: adcq %r10, %r11
	; X64-NEXT: setb %r10b			; X64-NEXT: setb %r10b
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rsi # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rsi # 8-byte Reload
	; X64-NEXT: movq %rsi, %rax			; X64-NEXT: movq %rsi, %rax
	; X64-NEXT: movq %r8, %rdi			; X64-NEXT: mulq %r8
	; X64-NEXT: mulq %rdi
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rax, %r9			; X64-NEXT: movq %rax, %r9
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rbp # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rbp # 8-byte Reload
	; X64-NEXT: movq %rbp, %rax			; X64-NEXT: movq %rbp, %rax
	; X64-NEXT: mulq %rdi			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rdi, %r12			; X64-NEXT: movq %r8, %r12
	; X64-NEXT: movq %rdx, %rdi			; X64-NEXT: movq %rdx, %rdi
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	; X64-NEXT: addq %rcx, %rbx			; X64-NEXT: addq %rcx, %rbx
	; X64-NEXT: adcq $0, %rdi			; X64-NEXT: adcq $0, %rdi
	; X64-NEXT: movq %rsi, %rax			; X64-NEXT: movq %rsi, %rax
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rsi # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rsi # 8-byte Reload
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %rsi
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	(22 lines not shown)
	; X64-NEXT: movq %rsi, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rsi, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq $0, %rcx			; X64-NEXT: adcq $0, %rcx
	; X64-NEXT: movq %rcx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rcx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rbp # 8-byte Reload			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rbp # 8-byte Reload
	; X64-NEXT: movq 96(%rbp), %rcx			; X64-NEXT: movq 96(%rbp), %rcx
	; X64-NEXT: imulq %rcx, %rdi			; X64-NEXT: imulq %rcx, %rdi
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %r12, %rsi			; X64-NEXT: movq %r12, %rsi
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %r12
	; X64-NEXT: movq %rax, %r9			; X64-NEXT: movq %rax, %r9
	; X64-NEXT: addq %rdi, %rdx			; X64-NEXT: addq %rdi, %rdx
	; X64-NEXT: movq 104(%rbp), %r8			; X64-NEXT: movq 104(%rbp), %r8
	; X64-NEXT: imulq %r8, %rsi			; X64-NEXT: imulq %r8, %rsi
	; X64-NEXT: addq %rdx, %rsi			; X64-NEXT: addq %rdx, %rsi
	; X64-NEXT: movq %rsi, %r11			; X64-NEXT: movq %rsi, %r11
	; X64-NEXT: movq 112(%rbp), %rax			; X64-NEXT: movq 112(%rbp), %rax
	; X64-NEXT: movq %rbp, %rdi			; X64-NEXT: movq %rbp, %rdi
	(159 lines not shown)

test/CodeGen/X86/mul-i512.ll

	(903 lines not shown)
	; X64-NEXT: movq %rdx, (%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, (%rsp) # 8-byte Spill
	; X64-NEXT: movq 24(%rdi), %r11			; X64-NEXT: movq 24(%rdi), %r11
	; X64-NEXT: movq 16(%rdi), %r15			; X64-NEXT: movq 16(%rdi), %r15
	; X64-NEXT: movq %rsi, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rsi, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq (%rsi), %rdx			; X64-NEXT: movq (%rsi), %rdx
	; X64-NEXT: movq 8(%rsi), %rbp			; X64-NEXT: movq 8(%rsi), %rbp
	; X64-NEXT: movq %r15, %rax			; X64-NEXT: movq %r15, %rax
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %rdx
	; X64-NEXT: movq %rdx, %r9			; X64-NEXT: movq %rdx, %r9
	; X64-NEXT: movq %rax, %r8			; X64-NEXT: movq %rax, %r8
	; X64-NEXT: movq %r11, %rax			; X64-NEXT: movq %r11, %rax
	; X64-NEXT: movq %r11, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r11, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %rsi			; X64-NEXT: mulq %rsi
	; X64-NEXT: movq %rsi, %r10			; X64-NEXT: movq %rsi, %r10
	; X64-NEXT: movq %rdx, %rbx			; X64-NEXT: movq %rdx, %rbx
	; X64-NEXT: movq %rax, %rsi			; X64-NEXT: movq %rax, %rsi
	; X64-NEXT: addq %r9, %rsi			; X64-NEXT: addq %r9, %rsi
	; X64-NEXT: adcq $0, %rbx			; X64-NEXT: adcq $0, %rbx
	; X64-NEXT: movq %r15, %rax			; X64-NEXT: movq %r15, %rax
	; X64-NEXT: movq %r15, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r15, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %rbp			; X64-NEXT: mulq %rbp
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rax, %r9			; X64-NEXT: movq %rax, %r9
	; X64-NEXT: addq %rsi, %r9			; X64-NEXT: addq %rsi, %r9
	; X64-NEXT: adcq %rbx, %rcx			; X64-NEXT: adcq %rbx, %rcx
	; X64-NEXT: setb %al			; X64-NEXT: setb %al
	; X64-NEXT: movzbl %al, %ebx			; X64-NEXT: movzbl %al, %ebx
	; X64-NEXT: movq %r11, %rax			; X64-NEXT: movq %r11, %rax
	; X64-NEXT: mulq %rbp			; X64-NEXT: mulq %rbp
	; X64-NEXT: movq %rbp, %r14			; X64-NEXT: movq %rbp, %r14
	; X64-NEXT: movq %r14, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rbp, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rdx, %rsi			; X64-NEXT: movq %rdx, %rsi
	; X64-NEXT: movq %rax, %rbp			; X64-NEXT: movq %rax, %rbp
	; X64-NEXT: addq %rcx, %rbp			; X64-NEXT: addq %rcx, %rbp
	; X64-NEXT: adcq %rbx, %rsi			; X64-NEXT: adcq %rbx, %rsi
	; X64-NEXT: xorl %ecx, %ecx			; X64-NEXT: xorl %ecx, %ecx
	; X64-NEXT: movq %r10, %rbx			; X64-NEXT: movq %r10, %rbx
	; X64-NEXT: movq %rbx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r10, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %r10, %rax
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, %r13			; X64-NEXT: movq %rdx, %r13
	; X64-NEXT: movq %rax, %r10			; X64-NEXT: movq %rax, %r10
	; X64-NEXT: movq %r15, %rax			; X64-NEXT: movq %r15, %rax
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill
				; X64-NEXT: # kill: %RAX<kill>
				; X64-NEXT: movq %rax, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rax, %r15			; X64-NEXT: movq %rax, %r15
	; X64-NEXT: movq %r15, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: addq %r10, %r15			; X64-NEXT: addq %r10, %r15
	; X64-NEXT: adcq %r13, %rdx			; X64-NEXT: adcq %r13, %rdx
	; X64-NEXT: addq %rbp, %r15			; X64-NEXT: addq %rbp, %r15
	; X64-NEXT: adcq %rsi, %rdx			; X64-NEXT: adcq %rsi, %rdx
	; X64-NEXT: movq %rdx, %r12			; X64-NEXT: movq %rdx, %r12
	; X64-NEXT: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq (%rdi), %rcx			; X64-NEXT: movq (%rdi), %rcx
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	(22 lines not shown)
	; X64-NEXT: addq %rbx, %rbp			; X64-NEXT: addq %rbx, %rbp
	; X64-NEXT: movzbl %r11b, %eax			; X64-NEXT: movzbl %r11b, %eax
	; X64-NEXT: adcq %rax, %rsi			; X64-NEXT: adcq %rax, %rsi
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: xorl %edx, %edx			; X64-NEXT: xorl %edx, %edx
	; X64-NEXT: mulq %rdx			; X64-NEXT: mulq %rdx
	; X64-NEXT: movq %rdx, %r14			; X64-NEXT: movq %rdx, %r14
	; X64-NEXT: movq %rax, %r11			; X64-NEXT: movq %rax, %r11
	; X64-NEXT: addq %r11, %r10			; X64-NEXT: addq %rax, %r10
	; X64-NEXT: adcq %r14, %r13			; X64-NEXT: adcq %rdx, %r13
	; X64-NEXT: addq %rbp, %r10			; X64-NEXT: addq %rbp, %r10
	; X64-NEXT: adcq %rsi, %r13			; X64-NEXT: adcq %rsi, %r13
	; X64-NEXT: addq %r8, %r10			; X64-NEXT: addq %r8, %r10
	; X64-NEXT: adcq %r9, %r13			; X64-NEXT: adcq %r9, %r13
	; X64-NEXT: adcq $0, %r15			; X64-NEXT: adcq $0, %r15
	; X64-NEXT: adcq $0, %r12			; X64-NEXT: adcq $0, %r12
	; X64-NEXT: movq %r12, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r12, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rsi # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rsi # 8-byte Reload
	; X64-NEXT: movq 16(%rsi), %r8			; X64-NEXT: movq 16(%rsi), %r8
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: movq %rcx, %r9			; X64-NEXT: movq %rcx, %r9
	; X64-NEXT: movq %r9, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rcx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rdx, %rdi			; X64-NEXT: movq %rdx, %rdi
	; X64-NEXT: movq %rax, %r12			; X64-NEXT: movq %rax, %r12
	; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rcx # 8-byte Reload			; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rcx # 8-byte Reload
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %r8
	; X64-NEXT: movq %rdx, %rbp			; X64-NEXT: movq %rdx, %rbp
	; X64-NEXT: movq %rax, %rbx			; X64-NEXT: movq %rax, %rbx
	(14 lines not shown)
	; X64-NEXT: addq %rsi, %r9			; X64-NEXT: addq %rsi, %r9
	; X64-NEXT: movzbl %bpl, %eax			; X64-NEXT: movzbl %bpl, %eax
	; X64-NEXT: adcq %rax, %rbx			; X64-NEXT: adcq %rax, %rbx
	; X64-NEXT: movq %r8, %rax			; X64-NEXT: movq %r8, %rax
	; X64-NEXT: xorl %ecx, %ecx			; X64-NEXT: xorl %ecx, %ecx
	; X64-NEXT: mulq %rcx			; X64-NEXT: mulq %rcx
	; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %rdx, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: movq %rax, %rbp			; X64-NEXT: movq %rax, %rbp
	; X64-NEXT: addq %rbp, %r11			; X64-NEXT: addq %rax, %r11
	; X64-NEXT: adcq %rdx, %r14			; X64-NEXT: adcq %rdx, %r14
	; X64-NEXT: addq %r9, %r11			; X64-NEXT: addq %r9, %r11
	; X64-NEXT: adcq %rbx, %r14			; X64-NEXT: adcq %rbx, %r14
	; X64-NEXT: addq %r10, %r12			; X64-NEXT: addq %r10, %r12
	; X64-NEXT: movq %r12, -{{[0-9]+}}(%rsp) # 8-byte Spill			; X64-NEXT: movq %r12, -{{[0-9]+}}(%rsp) # 8-byte Spill
	; X64-NEXT: adcq %r13, -{{[0-9]+}}(%rsp) # 8-byte Folded Spill			; X64-NEXT: adcq %r13, -{{[0-9]+}}(%rsp) # 8-byte Folded Spill
	; X64-NEXT: adcq $0, %r11			; X64-NEXT: adcq $0, %r11
	; X64-NEXT: adcq $0, %r14			; X64-NEXT: adcq $0, %r14
	(180 lines not shown)

test/CodeGen/X86/mul128.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s --check-prefix=X64

	define i128 @foo(i128 %t, i128 %u) {			define i128 @foo(i128 %t, i128 %u) {
	; X64-LABEL: foo:			; X64-LABEL: foo:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movq %rdx, %r8			; X64-NEXT: movq %rdx, %r8
	; X64-NEXT: imulq %rdi, %rcx			; X64-NEXT: imulq %rdi, %rcx
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: mulq %r8			; X64-NEXT: mulq %rdx
	; X64-NEXT: addq %rcx, %rdx			; X64-NEXT: addq %rcx, %rdx
	; X64-NEXT: imulq %r8, %rsi			; X64-NEXT: imulq %r8, %rsi
	; X64-NEXT: addq %rsi, %rdx			; X64-NEXT: addq %rsi, %rdx
	; X64-NEXT: retq			; X64-NEXT: retq
	%k = mul i128 %t, %u			%k = mul i128 %t, %u
	ret i128 %k			ret i128 %k
	}			}

test/CodeGen/X86/pmul.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=SSE --check-prefix=SSE2		; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=SSE --check-prefix=SSE2
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE --check-prefix=SSE41		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=SSE --check-prefix=SSE41
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=AVX --check-prefix=AVX2		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=AVX --check-prefix=AVX2
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefix=AVX --check-prefix=AVX512 --check-prefix=AVX512F		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefix=AVX --check-prefix=AVX512 --check-prefix=AVX512F
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw | FileCheck %s --check-prefix=AVX --check-prefix=AVX512 --check-prefix=AVX512BW		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512bw | FileCheck %s --check-prefix=AVX --check-prefix=AVX512 --check-prefix=AVX512BW

define <16 x i8> @mul_v16i8c(<16 x i8> %i) nounwind {		define <16 x i8> @mul_v16i8c(<16 x i8> %i) nounwind {
; SSE2-LABEL: mul_v16i8c:		; SSE2-LABEL: mul_v16i8c:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm0[8],xmm1[9],xmm0[9],xmm1[10],xmm0[10],xmm1[11],xmm0[11],xmm1[12],xmm0[12],xmm1[13],xmm0[13],xmm1[14],xmm0[14],xmm1[15],xmm0[15]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [117,117,117,117,117,117,117,117]		; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [117,117,117,117,117,117,117,117]
; SSE2-NEXT: pmullw %xmm2, %xmm1		; SSE2-NEXT: pmullw %xmm2, %xmm1
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [255,255,255,255,255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [255,255,255,255,255,255,255,255]
; SSE2-NEXT: pand %xmm3, %xmm1		; SSE2-NEXT: pand %xmm3, %xmm1
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm0		; SSE2-NEXT: psraw $8, %xmm0
; SSE2-NEXT: pmullw %xmm2, %xmm0		; SSE2-NEXT: pmullw %xmm2, %xmm0
(117 lines not shown)	entry:
%A = mul <2 x i64> %i, < i64 117, i64 117 >		%A = mul <2 x i64> %i, < i64 117, i64 117 >
ret <2 x i64> %A		ret <2 x i64> %A
}		}

define <16 x i8> @mul_v16i8(<16 x i8> %i, <16 x i8> %j) nounwind {		define <16 x i8> @mul_v16i8(<16 x i8> %i, <16 x i8> %j) nounwind {
; SSE2-LABEL: mul_v16i8:		; SSE2-LABEL: mul_v16i8:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8],xmm1[8],xmm2[9],xmm1[9],xmm2[10],xmm1[10],xmm2[11],xmm1[11],xmm2[12],xmm1[12],xmm2[13],xmm1[13],xmm2[14],xmm1[14],xmm2[15],xmm1[15]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm3		; SSE2-NEXT: movdqa %xmm0, %xmm3
; SSE2-NEXT: punpckhbw {{.*#+}} xmm3 = xmm3[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm3 = xmm3[8],xmm0[8],xmm3[9],xmm0[9],xmm3[10],xmm0[10],xmm3[11],xmm0[11],xmm3[12],xmm0[12],xmm3[13],xmm0[13],xmm3[14],xmm0[14],xmm3[15],xmm0[15]
; SSE2-NEXT: psraw $8, %xmm3		; SSE2-NEXT: psraw $8, %xmm3
; SSE2-NEXT: pmullw %xmm2, %xmm3		; SSE2-NEXT: pmullw %xmm2, %xmm3
; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [255,255,255,255,255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [255,255,255,255,255,255,255,255]
; SSE2-NEXT: pand %xmm2, %xmm3		; SSE2-NEXT: pand %xmm2, %xmm3
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm0		; SSE2-NEXT: psraw $8, %xmm0
(223 lines not shown)	entry:
%A = mul <2 x i64> %i, %j		%A = mul <2 x i64> %i, %j
ret <2 x i64> %A		ret <2 x i64> %A
}		}

define <32 x i8> @mul_v32i8c(<32 x i8> %i) nounwind {		define <32 x i8> @mul_v32i8c(<32 x i8> %i) nounwind {
; SSE2-LABEL: mul_v32i8c:		; SSE2-LABEL: mul_v32i8c:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm2		; SSE2-NEXT: movdqa %xmm0, %xmm2
; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8],xmm0[8],xmm2[9],xmm0[9],xmm2[10],xmm0[10],xmm2[11],xmm0[11],xmm2[12],xmm0[12],xmm2[13],xmm0[13],xmm2[14],xmm0[14],xmm2[15],xmm0[15]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [117,117,117,117,117,117,117,117]		; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [117,117,117,117,117,117,117,117]
; SSE2-NEXT: pmullw %xmm3, %xmm2		; SSE2-NEXT: pmullw %xmm3, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [255,255,255,255,255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [255,255,255,255,255,255,255,255]
; SSE2-NEXT: pand %xmm4, %xmm2		; SSE2-NEXT: pand %xmm4, %xmm2
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm0		; SSE2-NEXT: psraw $8, %xmm0
; SSE2-NEXT: pmullw %xmm3, %xmm0		; SSE2-NEXT: pmullw %xmm3, %xmm0
; SSE2-NEXT: pand %xmm4, %xmm0		; SSE2-NEXT: pand %xmm4, %xmm0
; SSE2-NEXT: packuswb %xmm2, %xmm0		; SSE2-NEXT: packuswb %xmm2, %xmm0
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8],xmm1[8],xmm2[9],xmm1[9],xmm2[10],xmm1[10],xmm2[11],xmm1[11],xmm2[12],xmm1[12],xmm2[13],xmm1[13],xmm2[14],xmm1[14],xmm2[15],xmm1[15]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: pmullw %xmm3, %xmm2		; SSE2-NEXT: pmullw %xmm3, %xmm2
; SSE2-NEXT: pand %xmm4, %xmm2		; SSE2-NEXT: pand %xmm4, %xmm2
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: pmullw %xmm3, %xmm1		; SSE2-NEXT: pmullw %xmm3, %xmm1
; SSE2-NEXT: pand %xmm4, %xmm1		; SSE2-NEXT: pand %xmm4, %xmm1
; SSE2-NEXT: packuswb %xmm2, %xmm1		; SSE2-NEXT: packuswb %xmm2, %xmm1
(152 lines not shown)	entry:
%A = mul <4 x i64> %i, < i64 117, i64 117, i64 117, i64 117 >		%A = mul <4 x i64> %i, < i64 117, i64 117, i64 117, i64 117 >
ret <4 x i64> %A		ret <4 x i64> %A
}		}

define <32 x i8> @mul_v32i8(<32 x i8> %i, <32 x i8> %j) nounwind {		define <32 x i8> @mul_v32i8(<32 x i8> %i, <32 x i8> %j) nounwind {
; SSE2-LABEL: mul_v32i8:		; SSE2-LABEL: mul_v32i8:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm2, %xmm4		; SSE2-NEXT: movdqa %xmm2, %xmm4
; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8],xmm2[8],xmm4[9],xmm2[9],xmm4[10],xmm2[10],xmm4[11],xmm2[11],xmm4[12],xmm2[12],xmm4[13],xmm2[13],xmm4[14],xmm2[14],xmm4[15],xmm2[15]
; SSE2-NEXT: psraw $8, %xmm4		; SSE2-NEXT: psraw $8, %xmm4
; SSE2-NEXT: movdqa %xmm0, %xmm5		; SSE2-NEXT: movdqa %xmm0, %xmm5
; SSE2-NEXT: punpckhbw {{.*#+}} xmm5 = xmm5[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm5 = xmm5[8],xmm0[8],xmm5[9],xmm0[9],xmm5[10],xmm0[10],xmm5[11],xmm0[11],xmm5[12],xmm0[12],xmm5[13],xmm0[13],xmm5[14],xmm0[14],xmm5[15],xmm0[15]
; SSE2-NEXT: psraw $8, %xmm5		; SSE2-NEXT: psraw $8, %xmm5
; SSE2-NEXT: pmullw %xmm4, %xmm5		; SSE2-NEXT: pmullw %xmm4, %xmm5
; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [255,255,255,255,255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [255,255,255,255,255,255,255,255]
; SSE2-NEXT: pand %xmm4, %xmm5		; SSE2-NEXT: pand %xmm4, %xmm5
; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm0		; SSE2-NEXT: psraw $8, %xmm0
; SSE2-NEXT: pmullw %xmm2, %xmm0		; SSE2-NEXT: pmullw %xmm2, %xmm0
; SSE2-NEXT: pand %xmm4, %xmm0		; SSE2-NEXT: pand %xmm4, %xmm0
; SSE2-NEXT: packuswb %xmm5, %xmm0		; SSE2-NEXT: packuswb %xmm5, %xmm0
; SSE2-NEXT: movdqa %xmm3, %xmm2		; SSE2-NEXT: movdqa %xmm3, %xmm2
; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8],xmm3[8],xmm2[9],xmm3[9],xmm2[10],xmm3[10],xmm2[11],xmm3[11],xmm2[12],xmm3[12],xmm2[13],xmm3[13],xmm2[14],xmm3[14],xmm2[15],xmm3[15]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: movdqa %xmm1, %xmm5		; SSE2-NEXT: movdqa %xmm1, %xmm5
; SSE2-NEXT: punpckhbw {{.*#+}} xmm5 = xmm5[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm5 = xmm5[8],xmm1[8],xmm5[9],xmm1[9],xmm5[10],xmm1[10],xmm5[11],xmm1[11],xmm5[12],xmm1[12],xmm5[13],xmm1[13],xmm5[14],xmm1[14],xmm5[15],xmm1[15]
; SSE2-NEXT: psraw $8, %xmm5		; SSE2-NEXT: psraw $8, %xmm5
; SSE2-NEXT: pmullw %xmm2, %xmm5		; SSE2-NEXT: pmullw %xmm2, %xmm5
; SSE2-NEXT: pand %xmm4, %xmm5		; SSE2-NEXT: pand %xmm4, %xmm5
; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm3		; SSE2-NEXT: psraw $8, %xmm3
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: pmullw %xmm3, %xmm1		; SSE2-NEXT: pmullw %xmm3, %xmm1
(171 lines not shown)	entry:
%A = mul <4 x i64> %i, %j		%A = mul <4 x i64> %i, %j
ret <4 x i64> %A		ret <4 x i64> %A
}		}

define <64 x i8> @mul_v64i8c(<64 x i8> %i) nounwind {		define <64 x i8> @mul_v64i8c(<64 x i8> %i) nounwind {
; SSE2-LABEL: mul_v64i8c:		; SSE2-LABEL: mul_v64i8c:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm6		; SSE2-NEXT: movdqa %xmm0, %xmm6
; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 = xmm6[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 = xmm6[8],xmm0[8],xmm6[9],xmm0[9],xmm6[10],xmm0[10],xmm6[11],xmm0[11],xmm6[12],xmm0[12],xmm6[13],xmm0[13],xmm6[14],xmm0[14],xmm6[15],xmm0[15]
; SSE2-NEXT: psraw $8, %xmm6		; SSE2-NEXT: psraw $8, %xmm6
; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [117,117,117,117,117,117,117,117]		; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [117,117,117,117,117,117,117,117]
; SSE2-NEXT: pmullw %xmm4, %xmm6		; SSE2-NEXT: pmullw %xmm4, %xmm6
; SSE2-NEXT: movdqa {{.*#+}} xmm5 = [255,255,255,255,255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm5 = [255,255,255,255,255,255,255,255]
; SSE2-NEXT: pand %xmm5, %xmm6		; SSE2-NEXT: pand %xmm5, %xmm6
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm0		; SSE2-NEXT: psraw $8, %xmm0
; SSE2-NEXT: pmullw %xmm4, %xmm0		; SSE2-NEXT: pmullw %xmm4, %xmm0
; SSE2-NEXT: pand %xmm5, %xmm0		; SSE2-NEXT: pand %xmm5, %xmm0
; SSE2-NEXT: packuswb %xmm6, %xmm0		; SSE2-NEXT: packuswb %xmm6, %xmm0
; SSE2-NEXT: movdqa %xmm1, %xmm6		; SSE2-NEXT: movdqa %xmm1, %xmm6
; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 = xmm6[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 = xmm6[8],xmm1[8],xmm6[9],xmm1[9],xmm6[10],xmm1[10],xmm6[11],xmm1[11],xmm6[12],xmm1[12],xmm6[13],xmm1[13],xmm6[14],xmm1[14],xmm6[15],xmm1[15]
; SSE2-NEXT: psraw $8, %xmm6		; SSE2-NEXT: psraw $8, %xmm6
; SSE2-NEXT: pmullw %xmm4, %xmm6		; SSE2-NEXT: pmullw %xmm4, %xmm6
; SSE2-NEXT: pand %xmm5, %xmm6		; SSE2-NEXT: pand %xmm5, %xmm6
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: pmullw %xmm4, %xmm1		; SSE2-NEXT: pmullw %xmm4, %xmm1
; SSE2-NEXT: pand %xmm5, %xmm1		; SSE2-NEXT: pand %xmm5, %xmm1
; SSE2-NEXT: packuswb %xmm6, %xmm1		; SSE2-NEXT: packuswb %xmm6, %xmm1
; SSE2-NEXT: movdqa %xmm2, %xmm6		; SSE2-NEXT: movdqa %xmm2, %xmm6
; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 = xmm6[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 = xmm6[8],xmm2[8],xmm6[9],xmm2[9],xmm6[10],xmm2[10],xmm6[11],xmm2[11],xmm6[12],xmm2[12],xmm6[13],xmm2[13],xmm6[14],xmm2[14],xmm6[15],xmm2[15]
; SSE2-NEXT: psraw $8, %xmm6		; SSE2-NEXT: psraw $8, %xmm6
; SSE2-NEXT: pmullw %xmm4, %xmm6		; SSE2-NEXT: pmullw %xmm4, %xmm6
; SSE2-NEXT: pand %xmm5, %xmm6		; SSE2-NEXT: pand %xmm5, %xmm6
; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: pmullw %xmm4, %xmm2		; SSE2-NEXT: pmullw %xmm4, %xmm2
; SSE2-NEXT: pand %xmm5, %xmm2		; SSE2-NEXT: pand %xmm5, %xmm2
; SSE2-NEXT: packuswb %xmm6, %xmm2		; SSE2-NEXT: packuswb %xmm6, %xmm2
; SSE2-NEXT: movdqa %xmm3, %xmm6		; SSE2-NEXT: movdqa %xmm3, %xmm6
; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 = xmm6[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm6 = xmm6[8],xmm3[8],xmm6[9],xmm3[9],xmm6[10],xmm3[10],xmm6[11],xmm3[11],xmm6[12],xmm3[12],xmm6[13],xmm3[13],xmm6[14],xmm3[14],xmm6[15],xmm3[15]
; SSE2-NEXT: psraw $8, %xmm6		; SSE2-NEXT: psraw $8, %xmm6
; SSE2-NEXT: pmullw %xmm4, %xmm6		; SSE2-NEXT: pmullw %xmm4, %xmm6
; SSE2-NEXT: pand %xmm5, %xmm6		; SSE2-NEXT: pand %xmm5, %xmm6
; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm3		; SSE2-NEXT: psraw $8, %xmm3
; SSE2-NEXT: pmullw %xmm4, %xmm3		; SSE2-NEXT: pmullw %xmm4, %xmm3
; SSE2-NEXT: pand %xmm5, %xmm3		; SSE2-NEXT: pand %xmm5, %xmm3
; SSE2-NEXT: packuswb %xmm6, %xmm3		; SSE2-NEXT: packuswb %xmm6, %xmm3
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: mul_v64i8c:		; SSE41-LABEL: mul_v64i8c:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: movdqa %xmm1, %xmm4		; SSE41-NEXT: movdqa %xmm1, %xmm4
; SSE41-NEXT: movdqa %xmm0, %xmm1		; SSE41-NEXT: movdqa %xmm0, %xmm1
; SSE41-NEXT: pmovsxbw %xmm1, %xmm0		; SSE41-NEXT: pmovsxbw %xmm0, %xmm0
; SSE41-NEXT: movdqa {{.*#+}} xmm6 = [117,117,117,117,117,117,117,117]		; SSE41-NEXT: movdqa {{.*#+}} xmm6 = [117,117,117,117,117,117,117,117]
; SSE41-NEXT: pmullw %xmm6, %xmm0		; SSE41-NEXT: pmullw %xmm6, %xmm0
; SSE41-NEXT: movdqa {{.*#+}} xmm7 = [255,255,255,255,255,255,255,255]		; SSE41-NEXT: movdqa {{.*#+}} xmm7 = [255,255,255,255,255,255,255,255]
; SSE41-NEXT: pand %xmm7, %xmm0		; SSE41-NEXT: pand %xmm7, %xmm0
; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]		; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; SSE41-NEXT: pmovsxbw %xmm1, %xmm1		; SSE41-NEXT: pmovsxbw %xmm1, %xmm1
; SSE41-NEXT: pmullw %xmm6, %xmm1		; SSE41-NEXT: pmullw %xmm6, %xmm1
; SSE41-NEXT: pand %xmm7, %xmm1		; SSE41-NEXT: pand %xmm7, %xmm1
entry:
%A = mul <64 x i8> %i, < i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117 >		%A = mul <64 x i8> %i, < i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117, i8 117 >
ret <64 x i8> %A		ret <64 x i8> %A
}		}

define <64 x i8> @mul_v64i8(<64 x i8> %i, <64 x i8> %j) nounwind {		define <64 x i8> @mul_v64i8(<64 x i8> %i, <64 x i8> %j) nounwind {
; SSE2-LABEL: mul_v64i8:		; SSE2-LABEL: mul_v64i8:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm4, %xmm8		; SSE2-NEXT: movdqa %xmm4, %xmm8
; SSE2-NEXT: punpckhbw {{.*#+}} xmm8 = xmm8[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm8 = xmm8[8],xmm4[8],xmm8[9],xmm4[9],xmm8[10],xmm4[10],xmm8[11],xmm4[11],xmm8[12],xmm4[12],xmm8[13],xmm4[13],xmm8[14],xmm4[14],xmm8[15],xmm4[15]
; SSE2-NEXT: psraw $8, %xmm8		; SSE2-NEXT: psraw $8, %xmm8
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: punpckhbw {{.*#+}} xmm9 = xmm9[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm9 = xmm9[8],xmm0[8],xmm9[9],xmm0[9],xmm9[10],xmm0[10],xmm9[11],xmm0[11],xmm9[12],xmm0[12],xmm9[13],xmm0[13],xmm9[14],xmm0[14],xmm9[15],xmm0[15]
; SSE2-NEXT: psraw $8, %xmm9		; SSE2-NEXT: psraw $8, %xmm9
; SSE2-NEXT: pmullw %xmm8, %xmm9		; SSE2-NEXT: pmullw %xmm8, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm8 = [255,255,255,255,255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm8 = [255,255,255,255,255,255,255,255]
; SSE2-NEXT: pand %xmm8, %xmm9		; SSE2-NEXT: pand %xmm8, %xmm9
; SSE2-NEXT: punpcklbw {{.*#+}} xmm4 = xmm4[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm4 = xmm4[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm4		; SSE2-NEXT: psraw $8, %xmm4
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm0		; SSE2-NEXT: psraw $8, %xmm0
; SSE2-NEXT: pmullw %xmm4, %xmm0		; SSE2-NEXT: pmullw %xmm4, %xmm0
; SSE2-NEXT: pand %xmm8, %xmm0		; SSE2-NEXT: pand %xmm8, %xmm0
; SSE2-NEXT: packuswb %xmm9, %xmm0		; SSE2-NEXT: packuswb %xmm9, %xmm0
; SSE2-NEXT: movdqa %xmm5, %xmm9		; SSE2-NEXT: movdqa %xmm5, %xmm9
; SSE2-NEXT: punpckhbw {{.*#+}} xmm9 = xmm9[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm9 = xmm9[8],xmm5[8],xmm9[9],xmm5[9],xmm9[10],xmm5[10],xmm9[11],xmm5[11],xmm9[12],xmm5[12],xmm9[13],xmm5[13],xmm9[14],xmm5[14],xmm9[15],xmm5[15]
; SSE2-NEXT: psraw $8, %xmm9		; SSE2-NEXT: psraw $8, %xmm9
; SSE2-NEXT: movdqa %xmm1, %xmm4		; SSE2-NEXT: movdqa %xmm1, %xmm4
; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8],xmm1[8],xmm4[9],xmm1[9],xmm4[10],xmm1[10],xmm4[11],xmm1[11],xmm4[12],xmm1[12],xmm4[13],xmm1[13],xmm4[14],xmm1[14],xmm4[15],xmm1[15]
; SSE2-NEXT: psraw $8, %xmm4		; SSE2-NEXT: psraw $8, %xmm4
; SSE2-NEXT: pmullw %xmm9, %xmm4		; SSE2-NEXT: pmullw %xmm9, %xmm4
; SSE2-NEXT: pand %xmm8, %xmm4		; SSE2-NEXT: pand %xmm8, %xmm4
; SSE2-NEXT: punpcklbw {{.*#+}} xmm5 = xmm5[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm5 = xmm5[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm5		; SSE2-NEXT: psraw $8, %xmm5
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: pmullw %xmm5, %xmm1		; SSE2-NEXT: pmullw %xmm5, %xmm1
; SSE2-NEXT: pand %xmm8, %xmm1		; SSE2-NEXT: pand %xmm8, %xmm1
; SSE2-NEXT: packuswb %xmm4, %xmm1		; SSE2-NEXT: packuswb %xmm4, %xmm1
; SSE2-NEXT: movdqa %xmm6, %xmm4		; SSE2-NEXT: movdqa %xmm6, %xmm4
; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8],xmm6[8],xmm4[9],xmm6[9],xmm4[10],xmm6[10],xmm4[11],xmm6[11],xmm4[12],xmm6[12],xmm4[13],xmm6[13],xmm4[14],xmm6[14],xmm4[15],xmm6[15]
; SSE2-NEXT: psraw $8, %xmm4		; SSE2-NEXT: psraw $8, %xmm4
; SSE2-NEXT: movdqa %xmm2, %xmm5		; SSE2-NEXT: movdqa %xmm2, %xmm5
; SSE2-NEXT: punpckhbw {{.*#+}} xmm5 = xmm5[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm5 = xmm5[8],xmm2[8],xmm5[9],xmm2[9],xmm5[10],xmm2[10],xmm5[11],xmm2[11],xmm5[12],xmm2[12],xmm5[13],xmm2[13],xmm5[14],xmm2[14],xmm5[15],xmm2[15]
; SSE2-NEXT: psraw $8, %xmm5		; SSE2-NEXT: psraw $8, %xmm5
; SSE2-NEXT: pmullw %xmm4, %xmm5		; SSE2-NEXT: pmullw %xmm4, %xmm5
; SSE2-NEXT: pand %xmm8, %xmm5		; SSE2-NEXT: pand %xmm8, %xmm5
; SSE2-NEXT: punpcklbw {{.*#+}} xmm6 = xmm6[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm6 = xmm6[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm6		; SSE2-NEXT: psraw $8, %xmm6
; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: pmullw %xmm6, %xmm2		; SSE2-NEXT: pmullw %xmm6, %xmm2
; SSE2-NEXT: pand %xmm8, %xmm2		; SSE2-NEXT: pand %xmm8, %xmm2
; SSE2-NEXT: packuswb %xmm5, %xmm2		; SSE2-NEXT: packuswb %xmm5, %xmm2
; SSE2-NEXT: movdqa %xmm7, %xmm4		; SSE2-NEXT: movdqa %xmm7, %xmm4
; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm4 = xmm4[8],xmm7[8],xmm4[9],xmm7[9],xmm4[10],xmm7[10],xmm4[11],xmm7[11],xmm4[12],xmm7[12],xmm4[13],xmm7[13],xmm4[14],xmm7[14],xmm4[15],xmm7[15]
; SSE2-NEXT: psraw $8, %xmm4		; SSE2-NEXT: psraw $8, %xmm4
; SSE2-NEXT: movdqa %xmm3, %xmm5		; SSE2-NEXT: movdqa %xmm3, %xmm5
; SSE2-NEXT: punpckhbw {{.*#+}} xmm5 = xmm5[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm5 = xmm5[8],xmm3[8],xmm5[9],xmm3[9],xmm5[10],xmm3[10],xmm5[11],xmm3[11],xmm5[12],xmm3[12],xmm5[13],xmm3[13],xmm5[14],xmm3[14],xmm5[15],xmm3[15]
; SSE2-NEXT: psraw $8, %xmm5		; SSE2-NEXT: psraw $8, %xmm5
; SSE2-NEXT: pmullw %xmm4, %xmm5		; SSE2-NEXT: pmullw %xmm4, %xmm5
; SSE2-NEXT: pand %xmm8, %xmm5		; SSE2-NEXT: pand %xmm8, %xmm5
; SSE2-NEXT: punpcklbw {{.*#+}} xmm7 = xmm7[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm7 = xmm7[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm7		; SSE2-NEXT: psraw $8, %xmm7
; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm3		; SSE2-NEXT: psraw $8, %xmm3
; SSE2-NEXT: pmullw %xmm7, %xmm3		; SSE2-NEXT: pmullw %xmm7, %xmm3
; SSE2-NEXT: pand %xmm8, %xmm3		; SSE2-NEXT: pand %xmm8, %xmm3
; SSE2-NEXT: packuswb %xmm5, %xmm3		; SSE2-NEXT: packuswb %xmm5, %xmm3
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE41-LABEL: mul_v64i8:		; SSE41-LABEL: mul_v64i8:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: movdqa %xmm1, %xmm8		; SSE41-NEXT: movdqa %xmm1, %xmm8
; SSE41-NEXT: movdqa %xmm0, %xmm1		; SSE41-NEXT: movdqa %xmm0, %xmm1
; SSE41-NEXT: pmovsxbw %xmm4, %xmm9		; SSE41-NEXT: pmovsxbw %xmm4, %xmm9
; SSE41-NEXT: pmovsxbw %xmm1, %xmm0		; SSE41-NEXT: pmovsxbw %xmm0, %xmm0
; SSE41-NEXT: pmullw %xmm9, %xmm0		; SSE41-NEXT: pmullw %xmm9, %xmm0
; SSE41-NEXT: movdqa {{.*#+}} xmm9 = [255,255,255,255,255,255,255,255]		; SSE41-NEXT: movdqa {{.*#+}} xmm9 = [255,255,255,255,255,255,255,255]
; SSE41-NEXT: pand %xmm9, %xmm0		; SSE41-NEXT: pand %xmm9, %xmm0
; SSE41-NEXT: pshufd {{.*#+}} xmm4 = xmm4[2,3,0,1]		; SSE41-NEXT: pshufd {{.*#+}} xmm4 = xmm4[2,3,0,1]
; SSE41-NEXT: pmovsxbw %xmm4, %xmm4		; SSE41-NEXT: pmovsxbw %xmm4, %xmm4
; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]		; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; SSE41-NEXT: pmovsxbw %xmm1, %xmm1		; SSE41-NEXT: pmovsxbw %xmm1, %xmm1
; SSE41-NEXT: pmullw %xmm4, %xmm1		; SSE41-NEXT: pmullw %xmm4, %xmm1

test/CodeGen/X86/powi.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s
	; Ideally this would compile to 5 multiplies.			; Ideally this would compile to 5 multiplies.

	define double @pow_wrapper(double %a) nounwind readonly ssp noredzone {			define double @pow_wrapper(double %a) nounwind readonly ssp noredzone {
	; CHECK-LABEL: pow_wrapper:			; CHECK-LABEL: pow_wrapper:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movapd %xmm0, %xmm1			; CHECK-NEXT: movapd %xmm0, %xmm1
	; CHECK-NEXT: mulsd %xmm1, %xmm1			; CHECK-NEXT: mulsd %xmm0, %xmm1
	; CHECK-NEXT: mulsd %xmm1, %xmm0			; CHECK-NEXT: mulsd %xmm1, %xmm0
	; CHECK-NEXT: mulsd %xmm1, %xmm1			; CHECK-NEXT: mulsd %xmm1, %xmm1
	; CHECK-NEXT: mulsd %xmm1, %xmm0			; CHECK-NEXT: mulsd %xmm1, %xmm0
	; CHECK-NEXT: mulsd %xmm1, %xmm1			; CHECK-NEXT: mulsd %xmm1, %xmm1
	; CHECK-NEXT: mulsd %xmm0, %xmm1			; CHECK-NEXT: mulsd %xmm0, %xmm1
	; CHECK-NEXT: movapd %xmm1, %xmm0			; CHECK-NEXT: movapd %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%ret = tail call double @llvm.powi.f64(double %a, i32 15) nounwind ; <double> [#uses=1]			%ret = tail call double @llvm.powi.f64(double %a, i32 15) nounwind ; <double> [#uses=1]

test/CodeGen/X86/pr11334.ll

	define <3 x double> @v3f2d_ext_vec(<3 x float> %v1) nounwind {			define <3 x double> @v3f2d_ext_vec(<3 x float> %v1) nounwind {
	; SSE-LABEL: v3f2d_ext_vec:			; SSE-LABEL: v3f2d_ext_vec:
	; SSE: # BB#0: # %entry			; SSE: # BB#0: # %entry
	; SSE-NEXT: cvtps2pd %xmm0, %xmm2			; SSE-NEXT: cvtps2pd %xmm0, %xmm2
	; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE-NEXT: cvtps2pd %xmm0, %xmm0			; SSE-NEXT: cvtps2pd %xmm0, %xmm0
	; SSE-NEXT: movlps %xmm0, -{{[0-9]+}}(%rsp)			; SSE-NEXT: movlps %xmm0, -{{[0-9]+}}(%rsp)
	; SSE-NEXT: movaps %xmm2, %xmm1			; SSE-NEXT: movaps %xmm2, %xmm1
	; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm1 = xmm2[1],xmm1[1]
	; SSE-NEXT: fldl -{{[0-9]+}}(%rsp)			; SSE-NEXT: fldl -{{[0-9]+}}(%rsp)
	; SSE-NEXT: movaps %xmm2, %xmm0			; SSE-NEXT: movaps %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: v3f2d_ext_vec:			; AVX-LABEL: v3f2d_ext_vec:
	; AVX: # BB#0: # %entry			; AVX: # BB#0: # %entry
	; AVX-NEXT: vcvtps2pd %xmm0, %ymm0			; AVX-NEXT: vcvtps2pd %xmm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq

test/CodeGen/X86/pr29112.ll

	; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm2[1],xmm0[3]			; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],xmm2[1],xmm0[3]
	; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm13[0]			; CHECK-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm13[0]
	; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm8[0],xmm13[0],xmm8[2,3]			; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm8[0],xmm13[0],xmm8[2,3]
	; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[1],xmm1[3]			; CHECK-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1],xmm2[1],xmm1[3]
	; CHECK-NEXT: vinsertps {{.*#+}} xmm3 = xmm1[0,1,2],xmm3[1]			; CHECK-NEXT: vinsertps {{.*#+}} xmm3 = xmm1[0,1,2],xmm3[1]
	; CHECK-NEXT: vinsertps {{.*#+}} xmm2 = xmm9[0,1],xmm2[3],xmm9[3]			; CHECK-NEXT: vinsertps {{.*#+}} xmm2 = xmm9[0,1],xmm2[3],xmm9[3]
	; CHECK-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1,2],xmm12[0]			; CHECK-NEXT: vinsertps {{.*#+}} xmm2 = xmm2[0,1,2],xmm12[0]
	; CHECK-NEXT: vaddps %xmm3, %xmm2, %xmm2			; CHECK-NEXT: vaddps %xmm3, %xmm2, %xmm2
	; CHECK-NEXT: vmovaps %xmm15, %xmm1			; CHECK-NEXT: vmovaps %xmm15, {{[0-9]+}}(%rsp) # 16-byte Spill
	; CHECK-NEXT: vmovaps %xmm1, {{[0-9]+}}(%rsp) # 16-byte Spill			; CHECK-NEXT: vaddps %xmm0, %xmm15, %xmm9
	; CHECK-NEXT: vaddps %xmm0, %xmm1, %xmm9
	; CHECK-NEXT: vaddps %xmm14, %xmm10, %xmm0			; CHECK-NEXT: vaddps %xmm14, %xmm10, %xmm0
	; CHECK-NEXT: vaddps %xmm1, %xmm1, %xmm8			; CHECK-NEXT: vaddps %xmm15, %xmm15, %xmm8
	; CHECK-NEXT: vaddps %xmm11, %xmm3, %xmm3			; CHECK-NEXT: vaddps %xmm11, %xmm3, %xmm3
	; CHECK-NEXT: vaddps %xmm0, %xmm3, %xmm0			; CHECK-NEXT: vaddps %xmm0, %xmm3, %xmm0
	; CHECK-NEXT: vaddps %xmm0, %xmm1, %xmm0			; CHECK-NEXT: vaddps %xmm0, %xmm15, %xmm0
	; CHECK-NEXT: vmovaps %xmm8, {{[0-9]+}}(%rsp)			; CHECK-NEXT: vmovaps %xmm8, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: vmovaps %xmm9, (%rsp)			; CHECK-NEXT: vmovaps %xmm9, (%rsp)
				; CHECK-NEXT: vmovaps %xmm15, %xmm1
	; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm3 # 16-byte Reload			; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm3 # 16-byte Reload
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: callq foo			; CHECK-NEXT: callq foo
	; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm1 # 16-byte Reload			; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm1 # 16-byte Reload
	; CHECK-NEXT: vaddps {{[0-9]+}}(%rsp), %xmm1, %xmm1 # 16-byte Folded Reload			; CHECK-NEXT: vaddps {{[0-9]+}}(%rsp), %xmm1, %xmm1 # 16-byte Folded Reload
	; CHECK-NEXT: vaddps %xmm0, %xmm1, %xmm0			; CHECK-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; CHECK-NEXT: addq $88, %rsp			; CHECK-NEXT: addq $88, %rsp
	; CHECK-NEXT: retq			; CHECK-NEXT: retq

test/CodeGen/X86/psubus.ll

	; SSSE3-NEXT: packuswb %xmm2, %xmm1			; SSSE3-NEXT: packuswb %xmm2, %xmm1
	; SSSE3-NEXT: packuswb %xmm3, %xmm1			; SSSE3-NEXT: packuswb %xmm3, %xmm1
	; SSSE3-NEXT: andnpd %xmm1, %xmm0			; SSSE3-NEXT: andnpd %xmm1, %xmm0
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: test14:			; SSE41-LABEL: test14:
	; SSE41: ## BB#0: ## %vector.ph			; SSE41: ## BB#0: ## %vector.ph
	; SSE41-NEXT: movdqa %xmm0, %xmm5			; SSE41-NEXT: movdqa %xmm0, %xmm5
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm5[1,1,2,3]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm8 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm8 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm0 = xmm5[0],zero,zero,zero,xmm5[1],zero,zero,zero,xmm5[2],zero,zero,zero,xmm5[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm0 = xmm5[0],zero,zero,zero,xmm5[1],zero,zero,zero,xmm5[2],zero,zero,zero,xmm5[3],zero,zero,zero
	; SSE41-NEXT: pshufd {{.*#+}} xmm6 = xmm5[2,3,0,1]			; SSE41-NEXT: pshufd {{.*#+}} xmm6 = xmm5[2,3,0,1]
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm9 = xmm6[0],zero,zero,zero,xmm6[1],zero,zero,zero,xmm6[2],zero,zero,zero,xmm6[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm9 = xmm6[0],zero,zero,zero,xmm6[1],zero,zero,zero,xmm6[2],zero,zero,zero,xmm6[3],zero,zero,zero
	; SSE41-NEXT: pshufd {{.*#+}} xmm5 = xmm5[3,1,2,3]			; SSE41-NEXT: pshufd {{.*#+}} xmm5 = xmm5[3,1,2,3]
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm6 = xmm5[0],zero,zero,zero,xmm5[1],zero,zero,zero,xmm5[2],zero,zero,zero,xmm5[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm6 = xmm5[0],zero,zero,zero,xmm5[1],zero,zero,zero,xmm5[2],zero,zero,zero,xmm5[3],zero,zero,zero
	; SSE41-NEXT: movdqa {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]			; SSE41-NEXT: movdqa {{.*#+}} xmm5 = [2147483648,2147483648,2147483648,2147483648]
	; SSE41-NEXT: movdqa %xmm4, %xmm7			; SSE41-NEXT: movdqa %xmm4, %xmm7

test/CodeGen/X86/select.ll

	; CHECK-NEXT: ## -- End function			; CHECK-NEXT: ## -- End function
	;			;
	; MCU-LABEL: test1:			; MCU-LABEL: test1:
	; MCU: # BB#0:			; MCU: # BB#0:
	; MCU-NEXT: testb $1, %cl			; MCU-NEXT: testb $1, %cl
	; MCU-NEXT: jne .LBB0_1			; MCU-NEXT: jne .LBB0_1
	; MCU-NEXT: # BB#2:			; MCU-NEXT: # BB#2:
	; MCU-NEXT: addl $8, %edx			; MCU-NEXT: addl $8, %edx
	; MCU-NEXT: movl %edx, %eax			; MCU-NEXT: movl (%edx), %eax
	; MCU-NEXT: movl (%eax), %eax
	; MCU-NEXT: retl			; MCU-NEXT: retl
	; MCU-NEXT: .LBB0_1:			; MCU-NEXT: .LBB0_1:
	; MCU-NEXT: addl $8, %eax			; MCU-NEXT: addl $8, %eax
	; MCU-NEXT: movl (%eax), %eax			; MCU-NEXT: movl (%eax), %eax
	; MCU-NEXT: retl			; MCU-NEXT: retl
	%t0 = load %0, %0* %p			%t0 = load %0, %0* %p
	%t1 = load %0, %0* %q			%t1 = load %0, %0* %q
	%t4 = select i1 %r, %0 %t0, %0 %t1			%t4 = select i1 %r, %0 %t0, %0 %t1

test/CodeGen/X86/shrink-wrap-chkstk.ll


	false:			false:
	%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]			%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
	ret i32 %tmp.0			ret i32 %tmp.0
	}			}

	; CHECK-LABEL: @use_eax_before_prologue@8: # @use_eax_before_prologue			; CHECK-LABEL: @use_eax_before_prologue@8: # @use_eax_before_prologue
	; CHECK: movl %ecx, %eax			; CHECK: movl %ecx, %eax
	; CHECK: cmpl %edx, %eax			; CHECK: cmpl %edx, %ecx
	; CHECK: jge LBB1_2			; CHECK: jge LBB1_2
	; CHECK: pushl %eax			; CHECK: pushl %eax
	; CHECK: movl $4092, %eax			; CHECK: movl $4092, %eax
	; CHECK: calll __chkstk			; CHECK: calll __chkstk
	; CHECK: movl 4092(%esp), %eax			; CHECK: movl 4092(%esp), %eax
	; CHECK: calll _doSomething			; CHECK: calll _doSomething
	; CHECK: LBB1_2:			; CHECK: LBB1_2:
	; CHECK: retl			; CHECK: retl

test/CodeGen/X86/sqrt-fastmath.ll

; AVX-NEXT: retq
ret float %div		ret float %div
}		}

define float @f32_estimate(float %x) #1 {		define float @f32_estimate(float %x) #1 {
; SSE-LABEL: f32_estimate:		; SSE-LABEL: f32_estimate:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: rsqrtss %xmm0, %xmm1		; SSE-NEXT: rsqrtss %xmm0, %xmm1
; SSE-NEXT: movaps %xmm1, %xmm2		; SSE-NEXT: movaps %xmm1, %xmm2
; SSE-NEXT: mulss %xmm2, %xmm2		; SSE-NEXT: mulss %xmm1, %xmm2
; SSE-NEXT: mulss %xmm0, %xmm2		; SSE-NEXT: mulss %xmm0, %xmm2
; SSE-NEXT: addss {{.*}}(%rip), %xmm2		; SSE-NEXT: addss {{.*}}(%rip), %xmm2
; SSE-NEXT: mulss {{.*}}(%rip), %xmm1		; SSE-NEXT: mulss {{.*}}(%rip), %xmm1
; SSE-NEXT: mulss %xmm2, %xmm1		; SSE-NEXT: mulss %xmm2, %xmm1
; SSE-NEXT: movaps %xmm1, %xmm0		; SSE-NEXT: movaps %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: f32_estimate:		; AVX-LABEL: f32_estimate:
; AVX-NEXT: retq
ret <4 x float> %div		ret <4 x float> %div
}		}

define <4 x float> @v4f32_estimate(<4 x float> %x) #1 {		define <4 x float> @v4f32_estimate(<4 x float> %x) #1 {
; SSE-LABEL: v4f32_estimate:		; SSE-LABEL: v4f32_estimate:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: rsqrtps %xmm0, %xmm1		; SSE-NEXT: rsqrtps %xmm0, %xmm1
; SSE-NEXT: movaps %xmm1, %xmm2		; SSE-NEXT: movaps %xmm1, %xmm2
; SSE-NEXT: mulps %xmm2, %xmm2		; SSE-NEXT: mulps %xmm1, %xmm2
; SSE-NEXT: mulps %xmm0, %xmm2		; SSE-NEXT: mulps %xmm0, %xmm2
; SSE-NEXT: addps {{.*}}(%rip), %xmm2		; SSE-NEXT: addps {{.*}}(%rip), %xmm2
; SSE-NEXT: mulps {{.*}}(%rip), %xmm1		; SSE-NEXT: mulps {{.*}}(%rip), %xmm1
; SSE-NEXT: mulps %xmm2, %xmm1		; SSE-NEXT: mulps %xmm2, %xmm1
; SSE-NEXT: movaps %xmm1, %xmm0		; SSE-NEXT: movaps %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: v4f32_estimate:		; AVX-LABEL: v4f32_estimate:
}		}

define <8 x float> @v8f32_estimate(<8 x float> %x) #1 {		define <8 x float> @v8f32_estimate(<8 x float> %x) #1 {
; SSE-LABEL: v8f32_estimate:		; SSE-LABEL: v8f32_estimate:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: rsqrtps %xmm0, %xmm3		; SSE-NEXT: rsqrtps %xmm0, %xmm3
; SSE-NEXT: movaps {{.*#+}} xmm4 = [-5.000000e-01,-5.000000e-01,-5.000000e-01,-5.000000e-01]		; SSE-NEXT: movaps {{.*#+}} xmm4 = [-5.000000e-01,-5.000000e-01,-5.000000e-01,-5.000000e-01]
; SSE-NEXT: movaps %xmm3, %xmm2		; SSE-NEXT: movaps %xmm3, %xmm2
; SSE-NEXT: mulps %xmm2, %xmm2		; SSE-NEXT: mulps %xmm3, %xmm2
; SSE-NEXT: mulps %xmm0, %xmm2		; SSE-NEXT: mulps %xmm0, %xmm2
; SSE-NEXT: movaps {{.*#+}} xmm0 = [-3.000000e+00,-3.000000e+00,-3.000000e+00,-3.000000e+00]		; SSE-NEXT: movaps {{.*#+}} xmm0 = [-3.000000e+00,-3.000000e+00,-3.000000e+00,-3.000000e+00]
; SSE-NEXT: addps %xmm0, %xmm2		; SSE-NEXT: addps %xmm0, %xmm2
; SSE-NEXT: mulps %xmm4, %xmm2		; SSE-NEXT: mulps %xmm4, %xmm2
; SSE-NEXT: mulps %xmm3, %xmm2		; SSE-NEXT: mulps %xmm3, %xmm2
; SSE-NEXT: rsqrtps %xmm1, %xmm5		; SSE-NEXT: rsqrtps %xmm1, %xmm5
; SSE-NEXT: movaps %xmm5, %xmm3		; SSE-NEXT: movaps %xmm5, %xmm3
; SSE-NEXT: mulps %xmm3, %xmm3		; SSE-NEXT: mulps %xmm5, %xmm3
; SSE-NEXT: mulps %xmm1, %xmm3		; SSE-NEXT: mulps %xmm1, %xmm3
; SSE-NEXT: addps %xmm0, %xmm3		; SSE-NEXT: addps %xmm0, %xmm3
; SSE-NEXT: mulps %xmm4, %xmm3		; SSE-NEXT: mulps %xmm4, %xmm3
; SSE-NEXT: mulps %xmm5, %xmm3		; SSE-NEXT: mulps %xmm5, %xmm3
; SSE-NEXT: movaps %xmm2, %xmm0		; SSE-NEXT: movaps %xmm2, %xmm0
; SSE-NEXT: movaps %xmm3, %xmm1		; SSE-NEXT: movaps %xmm3, %xmm1
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;

test/CodeGen/X86/sse-scalar-fp-arith.ll

	}			}

	define <4 x float> @add_ss_mask(<4 x float> %a, <4 x float> %b, <4 x float> %c, i8 %mask) {			define <4 x float> @add_ss_mask(<4 x float> %a, <4 x float> %b, <4 x float> %c, i8 %mask) {
	; SSE2-LABEL: add_ss_mask:			; SSE2-LABEL: add_ss_mask:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: testb $1, %dil			; SSE2-NEXT: testb $1, %dil
	; SSE2-NEXT: jne .LBB62_1			; SSE2-NEXT: jne .LBB62_1
	; SSE2-NEXT: # BB#2:			; SSE2-NEXT: # BB#2:
	; SSE2-NEXT: movaps %xmm2, %xmm1			; SSE2-NEXT: movss {{.*#+}} xmm0 = xmm2[0],xmm0[1,2,3]
	; SSE2-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	; SSE2-NEXT: .LBB62_1:			; SSE2-NEXT: .LBB62_1:
	; SSE2-NEXT: addss %xmm0, %xmm1			; SSE2-NEXT: addss %xmm0, %xmm1
	; SSE2-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]			; SSE2-NEXT: movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: add_ss_mask:			; SSE41-LABEL: add_ss_mask:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: testb $1, %dil			; SSE41-NEXT: testb $1, %dil
	; SSE41-NEXT: jne .LBB62_1			; SSE41-NEXT: jne .LBB62_1
	; SSE41-NEXT: # BB#2:			; SSE41-NEXT: # BB#2:
	; SSE41-NEXT: movaps %xmm2, %xmm1			; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm2[0],xmm0[1,2,3]
	; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	; SSE41-NEXT: .LBB62_1:			; SSE41-NEXT: .LBB62_1:
	; SSE41-NEXT: addss %xmm0, %xmm1			; SSE41-NEXT: addss %xmm0, %xmm1
	; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]			; SSE41-NEXT: blendps {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: add_ss_mask:			; AVX1-LABEL: add_ss_mask:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	}			}

	define <2 x double> @add_sd_mask(<2 x double> %a, <2 x double> %b, <2 x double> %c, i8 %mask) {			define <2 x double> @add_sd_mask(<2 x double> %a, <2 x double> %b, <2 x double> %c, i8 %mask) {
	; SSE2-LABEL: add_sd_mask:			; SSE2-LABEL: add_sd_mask:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: testb $1, %dil			; SSE2-NEXT: testb $1, %dil
	; SSE2-NEXT: jne .LBB63_1			; SSE2-NEXT: jne .LBB63_1
	; SSE2-NEXT: # BB#2:			; SSE2-NEXT: # BB#2:
	; SSE2-NEXT: movapd %xmm2, %xmm1			; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm2[0],xmm0[1]
	; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	; SSE2-NEXT: .LBB63_1:			; SSE2-NEXT: .LBB63_1:
	; SSE2-NEXT: addsd %xmm0, %xmm1			; SSE2-NEXT: addsd %xmm0, %xmm1
	; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]			; SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: add_sd_mask:			; SSE41-LABEL: add_sd_mask:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: testb $1, %dil			; SSE41-NEXT: testb $1, %dil
	; SSE41-NEXT: jne .LBB63_1			; SSE41-NEXT: jne .LBB63_1
	; SSE41-NEXT: # BB#2:			; SSE41-NEXT: # BB#2:
	; SSE41-NEXT: movapd %xmm2, %xmm1			; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm2[0],xmm0[1]
	; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	; SSE41-NEXT: .LBB63_1:			; SSE41-NEXT: .LBB63_1:
	; SSE41-NEXT: addsd %xmm0, %xmm1			; SSE41-NEXT: addsd %xmm0, %xmm1
	; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]			; SSE41-NEXT: blendpd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: add_sd_mask:			; AVX1-LABEL: add_sd_mask:
	; AVX1: # BB#0:			; AVX1: # BB#0:

test/CodeGen/X86/sse1.ll


	; This should not emit shuffles to populate the top 2 elements of the 4-element			; This should not emit shuffles to populate the top 2 elements of the 4-element
	; vector that this ends up returning.			; vector that this ends up returning.
	; rdar://8368414			; rdar://8368414
	define <2 x float> @test4(<2 x float> %A, <2 x float> %B) nounwind {			define <2 x float> @test4(<2 x float> %A, <2 x float> %B) nounwind {
	; X32-LABEL: test4:			; X32-LABEL: test4:
	; X32: # BB#0: # %entry			; X32: # BB#0: # %entry
	; X32-NEXT: movaps %xmm0, %xmm2			; X32-NEXT: movaps %xmm0, %xmm2
	; X32-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,1,2,3]			; X32-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,1],xmm0[2,3]
	; X32-NEXT: addss %xmm1, %xmm0			; X32-NEXT: addss %xmm1, %xmm0
	; X32-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,2,3]			; X32-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,2,3]
	; X32-NEXT: subss %xmm1, %xmm2			; X32-NEXT: subss %xmm1, %xmm2
	; X32-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]			; X32-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test4:			; X64-LABEL: test4:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movaps %xmm0, %xmm2			; X64-NEXT: movaps %xmm0, %xmm2
	; X64-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,1,2,3]			; X64-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,1],xmm0[2,3]
	; X64-NEXT: addss %xmm1, %xmm0			; X64-NEXT: addss %xmm1, %xmm0
	; X64-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,2,3]			; X64-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,2,3]
	; X64-NEXT: subss %xmm1, %xmm2			; X64-NEXT: subss %xmm1, %xmm2
	; X64-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]			; X64-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%tmp7 = extractelement <2 x float> %A, i32 0			%tmp7 = extractelement <2 x float> %A, i32 0
	%tmp5 = extractelement <2 x float> %A, i32 1			%tmp5 = extractelement <2 x float> %A, i32 1

test/CodeGen/X86/sse3-avx-addsub-2.ll

	}			}

	define <4 x float> @test16(<4 x float> %A, <4 x float> %B) {			define <4 x float> @test16(<4 x float> %A, <4 x float> %B) {
	; SSE-LABEL: test16:			; SSE-LABEL: test16:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movaps %xmm0, %xmm2			; SSE-NEXT: movaps %xmm0, %xmm2
	; SSE-NEXT: subss %xmm0, %xmm2			; SSE-NEXT: subss %xmm0, %xmm2
	; SSE-NEXT: movaps %xmm0, %xmm3			; SSE-NEXT: movaps %xmm0, %xmm3
	; SSE-NEXT: movhlps {{.*#+}} xmm3 = xmm3[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm3 = xmm0[1],xmm3[1]
	; SSE-NEXT: movaps %xmm1, %xmm4			; SSE-NEXT: movaps %xmm1, %xmm4
	; SSE-NEXT: movhlps {{.*#+}} xmm4 = xmm4[1,1]			; SSE-NEXT: movhlps {{.*#+}} xmm4 = xmm1[1],xmm4[1]
	; SSE-NEXT: subss %xmm4, %xmm3			; SSE-NEXT: subss %xmm4, %xmm3
	; SSE-NEXT: movshdup {{.*#+}} xmm4 = xmm0[1,1,3,3]			; SSE-NEXT: movshdup {{.*#+}} xmm4 = xmm0[1,1,3,3]
	; SSE-NEXT: addss %xmm0, %xmm4			; SSE-NEXT: addss %xmm0, %xmm4
	; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]			; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]
	; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1,2,3]			; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1,2,3]
	; SSE-NEXT: addss %xmm0, %xmm1			; SSE-NEXT: addss %xmm0, %xmm1
	; SSE-NEXT: unpcklps {{.*#+}} xmm3 = xmm3[0],xmm1[0],xmm3[1],xmm1[1]			; SSE-NEXT: unpcklps {{.*#+}} xmm3 = xmm3[0],xmm1[0],xmm3[1],xmm1[1]
	; SSE-NEXT: unpcklps {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]			; SSE-NEXT: unpcklps {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]

test/CodeGen/X86/statepoint-live-in.ll

	; CHECK-NEXT: Lcfi9:			; CHECK-NEXT: Lcfi9:
	; CHECK-NEXT: .cfi_def_cfa_offset 16			; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: subq $16, %rsp			; CHECK-NEXT: subq $16, %rsp
	; CHECK-NEXT: Lcfi10:			; CHECK-NEXT: Lcfi10:
	; CHECK-NEXT: .cfi_def_cfa_offset 32			; CHECK-NEXT: .cfi_def_cfa_offset 32
	; CHECK-NEXT: Lcfi11:			; CHECK-NEXT: Lcfi11:
	; CHECK-NEXT: .cfi_offset %rbx, -16			; CHECK-NEXT: .cfi_offset %rbx, -16
	; CHECK-NEXT: movl %edi, %ebx			; CHECK-NEXT: movl %edi, %ebx
	; CHECK-NEXT: movl %ebx, {{[0-9]+}}(%rsp)			; CHECK-NEXT: movl %edi, {{[0-9]+}}(%rsp)
	; CHECK-NEXT: callq _baz			; CHECK-NEXT: callq _baz
	; CHECK-NEXT: Ltmp6:			; CHECK-NEXT: Ltmp6:
	; CHECK-NEXT: callq _bar			; CHECK-NEXT: callq _bar
	; CHECK-NEXT: Ltmp7:			; CHECK-NEXT: Ltmp7:
	; CHECK-NEXT: addq $16, %rsp			; CHECK-NEXT: addq $16, %rsp
	; CHECK-NEXT: popq %rbx			; CHECK-NEXT: popq %rbx
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	; CHECK-NEXT: .short 5			; CHECK-NEXT: .short 5
	; CHECK-NEXT: .short 0			; CHECK-NEXT: .short 0
	; CHECK-NEXT: .long 0			; CHECK-NEXT: .long 0

	; CHECK: Ltmp1-_test2			; CHECK: Ltmp1-_test2
	; CHECK: .byte 1			; CHECK: .byte 1
	; CHECK-NEXT: .byte 0			; CHECK-NEXT: .byte 0
	; CHECK-NEXT: .short 4			; CHECK-NEXT: .short 4
	; CHECK-NEXT: .short 6			; CHECK-NEXT: .short 5
	; CHECK-NEXT: .short 0			; CHECK-NEXT: .short 0
	; CHECK-NEXT: .long 0			; CHECK-NEXT: .long 0
	; CHECK: .byte 1			; CHECK: .byte 1
	; CHECK-NEXT: .byte 0			; CHECK-NEXT: .byte 0
	; CHECK-NEXT: .short 4			; CHECK-NEXT: .short 4
	; CHECK-NEXT: .short 3			; CHECK-NEXT: .short 4
	; CHECK-NEXT: .short 0			; CHECK-NEXT: .short 0
	; CHECK-NEXT: .long 0			; CHECK-NEXT: .long 0
	; CHECK: Ltmp2-_test2			; CHECK: Ltmp2-_test2
	; CHECK: .byte 1			; CHECK: .byte 1
	; CHECK-NEXT: .byte 0			; CHECK-NEXT: .byte 0
	; CHECK-NEXT: .short 4			; CHECK-NEXT: .short 4
	; CHECK-NEXT: .short 3			; CHECK-NEXT: .short 3
	; CHECK-NEXT: .short 0			; CHECK-NEXT: .short 0

test/CodeGen/X86/statepoint-stack-usage.ll


	; Check that we reuse the same stack slot across multiple calls. The use of			; Check that we reuse the same stack slot across multiple calls. The use of
	; more than two calls here is critical. We've had a bug which allowed reuse			; more than two calls here is critical. We've had a bug which allowed reuse
	; exactly once which went undetected for a long time.			; exactly once which went undetected for a long time.
	define i32 @back_to_back_deopt(i32 %a, i32 %b, i32 %c) #1			define i32 @back_to_back_deopt(i32 %a, i32 %b, i32 %c) #1
	gc "statepoint-example" {			gc "statepoint-example" {
	; CHECK-LABEL: back_to_back_deopt			; CHECK-LABEL: back_to_back_deopt
	; The exact stores don't matter, but there need to be three stack slots created			; The exact stores don't matter, but there need to be three stack slots created
	; CHECK: movl %ebx, 12(%rsp)			; CHECK: movl %edi, 12(%rsp)
	; CHECK: movl %ebp, 8(%rsp)			; CHECK: movl %esi, 8(%rsp)
	; CHECK: movl %r14d, 4(%rsp)			; CHECK: movl %edx, 4(%rsp)
	; CHECK: callq			; CHECK: callq
	; CHECK: movl %ebx, 12(%rsp)			; CHECK: movl %ebx, 12(%rsp)
	; CHECK: movl %ebp, 8(%rsp)			; CHECK: movl %ebp, 8(%rsp)
	; CHECK: movl %r14d, 4(%rsp)			; CHECK: movl %r14d, 4(%rsp)
	; CHECK: callq			; CHECK: callq
	; CHECK: movl %ebx, 12(%rsp)			; CHECK: movl %ebx, 12(%rsp)
	; CHECK: movl %ebp, 8(%rsp)			; CHECK: movl %ebp, 8(%rsp)
	; CHECK: movl %r14d, 4(%rsp)			; CHECK: movl %r14d, 4(%rsp)

test/CodeGen/X86/vec_fp_to_int.ll

}		}

define <4 x i64> @fptosi_4f32_to_4i64(<8 x float> %a) {		define <4 x i64> @fptosi_4f32_to_4i64(<8 x float> %a) {
; SSE-LABEL: fptosi_4f32_to_4i64:		; SSE-LABEL: fptosi_4f32_to_4i64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: movq %rax, %xmm2		; SSE-NEXT: movq %rax, %xmm2
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1],xmm0[2,3]
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movq %rax, %xmm1		; SSE-NEXT: movq %rax, %xmm1
; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm1[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm1[0]
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1],xmm0[2,3]
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movq %rax, %xmm3		; SSE-NEXT: movq %rax, %xmm3
; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: movq %rax, %xmm1		; SSE-NEXT: movq %rax, %xmm1
; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]
; SSE-NEXT: movdqa %xmm2, %xmm0		; SSE-NEXT: movdqa %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
}		}

define <4 x i64> @fptosi_8f32_to_4i64(<8 x float> %a) {		define <4 x i64> @fptosi_8f32_to_4i64(<8 x float> %a) {
; SSE-LABEL: fptosi_8f32_to_4i64:		; SSE-LABEL: fptosi_8f32_to_4i64:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: movq %rax, %xmm2		; SSE-NEXT: movq %rax, %xmm2
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1],xmm0[2,3]
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movq %rax, %xmm1		; SSE-NEXT: movq %rax, %xmm1
; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm1[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm1[0]
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1],xmm0[2,3]
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movq %rax, %xmm3		; SSE-NEXT: movq %rax, %xmm3
; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: movq %rax, %xmm1		; SSE-NEXT: movq %rax, %xmm1
; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]
; SSE-NEXT: movdqa %xmm2, %xmm0		; SSE-NEXT: movdqa %xmm2, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
; AVX512VLDQ-NEXT: retq
%cvt = fptoui <2 x float> %a to <2 x i32>		%cvt = fptoui <2 x float> %a to <2 x i32>
ret <2 x i32> %cvt		ret <2 x i32> %cvt
}		}

define <4 x i32> @fptoui_4f32_to_4i32(<4 x float> %a) {		define <4 x i32> @fptoui_4f32_to_4i32(<4 x float> %a) {
; SSE-LABEL: fptoui_4f32_to_4i32:		; SSE-LABEL: fptoui_4f32_to_4i32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[3,1],xmm0[2,3]
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movd %eax, %xmm1		; SSE-NEXT: movd %eax, %xmm1
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm2[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm2 = xmm0[1],xmm2[1]
; SSE-NEXT: cvttss2si %xmm2, %rax		; SSE-NEXT: cvttss2si %xmm2, %rax
; SSE-NEXT: movd %eax, %xmm2		; SSE-NEXT: movd %eax, %xmm2
; SSE-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: movd %eax, %xmm1		; SSE-NEXT: movd %eax, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,2,3]
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: movd %eax, %xmm0
define <8 x i32> @fptoui_8f32_to_8i32(<8 x float> %a) {		define <8 x i32> @fptoui_8f32_to_8i32(<8 x float> %a) {
; SSE-LABEL: fptoui_8f32_to_8i32:		; SSE-LABEL: fptoui_8f32_to_8i32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movaps %xmm0, %xmm2		; SSE-NEXT: movaps %xmm0, %xmm2
; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]
; SSE-NEXT: cvttss2si %xmm0, %rax		; SSE-NEXT: cvttss2si %xmm0, %rax
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: movd %eax, %xmm0
; SSE-NEXT: movaps %xmm2, %xmm3		; SSE-NEXT: movaps %xmm2, %xmm3
; SSE-NEXT: movhlps {{.*#+}} xmm3 = xmm3[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm3 = xmm2[1],xmm3[1]
; SSE-NEXT: cvttss2si %xmm3, %rax		; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: movd %eax, %xmm3		; SSE-NEXT: movd %eax, %xmm3
; SSE-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm0[0],xmm3[1],xmm0[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm0[0],xmm3[1],xmm0[1]
; SSE-NEXT: cvttss2si %xmm2, %rax		; SSE-NEXT: cvttss2si %xmm2, %rax
; SSE-NEXT: movd %eax, %xmm0		; SSE-NEXT: movd %eax, %xmm0
; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,1,2,3]
; SSE-NEXT: cvttss2si %xmm2, %rax		; SSE-NEXT: cvttss2si %xmm2, %rax
; SSE-NEXT: movd %eax, %xmm2		; SSE-NEXT: movd %eax, %xmm2
; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm3[0]
; SSE-NEXT: movaps %xmm1, %xmm2		; SSE-NEXT: movaps %xmm1, %xmm2
; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm2 = xmm2[3,1],xmm1[2,3]
; SSE-NEXT: cvttss2si %xmm2, %rax		; SSE-NEXT: cvttss2si %xmm2, %rax
; SSE-NEXT: movd %eax, %xmm2		; SSE-NEXT: movd %eax, %xmm2
; SSE-NEXT: movaps %xmm1, %xmm3		; SSE-NEXT: movaps %xmm1, %xmm3
; SSE-NEXT: movhlps {{.*#+}} xmm3 = xmm3[1,1]		; SSE-NEXT: movhlps {{.*#+}} xmm3 = xmm1[1],xmm3[1]
; SSE-NEXT: cvttss2si %xmm3, %rax		; SSE-NEXT: cvttss2si %xmm3, %rax
; SSE-NEXT: movd %eax, %xmm3		; SSE-NEXT: movd %eax, %xmm3
; SSE-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]		; SSE-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movd %eax, %xmm2		; SSE-NEXT: movd %eax, %xmm2
; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,1,2,3]
; SSE-NEXT: cvttss2si %xmm1, %rax		; SSE-NEXT: cvttss2si %xmm1, %rax
; SSE-NEXT: movd %eax, %xmm1		; SSE-NEXT: movd %eax, %xmm1
; SSE-NEXT: cvttss2si %xmm2, %rcx		; SSE-NEXT: cvttss2si %xmm2, %rcx
; SSE-NEXT: movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000		; SSE-NEXT: movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttss2si %xmm0, %rdx		; SSE-NEXT: cvttss2si %xmm0, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm0		; SSE-NEXT: ucomiss %xmm1, %xmm0
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: cmovaeq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm2		; SSE-NEXT: movq %rdx, %xmm2
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[1,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[1,1],xmm0[2,3]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: movaps %xmm3, %xmm4
; SSE-NEXT: subss %xmm1, %xmm4		; SSE-NEXT: subss %xmm1, %xmm4
; SSE-NEXT: cvttss2si %xmm4, %rcx		; SSE-NEXT: cvttss2si %xmm4, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttss2si %xmm3, %rdx		; SSE-NEXT: cvttss2si %xmm3, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm3		; SSE-NEXT: ucomiss %xmm1, %xmm3
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: cmovaeq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm3		; SSE-NEXT: movq %rdx, %xmm3
; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1],xmm0[2,3]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: movaps %xmm3, %xmm4
; SSE-NEXT: subss %xmm1, %xmm4		; SSE-NEXT: subss %xmm1, %xmm4
; SSE-NEXT: cvttss2si %xmm4, %rcx		; SSE-NEXT: cvttss2si %xmm4, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttss2si %xmm3, %rdx		; SSE-NEXT: cvttss2si %xmm3, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm3		; SSE-NEXT: ucomiss %xmm1, %xmm3
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: cmovaeq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm3		; SSE-NEXT: movq %rdx, %xmm3
; SSE-NEXT: cvttss2si %xmm2, %rcx		; SSE-NEXT: cvttss2si %xmm2, %rcx
; SSE-NEXT: movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000		; SSE-NEXT: movabsq $-9223372036854775808, %rax # imm = 0x8000000000000000
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttss2si %xmm0, %rdx		; SSE-NEXT: cvttss2si %xmm0, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm0		; SSE-NEXT: ucomiss %xmm1, %xmm0
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: cmovaeq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm2		; SSE-NEXT: movq %rdx, %xmm2
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[1,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[1,1],xmm0[2,3]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: movaps %xmm3, %xmm4
; SSE-NEXT: subss %xmm1, %xmm4		; SSE-NEXT: subss %xmm1, %xmm4
; SSE-NEXT: cvttss2si %xmm4, %rcx		; SSE-NEXT: cvttss2si %xmm4, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttss2si %xmm3, %rdx		; SSE-NEXT: cvttss2si %xmm3, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm3		; SSE-NEXT: ucomiss %xmm1, %xmm3
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: cmovaeq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm3		; SSE-NEXT: movq %rdx, %xmm3
; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]		; SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
; SSE-NEXT: movaps %xmm0, %xmm3		; SSE-NEXT: movaps %xmm0, %xmm3
; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1,2,3]		; SSE-NEXT: shufps {{.*#+}} xmm3 = xmm3[3,1],xmm0[2,3]
; SSE-NEXT: movaps %xmm3, %xmm4		; SSE-NEXT: movaps %xmm3, %xmm4
; SSE-NEXT: subss %xmm1, %xmm4		; SSE-NEXT: subss %xmm1, %xmm4
; SSE-NEXT: cvttss2si %xmm4, %rcx		; SSE-NEXT: cvttss2si %xmm4, %rcx
; SSE-NEXT: xorq %rax, %rcx		; SSE-NEXT: xorq %rax, %rcx
; SSE-NEXT: cvttss2si %xmm3, %rdx		; SSE-NEXT: cvttss2si %xmm3, %rdx
; SSE-NEXT: ucomiss %xmm1, %xmm3		; SSE-NEXT: ucomiss %xmm1, %xmm3
; SSE-NEXT: cmovaeq %rcx, %rdx		; SSE-NEXT: cmovaeq %rcx, %rdx
; SSE-NEXT: movq %rdx, %xmm3		; SSE-NEXT: movq %rdx, %xmm3

test/CodeGen/X86/vec_int_to_fp.ll

;		;
; Unsigned Integer to Float		; Unsigned Integer to Float
;		;

define <4 x float> @uitofp_2i64_to_4f32(<2 x i64> %a) {		define <4 x float> @uitofp_2i64_to_4f32(<2 x i64> %a) {
; SSE-LABEL: uitofp_2i64_to_4f32:		; SSE-LABEL: uitofp_2i64_to_4f32:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movdqa %xmm0, %xmm1		; SSE-NEXT: movdqa %xmm0, %xmm1
; SSE-NEXT: movq %xmm1, %rax		; SSE-NEXT: movq %xmm0, %rax
; SSE-NEXT: testq %rax, %rax		; SSE-NEXT: testq %rax, %rax
; SSE-NEXT: js .LBB39_1		; SSE-NEXT: js .LBB39_1
; SSE-NEXT: # BB#2:		; SSE-NEXT: # BB#2:
; SSE-NEXT: xorps %xmm0, %xmm0		; SSE-NEXT: xorps %xmm0, %xmm0
; SSE-NEXT: cvtsi2ssq %rax, %xmm0		; SSE-NEXT: cvtsi2ssq %rax, %xmm0
; SSE-NEXT: jmp .LBB39_3		; SSE-NEXT: jmp .LBB39_3
; SSE-NEXT: .LBB39_1:		; SSE-NEXT: .LBB39_1:
; SSE-NEXT: movq %rax, %rcx		; SSE-NEXT: movq %rax, %rcx
; AVX512VLDQ-NEXT: retq
%ext = shufflevector <2 x float> %cvt, <2 x float> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		%ext = shufflevector <2 x float> %cvt, <2 x float> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x float> %ext		ret <4 x float> %ext
}		}

define <4 x float> @uitofp_4i64_to_4f32_undef(<2 x i64> %a) {		define <4 x float> @uitofp_4i64_to_4f32_undef(<2 x i64> %a) {
; SSE-LABEL: uitofp_4i64_to_4f32_undef:		; SSE-LABEL: uitofp_4i64_to_4f32_undef:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: movdqa %xmm0, %xmm1		; SSE-NEXT: movdqa %xmm0, %xmm1
; SSE-NEXT: movq %xmm1, %rax		; SSE-NEXT: movq %xmm0, %rax
; SSE-NEXT: testq %rax, %rax		; SSE-NEXT: testq %rax, %rax
; SSE-NEXT: js .LBB41_1		; SSE-NEXT: js .LBB41_1
; SSE-NEXT: # BB#2:		; SSE-NEXT: # BB#2:
; SSE-NEXT: xorps %xmm0, %xmm0		; SSE-NEXT: xorps %xmm0, %xmm0
; SSE-NEXT: cvtsi2ssq %rax, %xmm0		; SSE-NEXT: cvtsi2ssq %rax, %xmm0
; SSE-NEXT: jmp .LBB41_3		; SSE-NEXT: jmp .LBB41_3
; SSE-NEXT: .LBB41_1:		; SSE-NEXT: .LBB41_1:
; SSE-NEXT: movq %rax, %rcx		; SSE-NEXT: movq %rax, %rcx

test/CodeGen/X86/vec_minmax_sint.ll

	; SSE41-NEXT: blendvpd %xmm0, %xmm2, %xmm1			; SSE41-NEXT: blendvpd %xmm0, %xmm2, %xmm1
	; SSE41-NEXT: movapd %xmm1, %xmm0			; SSE41-NEXT: movapd %xmm1, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; SSE42-LABEL: max_ge_v2i64:			; SSE42-LABEL: max_ge_v2i64:
	; SSE42: # BB#0:			; SSE42: # BB#0:
	; SSE42-NEXT: movdqa %xmm0, %xmm2			; SSE42-NEXT: movdqa %xmm0, %xmm2
	; SSE42-NEXT: movdqa %xmm1, %xmm3			; SSE42-NEXT: movdqa %xmm1, %xmm3
	; SSE42-NEXT: pcmpgtq %xmm2, %xmm3			; SSE42-NEXT: pcmpgtq %xmm0, %xmm3
	; SSE42-NEXT: pcmpeqd %xmm0, %xmm0			; SSE42-NEXT: pcmpeqd %xmm0, %xmm0
	; SSE42-NEXT: pxor %xmm3, %xmm0			; SSE42-NEXT: pxor %xmm3, %xmm0
	; SSE42-NEXT: blendvpd %xmm0, %xmm2, %xmm1			; SSE42-NEXT: blendvpd %xmm0, %xmm2, %xmm1
	; SSE42-NEXT: movapd %xmm1, %xmm0			; SSE42-NEXT: movapd %xmm1, %xmm0
	; SSE42-NEXT: retq			; SSE42-NEXT: retq
	;			;
	; AVX-LABEL: max_ge_v2i64:			; AVX-LABEL: max_ge_v2i64:
	; AVX: # BB#0:			; AVX: # BB#0:

test/CodeGen/X86/vec_shift4.ll

; CHECK: pmulld
ret <2 x i64> %tmp2		ret <2 x i64> %tmp2
}		}

define <2 x i64> @shl2(<16 x i8> %r, <16 x i8> %a) nounwind readnone ssp {		define <2 x i64> @shl2(<16 x i8> %r, <16 x i8> %a) nounwind readnone ssp {
; X32-LABEL: shl2:		; X32-LABEL: shl2:
; X32: # BB#0: # %entry		; X32: # BB#0: # %entry
; X32-NEXT: movdqa %xmm0, %xmm2		; X32-NEXT: movdqa %xmm0, %xmm2
; X32-NEXT: psllw $5, %xmm1		; X32-NEXT: psllw $5, %xmm1
; X32-NEXT: movdqa %xmm2, %xmm3		; X32-NEXT: movdqa %xmm0, %xmm3
; X32-NEXT: psllw $4, %xmm3		; X32-NEXT: psllw $4, %xmm3
; X32-NEXT: pand {{\.LCPI.*}}, %xmm3		; X32-NEXT: pand {{\.LCPI.*}}, %xmm3
; X32-NEXT: movdqa %xmm1, %xmm0		; X32-NEXT: movdqa %xmm1, %xmm0
; X32-NEXT: pblendvb %xmm0, %xmm3, %xmm2		; X32-NEXT: pblendvb %xmm0, %xmm3, %xmm2
; X32-NEXT: movdqa %xmm2, %xmm3		; X32-NEXT: movdqa %xmm2, %xmm3
; X32-NEXT: psllw $2, %xmm3		; X32-NEXT: psllw $2, %xmm3
; X32-NEXT: pand {{\.LCPI.*}}, %xmm3		; X32-NEXT: pand {{\.LCPI.*}}, %xmm3
; X32-NEXT: paddb %xmm1, %xmm1		; X32-NEXT: paddb %xmm1, %xmm1
; X32-NEXT: movdqa %xmm1, %xmm0		; X32-NEXT: movdqa %xmm1, %xmm0
; X32-NEXT: pblendvb %xmm0, %xmm3, %xmm2		; X32-NEXT: pblendvb %xmm0, %xmm3, %xmm2
; X32-NEXT: movdqa %xmm2, %xmm3		; X32-NEXT: movdqa %xmm2, %xmm3
; X32-NEXT: paddb %xmm3, %xmm3		; X32-NEXT: paddb %xmm2, %xmm3
; X32-NEXT: paddb %xmm1, %xmm1		; X32-NEXT: paddb %xmm1, %xmm1
; X32-NEXT: movdqa %xmm1, %xmm0		; X32-NEXT: movdqa %xmm1, %xmm0
; X32-NEXT: pblendvb %xmm0, %xmm3, %xmm2		; X32-NEXT: pblendvb %xmm0, %xmm3, %xmm2
; X32-NEXT: movdqa %xmm2, %xmm0		; X32-NEXT: movdqa %xmm2, %xmm0
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: shl2:		; X64-LABEL: shl2:
; X64: # BB#0: # %entry		; X64: # BB#0: # %entry
; X64-NEXT: movdqa %xmm0, %xmm2		; X64-NEXT: movdqa %xmm0, %xmm2
; X64-NEXT: psllw $5, %xmm1		; X64-NEXT: psllw $5, %xmm1
; X64-NEXT: movdqa %xmm2, %xmm3		; X64-NEXT: movdqa %xmm0, %xmm3
; X64-NEXT: psllw $4, %xmm3		; X64-NEXT: psllw $4, %xmm3
; X64-NEXT: pand {{.*}}(%rip), %xmm3		; X64-NEXT: pand {{.*}}(%rip), %xmm3
; X64-NEXT: movdqa %xmm1, %xmm0		; X64-NEXT: movdqa %xmm1, %xmm0
; X64-NEXT: pblendvb %xmm0, %xmm3, %xmm2		; X64-NEXT: pblendvb %xmm0, %xmm3, %xmm2
; X64-NEXT: movdqa %xmm2, %xmm3		; X64-NEXT: movdqa %xmm2, %xmm3
; X64-NEXT: psllw $2, %xmm3		; X64-NEXT: psllw $2, %xmm3
; X64-NEXT: pand {{.*}}(%rip), %xmm3		; X64-NEXT: pand {{.*}}(%rip), %xmm3
; X64-NEXT: paddb %xmm1, %xmm1		; X64-NEXT: paddb %xmm1, %xmm1
; X64-NEXT: movdqa %xmm1, %xmm0		; X64-NEXT: movdqa %xmm1, %xmm0
; X64-NEXT: pblendvb %xmm0, %xmm3, %xmm2		; X64-NEXT: pblendvb %xmm0, %xmm3, %xmm2
; X64-NEXT: movdqa %xmm2, %xmm3		; X64-NEXT: movdqa %xmm2, %xmm3
; X64-NEXT: paddb %xmm3, %xmm3		; X64-NEXT: paddb %xmm2, %xmm3
; X64-NEXT: paddb %xmm1, %xmm1		; X64-NEXT: paddb %xmm1, %xmm1
; X64-NEXT: movdqa %xmm1, %xmm0		; X64-NEXT: movdqa %xmm1, %xmm0
; X64-NEXT: pblendvb %xmm0, %xmm3, %xmm2		; X64-NEXT: pblendvb %xmm0, %xmm3, %xmm2
; X64-NEXT: movdqa %xmm2, %xmm0		; X64-NEXT: movdqa %xmm2, %xmm0
; X64-NEXT: retq		; X64-NEXT: retq
entry:		entry:
; CHECK-NOT: shlb		; CHECK-NOT: shlb
; CHECK: pblendvb		; CHECK: pblendvb
; CHECK: pblendvb		; CHECK: pblendvb
; CHECK: pblendvb		; CHECK: pblendvb
%shl = shl <16 x i8> %r, %a ; <<16 x i8>> [#uses=1]		%shl = shl <16 x i8> %r, %a ; <<16 x i8>> [#uses=1]
%tmp2 = bitcast <16 x i8> %shl to <2 x i64> ; <<2 x i64>> [#uses=1]		%tmp2 = bitcast <16 x i8> %shl to <2 x i64> ; <<2 x i64>> [#uses=1]
ret <2 x i64> %tmp2		ret <2 x i64> %tmp2
}		}

test/CodeGen/X86/vector-blend.ll

	; SSSE3-NEXT: movdqa %xmm1, %xmm0			; SSSE3-NEXT: movdqa %xmm1, %xmm0
	; SSSE3-NEXT: retq			; SSSE3-NEXT: retq
	;			;
	; SSE41-LABEL: blend_neg_logic_v4i32_2:			; SSE41-LABEL: blend_neg_logic_v4i32_2:
	; SSE41: # BB#0: # %entry			; SSE41: # BB#0: # %entry
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: psrad $31, %xmm1			; SSE41-NEXT: psrad $31, %xmm1
	; SSE41-NEXT: pxor %xmm3, %xmm3			; SSE41-NEXT: pxor %xmm3, %xmm3
	; SSE41-NEXT: psubd %xmm2, %xmm3			; SSE41-NEXT: psubd %xmm0, %xmm3
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: blendvps %xmm0, %xmm2, %xmm3			; SSE41-NEXT: blendvps %xmm0, %xmm2, %xmm3
	; SSE41-NEXT: movaps %xmm3, %xmm0			; SSE41-NEXT: movaps %xmm3, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: blend_neg_logic_v4i32_2:			; AVX-LABEL: blend_neg_logic_v4i32_2:
	; AVX: # BB#0: # %entry			; AVX: # BB#0: # %entry
	; AVX-NEXT: vpsrad $31, %xmm1, %xmm1			; AVX-NEXT: vpsrad $31, %xmm1, %xmm1

test/CodeGen/X86/vector-idiv-sdiv-128.ll

; AVX-NEXT: retq
%res = sdiv <8 x i16> %a, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		%res = sdiv <8 x i16> %a, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
ret <8 x i16> %res		ret <8 x i16> %res
}		}

define <16 x i8> @test_div7_16i8(<16 x i8> %a) nounwind {		define <16 x i8> @test_div7_16i8(<16 x i8> %a) nounwind {
; SSE2-LABEL: test_div7_16i8:		; SSE2-LABEL: test_div7_16i8:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movdqa %xmm0, %xmm2		; SSE2-NEXT: movdqa %xmm0, %xmm2
; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8],xmm0[8],xmm2[9],xmm0[9],xmm2[10],xmm0[10],xmm2[11],xmm0[11],xmm2[12],xmm0[12],xmm2[13],xmm0[13],xmm2[14],xmm0[14],xmm2[15],xmm0[15]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [65427,65427,65427,65427,65427,65427,65427,65427]		; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [65427,65427,65427,65427,65427,65427,65427,65427]
; SSE2-NEXT: pmullw %xmm3, %xmm2		; SSE2-NEXT: pmullw %xmm3, %xmm2
; SSE2-NEXT: psrlw $8, %xmm2		; SSE2-NEXT: psrlw $8, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: pmullw %xmm3, %xmm1		; SSE2-NEXT: pmullw %xmm3, %xmm1
; SSE2-NEXT: psrlw $8, %xmm1		; SSE2-NEXT: psrlw $8, %xmm1
; SSE2-NEXT: packuswb %xmm2, %xmm1		; SSE2-NEXT: packuswb %xmm2, %xmm1
; SSE2-NEXT: paddb %xmm0, %xmm1		; SSE2-NEXT: paddb %xmm0, %xmm1
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: psrlw $2, %xmm0		; SSE2-NEXT: psrlw $2, %xmm0
; SSE2-NEXT: pand {{.*}}(%rip), %xmm0		; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; AVX-NEXT: retq
%res = srem <8 x i16> %a, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		%res = srem <8 x i16> %a, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
ret <8 x i16> %res		ret <8 x i16> %res
}		}

define <16 x i8> @test_rem7_16i8(<16 x i8> %a) nounwind {		define <16 x i8> @test_rem7_16i8(<16 x i8> %a) nounwind {
; SSE2-LABEL: test_rem7_16i8:		; SSE2-LABEL: test_rem7_16i8:
; SSE2: # BB#0:		; SSE2: # BB#0:
; SSE2-NEXT: movdqa %xmm0, %xmm2		; SSE2-NEXT: movdqa %xmm0, %xmm2
; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8],xmm0[8],xmm2[9],xmm0[9],xmm2[10],xmm0[10],xmm2[11],xmm0[11],xmm2[12],xmm0[12],xmm2[13],xmm0[13],xmm2[14],xmm0[14],xmm2[15],xmm0[15]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [65427,65427,65427,65427,65427,65427,65427,65427]		; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [65427,65427,65427,65427,65427,65427,65427,65427]
; SSE2-NEXT: pmullw %xmm3, %xmm2		; SSE2-NEXT: pmullw %xmm3, %xmm2
; SSE2-NEXT: psrlw $8, %xmm2		; SSE2-NEXT: psrlw $8, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3],xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: pmullw %xmm3, %xmm1		; SSE2-NEXT: pmullw %xmm3, %xmm1
; SSE2-NEXT: psrlw $8, %xmm1		; SSE2-NEXT: psrlw $8, %xmm1
; SSE2-NEXT: packuswb %xmm2, %xmm1		; SSE2-NEXT: packuswb %xmm2, %xmm1
; SSE2-NEXT: paddb %xmm0, %xmm1		; SSE2-NEXT: paddb %xmm0, %xmm1
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: psrlw $2, %xmm2		; SSE2-NEXT: psrlw $2, %xmm2
; SSE2-NEXT: pand {{.*}}(%rip), %xmm2		; SSE2-NEXT: pand {{.*}}(%rip), %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]		; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32]
; SSE2-NEXT: pxor %xmm3, %xmm2		; SSE2-NEXT: pxor %xmm3, %xmm2
; SSE2-NEXT: psubb %xmm3, %xmm2		; SSE2-NEXT: psubb %xmm3, %xmm2
; SSE2-NEXT: psrlw $7, %xmm1		; SSE2-NEXT: psrlw $7, %xmm1
; SSE2-NEXT: pand {{.*}}(%rip), %xmm1		; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
; SSE2-NEXT: paddb %xmm2, %xmm1		; SSE2-NEXT: paddb %xmm2, %xmm1
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8],xmm1[8],xmm2[9],xmm1[9],xmm2[10],xmm1[10],xmm2[11],xmm1[11],xmm2[12],xmm1[12],xmm2[13],xmm1[13],xmm2[14],xmm1[14],xmm2[15],xmm1[15]
; SSE2-NEXT: psraw $8, %xmm2		; SSE2-NEXT: psraw $8, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [7,7,7,7,7,7,7,7]		; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [7,7,7,7,7,7,7,7]
; SSE2-NEXT: pmullw %xmm3, %xmm2		; SSE2-NEXT: pmullw %xmm3, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [255,255,255,255,255,255,255,255]		; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [255,255,255,255,255,255,255,255]
; SSE2-NEXT: pand %xmm4, %xmm2		; SSE2-NEXT: pand %xmm4, %xmm2
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: psraw $8, %xmm1		; SSE2-NEXT: psraw $8, %xmm1
; SSE2-NEXT: pmullw %xmm3, %xmm1		; SSE2-NEXT: pmullw %xmm3, %xmm1

test/CodeGen/X86/vector-idiv-udiv-128.ll

	; SSE2-NEXT: movdqa %xmm0, %xmm1			; SSE2-NEXT: movdqa %xmm0, %xmm1
	; SSE2-NEXT: psubb %xmm4, %xmm1			; SSE2-NEXT: psubb %xmm4, %xmm1
	; SSE2-NEXT: psrlw $1, %xmm1			; SSE2-NEXT: psrlw $1, %xmm1
	; SSE2-NEXT: pand {{.*}}(%rip), %xmm1			; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
	; SSE2-NEXT: paddb %xmm4, %xmm1			; SSE2-NEXT: paddb %xmm4, %xmm1
	; SSE2-NEXT: psrlw $2, %xmm1			; SSE2-NEXT: psrlw $2, %xmm1
	; SSE2-NEXT: pand {{.*}}(%rip), %xmm1			; SSE2-NEXT: pand {{.*}}(%rip), %xmm1
	; SSE2-NEXT: movdqa %xmm1, %xmm2			; SSE2-NEXT: movdqa %xmm1, %xmm2
	; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15]			; SSE2-NEXT: punpckhbw {{.*#+}} xmm2 = xmm2[8],xmm1[8],xmm2[9],xmm1[9],xmm2[10],xmm1[10],xmm2[11],xmm1[11],xmm2[12],xmm1[12],xmm2[13],xmm1[13],xmm2[14],xmm1[14],xmm2[15],xmm1[15]
	; SSE2-NEXT: psraw $8, %xmm2			; SSE2-NEXT: psraw $8, %xmm2
	; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [7,7,7,7,7,7,7,7]			; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [7,7,7,7,7,7,7,7]
	; SSE2-NEXT: pmullw %xmm3, %xmm2			; SSE2-NEXT: pmullw %xmm3, %xmm2
	; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [255,255,255,255,255,255,255,255]			; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [255,255,255,255,255,255,255,255]
	; SSE2-NEXT: pand %xmm4, %xmm2			; SSE2-NEXT: pand %xmm4, %xmm2
	; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; SSE2-NEXT: psraw $8, %xmm1			; SSE2-NEXT: psraw $8, %xmm1
	; SSE2-NEXT: pmullw %xmm3, %xmm1			; SSE2-NEXT: pmullw %xmm3, %xmm1

test/CodeGen/X86/vector-rotate-128.ll

	; SSE41-NEXT: movdqa %xmm0, %xmm3			; SSE41-NEXT: movdqa %xmm0, %xmm3
	; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [16,16,16,16,16,16,16,16]			; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [16,16,16,16,16,16,16,16]
	; SSE41-NEXT: psubw %xmm1, %xmm2			; SSE41-NEXT: psubw %xmm1, %xmm2
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: psllw $12, %xmm0			; SSE41-NEXT: psllw $12, %xmm0
	; SSE41-NEXT: psllw $4, %xmm1			; SSE41-NEXT: psllw $4, %xmm1
	; SSE41-NEXT: por %xmm0, %xmm1			; SSE41-NEXT: por %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm4			; SSE41-NEXT: movdqa %xmm1, %xmm4
	; SSE41-NEXT: paddw %xmm4, %xmm4			; SSE41-NEXT: paddw %xmm1, %xmm4
	; SSE41-NEXT: movdqa %xmm3, %xmm6			; SSE41-NEXT: movdqa %xmm3, %xmm6
	; SSE41-NEXT: psllw $8, %xmm6			; SSE41-NEXT: psllw $8, %xmm6
	; SSE41-NEXT: movdqa %xmm3, %xmm5			; SSE41-NEXT: movdqa %xmm3, %xmm5
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm6, %xmm5			; SSE41-NEXT: pblendvb %xmm0, %xmm6, %xmm5
	; SSE41-NEXT: movdqa %xmm5, %xmm1			; SSE41-NEXT: movdqa %xmm5, %xmm1
	; SSE41-NEXT: psllw $4, %xmm1			; SSE41-NEXT: psllw $4, %xmm1
	; SSE41-NEXT: movdqa %xmm4, %xmm0			; SSE41-NEXT: movdqa %xmm4, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm5			; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm5
	; SSE41-NEXT: movdqa %xmm5, %xmm1			; SSE41-NEXT: movdqa %xmm5, %xmm1
	; SSE41-NEXT: psllw $2, %xmm1			; SSE41-NEXT: psllw $2, %xmm1
	; SSE41-NEXT: paddw %xmm4, %xmm4			; SSE41-NEXT: paddw %xmm4, %xmm4
	; SSE41-NEXT: movdqa %xmm4, %xmm0			; SSE41-NEXT: movdqa %xmm4, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm5			; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm5
	; SSE41-NEXT: movdqa %xmm5, %xmm1			; SSE41-NEXT: movdqa %xmm5, %xmm1
	; SSE41-NEXT: psllw $1, %xmm1			; SSE41-NEXT: psllw $1, %xmm1
	; SSE41-NEXT: paddw %xmm4, %xmm4			; SSE41-NEXT: paddw %xmm4, %xmm4
	; SSE41-NEXT: movdqa %xmm4, %xmm0			; SSE41-NEXT: movdqa %xmm4, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm5			; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm5
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: psllw $12, %xmm0			; SSE41-NEXT: psllw $12, %xmm0
	; SSE41-NEXT: psllw $4, %xmm2			; SSE41-NEXT: psllw $4, %xmm2
	; SSE41-NEXT: por %xmm0, %xmm2			; SSE41-NEXT: por %xmm0, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm1			; SSE41-NEXT: movdqa %xmm2, %xmm1
	; SSE41-NEXT: paddw %xmm1, %xmm1			; SSE41-NEXT: paddw %xmm2, %xmm1
	; SSE41-NEXT: movdqa %xmm3, %xmm4			; SSE41-NEXT: movdqa %xmm3, %xmm4
	; SSE41-NEXT: psrlw $8, %xmm4			; SSE41-NEXT: psrlw $8, %xmm4
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm3			; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm3
	; SSE41-NEXT: movdqa %xmm3, %xmm2			; SSE41-NEXT: movdqa %xmm3, %xmm2
	; SSE41-NEXT: psrlw $4, %xmm2			; SSE41-NEXT: psrlw $4, %xmm2
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm3			; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm3
	[228 lines not shown]
	;			;
	; SSE41-LABEL: var_rotate_v16i8:			; SSE41-LABEL: var_rotate_v16i8:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm1, %xmm3
	; SSE41-NEXT: movdqa %xmm0, %xmm1			; SSE41-NEXT: movdqa %xmm0, %xmm1
	; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]			; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]
	; SSE41-NEXT: psubb %xmm3, %xmm2			; SSE41-NEXT: psubb %xmm3, %xmm2
	; SSE41-NEXT: psllw $5, %xmm3			; SSE41-NEXT: psllw $5, %xmm3
	; SSE41-NEXT: movdqa %xmm1, %xmm5			; SSE41-NEXT: movdqa %xmm0, %xmm5
	; SSE41-NEXT: psllw $4, %xmm5			; SSE41-NEXT: psllw $4, %xmm5
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm5			; SSE41-NEXT: pand {{.*}}(%rip), %xmm5
	; SSE41-NEXT: movdqa %xmm1, %xmm4			; SSE41-NEXT: movdqa %xmm0, %xmm4
	; SSE41-NEXT: movdqa %xmm3, %xmm0			; SSE41-NEXT: movdqa %xmm3, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm5, %xmm4			; SSE41-NEXT: pblendvb %xmm0, %xmm5, %xmm4
	; SSE41-NEXT: movdqa %xmm4, %xmm5			; SSE41-NEXT: movdqa %xmm4, %xmm5
	; SSE41-NEXT: psllw $2, %xmm5			; SSE41-NEXT: psllw $2, %xmm5
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm5			; SSE41-NEXT: pand {{.*}}(%rip), %xmm5
	; SSE41-NEXT: paddb %xmm3, %xmm3			; SSE41-NEXT: paddb %xmm3, %xmm3
	; SSE41-NEXT: movdqa %xmm3, %xmm0			; SSE41-NEXT: movdqa %xmm3, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm5, %xmm4			; SSE41-NEXT: pblendvb %xmm0, %xmm5, %xmm4
	; SSE41-NEXT: movdqa %xmm4, %xmm5			; SSE41-NEXT: movdqa %xmm4, %xmm5
	; SSE41-NEXT: paddb %xmm5, %xmm5			; SSE41-NEXT: paddb %xmm4, %xmm5
	; SSE41-NEXT: paddb %xmm3, %xmm3			; SSE41-NEXT: paddb %xmm3, %xmm3
	; SSE41-NEXT: movdqa %xmm3, %xmm0			; SSE41-NEXT: movdqa %xmm3, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm5, %xmm4			; SSE41-NEXT: pblendvb %xmm0, %xmm5, %xmm4
	; SSE41-NEXT: psllw $5, %xmm2			; SSE41-NEXT: psllw $5, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm3			; SSE41-NEXT: movdqa %xmm2, %xmm3
	; SSE41-NEXT: paddb %xmm3, %xmm3			; SSE41-NEXT: paddb %xmm2, %xmm3
	; SSE41-NEXT: movdqa %xmm1, %xmm5			; SSE41-NEXT: movdqa %xmm1, %xmm5
	; SSE41-NEXT: psrlw $4, %xmm5			; SSE41-NEXT: psrlw $4, %xmm5
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm5			; SSE41-NEXT: pand {{.*}}(%rip), %xmm5
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm5, %xmm1			; SSE41-NEXT: pblendvb %xmm0, %xmm5, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm2			; SSE41-NEXT: movdqa %xmm1, %xmm2
	; SSE41-NEXT: psrlw $2, %xmm2			; SSE41-NEXT: psrlw $2, %xmm2
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm2			; SSE41-NEXT: pand {{.*}}(%rip), %xmm2
	[549 lines not shown]
	; SSE2-NEXT: por %xmm4, %xmm0			; SSE2-NEXT: por %xmm4, %xmm0
	; SSE2-NEXT: por %xmm3, %xmm0			; SSE2-NEXT: por %xmm3, %xmm0
	; SSE2-NEXT: por %xmm1, %xmm0			; SSE2-NEXT: por %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: constant_rotate_v16i8:			; SSE41-LABEL: constant_rotate_v16i8:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm1			; SSE41-NEXT: movdqa %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm0, %xmm3
	; SSE41-NEXT: psllw $4, %xmm3			; SSE41-NEXT: psllw $4, %xmm3
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm3			; SSE41-NEXT: pand {{.*}}(%rip), %xmm3
	; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [8192,24640,41088,57536,57600,41152,24704,8256]			; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [8192,24640,41088,57536,57600,41152,24704,8256]
	; SSE41-NEXT: movdqa %xmm1, %xmm2			; SSE41-NEXT: movdqa %xmm1, %xmm2
	; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm3			; SSE41-NEXT: movdqa %xmm2, %xmm3
	; SSE41-NEXT: psllw $2, %xmm3			; SSE41-NEXT: psllw $2, %xmm3
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm3			; SSE41-NEXT: pand {{.*}}(%rip), %xmm3
	; SSE41-NEXT: paddb %xmm0, %xmm0			; SSE41-NEXT: paddb %xmm0, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm3			; SSE41-NEXT: movdqa %xmm2, %xmm3
	; SSE41-NEXT: paddb %xmm3, %xmm3			; SSE41-NEXT: paddb %xmm2, %xmm3
	; SSE41-NEXT: paddb %xmm0, %xmm0			; SSE41-NEXT: paddb %xmm0, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm1, %xmm3
	; SSE41-NEXT: psrlw $4, %xmm3			; SSE41-NEXT: psrlw $4, %xmm3
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm3			; SSE41-NEXT: pand {{.*}}(%rip), %xmm3
	; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [57600,41152,24704,8256,8192,24640,41088,57536]			; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [57600,41152,24704,8256,8192,24640,41088,57536]
	; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm1			; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm1, %xmm3
	[516 lines not shown]

test/CodeGen/X86/vector-sext.ll

[237 lines not shown]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
; SSE2-NEXT: psrad $24, %xmm1		; SSE2-NEXT: psrad $24, %xmm1
; SSE2-NEXT: movdqa %xmm2, %xmm0		; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: sext_16i8_to_8i32:		; SSSE3-LABEL: sext_16i8_to_8i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
; SSSE3-NEXT: psrad $24, %xmm0		; SSSE3-NEXT: psrad $24, %xmm0
; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[u,u,u,4,u,u,u,5,u,u,u,6,u,u,u,7]		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[u,u,u,4,u,u,u,5,u,u,u,6,u,u,u,7]
; SSSE3-NEXT: psrad $24, %xmm1		; SSSE3-NEXT: psrad $24, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: sext_16i8_to_8i32:		; SSE41-LABEL: sext_16i8_to_8i32:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
[52 lines not shown]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm3 = xmm3[0],xmm0[0],xmm3[1],xmm0[1],xmm3[2],xmm0[2],xmm3[3],xmm0[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm3 = xmm3[0],xmm0[0],xmm3[1],xmm0[1],xmm3[2],xmm0[2],xmm3[3],xmm0[3]
; SSE2-NEXT: psrad $24, %xmm3		; SSE2-NEXT: psrad $24, %xmm3
; SSE2-NEXT: movdqa %xmm4, %xmm0		; SSE2-NEXT: movdqa %xmm4, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: sext_16i8_to_16i32:		; SSSE3-LABEL: sext_16i8_to_16i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm3		; SSSE3-NEXT: movdqa %xmm0, %xmm3
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3],xmm0[4],xmm3[4],xmm0[5],xmm3[5],xmm0[6],xmm3[6],xmm0[7],xmm3[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
; SSSE3-NEXT: psrad $24, %xmm0		; SSSE3-NEXT: psrad $24, %xmm0
; SSSE3-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm3[8],xmm1[9],xmm3[9],xmm1[10],xmm3[10],xmm1[11],xmm3[11],xmm1[12],xmm3[12],xmm1[13],xmm3[13],xmm1[14],xmm3[14],xmm1[15],xmm3[15]		; SSSE3-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm3[8],xmm1[9],xmm3[9],xmm1[10],xmm3[10],xmm1[11],xmm3[11],xmm1[12],xmm3[12],xmm1[13],xmm3[13],xmm1[14],xmm3[14],xmm1[15],xmm3[15]
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
; SSSE3-NEXT: psrad $24, %xmm2		; SSSE3-NEXT: psrad $24, %xmm2
; SSSE3-NEXT: movdqa %xmm3, %xmm1		; SSSE3-NEXT: movdqa %xmm3, %xmm1
; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[u,u,u,4,u,u,u,5,u,u,u,6,u,u,u,7]		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[u,u,u,4,u,u,u,5,u,u,u,6,u,u,u,7]
; SSSE3-NEXT: psrad $24, %xmm1		; SSSE3-NEXT: psrad $24, %xmm1
[114 lines not shown]
; SSE2-NEXT: psrad $24, %xmm1		; SSE2-NEXT: psrad $24, %xmm1
; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
; SSE2-NEXT: movdqa %xmm2, %xmm0		; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: sext_16i8_to_4i64:		; SSSE3-LABEL: sext_16i8_to_4i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
; SSSE3-NEXT: movdqa %xmm0, %xmm2		; SSSE3-NEXT: movdqa %xmm0, %xmm2
; SSSE3-NEXT: psrad $31, %xmm2		; SSSE3-NEXT: psrad $31, %xmm2
; SSSE3-NEXT: psrad $24, %xmm0		; SSSE3-NEXT: psrad $24, %xmm0
; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[u,u,u,2,u,u,u,3,u,u,u],zero,xmm1[u,u,u],zero		; SSSE3-NEXT: pshufb {{.*#+}} xmm1 = xmm1[u,u,u,2,u,u,u,3,u,u,u],zero,xmm1[u,u,u],zero
; SSSE3-NEXT: movdqa %xmm1, %xmm2		; SSSE3-NEXT: movdqa %xmm1, %xmm2
; SSSE3-NEXT: psrad $31, %xmm2		; SSSE3-NEXT: psrad $31, %xmm2
[39 lines not shown]
entry:		entry:
%C = sext <4 x i8> %B to <4 x i64>		%C = sext <4 x i8> %B to <4 x i64>
ret <4 x i64> %C		ret <4 x i64> %C
}		}

define <8 x i64> @sext_16i8_to_8i64(<16 x i8> %A) nounwind uwtable readnone ssp {		define <8 x i64> @sext_16i8_to_8i64(<16 x i8> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: sext_16i8_to_8i64:		; SSE2-LABEL: sext_16i8_to_8i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
; SSE2-NEXT: movdqa %xmm0, %xmm2		; SSE2-NEXT: movdqa %xmm0, %xmm2
; SSE2-NEXT: psrad $31, %xmm2		; SSE2-NEXT: psrad $31, %xmm2
; SSE2-NEXT: psrad $24, %xmm0		; SSE2-NEXT: psrad $24, %xmm0
; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm1[1,1,2,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm1[1,1,2,3]
; SSE2-NEXT: psrld $16, %xmm1		; SSE2-NEXT: psrld $16, %xmm1
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
[596 lines not shown]
}		}

define <8 x i64> @sext_8i32_to_8i64(<8 x i32> %A) nounwind uwtable readnone ssp {		define <8 x i64> @sext_8i32_to_8i64(<8 x i32> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: sext_8i32_to_8i64:		; SSE2-LABEL: sext_8i32_to_8i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm3		; SSE2-NEXT: movdqa %xmm0, %xmm3
; SSE2-NEXT: psrad $31, %xmm3		; SSE2-NEXT: psrad $31, %xmm3
; SSE2-NEXT: movdqa %xmm2, %xmm4		; SSE2-NEXT: movdqa %xmm1, %xmm4
; SSE2-NEXT: psrad $31, %xmm4		; SSE2-NEXT: psrad $31, %xmm4
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]		; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]
; SSE2-NEXT: movdqa %xmm1, %xmm3		; SSE2-NEXT: movdqa %xmm1, %xmm3
; SSE2-NEXT: psrad $31, %xmm3		; SSE2-NEXT: psrad $31, %xmm3
; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]
; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm2[2,3,0,1]		; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm2[2,3,0,1]
; SSE2-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]
; SSE2-NEXT: movdqa %xmm3, %xmm4		; SSE2-NEXT: movdqa %xmm3, %xmm4
; SSE2-NEXT: psrad $31, %xmm4		; SSE2-NEXT: psrad $31, %xmm4
; SSE2-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: sext_8i32_to_8i64:		; SSSE3-LABEL: sext_8i32_to_8i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm1, %xmm2		; SSSE3-NEXT: movdqa %xmm1, %xmm2
; SSSE3-NEXT: movdqa %xmm0, %xmm3		; SSSE3-NEXT: movdqa %xmm0, %xmm3
; SSSE3-NEXT: psrad $31, %xmm3		; SSSE3-NEXT: psrad $31, %xmm3
; SSSE3-NEXT: movdqa %xmm2, %xmm4		; SSSE3-NEXT: movdqa %xmm1, %xmm4
; SSSE3-NEXT: psrad $31, %xmm4		; SSSE3-NEXT: psrad $31, %xmm4
; SSSE3-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]		; SSSE3-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]
; SSSE3-NEXT: movdqa %xmm1, %xmm3		; SSSE3-NEXT: movdqa %xmm1, %xmm3
; SSSE3-NEXT: psrad $31, %xmm3		; SSSE3-NEXT: psrad $31, %xmm3
; SSSE3-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]
; SSSE3-NEXT: pshufd {{.*#+}} xmm3 = xmm2[2,3,0,1]		; SSSE3-NEXT: pshufd {{.*#+}} xmm3 = xmm2[2,3,0,1]
; SSSE3-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]
[1,080 lines not shown]
; SSE2-NEXT: movd %ecx, %xmm0		; SSE2-NEXT: movd %ecx, %xmm0
; SSE2-NEXT: shrl $7, %eax		; SSE2-NEXT: shrl $7, %eax
; SSE2-NEXT: movzwl %ax, %eax		; SSE2-NEXT: movzwl %ax, %eax
; SSE2-NEXT: movd %eax, %xmm3		; SSE2-NEXT: movd %eax, %xmm3
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3]
; SSE2-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
; SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]		; SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSE2-NEXT: pslld $31, %xmm0		; SSE2-NEXT: pslld $31, %xmm0
; SSE2-NEXT: psrad $31, %xmm0		; SSE2-NEXT: psrad $31, %xmm0
; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSE2-NEXT: pslld $31, %xmm1		; SSE2-NEXT: pslld $31, %xmm1
; SSE2-NEXT: psrad $31, %xmm1		; SSE2-NEXT: psrad $31, %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: load_sext_8i1_to_8i32:		; SSSE3-LABEL: load_sext_8i1_to_8i32:
[32 lines not shown]
; SSSE3-NEXT: movd %ecx, %xmm0		; SSSE3-NEXT: movd %ecx, %xmm0
; SSSE3-NEXT: shrl $7, %eax		; SSSE3-NEXT: shrl $7, %eax
; SSSE3-NEXT: movzwl %ax, %eax		; SSSE3-NEXT: movzwl %ax, %eax
; SSSE3-NEXT: movd %eax, %xmm3		; SSSE3-NEXT: movd %eax, %xmm3
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1],xmm0[2],xmm3[2],xmm0[3],xmm3[3]
; SSSE3-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
; SSSE3-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]		; SSSE3-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; SSSE3-NEXT: movdqa %xmm1, %xmm0		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSSE3-NEXT: pslld $31, %xmm0		; SSSE3-NEXT: pslld $31, %xmm0
; SSSE3-NEXT: psrad $31, %xmm0		; SSSE3-NEXT: psrad $31, %xmm0
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]		; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]
; SSSE3-NEXT: pslld $31, %xmm1		; SSSE3-NEXT: pslld $31, %xmm1
; SSSE3-NEXT: psrad $31, %xmm1		; SSSE3-NEXT: psrad $31, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: load_sext_8i1_to_8i32:		; SSE41-LABEL: load_sext_8i1_to_8i32:
[785 lines not shown]
; SSE2-NEXT: shrl $15, %eax		; SSE2-NEXT: shrl $15, %eax
; SSE2-NEXT: movzwl %ax, %eax		; SSE2-NEXT: movzwl %ax, %eax
; SSE2-NEXT: movd %eax, %xmm4		; SSE2-NEXT: movd %eax, %xmm4
; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3],xmm2[4],xmm4[4],xmm2[5],xmm4[5],xmm2[6],xmm4[6],xmm2[7],xmm4[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3],xmm2[4],xmm4[4],xmm2[5],xmm4[5],xmm2[6],xmm4[6],xmm2[7],xmm4[7]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]
; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]
; SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]		; SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
; SSE2-NEXT: psllw $15, %xmm0		; SSE2-NEXT: psllw $15, %xmm0
; SSE2-NEXT: psraw $15, %xmm0		; SSE2-NEXT: psraw $15, %xmm0
; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm0[8],xmm1[9],xmm0[9],xmm1[10],xmm0[10],xmm1[11],xmm0[11],xmm1[12],xmm0[12],xmm1[13],xmm0[13],xmm1[14],xmm0[14],xmm1[15],xmm0[15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm0[8],xmm1[9],xmm0[9],xmm1[10],xmm0[10],xmm1[11],xmm0[11],xmm1[12],xmm0[12],xmm1[13],xmm0[13],xmm1[14],xmm0[14],xmm1[15],xmm0[15]
; SSE2-NEXT: psllw $15, %xmm1		; SSE2-NEXT: psllw $15, %xmm1
; SSE2-NEXT: psraw $15, %xmm1		; SSE2-NEXT: psraw $15, %xmm1
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: load_sext_16i1_to_16i16:		; SSSE3-LABEL: load_sext_16i1_to_16i16:
[72 lines not shown]
; SSSE3-NEXT: shrl $15, %eax		; SSSE3-NEXT: shrl $15, %eax
; SSSE3-NEXT: movzwl %ax, %eax		; SSSE3-NEXT: movzwl %ax, %eax
; SSSE3-NEXT: movd %eax, %xmm4		; SSSE3-NEXT: movd %eax, %xmm4
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3],xmm2[4],xmm4[4],xmm2[5],xmm4[5],xmm2[6],xmm4[6],xmm2[7],xmm4[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3],xmm2[4],xmm4[4],xmm2[5],xmm4[5],xmm2[6],xmm4[6],xmm2[7],xmm4[7]
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1],xmm3[2],xmm2[2],xmm3[3],xmm2[3]
; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm3[0],xmm0[1],xmm3[1]
; SSSE3-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]		; SSSE3-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
; SSSE3-NEXT: movdqa %xmm1, %xmm0		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
; SSSE3-NEXT: psllw $15, %xmm0		; SSSE3-NEXT: psllw $15, %xmm0
; SSSE3-NEXT: psraw $15, %xmm0		; SSSE3-NEXT: psraw $15, %xmm0
; SSSE3-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm0[8],xmm1[9],xmm0[9],xmm1[10],xmm0[10],xmm1[11],xmm0[11],xmm1[12],xmm0[12],xmm1[13],xmm0[13],xmm1[14],xmm0[14],xmm1[15],xmm0[15]		; SSSE3-NEXT: punpckhbw {{.*#+}} xmm1 = xmm1[8],xmm0[8],xmm1[9],xmm0[9],xmm1[10],xmm0[10],xmm1[11],xmm0[11],xmm1[12],xmm0[12],xmm1[13],xmm0[13],xmm1[14],xmm0[14],xmm1[15],xmm0[15]
; SSSE3-NEXT: psllw $15, %xmm1		; SSSE3-NEXT: psllw $15, %xmm1
; SSSE3-NEXT: psraw $15, %xmm1		; SSSE3-NEXT: psraw $15, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: load_sext_16i1_to_16i16:		; SSE41-LABEL: load_sext_16i1_to_16i16:
[1,902 lines not shown]

test/CodeGen/X86/vector-shift-ashr-128.ll

	[268 lines not shown]
	; SSE41-LABEL: var_shift_v8i16:			; SSE41-LABEL: var_shift_v8i16:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: psllw $12, %xmm0			; SSE41-NEXT: psllw $12, %xmm0
	; SSE41-NEXT: psllw $4, %xmm1			; SSE41-NEXT: psllw $4, %xmm1
	; SSE41-NEXT: por %xmm0, %xmm1			; SSE41-NEXT: por %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm1, %xmm3
	; SSE41-NEXT: paddw %xmm3, %xmm3			; SSE41-NEXT: paddw %xmm1, %xmm3
	; SSE41-NEXT: movdqa %xmm2, %xmm4			; SSE41-NEXT: movdqa %xmm2, %xmm4
	; SSE41-NEXT: psraw $8, %xmm4			; SSE41-NEXT: psraw $8, %xmm4
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm1			; SSE41-NEXT: movdqa %xmm2, %xmm1
	; SSE41-NEXT: psraw $4, %xmm1			; SSE41-NEXT: psraw $4, %xmm1
	; SSE41-NEXT: movdqa %xmm3, %xmm0			; SSE41-NEXT: movdqa %xmm3, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2
	[1,418 lines not shown]

test/CodeGen/X86/vector-shift-lshr-128.ll

	[239 lines not shown]
	; SSE41-LABEL: var_shift_v8i16:			; SSE41-LABEL: var_shift_v8i16:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: psllw $12, %xmm0			; SSE41-NEXT: psllw $12, %xmm0
	; SSE41-NEXT: psllw $4, %xmm1			; SSE41-NEXT: psllw $4, %xmm1
	; SSE41-NEXT: por %xmm0, %xmm1			; SSE41-NEXT: por %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm1, %xmm3
	; SSE41-NEXT: paddw %xmm3, %xmm3			; SSE41-NEXT: paddw %xmm1, %xmm3
	; SSE41-NEXT: movdqa %xmm2, %xmm4			; SSE41-NEXT: movdqa %xmm2, %xmm4
	; SSE41-NEXT: psrlw $8, %xmm4			; SSE41-NEXT: psrlw $8, %xmm4
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm1			; SSE41-NEXT: movdqa %xmm2, %xmm1
	; SSE41-NEXT: psrlw $4, %xmm1			; SSE41-NEXT: psrlw $4, %xmm1
	; SSE41-NEXT: movdqa %xmm3, %xmm0			; SSE41-NEXT: movdqa %xmm3, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2
	[145 lines not shown]
	; SSE2-NEXT: pand %xmm2, %xmm0			; SSE2-NEXT: pand %xmm2, %xmm0
	; SSE2-NEXT: por %xmm1, %xmm0			; SSE2-NEXT: por %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: var_shift_v16i8:			; SSE41-LABEL: var_shift_v16i8:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: psllw $5, %xmm1			; SSE41-NEXT: psllw $5, %xmm1
	; SSE41-NEXT: movdqa %xmm2, %xmm3			; SSE41-NEXT: movdqa %xmm0, %xmm3
	; SSE41-NEXT: psrlw $4, %xmm3			; SSE41-NEXT: psrlw $4, %xmm3
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm3			; SSE41-NEXT: pand {{.*}}(%rip), %xmm3
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm3			; SSE41-NEXT: movdqa %xmm2, %xmm3
	; SSE41-NEXT: psrlw $2, %xmm3			; SSE41-NEXT: psrlw $2, %xmm3
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm3			; SSE41-NEXT: pand {{.*}}(%rip), %xmm3
	; SSE41-NEXT: paddb %xmm1, %xmm1			; SSE41-NEXT: paddb %xmm1, %xmm1
	[255 lines not shown]
	;			;
	; SSE41-LABEL: splatvar_shift_v16i8:			; SSE41-LABEL: splatvar_shift_v16i8:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: pxor %xmm0, %xmm0			; SSE41-NEXT: pxor %xmm0, %xmm0
	; SSE41-NEXT: pshufb %xmm0, %xmm1			; SSE41-NEXT: pshufb %xmm0, %xmm1
	; SSE41-NEXT: psllw $5, %xmm1			; SSE41-NEXT: psllw $5, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm1, %xmm3
	; SSE41-NEXT: paddb %xmm3, %xmm3			; SSE41-NEXT: paddb %xmm1, %xmm3
	; SSE41-NEXT: movdqa %xmm2, %xmm4			; SSE41-NEXT: movdqa %xmm2, %xmm4
	; SSE41-NEXT: psrlw $4, %xmm4			; SSE41-NEXT: psrlw $4, %xmm4
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm4			; SSE41-NEXT: pand {{.*}}(%rip), %xmm4
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm1			; SSE41-NEXT: movdqa %xmm2, %xmm1
	; SSE41-NEXT: psrlw $2, %xmm1			; SSE41-NEXT: psrlw $2, %xmm1
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm1			; SSE41-NEXT: pand {{.*}}(%rip), %xmm1
	[405 lines not shown]
	; SSE2-NEXT: pand {{.*}}(%rip), %xmm0			; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
	; SSE2-NEXT: pand %xmm1, %xmm0			; SSE2-NEXT: pand %xmm1, %xmm0
	; SSE2-NEXT: por %xmm2, %xmm0			; SSE2-NEXT: por %xmm2, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: constant_shift_v16i8:			; SSE41-LABEL: constant_shift_v16i8:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm1			; SSE41-NEXT: movdqa %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: psrlw $4, %xmm2			; SSE41-NEXT: psrlw $4, %xmm2
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm2			; SSE41-NEXT: pand {{.*}}(%rip), %xmm2
	; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [8192,24640,41088,57536,49376,32928,16480,32]			; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [8192,24640,41088,57536,49376,32928,16480,32]
	; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm1			; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm2			; SSE41-NEXT: movdqa %xmm1, %xmm2
	; SSE41-NEXT: psrlw $2, %xmm2			; SSE41-NEXT: psrlw $2, %xmm2
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm2			; SSE41-NEXT: pand {{.*}}(%rip), %xmm2
	; SSE41-NEXT: paddb %xmm0, %xmm0			; SSE41-NEXT: paddb %xmm0, %xmm0
	[226 lines not shown]

test/CodeGen/X86/vector-shift-shl-128.ll

	[196 lines not shown]
	; SSE41-LABEL: var_shift_v8i16:			; SSE41-LABEL: var_shift_v8i16:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: psllw $12, %xmm0			; SSE41-NEXT: psllw $12, %xmm0
	; SSE41-NEXT: psllw $4, %xmm1			; SSE41-NEXT: psllw $4, %xmm1
	; SSE41-NEXT: por %xmm0, %xmm1			; SSE41-NEXT: por %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm1, %xmm3
	; SSE41-NEXT: paddw %xmm3, %xmm3			; SSE41-NEXT: paddw %xmm1, %xmm3
	; SSE41-NEXT: movdqa %xmm2, %xmm4			; SSE41-NEXT: movdqa %xmm2, %xmm4
	; SSE41-NEXT: psllw $8, %xmm4			; SSE41-NEXT: psllw $8, %xmm4
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm1			; SSE41-NEXT: movdqa %xmm2, %xmm1
	; SSE41-NEXT: psllw $4, %xmm1			; SSE41-NEXT: psllw $4, %xmm1
	; SSE41-NEXT: movdqa %xmm3, %xmm0			; SSE41-NEXT: movdqa %xmm3, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2
	[142 lines not shown]
	; SSE2-NEXT: pand %xmm2, %xmm0			; SSE2-NEXT: pand %xmm2, %xmm0
	; SSE2-NEXT: por %xmm1, %xmm0			; SSE2-NEXT: por %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: var_shift_v16i8:			; SSE41-LABEL: var_shift_v16i8:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: psllw $5, %xmm1			; SSE41-NEXT: psllw $5, %xmm1
	; SSE41-NEXT: movdqa %xmm2, %xmm3			; SSE41-NEXT: movdqa %xmm0, %xmm3
	; SSE41-NEXT: psllw $4, %xmm3			; SSE41-NEXT: psllw $4, %xmm3
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm3			; SSE41-NEXT: pand {{.*}}(%rip), %xmm3
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm3			; SSE41-NEXT: movdqa %xmm2, %xmm3
	; SSE41-NEXT: psllw $2, %xmm3			; SSE41-NEXT: psllw $2, %xmm3
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm3			; SSE41-NEXT: pand {{.*}}(%rip), %xmm3
	; SSE41-NEXT: paddb %xmm1, %xmm1			; SSE41-NEXT: paddb %xmm1, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm3			; SSE41-NEXT: movdqa %xmm2, %xmm3
	; SSE41-NEXT: paddb %xmm3, %xmm3			; SSE41-NEXT: paddb %xmm2, %xmm3
	; SSE41-NEXT: paddb %xmm1, %xmm1			; SSE41-NEXT: paddb %xmm1, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm3, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: var_shift_v16i8:			; AVX-LABEL: var_shift_v16i8:
	; AVX: # BB#0:			; AVX: # BB#0:
	[...]
	;			;
	; SSE41-LABEL: splatvar_shift_v16i8:			; SSE41-LABEL: splatvar_shift_v16i8:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: pxor %xmm0, %xmm0			; SSE41-NEXT: pxor %xmm0, %xmm0
	; SSE41-NEXT: pshufb %xmm0, %xmm1			; SSE41-NEXT: pshufb %xmm0, %xmm1
	; SSE41-NEXT: psllw $5, %xmm1			; SSE41-NEXT: psllw $5, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm3			; SSE41-NEXT: movdqa %xmm1, %xmm3
	; SSE41-NEXT: paddb %xmm3, %xmm3			; SSE41-NEXT: paddb %xmm1, %xmm3
	; SSE41-NEXT: movdqa %xmm2, %xmm4			; SSE41-NEXT: movdqa %xmm2, %xmm4
	; SSE41-NEXT: psllw $4, %xmm4			; SSE41-NEXT: psllw $4, %xmm4
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm4			; SSE41-NEXT: pand {{.*}}(%rip), %xmm4
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm4, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm1			; SSE41-NEXT: movdqa %xmm2, %xmm1
	; SSE41-NEXT: psllw $2, %xmm1			; SSE41-NEXT: psllw $2, %xmm1
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm1			; SSE41-NEXT: pand {{.*}}(%rip), %xmm1
	; SSE41-NEXT: movdqa %xmm3, %xmm0			; SSE41-NEXT: movdqa %xmm3, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm1			; SSE41-NEXT: movdqa %xmm2, %xmm1
	; SSE41-NEXT: paddb %xmm1, %xmm1			; SSE41-NEXT: paddb %xmm2, %xmm1
	; SSE41-NEXT: paddb %xmm3, %xmm3			; SSE41-NEXT: paddb %xmm3, %xmm3
	; SSE41-NEXT: movdqa %xmm3, %xmm0			; SSE41-NEXT: movdqa %xmm3, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2			; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: splatvar_shift_v16i8:			; AVX1-LABEL: splatvar_shift_v16i8:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	[...]
	; SSE2-NEXT: paddb %xmm0, %xmm0			; SSE2-NEXT: paddb %xmm0, %xmm0
	; SSE2-NEXT: pand %xmm1, %xmm0			; SSE2-NEXT: pand %xmm1, %xmm0
	; SSE2-NEXT: por %xmm2, %xmm0			; SSE2-NEXT: por %xmm2, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: constant_shift_v16i8:			; SSE41-LABEL: constant_shift_v16i8:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: movdqa %xmm0, %xmm1			; SSE41-NEXT: movdqa %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm2			; SSE41-NEXT: movdqa %xmm0, %xmm2
	; SSE41-NEXT: psllw $4, %xmm2			; SSE41-NEXT: psllw $4, %xmm2
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm2			; SSE41-NEXT: pand {{.*}}(%rip), %xmm2
	; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [8192,24640,41088,57536,49376,32928,16480,32]			; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [8192,24640,41088,57536,49376,32928,16480,32]
	; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm1			; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm2			; SSE41-NEXT: movdqa %xmm1, %xmm2
	; SSE41-NEXT: psllw $2, %xmm2			; SSE41-NEXT: psllw $2, %xmm2
	; SSE41-NEXT: pand {{.*}}(%rip), %xmm2			; SSE41-NEXT: pand {{.*}}(%rip), %xmm2
	; SSE41-NEXT: paddb %xmm0, %xmm0			; SSE41-NEXT: paddb %xmm0, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm1			; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm2			; SSE41-NEXT: movdqa %xmm1, %xmm2
	; SSE41-NEXT: paddb %xmm2, %xmm2			; SSE41-NEXT: paddb %xmm1, %xmm2
	; SSE41-NEXT: paddb %xmm0, %xmm0			; SSE41-NEXT: paddb %xmm0, %xmm0
	; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm1			; SSE41-NEXT: pblendvb %xmm0, %xmm2, %xmm1
	; SSE41-NEXT: movdqa %xmm1, %xmm0			; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: constant_shift_v16i8:			; AVX-LABEL: constant_shift_v16i8:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vpsllw $4, %xmm0, %xmm1			; AVX-NEXT: vpsllw $4, %xmm0, %xmm1
	[...]

test/CodeGen/X86/vector-shuffle-combining.ll

[...]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%ret = shufflevector <4 x i32> %a0, <4 x i32> <i32 undef, i32 4, i32 5, i32 30>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>		%ret = shufflevector <4 x i32> %a0, <4 x i32> <i32 undef, i32 4, i32 5, i32 30>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
ret <4 x i32> %ret		ret <4 x i32> %ret
}		}

define <4 x float> @PR22377(<4 x float> %a, <4 x float> %b) {		define <4 x float> @PR22377(<4 x float> %a, <4 x float> %b) {
; SSE-LABEL: PR22377:		; SSE-LABEL: PR22377:
; SSE: # BB#0: # %entry		; SSE: # BB#0: # %entry
; SSE-NEXT: movaps %xmm0, %xmm1		; SSE-NEXT: movaps %xmm0, %xmm1
; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3,1,3]		; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3],xmm0[1,3]
; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,0,2]		; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,0,2]
; SSE-NEXT: addps %xmm0, %xmm1		; SSE-NEXT: addps %xmm0, %xmm1
; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]		; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: PR22377:		; AVX-LABEL: PR22377:
; AVX: # BB#0: # %entry		; AVX: # BB#0: # %entry
; AVX-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[1,3,1,3]		; AVX-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[1,3,1,3]
[...]

test/CodeGen/X86/vector-trunc-math.ll

	[...]
	;			;
	; complex patterns - often created by vectorizer			; complex patterns - often created by vectorizer
	;			;

	define <4 x i32> @mul_add_const_v4i64_v4i32(<4 x i32> %a0, <4 x i32> %a1) nounwind {			define <4 x i32> @mul_add_const_v4i64_v4i32(<4 x i32> %a0, <4 x i32> %a1) nounwind {
	; SSE-LABEL: mul_add_const_v4i64_v4i32:			; SSE-LABEL: mul_add_const_v4i64_v4i32:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movdqa %xmm0, %xmm2			; SSE-NEXT: movdqa %xmm0, %xmm2
	; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,1,1,3]			; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; SSE-NEXT: pshufd {{.*#+}} xmm2 = xmm2[2,1,3,3]			; SSE-NEXT: pshufd {{.*#+}} xmm2 = xmm2[2,1,3,3]
	; SSE-NEXT: pshufd {{.*#+}} xmm3 = xmm1[0,1,1,3]			; SSE-NEXT: pshufd {{.*#+}} xmm3 = xmm1[0,1,1,3]
	; SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,1,3,3]			; SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,1,3,3]
	; SSE-NEXT: movdqa %xmm2, %xmm4			; SSE-NEXT: movdqa %xmm2, %xmm4
	; SSE-NEXT: psrlq $32, %xmm4			; SSE-NEXT: psrlq $32, %xmm4
	; SSE-NEXT: pmuludq %xmm1, %xmm4			; SSE-NEXT: pmuludq %xmm1, %xmm4
	; SSE-NEXT: movdqa %xmm1, %xmm5			; SSE-NEXT: movdqa %xmm1, %xmm5
	; SSE-NEXT: psrlq $32, %xmm5			; SSE-NEXT: psrlq $32, %xmm5
	[...]

test/CodeGen/X86/vector-zext.ll

[...]
entry:		entry:
ret <8 x i32> %C		ret <8 x i32> %C
}		}

define <16 x i32> @zext_16i8_to_16i32(<16 x i8> %A) nounwind uwtable readnone ssp {		define <16 x i32> @zext_16i8_to_16i32(<16 x i8> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_16i8_to_16i32:		; SSE2-LABEL: zext_16i8_to_16i32:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm3		; SSE2-NEXT: movdqa %xmm0, %xmm3
; SSE2-NEXT: pxor %xmm4, %xmm4		; SSE2-NEXT: pxor %xmm4, %xmm4
; SSE2-NEXT: movdqa %xmm3, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3],xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3],xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3]
; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]
; SSE2-NEXT: punpckhbw {{.*#+}} xmm3 = xmm3[8],xmm4[8],xmm3[9],xmm4[9],xmm3[10],xmm4[10],xmm3[11],xmm4[11],xmm3[12],xmm4[12],xmm3[13],xmm4[13],xmm3[14],xmm4[14],xmm3[15],xmm4[15]		; SSE2-NEXT: punpckhbw {{.*#+}} xmm3 = xmm3[8],xmm4[8],xmm3[9],xmm4[9],xmm3[10],xmm4[10],xmm3[11],xmm4[11],xmm3[12],xmm4[12],xmm3[13],xmm4[13],xmm3[14],xmm4[14],xmm3[15],xmm4[15]
; SSE2-NEXT: movdqa %xmm3, %xmm2		; SSE2-NEXT: movdqa %xmm3, %xmm2
; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3]
; SSE2-NEXT: punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_16i8_to_16i32:		; SSSE3-LABEL: zext_16i8_to_16i32:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm3		; SSSE3-NEXT: movdqa %xmm0, %xmm3
; SSSE3-NEXT: pxor %xmm4, %xmm4		; SSSE3-NEXT: pxor %xmm4, %xmm4
; SSSE3-NEXT: movdqa %xmm3, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3],xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]		; SSSE3-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3],xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]
; SSSE3-NEXT: movdqa %xmm1, %xmm0		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1],xmm0[2],xmm4[2],xmm0[3],xmm4[3]
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]		; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]
; SSSE3-NEXT: punpckhbw {{.*#+}} xmm3 = xmm3[8],xmm4[8],xmm3[9],xmm4[9],xmm3[10],xmm4[10],xmm3[11],xmm4[11],xmm3[12],xmm4[12],xmm3[13],xmm4[13],xmm3[14],xmm4[14],xmm3[15],xmm4[15]		; SSSE3-NEXT: punpckhbw {{.*#+}} xmm3 = xmm3[8],xmm4[8],xmm3[9],xmm4[9],xmm3[10],xmm4[10],xmm3[11],xmm4[11],xmm3[12],xmm4[12],xmm3[13],xmm4[13],xmm3[14],xmm4[14],xmm3[15],xmm4[15]
; SSSE3-NEXT: movdqa %xmm3, %xmm2		; SSSE3-NEXT: movdqa %xmm3, %xmm2
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1],xmm2[2],xmm4[2],xmm2[3],xmm4[3]
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]		; SSSE3-NEXT: punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]
[...]
entry:		entry:
ret <4 x i64> %C		ret <4 x i64> %C
}		}

define <8 x i64> @zext_16i8_to_8i64(<16 x i8> %A) nounwind uwtable readnone ssp {		define <8 x i64> @zext_16i8_to_8i64(<16 x i8> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_16i8_to_8i64:		; SSE2-LABEL: zext_16i8_to_8i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm4, %xmm4		; SSE2-NEXT: pxor %xmm4, %xmm4
; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm1[1,1,2,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm0[1,1,2,3]
; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3],xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3],xmm1[4],xmm4[4],xmm1[5],xmm4[5],xmm1[6],xmm4[6],xmm1[7],xmm4[7]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3]
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1]
; SSE2-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm4[2],xmm1[3],xmm4[3]		; SSE2-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm4[2],xmm1[3],xmm4[3]
; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1],xmm3[2],xmm4[2],xmm3[3],xmm4[3],xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]		; SSE2-NEXT: punpcklbw {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1],xmm3[2],xmm4[2],xmm3[3],xmm4[3],xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]
; SSE2-NEXT: punpcklwd {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1],xmm3[2],xmm4[2],xmm3[3],xmm4[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm3 = xmm3[0],xmm4[0],xmm3[1],xmm4[1],xmm3[2],xmm4[2],xmm3[3],xmm4[3]
; SSE2-NEXT: movdqa %xmm3, %xmm2		; SSE2-NEXT: movdqa %xmm3, %xmm2
[...]
entry:		entry:
ret <4 x i64> %C		ret <4 x i64> %C
}		}

define <8 x i64> @zext_8i16_to_8i64(<8 x i16> %A) nounwind uwtable readnone ssp {		define <8 x i64> @zext_8i16_to_8i64(<8 x i16> %A) nounwind uwtable readnone ssp {
; SSE2-LABEL: zext_8i16_to_8i64:		; SSE2-LABEL: zext_8i16_to_8i64:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm0, %xmm3		; SSE2-NEXT: movdqa %xmm0, %xmm3
; SSE2-NEXT: pxor %xmm4, %xmm4		; SSE2-NEXT: pxor %xmm4, %xmm4
; SSE2-NEXT: movdqa %xmm3, %xmm1		; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3]		; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3]
; SSE2-NEXT: movdqa %xmm1, %xmm0		; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1]
; SSE2-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm4[2],xmm1[3],xmm4[3]		; SSE2-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm4[2],xmm1[3],xmm4[3]
; SSE2-NEXT: punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]		; SSE2-NEXT: punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]
; SSE2-NEXT: movdqa %xmm3, %xmm2		; SSE2-NEXT: movdqa %xmm3, %xmm2
; SSE2-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]		; SSE2-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]
; SSE2-NEXT: punpckhdq {{.*#+}} xmm3 = xmm3[2],xmm4[2],xmm3[3],xmm4[3]		; SSE2-NEXT: punpckhdq {{.*#+}} xmm3 = xmm3[2],xmm4[2],xmm3[3],xmm4[3]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: zext_8i16_to_8i64:		; SSSE3-LABEL: zext_8i16_to_8i64:
; SSSE3: # BB#0: # %entry		; SSSE3: # BB#0: # %entry
; SSSE3-NEXT: movdqa %xmm0, %xmm3		; SSSE3-NEXT: movdqa %xmm0, %xmm3
; SSSE3-NEXT: pxor %xmm4, %xmm4		; SSSE3-NEXT: pxor %xmm4, %xmm4
; SSSE3-NEXT: movdqa %xmm3, %xmm1		; SSSE3-NEXT: movdqa %xmm0, %xmm1
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1],xmm1[2],xmm4[2],xmm1[3],xmm4[3]
; SSSE3-NEXT: movdqa %xmm1, %xmm0		; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm4[0],xmm0[1],xmm4[1]
; SSSE3-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm4[2],xmm1[3],xmm4[3]		; SSSE3-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm4[2],xmm1[3],xmm4[3]
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]		; SSSE3-NEXT: punpckhwd {{.*#+}} xmm3 = xmm3[4],xmm4[4],xmm3[5],xmm4[5],xmm3[6],xmm4[6],xmm3[7],xmm4[7]
; SSSE3-NEXT: movdqa %xmm3, %xmm2		; SSSE3-NEXT: movdqa %xmm3, %xmm2
; SSSE3-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm4[0],xmm2[1],xmm4[1]
; SSSE3-NEXT: punpckhdq {{.*#+}} xmm3 = xmm3[2],xmm4[2],xmm3[3],xmm4[3]		; SSSE3-NEXT: punpckhdq {{.*#+}} xmm3 = xmm3[2],xmm4[2],xmm3[3],xmm4[3]
[...]
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]		; SSSE3-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuf_zext_8i16_to_8i32:		; SSE41-LABEL: shuf_zext_8i16_to_8i32:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: movdqa %xmm0, %xmm1		; SSE41-NEXT: movdqa %xmm0, %xmm1
; SSE41-NEXT: pxor %xmm2, %xmm2		; SSE41-NEXT: pxor %xmm2, %xmm2
; SSE41-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero		; SSE41-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; SSE41-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]		; SSE41-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: shuf_zext_8i16_to_8i32:		; AVX1-LABEL: shuf_zext_8i16_to_8i32:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]		; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero		; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
[...]
; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]		; SSSE3-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; SSSE3-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]		; SSSE3-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuf_zext_4i32_to_4i64:		; SSE41-LABEL: shuf_zext_4i32_to_4i64:
; SSE41: # BB#0: # %entry		; SSE41: # BB#0: # %entry
; SSE41-NEXT: movdqa %xmm0, %xmm1		; SSE41-NEXT: movdqa %xmm0, %xmm1
; SSE41-NEXT: pxor %xmm2, %xmm2		; SSE41-NEXT: pxor %xmm2, %xmm2
; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm1[0],zero,xmm1[1],zero		; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; SSE41-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]		; SSE41-NEXT: punpckhdq {{.*#+}} xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: shuf_zext_4i32_to_4i64:		; AVX1-LABEL: shuf_zext_4i32_to_4i64:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX1-NEXT: vpunpckhdq {{.*#+}} xmm1 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]		; AVX1-NEXT: vpunpckhdq {{.*#+}} xmm1 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero		; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
[...]

test/CodeGen/X86/vselect-minmax.ll

[...]
entry:		entry:
ret <64 x i8> %sel		ret <64 x i8> %sel
}		}

define <64 x i8> @test98(<64 x i8> %a, <64 x i8> %b) {		define <64 x i8> @test98(<64 x i8> %a, <64 x i8> %b) {
; SSE2-LABEL: test98:		; SSE2-LABEL: test98:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm3, %xmm8		; SSE2-NEXT: movdqa %xmm3, %xmm8
; SSE2-NEXT: movdqa %xmm2, %xmm9		; SSE2-NEXT: movdqa %xmm2, %xmm9
; SSE2-NEXT: movdqa %xmm8, %xmm12		; SSE2-NEXT: movdqa %xmm3, %xmm12
; SSE2-NEXT: pcmpgtb %xmm7, %xmm12		; SSE2-NEXT: pcmpgtb %xmm7, %xmm12
; SSE2-NEXT: pcmpeqd %xmm13, %xmm13		; SSE2-NEXT: pcmpeqd %xmm13, %xmm13
; SSE2-NEXT: movdqa %xmm12, %xmm3		; SSE2-NEXT: movdqa %xmm12, %xmm3
; SSE2-NEXT: pxor %xmm13, %xmm3		; SSE2-NEXT: pxor %xmm13, %xmm3
; SSE2-NEXT: movdqa %xmm9, %xmm14		; SSE2-NEXT: movdqa %xmm2, %xmm14
; SSE2-NEXT: pcmpgtb %xmm6, %xmm14		; SSE2-NEXT: pcmpgtb %xmm6, %xmm14
; SSE2-NEXT: movdqa %xmm14, %xmm2		; SSE2-NEXT: movdqa %xmm14, %xmm2
; SSE2-NEXT: pxor %xmm13, %xmm2		; SSE2-NEXT: pxor %xmm13, %xmm2
; SSE2-NEXT: movdqa %xmm1, %xmm15		; SSE2-NEXT: movdqa %xmm1, %xmm15
; SSE2-NEXT: pcmpgtb %xmm5, %xmm15		; SSE2-NEXT: pcmpgtb %xmm5, %xmm15
; SSE2-NEXT: movdqa %xmm15, %xmm10		; SSE2-NEXT: movdqa %xmm15, %xmm10
; SSE2-NEXT: pxor %xmm13, %xmm10		; SSE2-NEXT: pxor %xmm13, %xmm10
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
[...]

define <64 x i8> @test100(<64 x i8> %a, <64 x i8> %b) {		define <64 x i8> @test100(<64 x i8> %a, <64 x i8> %b) {
; SSE2-LABEL: test100:		; SSE2-LABEL: test100:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm3, %xmm8		; SSE2-NEXT: movdqa %xmm3, %xmm8
; SSE2-NEXT: movdqa %xmm2, %xmm9		; SSE2-NEXT: movdqa %xmm2, %xmm9
; SSE2-NEXT: movdqa %xmm0, %xmm10		; SSE2-NEXT: movdqa %xmm0, %xmm10
; SSE2-NEXT: movdqa %xmm7, %xmm12		; SSE2-NEXT: movdqa %xmm7, %xmm12
; SSE2-NEXT: pcmpgtb %xmm8, %xmm12		; SSE2-NEXT: pcmpgtb %xmm3, %xmm12
; SSE2-NEXT: pcmpeqd %xmm0, %xmm0		; SSE2-NEXT: pcmpeqd %xmm0, %xmm0
; SSE2-NEXT: movdqa %xmm12, %xmm3		; SSE2-NEXT: movdqa %xmm12, %xmm3
; SSE2-NEXT: pxor %xmm0, %xmm3		; SSE2-NEXT: pxor %xmm0, %xmm3
; SSE2-NEXT: movdqa %xmm6, %xmm13		; SSE2-NEXT: movdqa %xmm6, %xmm13
; SSE2-NEXT: pcmpgtb %xmm9, %xmm13		; SSE2-NEXT: pcmpgtb %xmm2, %xmm13
; SSE2-NEXT: movdqa %xmm13, %xmm2		; SSE2-NEXT: movdqa %xmm13, %xmm2
; SSE2-NEXT: pxor %xmm0, %xmm2		; SSE2-NEXT: pxor %xmm0, %xmm2
; SSE2-NEXT: movdqa %xmm5, %xmm14		; SSE2-NEXT: movdqa %xmm5, %xmm14
; SSE2-NEXT: pcmpgtb %xmm1, %xmm14		; SSE2-NEXT: pcmpgtb %xmm1, %xmm14
; SSE2-NEXT: movdqa %xmm14, %xmm11		; SSE2-NEXT: movdqa %xmm14, %xmm11
; SSE2-NEXT: pxor %xmm0, %xmm11		; SSE2-NEXT: pxor %xmm0, %xmm11
; SSE2-NEXT: movdqa %xmm4, %xmm15		; SSE2-NEXT: movdqa %xmm4, %xmm15
; SSE2-NEXT: pcmpgtb %xmm10, %xmm15		; SSE2-NEXT: pcmpgtb %xmm10, %xmm15
[...]
entry:		entry:
ret <16 x i32> %sel		ret <16 x i32> %sel
}		}

define <16 x i32> @test114(<16 x i32> %a, <16 x i32> %b) {		define <16 x i32> @test114(<16 x i32> %a, <16 x i32> %b) {
; SSE2-LABEL: test114:		; SSE2-LABEL: test114:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm3, %xmm8		; SSE2-NEXT: movdqa %xmm3, %xmm8
; SSE2-NEXT: movdqa %xmm2, %xmm9		; SSE2-NEXT: movdqa %xmm2, %xmm9
; SSE2-NEXT: movdqa %xmm8, %xmm12		; SSE2-NEXT: movdqa %xmm3, %xmm12
; SSE2-NEXT: pcmpgtd %xmm7, %xmm12		; SSE2-NEXT: pcmpgtd %xmm7, %xmm12
; SSE2-NEXT: pcmpeqd %xmm13, %xmm13		; SSE2-NEXT: pcmpeqd %xmm13, %xmm13
; SSE2-NEXT: movdqa %xmm12, %xmm3		; SSE2-NEXT: movdqa %xmm12, %xmm3
; SSE2-NEXT: pxor %xmm13, %xmm3		; SSE2-NEXT: pxor %xmm13, %xmm3
; SSE2-NEXT: movdqa %xmm9, %xmm14		; SSE2-NEXT: movdqa %xmm2, %xmm14
; SSE2-NEXT: pcmpgtd %xmm6, %xmm14		; SSE2-NEXT: pcmpgtd %xmm6, %xmm14
; SSE2-NEXT: movdqa %xmm14, %xmm2		; SSE2-NEXT: movdqa %xmm14, %xmm2
; SSE2-NEXT: pxor %xmm13, %xmm2		; SSE2-NEXT: pxor %xmm13, %xmm2
; SSE2-NEXT: movdqa %xmm1, %xmm15		; SSE2-NEXT: movdqa %xmm1, %xmm15
; SSE2-NEXT: pcmpgtd %xmm5, %xmm15		; SSE2-NEXT: pcmpgtd %xmm5, %xmm15
; SSE2-NEXT: movdqa %xmm15, %xmm10		; SSE2-NEXT: movdqa %xmm15, %xmm10
; SSE2-NEXT: pxor %xmm13, %xmm10		; SSE2-NEXT: pxor %xmm13, %xmm10
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
[...]

define <16 x i32> @test116(<16 x i32> %a, <16 x i32> %b) {		define <16 x i32> @test116(<16 x i32> %a, <16 x i32> %b) {
; SSE2-LABEL: test116:		; SSE2-LABEL: test116:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm3, %xmm8		; SSE2-NEXT: movdqa %xmm3, %xmm8
; SSE2-NEXT: movdqa %xmm2, %xmm9		; SSE2-NEXT: movdqa %xmm2, %xmm9
; SSE2-NEXT: movdqa %xmm0, %xmm10		; SSE2-NEXT: movdqa %xmm0, %xmm10
; SSE2-NEXT: movdqa %xmm7, %xmm12		; SSE2-NEXT: movdqa %xmm7, %xmm12
; SSE2-NEXT: pcmpgtd %xmm8, %xmm12		; SSE2-NEXT: pcmpgtd %xmm3, %xmm12
; SSE2-NEXT: pcmpeqd %xmm0, %xmm0		; SSE2-NEXT: pcmpeqd %xmm0, %xmm0
; SSE2-NEXT: movdqa %xmm12, %xmm3		; SSE2-NEXT: movdqa %xmm12, %xmm3
; SSE2-NEXT: pxor %xmm0, %xmm3		; SSE2-NEXT: pxor %xmm0, %xmm3
; SSE2-NEXT: movdqa %xmm6, %xmm13		; SSE2-NEXT: movdqa %xmm6, %xmm13
; SSE2-NEXT: pcmpgtd %xmm9, %xmm13		; SSE2-NEXT: pcmpgtd %xmm2, %xmm13
; SSE2-NEXT: movdqa %xmm13, %xmm2		; SSE2-NEXT: movdqa %xmm13, %xmm2
; SSE2-NEXT: pxor %xmm0, %xmm2		; SSE2-NEXT: pxor %xmm0, %xmm2
; SSE2-NEXT: movdqa %xmm5, %xmm14		; SSE2-NEXT: movdqa %xmm5, %xmm14
; SSE2-NEXT: pcmpgtd %xmm1, %xmm14		; SSE2-NEXT: pcmpgtd %xmm1, %xmm14
; SSE2-NEXT: movdqa %xmm14, %xmm11		; SSE2-NEXT: movdqa %xmm14, %xmm11
; SSE2-NEXT: pxor %xmm0, %xmm11		; SSE2-NEXT: pxor %xmm0, %xmm11
; SSE2-NEXT: movdqa %xmm4, %xmm15		; SSE2-NEXT: movdqa %xmm4, %xmm15
; SSE2-NEXT: pcmpgtd %xmm10, %xmm15		; SSE2-NEXT: pcmpgtd %xmm10, %xmm15
[...]
entry:		entry:
%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b		%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test122(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test122(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test122:		; SSE2-LABEL: test122:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: movdqa %xmm8, -{{[0-9]+}}(%rsp) # 16-byte Spill		; SSE2-NEXT: movdqa %xmm7, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm7, %xmm0		; SSE2-NEXT: movdqa %xmm7, %xmm0
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
[...]
entry:		entry:
%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b		%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test124(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test124(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test124:		; SSE2-LABEL: test124:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm11		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill		; SSE2-NEXT: movdqa %xmm7, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm11, %xmm0		; SSE2-NEXT: movdqa %xmm11, %xmm0
[...]
entry:		entry:
%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b		%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test126(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test126(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test126:		; SSE2-LABEL: test126:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: movdqa %xmm8, -{{[0-9]+}}(%rsp) # 16-byte Spill		; SSE2-NEXT: movdqa %xmm7, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm7, %xmm0		; SSE2-NEXT: movdqa %xmm7, %xmm0
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
[...]
entry:		entry:
%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b		%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test128(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test128(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test128:		; SSE2-LABEL: test128:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm11		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill		; SSE2-NEXT: movdqa %xmm7, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm11, %xmm0		; SSE2-NEXT: movdqa %xmm11, %xmm0
[...]
; SSE2-LABEL: test130:		; SSE2-LABEL: test130:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm2, %xmm8		; SSE2-NEXT: movdqa %xmm2, %xmm8
; SSE2-NEXT: movdqa %xmm3, %xmm12		; SSE2-NEXT: movdqa %xmm3, %xmm12
; SSE2-NEXT: pcmpgtb %xmm7, %xmm12		; SSE2-NEXT: pcmpgtb %xmm7, %xmm12
; SSE2-NEXT: pcmpeqd %xmm13, %xmm13		; SSE2-NEXT: pcmpeqd %xmm13, %xmm13
; SSE2-NEXT: movdqa %xmm12, %xmm9		; SSE2-NEXT: movdqa %xmm12, %xmm9
; SSE2-NEXT: pxor %xmm13, %xmm9		; SSE2-NEXT: pxor %xmm13, %xmm9
; SSE2-NEXT: movdqa %xmm8, %xmm14		; SSE2-NEXT: movdqa %xmm2, %xmm14
; SSE2-NEXT: pcmpgtb %xmm6, %xmm14		; SSE2-NEXT: pcmpgtb %xmm6, %xmm14
; SSE2-NEXT: movdqa %xmm14, %xmm2		; SSE2-NEXT: movdqa %xmm14, %xmm2
; SSE2-NEXT: pxor %xmm13, %xmm2		; SSE2-NEXT: pxor %xmm13, %xmm2
; SSE2-NEXT: movdqa %xmm1, %xmm15		; SSE2-NEXT: movdqa %xmm1, %xmm15
; SSE2-NEXT: pcmpgtb %xmm5, %xmm15		; SSE2-NEXT: pcmpgtb %xmm5, %xmm15
; SSE2-NEXT: movdqa %xmm15, %xmm10		; SSE2-NEXT: movdqa %xmm15, %xmm10
; SSE2-NEXT: pxor %xmm13, %xmm10		; SSE2-NEXT: pxor %xmm13, %xmm10
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
[126 lines not shown]
; SSE2-NEXT: movdqa %xmm2, %xmm8		; SSE2-NEXT: movdqa %xmm2, %xmm8
; SSE2-NEXT: movdqa %xmm0, %xmm10		; SSE2-NEXT: movdqa %xmm0, %xmm10
; SSE2-NEXT: movdqa %xmm7, %xmm12		; SSE2-NEXT: movdqa %xmm7, %xmm12
; SSE2-NEXT: pcmpgtb %xmm3, %xmm12		; SSE2-NEXT: pcmpgtb %xmm3, %xmm12
; SSE2-NEXT: pcmpeqd %xmm0, %xmm0		; SSE2-NEXT: pcmpeqd %xmm0, %xmm0
; SSE2-NEXT: movdqa %xmm12, %xmm9		; SSE2-NEXT: movdqa %xmm12, %xmm9
; SSE2-NEXT: pxor %xmm0, %xmm9		; SSE2-NEXT: pxor %xmm0, %xmm9
; SSE2-NEXT: movdqa %xmm6, %xmm13		; SSE2-NEXT: movdqa %xmm6, %xmm13
; SSE2-NEXT: pcmpgtb %xmm8, %xmm13		; SSE2-NEXT: pcmpgtb %xmm2, %xmm13
; SSE2-NEXT: movdqa %xmm13, %xmm2		; SSE2-NEXT: movdqa %xmm13, %xmm2
; SSE2-NEXT: pxor %xmm0, %xmm2		; SSE2-NEXT: pxor %xmm0, %xmm2
; SSE2-NEXT: movdqa %xmm5, %xmm14		; SSE2-NEXT: movdqa %xmm5, %xmm14
; SSE2-NEXT: pcmpgtb %xmm1, %xmm14		; SSE2-NEXT: pcmpgtb %xmm1, %xmm14
; SSE2-NEXT: movdqa %xmm14, %xmm11		; SSE2-NEXT: movdqa %xmm14, %xmm11
; SSE2-NEXT: pxor %xmm0, %xmm11		; SSE2-NEXT: pxor %xmm0, %xmm11
; SSE2-NEXT: movdqa %xmm4, %xmm15		; SSE2-NEXT: movdqa %xmm4, %xmm15
; SSE2-NEXT: pcmpgtb %xmm10, %xmm15		; SSE2-NEXT: pcmpgtb %xmm10, %xmm15
[734 lines not shown]
; SSE2-LABEL: test146:		; SSE2-LABEL: test146:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm2, %xmm8		; SSE2-NEXT: movdqa %xmm2, %xmm8
; SSE2-NEXT: movdqa %xmm3, %xmm12		; SSE2-NEXT: movdqa %xmm3, %xmm12
; SSE2-NEXT: pcmpgtd %xmm7, %xmm12		; SSE2-NEXT: pcmpgtd %xmm7, %xmm12
; SSE2-NEXT: pcmpeqd %xmm13, %xmm13		; SSE2-NEXT: pcmpeqd %xmm13, %xmm13
; SSE2-NEXT: movdqa %xmm12, %xmm9		; SSE2-NEXT: movdqa %xmm12, %xmm9
; SSE2-NEXT: pxor %xmm13, %xmm9		; SSE2-NEXT: pxor %xmm13, %xmm9
; SSE2-NEXT: movdqa %xmm8, %xmm14		; SSE2-NEXT: movdqa %xmm2, %xmm14
; SSE2-NEXT: pcmpgtd %xmm6, %xmm14		; SSE2-NEXT: pcmpgtd %xmm6, %xmm14
; SSE2-NEXT: movdqa %xmm14, %xmm2		; SSE2-NEXT: movdqa %xmm14, %xmm2
; SSE2-NEXT: pxor %xmm13, %xmm2		; SSE2-NEXT: pxor %xmm13, %xmm2
; SSE2-NEXT: movdqa %xmm1, %xmm15		; SSE2-NEXT: movdqa %xmm1, %xmm15
; SSE2-NEXT: pcmpgtd %xmm5, %xmm15		; SSE2-NEXT: pcmpgtd %xmm5, %xmm15
; SSE2-NEXT: movdqa %xmm15, %xmm10		; SSE2-NEXT: movdqa %xmm15, %xmm10
; SSE2-NEXT: pxor %xmm13, %xmm10		; SSE2-NEXT: pxor %xmm13, %xmm10
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
[126 lines not shown]
; SSE2-NEXT: movdqa %xmm2, %xmm8		; SSE2-NEXT: movdqa %xmm2, %xmm8
; SSE2-NEXT: movdqa %xmm0, %xmm10		; SSE2-NEXT: movdqa %xmm0, %xmm10
; SSE2-NEXT: movdqa %xmm7, %xmm12		; SSE2-NEXT: movdqa %xmm7, %xmm12
; SSE2-NEXT: pcmpgtd %xmm3, %xmm12		; SSE2-NEXT: pcmpgtd %xmm3, %xmm12
; SSE2-NEXT: pcmpeqd %xmm0, %xmm0		; SSE2-NEXT: pcmpeqd %xmm0, %xmm0
; SSE2-NEXT: movdqa %xmm12, %xmm9		; SSE2-NEXT: movdqa %xmm12, %xmm9
; SSE2-NEXT: pxor %xmm0, %xmm9		; SSE2-NEXT: pxor %xmm0, %xmm9
; SSE2-NEXT: movdqa %xmm6, %xmm13		; SSE2-NEXT: movdqa %xmm6, %xmm13
; SSE2-NEXT: pcmpgtd %xmm8, %xmm13		; SSE2-NEXT: pcmpgtd %xmm2, %xmm13
; SSE2-NEXT: movdqa %xmm13, %xmm2		; SSE2-NEXT: movdqa %xmm13, %xmm2
; SSE2-NEXT: pxor %xmm0, %xmm2		; SSE2-NEXT: pxor %xmm0, %xmm2
; SSE2-NEXT: movdqa %xmm5, %xmm14		; SSE2-NEXT: movdqa %xmm5, %xmm14
; SSE2-NEXT: pcmpgtd %xmm1, %xmm14		; SSE2-NEXT: pcmpgtd %xmm1, %xmm14
; SSE2-NEXT: movdqa %xmm14, %xmm11		; SSE2-NEXT: movdqa %xmm14, %xmm11
; SSE2-NEXT: pxor %xmm0, %xmm11		; SSE2-NEXT: pxor %xmm0, %xmm11
; SSE2-NEXT: movdqa %xmm4, %xmm15		; SSE2-NEXT: movdqa %xmm4, %xmm15
; SSE2-NEXT: pcmpgtd %xmm10, %xmm15		; SSE2-NEXT: pcmpgtd %xmm10, %xmm15
[509 lines not shown]
entry:
%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a		%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test154(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test154(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test154:		; SSE2-LABEL: test154:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: movdqa %xmm8, -{{[0-9]+}}(%rsp) # 16-byte Spill		; SSE2-NEXT: movdqa %xmm7, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm7, %xmm0		; SSE2-NEXT: movdqa %xmm7, %xmm0
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
[255 lines not shown]
entry:
%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a		%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test156(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test156(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test156:		; SSE2-LABEL: test156:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm11		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill		; SSE2-NEXT: movdqa %xmm7, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm11, %xmm0		; SSE2-NEXT: movdqa %xmm11, %xmm0
[284 lines not shown]
entry:
%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a		%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test158(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test158(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test158:		; SSE2-LABEL: test158:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: movdqa %xmm8, -{{[0-9]+}}(%rsp) # 16-byte Spill		; SSE2-NEXT: movdqa %xmm7, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm7, %xmm0		; SSE2-NEXT: movdqa %xmm7, %xmm0
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
[309 lines not shown]
entry:
%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a		%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test160(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test160(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test160:		; SSE2-LABEL: test160:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm11		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill		; SSE2-NEXT: movdqa %xmm7, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm11, %xmm0		; SSE2-NEXT: movdqa %xmm11, %xmm0
[1,763 lines not shown]
; SSE2-NEXT: por %xmm3, %xmm2		; SSE2-NEXT: por %xmm3, %xmm2
; SSE2-NEXT: movdqa %xmm2, %xmm0		; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE4-LABEL: test180:		; SSE4-LABEL: test180:
; SSE4: # BB#0: # %entry		; SSE4: # BB#0: # %entry
; SSE4-NEXT: movdqa %xmm0, %xmm2		; SSE4-NEXT: movdqa %xmm0, %xmm2
; SSE4-NEXT: movdqa %xmm1, %xmm3		; SSE4-NEXT: movdqa %xmm1, %xmm3
; SSE4-NEXT: pcmpgtq %xmm2, %xmm3		; SSE4-NEXT: pcmpgtq %xmm0, %xmm3
; SSE4-NEXT: pcmpeqd %xmm0, %xmm0		; SSE4-NEXT: pcmpeqd %xmm0, %xmm0
; SSE4-NEXT: pxor %xmm3, %xmm0		; SSE4-NEXT: pxor %xmm3, %xmm0
; SSE4-NEXT: blendvpd %xmm0, %xmm2, %xmm1		; SSE4-NEXT: blendvpd %xmm0, %xmm2, %xmm1
; SSE4-NEXT: movapd %xmm1, %xmm0		; SSE4-NEXT: movapd %xmm1, %xmm0
; SSE4-NEXT: retq		; SSE4-NEXT: retq
;		;
; AVX1-LABEL: test180:		; AVX1-LABEL: test180:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
[462 lines not shown]
; SSE2-NEXT: por %xmm3, %xmm2		; SSE2-NEXT: por %xmm3, %xmm2
; SSE2-NEXT: movdqa %xmm2, %xmm0		; SSE2-NEXT: movdqa %xmm2, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSE4-LABEL: test188:		; SSE4-LABEL: test188:
; SSE4: # BB#0: # %entry		; SSE4: # BB#0: # %entry
; SSE4-NEXT: movdqa %xmm0, %xmm2		; SSE4-NEXT: movdqa %xmm0, %xmm2
; SSE4-NEXT: movdqa %xmm1, %xmm3		; SSE4-NEXT: movdqa %xmm1, %xmm3
; SSE4-NEXT: pcmpgtq %xmm2, %xmm3		; SSE4-NEXT: pcmpgtq %xmm0, %xmm3
; SSE4-NEXT: pcmpeqd %xmm0, %xmm0		; SSE4-NEXT: pcmpeqd %xmm0, %xmm0
; SSE4-NEXT: pxor %xmm3, %xmm0		; SSE4-NEXT: pxor %xmm3, %xmm0
; SSE4-NEXT: blendvpd %xmm0, %xmm1, %xmm2		; SSE4-NEXT: blendvpd %xmm0, %xmm1, %xmm2
; SSE4-NEXT: movapd %xmm2, %xmm0		; SSE4-NEXT: movapd %xmm2, %xmm0
; SSE4-NEXT: retq		; SSE4-NEXT: retq
;		;
; AVX1-LABEL: test188:		; AVX1-LABEL: test188:
; AVX1: # BB#0: # %entry		; AVX1: # BB#0: # %entry
[282 lines not shown]

test/CodeGen/X86/widen_conv-3.ll

	[68 lines not shown]
	; X86-SSE2-NEXT: movzbl 2(%ecx), %ecx			; X86-SSE2-NEXT: movzbl 2(%ecx), %ecx
	; X86-SSE2-NEXT: pinsrw $1, %ecx, %xmm0			; X86-SSE2-NEXT: pinsrw $1, %ecx, %xmm0
	; X86-SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]			; X86-SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
	; X86-SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; X86-SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]
	; X86-SSE2-NEXT: psrad $24, %xmm0			; X86-SSE2-NEXT: psrad $24, %xmm0
	; X86-SSE2-NEXT: cvtdq2ps %xmm0, %xmm0			; X86-SSE2-NEXT: cvtdq2ps %xmm0, %xmm0
	; X86-SSE2-NEXT: movss %xmm0, (%eax)			; X86-SSE2-NEXT: movss %xmm0, (%eax)
	; X86-SSE2-NEXT: movaps %xmm0, %xmm1			; X86-SSE2-NEXT: movaps %xmm0, %xmm1
	; X86-SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]			; X86-SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]
	; X86-SSE2-NEXT: movss %xmm1, 8(%eax)			; X86-SSE2-NEXT: movss %xmm1, 8(%eax)
	; X86-SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,2,3]			; X86-SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; X86-SSE2-NEXT: movss %xmm0, 4(%eax)			; X86-SSE2-NEXT: movss %xmm0, 4(%eax)
	; X86-SSE2-NEXT: leal -4(%ebp), %esp			; X86-SSE2-NEXT: leal -4(%ebp), %esp
	; X86-SSE2-NEXT: popl %esi			; X86-SSE2-NEXT: popl %esi
	; X86-SSE2-NEXT: popl %ebp			; X86-SSE2-NEXT: popl %ebp
	; X86-SSE2-NEXT: retl			; X86-SSE2-NEXT: retl
	;			;
	[61 lines not shown]

test/CodeGen/X86/widen_conv-4.ll

	[13 lines not shown]
	; X86-SSE2-NEXT: movdqa %xmm0, %xmm2			; X86-SSE2-NEXT: movdqa %xmm0, %xmm2
	; X86-SSE2-NEXT: punpckhwd {{.*#+}} xmm2 = xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7]			; X86-SSE2-NEXT: punpckhwd {{.*#+}} xmm2 = xmm2[4],xmm1[4],xmm2[5],xmm1[5],xmm2[6],xmm1[6],xmm2[7],xmm1[7]
	; X86-SSE2-NEXT: cvtdq2ps %xmm2, %xmm2			; X86-SSE2-NEXT: cvtdq2ps %xmm2, %xmm2
	; X86-SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; X86-SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; X86-SSE2-NEXT: cvtdq2ps %xmm0, %xmm0			; X86-SSE2-NEXT: cvtdq2ps %xmm0, %xmm0
	; X86-SSE2-NEXT: movups %xmm0, (%eax)			; X86-SSE2-NEXT: movups %xmm0, (%eax)
	; X86-SSE2-NEXT: movss %xmm2, 16(%eax)			; X86-SSE2-NEXT: movss %xmm2, 16(%eax)
	; X86-SSE2-NEXT: movaps %xmm2, %xmm0			; X86-SSE2-NEXT: movaps %xmm2, %xmm0
	; X86-SSE2-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]			; X86-SSE2-NEXT: movhlps {{.*#+}} xmm0 = xmm2[1],xmm0[1]
	; X86-SSE2-NEXT: movss %xmm0, 24(%eax)			; X86-SSE2-NEXT: movss %xmm0, 24(%eax)
	; X86-SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,1,2,3]			; X86-SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[1,1,2,3]
	; X86-SSE2-NEXT: movss %xmm2, 20(%eax)			; X86-SSE2-NEXT: movss %xmm2, 20(%eax)
	; X86-SSE2-NEXT: retl			; X86-SSE2-NEXT: retl
	;			;
	; X86-SSE42-LABEL: convert_v7i16_v7f32:			; X86-SSE42-LABEL: convert_v7i16_v7f32:
	; X86-SSE42: # BB#0: # %entry			; X86-SSE42: # BB#0: # %entry
	; X86-SSE42-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SSE42-NEXT: movl {{[0-9]+}}(%esp), %eax
	[64 lines not shown]
	; X86-SSE2-NEXT: movzbl 2(%ecx), %ecx			; X86-SSE2-NEXT: movzbl 2(%ecx), %ecx
	; X86-SSE2-NEXT: pinsrw $1, %ecx, %xmm0			; X86-SSE2-NEXT: pinsrw $1, %ecx, %xmm0
	; X86-SSE2-NEXT: pxor %xmm1, %xmm1			; X86-SSE2-NEXT: pxor %xmm1, %xmm1
	; X86-SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; X86-SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; X86-SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; X86-SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; X86-SSE2-NEXT: cvtdq2ps %xmm0, %xmm0			; X86-SSE2-NEXT: cvtdq2ps %xmm0, %xmm0
	; X86-SSE2-NEXT: movss %xmm0, (%eax)			; X86-SSE2-NEXT: movss %xmm0, (%eax)
	; X86-SSE2-NEXT: movaps %xmm0, %xmm1			; X86-SSE2-NEXT: movaps %xmm0, %xmm1
	; X86-SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm1[1,1]			; X86-SSE2-NEXT: movhlps {{.*#+}} xmm1 = xmm0[1],xmm1[1]
	; X86-SSE2-NEXT: movss %xmm1, 8(%eax)			; X86-SSE2-NEXT: movss %xmm1, 8(%eax)
	; X86-SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,2,3]			; X86-SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; X86-SSE2-NEXT: movss %xmm0, 4(%eax)			; X86-SSE2-NEXT: movss %xmm0, 4(%eax)
	; X86-SSE2-NEXT: leal -4(%ebp), %esp			; X86-SSE2-NEXT: leal -4(%ebp), %esp
	; X86-SSE2-NEXT: popl %esi			; X86-SSE2-NEXT: popl %esi
	; X86-SSE2-NEXT: popl %ebp			; X86-SSE2-NEXT: popl %ebp
	; X86-SSE2-NEXT: retl			; X86-SSE2-NEXT: retl
	;			;
	[59 lines not shown]

test/CodeGen/X86/x86-shrink-wrap-unwind.ll

	[17 lines not shown]
	;			;
	; Prologue code.			; Prologue code.
	; (What we push does not matter. It should be some random sratch register.)			; (What we push does not matter. It should be some random sratch register.)
	; CHECK: pushq			; CHECK: pushq
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; After the prologue is set.			; After the prologue is set.
	; CHECK: movl %edi, [[ARG0CPY:%e[a-z]+]]			; CHECK: movl %edi, [[ARG0CPY:%e[a-z]+]]
	; CHECK-NEXT: cmpl %esi, [[ARG0CPY]]			; CHECK-NEXT: cmpl %esi, %edi
	; CHECK-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]			; CHECK-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]
	;			;
	; Store %a in the alloca.			; Store %a in the alloca.
	; CHECK: movl [[ARG0CPY]], 4(%rsp)			; CHECK: movl [[ARG0CPY]], 4(%rsp)
	; Set the alloca address in the second argument.			; Set the alloca address in the second argument.
	; CHECK-NEXT: leaq 4(%rsp), %rsi			; CHECK-NEXT: leaq 4(%rsp), %rsi
	; Set the first argument to zero.			; Set the first argument to zero.
	; CHECK-NEXT: xorl %edi, %edi			; CHECK-NEXT: xorl %edi, %edi
	[29 lines not shown]
	; CHECK-LABEL: frameUnwind:			; CHECK-LABEL: frameUnwind:
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; No prologue needed.			; No prologue needed.
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; After the prologue is set.			; After the prologue is set.
	; CHECK: movl %edi, [[ARG0CPY:%e[a-z]+]]			; CHECK: movl %edi, [[ARG0CPY:%e[a-z]+]]
	; CHECK-NEXT: cmpl %esi, [[ARG0CPY]]			; CHECK-NEXT: cmpl %esi, %edi
	; CHECK-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]			; CHECK-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]
	;			;
	; Prologue code.			; Prologue code.
	; CHECK: pushq %rbp			; CHECK: pushq %rbp
	; CHECK: movq %rsp, %rbp			; CHECK: movq %rsp, %rbp
	;			;
	; Store %a in the alloca.			; Store %a in the alloca.
	; CHECK: movl [[ARG0CPY]], -4(%rbp)			; CHECK: movl [[ARG0CPY]], -4(%rbp)
	[29 lines not shown]
	; CHECK-LABEL: framelessnoUnwind:			; CHECK-LABEL: framelessnoUnwind:
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; No prologue needed.			; No prologue needed.
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; After the prologue is set.			; After the prologue is set.
	; CHECK: movl %edi, [[ARG0CPY:%e[a-z]+]]			; CHECK: movl %edi, [[ARG0CPY:%e[a-z]+]]
	; CHECK-NEXT: cmpl %esi, [[ARG0CPY]]			; CHECK-NEXT: cmpl %esi, %edi
	; CHECK-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]			; CHECK-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]
	;			;
	; Prologue code.			; Prologue code.
	; (What we push does not matter. It should be some random sratch register.)			; (What we push does not matter. It should be some random sratch register.)
	; CHECK: pushq			; CHECK: pushq
	;			;
	; Store %a in the alloca.			; Store %a in the alloca.
	; CHECK: movl [[ARG0CPY]], 4(%rsp)			; CHECK: movl [[ARG0CPY]], 4(%rsp)
	[98 lines not shown]

test/CodeGen/X86/x86-shrink-wrapping.ll

	[11 lines not shown]


	; Initial motivating example: Simple diamond with a call just on one side.			; Initial motivating example: Simple diamond with a call just on one side.
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; No prologue needed.			; No prologue needed.
	; ENABLE: movl %edi, [[ARG0CPY:%e[a-z]+]]			; ENABLE: movl %edi, [[ARG0CPY:%e[a-z]+]]
	; ENABLE-NEXT: cmpl %esi, [[ARG0CPY]]			; ENABLE-NEXT: cmpl %esi, %edi
	; ENABLE-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]			; ENABLE-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]
	;			;
	; Prologue code.			; Prologue code.
	; (What we push does not matter. It should be some random sratch register.)			; (What we push does not matter. It should be some random sratch register.)
	; CHECK: pushq			; CHECK: pushq
	;			;
	; Compare the arguments and jump to exit.			; Compare the arguments and jump to exit.
	; After the prologue is set.			; After the prologue is set.
	; DISABLE: movl %edi, [[ARG0CPY:%e[a-z]+]]			; DISABLE: movl %edi, [[ARG0CPY:%e[a-z]+]]
	; DISABLE-NEXT: cmpl %esi, [[ARG0CPY]]			; DISABLE-NEXT: cmpl %esi, %edi
	; DISABLE-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]			; DISABLE-NEXT: jge [[EXIT_LABEL:LBB[0-9_]+]]
	;			;
	; Store %a in the alloca.			; Store %a in the alloca.
	; CHECK: movl [[ARG0CPY]], 4(%rsp)			; CHECK: movl [[ARG0CPY]], 4(%rsp)
	; Set the alloca address in the second argument.			; Set the alloca address in the second argument.
	; CHECK-NEXT: leaq 4(%rsp), %rsi			; CHECK-NEXT: leaq 4(%rsp), %rsi
	; Set the first argument to zero.			; Set the first argument to zero.
	; CHECK-NEXT: xorl %edi, %edi			; CHECK-NEXT: xorl %edi, %edi
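
The check-line updates above share one pattern: a use of a COPY's destination is rewritten to the COPY's source, as when `cmpl %esi, [[ARG0CPY]]` becomes `cmpl %esi, %edi`. The toy model below is a minimal sketch of that idea only. It is not the code in lib/CodeGen/MachineCopyPropagation.cpp; the `Inst` struct and `forwardCopySources` are hypothetical names, and the sketch assumes a single straight-line block with plain string register names, ignoring sub-registers, register classes, tied or implicit operands, and any target-specific legality checks a real pass would need.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy instruction for illustration; this is NOT LLVM's MachineInstr.
struct Inst {
  std::string op;                 // e.g. "COPY", "cmp", "xor"
  std::vector<std::string> defs;  // registers written
  std::vector<std::string> uses;  // registers read
};

// Forward COPY sources to later uses within one straight-line block.
void forwardCopySources(std::vector<Inst> &block) {
  // Copies still valid at the current point: destination -> source.
  std::map<std::string, std::string> avail;
  for (Inst &I : block) {
    // Rewrite each use of a copy destination to read the copy source.
    for (std::string &U : I.uses) {
      auto It = avail.find(U);
      if (It != avail.end())
        U = It->second;
    }
    // A write to a register invalidates every copy that involves it.
    for (const std::string &D : I.defs) {
      for (auto It = avail.begin(); It != avail.end();) {
        if (It->first == D || It->second == D)
          It = avail.erase(It);
        else
          ++It;
      }
    }
    // Record this instruction's own copy after the invalidation step.
    if (I.op == "COPY" && I.defs.size() == 1 && I.uses.size() == 1 &&
        I.defs[0] != I.uses[0])
      avail[I.defs[0]] = I.uses[0];
  }
}
```

In this model, a block of `COPY eax <- edi`, `cmp esi, eax`, `xor edi`, `store eax` has the compare rewritten to read `edi`, while the store keeps reading `eax` because `edi` was overwritten in between. The COPY itself is left in place; deleting copies that become dead is a separate step that this sketch does not attempt.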
	[995 lines not shown]

Revision Contents

Diff 104212

include/llvm/CodeGen/Passes.h

include/llvm/InitializePasses.h

lib/CodeGen/CodeGen.cpp

lib/CodeGen/MachineCopyPropagation.cpp

lib/CodeGen/TargetPassConfig.cpp

test/CodeGen/AArch64/aarch64-fold-lslfast.ll

test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll

test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll

test/CodeGen/AArch64/f16-instructions.ll

test/CodeGen/AArch64/flags-multiuse.ll

test/CodeGen/AArch64/merge-store-dependency.ll

test/CodeGen/AArch64/neg-imm.ll

test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size.ll

test/CodeGen/AMDGPU/attr-amdgpu-waves-per-eu.ll

test/CodeGen/AMDGPU/mubuf-offset-private.ll

test/CodeGen/AMDGPU/multilevel-break.ll

test/CodeGen/AMDGPU/private-access-no-objects.ll

test/CodeGen/AMDGPU/ret.ll

test/CodeGen/AMDGPU/scratch-simple.ll

test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll

test/CodeGen/ARM/atomic-op.ll

test/CodeGen/ARM/swifterror.ll

test/CodeGen/Mips/llvm-ir/sub.ll

test/CodeGen/PowerPC/fma-mutate.ll

test/CodeGen/PowerPC/inlineasm-i64-reg.ll

test/CodeGen/PowerPC/tail-dup-layout.ll

test/CodeGen/SPARC/32abi.ll

test/CodeGen/SPARC/atomics.ll

test/CodeGen/Thumb/thumb-shrink-wrapping.ll

test/CodeGen/X86/2006-03-01-InstrSchedBug.ll

test/CodeGen/X86/arg-copy-elide.ll

test/CodeGen/X86/avg.ll

test/CodeGen/X86/avx512-bugfix-25270.ll

test/CodeGen/X86/avx512-calling-conv.ll

test/CodeGen/X86/avx512-mask-op.ll

test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll

test/CodeGen/X86/buildvec-insertvec.ll

test/CodeGen/X86/combine-fcopysign.ll

test/CodeGen/X86/complex-fastmath.ll

test/CodeGen/X86/divide-by-constant.ll

test/CodeGen/X86/fmaxnum.ll

test/CodeGen/X86/fminnum.ll

test/CodeGen/X86/fp128-i128.ll

test/CodeGen/X86/haddsub-2.ll

test/CodeGen/X86/haddsub-undef.ll

test/CodeGen/X86/inline-asm-fpstack.ll

test/CodeGen/X86/ipra-local-linkage.ll

test/CodeGen/X86/localescape.ll

test/CodeGen/X86/mul-i1024.ll

test/CodeGen/X86/mul-i512.ll

test/CodeGen/X86/mul128.ll

test/CodeGen/X86/pmul.ll

test/CodeGen/X86/powi.ll

test/CodeGen/X86/pr11334.ll

test/CodeGen/X86/pr29112.ll

test/CodeGen/X86/psubus.ll

test/CodeGen/X86/select.ll

test/CodeGen/X86/shrink-wrap-chkstk.ll

test/CodeGen/X86/sqrt-fastmath.ll

test/CodeGen/X86/sse-scalar-fp-arith.ll

test/CodeGen/X86/sse1.ll

test/CodeGen/X86/sse3-avx-addsub-2.ll

test/CodeGen/X86/statepoint-live-in.ll

test/CodeGen/X86/statepoint-stack-usage.ll

test/CodeGen/X86/vec_fp_to_int.ll

test/CodeGen/X86/vec_int_to_fp.ll

test/CodeGen/X86/vec_minmax_sint.ll

test/CodeGen/X86/vec_shift4.ll

test/CodeGen/X86/vector-blend.ll

test/CodeGen/X86/vector-idiv-sdiv-128.ll

test/CodeGen/X86/vector-idiv-udiv-128.ll

test/CodeGen/X86/vector-rotate-128.ll

test/CodeGen/X86/vector-sext.ll
