This is an archive of the discontinued LLVM Phabricator instance.

Differential D18046

[X86] Providing correct unwind info in function epilogue
ClosedPublic

Authored by violetav on Mar 10 2016, 7:14 AM.

Details

Reviewers

MatzeB
mkuper
iteratee
rnk

Commits

rG7b3a38ec306c: [X86] Correct dwarf unwind information in function epilogue
rL306529: [X86] Correct dwarf unwind information in function epilogue

Summary

This patch contains a pass that runs after basic block layout and inserts CFI instructions in the epilogue. It assumes that CFI instructions have already been inserted in the prologue. It finds the offset/register values set there and uses them to compute the offset/register values that should be set in the epilogue. It also handles the case of multiple epilogues, or an epilogue block in the middle of the function, by adding CFI instructions where needed.

Diff Detail

Repository: rL LLVM

Event Timeline

violetav updated this revision to Diff 50273. · Mar 10 2016, 7:14 AM

violetav retitled this revision from to [X86] Providing correct unwind info in function epilogue.

violetav updated this object.

violetav added reviewers: rnk, mkuper.

violetav set the repository for this revision to rL LLVM.

violetav added subscribers: DavidKreitzer, majnemer, petarj, llvm-commits.

violetav added subscribers: ivanbaev, rankov. · Mar 10 2016, 7:17 AM

Hi Violeta,

I'm in the middle of the review, but there's one thing I want to say about the general approach. What bothers me a bit is that recognizing the instructions that modify the stack pointer is separated from where they are actually emitted. So, if we start making sp changes in the epilogue in some new, unpredictable way, this will break. On the other hand, I understand the need for doing this post-block-layout. (Unfortunately, I missed the original email, only found it now.)

I've been thinking, though - if we're ok with assuming that the sp only changes using a specific set of instructions, why not emit all cfa adjustment post-layout, instead of during frame lowering? And if we're not ok with that, then perhaps we need to solve the epilogue problem in a different way as well?

lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
577 ↗	(On Diff #50273)	Have you checked this with the Darwin people? I don't know enough (=anything) about compact encoding, so can't really say whether these changes make sense or not.
lib/Target/X86/X86CFIInstrInserter.cpp
102 ↗	(On Diff #50273)	The LLVM convention is to generally frown upon braces for single-line blocks, except when needed for readability. This isn't a hard-and-fast rule (and clang-format does not enforce it), but to be consistent with the rest of the code base, I'd suggest removing them.
102 ↗	(On Diff #50273)	What if we don't have debug info, but have EH? On the one hand, I'm not sure whether we care about CFA in the epilogue being correct for EH. On the other hand, I think we've already decided that we don't want -g being passed to cause LLVM to modify .eh_frame. And since we currently can't generate different .eh_frame and .debug_frame, this will happen.
108 ↗	(On Diff #50273)	Why not TRI->getStackRegister(MF)?
113 ↗	(On Diff #50273)	I think TRI->getPtrSizedFrameRegister(MF) already does what you want.
123 ↗	(On Diff #50273)	Why are these two on the heap?
124 ↗	(On Diff #50273)	Actually, why is this a member? Isn't it local to AnalyzeMBB()?
128 ↗	(On Diff #50273)	Can you use a range for?
165 ↗	(On Diff #50273)	This doesn't look like it's used anywhere.
181 ↗	(On Diff #50273)	Maybe have CFIInstructions[CFIIndex].getOperation() as a temp variable?
219 ↗	(On Diff #50273)	This doesn't look right. Why would the first block that has pops be the epilogue? More generally, will it make sense to mark the epilogue block while it's emitted? Sure, it can move it around, but do we expect it to be split? If we do, and can't mark it ahead of time, then I think we need a better way to recognize it.
test/CodeGen/X86/epilogue-cfi-fp.ll
2	Any chance the tests can be made smaller? E.g. even if we need debug info to be available, we don't actually need the debug info, only the flag, right?

mkuper added inline comments. · Mar 10 2016, 4:31 PM

lib/Target/X86/X86CFIInstrInserter.cpp
322 ↗	(On Diff #50273)	Reformat this to make the comment look better? (I guess there was an 80-char violation there, and clang-format broke the comment up?)
332 ↗	(On Diff #50273)	Don't you have this in StackPtr already?
347 ↗	(On Diff #50273)	You mean just "pops", right? It's not a "pop %esp".
348 ↗	(On Diff #50273)	Same as above re comment.
354 ↗	(On Diff #50273)	Oh, so you did mean "pop %esp"? If not, then shouldn't this be SlotSize? If you did, then is the check for the pop argument missing? And what happens for a pop to a different register?
365 ↗	(On Diff #50273)	Can we maybe have different code only to get the offset based on the opcode, and then use the same code to actually construct the CFI instruction? It looks like most of the inside of these 3 ifs is duplicated.

violetav added inline comments. · Mar 17 2016, 10:29 AM

lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
577 ↗	(On Diff #50273)	No, I haven't. I will post a question to llvm-dev.
lib/Target/X86/X86CFIInstrInserter.cpp
102 ↗	(On Diff #50273)	I didn't know about the decision concerning -g and .eh_frame. Yes, these changes would modify .eh_frame now, since there is no support for saying which CFI directives should go in .eh_frame, and which in .debug_frame. Are there any plans for adding the support for generating different .eh_frame and .debug_frame?
123 ↗	(On Diff #50273)	I changed BlocksToAnalyze so that it is no longer on the heap, as there was no particular reason for it. As for MBBInfoList, I wanted to create an array and index its elements, so I created one on the heap because its length is variable (the number of MBBs in the MF). Should I use SmallVector or something else instead?
219 ↗	(On Diff #50273)	FoundEpilogue is a badly named variable. The thing I am trying to recognize here isn't 'the epilogue', instead, I'm looking for each BB that contains instructions that change the value of SP (or FP). That is what I consider to be 'an epilogue' here. The flag FoundEpilogue would probably be better called something such as ShouldUpdateCFI, because it is set to true when a BB contains instructions that cause the need for adding CFI instructions that update values of offset/register. As for marking the epilogue block while it's emitted, in addition to it moving around, yes, it can be split, it can also 'disappear' (by merging into the previous block). In my opinion, maintaining info about a BB being an epilogue during all passes would be complicated and error prone.
354 ↗	(On Diff #50273)	No, the comment is wrong, I did not mean "pop %esp". InitialOffset is set to SlotSize at the beginning.

Thank you for the review, Michael!

One reason I have come across against inserting CFI instructions in the epilogue during frame lowering is that their presence in the epilogue block interferes with the tail duplication pass: it no longer duplicates blocks that it otherwise would, and therefore produces different code.

Generating all cfa adjustment post-layout sounds like a good idea. However, if support for generating different .eh_frame and .debug_frame is added, I am not sure if moving all CFA adjustments to a late pass would somehow affect the decision on which CFI directive should end up in .eh_frame and which shouldn't. Maybe generating CFI directives for .eh_frame could be left as it is right now (done in frame lowering), and generating all CFI for .debug_frame could be done in a late pass.

I addressed your comments in the patch.

Comments inline.

Regardless of those comments, I'm still not sure I'm comfortable with the approach.
Reid, I think you originally supported this on the mailing list. Can you please take a look and see whether this is what you had in mind?

lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
577 ↗	(On Diff #50955)	May also be worthwhile to add someone relevant to this review.
lib/Target/X86/X86CFIInstrInserter.cpp
103 ↗	(On Diff #50955)	I don't think there are any plans to do that right now, unfortunately. Is there anything preventing you from generating this for .eh_frame?
124 ↗	(On Diff #50955)	Or an std::vector - I don't think SmallVector is too appropriate here, there's no reason for MF.size() to be small.
202 ↗	(On Diff #50955)	Can we maybe rip this whole if out into a separate function that returns ShouldUpdateCFI?
220 ↗	(On Diff #50955)	Can you also update the comments/names, as well as the comment in the class definition that explains what InsertCFIInEpilogue() does?
238 ↗	(On Diff #50955)	This still seems somewhat odd. The logic in InsertCFIInEpilogue made sense to me, when it was meant specifically for epilogues. But now, this will affect every block - including non-epilogue blocks that happen to pop / change esp. If I understand what's going on correctly, this will add a .cfi_def_cfa_offset at the end of each block that modifies the cfa offset. But doesn't CorrectCFA already do this, regardless? Or am I getting something wrong here? In any case, perhaps it will be easier to understand once the names and comments are updated.
355 ↗	(On Diff #50955)	It may be so, but this has nothing to do with the initial offset - this is explicitly the slot size.
test/CodeGen/X86/epilogue-cfi-fp.ll
3	This is a good start, but the tests still look like they contain much more than needed to check the specific things they check. Both in terms of debug info and attributes, and, for the EH test, a lot of code that seems to me to be redundant.

violetav added inline comments. · Apr 11 2016, 10:08 AM

lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
577 ↗	(On Diff #50955)	I posted a question to llvm-dev and the response http://lists.llvm.org/pipermail/llvm-dev/2016-March/097044.html made me think about the possible solution to the problem concerning generating compact unwind encoding as well as the problem of impacting eh_frame in the case when -g is passed. The solution consisted of marking CFI instructions with "frame destroy/frame setup" tags and then disregarding the CFI instructions marked with 'frame destroy' when generating compact unwind, or emitting eh_frame. However, the problem with this solution is the case when there are already generated .s files, and then .o files are generated: in that case, none of the CFI instructions are marked with 'frame setup/frame destroy' tags, because information about the tags does not exist in the .s file. Because of this, and because the response on the mailing list said that "The Darwin compact unwinder is never going to learn anything to its benefit from a CFI instruction in the epilogue.", I decided to disable the pass for Darwin. Another thing that I found out is that when using gas to assemble the .s file generated by llvm, CFI instructions added to the epilogue would also be inserted in the eh_frame.
lib/Target/X86/X86CFIInstrInserter.cpp
103 ↗	(On Diff #50955)	No, there isn't. The CFI instructions that I'm generating already end up in eh_frame.
238 ↗	(On Diff #50955)	No, InsertCFIInEpilogue() will add appropriate CFI instructions after each instruction that changes the SP (FP). For example, it will add a .cfi_def_cfa_offset with the updated offset after each pop instruction. I added a check to see whether instructions that cause the CFI instructions to be inserted are marked as FrameDestroy. That way, it can be assumed that they are part of the epilogue. On the other hand, CorrectCFA() inserts a CFI instruction at the beginning of a MBB if it's needed. The case when this is needed is when epilogue block (with updated CFI) comes before another MBB that should have the offset that is set by the prologue. Then, a cfi_def_cfa_offset is inserted at the beginning of that MBB so it would override the offset set in the epilogue, and therefore provide correct offset for the MBB in question.
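
As a rough illustration of the per-instruction bookkeeping described for InsertCFIInEpilogue(), the sketch below walks one block and emits an updated .cfi_def_cfa_offset after every FrameDestroy instruction that moves the stack pointer. It is a toy model, not the code under review: the `Instr` type, the `insertEpilogueCFI` name, and the textual output are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction model (hypothetical; not LLVM's MachineInstr).
enum class Op { Pop, AddSP, Ret, Other };

struct Instr {
  Op Kind;
  int Imm;           // bytes added to the stack pointer (AddSP only)
  bool FrameDestroy; // set on instructions emitted as part of the epilogue
};

// Returns the block as text, with a .cfi_def_cfa_offset line after every
// FrameDestroy instruction that changes the stack pointer.
std::vector<std::string> insertEpilogueCFI(const std::vector<Instr> &Block,
                                           int CfaOffset, int SlotSize) {
  std::vector<std::string> Out;
  for (const Instr &I : Block) {
    int Delta = 0;
    switch (I.Kind) {
    case Op::Pop:
      Out.push_back("pop");
      Delta = SlotSize;
      break;
    case Op::AddSP:
      Out.push_back("add $" + std::to_string(I.Imm) + ", %sp");
      Delta = I.Imm;
      break;
    case Op::Ret:
      Out.push_back("ret");
      break;
    case Op::Other:
      Out.push_back("other");
      break;
    }
    if (I.FrameDestroy && Delta != 0) {
      CfaOffset -= Delta;
      // The return address slot must still be below the CFA.
      assert(CfaOffset >= SlotSize && "popped past the return address");
      Out.push_back(".cfi_def_cfa_offset " + std::to_string(CfaOffset));
    }
  }
  return Out;
}
```

Keying the update on the FrameDestroy flag rather than on the opcode alone is what keeps a pop in the function body from being mistaken for part of an epilogue.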

Hi Michael,
Thank you for the review.
I addressed your comments in the patch and answered your questions inline.

I would like to explain why I went with this approach.
My first solution was to add the CFI instructions during emitEpilogue() in frame lowering. I realized that I would need to cover the cases when epilogue was in the middle of the function (when it was setting the wrong offset for blocks below it). The solution for that problem was to add a pass that would run after block placement and insert additional CFI instructions that correct the offset for BBs that have the wrong offset set by the epilogue above them (this is basically what the CorrectCFA method does now). However, then I found out another problem with inserting CFI instructions during emitEpilogue(): they interfered with later passes, e.g. the tail duplication pass wouldn't duplicate blocks that contained them. That is when I decided to move the insertion of CFI instructions in epilogue to the implemented pass.

What is it about this approach that you are not comfortable with? Do you have any suggestions on how it could be improved?

Sorry, I'm late to the party here (I was about halfway through implementing this myself before I found this). I'd also like to understand why a separate pass is needed. For the epilogue in the middle of a function problem, why are save/restore CFI instructions not sufficient? Can you elaborate on the problem in TailDuplication?

Rather than using a heuristic to decide what is an epilogue, why not check the FrameDestroy flag on the instructions?

With respect to the Darwin problem, the current solution that's implemented in X86AsmBackend doesn't seem great to me (since it generically doesn't handle cfi instructions in epilogues). Perhaps the correct solution is to have a heuristic that detects which cfi instructions are part of the prologue? This wouldn't be a problem of course if compact unwind info were generated earlier (as it used to be before rL190290), but the concern about not getting compact unwind info from the .s if we could have gotten it from the .ll is quite valid. How does GCC handle this situation?

Also, minor review comment inline.

lib/Target/X86/X86CFIInstrInserter.cpp
112 ↗	(On Diff #53270)	Have you checked that this works with x32? I saw problems there in my implementation.

Two more comments I found while testing this out.

lib/Target/X86/X86CFIInstrInserter.cpp
105 ↗	(On Diff #53270)	`&& !MF->getFunction()->needsUnwindTableEntry()` would be good. That's what's used elsewhere.
377 ↗	(On Diff #53270)	Missing `X86::ADD64ri32`. A sufficiently large alloca would probably work for a test.

The problem with TailDuplication is the main reason why a separate pass is needed. In shouldTailDuplicate() method, while going through instructions in the BB, the pass finds CFI instructions (that are marked as NotDuplicable in Target.td) and decides not to duplicate the given BB.
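
The effect described above can be modelled in a few lines. This is a deliberately reduced sketch: the real shouldTailDuplicate() applies other criteria as well, and `ToyInstr` and `shouldTailDuplicateToy` are hypothetical names used only here.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct ToyInstr {
  bool IsCFI;         // true for a CFI pseudo-instruction
  bool NotDuplicable; // mirrors the per-opcode property set in Target.td
};

// A block is rejected as soon as it contains one instruction that must not
// be duplicated; every other criterion of the real pass is ignored here.
bool shouldTailDuplicateToy(const std::vector<ToyInstr> &Block) {
  return std::none_of(Block.begin(), Block.end(),
                      [](const ToyInstr &I) { return I.NotDuplicable; });
}
```

Under this model, adding a single CFI instruction to an otherwise duplicable epilogue block flips the answer, which is why emitting the CFI early changes code generation.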

Using save/restore CFI instructions could probably simplify solving the problem of epilogue in the middle, if CFI instructions were inserted in epilogue in frame lowering.

Another problem that complicates the case when CFI instructions are inserted in epilogue during frame lowering is shrink wrapping. It can cause epilogue to split (e.g. to a BB containing 'pop' instructions and a BB consisting of 'ret' instruction). Those BBs can be reordered later, and there could exist a situation where 'ret' instruction comes before 'pop' instructions. That ret instruction should have def_cfa_offset set by the epilogue, not prologue. This problem could be solved in some way, but the solution would probably be complicated and scattered among different passes.
Also, I believe that problems with other passes could come up, but that we just haven't come across them yet. Those, and mainly the TailDuplication issue, are the reasons why I decided to implement a separate pass to solve this issue.

As for deciding what is an epilogue, I will check if just checking the FrameDestroy flag is sufficient.

Concerning the problem with Darwin, I tried setting FrameSetup tag to CFI instructions added in prologue and using that when generating compact unwind, and it works, but not for the case when it is generated from the .s file (because the information about the tag is lost). I haven't checked what GCC does yet.

lib/Target/X86/X86CFIInstrInserter.cpp
112 ↗	(On Diff #53270)	I didn't notice any problems with x32 so far. What problems have you come across?
377 ↗	(On Diff #53270)	Thanks for pointing it out!

In D18046#381033, @mkuper wrote:

Reid, I think you originally supported this on the mailing list. Can you please take a look and see whether this is what you had in mind?

Apparently I did say that. This pass ended up being surprisingly complicated, I thought it would be a lot simpler. =/

I guess I think we do need a late pass, but it should coordinate with X86FrameLowering so that it doesn't have to reason too much about arbitrary X86 code. We shouldn't have to try to pattern match instructions that reset ESP from EBP, for example. We shouldn't need to run a dataflow algorithm to calculate the CFA on exit-entry to MBB. X86FrameLowering should *know* this information, and should write it down in some side table that we can refer to in this late pass.

Does that make sense? The late pass should only exist to handle the special case of machine block placement creating mid-function return blocks. The majority of the CFA logic should be in the prologue/epilogue generation.

Could we have an intermediate state where between frame lowering and the final cleanup pass, we essentially assume that at entry to every basic block we're in the CFI state we would have been after the prologue and then use the late pass to insert save/restore instruction when it detects that is not the case? Eventually the late pass could also do cfi optimization.

As for the Darwin problem, I think I've come around to the idea that the only possible way to do this properly is to generate the unwind info while we still know what the prologue is, record it in the assembly (I think cfi_fde_data lets you do that) and only do the heuristic inference thing if no such annotation is found.

lib/Target/X86/X86CFIInstrInserter.cpp
112 ↗	(On Diff #53270)	Passing StackPtr to getDwarfRegNum didn't work. Note that x32 means 32-bit pointers on x86-64, not 32-bit x86.

The key point of that proposal (which I realized I didn't mention) is that one could handle tail duplication by inserting save state at the beginning of the basic block, then remember state just before the duplicated instructions, and have everything else as usual. Of course it would then be nice for the late pass to clean this up (e.g. remove save/restore pairs when there are no instructions in between), but at least it would be correct.

I think we want to make sure that we move in a direction that makes it easier to do optimizations that affect the CFI between X86FrameLowering and this late pass. For example, we cannot schedule the pushes generated by the X86CallFrameOptimization pass without moving the CFI along with the push. So we generate very poor code in cases like this where the push operands get in the way of outgoing inreg arguments:

target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"

declare i32 @f1(i32 inreg, i32 inreg, i32 inreg, i32, i32)
define i32 @f2(i32 inreg %a, i32 inreg %b, i32 inreg %c, i32 %d, i32 %e) nounwind {
entry:
  %call = tail call i32 @f1(i32 inreg 1, i32 inreg 2, i32 inreg 3, i32 %a, i32 %b)
  %add = add nsw i32 %call, 1
  ret i32 %add
}

LLVM generates this:

f2:
	pushl	%edi
	pushl	%esi
	pushl	%eax
	movl	%edx, %esi
	movl	%eax, %edi
	subl	$8, %esp
	movl	$1, %eax
	movl	$2, %edx
	movl	$3, %ecx
	pushl	%esi
	pushl	%edi
	calll	f1
	addl	$16, %esp
	incl	%eax
	addl	$4, %esp
	popl	%esi
	popl	%edi
	retl

icc generates much cleaner code (gcc is similar):

f2:
        subl      $20, %esp
        movl      $3, %ecx
        pushl     %edx
        pushl     %eax
        movl      $1, %eax
        movl      $2, %edx
        call      f1
        incl      %eax
        addl      $28, %esp
        ret

We would also like the ability to accumulate the stack-cleanup "add %esp" instructions for a series of calls like this:

target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"

declare void @C(i32, i32, i32, i32)
define void @F() nounwind {
entry:
  tail call void @C(i32 1, i32 2, i32 3, i32 4)
  tail call void @C(i32 5, i32 6, i32 7, i32 8)
  tail call void @C(i32 9, i32 10, i32 11, i32 12)
  ret void
}

Instead of what is currently generated

F:
	subl	$12, %esp
	pushl	$4
	pushl	$3
	pushl	$2
	pushl	$1
	calll	C
	addl	$16, %esp
	pushl	$8
	pushl	$7
	pushl	$6
	pushl	$5
	calll	C
	addl	$16, %esp
	pushl	$12
	pushl	$11
	pushl	$10
	pushl	$9
	calll	C
	addl	$28, %esp
	retl

we can eliminate both "addl $16, %esp" instructions and bump up the last %esp adjust to "addl $60, %esp". This is simpler to do without having separate CFI & stack-adjust instructions.
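
The arithmetic behind the "addl $60, %esp" figure can be checked with a small model of the stack operations in F. The sketch below is an assumption-laden illustration, not an existing LLVM transform: `StackOp`, `netDelta`, and `deferCleanups` are hypothetical names, and the model assumes nothing between the calls addresses the stack relative to %esp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { SubSP, Push, Call, AddSP };

struct StackOp {
  Kind K;
  int Bytes; // size of the adjustment; 4 for a 32-bit push, 0 for a call
};

// Net change of the stack pointer over the whole sequence.
int netDelta(const std::vector<StackOp> &Ops) {
  int Delta = 0;
  for (const StackOp &Op : Ops) {
    if (Op.K == Kind::AddSP)
      Delta += Op.Bytes;
    else
      Delta -= Op.Bytes; // SubSP and Push lower %esp; Call contributes 0
  }
  return Delta;
}

// Drops every AddSP except the last one and folds the dropped amounts into it.
std::vector<StackOp> deferCleanups(const std::vector<StackOp> &Ops) {
  std::size_t Last = Ops.size();
  for (std::size_t I = 0; I < Ops.size(); ++I)
    if (Ops[I].K == Kind::AddSP)
      Last = I;

  std::vector<StackOp> Out;
  int Pending = 0;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (Ops[I].K == Kind::AddSP && I != Last) {
      Pending += Ops[I].Bytes; // postpone this cleanup
      continue;
    }
    StackOp Op = Ops[I];
    if (I == Last)
      Op.Bytes += Pending;
    Out.push_back(Op);
  }
  assert(netDelta(Out) == netDelta(Ops) && "stack must stay balanced");
  return Out;
}
```

Applied to the sequence for F, the two intermediate 16-byte cleanups fold into the final 28-byte one, giving 16 + 16 + 28 = 60. If each push and each adjustment carried its own CFI instruction, every one of those would have to be recomputed as well, which is the coupling the comment argues against.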

To put this into a concrete proposal, I would suggest making this new pass responsible not only for generating proper epilog CFI but also for generating the CFI for simple stack adjusts. That would not only help enable optimizations like the above, but also eliminate the need for transformations that generate fixed stack adjusts to worry about also generating CFI. There have recently been at least 3 patches that added CFI for transforms involving stack adjusts (see D13767, D14021, D18246) all with their own logic for adding the CFI and deciding whether or not it's necessary.

To avoid having to calculate CFA on entry/end in the late pass, we could remember information about entry/end offset (and register) of a MBB in frame lowering and just use that info in the pass later. Possible problems could be BB merging, splitting and creation in later passes. Reid, is this what you had in mind? I am not sure if you meant to leave the insertion of CFI instructions in epilogue in the frame lowering, or to use the information available during frame lowering in order to simplify the implemented pass (but leave the insertion of CFI instructions in the late pass). Also, is storing information in side tables already used somewhere in the code?

We could store certain information about instructions: how they affect the offset/register. This information could be written down when the instruction is created, and then used in the late pass to generate and insert CFI instructions. That could make it easier to do optimizations that affect the CFI between X86FrameLowering and this pass.

I will start investigating these ideas, to see what kind of problems may arise. Do you have any thoughts on them?

Keno, could you explain your proposal a bit further, maybe provide an example?

My thinking was the following. There are generally three phases to CFI:

- Setting up everything (prologue)
- The function body
- Potentially multiple epilogues

On entry to most basic blocks, the current state of the CFI program will be the one we established after the prologue. The idea is basically to maintain, in the early passes, the invariant that no basic block other than the prologue changes the CFI state (i.e. the state at the beginning of the basic block must be the same as the state at the end of the basic block). I think such an invariant could easily be maintained in all the pre-layout passes (in the worst case by just inserting remember/restore pairs). I think in most situations this will actually already be correct (since the only place inside the function where the CFI state changes is the epilogue). The job of the late pass would then simply be to clean everything up once the layout is known (e.g. removing redundant remember/restore pairs, coalescing multiple instructions, etc.).
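
To make the invariant concrete, here is a minimal interpreter for a toy CFI program that tracks only the CFA offset. It is a sketch under stated assumptions: the `Cfi` and `CfiOp` types and both function names are invented for the example, and only four directive kinds are modelled (offset definition, offset adjustment, remember state, restore state).

```cpp
#include <cassert>
#include <vector>

// Toy CFI program (hypothetical types; not LLVM's MCCFIInstruction).
enum class Cfi { DefCfaOffset, AdjustCfaOffset, RememberState, RestoreState };

struct CfiOp {
  Cfi Kind;
  int Value; // used by DefCfaOffset and AdjustCfaOffset only
};

// Interprets Ops starting from CFA offset Start and returns the final offset.
int runCfi(const std::vector<CfiOp> &Ops, int Start) {
  std::vector<int> Saved; // stack used by remember/restore
  int Offset = Start;
  for (const CfiOp &Op : Ops) {
    switch (Op.Kind) {
    case Cfi::DefCfaOffset:
      Offset = Op.Value;
      break;
    case Cfi::AdjustCfaOffset:
      Offset += Op.Value;
      break;
    case Cfi::RememberState:
      Saved.push_back(Offset);
      break;
    case Cfi::RestoreState:
      assert(!Saved.empty() && "restore without a matching remember");
      Offset = Saved.back();
      Saved.pop_back();
      break;
    }
  }
  return Offset;
}

// The proposed invariant: a block leaves the CFI state as it found it.
bool preservesCfiState(const std::vector<CfiOp> &Ops, int Start) {
  return runCfi(Ops, Start) == Start;
}
```

An epilogue that pops two 4-byte slots fails the check on its own, but passes once its adjustments are wrapped in a remember/restore pair, so the block that follows it in layout order sees the post-prologue state again.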

Hi Keno,

Thanks for the explanation. Just to check if I am following you, my understanding of your idea is to:

- insert CFI instructions in epilogue during frame lowering (I suppose to use cfi_adjust_cfa_offset)
- surround epilogue with remember state/restore state CFI instructions (also in frame lowering)
- teach TailDuplication to recognize and allow this pattern (a pair of remember/restore state instructions + adjust instructions in between). By "allowing" I mean to duplicate that block even though it contains CFI instructions.
- have the late pass remove duplicate remember/restore state instructions

Is this what you had in mind?

Hi, I would like to propose a potential solution to this problem. If anyone knows a reason why this would fail, or can think of a better way to do it, please share your thoughts; help would be much appreciated. Here is the proposal:

insert cfi instructions in epilogue during frame lowering

add a flag to each MBB that specifies the type of that MBB based on its beginning and end cfi offset:
- type 1: MBBs that are inside the prologue-epilogue path and have the same beginning and end offset (the one set by prologue)
- type 2: prologue block that has initial frame offset as the beginning offset and sets its own end offset
- type 3: epilogue blocks that have beginning offset the same as end prologue offset and end offset the same as initial frame offset
- type 4: MBBs that are outside the prologue-epilogue path, that have beginning and end offset the same as the initial frame offset

Default value of the flag would be type 1. During frame lowering, prologue and epilogue blocks would have their appropriate flags set, as well as blocks that are outside the prologue-epilogue path. Maybe this could also be done in shrink wrapping, by finding predecessors of the prologue and successors of the epilogue. This would probably require some tree searching, but it would be limited to basic blocks outside of prologue-epilogue path. Does anyone know of a better way to find information about which blocks are outside of the prologue-epilogue path?

teach tail duplication and any other pass that causes problems to allow cfi instructions in the epilogue and not to change the code generation

maintain information about added flags throughout different passes (during merging, splitting of MBBs)

the late pass would go through all MBBs while keeping track of the current cfi offset that is set and add cfi instructions if needed (based on the information from the added flags). Could information about offset set by the prologue be obtained from getStackSize() (for the case without FP)?

What are your thoughts? Can anyone think of something that breaks this solution? One thing that comes to mind would be the existence of multiple prologues, i.e. multiple prologue-epilogue pairs with different frame sizes. Can this happen in LLVM?

Does anyone have any comments?

Sorry for not getting back to this thread sooner. The plan looks reasonable to me, though I worry about the possibility of an epilogue split over two MBBs, which doesn't seem like it would be doable in this scheme. I don't think there are cases in LLVM where there is more than one prologue.

Hi Keno, thanks for commenting. As for the splitting of the epilogue over two MBBs, I have come across a case where 'pop' instructions were separated from the 'ret' instruction ('pop' instrs were in one MBB, and 'ret' was in another MBB). That case would be covered with the proposed solution. I haven't come across a case where 'add', 'pop' instructions that form an epilogue are split over multiple MBBs.

I would start working on implementing this if no one has anything against the proposed solution.

Here is the current implementation of the proposed solution.
I have added a new flag to a MBB that represents its type based on its beginning and end cfi offset. There are 4 types of MBBs: PROLOGUE, EPILOGUE, IN_PATH and OUT_PATH. Default type of the MBB is IN_PATH. During prologue and epilogue emission in X86FrameLowering, corresponding MBBs are set as PROLOGUE and EPILOGUE, and also, MBBs that are not in the prologue/epilogue path are marked as OUT_PATH. There are changes in some of the passes that split or merge MBBs, where these types are updated correctly. There is also a late pass that inserts additional CFI instructions to the beginnings of the MBBs for the cases where epilogue is in the middle of the function and sets incorrect CFI offset and register values for those MBBs.
Does this look like a step in the right direction? Does anyone have any comments?
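
The late pass described above can be sketched as a single walk over the blocks in layout order. The four type names come from the comment; everything else (the `FrameOffsets` struct, the function names, the -1 sentinel, and the mapping from type to incoming/outgoing offset, which follows the earlier "type 1" to "type 4" description) is an assumption made for this illustration, and only the offset is tracked, not the register.

```cpp
#include <vector>

enum class MBBType { PROLOGUE, EPILOGUE, IN_PATH, OUT_PATH };

// Hypothetical pair of offsets: at function entry and after the prologue.
struct FrameOffsets {
  int Initial;
  int AfterPrologue;
};

// CFA offset a block of the given type expects on entry.
int incomingOffset(MBBType T, FrameOffsets F) {
  return (T == MBBType::PROLOGUE || T == MBBType::OUT_PATH) ? F.Initial
                                                            : F.AfterPrologue;
}

// CFA offset in effect when a block of the given type ends.
int outgoingOffset(MBBType T, FrameOffsets F) {
  return (T == MBBType::EPILOGUE || T == MBBType::OUT_PATH) ? F.Initial
                                                            : F.AfterPrologue;
}

// For each block in layout order, returns the offset that must be set at its
// beginning, or -1 when the offset left by the previous block is correct.
std::vector<int> correctionsForLayout(const std::vector<MBBType> &Layout,
                                      FrameOffsets F) {
  std::vector<int> Fix;
  int Current = F.Initial;
  for (MBBType T : Layout) {
    int In = incomingOffset(T, F);
    Fix.push_back(In == Current ? -1 : In);
    Current = outgoingOffset(T, F);
  }
  return Fix;
}
```

With a layout of PROLOGUE, IN_PATH, EPILOGUE, IN_PATH, EPILOGUE, only the IN_PATH block that follows the first epilogue needs a correction, which is the mid-function epilogue case the pass exists for.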

Herald added a subscriber: mgorny. · Mar 16 2017, 9:59 AM

rnk added a reviewer: MatzeB. · Mar 21 2017, 10:45 AM

rnk added a reviewer: iteratee. · Mar 21 2017, 10:47 AM

rnk added inline comments. · Mar 21 2017, 10:50 AM

lib/CodeGen/BranchFolding.cpp
305	Please don't do these types of triple checks in target independent CodeGen code. Use TII. What are you trying to do here? Avoid tail merging epilogues, or do more tail merging, or...?
449	ditto

The more I think about this, the more I think this is the wrong approach.

As I understand it, the problem is that CFI line tables are linear based on the assembly output, but that the actual CFI adjustments can be anywhere, and we need some way to "reset" the values for the linear layout.

This isn't just an epilogue problem, we are limited in changing stores to pushes because of this. If we have a diamond with a spill in the top of the diamond and a restore in the join, we will get this wrong if we don't lay the diamond out linearly. This is just an example, but there are other things that having better cfa tracking would enable.

If we're going to add information to a basic block, why don't we store the incoming and outgoing cfa_offset, and cfa_register? Then we can allow CFI directives to be copied.
This allows several beneficial things:

- We can shrink-wrap things that need to use a smaller prologue (meaning there are actually 2 prologues in the function)
- We can add a verification pass that checks that all outgoing offsets of predecessors match incoming offsets of successors.
- We can do more store-to-push conversions
- This is platform independent. CFI def_cfa directives are platform independent, and only support register, offset or dwarf expressions. We can ignore dwarf expressions for now.

Then the pass that's needed simply inserts a directive between blocks that are linear-mismatches, and we can even handle rbp,rsp mismatches.
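
A minimal sketch of what storing the incoming and outgoing CFA on each block could enable is given below. It is not the implementation that was eventually committed: `CfaState`, `Block`, `verifyEdges`, and `linearMismatches` are hypothetical names, registers are plain strings, and successors are indices into the layout vector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// CFA rule as a (register, offset) pair; dwarf expressions are ignored.
struct CfaState {
  std::string Reg;
  int Offset;
};

bool operator==(const CfaState &A, const CfaState &B) {
  return A.Reg == B.Reg && A.Offset == B.Offset;
}

struct Block {
  CfaState In;            // CFA rule expected on entry
  CfaState Out;           // CFA rule in effect on exit
  std::vector<int> Succs; // CFG successors, as indices into the layout
};

// Verification: every CFG edge must connect matching outgoing/incoming rules.
bool verifyEdges(const std::vector<Block> &Layout) {
  for (const Block &B : Layout)
    for (int S : B.Succs)
      if (!(B.Out == Layout[S].In))
        return false;
  return true;
}

// Late pass: indices of blocks whose layout predecessor leaves a different
// rule, i.e. where a directive has to be inserted at the block's beginning.
std::vector<int> linearMismatches(const std::vector<Block> &Layout) {
  std::vector<int> Need;
  for (std::size_t I = 1; I < Layout.size(); ++I)
    if (!(Layout[I - 1].Out == Layout[I].In))
      Need.push_back(static_cast<int>(I));
  return Need;
}
```

The two checks are separate on purpose: the first is a property of the control-flow graph and holds regardless of block order, while the second depends only on the final layout, so it can run after every reordering pass has finished.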

Hi Kyle, thank you for commenting!
I am not sure that I fully understand what you are proposing.
Are you saying that we should attach info about incoming and outgoing cfa_offset and register to a MBB or actually add CFI instructions to the beginning and end of a MBB (and then remove unnecessary CFI instructions later)?
Do you think that CFI instructions in epilogue should be inserted during frame lowering, when instructions that change the SP are created? Same question for the store-to-push optimization. The problem is that CFI instructions that are inserted earlier (before MBB reordering, merging, splitting, that is done in the later passes (tail duplication, control flow optimizer)) affect code generation. If we add information about incoming and outgoing CFI offset and register to a MBB, we would definitely be more flexible. That would support multiple MBBs that change CFI offset (that have different CFI offset at their beginning and end) besides prologue and epilogue. However, the problem with CFI instructions impacting code generation would still be present, and would be handled similarly to the way it is handled in the proposed implementation. Do you have something else in mind for solving this issue?
What do you think is wrong with this approach?

lib/CodeGen/BranchFolding.cpp
305	How can I use TII to get this information? To do more tail merging (CFI instructions have different CFI indices, so isIdenticalTo() returns false, and prevents tail merging that would have happened otherwise).

In D18046#712052, @violetav wrote:

Hi Kyle, thank you for commenting!
I am not sure that I fully understand what you are proposing.
Are you saying that we should attach info about incoming and outgoing cfa_offset and register to a MBB or actually add CFI instructions to the beginning and end of a MBB (and then remove unnecessary CFI instructions later)?

Attach the info

Do you think that CFI instructions in epilogue should be inserted during frame lowering, when instructions that change the SP are created?

Yes.

Same question for the store-to-push optimization.

Yes.

The problem is that CFI instructions that are inserted earlier (before MBB reordering, merging, splitting, that is done in the later passes (tail duplication, control flow optimizer)) affect code generation.

They currently affect code generation. This is my biggest complaint. I don't see why they should affect these passes.

If we add information about incoming and outgoing CFI offset and register to a MBB, we would definitely be more flexible. That would support multiple MBBs that change CFI offset (that have different CFI offset at their beginning and end) besides prologue and epilogue. However, the problem with CFI instructions impacting code generation would still be present, and would be handled similarly to the way it is handled in the proposed implementation. Do you have something else in mind for solving this issue?
What do you think is wrong with this approach?

I've answered above, but I'm still not convinced that CFI instructions, being only assembler directives to generate dwarf info, should affect code generation. I think this is the big problem worth solving, and then the rest of the work will be small problems.

They currently affect code generation. This is my biggest complaint. I don't see why they should affect these passes.

This patch modifies those passes with changes that prevent CFI instructions from affecting code generation, so, with these changes we will avoid affecting code generation.

In D18046#715352, @violetav wrote:

They currently affect code generation. This is my biggest complaint. I don't see why they should affect these passes.

This patch modifies those passes with changes that prevent CFI instructions from affecting code generation, so, with these changes we will avoid affecting code generation.

Those kinds of modifications are exactly what I don't want to see.

Here's what I'd like to see:
Change CFI Instructions so that the following 3 things are no longer true:

- CFI Instructions are marked as not duplicable.
- CFI Instructions don't compare as equal (and so can't be merged)
- CFI Instructions count as instructions when counting for tail-duplicating or tail-merging.

The first 2 shouldn't require any changes to existing passes, and the third should be minor, and definitely not platform specific.
These will require other work as well, but basically what I described above.

As a second best, special casing these instructions in a platform-independent way for platforms that can handle them later would work.
That means adding something to Target Info to indicate if a platform can handle these instructions, defaulting it to false, and then setting it to true for X86.
That would be sufficient for tail-duplication and tail merging.

lib/CodeGen/BranchFolding.cpp
305	CFI Instructions should be more like the debug instructions that are skipped below. It shouldn't be platform specific. CFI isn't platform specific.
lib/CodeGen/TailDuplicator.cpp
588	This tells me that marking CFI Instructions as not duplicable is wrong for all platforms.
607–608	And this tells me that either it needs to be marked as a debug instruction, or we need a broader category for assembler directives that are not instructions.
lib/CodeGen/TargetInstrInfo.cpp
396 ↗	(On Diff #92010)	I read the DWARF spec on CFI. I didn't see where it said this only worked on X86.

Here is an implementation of the approach that was suggested.

The following set of changes is introduced:

Changes to CFI instructions:
- they are now marked as duplicable.
- they can compare as equal (they are compared based on operation/offset/register values, and not on CFIIndex).
- they do not count as instructions when tail-duplicating or tail-merging.

These changes ensure that CFI instructions in epilogue do not affect code generation.
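
To illustrate the second point, here is a minimal sketch of comparing CFI instructions by the directive they refer to instead of by their index. The types `CFIDirective` and `CFIInstr` and the function `sameDirective` are hypothetical simplifications and are not the classes used in the patch; the only thing taken from the source is that a CFI instruction carries an index (the `$id` operand) into a table of directives.

```cpp
#include <vector>

enum class CFIOp { DefCfa, DefCfaOffset, DefCfaRegister, AdjustCfaOffset };

// Hypothetical model of one entry in the per-function table of directives.
struct CFIDirective {
  CFIOp Op;
  unsigned Register;
  int Offset;
};

// Hypothetical model of a CFI pseudo-instruction: it only stores an index
// into the table above.
struct CFIInstr {
  unsigned CFIIndex;
};

// Two CFI instructions are treated as identical when the directives they
// point at have the same operation, register and offset, even if the indices
// themselves differ.
bool sameDirective(const std::vector<CFIDirective> &Table, CFIInstr A,
                   CFIInstr B) {
  const CFIDirective &X = Table[A.CFIIndex];
  const CFIDirective &Y = Table[B.CFIIndex];
  return X.Op == Y.Op && X.Register == Y.Register && X.Offset == Y.Offset;
}
```

Comparing the raw indices would report two separately created `.cfi_def_cfa_offset 8` directives as different, which is what blocked tail merging of otherwise identical epilogues.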

Attached information about cfa offset and cfa register to MachineBasicBlock. Each basic block now has info about the cfa offset and register valid at its entry and exit. When a CFI instruction gets inserted into a basic block, it is checked whether the block's outgoing cfa offset and register need to be updated. It is checked whether incoming and outgoing information of the block's successors needs to be updated as well. This information is also updated when blocks are split, merged, duplicated or created.

This information is used by a late pass that inserts CFI instructions in order to set correct cfa offset and register for basic blocks. This needs to be done if blocks get reordered in a way that some have incorrect cfa offset and register set by previous blocks.
A verification pass is added that checks that outgoing cfa offset and register of predecessor blocks match incoming offset and register of their successors.
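
The check performed by such a verification pass can be sketched as follows. This is only an illustration under simplifying assumptions: `CFABlock` and `verifyCFA` are hypothetical names, successors are stored as indices into a vector, and mismatching edges are returned as strings instead of being reported as errors.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical block model: CFA rule at entry and exit plus successor indices.
struct CFABlock {
  std::string Name;
  int InOffset;
  unsigned InReg;
  int OutOffset;
  unsigned OutReg;
  std::vector<std::size_t> Succs;
};

// For every CFG edge Pred -> Succ, the rule at the exit of Pred must equal
// the rule at the entry of Succ. Returns the edges that violate this.
std::vector<std::string> verifyCFA(const std::vector<CFABlock> &Blocks) {
  std::vector<std::string> Mismatches;
  for (const CFABlock &Pred : Blocks) {
    for (std::size_t S : Pred.Succs) {
      const CFABlock &Succ = Blocks[S];
      if (Pred.OutOffset != Succ.InOffset || Pred.OutReg != Succ.InReg)
        Mismatches.push_back(Pred.Name + " -> " + Succ.Name);
    }
  }
  return Mismatches;
}
```

Note that this check follows CFG edges, whereas the inserter pass described above looks at neighbours in the final layout; the two orders differ exactly when blocks have been reordered.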

The current implementation adds CFI instructions in epilogue for X86. However, the changes described above can be used for all platforms.

Some initial thoughts. I would like to hide the actual CFI algorithms from the existing passes as much as possible.

lib/CodeGen/BranchFolding.cpp
450	I'd like to see this factored out into MachineBasicBlock. Something like recalculateCFI(bool useExistingIncoming)
1012	This looks like the same code above. Please factor this out.
lib/CodeGen/TailDuplicator.cpp
607–608	Can you create a function on MI: isDirective() and have it return true for debugvalue and CFIInstruction?
924	It would be nice to not have this in the pass, but rather as an abstraction: PrevBB->mergeCFIInfo(TailBB);

Addressed comments about hiding CFI algorithms from the existing passes.

violetav marked 4 inline comments as done. Jun 7 2017, 12:32 PM

This is coming along nicely. I forgot to say last time that I was pleased overall.

I have a few more things, but they're slowly getting smaller.

lib/CodeGen/BranchFolding.cpp
349	should this be isDirective?
366	Is this block still necessary if the tests above are changed to isDirective? I'm having trouble making sense of them
lib/CodeGen/MachineBasicBlock.cpp
1456	Why is this offset negative? Can you explain it to me?

Fixed a bug related to calculating values of incoming and outgoing cfa offset. For def_cfa and def_cfa_offset, negative offset value was used for calculation, and that was not the case for adjust_cfa_offset. Now, the values stored as in/out cfa offsets are the actual offsets set by cfi directives (and not their negative values).

violetav added inline comments. Jun 15 2017, 6:31 AM

lib/CodeGen/BranchFolding.cpp
349	I don't think that this logic can be applied to CFI instructions. It seems that this code aims to set the iterators to include consecutive DBG_VALUE instructions above last common instruction in one block (when those DBG instructions do not exist in the other block). I don't think that that's the desired behaviour for CFI instructions.
366	It is. The tests above make sure that CFI instructions are not compared with isIdenticalTo(), and are not included when calculating TailLen. This code ensures that the iterators do not point to CFI instructions (so CFI instructions do not represent the starting point of the common tail in a BB, as they do not exactly 'count' as common instructions of two blocks). For example:
    BB1: ... INSTRUCTION_A ADD32ri8 CFI_INSTRUCTION POP32r CFI_INSTRUCTION POP32r CFI_INSTRUCTION RET
    BB2: ... INSTRUCTION_B CFI_INSTRUCTION ADD32ri8 CFI_INSTRUCTION POP32r CFI_INSTRUCTION POP32r CFI_INSTRUCTION RET
In this example, BB1 and BB2 will have 4 common instructions (RET, POP, POP and ADD). When INSTRUCTION_A and INSTRUCTION_B are compared as not equal, after incrementing the iterators (++I1; ++I2;), I1 will point to ADD (the last common instruction, as it should), however I2 will point to the CFI instruction. This will, later on, result in BB2 being 'hacked off' (in ReplaceTailWithBranchTo()) at the wrong place (starting from this CFI instruction, instead of ADD below it), so this CFI instruction will be lost.
lib/CodeGen/MachineBasicBlock.cpp
1456	A short answer would be: because the value of cfa offset that is passed when creating a CFI instruction with createDefCfa() and createDefCfaOffset() is negated. That is the value that will end up as the actual offset in cfi directives. I was calculating in/out cfa offset with this non-negated value (that is passed when creating these instructions) so I had to negate the offset when reading the value with getOffset(). I am not sure why the correct value is not passed in the first place (e.g. in createAdjustCfaOffset(), the passed value is not negated). I realised there was a bug with calculating in/out cfa offset - for DefCfa and DefCfaOffset values used for calculating in/out cfa offset were negative of the ones that end up in cfi directives, but that was not the case for AdjustCfaOffset. I changed it so that the values stored as in/out cfa offset are the values of actual offsets set by cfi directives (and not their negative values).
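
The BranchFolding.cpp example discussed above (line 366) can be reproduced with a small standalone model. This is a sketch and not the code of ComputeCommonTailLength(): `Instr`, `CommonTail` and `computeCommonTail` are hypothetical names, instructions are compared by text, and CFI directives inside the tail are skipped without being compared.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction model: the printed text plus a CFI flag.
struct Instr {
  std::string Text;
  bool IsCFI;
};

struct CommonTail {
  unsigned Length;     // number of matching non-CFI instructions
  std::size_t Start1;  // index of the first tail instruction in block 1
  std::size_t Start2;  // index of the first tail instruction in block 2
};

// Scan both blocks backwards. CFI directives are skipped and never counted,
// and the start indices are only updated after a real instruction matched,
// so they can never end up pointing at a CFI directive.
CommonTail computeCommonTail(const std::vector<Instr> &B1,
                             const std::vector<Instr> &B2) {
  std::size_t I1 = B1.size(), I2 = B2.size();
  CommonTail T{0, B1.size(), B2.size()};
  while (true) {
    while (I1 > 0 && B1[I1 - 1].IsCFI)
      --I1;
    while (I2 > 0 && B2[I2 - 1].IsCFI)
      --I2;
    if (I1 == 0 || I2 == 0)
      break;
    if (B1[I1 - 1].Text != B2[I2 - 1].Text)
      break;
    --I1;
    --I2;
    ++T.Length;
    T.Start1 = I1;
    T.Start2 = I2;
  }
  return T;
}
```

For the BB1 and BB2 sequences from the comment, this yields a tail length of 4 with both start positions on ADD32ri8, so the CFI directive that sits between INSTRUCTION_B and ADD32ri8 in BB2 stays outside the tail and is not cut off with it.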

This is looking pretty good. I'm going to go back over, but most of my initial concerns have been satisfied.

lib/CodeGen/BranchFolding.cpp
366	Thanks. Can you shrink the example: BB1: ... INSTRUCTION_A ADD32ri8 ... BB2: ... INSTRUCTION_B CFI_INSTRUCTION and include it in the comments?

thegameg added a subscriber: thegameg. Jun 15 2017, 1:54 PM

Updated a comment in ComputeCommonTailLength() in BranchFolding.cpp.

violetav marked an inline comment as done. Jun 19 2017, 6:12 AM

OK. It's looking pretty good overall. I looked a lot closer at the actual CFI Code. Mostly it looks great. I don't think you need AdjustCFAOffset at all. You spend a lot of work maintaining it, but in the one case that it's used, it's just (OutgoingOffset - IncomingOffset). It's only used with blocks that don't contain an offset def.

That was the hardest part for me to understand, and I don't think you need it. If you can remove it, feel free to mark the comments as done by way of code removal.

include/llvm/CodeGen/MachineBasicBlock.h
770	Should this always be equal to OutgoingCFAOffset - IncomingCFAOffset? I think I get it. It's the total amount of adjustments that occur in this block, either since the beginning or since the last def_cfa_offset directive in the function. Can you be more explicit in the comments?
lib/CodeGen/CFIInfoVerifier.cpp
65	Can you name this something like Pred?
90	Same here.
lib/CodeGen/CFIInstrInserter.cpp
66	you should clear this value in CorrectCFA.
89	The negative here is confusing. Can you make it so you don't have to remember to flip the sign here?
98	Same here.
lib/CodeGen/MachineBasicBlock.cpp
1362	I think this would be more clear if you used a local variable.
1433	Isn't there a missing case here, where if you encounter both a def_cfa_offset and a def_cfa_register, you can early return, like below?
1486	Should this short-circuit if they already match?
1514	Same here. It looks like you could short-circuit if they're equal.
1577	It feels weird to me that you would scan the block twice. This could be done in the loop above, couldn't it?

Removed AdjustCFAOffset.
Addressed other comments.
Changed the code in updateCFIInfoSucc() a bit since AdjustCFAOffset is no longer used.

violetav marked 8 inline comments as done. Jun 22 2017, 10:04 AM

This is looking really good. Thanks.

I still don't get the negation. I'm glad that you pinned it down to one location in the code, but I don't understand why it's there at all. Can you explain it to me?

lib/CodeGen/CFIInstrInserter.cpp
44	I'm still not sure I get the negatives. Do the CFI instructions return the negative of what they're created with? That doesn't seem to make any sense.

Hi,

I see that your patch is mostly about the state of CFA, but I think this applies to other things, like callee-saved registers.

As part of my work on shrink-wrapping, I needed to add support for saving / restoring registers in multiple locations. So, in order to get correct CFI information for the registers, we have to add the corresponding .cfi_offset and .cfi_restore to the save and restore points, which results in the following code:

// BLOCK0
if (i) {
    // BLOCK1
    save reg1
    if (a) {
      // BLOCK2
      save reg2
      [... use reg1 reg2 ...]
      restore reg2
      restore reg1
      ret
    } else {
      // BLOCK3
      save reg3
      [... use only reg1, reg3 ...]
      restore reg3
      restore reg1
      ret
    }
} else {
 // BLOCK4
 ret
}

So, in this case, it's not correct to assume that the CFI state starts at the entry block, nor to assume that it is the same in the whole function (pretty much the same issue as CFA). For this, I emit a CFI_INSTRUCTION .cfi_offset along with every save, and a CFI_INSTRUCTION .cfi_restore along with every restore. After that, in the AsmPrinter I run another pass to fill the gaps and add more .cfi_offset or .cfi_restore where the fall through state is not correct.

Imagine the order of the basic blocks was:

BLOCK0
BLOCK1
BLOCK4
BLOCK2
BLOCK3

then we need an extra .cfi_restore reg1 at the end of BLOCK1 and an extra .cfi_offset reg1 at the beginning of BLOCK2.
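
The gap filling described here can be sketched with a small model that tracks, per block, which registers have a `.cfi_offset` rule in effect at entry and at exit. All names (`SaveBlock`, `fillGaps`) are hypothetical, and a fixup is reported against the block that follows the boundary, which describes the same position in the line table as the end of the preceding block.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical block model: registers whose save rule is in effect at block
// entry and at block exit.
struct SaveBlock {
  std::string Name;
  std::set<std::string> SavedIn;
  std::set<std::string> SavedOut;
};

// Walk the final layout and report the directives needed at each boundary
// where the fall-through state differs from what the next block expects.
std::vector<std::string> fillGaps(const std::vector<SaveBlock> &Layout) {
  std::vector<std::string> Fixups;
  for (std::size_t I = 1; I < Layout.size(); ++I) {
    const SaveBlock &Prev = Layout[I - 1];
    const SaveBlock &Cur = Layout[I];
    // Saved according to the previous block, but not expected here.
    for (const std::string &R : Prev.SavedOut)
      if (!Cur.SavedIn.count(R))
        Fixups.push_back(Cur.Name + ": .cfi_restore " + R);
    // Expected to be saved here, but not according to the previous block.
    for (const std::string &R : Cur.SavedIn)
      if (!Prev.SavedOut.count(R))
        Fixups.push_back(Cur.Name + ": .cfi_offset " + R);
  }
  return Fixups;
}
```

For the layout BLOCK0, BLOCK1, BLOCK4, BLOCK2, BLOCK3 this reports a `.cfi_restore reg1` at the BLOCK1/BLOCK4 boundary and a `.cfi_offset reg1` at the start of BLOCK2, matching the description above. Under this model the walk also flags the BLOCK2/BLOCK3 boundary, because BLOCK2 ends after restoring reg1 while BLOCK3 is entered with reg1 still saved.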

This could already happen with the current shrink-wrapping implementation, but block placement never lays out blocks in a way that the prologue comes after the epilogue, so we haven't hit that issue yet.

I guess this issue can be also addressed with the same infrastructure you're building here for CFA. What do you think?

Also, I might be missing something here, but do we really need to maintain the state of CFA in every MBB starting from PEI to (almost) AsmPrinter? Can't we just do one simple pass during (or before) AsmPrinter which collects all the CFI directives and does the analysis based on the CFG and the final block layout (since at that point, we have that information)? It looks like TailDuplication was the problem, but it looks like you fixed that to skip CFI instructions.

Let me know what you think.

violetav added inline comments. Jun 23 2017, 8:39 AM

lib/CodeGen/CFIInstrInserter.cpp
44	The thing is, they are created with the negative of what you pass to the create* method:
    static MCCFIInstruction createDefCfa(MCSymbol *L, unsigned Register, int Offset) {
      return MCCFIInstruction(OpDefCfa, L, Register, -Offset, "");
    }
    static MCCFIInstruction createDefCfaOffset(MCSymbol *L, int Offset) {
      return MCCFIInstruction(OpDefCfaOffset, L, 0, -Offset, "");
    }
So, if you want to create a '.cfi_def_cfa_offset 16' you would pass -16 to the createDefCfaOffset() method. This is true for def_cfa and def_cfa_offset, but not for adjust_cfa_offset:
    static MCCFIInstruction createAdjustCfaOffset(MCSymbol *L, int Adjustment) {
      return MCCFIInstruction(OpAdjustCfaOffset, L, 0, Adjustment, "");
    }
And here, for in/out cfa offset, I am storing these positive values that end up in cfi directives, and then creating def_cfa or def_cfa_offset instruction, hence the '-MBB.getIncomingCFAOffset()'.
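
The sign convention explained in this comment can be checked with a small standalone model. It mirrors only the behaviour quoted above and is not the real MCCFIInstruction class: the label argument is dropped, and `SimpleCFI`, `CFIOpKind` and `makeIncomingOffsetFixup` are hypothetical names.

```cpp
enum class CFIOpKind { DefCfaOffset, AdjustCfaOffset };

// Hypothetical, stripped-down model of a CFI instruction.
struct SimpleCFI {
  CFIOpKind Operation;
  unsigned Register;
  int Offset;
  int getOffset() const { return Offset; }
};

// Mirrors the quoted factory: the stored offset is the negation of the
// argument.
SimpleCFI createDefCfaOffset(int Offset) {
  return {CFIOpKind::DefCfaOffset, 0, -Offset};
}

// Mirrors the quoted factory: the adjustment is stored unchanged.
SimpleCFI createAdjustCfaOffset(int Adjustment) {
  return {CFIOpKind::AdjustCfaOffset, 0, Adjustment};
}

// A block stores its incoming CFA offset as the positive value that appears
// in the directive, so the value has to be negated once when the fixup
// instruction is created.
SimpleCFI makeIncomingOffsetFixup(int IncomingCFAOffset) {
  return createDefCfaOffset(-IncomingCFAOffset);
}
```

In this model `makeIncomingOffsetFixup(16)` stores 16, which corresponds to a `.cfi_def_cfa_offset 16` directive, while `createAdjustCfaOffset(8)` stores 8 without any sign change.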

It would be nice to get rid of that negative completely. Not in this patch, but just remove it completely wherever we create CFI Instructions.

This revision is now accepted and ready to land. Jun 23 2017, 11:32 AM

Rebased the patch.

Rebased patch again.

Closed by commit rL306529: [X86] Correct dwarf unwind information in function epilogue (authored by petarj). · Jun 28 2017, 3:22 AM

This revision was automatically updated to reflect the committed changes.

Hi Francis,
I am not sure if I am missing something, but yes, it looks like you could do something similar for cfi_offset and cfi_restore; keep track of registers that should be saved/restored at the beginning and end of a basic block and then insert save/restore CFI instructions for appropriate registers if the basic blocks get reordered in a way that this information is no longer correct.
For the second question: I started with something similar (inserting the CFI instructions in the epilogue and adding the additional CFI instructions that correct the CFA calculation rule, both in the late pass). However, that was not preferable, because I was analyzing each BB, checking it for CFI instructions, and working out whether additional instructions should be inserted. Basically, I should have known this information already, without needing to do this analysis in the pass.

Hi Violeta,

Thanks for taking your time to answer.

In D18046#799475, @violetav wrote:

I am not sure if I am missing something, but yes, it looks like you could do something similar for cfi_offset and cfi_restore; keep track of registers that should be saved/restored at the beginning and end of a basic block and then insert save/restore CFI instructions for appropriate registers if the basic blocks get reordered in a way that this information is no longer correct.

Yes, exactly. I'll try to merge this with your code from CFIInstrInserter.

I see this has been reverted (r306676). Any plans to re-commit this @violetav ?

violetav mentioned this in D35844: Correct dwarf unwind information in function epilogue. Jul 25 2017, 9:20 AM

@thegameg yes, I have just posted D35844 that fixes the issues.

Sorry for being so late to the review.

I'm very concerned about adding members to MachineBasicBlock: that makes this information effectively part of our machine representation, which needs to be modified appropriately by transformation passes. In that respect it is not natural and I expect maintenance problems long term.

I know this was already discussed but I fail to find the exact reasons why producing all the CFA information as part of the AsmPrinter was dismissed.
Intuitively I would expect it to be very well possible to do some abstract simulation of the stack pointer during emission (possibly with target callbacks helping out to determine which instructions modify the stack/callframe access in which way). And I would take such a solution any day over introducing more "state" into machine functions/blocks, even if it is more (or more complex) code than doing it as part of Prolog Epilog emission.

Diff 104367

include/llvm/CodeGen/MachineBasicBlock.h

#include "llvm/ADT/ilist.h"		#include "llvm/ADT/ilist.h"
#include "llvm/ADT/ilist_node.h"		#include "llvm/ADT/ilist_node.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/ADT/simple_ilist.h"		#include "llvm/ADT/simple_ilist.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBundleIterator.h"		#include "llvm/CodeGen/MachineInstrBundleIterator.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/MC/LaneBitmask.h"		#include "llvm/MC/LaneBitmask.h"
		#include "llvm/MC/MCDwarf.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/Support/BranchProbability.h"		#include "llvm/Support/BranchProbability.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <functional>		#include <functional>
#include <iterator>		#include <iterator>
#include <string>		#include <string>
#include <vector>		#include <vector>
[...]	private:
/// unless you know what you're doing, because it doesn't update Pred's		/// unless you know what you're doing, because it doesn't update Pred's
/// successors list. Use Pred->addSuccessor instead.		/// successors list. Use Pred->addSuccessor instead.
void addPredecessor(MachineBasicBlock *Pred);		void addPredecessor(MachineBasicBlock *Pred);

/// Remove Pred as a predecessor of this MachineBasicBlock. Don't do this		/// Remove Pred as a predecessor of this MachineBasicBlock. Don't do this
/// unless you know what you're doing, because it doesn't update Pred's		/// unless you know what you're doing, because it doesn't update Pred's
/// successors list. Use Pred->removeSuccessor instead.		/// successors list. Use Pred->removeSuccessor instead.
void removePredecessor(MachineBasicBlock *Pred);		void removePredecessor(MachineBasicBlock *Pred);

		// Value of cfa offset valid at basic block entry.
		int IncomingCFAOffset = -1;
		// Value of cfa offset valid at basic block exit.
		int OutgoingCFAOffset = -1;
		// Value of cfa register valid at basic block entry.
		unsigned IncomingCFARegister = 0;
		// Value of cfa register valid at basic block exit.
		unsigned OutgoingCFARegister = 0;
		// If a block contains a def_cfa_offset or def_cfa directive.
		bool DefOffset = false;
		// If a block contains a def_cfa_register or def_cfa directive.
		bool DefRegister = false;

		public:
		int getIncomingCFAOffset() { return IncomingCFAOffset; }
		void setIncomingCFAOffset(int Offset) { IncomingCFAOffset = Offset; }
		int getOutgoingCFAOffset() { return OutgoingCFAOffset; }
		void setOutgoingCFAOffset(int Offset) { OutgoingCFAOffset = Offset; }
		unsigned getIncomingCFARegister() { return IncomingCFARegister; }
		void setIncomingCFARegister(unsigned Register) {
		IncomingCFARegister = Register;
		}
		unsigned getOutgoingCFARegister() { return OutgoingCFARegister; }
		void setOutgoingCFARegister(unsigned Register) {
		OutgoingCFARegister = Register;
		}

		bool hasDefOffset() { return DefOffset; }
		bool hasDefRegister() { return DefRegister; }
		void setDefOffset(bool SetsOffset) { DefOffset = SetsOffset; }
		void setDefRegister(bool SetsRegister) { DefRegister = SetsRegister; }

		// Update the outgoing cfa offset and register for this block based on the CFI
		// instruction inserted at Pos.
		void updateCFIInfo(MachineBasicBlock::iterator Pos);
		// Update the cfa offset and register values for all successors of this block.
		void updateCFIInfoSucc();
		// Recalculate outgoing cfa offset and register. Use existing incoming offset
		// and register values if UseExistingIncoming is set to true. If it is false,
		// use new values passed as arguments.
		void recalculateCFIInfo(bool UseExistingIncoming, int NewIncomingOffset = -1,
		unsigned NewIncomingRegister = 0);
		// Update outgoing cfa offset and register of the block after it is merged
		// with MBB.
		void mergeCFIInfo(MachineBasicBlock *MBB);
};		};

raw_ostream& operator<<(raw_ostream &OS, const MachineBasicBlock &MBB);		raw_ostream& operator<<(raw_ostream &OS, const MachineBasicBlock &MBB);

// This is useful when building IndexedMaps keyed on basic block pointers.		// This is useful when building IndexedMaps keyed on basic block pointers.
struct MBB2NumberFunctor :		struct MBB2NumberFunctor :
public std::unary_function<const MachineBasicBlock*, unsigned> {		public std::unary_function<const MachineBasicBlock*, unsigned> {
unsigned operator()(const MachineBasicBlock *MBB) const {		unsigned operator()(const MachineBasicBlock *MBB) const {

include/llvm/CodeGen/MachineInstr.h

[...]	public:
/// A DBG_VALUE is indirect iff the first operand is a register and		/// A DBG_VALUE is indirect iff the first operand is a register and
/// the second operand is an immediate.		/// the second operand is an immediate.
bool isIndirectDebugValue() const {		bool isIndirectDebugValue() const {
return isDebugValue()		return isDebugValue()
&& getOperand(0).isReg()		&& getOperand(0).isReg()
&& getOperand(1).isImm();		&& getOperand(1).isImm();
}		}

		bool isDirective() const { return isDebugValue() || isCFIInstruction(); }
bool isPHI() const { return getOpcode() == TargetOpcode::PHI; }		bool isPHI() const { return getOpcode() == TargetOpcode::PHI; }
bool isKill() const { return getOpcode() == TargetOpcode::KILL; }		bool isKill() const { return getOpcode() == TargetOpcode::KILL; }
bool isImplicitDef() const { return getOpcode()==TargetOpcode::IMPLICIT_DEF; }		bool isImplicitDef() const { return getOpcode()==TargetOpcode::IMPLICIT_DEF; }
bool isInlineAsm() const { return getOpcode() == TargetOpcode::INLINEASM; }		bool isInlineAsm() const { return getOpcode() == TargetOpcode::INLINEASM; }

bool isMSInlineAsm() const {		bool isMSInlineAsm() const {
return getOpcode() == TargetOpcode::INLINEASM && getInlineAsmDialect();		return getOpcode() == TargetOpcode::INLINEASM && getInlineAsmDialect();
}		}

include/llvm/CodeGen/Passes.h

[...]	/// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.
/// This pass performs outlining on machine instructions directly before		/// This pass performs outlining on machine instructions directly before
/// printing assembly.		/// printing assembly.
ModulePass *createMachineOutlinerPass();		ModulePass *createMachineOutlinerPass();

/// This pass expands the experimental reduction intrinsics into sequences of		/// This pass expands the experimental reduction intrinsics into sequences of
/// shuffles.		/// shuffles.
FunctionPass *createExpandReductionsPass();		FunctionPass *createExpandReductionsPass();

		/// This pass verifies that outgoing cfa offset and register of predecessor
		/// blocks match incoming cfa offset and register of their successors.
		FunctionPass *createCFIInfoVerifier();

		/// This pass inserts required CFI instruction at basic block beginning to
		/// correct the CFA calculation rule for that block if necessary.
		FunctionPass *createCFIInstrInserter();

} // End llvm namespace		} // End llvm namespace

#endif		#endif

include/llvm/InitializePasses.h

	void initializeBranchProbabilityInfoWrapperPassPass(PassRegistry&);			void initializeBranchProbabilityInfoWrapperPassPass(PassRegistry&);
	void initializeBranchRelaxationPass(PassRegistry&);			void initializeBranchRelaxationPass(PassRegistry&);
	void initializeBreakCriticalEdgesPass(PassRegistry&);			void initializeBreakCriticalEdgesPass(PassRegistry&);
	void initializeCFGOnlyPrinterLegacyPassPass(PassRegistry&);			void initializeCFGOnlyPrinterLegacyPassPass(PassRegistry&);
	void initializeCFGOnlyViewerLegacyPassPass(PassRegistry&);			void initializeCFGOnlyViewerLegacyPassPass(PassRegistry&);
	void initializeCFGPrinterLegacyPassPass(PassRegistry&);			void initializeCFGPrinterLegacyPassPass(PassRegistry&);
	void initializeCFGSimplifyPassPass(PassRegistry&);			void initializeCFGSimplifyPassPass(PassRegistry&);
	void initializeCFGViewerLegacyPassPass(PassRegistry&);			void initializeCFGViewerLegacyPassPass(PassRegistry&);
				void initializeCFIInfoVerifierPass(PassRegistry&);
				void initializeCFIInstrInserterPass(PassRegistry&);
	void initializeCFLAndersAAWrapperPassPass(PassRegistry&);			void initializeCFLAndersAAWrapperPassPass(PassRegistry&);
	void initializeCFLSteensAAWrapperPassPass(PassRegistry&);			void initializeCFLSteensAAWrapperPassPass(PassRegistry&);
	void initializeCallGraphDOTPrinterPass(PassRegistry&);			void initializeCallGraphDOTPrinterPass(PassRegistry&);
	void initializeCallGraphPrinterLegacyPassPass(PassRegistry&);			void initializeCallGraphPrinterLegacyPassPass(PassRegistry&);
	void initializeCallGraphViewerPass(PassRegistry&);			void initializeCallGraphViewerPass(PassRegistry&);
	void initializeCallGraphWrapperPassPass(PassRegistry&);			void initializeCallGraphWrapperPassPass(PassRegistry&);
	void initializeCodeGenPreparePass(PassRegistry&);			void initializeCodeGenPreparePass(PassRegistry&);
	void initializeConstantHoistingLegacyPassPass(PassRegistry&);			void initializeConstantHoistingLegacyPassPass(PassRegistry&);

include/llvm/Target/Target.td

[...]	def INLINEASM : Instruction {
let AsmString = "";		let AsmString = "";
let hasSideEffects = 0; // Note side effect is encoded in an operand.		let hasSideEffects = 0; // Note side effect is encoded in an operand.
}		}
def CFI_INSTRUCTION : Instruction {		def CFI_INSTRUCTION : Instruction {
let OutOperandList = (outs);		let OutOperandList = (outs);
let InOperandList = (ins i32imm:$id);		let InOperandList = (ins i32imm:$id);
let AsmString = "";		let AsmString = "";
let hasCtrlDep = 1;		let hasCtrlDep = 1;
let isNotDuplicable = 1;		let isNotDuplicable = 0;
}		}
def EH_LABEL : Instruction {		def EH_LABEL : Instruction {
let OutOperandList = (outs);		let OutOperandList = (outs);
let InOperandList = (ins i32imm:$id);		let InOperandList = (ins i32imm:$id);
let AsmString = "";		let AsmString = "";
let hasCtrlDep = 1;		let hasCtrlDep = 1;
let isNotDuplicable = 1;		let isNotDuplicable = 1;
}		}
▲ Show 20 Lines • Show All 550 Lines • Show Last 20 Lines

include/llvm/Target/TargetFrameLowering.h

Show First 20 Lines • Show All 333 Lines • ▼ Show 20 Lines	if (!F->hasLocalLinkage() || F->hasAddressTaken() ||
return false;		return false;
// Function should not be optimized as tail call.		// Function should not be optimized as tail call.
for (const User *U : F->users())		for (const User *U : F->users())
if (auto CS = ImmutableCallSite(U))		if (auto CS = ImmutableCallSite(U))
if (CS.isTailCall())		if (CS.isTailCall())
return false;		return false;
return true;		return true;
}		}

		// Set initial incoming and outgoing cfa offset and register values for basic
		// blocks. Initial values are the ones valid at the beginning of the function
		// (before any stack operations). Incoming and outgoing cfa offset and
		// register values are used to keep track of offset and register that are
		// valid at basic block entry and exit. This information is used by a late
		// pass that corrects the CFA calculation rule for a basic block if needed.
		// Having CFI instructions in function epilogue can cause incorrect CFA
		// calculation rule for some basic blocks. This can happen if, due to basic
		// block reordering, or the existence of multiple epilogue blocks, some of the
		// blocks have wrong cfa offset and register values set by the epilogue block
		// above them.
		virtual void initializeCFIInfo(MachineFunction & MF) const {}
};		};

} // End llvm namespace		} // End llvm namespace

#endif		#endif

lib/CodeGen/BranchFolding.cpp

Show First 20 Lines • Show All 296 Lines • ▼ Show 20 Lines	static unsigned ComputeCommonTailLength(MachineBasicBlock *MBB1,
MachineBasicBlock *MBB2,		MachineBasicBlock *MBB2,
MachineBasicBlock::iterator &I1,		MachineBasicBlock::iterator &I1,
MachineBasicBlock::iterator &I2) {		MachineBasicBlock::iterator &I2) {
I1 = MBB1->end();		I1 = MBB1->end();
I2 = MBB2->end();		I2 = MBB2->end();

unsigned TailLen = 0;		unsigned TailLen = 0;
while (I1 != MBB1->begin() && I2 != MBB2->begin()) {		while (I1 != MBB1->begin() && I2 != MBB2->begin()) {
--I1; --I2;		--I1; --I2;
		rnk (Not Done): Please don't do these types of triple checks in target independent CodeGen code. Use TII. What are you trying to do here? Avoid tail merging epilogues, or do more tail merging, or...?
		violetav (Author, Not Done): How can I use TII to get this information? To do more tail merging (CFI instructions have different CFI indices, so isIdenticalTo() returns false, and prevents tail merging that would have happened otherwise).
		iteratee (Not Done): CFI Instructions should be more like the debug instructions that are skipped below. It shouldn't be platform specific. CFI isn't platform specific.
// Skip debugging pseudos; necessary to avoid changing the code.		// Skip debugging pseudos; necessary to avoid changing the code.
while (I1->isDebugValue()) {		while (I1->isDirective()) {
if (I1==MBB1->begin()) {		if (I1==MBB1->begin()) {
while (I2->isDebugValue()) {		while (I2->isDirective()) {
if (I2==MBB2->begin())		if (I2==MBB2->begin())
// I1==DBG at begin; I2==DBG at begin		// I1==DBG at begin; I2==DBG at begin
return TailLen;		return TailLen;
--I2;		--I2;
}		}
++I2;		++I2;
// I1==DBG at begin; I2==non-DBG, or first of DBGs not at begin		// I1==DBG at begin; I2==non-DBG, or first of DBGs not at begin
return TailLen;		return TailLen;
}		}
--I1;		--I1;
}		}
// I1==first (untested) non-DBG preceding known match		// I1==first (untested) non-DBG preceding known match
while (I2->isDebugValue()) {		while (I2->isDirective()) {
if (I2==MBB2->begin()) {		if (I2==MBB2->begin()) {
++I1;		++I1;
// I1==non-DBG, or first of DBGs not at begin; I2==DBG at begin		// I1==non-DBG, or first of DBGs not at begin; I2==DBG at begin
return TailLen;		return TailLen;
}		}
--I2;		--I2;
}		}
// I1, I2==first (untested) non-DBGs preceding known match		// I1, I2==first (untested) non-DBGs preceding known match
Show All 10 Lines	while (I1 != MBB1->begin() && I2 != MBB2->begin()) {
++TailLen;		++TailLen;
}		}
// Back past possible debugging pseudos at beginning of block. This matters		// Back past possible debugging pseudos at beginning of block. This matters
// when one block differs from the other only by whether debugging pseudos		// when one block differs from the other only by whether debugging pseudos
// are present at the beginning. (This way, the various checks later for		// are present at the beginning. (This way, the various checks later for
// I1==MBB1->begin() work as expected.)		// I1==MBB1->begin() work as expected.)
if (I1 == MBB1->begin() && I2 != MBB2->begin()) {		if (I1 == MBB1->begin() && I2 != MBB2->begin()) {
--I2;		--I2;
while (I2->isDebugValue()) {		while (I2->isDebugValue()) {
		iteratee (Not Done): Should this be isDirective?
		violetav (Author, Not Done): I don't think that this logic can be applied to CFI instructions. It seems that this code aims to set the iterators to include consecutive DBG_VALUE instructions above the last common instruction in one block (when those DBG instructions do not exist in the other block). I don't think that that's the desired behaviour for CFI instructions.
if (I2 == MBB2->begin())		if (I2 == MBB2->begin())
return TailLen;		return TailLen;
--I2;		--I2;
}		}
++I2;		++I2;
}		}
if (I2 == MBB2->begin() && I1 != MBB1->begin()) {		if (I2 == MBB2->begin() && I1 != MBB1->begin()) {
--I1;		--I1;
while (I1->isDebugValue()) {		while (I1->isDebugValue()) {
if (I1 == MBB1->begin())		if (I1 == MBB1->begin())
return TailLen;		return TailLen;
--I1;		--I1;
}		}
++I1;		++I1;
}		}

		// Ensure that I1 and I2 do not point to a CFI_INSTRUCTION. This can happen if
		iteratee (Not Done): Is this block still necessary if the tests above are changed to isDirective? I'm having trouble making sense of them.
		violetav (Author, Not Done): It is. The tests above make sure that CFI instructions are not compared with isIdenticalTo(), and are not included when calculating TailLen. This code ensures that the iterators do not point to CFI instructions (so CFI instructions do not represent the starting point of the common tail in a BB, as they do not exactly 'count' as common instructions of two blocks). For example:
		    BB1: ... INSTRUCTION_A ADD32ri8 CFI_INSTRUCTION POP32r CFI_INSTRUCTION POP32r CFI_INSTRUCTION RET
		    BB2: ... INSTRUCTION_B CFI_INSTRUCTION ADD32ri8 CFI_INSTRUCTION POP32r CFI_INSTRUCTION POP32r CFI_INSTRUCTION RET
		In this example, BB1 and BB2 will have 4 common instructions (RET, POP, POP and ADD). When INSTRUCTION_A and INSTRUCTION_B are compared as not equal, after incrementing the iterators (++I1; ++I2;), I1 will point to ADD (the last common instruction, as it should), however I2 will point to the CFI instruction. This will, later on, result in BB2 being 'hacked off' (in ReplaceTailWithBranchTo()) at the wrong place (starting from this CFI instruction, instead of ADD below it), so this CFI instruction will be lost.
		iteratee (Done): Thanks. Can you shrink the example:
		    BB1: ... INSTRUCTION_A ADD32ri8 ...
		    BB2: ... INSTRUCTION_B CFI_INSTRUCTION
		and include it in the comments?
		// I1 and I2 are non-identical when compared and then one or both of them ends
		// up pointing to a CFI instruction after being incremented. For example:
		/*
		BB1:
		...
		INSTRUCTION_A
		ADD32ri8 <- last common instruction
		...
		BB2:
		...
		INSTRUCTION_B
		CFI_INSTRUCTION
		ADD32ri8 <- last common instruction
		...
		*/
		// When INSTRUCTION_A and INSTRUCTION_B are compared as not equal, after
		// incrementing the iterators, I1 will point to ADD, however I2 will point to
		// the CFI instruction. Later on, this leads to BB2 being 'hacked off' at the
		// wrong place (in ReplaceTailWithBranchTo()) which results in losing this CFI
		// instruction.
		while (I1 != MBB1->end() && I1->isCFIInstruction()) {
		++I1;
		}

		while (I2 != MBB2->end() && I2->isCFIInstruction()) {
		++I2;
		}
return TailLen;		return TailLen;
}		}
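
The iterator fix-up discussed in the thread above can be reproduced outside LLVM with a toy model. The sketch below is not the patch's code: `Block`, `TailResult`, `isCFI` and `commonTail` are hypothetical names, instructions are plain strings, and the debug-value handling of the real function is left out. It only shows why the two trailing loops are needed after a mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block: a list of opcode names.
using Block = std::vector<std::string>;

static bool isCFI(const std::string &Op) { return Op == "CFI_INSTRUCTION"; }

struct TailResult {
  unsigned TailLen;   // number of matching non-CFI instructions
  std::size_t Start1; // index where the common tail starts in block 1
  std::size_t Start2; // index where the common tail starts in block 2
};

// Walk both blocks backwards, skipping CFI entries, and count the
// instructions that compare equal.
TailResult commonTail(const Block &B1, const Block &B2) {
  std::size_t I1 = B1.size(), I2 = B2.size();
  unsigned TailLen = 0;
  while (I1 != 0 && I2 != 0) {
    --I1;
    --I2;
    // Skip CFI entries so they are never compared or counted.
    while (I1 != 0 && isCFI(B1[I1])) --I1;
    while (I2 != 0 && isCFI(B2[I2])) --I2;
    if (isCFI(B1[I1]) || isCFI(B2[I2]) || B1[I1] != B2[I2]) {
      // Step back onto the last position that was part of the tail.
      ++I1;
      ++I2;
      break;
    }
    ++TailLen;
  }
  // After the increment either index may sit on a CFI entry that was
  // skipped on the way up. Move forward to the first real instruction,
  // otherwise the block would later be cut at the CFI entry.
  while (I1 != B1.size() && isCFI(B1[I1])) ++I1;
  while (I2 != B2.size() && isCFI(B2[I2])) ++I2;
  assert(I1 <= B1.size() && I2 <= B2.size());
  return {TailLen, I1, I2};
}
```

With a shortened version of the blocks from the thread (one POP32r instead of two), the second block's start index lands on ADD32ri8 and not on the CFI entry in front of it. Dropping the two trailing loops would leave it on the CFI entry, which is the situation the author describes.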

void BranchFolder::ReplaceTailWithBranchTo(MachineBasicBlock::iterator OldInst,		void BranchFolder::ReplaceTailWithBranchTo(MachineBasicBlock::iterator OldInst,
MachineBasicBlock *NewDest) {		MachineBasicBlock *NewDest) {
TII->ReplaceTailWithBranchTo(OldInst, NewDest);		TII->ReplaceTailWithBranchTo(OldInst, NewDest);

if (UpdateLiveIns) {		if (UpdateLiveIns) {
Show All 39 Lines	MachineBasicBlock *BranchFolder::SplitMBBAt(MachineBasicBlock &CurMBB,

// Add the new block to the funclet.		// Add the new block to the funclet.
const auto &FuncletI = FuncletMembership.find(&CurMBB);		const auto &FuncletI = FuncletMembership.find(&CurMBB);
if (FuncletI != FuncletMembership.end()) {		if (FuncletI != FuncletMembership.end()) {
auto n = FuncletI->second;		auto n = FuncletI->second;
FuncletMembership[NewMBB] = n;		FuncletMembership[NewMBB] = n;
}		}

		// Recalculate CFI info for CurMBB. Use existing incoming cfa offset and
		rnk (Not Done): ditto
		// register.
		iteratee (Done): I'd like to see this factored out into MachineBasicBlock. Something like recalculateCFI(bool useExistingIncoming)
		CurMBB.recalculateCFIInfo(true);
		// Recalculate CFI info for NewMBB. Use CurMBB's outgoing cfa offset and
		// register as NewMBB's incoming.
		NewMBB->recalculateCFIInfo(false, CurMBB.getOutgoingCFAOffset(),
		CurMBB.getOutgoingCFARegister());

return NewMBB;		return NewMBB;
}		}

/// EstimateRuntime - Make a rough estimate for how long it will take to run		/// EstimateRuntime - Make a rough estimate for how long it will take to run
/// the specified code.		/// the specified code.
static unsigned EstimateRuntime(MachineBasicBlock::iterator I,		static unsigned EstimateRuntime(MachineBasicBlock::iterator I,
MachineBasicBlock::iterator E) {		MachineBasicBlock::iterator E) {
unsigned Time = 0;		unsigned Time = 0;
for (; I != E; ++I) {		for (; I != E; ++I) {
if (I->isDebugValue())		if (I->isDirective())
continue;		continue;
if (I->isCall())		if (I->isCall())
Time += 10;		Time += 10;
else if (I->mayLoad() || I->mayStore())		else if (I->mayLoad() || I->mayStore())
Time += 2;		Time += 2;
else		else
++Time;		++Time;
}		}
▲ Show 20 Lines • Show All 337 Lines • ▼ Show 20 Lines	if (i != commonTailIndex)
NextCommonInsts[i] = SameTails[i].getTailStartPos();		NextCommonInsts[i] = SameTails[i].getTailStartPos();
else {		else {
assert(SameTails[i].getTailStartPos() == MBB->begin() &&		assert(SameTails[i].getTailStartPos() == MBB->begin() &&
"MBB is not a common tail only block");		"MBB is not a common tail only block");
}		}
}		}

for (auto &MI : *MBB) {		for (auto &MI : *MBB) {
if (MI.isDebugValue())		if (MI.isDirective())
continue;		continue;
DebugLoc DL = MI.getDebugLoc();		DebugLoc DL = MI.getDebugLoc();
for (unsigned int i = 0 ; i < NextCommonInsts.size() ; i++) {		for (unsigned int i = 0 ; i < NextCommonInsts.size() ; i++) {
if (i == commonTailIndex)		if (i == commonTailIndex)
continue;		continue;

auto &Pos = NextCommonInsts[i];		auto &Pos = NextCommonInsts[i];
assert(Pos != SameTails[i].getBlock()->end() &&		assert(Pos != SameTails[i].getBlock()->end() &&
"Reached BB end within common tail");		"Reached BB end within common tail");
while (Pos->isDebugValue()) {		while (Pos->isDirective()) {
++Pos;		++Pos;
assert(Pos != SameTails[i].getBlock()->end() &&		assert(Pos != SameTails[i].getBlock()->end() &&
"Reached BB end within common tail");		"Reached BB end within common tail");
}		}
assert(MI.isIdenticalTo(*Pos) && "Expected matching MIIs!");		assert(MI.isIdenticalTo(*Pos) && "Expected matching MIIs!");
DL = DILocation::getMergedLocation(DL, Pos->getDebugLoc());		DL = DILocation::getMergedLocation(DL, Pos->getDebugLoc());
NextCommonInsts[i] = ++Pos;		NextCommonInsts[i] = ++Pos;
}		}
Show All 16 Lines	mergeOperations(MachineBasicBlock::iterator MBBIStartPos,
MachineBasicBlock::reverse_iterator MBBIE = MBB->rend();		MachineBasicBlock::reverse_iterator MBBIE = MBB->rend();
MachineBasicBlock::reverse_iterator MBBICommon = MBBCommon.rbegin();		MachineBasicBlock::reverse_iterator MBBICommon = MBBCommon.rbegin();
MachineBasicBlock::reverse_iterator MBBIECommon = MBBCommon.rend();		MachineBasicBlock::reverse_iterator MBBIECommon = MBBCommon.rend();

while (CommonTailLen--) {		while (CommonTailLen--) {
assert(MBBI != MBBIE && "Reached BB end within common tail length!");		assert(MBBI != MBBIE && "Reached BB end within common tail length!");
(void)MBBIE;		(void)MBBIE;

if (MBBI->isDebugValue()) {		if (MBBI->isDirective()) {
++MBBI;		++MBBI;
continue;		continue;
}		}

while ((MBBICommon != MBBIECommon) && MBBICommon->isDebugValue())		while ((MBBICommon != MBBIECommon) && MBBICommon->isDirective())
++MBBICommon;		++MBBICommon;

assert(MBBICommon != MBBIECommon &&		assert(MBBICommon != MBBIECommon &&
"Reached BB end within common tail length!");		"Reached BB end within common tail length!");
assert(MBBICommon->isIdenticalTo(*MBBI) && "Expected matching MIIs!");		assert(MBBICommon->isIdenticalTo(*MBBI) && "Expected matching MIIs!");

// Merge MMOs from memory operations in the common block.		// Merge MMOs from memory operations in the common block.
if (MBBICommon->mayLoad() || MBBICommon->mayStore())		if (MBBICommon->mayLoad() || MBBICommon->mayStore())
▲ Show 20 Lines • Show All 126 Lines • ▼ Show 20 Lines	for (unsigned int i=0, e = SameTails.size(); i != e; ++i) {
if (commonTailIndex == i)		if (commonTailIndex == i)
continue;		continue;
DEBUG(dbgs() << "BB#" << SameTails[i].getBlock()->getNumber()		DEBUG(dbgs() << "BB#" << SameTails[i].getBlock()->getNumber()
<< (i == e-1 ? "" : ", "));		<< (i == e-1 ? "" : ", "));
// Merge operations (MMOs, undef flags)		// Merge operations (MMOs, undef flags)
mergeOperations(SameTails[i].getTailStartPos(), *MBB);		mergeOperations(SameTails[i].getTailStartPos(), *MBB);
// Hack the end off BB i, making it jump to BB commonTailIndex instead.		// Hack the end off BB i, making it jump to BB commonTailIndex instead.
ReplaceTailWithBranchTo(SameTails[i].getTailStartPos(), MBB);		ReplaceTailWithBranchTo(SameTails[i].getTailStartPos(), MBB);

		// Recalculate CFI info for BB. Use existing incoming cfa offset and
		iteratee (Done): This looks like the same code above. Please factor this out.
		// register.
		SameTails[i].getBlock()->recalculateCFIInfo(true);

// BB i is no longer a predecessor of SuccBB; remove it from the worklist.		// BB i is no longer a predecessor of SuccBB; remove it from the worklist.
MergePotentials.erase(SameTails[i].getMPIter());		MergePotentials.erase(SameTails[i].getMPIter());
}		}
DEBUG(dbgs() << "\n");		DEBUG(dbgs() << "\n");
// We leave commonTailIndex in the worklist in case there are other blocks		// We leave commonTailIndex in the worklist in case there are other blocks
// that match it with a smaller number of instructions.		// that match it with a smaller number of instructions.
MadeChange = true;		MadeChange = true;
}		}
▲ Show 20 Lines • Show All 394 Lines • ▼ Show 20 Lines	if (PriorCond.empty() && !PriorTBB && MBB->pred_size() == 1 &&
DuplicateDbg.eraseFromParent();		DuplicateDbg.eraseFromParent();
}		}
}		}
PrevBB.splice(PrevBB.end(), MBB, MBB->begin(), MBB->end());		PrevBB.splice(PrevBB.end(), MBB, MBB->begin(), MBB->end());
PrevBB.removeSuccessor(PrevBB.succ_begin());		PrevBB.removeSuccessor(PrevBB.succ_begin());
assert(PrevBB.succ_empty());		assert(PrevBB.succ_empty());
PrevBB.transferSuccessors(MBB);		PrevBB.transferSuccessors(MBB);
MadeChange = true;		MadeChange = true;

		// Update CFI info for PrevBB.
		PrevBB.mergeCFIInfo(MBB);

return MadeChange;		return MadeChange;
}		}

// If the previous branch only branches to this block (conditional or		// If the previous branch only branches to this block (conditional or
// not) remove the branch.		// not) remove the branch.
if (PriorTBB == MBB && !PriorFBB) {		if (PriorTBB == MBB && !PriorFBB) {
TII->removeBranch(PrevBB);		TII->removeBranch(PrevBB);
MadeChange = true;		MadeChange = true;
▲ Show 20 Lines • Show All 621 Lines • Show Last 20 Lines

lib/CodeGen/CFIInfoVerifier.cpp

This file was added.

				//===----------- CFIInfoVerifier.cpp - CFI Information Verifier -----------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass verifies incoming and outgoing CFI information of basic blocks. CFI
				// information is information about offset and register set by CFI directives,
				// valid at the start and end of a basic block. This pass checks that outgoing
				// information of predecessors matches incoming information of their successors.
				//===----------------------------------------------------------------------===//

				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineModuleInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/Target/TargetMachine.h"
				using namespace llvm;

				namespace {
				class CFIInfoVerifier : public MachineFunctionPass {
				public:
				static char ID;

				CFIInfoVerifier() : MachineFunctionPass(ID) {
				initializeCFIInfoVerifierPass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesAll();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				bool runOnMachineFunction(MachineFunction &MF) override {
				bool NeedsDwarfCFI = (MF.getMMI().hasDebugInfo() ||
				MF.getFunction()->needsUnwindTableEntry()) &&
				(!MF.getTarget().getTargetTriple().isOSDarwin() &&
				!MF.getTarget().getTargetTriple().isOSWindows());
				if (!NeedsDwarfCFI) return false;
				verify(MF);
				return false;
				}

				private:
				// Go through each MBB in a function and check that outgoing offset and
				// register of its predecessors match incoming offset and register of that
				// MBB, as well as that incoming offset and register of its successors match
				// outgoing offset and register of the MBB.
				void verify(MachineFunction &MF);
				void report(const char *msg, MachineBasicBlock &MBB);
				};
				}

				char CFIInfoVerifier::ID = 0;
				INITIALIZE_PASS(CFIInfoVerifier, "cfiinfoverifier",
				"Verify that corresponding in/out CFI info matches", false,
				false)
				FunctionPass *llvm::createCFIInfoVerifier() { return new CFIInfoVerifier(); }

				void CFIInfoVerifier::verify(MachineFunction &MF) {
				for (auto &CurrMBB : MF) {
				for (auto Pred : CurrMBB.predecessors()) {
				// Check that outgoing offset values of predecessors match the incoming
				iteratee (Done): Can you name this something like Pred?
				// offset value of CurrMBB
				if (Pred->getOutgoingCFAOffset() != CurrMBB.getIncomingCFAOffset()) {
				report("The outgoing offset of a predecessor is inconsistent.",
				CurrMBB);
				errs() << "Predecessor BB#" << Pred->getNumber()
				<< " has outgoing offset (" << Pred->getOutgoingCFAOffset()
				<< "), while BB#" << CurrMBB.getNumber()
				<< " has incoming offset (" << CurrMBB.getIncomingCFAOffset()
				<< ").\n";
				}
				// Check that outgoing register values of predecessors match the incoming
				// register value of CurrMBB
				if (Pred->getOutgoingCFARegister() != CurrMBB.getIncomingCFARegister()) {
				report("The outgoing register of a predecessor is inconsistent.",
				CurrMBB);
				errs() << "Predecessor BB#" << Pred->getNumber()
				<< " has outgoing register (" << Pred->getOutgoingCFARegister()
				<< "), while BB#" << CurrMBB.getNumber()
				<< " has incoming register (" << CurrMBB.getIncomingCFARegister()
				<< ").\n";
				}
				}

				for (auto Succ : CurrMBB.successors()) {
				// Check that incoming offset values of successors match the outgoing
				iteratee (Done): Same here.
				// offset value of CurrMBB
				if (Succ->getIncomingCFAOffset() != CurrMBB.getOutgoingCFAOffset()) {
				report("The incoming offset of a successor is inconsistent.", CurrMBB);
				errs() << "Successor BB#" << Succ->getNumber()
				<< " has incoming offset (" << Succ->getIncomingCFAOffset()
				<< "), while BB#" << CurrMBB.getNumber()
				<< " has outgoing offset (" << CurrMBB.getOutgoingCFAOffset()
				<< ").\n";
				}
				// Check that incoming register values of successors match the outgoing
				// register value of CurrMBB
				if (Succ->getIncomingCFARegister() != CurrMBB.getOutgoingCFARegister()) {
				report("The incoming register of a successor is inconsistent.",
				CurrMBB);
				errs() << "Successor BB#" << Succ->getNumber()
				<< " has incoming register (" << Succ->getIncomingCFARegister()
				<< "), while BB#" << CurrMBB.getNumber()
				<< " has outgoing register (" << CurrMBB.getOutgoingCFARegister()
				<< ").\n";
				}
				}
				}
				}

				void CFIInfoVerifier::report(const char *msg, MachineBasicBlock &MBB) {
				assert(&MBB);
				errs() << '\n';
				errs() << "* " << msg << " *\n"
				<< "- function: " << MBB.getParent()->getName() << "\n";
				errs() << "- basic block: BB#" << MBB.getNumber() << ' ' << MBB.getName()
				<< " (" << (const void *)&MBB << ')';
				errs() << '\n';
				}
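
The consistency rule this verifier enforces can be sketched on a toy CFG. The code below is an illustration under assumptions, not the pass itself: `BlockInfo` and `verifyCFI` are hypothetical names, blocks are identified by their index, and registers are plain integers. It walks each edge once from the predecessor side. In a CFG whose predecessor and successor lists agree, that visits the same edges as the two loops in verify() above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-block CFI summary: the CFA offset and register valid
// at block entry and at block exit, plus successor block indices.
struct BlockInfo {
  int InOffset;
  int OutOffset;
  unsigned InReg;
  unsigned OutReg;
  std::vector<std::size_t> Succs;
};

// For every edge Pred -> Succ, the outgoing state of Pred must equal the
// incoming state of Succ. Returns one message per violated condition.
std::vector<std::string> verifyCFI(const std::vector<BlockInfo> &Blocks) {
  std::vector<std::string> Errors;
  for (std::size_t P = 0; P != Blocks.size(); ++P) {
    for (std::size_t S : Blocks[P].Succs) {
      assert(S < Blocks.size() && "successor index out of range");
      const std::string Edge =
          "BB#" + std::to_string(P) + " -> BB#" + std::to_string(S);
      if (Blocks[P].OutOffset != Blocks[S].InOffset)
        Errors.push_back("offset mismatch on edge " + Edge);
      if (Blocks[P].OutReg != Blocks[S].InReg)
        Errors.push_back("register mismatch on edge " + Edge);
    }
  }
  return Errors;
}
```

A block that ends after an epilogue (offset back at 8) but falls into a successor expecting offset 16 would be reported here. In the patch the same situation is printed through report() and errs().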

lib/CodeGen/CFIInstrInserter.cpp

This file was added.

				//===------ CFIInstrInserter.cpp - Insert additional CFI instructions -----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Insert CFI instructions at the beginnings of basic blocks if needed. CFI
				// instructions are inserted if basic blocks have incorrect offset or register
				// set by previous blocks.
				//
				//===----------------------------------------------------------------------===//
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineModuleInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/Target/TargetInstrInfo.h"
				#include "llvm/Target/TargetMachine.h"
				#include "llvm/Target/TargetSubtargetInfo.h"
				using namespace llvm;

				namespace {
				class CFIInstrInserter : public MachineFunctionPass {
				public:
				CFIInstrInserter() : MachineFunctionPass(ID) {
				initializeCFIInstrInserterPass(*PassRegistry::getPassRegistry());
				}
				bool runOnMachineFunction(MachineFunction &MF) override;
				static char ID;

				private:
				StringRef getPassName() const override { return "CFI Instruction Inserter"; }

				// Check if incoming CFI information of a basic block matches outgoing CFI
				// information of the previous block. If it doesn't, insert CFI instruction at
				// the beginning of the block that corrects the CFA calculation rule for that
				// block.
				void CorrectCFA(MachineFunction &MF);

				// Return the cfa offset value that should be set at the beginning of MBB if
				// needed. The negated value is needed when creating CFI instructions that set
				// absolute offset.
				iteratee (Not Done): I'm still not sure I get the negatives. Do the CFI instructions return the negative of what they're created with? That doesn't seem to make any sense.
				violetav (Author, Not Done): The thing is, they are created with the negative of what you pass to the create* method:
				    static MCCFIInstruction createDefCfa(MCSymbol *L, unsigned Register, int Offset) { return MCCFIInstruction(OpDefCfa, L, Register, -Offset, ""); }
				    static MCCFIInstruction createDefCfaOffset(MCSymbol *L, int Offset) { return MCCFIInstruction(OpDefCfaOffset, L, 0, -Offset, ""); }
				So, if you want to create a '.cfi_def_cfa_offset 16' you would pass -16 to the createDefCfaOffset() method. This is true for def_cfa and def_cfa_offset, but not for adjust_cfa_offset:
				    static MCCFIInstruction createAdjustCfaOffset(MCSymbol *L, int Adjustment) { return MCCFIInstruction(OpAdjustCfaOffset, L, 0, Adjustment, ""); }
				And here, for in/out cfa offset, I am storing these positive values that end up in cfi directives, and then creating a def_cfa or def_cfa_offset instruction, hence the '-MBB.getIncomingCFAOffset()'.
				int getCorrectCFAOffset(MachineBasicBlock &MBB) {
				return -MBB.getIncomingCFAOffset();
				}

				// Were any CFI instructions inserted
				bool InsertedCFIInstr = false;
				};
				}

				char CFIInstrInserter::ID = 0;
				INITIALIZE_PASS(CFIInstrInserter, "cfiinstrinserter",
				"Check CFI info and insert CFI instructions if needed", false,
				false)

				FunctionPass *llvm::createCFIInstrInserter() { return new CFIInstrInserter(); }

				bool CFIInstrInserter::runOnMachineFunction(MachineFunction &MF) {
				bool NeedsDwarfCFI = (MF.getMMI().hasDebugInfo() ||
				MF.getFunction()->needsUnwindTableEntry()) &&
				(!MF.getTarget().getTargetTriple().isOSDarwin() &&
				!MF.getTarget().getTargetTriple().isOSWindows());

				iteratee (Done): You should clear this value in CorrectCFA.
				if (!NeedsDwarfCFI) return false;

				// Insert appropriate CFI instructions for each MBB if CFA calculation rule
				// needs to be corrected for that MBB.
				CorrectCFA(MF);

				return InsertedCFIInstr;
				}

				void CFIInstrInserter::CorrectCFA(MachineFunction &MF) {

				MachineBasicBlock &FirstMBB = MF.front();
				MachineBasicBlock *PrevMBB = &FirstMBB;
				const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
				InsertedCFIInstr = false;

				for (auto &MBB : MF) {
				// Skip the first MBB in a function
				if (MBB.getNumber() == FirstMBB.getNumber()) continue;

				auto MBBI = MBB.begin();
				DebugLoc DL = MBB.findDebugLoc(MBBI);

				iteratee (Done): The negative here is confusing. Can you make it so you don't have to remember to flip the sign here?
				if (PrevMBB->getOutgoingCFAOffset() != MBB.getIncomingCFAOffset()) {
				// If both outgoing offset and register of a previous block don't match
				// incoming offset and register of this block, add a def_cfa instruction
				// with the correct offset and register for this block.
				if (PrevMBB->getOutgoingCFARegister() != MBB.getIncomingCFARegister()) {
				unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createDefCfa(
				nullptr, MBB.getIncomingCFARegister(), getCorrectCFAOffset(MBB)));
				BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
				.addCFIIndex(CFIIndex);
				iteratee (Done): Same here.
				// If outgoing offset of a previous block doesn't match incoming offset
				// of this block, add a def_cfa_offset instruction with the correct
				// offset for this block.
				} else {
				unsigned CFIIndex =
				MF.addFrameInst(MCCFIInstruction::createDefCfaOffset(
				nullptr, getCorrectCFAOffset(MBB)));
				BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
				.addCFIIndex(CFIIndex);
				}
				InsertedCFIInstr = true;
				// If outgoing register of a previous block doesn't match incoming
				// register of this block, add a def_cfa_register instruction with the
				// correct register for this block.
				} else if (PrevMBB->getOutgoingCFARegister() !=
				MBB.getIncomingCFARegister()) {
				unsigned CFIIndex =
				MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(
				nullptr, MBB.getIncomingCFARegister()));
				BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
				.addCFIIndex(CFIIndex);
				InsertedCFIInstr = true;
				}
				PrevMBB = &MBB;
				}
				}
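
Two points from this file can be condensed into a small standalone sketch: which directive CorrectCFA() picks for a block, and the sign flip explained in the review thread. Everything here is a toy model with hypothetical names (`CFAState`, `Fix`, `chooseFix`, `ToyCFI`, `createDefCfaOffsetToy`, `makeDefCfaOffsetFor`). It does not use the real MCCFIInstruction class, and the negation in the toy factory only mirrors the behaviour the author describes for createDefCfa and createDefCfaOffset.

```cpp
#include <cassert>

// Hypothetical CFA state: the offset and register in effect at some point.
struct CFAState {
  int Offset;
  unsigned Reg;
};

enum class Fix { None, DefCfa, DefCfaOffset, DefCfaRegister };

// Same decision tree as CorrectCFA(): compare the previous block's
// outgoing state (layout order) with this block's incoming state.
Fix chooseFix(const CFAState &PrevOut, const CFAState &In) {
  const bool OffsetDiffers = PrevOut.Offset != In.Offset;
  const bool RegDiffers = PrevOut.Reg != In.Reg;
  if (OffsetDiffers && RegDiffers) return Fix::DefCfa;
  if (OffsetDiffers) return Fix::DefCfaOffset;
  if (RegDiffers) return Fix::DefCfaRegister;
  return Fix::None;
}

// Toy directive that stores the value which would end up in the
// .cfi_def_cfa_offset directive.
struct ToyCFI {
  int StoredOffset;
};

// Assumption taken from the review thread: the factory negates its argument.
ToyCFI createDefCfaOffsetToy(int Offset) { return ToyCFI{-Offset}; }

// The block stores the positive directive value, so the caller negates it
// before handing it to the factory. The two negations cancel.
ToyCFI makeDefCfaOffsetFor(int IncomingCFAOffset) {
  ToyCFI C = createDefCfaOffsetToy(-IncomingCFAOffset);
  assert(C.StoredOffset == IncomingCFAOffset);
  return C;
}
```

In this model a block whose incoming offset is 16 yields a directive value of 16, which corresponds to the '.cfi_def_cfa_offset 16' case from the thread. Hiding the negation in one helper, as getCorrectCFAOffset() does in the patch, keeps callers from having to remember the sign.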

lib/CodeGen/CMakeLists.txt

	add_llvm_library(LLVMCodeGen			add_llvm_library(LLVMCodeGen
	AggressiveAntiDepBreaker.cpp			AggressiveAntiDepBreaker.cpp
	AllocationOrder.cpp			AllocationOrder.cpp
	Analysis.cpp			Analysis.cpp
	AtomicExpandPass.cpp			AtomicExpandPass.cpp
	BasicTargetTransformInfo.cpp			BasicTargetTransformInfo.cpp
	BranchCoalescing.cpp			BranchCoalescing.cpp
	BranchFolding.cpp			BranchFolding.cpp
	BranchRelaxation.cpp			BranchRelaxation.cpp
	BuiltinGCs.cpp			BuiltinGCs.cpp
	CalcSpillWeights.cpp			CalcSpillWeights.cpp
	CallingConvLower.cpp			CallingConvLower.cpp
				CFIInfoVerifier.cpp
				CFIInstrInserter.cpp
	CodeGen.cpp			CodeGen.cpp
	CodeGenPrepare.cpp			CodeGenPrepare.cpp
	CountingFunctionInserter.cpp			CountingFunctionInserter.cpp
	CriticalAntiDepBreaker.cpp			CriticalAntiDepBreaker.cpp
	DeadMachineInstructionElim.cpp			DeadMachineInstructionElim.cpp
	DetectDeadLanes.cpp			DetectDeadLanes.cpp
	DFAPacketizer.cpp			DFAPacketizer.cpp
	DwarfEHPrepare.cpp			DwarfEHPrepare.cpp
	▲ Show 20 Lines • Show All 150 Lines • Show Last 20 Lines

lib/CodeGen/CodeGen.cpp

	Show All 18 Lines
	using namespace llvm;			using namespace llvm;

	/// initializeCodeGen - Initialize all passes linked into the CodeGen library.			/// initializeCodeGen - Initialize all passes linked into the CodeGen library.
	void llvm::initializeCodeGen(PassRegistry &Registry) {			void llvm::initializeCodeGen(PassRegistry &Registry) {
	initializeAtomicExpandPass(Registry);			initializeAtomicExpandPass(Registry);
	initializeBranchCoalescingPass(Registry);			initializeBranchCoalescingPass(Registry);
	initializeBranchFolderPassPass(Registry);			initializeBranchFolderPassPass(Registry);
	initializeBranchRelaxationPass(Registry);			initializeBranchRelaxationPass(Registry);
				initializeCFIInfoVerifierPass(Registry);
				initializeCFIInstrInserterPass(Registry);
	initializeCodeGenPreparePass(Registry);			initializeCodeGenPreparePass(Registry);
	initializeCountingFunctionInserterPass(Registry);			initializeCountingFunctionInserterPass(Registry);
	initializeDeadMachineInstructionElimPass(Registry);			initializeDeadMachineInstructionElimPass(Registry);
	initializeDetectDeadLanesPass(Registry);			initializeDetectDeadLanesPass(Registry);
	initializeDwarfEHPreparePass(Registry);			initializeDwarfEHPreparePass(Registry);
	initializeEarlyIfConverterPass(Registry);			initializeEarlyIfConverterPass(Registry);
	initializeExpandISelPseudosPass(Registry);			initializeExpandISelPseudosPass(Registry);
	initializeExpandPostRAPass(Registry);			initializeExpandPostRAPass(Registry);
	▲ Show 20 Lines • Show All 72 Lines • Show Last 20 Lines

lib/CodeGen/MachineBasicBlock.cpp

	Show All 29 Lines
	#include "llvm/Support/DataTypes.h"			#include "llvm/Support/DataTypes.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"
	#include "llvm/Target/TargetInstrInfo.h"			#include "llvm/Target/TargetInstrInfo.h"
	#include "llvm/Target/TargetMachine.h"			#include "llvm/Target/TargetMachine.h"
	#include "llvm/Target/TargetRegisterInfo.h"			#include "llvm/Target/TargetRegisterInfo.h"
	#include "llvm/Target/TargetSubtargetInfo.h"			#include "llvm/Target/TargetSubtargetInfo.h"
	#include <algorithm>			#include <algorithm>
				#include <queue>
				#include <set>
	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "codegen"			#define DEBUG_TYPE "codegen"

	MachineBasicBlock::MachineBasicBlock(MachineFunction &MF, const BasicBlock *B)			MachineBasicBlock::MachineBasicBlock(MachineFunction &MF, const BasicBlock *B)
	: BB(B), Number(-1), xParent(&MF) {			: BB(B), Number(-1), xParent(&MF) {
	Insts.Parent = this;			Insts.Parent = this;
	}			}
	▲ Show 20 Lines • Show All 1,292 Lines • ▼ Show 20 Lines
	}			}

	MachineBasicBlock::livein_iterator MachineBasicBlock::livein_begin() const {			MachineBasicBlock::livein_iterator MachineBasicBlock::livein_begin() const {
	assert(getParent()->getProperties().hasProperty(			assert(getParent()->getProperties().hasProperty(
	MachineFunctionProperties::Property::TracksLiveness) &&			MachineFunctionProperties::Property::TracksLiveness) &&
	"Liveness information is accurate");			"Liveness information is accurate");
	return LiveIns.begin();			return LiveIns.begin();
	}			}

				void MachineBasicBlock::updateCFIInfo(MachineBasicBlock::iterator Pos) {
				// Used for calculating outgoing cfa offset when CFI instruction added at Pos
				// is def_cfa or def_cfa_offset.
				/* For example:
				...
				.cfi_adjust_cfa_offset 4
				...
				.cfi_adjust_cfa_offset 4
				...
				.cfi_def_cfa_offset 16 <---- newly added CFI instruction at Pos
				...
				.cfi_adjust_cfa_offset 4
				...
				Once def_cfa_offset is inserted, outgoing cfa offset is no longer
				iteratee [Not Done]: I think this would be more clear if you used a local variable.
				calculated as incoming offset incremented by the sum of all adjustments
				(12). It becomes equal to the offset set by the added CFI instruction (16)
				incremented by the sum of adjustments below it (4). Adjustments above the
				added def_cfa_offset directive don't have effect below it anymore and
				therefore don't affect the value of outgoing cfa offset.
				*/
				int AdjustAmount = 0;
				// Used to check if outgoing cfa offset should be updated or not (when def_cfa
				// is inserted).
				bool ShouldSetOffset = true;
				// Used to check if outgoing cfa register should be updated or not (when
				// def_cfa is inserted).
				bool ShouldSetRegister = true;
				const std::vector<MCCFIInstruction> CFIInstructions =
				getParent()->getFrameInstructions();
				MCCFIInstruction CFI = CFIInstructions[Pos->getOperand(0).getCFIIndex()];
				// Type of the CFI instruction that was inserted.
				MCCFIInstruction::OpType CFIType = CFI.getOperation();

				// Check if there are already existing CFI instructions below Pos and see if
				// outgoing CFI info should be updated or not.
				for (MachineBasicBlock::reverse_iterator RI = rbegin();
				RI != Pos.getReverse(); ++RI) {
				if (RI->isCFIInstruction()) {
				MCCFIInstruction::OpType RIType =
				CFIInstructions[RI->getOperand(0).getCFIIndex()].getOperation();
				switch (RIType) {
				case MCCFIInstruction::OpAdjustCfaOffset:
				AdjustAmount +=
				CFIInstructions[RI->getOperand(0).getCFIIndex()].getOffset();
				break;
				case MCCFIInstruction::OpDefCfaOffset:
				// CFI instruction doesn't affect outgoing cfa offset if there is
				// already a def_cfa_offset instruction below it.
				if (CFIType == MCCFIInstruction::OpDefCfaOffset ||
				CFIType == MCCFIInstruction::OpAdjustCfaOffset)
				return;
				if (CFIType == MCCFIInstruction::OpDefCfa) {
				// CFI instruction doesn't affect outgoing cfa offset and register
				// if there are both def_cfa_offset and def_cfa_register
				// instructions below it.
				if (!ShouldSetRegister) return;
				ShouldSetOffset = false;
				}
				break;
				case MCCFIInstruction::OpDefCfaRegister:
				// CFI instruction doesn't affect outgoing cfa register if there is
				// already a def_cfa_register instruction below it.
				if (CFIType == MCCFIInstruction::OpDefCfaRegister) return;
				if (CFIType == MCCFIInstruction::OpDefCfa) {
				// CFI instruction doesn't affect outgoing cfa offset and register
				// if there are both def_cfa_offset and def_cfa_register
				// instructions below it.
				if (!ShouldSetOffset) return;
				ShouldSetRegister = false;
				}
				break;
				case MCCFIInstruction::OpDefCfa:
				// CFI instruction doesn't affect outgoing cfa offset and register if
				// there is already a def_cfa instruction below it.
				if (CFIType == MCCFIInstruction::OpDefCfaRegister ||
				CFIType == MCCFIInstruction::OpDefCfaOffset ||
				CFIType == MCCFIInstruction::OpDefCfa ||
				CFIType == MCCFIInstruction::OpAdjustCfaOffset)
				return;
				break;
				default:
				break;
				}
				}
				}
				iteratee [Done]: Isn't there a missing case here, where if you encounter both a def_cfa_offset and a def_cfa_register, you can early return, like below?

				// Update the outgoing CFI info based on the added CFI instruction.
				switch (CFIType) {
				case MCCFIInstruction::OpAdjustCfaOffset:
				setOutgoingCFAOffset(getOutgoingCFAOffset() + CFI.getOffset());
				break;
				case MCCFIInstruction::OpDefCfaOffset:
				setOutgoingCFAOffset(CFI.getOffset() + AdjustAmount);
				break;
				case MCCFIInstruction::OpDefCfaRegister:
				setOutgoingCFARegister(CFI.getRegister());
				break;
				case MCCFIInstruction::OpDefCfa:
				if (ShouldSetOffset) setOutgoingCFAOffset(CFI.getOffset() + AdjustAmount);
				if (ShouldSetRegister) setOutgoingCFARegister(CFI.getRegister());
				break;
				default:
				break;
				}
				}
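
For readers following the review, the rule described in the block comment of updateCFIInfo can be modelled outside LLVM. The sketch below is only an illustration under stated assumptions: `CfiOp`, `CfiDirective`, and `outgoingCfaOffset` are hypothetical names that do not appear in the patch, and the model does a simple forward scan rather than the reverse scan used in the patch. It shows that a def_cfa_offset directive discards every adjustment above it, while adjustments below it still apply.

```cpp
#include <vector>

// Hypothetical stand-ins for the two directive kinds discussed in the
// comment; these are not LLVM types.
enum class CfiOp { AdjustCfaOffset, DefCfaOffset };

struct CfiDirective {
  CfiOp Op;
  int Offset;
};

// Walk the directives of one block in program order and return the CFA
// offset in effect at the end of the block.
int outgoingCfaOffset(int IncomingOffset,
                      const std::vector<CfiDirective> &Directives) {
  int Offset = IncomingOffset;
  for (const CfiDirective &D : Directives) {
    switch (D.Op) {
    case CfiOp::AdjustCfaOffset:
      // Relative change: added to whatever is currently in effect.
      Offset += D.Offset;
      break;
    case CfiOp::DefCfaOffset:
      // Absolute value: everything accumulated above is discarded.
      Offset = D.Offset;
      break;
    }
  }
  return Offset;
}
```

With an incoming offset of 4 and three adjustments of 4, the outgoing offset is the incoming offset plus 12. Inserting a def_cfa_offset of 16 before the last adjustment makes the outgoing offset 16 plus 4, independent of the incoming value.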

				void MachineBasicBlock::updateCFIInfoSucc() {
				// Blocks whose successors' CFI info should be updated.
				iteratee [Not Done]: Why is this offset negative? Can you explain it to me?
				violetav (Author) [Not Done]: A short answer would be: because the value of cfa offset that is passed when creating a CFI instruction with createDefCfa() and createDefCfaOffset() is negated. That is the value that will end up as the actual offset in cfi directives. I was calculating in/out cfa offset with this non-negated value (that is passed when creating these instructions) so I had to negate the offset when reading the value with getOffset(). I am not sure why the correct value is not passed in the first place (e.g. in createAdjustCfaOffset(), the passed value is not negated). I realised there was a bug with calculating in/out cfa offset: for DefCfa and DefCfaOffset, the values used for calculating in/out cfa offset were the negatives of the ones that end up in cfi directives, but that was not the case for AdjustCfaOffset. I changed it so that the values stored as in/out cfa offset are the values of actual offsets set by cfi directives (and not their negative values).
				std::queue<MachineBasicBlock *> Successors;
				// Keep track of basic blocks that have already been put in the Successors
				// queue.
				std::set<MachineBasicBlock *> ProcessedMBBs;
				// Start with updating CFI info for direct successors of this block.
				Successors.push(this);
				ProcessedMBBs.insert(this);

				// Go through the successors and update their CFI info if needed.
				while (!Successors.empty()) {
				MachineBasicBlock *CurrSucc = Successors.front();
				Successors.pop();

				// Update CFI info for CurrSucc's successors.
				for (auto Succ : CurrSucc->successors()) {
				if (ProcessedMBBs.find(Succ) != ProcessedMBBs.end()) continue;
				if (Succ->getIncomingCFAOffset() == CurrSucc->getOutgoingCFAOffset() &&
				Succ->getIncomingCFARegister() == CurrSucc->getOutgoingCFARegister())
				continue;
				bool ChangedOutgoingInfo = false;
				// Do not update cfa offset if the existing value matches the new.
				if (Succ->getIncomingCFAOffset() != CurrSucc->getOutgoingCFAOffset()) {
				// If the block doesn't have a def_cfa_offset or def_cfa directive,
				// update its outgoing offset.
				if (!Succ->hasDefOffset()) {
				// Succ block doesn't set absolute offset, so the difference between
				// outgoing and incoming offset remains the same. This difference is
				// the sum of offsets set by adjust_cfa_offset directives.
				int AdjustAmount =
				Succ->getOutgoingCFAOffset() - Succ->getIncomingCFAOffset();
				iteratee [Done]: Should this short-circuit if they already match?
				Succ->setOutgoingCFAOffset(CurrSucc->getOutgoingCFAOffset() +
				AdjustAmount);
				ChangedOutgoingInfo = true;
				}
				Succ->setIncomingCFAOffset(CurrSucc->getOutgoingCFAOffset());
				}
				// Do not update cfa register if the existing value matches the new.
				if (Succ->getIncomingCFARegister() !=
				CurrSucc->getOutgoingCFARegister()) {
				Succ->setIncomingCFARegister(CurrSucc->getOutgoingCFARegister());
				// If the block doesn't have a def_cfa_register or def_cfa directive,
				// update its outgoing register.
				if (!Succ->hasDefRegister()) {
				Succ->setOutgoingCFARegister(Succ->getIncomingCFARegister());
				ChangedOutgoingInfo = true;
				}
				}
				// If Succ's outgoing CFI info has been changed, its successors should be
				// updated as well.
				if (ChangedOutgoingInfo) {
				Successors.push(Succ);
				ProcessedMBBs.insert(Succ);
				}
				}
				}
				}
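
The successor update in updateCFIInfoSucc is a breadth-first worklist walk. A reduced model may make the propagation rule easier to check. The sketch below is an assumption-laden illustration, not the patch's code: `Block` and `propagateOffsets` are hypothetical names, blocks are referred to by index, and only the CFA offset is tracked (the register is handled analogously in the patch). A block without an absolute def_cfa_offset keeps its net adjustment, so its outgoing offset shifts with the new incoming value and its own successors are queued; a block with an absolute definition only has its incoming value refreshed.

```cpp
#include <cstddef>
#include <queue>
#include <set>
#include <vector>

// Hypothetical reduced model of a basic block's CFI state.
struct Block {
  int IncomingOffset = 0;
  int OutgoingOffset = 0;
  bool HasDefOffset = false;       // block sets an absolute CFA offset
  std::vector<std::size_t> Succs;  // successor indices
};

// Propagate the outgoing offset of Blocks[Start] to its successors, and
// onward through every block whose outgoing offset changes as a result.
void propagateOffsets(std::vector<Block> &Blocks, std::size_t Start) {
  std::queue<std::size_t> Worklist;
  std::set<std::size_t> Queued;
  Worklist.push(Start);
  Queued.insert(Start);

  while (!Worklist.empty()) {
    std::size_t Curr = Worklist.front();
    Worklist.pop();
    const int Out = Blocks[Curr].OutgoingOffset;

    for (std::size_t S : Blocks[Curr].Succs) {
      if (Queued.count(S))
        continue; // already scheduled once
      Block &Succ = Blocks[S];
      if (Succ.IncomingOffset == Out)
        continue; // nothing to update
      bool ChangedOutgoing = false;
      if (!Succ.HasDefOffset) {
        // Net effect of the block's adjust_cfa_offset directives.
        int Adjust = Succ.OutgoingOffset - Succ.IncomingOffset;
        Succ.OutgoingOffset = Out + Adjust;
        ChangedOutgoing = true;
      }
      Succ.IncomingOffset = Out;
      if (ChangedOutgoing) {
        Worklist.push(S);
        Queued.insert(S);
      }
    }
  }
}
```

As in the patch, a block is queued at most once, which bounds the walk but also means a block reachable along two paths is updated only from the first path that reaches it.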

				void MachineBasicBlock::recalculateCFIInfo(bool UseExistingIncoming,
				iteratee [Done]: Same here. It looks like you could short-circuit if they're equal.
				int NewIncomingOffset,
				unsigned NewIncomingRegister) {
				// Outgoing cfa offset set by the block.
				int SetOffset;
				// Outgoing cfa register set by the block.
				unsigned SetRegister;
				const std::vector<MCCFIInstruction> &Instrs =
				getParent()->getFrameInstructions();

				// Set initial values to SetOffset and SetRegister. Use existing incoming
				// values or values passed as arguments.
				if (!UseExistingIncoming) {
				// Set new incoming cfa offset and register values.
				setIncomingCFAOffset(NewIncomingOffset);
				setIncomingCFARegister(NewIncomingRegister);
				}

				SetOffset = getIncomingCFAOffset();
				SetRegister = getIncomingCFARegister();

				setDefOffset(false);
				setDefRegister(false);

				// Determine cfa offset and register set by the block.
				for (MachineBasicBlock::iterator MI = begin(); MI != end(); ++MI) {
				if (MI->isCFIInstruction()) {
				unsigned CFIIndex = MI->getOperand(0).getCFIIndex();
				const MCCFIInstruction &CFI = Instrs[CFIIndex];
				if (CFI.getOperation() == MCCFIInstruction::OpDefCfaRegister) {
				SetRegister = CFI.getRegister();
				setDefRegister(true);
				} else if (CFI.getOperation() == MCCFIInstruction::OpDefCfaOffset) {
				SetOffset = CFI.getOffset();
				setDefOffset(true);
				} else if (CFI.getOperation() == MCCFIInstruction::OpAdjustCfaOffset) {
				SetOffset = SetOffset + CFI.getOffset();
				} else if (CFI.getOperation() == MCCFIInstruction::OpDefCfa) {
				SetRegister = CFI.getRegister();
				SetOffset = CFI.getOffset();
				setDefOffset(true);
				setDefRegister(true);
				}
				}
				}

				// Update outgoing CFI info.
				setOutgoingCFAOffset(SetOffset);
				setOutgoingCFARegister(SetRegister);
				}

				void MachineBasicBlock::mergeCFIInfo(MachineBasicBlock *MBB) {
				// Update CFI info. This basic block acquires MBB's outgoing cfa offset and
				// register values.
				setOutgoingCFAOffset(MBB->getOutgoingCFAOffset());
				setOutgoingCFARegister(MBB->getOutgoingCFARegister());
				setDefOffset(hasDefOffset() || MBB->hasDefOffset());
				setDefRegister(hasDefRegister() || MBB->hasDefRegister());
				}
				iteratee [Not Done]: It feels weird to me that you would scan the block twice. This could be done in the loop above, couldn't it?

lib/CodeGen/MachineInstr.cpp

Show First 20 Lines • Show All 299 Lines • ▼ Show 20 Lines	case MachineOperand::MO_RegisterLiveOut: {
const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();		const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
unsigned RegMaskSize = (TRI->getNumRegs() + 31) / 32;		unsigned RegMaskSize = (TRI->getNumRegs() + 31) / 32;

// Deep compare of the two RegMasks		// Deep compare of the two RegMasks
return std::equal(RegMask, RegMask + RegMaskSize, OtherRegMask);		return std::equal(RegMask, RegMask + RegMaskSize, OtherRegMask);
}		}
case MachineOperand::MO_MCSymbol:		case MachineOperand::MO_MCSymbol:
return getMCSymbol() == Other.getMCSymbol();		return getMCSymbol() == Other.getMCSymbol();
case MachineOperand::MO_CFIIndex:		case MachineOperand::MO_CFIIndex: {
return getCFIIndex() == Other.getCFIIndex();		const MachineFunction *MF = getParent()->getParent()->getParent();
		const MachineFunction *OtherMF =
		Other.getParent()->getParent()->getParent();
		MCCFIInstruction Inst = MF->getFrameInstructions()[getCFIIndex()];
		MCCFIInstruction OtherInst =
		OtherMF->getFrameInstructions()[Other.getCFIIndex()];
		MCCFIInstruction::OpType op = Inst.getOperation();
		if (op != OtherInst.getOperation()) return false;
		if (op == MCCFIInstruction::OpDefCfa || op == MCCFIInstruction::OpOffset ||
		op == MCCFIInstruction::OpRestore ||
		op == MCCFIInstruction::OpUndefined ||
		op == MCCFIInstruction::OpSameValue ||
		op == MCCFIInstruction::OpDefCfaRegister ||
		op == MCCFIInstruction::OpRelOffset ||
		op == MCCFIInstruction::OpRegister)
		if (Inst.getRegister() != OtherInst.getRegister()) return false;
		if (op == MCCFIInstruction::OpRegister)
		if (Inst.getRegister2() != OtherInst.getRegister2()) return false;
		if (op == MCCFIInstruction::OpDefCfa || op == MCCFIInstruction::OpOffset ||
		op == MCCFIInstruction::OpRelOffset ||
		op == MCCFIInstruction::OpDefCfaOffset ||
		op == MCCFIInstruction::OpAdjustCfaOffset ||
		op == MCCFIInstruction::OpGnuArgsSize)
		if (Inst.getOffset() != OtherInst.getOffset()) return false;
		return true;
		}
case MachineOperand::MO_Metadata:		case MachineOperand::MO_Metadata:
return getMetadata() == Other.getMetadata();		return getMetadata() == Other.getMetadata();
case MachineOperand::MO_IntrinsicID:		case MachineOperand::MO_IntrinsicID:
return getIntrinsicID() == Other.getIntrinsicID();		return getIntrinsicID() == Other.getIntrinsicID();
case MachineOperand::MO_Predicate:		case MachineOperand::MO_Predicate:
return getPredicate() == Other.getPredicate();		return getPredicate() == Other.getPredicate();
}		}
llvm_unreachable("Invalid machine operand type");		llvm_unreachable("Invalid machine operand type");
Show All 32 Lines	return hash_combine(MO.getType(), MO.getTargetFlags(),
MO.getBlockAddress(), MO.getOffset());		MO.getBlockAddress(), MO.getOffset());
case MachineOperand::MO_RegisterMask:		case MachineOperand::MO_RegisterMask:
case MachineOperand::MO_RegisterLiveOut:		case MachineOperand::MO_RegisterLiveOut:
return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getRegMask());		return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getRegMask());
case MachineOperand::MO_Metadata:		case MachineOperand::MO_Metadata:
return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getMetadata());		return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getMetadata());
case MachineOperand::MO_MCSymbol:		case MachineOperand::MO_MCSymbol:
return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getMCSymbol());		return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getMCSymbol());
case MachineOperand::MO_CFIIndex:		case MachineOperand::MO_CFIIndex: {
return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getCFIIndex());		const MachineFunction *MF = MO.getParent()->getParent()->getParent();
		MCCFIInstruction Inst = MF->getFrameInstructions()[MO.getCFIIndex()];
		return hash_combine(MO.getType(), MO.getTargetFlags(), Inst.getOperation(),
		Inst.getRegister(), Inst.getRegister2(),
		Inst.getOffset());
		}
case MachineOperand::MO_IntrinsicID:		case MachineOperand::MO_IntrinsicID:
return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getIntrinsicID());		return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getIntrinsicID());
case MachineOperand::MO_Predicate:		case MachineOperand::MO_Predicate:
return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getPredicate());		return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getPredicate());
}		}
llvm_unreachable("Invalid machine operand type");		llvm_unreachable("Invalid machine operand type");
}		}
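
The MO_CFIIndex changes above replace index comparison with a comparison of the directives the indices refer to, so two operands that point at different table slots but describe the same directive compare and hash equal. The sketch below illustrates that idea with standard C++ only. It rests on assumptions: `CfiRecord`, `sameCfi`, and `hashCfi` are hypothetical names, the hash combiner is a plain multiply-and-add rather than LLVM's hash_combine, and every field is compared regardless of operation, whereas the patch compares only the fields relevant to each operation.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical flattened view of a CFI directive.
struct CfiRecord {
  int Op;
  unsigned Reg;
  int Offset;
};

// Compare two operands by the directives they index, not by the indices.
bool sameCfi(const std::vector<CfiRecord> &TableA, std::size_t IdxA,
             const std::vector<CfiRecord> &TableB, std::size_t IdxB) {
  const CfiRecord &A = TableA[IdxA];
  const CfiRecord &B = TableB[IdxB];
  return A.Op == B.Op && A.Reg == B.Reg && A.Offset == B.Offset;
}

// Hash the directive contents so that equal directives hash equally even
// when they live at different table indices.
std::size_t hashCfi(const CfiRecord &R) {
  std::size_t H = std::hash<int>{}(R.Op);
  H = H * 31 + std::hash<unsigned>{}(R.Reg);
  H = H * 31 + std::hash<int>{}(R.Offset);
  return H;
}
```

Keeping equality and hashing based on the same fields matters for passes that look for identical instructions by hash first and then confirm with a full comparison.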

▲ Show 20 Lines • Show All 2,010 Lines • Show Last 20 Lines

lib/CodeGen/PrologEpilogInserter.cpp

	Show First 20 Lines • Show All 971 Lines • ▼ Show 20 Lines

	/// insertPrologEpilogCode - Scan the function for modified callee saved			/// insertPrologEpilogCode - Scan the function for modified callee saved
	/// registers, insert spill code for these callee saved registers, then add			/// registers, insert spill code for these callee saved registers, then add
	/// prolog and epilog code to the function.			/// prolog and epilog code to the function.
	///			///
	void PEI::insertPrologEpilogCode(MachineFunction &Fn) {			void PEI::insertPrologEpilogCode(MachineFunction &Fn) {
	const TargetFrameLowering &TFI = *Fn.getSubtarget().getFrameLowering();			const TargetFrameLowering &TFI = *Fn.getSubtarget().getFrameLowering();

				// Set initial incoming and outgoing cfa offset and register values for basic
				// blocks.
				TFI.initializeCFIInfo(Fn);

	// Add prologue to the function...			// Add prologue to the function...
	for (MachineBasicBlock *SaveBlock : SaveBlocks)			for (MachineBasicBlock *SaveBlock : SaveBlocks)
	TFI.emitPrologue(Fn, *SaveBlock);			TFI.emitPrologue(Fn, *SaveBlock);

	// Add epilogue to restore the callee-save registers in each exiting block.			// Add epilogue to restore the callee-save registers in each exiting block.
	for (MachineBasicBlock *RestoreBlock : RestoreBlocks)			for (MachineBasicBlock *RestoreBlock : RestoreBlocks)
	TFI.emitEpilogue(Fn, *RestoreBlock);			TFI.emitEpilogue(Fn, *RestoreBlock);

	▲ Show 20 Lines • Show All 165 Lines • Show Last 20 Lines

lib/CodeGen/TailDuplicator.cpp

Show First 20 Lines • Show All 579 Lines • ▼ Show 20 Lines	if (HasIndirectbr && PreRegAlloc)
MaxDuplicateCount = TailDupIndirectBranchSize;		MaxDuplicateCount = TailDupIndirectBranchSize;

// Check the instructions in the block to determine whether tail-duplication		// Check the instructions in the block to determine whether tail-duplication
// is invalid or unlikely to be profitable.		// is invalid or unlikely to be profitable.
unsigned InstrCount = 0;		unsigned InstrCount = 0;
for (MachineInstr &MI : TailBB) {		for (MachineInstr &MI : TailBB) {
// Non-duplicable things shouldn't be tail-duplicated.		// Non-duplicable things shouldn't be tail-duplicated.
if (MI.isNotDuplicable())		if (MI.isNotDuplicable())
return false;		return false;
		iteratee [Not Done]: This tells me that marking CFI Instructions as not duplicable is wrong for all platforms.

// Convergent instructions can be duplicated only if doing so doesn't add		// Convergent instructions can be duplicated only if doing so doesn't add
// new control dependencies, which is what we're going to do here.		// new control dependencies, which is what we're going to do here.
if (MI.isConvergent())		if (MI.isConvergent())
return false;		return false;

// Do not duplicate 'return' instructions if this is a pre-regalloc run.		// Do not duplicate 'return' instructions if this is a pre-regalloc run.
// A return may expand into a lot more instructions (e.g. reload of callee		// A return may expand into a lot more instructions (e.g. reload of callee
// saved registers) after PEI.		// saved registers) after PEI.
if (PreRegAlloc && MI.isReturn())		if (PreRegAlloc && MI.isReturn())
return false;		return false;

// Avoid duplicating calls before register allocation. Calls presents a		// Avoid duplicating calls before register allocation. Calls presents a
// barrier to register allocation so duplicating them may end up increasing		// barrier to register allocation so duplicating them may end up increasing
// spills.		// spills.
if (PreRegAlloc && MI.isCall())		if (PreRegAlloc && MI.isCall())
return false;		return false;

if (!MI.isPHI() && !MI.isDebugValue())		if (!MI.isPHI() && !MI.isDirective())
InstrCount += 1;		InstrCount += 1;
		iteratee [Not Done]: And this tells me that either it needs to be marked as a debug instruction, or we need a broader category for assembler directives that are not instructions.
		iteratee [Done]: Can you create a function on MI: isDirective() and have it return true for debugvalue and CFIInstruction?

if (InstrCount > MaxDuplicateCount)		if (InstrCount > MaxDuplicateCount)
return false;		return false;
}		}

// Check if any of the successors of TailBB has a PHI node in which the		// Check if any of the successors of TailBB has a PHI node in which the
// value corresponding to TailBB uses a subregister.		// value corresponding to TailBB uses a subregister.
// If a phi node uses a register paired with a subregister, the actual		// If a phi node uses a register paired with a subregister, the actual
▲ Show 20 Lines • Show All 235 Lines • ▼ Show 20 Lines	for (MachineBasicBlock *PredBB : Preds) {

// Update the CFG.		// Update the CFG.
PredBB->removeSuccessor(PredBB->succ_begin());		PredBB->removeSuccessor(PredBB->succ_begin());
assert(PredBB->succ_empty() &&		assert(PredBB->succ_empty() &&
"TailDuplicate called on block with multiple successors!");		"TailDuplicate called on block with multiple successors!");
for (MachineBasicBlock *Succ : TailBB->successors())		for (MachineBasicBlock *Succ : TailBB->successors())
PredBB->addSuccessor(Succ, MBPI->getEdgeProbability(TailBB, Succ));		PredBB->addSuccessor(Succ, MBPI->getEdgeProbability(TailBB, Succ));

		// Update the CFI info for PredBB.
		PredBB->mergeCFIInfo(TailBB);

Changed = true;		Changed = true;
++NumTailDups;		++NumTailDups;
}		}

// If TailBB was duplicated into all its predecessors except for the prior		// If TailBB was duplicated into all its predecessors except for the prior
// block, which falls through unconditionally, move the contents of this		// block, which falls through unconditionally, move the contents of this
// block into the prior block.		// block into the prior block.
MachineBasicBlock *PrevBB = ForcedLayoutPred;		MachineBasicBlock *PrevBB = ForcedLayoutPred;
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	if (PreRegAlloc) {
// No PHIs to worry about, just splice the instructions over.		// No PHIs to worry about, just splice the instructions over.
PrevBB->splice(PrevBB->end(), TailBB, TailBB->begin(), TailBB->end());		PrevBB->splice(PrevBB->end(), TailBB, TailBB->begin(), TailBB->end());
}		}
PrevBB->removeSuccessor(PrevBB->succ_begin());		PrevBB->removeSuccessor(PrevBB->succ_begin());
assert(PrevBB->succ_empty());		assert(PrevBB->succ_empty());
PrevBB->transferSuccessors(TailBB);		PrevBB->transferSuccessors(TailBB);
TDBBs.push_back(PrevBB);		TDBBs.push_back(PrevBB);
Changed = true;		Changed = true;

		// Update the CFI info for PrevBB.
		iteratee [Done]: It would be nice to not have this in the pass, but rather as an abstraction: PrevBB->mergeCFIInfo(TailBB);
		PrevBB->mergeCFIInfo(TailBB);
}		}

// If this is after register allocation, there are no phis to fix.		// If this is after register allocation, there are no phis to fix.
if (!PreRegAlloc)		if (!PreRegAlloc)
return Changed;		return Changed;

// If we made no changes so far, we are safe.		// If we made no changes so far, we are safe.
if (!Changed)		if (!Changed)
▲ Show 20 Lines • Show All 72 Lines • Show Last 20 Lines

lib/CodeGen/TargetPassConfig.cpp

Show First 20 Lines • Show All 763 Lines • ▼ Show 20 Lines	if (addGCPasses()) {
if (PrintGCInfo)		if (PrintGCInfo)
addPass(createGCInfoPrinter(dbgs()), false, false);		addPass(createGCInfoPrinter(dbgs()), false, false);
}		}

// Basic block placement.		// Basic block placement.
if (getOptLevel() != CodeGenOpt::None)		if (getOptLevel() != CodeGenOpt::None)
addBlockPlacement();		addBlockPlacement();

		// Verify basic block incoming and outgoing cfa offset and register values.
		addPass(createCFIInfoVerifier());

addPreEmitPass();		addPreEmitPass();

		// Correct CFA calculation rule where needed by inserting appropriate CFI
		// instructions.
		addPass(createCFIInstrInserter(), false);

if (TM->Options.EnableIPRA)		if (TM->Options.EnableIPRA)
// Collect register usage information and produce a register mask of		// Collect register usage information and produce a register mask of
// clobbered registers, to be used to optimize call sites.		// clobbered registers, to be used to optimize call sites.
addPass(createRegUsageInfoCollector());		addPass(createRegUsageInfoCollector());

addPass(&FuncletLayoutID, false);		addPass(&FuncletLayoutID, false);

addPass(&StackMapLivenessID, false);		addPass(&StackMapLivenessID, false);
▲ Show 20 Lines • Show All 251 Lines • Show Last 20 Lines

lib/Target/X86/X86CallFrameOptimization.cpp

Show First 20 Lines • Show All 228 Lines • ▼ Show 20 Lines	bool X86CallFrameOptimization::runOnMachineFunction(MachineFunction &MF) {
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();

const X86RegisterInfo &RegInfo =		const X86RegisterInfo &RegInfo =
*static_cast<const X86RegisterInfo *>(STI->getRegisterInfo());		*static_cast<const X86RegisterInfo *>(STI->getRegisterInfo());
SlotSize = RegInfo.getSlotSize();		SlotSize = RegInfo.getSlotSize();
assert(isPowerOf2_32(SlotSize) && "Expect power of 2 stack slot size");		assert(isPowerOf2_32(SlotSize) && "Expect power of 2 stack slot size");
Log2SlotSize = Log2_32(SlotSize);		Log2SlotSize = Log2_32(SlotSize);

		// Set initial incoming and outgoing cfa offset and register values for basic
		// blocks. This is done here because this pass runs before PEI and can insert
		// CFI instructions.
		// TODO: Find a better solution to this problem.
		TFL->initializeCFIInfo(MF);

if (skipFunction(*MF.getFunction()) || !isLegal(MF))		if (skipFunction(*MF.getFunction()) || !isLegal(MF))
return false;		return false;

unsigned FrameSetupOpcode = TII->getCallFrameSetupOpcode();		unsigned FrameSetupOpcode = TII->getCallFrameSetupOpcode();

bool Changed = false;		bool Changed = false;

ContextVector CallSeqVector;		ContextVector CallSeqVector;
▲ Show 20 Lines • Show All 286 Lines • ▼ Show 20 Lines	case X86::MOV64mr: {
}		}
break;		break;
}		}
}		}

// For debugging, when using SP-based CFA, we need to adjust the CFA		// For debugging, when using SP-based CFA, we need to adjust the CFA
// offset after each push.		// offset after each push.
// TODO: This is needed only if we require precise CFA.		// TODO: This is needed only if we require precise CFA.
if (!TFL->hasFP(MF))		if (!TFL->hasFP(MF)) {
TFL->BuildCFI(		TFL->BuildCFI(MBB, std::next(Push), DL,
MBB, std::next(Push), DL,
MCCFIInstruction::createAdjustCfaOffset(nullptr, SlotSize));		MCCFIInstruction::createAdjustCfaOffset(nullptr, SlotSize));
		// Update the CFI information for MBB and its successors.
		MBB.updateCFIInfo(std::next(Push));
		MBB.updateCFIInfoSucc();
		}
MBB.erase(MOV);		MBB.erase(MOV);
}		}

// The stack-pointer copy is no longer used in the call sequences.		// The stack-pointer copy is no longer used in the call sequences.
// There should not be any other users, but we can't commit to that, so:		// There should not be any other users, but we can't commit to that, so:
if (Context.SPCopy && MRI->use_empty(Context.SPCopy->getOperand(0).getReg()))		if (Context.SPCopy && MRI->use_empty(Context.SPCopy->getOperand(0).getReg()))
Context.SPCopy->eraseFromParent();		Context.SPCopy->eraseFromParent();

▲ Show 20 Lines • Show All 46 Lines • Show Last 20 Lines

lib/Target/X86/X86FrameLowering.h

Show First 20 Lines • Show All 171 Lines • ▼ Show 20 Lines	public:

/// Sets up EBP and optionally ESI based on the incoming EBP value. Only		/// Sets up EBP and optionally ESI based on the incoming EBP value. Only
/// needed for 32-bit. Used in funclet prologues and at catchret destinations.		/// needed for 32-bit. Used in funclet prologues and at catchret destinations.
MachineBasicBlock::iterator		MachineBasicBlock::iterator
restoreWin32EHStackPointers(MachineBasicBlock &MBB,		restoreWin32EHStackPointers(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI,		MachineBasicBlock::iterator MBBI,
const DebugLoc &DL, bool RestoreSP = false) const;		const DebugLoc &DL, bool RestoreSP = false) const;

		void initializeCFIInfo(MachineFunction &MF) const override;

private:		private:
uint64_t calculateMaxStackAlign(const MachineFunction &MF) const;		uint64_t calculateMaxStackAlign(const MachineFunction &MF) const;

/// Emit target stack probe as a call to a helper function		/// Emit target stack probe as a call to a helper function
void emitStackProbeCall(MachineFunction &MF, MachineBasicBlock &MBB,		void emitStackProbeCall(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, const DebugLoc &DL,		MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
bool InProlog) const;		bool InProlog) const;

Show All 34 Lines

lib/Target/X86/X86FrameLowering.cpp

Show First 20 Lines • Show All 952 Lines • ▼ Show 20 Lines	void X86FrameLowering::emitPrologue(MachineFunction &MF,
bool NeedsDwarfCFI =		bool NeedsDwarfCFI =
!IsWin64Prologue && (MMI.hasDebugInfo() || Fn->needsUnwindTableEntry());		!IsWin64Prologue && (MMI.hasDebugInfo() || Fn->needsUnwindTableEntry());
unsigned FramePtr = TRI->getFrameRegister(MF);		unsigned FramePtr = TRI->getFrameRegister(MF);
const unsigned MachineFramePtr =		const unsigned MachineFramePtr =
STI.isTarget64BitILP32()		STI.isTarget64BitILP32()
? getX86SubSuperRegister(FramePtr, 64) : FramePtr;		? getX86SubSuperRegister(FramePtr, 64) : FramePtr;
unsigned BasePtr = TRI->getBaseRegister();		unsigned BasePtr = TRI->getBaseRegister();
bool HasWinCFI = false;		bool HasWinCFI = false;
		bool InsertedCFI = false;

// Debug location must be unknown since the first debug location is used		// Debug location must be unknown since the first debug location is used
// to determine the end of the prologue.		// to determine the end of the prologue.
DebugLoc DL;		DebugLoc DL;

// Add RETADDR move area to callee saved frame size.		// Add RETADDR move area to callee saved frame size.
int TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();		int TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();
if (TailCallReturnAddrDelta && IsWin64Prologue)		if (TailCallReturnAddrDelta && IsWin64Prologue)
report_fatal_error("Can't handle guaranteed tail call under win64 yet");		report_fatal_error("Can't handle guaranteed tail call under win64 yet");
▲ Show 20 Lines • Show All 118 Lines • ▼ Show 20 Lines	BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::PUSH64r : X86::PUSH32r))
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);

if (NeedsDwarfCFI) {		if (NeedsDwarfCFI) {
// Mark the place where EBP/RBP was saved.		// Mark the place where EBP/RBP was saved.
// Define the current CFA rule to use the provided offset.		// Define the current CFA rule to use the provided offset.
assert(StackSize);		assert(StackSize);
BuildCFI(MBB, MBBI, DL,		BuildCFI(MBB, MBBI, DL,
MCCFIInstruction::createDefCfaOffset(nullptr, 2 * stackGrowth));		MCCFIInstruction::createDefCfaOffset(nullptr, 2 * stackGrowth));
		MBB.setDefOffset(true);
		MBB.updateCFIInfo(std::prev(MBBI));
		InsertedCFI = true;

// Change the rule for the FramePtr to be an "offset" rule.		// Change the rule for the FramePtr to be an "offset" rule.
unsigned DwarfFramePtr = TRI->getDwarfRegNum(MachineFramePtr, true);		unsigned DwarfFramePtr = TRI->getDwarfRegNum(MachineFramePtr, true);
BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createOffset(		BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createOffset(
nullptr, DwarfFramePtr, 2 * stackGrowth));		nullptr, DwarfFramePtr, 2 * stackGrowth));
}		}

if (NeedsWinCFI) {		if (NeedsWinCFI) {
Show All 12 Lines	if (!IsWin64Prologue && !IsFunclet) {
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);

if (NeedsDwarfCFI) {		if (NeedsDwarfCFI) {
// Mark effective beginning of when frame pointer becomes valid.		// Mark effective beginning of when frame pointer becomes valid.
// Define the current CFA to use the EBP/RBP register.		// Define the current CFA to use the EBP/RBP register.
unsigned DwarfFramePtr = TRI->getDwarfRegNum(MachineFramePtr, true);		unsigned DwarfFramePtr = TRI->getDwarfRegNum(MachineFramePtr, true);
BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaRegister(		BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaRegister(
nullptr, DwarfFramePtr));		nullptr, DwarfFramePtr));
		MBB.setDefRegister(true);
		MBB.updateCFIInfo(std::prev(MBBI));
		InsertedCFI = true;
}		}
}		}
} else {		} else {
assert(!IsFunclet && "funclets without FPs not yet implemented");		assert(!IsFunclet && "funclets without FPs not yet implemented");
NumBytes = StackSize - X86FI->getCalleeSavedFrameSize();		NumBytes = StackSize - X86FI->getCalleeSavedFrameSize();
}		}

// For EH funclets, only allocate enough space for outgoing calls. Save the		// For EH funclets, only allocate enough space for outgoing calls. Save the
Show All 15 Lines	while (MBBI != MBB.end() &&
++MBBI;		++MBBI;

if (!HasFP && NeedsDwarfCFI) {		if (!HasFP && NeedsDwarfCFI) {
// Mark callee-saved push instruction.		// Mark callee-saved push instruction.
// Define the current CFA rule to use the provided offset.		// Define the current CFA rule to use the provided offset.
assert(StackSize);		assert(StackSize);
BuildCFI(MBB, MBBI, DL,		BuildCFI(MBB, MBBI, DL,
MCCFIInstruction::createDefCfaOffset(nullptr, StackOffset));		MCCFIInstruction::createDefCfaOffset(nullptr, StackOffset));
		MBB.setDefOffset(true);
		MBB.updateCFIInfo(std::prev(MBBI));
		InsertedCFI = true;
StackOffset += stackGrowth;		StackOffset += stackGrowth;
}		}

if (NeedsWinCFI) {		if (NeedsWinCFI) {
HasWinCFI = true;		HasWinCFI = true;
BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg)).addImm(Reg).setMIFlag(		BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg)).addImm(Reg).setMIFlag(
MachineInstr::FrameSetup);		MachineInstr::FrameSetup);
}		}
▲ Show 20 Lines • Show All 249 Lines • ▼ Show 20 Lines	void X86FrameLowering::emitPrologue(MachineFunction &MF,

if (((!HasFP && NumBytes) || PushedRegs) && NeedsDwarfCFI) {		if (((!HasFP && NumBytes) || PushedRegs) && NeedsDwarfCFI) {
// Mark end of stack pointer adjustment.		// Mark end of stack pointer adjustment.
if (!HasFP && NumBytes) {		if (!HasFP && NumBytes) {
// Define the current CFA rule to use the provided offset.		// Define the current CFA rule to use the provided offset.
assert(StackSize);		assert(StackSize);
BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaOffset(		BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaOffset(
nullptr, -StackSize + stackGrowth));		nullptr, -StackSize + stackGrowth));
		MBB.setDefOffset(true);
		MBB.updateCFIInfo(std::prev(MBBI));
		InsertedCFI = true;
}		}

// Emit DWARF info specifying the offsets of the callee-saved registers.		// Emit DWARF info specifying the offsets of the callee-saved registers.
if (PushedRegs)		if (PushedRegs)
emitCalleeSavedFrameMoves(MBB, MBBI, DL);		emitCalleeSavedFrameMoves(MBB, MBBI, DL);
}		}

// X86 Interrupt handling function cannot assume anything about the direction		// X86 Interrupt handling function cannot assume anything about the direction
// flag (DF in EFLAGS register). Clear this flag by creating "cld" instruction		// flag (DF in EFLAGS register). Clear this flag by creating "cld" instruction
// in each prologue of interrupt handler function.		// in each prologue of interrupt handler function.
//		//
// FIXME: Create "cld" instruction only in these cases:		// FIXME: Create "cld" instruction only in these cases:
// 1. The interrupt handling function uses any of the "rep" instructions.		// 1. The interrupt handling function uses any of the "rep" instructions.
// 2. Interrupt handling function calls another function.		// 2. Interrupt handling function calls another function.
//		//
if (Fn->getCallingConv() == CallingConv::X86_INTR)		if (Fn->getCallingConv() == CallingConv::X86_INTR)
BuildMI(MBB, MBBI, DL, TII.get(X86::CLD))		BuildMI(MBB, MBBI, DL, TII.get(X86::CLD))
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);

// At this point we know if the function has WinCFI or not.		// At this point we know if the function has WinCFI or not.
MF.setHasWinCFI(HasWinCFI);		MF.setHasWinCFI(HasWinCFI);

		if (InsertedCFI)
		MBB.updateCFIInfoSucc();
}		}

bool X86FrameLowering::canUseLEAForSPInEpilogue(		bool X86FrameLowering::canUseLEAForSPInEpilogue(
const MachineFunction &MF) const {		const MachineFunction &MF) const {
// We can't use LEA instructions for adjusting the stack pointer if we don't		// We can't use LEA instructions for adjusting the stack pointer if we don't
// have a frame pointer in the Win64 ABI. Only ADD instructions may be used		// have a frame pointer in the Win64 ABI. Only ADD instructions may be used
// to deallocate the stack.		// to deallocate the stack.
// This means that we can use LEA for SP in two situations:		// This means that we can use LEA for SP in two situations:
▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	void X86FrameLowering::emitEpilogue(MachineFunction &MF,
MachineBasicBlock *TargetMBB = nullptr;		MachineBasicBlock *TargetMBB = nullptr;

// Get the number of bytes to allocate from the FrameInfo.		// Get the number of bytes to allocate from the FrameInfo.
uint64_t StackSize = MFI.getStackSize();		uint64_t StackSize = MFI.getStackSize();
uint64_t MaxAlign = calculateMaxStackAlign(MF);		uint64_t MaxAlign = calculateMaxStackAlign(MF);
unsigned CSSize = X86FI->getCalleeSavedFrameSize();		unsigned CSSize = X86FI->getCalleeSavedFrameSize();
uint64_t NumBytes = 0;		uint64_t NumBytes = 0;

		bool NeedsDwarfCFI = (MF.getMMI().hasDebugInfo() ||
		MF.getFunction()->needsUnwindTableEntry()) &&
		(!MF.getSubtarget<X86Subtarget>().isTargetDarwin() &&
		!MF.getSubtarget<X86Subtarget>().isOSWindows());
		bool InsertedCFI = false;

if (RetOpcode && *RetOpcode == X86::CATCHRET) {		if (RetOpcode && *RetOpcode == X86::CATCHRET) {
// SEH shouldn't use catchret.		// SEH shouldn't use catchret.
assert(!isAsynchronousEHPersonality(		assert(!isAsynchronousEHPersonality(
classifyEHPersonality(MF.getFunction()->getPersonalityFn())) &&		classifyEHPersonality(MF.getFunction()->getPersonalityFn())) &&
"SEH should not use CATCHRET");		"SEH should not use CATCHRET");

NumBytes = getWinEHFuncletFrameSize(MF);		NumBytes = getWinEHFuncletFrameSize(MF);
assert(hasFP(MF) && "EH funclets without FP not yet implemented");		assert(hasFP(MF) && "EH funclets without FP not yet implemented");
Show All 18 Lines	if (RetOpcode && *RetOpcode == X86::CATCHRET) {
// realigned.		// realigned.
if (TRI->needsStackRealignment(MF) && !IsWin64Prologue)		if (TRI->needsStackRealignment(MF) && !IsWin64Prologue)
NumBytes = alignTo(FrameSize, MaxAlign);		NumBytes = alignTo(FrameSize, MaxAlign);

// Pop EBP.		// Pop EBP.
BuildMI(MBB, MBBI, DL,		BuildMI(MBB, MBBI, DL,
TII.get(Is64Bit ? X86::POP64r : X86::POP32r), MachineFramePtr)		TII.get(Is64Bit ? X86::POP64r : X86::POP32r), MachineFramePtr)
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
		if (NeedsDwarfCFI) {
		unsigned DwarfStackPtr =
		TRI->getDwarfRegNum(Is64Bit ? X86::RSP : X86::ESP, true);
		BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfa(
		nullptr, DwarfStackPtr, -SlotSize));
		--MBBI;
		MBB.setDefOffset(true);
		MBB.setDefRegister(true);
		MBB.updateCFIInfo(MBBI);
		InsertedCFI = true;
		}
} else {		} else {
NumBytes = StackSize - CSSize;		NumBytes = StackSize - CSSize;
}		}
uint64_t SEHStackAllocAmt = NumBytes;		uint64_t SEHStackAllocAmt = NumBytes;

MachineBasicBlock::iterator FirstCSPop = MBBI;		MachineBasicBlock::iterator FirstCSPop = MBBI;
// Skip the callee-saved pop instructions.		// Skip the callee-saved pop instructions.
while (MBBI != MBB.begin()) {		while (MBBI != MBB.begin()) {
▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	if (LEAAmount != 0) {
unsigned Opc = (Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr);		unsigned Opc = (Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr);
BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)		BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)
.addReg(FramePtr);		.addReg(FramePtr);
--MBBI;		--MBBI;
}		}
} else if (NumBytes) {		} else if (NumBytes) {
// Adjust stack pointer back: ESP += numbytes.		// Adjust stack pointer back: ESP += numbytes.
emitSPUpdate(MBB, MBBI, NumBytes, /*InEpilogue=*/true);		emitSPUpdate(MBB, MBBI, NumBytes, /*InEpilogue=*/true);
		if (!hasFP(MF) && NeedsDwarfCFI) {
		// Define the current CFA rule to use the provided offset.
		BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaOffset(
		nullptr, -CSSize - SlotSize));
		MBB.setDefOffset(true);
		MBB.updateCFIInfo(std::prev(MBBI));
		InsertedCFI = true;
		}
--MBBI;		--MBBI;
}		}

// Windows unwinder will not invoke function's exception handler if IP is		// Windows unwinder will not invoke function's exception handler if IP is
// either in prologue or in epilogue. This behavior causes a problem when a		// either in prologue or in epilogue. This behavior causes a problem when a
// call immediately precedes an epilogue, because the return address points		// call immediately precedes an epilogue, because the return address points
// into the epilogue. To cope with that, we insert an epilogue marker here,		// into the epilogue. To cope with that, we insert an epilogue marker here,
// then replace it with a 'nop' if it ends up immediately after a CALL in the		// then replace it with a 'nop' if it ends up immediately after a CALL in the
// final emitted code.		// final emitted code.
if (NeedsWinCFI && MF.hasWinCFI())		if (NeedsWinCFI && MF.hasWinCFI())
BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_Epilogue));		BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_Epilogue));

		if (!hasFP(MF) && NeedsDwarfCFI) {
		MBBI = FirstCSPop;
		int64_t Offset = -CSSize - SlotSize;
		// Mark callee-saved pop instruction.
		// Define the current CFA rule to use the provided offset.
		while (MBBI != MBB.end()) {
		MachineBasicBlock::iterator PI = MBBI;
		unsigned Opc = PI->getOpcode();
		++MBBI;
		if (Opc == X86::POP32r || Opc == X86::POP64r) {
		Offset += SlotSize;
		BuildCFI(MBB, MBBI, DL,
		MCCFIInstruction::createDefCfaOffset(nullptr, Offset));
		MBB.setDefOffset(true);
		MBB.updateCFIInfo(std::prev(MBBI));
		InsertedCFI = true;
		}
		}
		}

if (!RetOpcode || !isTailCallOpcode(*RetOpcode)) {		if (!RetOpcode || !isTailCallOpcode(*RetOpcode)) {
// Add the return addr area delta back since we are not tail calling.		// Add the return addr area delta back since we are not tail calling.
int Offset = -1 * X86FI->getTCReturnAddrDelta();		int Offset = -1 * X86FI->getTCReturnAddrDelta();
assert(Offset >= 0 && "TCDelta should never be positive");		assert(Offset >= 0 && "TCDelta should never be positive");
if (Offset) {		if (Offset) {
MBBI = MBB.getFirstTerminator();		MBBI = MBB.getFirstTerminator();

// Check for possible merge with preceding ADD instruction.		// Check for possible merge with preceding ADD instruction.
Offset += mergeSPUpdates(MBB, MBBI, true);		Offset += mergeSPUpdates(MBB, MBBI, true);
emitSPUpdate(MBB, MBBI, Offset, /*InEpilogue=*/true);		emitSPUpdate(MBB, MBBI, Offset, /*InEpilogue=*/true);
}		}
}		}

		if (InsertedCFI)
		MBB.updateCFIInfoSucc();
}		}

int X86FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,		int X86FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
unsigned &FrameReg) const {		unsigned &FrameReg) const {
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();

bool IsFixed = MFI.isFixedObjectIndex(FI);		bool IsFixed = MFI.isFixedObjectIndex(FI);
// We can't calculate offset from frame pointer if the stack is realigned,		// We can't calculate offset from frame pointer if the stack is realigned,
▲ Show 20 Lines • Show All 658 Lines • ▼ Show 20 Lines	void X86FrameLowering::adjustForSegmentedStacks(
else		else
BuildMI(allocMBB, DL, TII.get(X86::MORESTACK_RET));		BuildMI(allocMBB, DL, TII.get(X86::MORESTACK_RET));

allocMBB->addSuccessor(&PrologueMBB);		allocMBB->addSuccessor(&PrologueMBB);

checkMBB->addSuccessor(allocMBB);		checkMBB->addSuccessor(allocMBB);
checkMBB->addSuccessor(&PrologueMBB);		checkMBB->addSuccessor(&PrologueMBB);

		int InitialOffset = TRI->getSlotSize();
		unsigned InitialRegister = TRI->getDwarfRegNum(StackPtr, true);
		// Set CFI info for checkMBB.
		checkMBB->setIncomingCFAOffset(InitialOffset);
		checkMBB->setIncomingCFARegister(InitialRegister);
		checkMBB->setOutgoingCFAOffset(InitialOffset);
		checkMBB->setOutgoingCFARegister(InitialRegister);
		// Set CFI info for allocMBB.
		allocMBB->setIncomingCFAOffset(InitialOffset);
		allocMBB->setIncomingCFARegister(InitialRegister);
		allocMBB->setOutgoingCFAOffset(InitialOffset);
		allocMBB->setOutgoingCFARegister(InitialRegister);

#ifdef EXPENSIVE_CHECKS		#ifdef EXPENSIVE_CHECKS
MF.verify();		MF.verify();
#endif		#endif
}		}

/// Lookup an ERTS parameter in the !hipe.literals named metadata node.		/// Lookup an ERTS parameter in the !hipe.literals named metadata node.
/// HiPE provides Erlang Runtime System-internal parameters, such as PCB offsets		/// HiPE provides Erlang Runtime System-internal parameters, such as PCB offsets
/// to fields it needs, through a named metadata node "hipe.literals" containing		/// to fields it needs, through a named metadata node "hipe.literals" containing
▲ Show 20 Lines • Show All 155 Lines • ▼ Show 20 Lines	if (MaxStack > Guaranteed) {
addRegOffset(BuildMI(incStackMBB, DL, TII.get(CMPop))		addRegOffset(BuildMI(incStackMBB, DL, TII.get(CMPop))
.addReg(ScratchReg), PReg, false, SPLimitOffset);		.addReg(ScratchReg), PReg, false, SPLimitOffset);
BuildMI(incStackMBB, DL, TII.get(X86::JLE_1)).addMBB(incStackMBB);		BuildMI(incStackMBB, DL, TII.get(X86::JLE_1)).addMBB(incStackMBB);

stackCheckMBB->addSuccessor(&PrologueMBB, {99, 100});		stackCheckMBB->addSuccessor(&PrologueMBB, {99, 100});
stackCheckMBB->addSuccessor(incStackMBB, {1, 100});		stackCheckMBB->addSuccessor(incStackMBB, {1, 100});
incStackMBB->addSuccessor(&PrologueMBB, {99, 100});		incStackMBB->addSuccessor(&PrologueMBB, {99, 100});
incStackMBB->addSuccessor(incStackMBB, {1, 100});		incStackMBB->addSuccessor(incStackMBB, {1, 100});

		int InitialOffset = TRI->getSlotSize();
		unsigned InitialRegister = TRI->getDwarfRegNum(StackPtr, true);
		// Set CFI info to stackCheckMBB.
		stackCheckMBB->setIncomingCFAOffset(InitialOffset);
		stackCheckMBB->setIncomingCFARegister(InitialRegister);
		stackCheckMBB->setOutgoingCFAOffset(InitialOffset);
		stackCheckMBB->setOutgoingCFARegister(InitialRegister);
		// Set CFI info to incStackMBB.
		incStackMBB->setIncomingCFAOffset(InitialOffset);
		incStackMBB->setIncomingCFARegister(InitialRegister);
		incStackMBB->setOutgoingCFAOffset(InitialOffset);
		incStackMBB->setOutgoingCFARegister(InitialRegister);
}		}
#ifdef EXPENSIVE_CHECKS		#ifdef EXPENSIVE_CHECKS
MF.verify();		MF.verify();
#endif		#endif
}		}

bool X86FrameLowering::adjustStackWithPops(MachineBasicBlock &MBB,		bool X86FrameLowering::adjustStackWithPops(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI,		MachineBasicBlock::iterator MBBI,
▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	if (!reserveCallFrame) {
unsigned StackAlign = getStackAlignment();		unsigned StackAlign = getStackAlignment();
Amount = alignTo(Amount, StackAlign);		Amount = alignTo(Amount, StackAlign);

MachineModuleInfo &MMI = MF.getMMI();		MachineModuleInfo &MMI = MF.getMMI();
const Function *Fn = MF.getFunction();		const Function *Fn = MF.getFunction();
bool WindowsCFI = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();		bool WindowsCFI = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
bool DwarfCFI = !WindowsCFI &&		bool DwarfCFI = !WindowsCFI &&
(MMI.hasDebugInfo() || Fn->needsUnwindTableEntry());		(MMI.hasDebugInfo() || Fn->needsUnwindTableEntry());
		bool InsertedCFI = false;

// If we have any exception handlers in this function, and we adjust		// If we have any exception handlers in this function, and we adjust
// the SP before calls, we may need to indicate this to the unwinder		// the SP before calls, we may need to indicate this to the unwinder
// using GNU_ARGS_SIZE. Note that this may be necessary even when		// using GNU_ARGS_SIZE. Note that this may be necessary even when
// Amount == 0, because the preceding function may have set a non-0		// Amount == 0, because the preceding function may have set a non-0
// GNU_ARGS_SIZE.		// GNU_ARGS_SIZE.
// TODO: We don't need to reset this between subsequent functions,		// TODO: We don't need to reset this between subsequent functions,
// if it didn't change.		// if it didn't change.
Show All 9 Lines	if (!reserveCallFrame) {

// Factor out the amount that gets handled inside the sequence		// Factor out the amount that gets handled inside the sequence
// (Pushes of argument for frame setup, callee pops for frame destroy)		// (Pushes of argument for frame setup, callee pops for frame destroy)
Amount -= InternalAmt;		Amount -= InternalAmt;

// TODO: This is needed only if we require precise CFA.		// TODO: This is needed only if we require precise CFA.
// If this is a callee-pop calling convention, emit a CFA adjust for		// If this is a callee-pop calling convention, emit a CFA adjust for
// the amount the callee popped.		// the amount the callee popped.
if (isDestroy && InternalAmt && DwarfCFI && !hasFP(MF))		if (isDestroy && InternalAmt && DwarfCFI && !hasFP(MF)) {
BuildCFI(MBB, InsertPos, DL,		BuildCFI(MBB, InsertPos, DL,
MCCFIInstruction::createAdjustCfaOffset(nullptr, -InternalAmt));		MCCFIInstruction::createAdjustCfaOffset(nullptr, -InternalAmt));
		MBB.updateCFIInfo(std::prev(InsertPos));
		InsertedCFI = true;
		}
// Add Amount to SP to destroy a frame, or subtract to setup.		// Add Amount to SP to destroy a frame, or subtract to setup.
int64_t StackAdjustment = isDestroy ? Amount : -Amount;		int64_t StackAdjustment = isDestroy ? Amount : -Amount;
int64_t CfaAdjustment = -StackAdjustment;		int64_t CfaAdjustment = -StackAdjustment;

if (StackAdjustment) {		if (StackAdjustment) {
// Merge with any previous or following adjustment instruction. Note: the		// Merge with any previous or following adjustment instruction. Note: the
// instructions merged with here do not have CFI, so their stack		// instructions merged with here do not have CFI, so their stack
// adjustments do not feed into CfaAdjustment.		// adjustments do not feed into CfaAdjustment.
Show All 17 Lines	if (DwarfCFI && !hasFP(MF)) {
// it to be more precise.		// it to be more precise.

// TODO: When not using precise CFA, we also need to adjust for the		// TODO: When not using precise CFA, we also need to adjust for the
// InternalAmt here.		// InternalAmt here.
if (CfaAdjustment) {		if (CfaAdjustment) {
BuildCFI(MBB, InsertPos, DL,		BuildCFI(MBB, InsertPos, DL,
MCCFIInstruction::createAdjustCfaOffset(nullptr,		MCCFIInstruction::createAdjustCfaOffset(nullptr,
CfaAdjustment));		CfaAdjustment));
		MBB.updateCFIInfo(std::prev(InsertPos));
		InsertedCFI = true;
}		}
}		}

		if (InsertedCFI) MBB.updateCFIInfoSucc();

return I;		return I;
}		}

if (isDestroy && InternalAmt) {		if (isDestroy && InternalAmt) {
// If we are performing frame pointer elimination and if the callee pops		// If we are performing frame pointer elimination and if the callee pops
// something off the stack pointer, add it back. We do this until we have		// something off the stack pointer, add it back. We do this until we have
// more advanced stack pointer tracking ability.		// more advanced stack pointer tracking ability.
// We are not tracking the stack pointer adjustment by the callee, so make		// We are not tracking the stack pointer adjustment by the callee, so make
▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32rm), FramePtr),
UsedReg, true, Offset)		UsedReg, true, Offset)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
} else {		} else {
llvm_unreachable("32-bit frames with WinEH must use FramePtr or BasePtr");		llvm_unreachable("32-bit frames with WinEH must use FramePtr or BasePtr");
}		}
return MBBI;		return MBBI;
}		}

		void X86FrameLowering::initializeCFIInfo(MachineFunction &MF) const {
		int InitialOffset = TRI->getSlotSize();
		unsigned InitialRegister = TRI->getDwarfRegNum(StackPtr, true);
		// Initialize CFI info if it hasn't already been initialized.
		for (auto &MBB : MF) {
		if (MBB.getIncomingCFAOffset() == -1)
		MBB.setIncomingCFAOffset(InitialOffset);
		if (MBB.getOutgoingCFAOffset() == -1)
		MBB.setOutgoingCFAOffset(InitialOffset);
		if (MBB.getIncomingCFARegister() == 0)
		MBB.setIncomingCFARegister(InitialRegister);
		if (MBB.getOutgoingCFARegister() == 0)
		MBB.setOutgoingCFARegister(InitialRegister);
		}
		}

namespace {		namespace {
// Struct used by orderFrameObjects to help sort the stack objects.		// Struct used by orderFrameObjects to help sort the stack objects.
struct X86FrameSortingObject {		struct X86FrameSortingObject {
bool IsValid = false; // true if we care about this Object.		bool IsValid = false; // true if we care about this Object.
unsigned ObjectIndex = 0; // Index of Object into MFI list.		unsigned ObjectIndex = 0; // Index of Object into MFI list.
unsigned ObjectSize = 0; // Size of Object in bytes.		unsigned ObjectSize = 0; // Size of Object in bytes.
unsigned ObjectAlignment = 1; // Alignment of Object in bytes.		unsigned ObjectAlignment = 1; // Alignment of Object in bytes.
unsigned ObjectNumUses = 0; // Object static number of uses.		unsigned ObjectNumUses = 0; // Object static number of uses.
▲ Show 20 Lines • Show All 204 Lines • Show Last 20 Lines
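
For the no-frame-pointer path in `emitEpilogue`, the CFA bookkeeping can be followed with a small model. After the stack pointer adjustment only the callee-saved slots and the return address remain, and each callee-saved pop then shrinks the offset by one slot. The patch passes negated values (for example `-CSSize - SlotSize`) to `createDefCfaOffset`; the sketch uses the positive values that appear in the `.cfi_def_cfa_offset` directives of the updated tests. `epilogueCfaOffsets` and its parameters are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model (not LLVM API): returns the sequence of
// .cfi_def_cfa_offset values expected in an epilogue without a frame pointer.
//   SlotSize            4 on 32-bit x86, 8 on x86-64
//   LocalBytes          bytes released by the SP adjustment (NumBytes)
//   NumCalleeSavedPops  number of callee-saved registers popped afterwards
std::vector<int64_t> epilogueCfaOffsets(int64_t SlotSize, int64_t LocalBytes,
                                        unsigned NumCalleeSavedPops) {
  assert(SlotSize > 0 && LocalBytes >= 0);
  std::vector<int64_t> Offsets;
  const int64_t CSSize = SlotSize * NumCalleeSavedPops;

  // After the SP adjustment, the CFA is CSSize + SlotSize above the stack
  // pointer: the callee-saved area plus the return address.
  int64_t Offset = CSSize + SlotSize;
  if (LocalBytes != 0)
    Offsets.push_back(Offset);

  // Each callee-saved pop moves the stack pointer one slot closer to the CFA.
  for (unsigned I = 0; I < NumCalleeSavedPops; ++I) {
    Offset -= SlotSize;
    Offsets.push_back(Offset);
  }
  return Offsets;
}
```

With `SlotSize = 8`, `LocalBytes = 24` and no callee-saved pops, the model yields a single offset of 8, which lines up with the `addq $24, %rsp` followed by `.cfi_def_cfa_offset 8` expectation in avx512-vbroadcast.ll.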

test/CodeGen/X86/2009-03-16-PHIElimInLPad.ll

	Show All 17 Lines
	lpad: ; preds = %cont, %entry			lpad: ; preds = %cont, %entry
	%v = phi i32 [ %x, %entry ], [ %a, %cont ] ; <i32> [#uses=1]			%v = phi i32 [ %x, %entry ], [ %a, %cont ] ; <i32> [#uses=1]
	%exn = landingpad {i8*, i32}			%exn = landingpad {i8*, i32}
	cleanup			cleanup
	ret i32 %v			ret i32 %v
	}			}

	; CHECK: lpad			; CHECK: lpad
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: Ltmp			; CHECK-NEXT: Ltmp

	declare i32 @__gxx_personality_v0(...)			declare i32 @__gxx_personality_v0(...)

test/CodeGen/X86/2011-10-19-widen_vselect.ll

	Show First 20 Lines • Show All 82 Lines • ▼ Show 20 Lines
	; X32-NEXT: cmpeqps %xmm2, %xmm1			; X32-NEXT: cmpeqps %xmm2, %xmm1
	; X32-NEXT: movaps %xmm1, %xmm0			; X32-NEXT: movaps %xmm1, %xmm0
	; X32-NEXT: blendvps %xmm0, %xmm2, %xmm4			; X32-NEXT: blendvps %xmm0, %xmm2, %xmm4
	; X32-NEXT: extractps $1, %xmm4, {{[0-9]+}}(%esp)			; X32-NEXT: extractps $1, %xmm4, {{[0-9]+}}(%esp)
	; X32-NEXT: movss %xmm4, {{[0-9]+}}(%esp)			; X32-NEXT: movss %xmm4, {{[0-9]+}}(%esp)
	; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X32-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32-NEXT: movsd %xmm0, {{[0-9]+}}(%esp)			; X32-NEXT: movsd %xmm0, {{[0-9]+}}(%esp)
	; X32-NEXT: addl $60, %esp			; X32-NEXT: addl $60, %esp
				; X32-NEXT: .Lcfi1:
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: full_test:			; X64-LABEL: full_test:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero			; X64-NEXT: movsd {{.*#+}} xmm2 = mem[0],zero
	; X64-NEXT: cvttps2dq %xmm2, %xmm0			; X64-NEXT: cvttps2dq %xmm2, %xmm0
	; X64-NEXT: cvtdq2ps %xmm0, %xmm1			; X64-NEXT: cvtdq2ps %xmm0, %xmm1
	; X64-NEXT: xorps %xmm0, %xmm0			; X64-NEXT: xorps %xmm0, %xmm0
	Show All 32 Lines

test/CodeGen/X86/GlobalISel/add-scalar.ll

	Show All 17 Lines
	; X32-NEXT: movl %esp, %ebp			; X32-NEXT: movl %esp, %ebp
	; X32-NEXT: .Lcfi2:			; X32-NEXT: .Lcfi2:
	; X32-NEXT: .cfi_def_cfa_register %ebp			; X32-NEXT: .cfi_def_cfa_register %ebp
	; X32-NEXT: movl 16(%ebp), %eax			; X32-NEXT: movl 16(%ebp), %eax
	; X32-NEXT: movl 20(%ebp), %edx			; X32-NEXT: movl 20(%ebp), %edx
	; X32-NEXT: addl 8(%ebp), %eax			; X32-NEXT: addl 8(%ebp), %eax
	; X32-NEXT: adcl 12(%ebp), %edx			; X32-NEXT: adcl 12(%ebp), %edx
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
				; X32-NEXT: .Lcfi3:
				; X32-NEXT: .cfi_def_cfa %esp, 4
	; X32-NEXT: retl			; X32-NEXT: retl
	%ret = add i64 %arg1, %arg2			%ret = add i64 %arg1, %arg2
	ret i64 %ret			ret i64 %ret
	}			}

	define i32 @test_add_i32(i32 %arg1, i32 %arg2) {			define i32 @test_add_i32(i32 %arg1, i32 %arg2) {
	; X64-LABEL: test_add_i32:			; X64-LABEL: test_add_i32:
	; X64: # BB#0:			; X64: # BB#0:
	▲ Show 20 Lines • Show All 47 Lines • Show Last 20 Lines

test/CodeGen/X86/GlobalISel/frameIndex.ll

	Show All 13 Lines
	;			;
	; X32-LABEL: allocai32:			; X32-LABEL: allocai32:
	; X32: # BB#0:			; X32: # BB#0:
	; X32-NEXT: pushl %eax			; X32-NEXT: pushl %eax
	; X32-NEXT: .Lcfi0:			; X32-NEXT: .Lcfi0:
	; X32-NEXT: .cfi_def_cfa_offset 8			; X32-NEXT: .cfi_def_cfa_offset 8
	; X32-NEXT: movl %esp, %eax			; X32-NEXT: movl %esp, %eax
	; X32-NEXT: popl %ecx			; X32-NEXT: popl %ecx
				; X32-NEXT: .Lcfi1:
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X32ABI-LABEL: allocai32:			; X32ABI-LABEL: allocai32:
	; X32ABI: # BB#0:			; X32ABI: # BB#0:
	; X32ABI-NEXT: leal -4(%rsp), %eax			; X32ABI-NEXT: leal -4(%rsp), %eax
	; X32ABI-NEXT: retq			; X32ABI-NEXT: retq
	%ptr1 = alloca i32			%ptr1 = alloca i32
	ret i32* %ptr1			ret i32* %ptr1
	}			}

test/CodeGen/X86/O0-pipeline.ll

	Show All 40 Lines
	; CHECK-NEXT: Two-Address instruction pass			; CHECK-NEXT: Two-Address instruction pass
	; CHECK-NEXT: Fast Register Allocator			; CHECK-NEXT: Fast Register Allocator
	; CHECK-NEXT: Bundle Machine CFG Edges			; CHECK-NEXT: Bundle Machine CFG Edges
	; CHECK-NEXT: X86 FP Stackifier			; CHECK-NEXT: X86 FP Stackifier
	; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
	; CHECK-NEXT: Post-RA pseudo instruction expansion pass			; CHECK-NEXT: Post-RA pseudo instruction expansion pass
	; CHECK-NEXT: X86 pseudo instruction expansion pass			; CHECK-NEXT: X86 pseudo instruction expansion pass
	; CHECK-NEXT: Analyze Machine Code For Garbage Collection			; CHECK-NEXT: Analyze Machine Code For Garbage Collection
				; CHECK-NEXT: Verify that corresponding in/out CFI info matches
	; CHECK-NEXT: X86 vzeroupper inserter			; CHECK-NEXT: X86 vzeroupper inserter
				; CHECK-NEXT: CFI Instruction Inserter
	; CHECK-NEXT: Contiguously Lay Out Funclets			; CHECK-NEXT: Contiguously Lay Out Funclets
	; CHECK-NEXT: StackMap Liveness Analysis			; CHECK-NEXT: StackMap Liveness Analysis
	; CHECK-NEXT: Live DEBUG_VALUE analysis			; CHECK-NEXT: Live DEBUG_VALUE analysis
	; CHECK-NEXT: Insert fentry calls			; CHECK-NEXT: Insert fentry calls
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Natural Loop Construction			; CHECK-NEXT: Machine Natural Loop Construction
	; CHECK-NEXT: Insert XRay ops			; CHECK-NEXT: Insert XRay ops
	; CHECK-NEXT: Implement the 'patchable-function' attribute			; CHECK-NEXT: Implement the 'patchable-function' attribute
	Show All 10 Lines
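
The new pipeline entry "Verify that corresponding in/out CFI info matches" suggests a consistency check between neighbouring blocks. The sketch below is an assumption about what such a check could look like, based only on the pass name and on the incoming/outgoing CFA offset and register fields set elsewhere in this patch: a block's outgoing rule is compared with the incoming rule of each successor. `Block` and `countCfiMismatches` are hypothetical names, and the actual CFIInfoVerifier.cpp may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical block summary: the CFA rule on entry and on exit, plus the
// indices of successor blocks in the same vector.
struct Block {
  int InOffset;
  unsigned InReg;
  int OutOffset;
  unsigned OutReg;
  std::vector<std::size_t> Succs;
};

// Count CFG edges where the predecessor's outgoing CFA rule disagrees with
// the successor's incoming CFA rule.
int countCfiMismatches(const std::vector<Block> &Blocks) {
  int Mismatches = 0;
  for (const Block &Pred : Blocks) {
    for (std::size_t S : Pred.Succs) {
      const Block &Succ = Blocks.at(S);
      if (Pred.OutOffset != Succ.InOffset || Pred.OutReg != Succ.InReg)
        ++Mismatches;
    }
  }
  return Mismatches;
}
```

A count of zero would mean every edge carries a consistent CFA rule, which is the precondition a later CFI-inserting pass would presumably rely on.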

test/CodeGen/X86/avg.ll

	Show First 20 Lines • Show All 585 Lines • ▼ Show 20 Lines
	; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4			; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
	; AVX1-NEXT: vpand %xmm5, %xmm0, %xmm0			; AVX1-NEXT: vpand %xmm5, %xmm0, %xmm0
	; AVX1-NEXT: vpackuswb %xmm4, %xmm0, %xmm0			; AVX1-NEXT: vpackuswb %xmm4, %xmm0, %xmm0
	; AVX1-NEXT: vpackuswb %xmm3, %xmm0, %xmm0			; AVX1-NEXT: vpackuswb %xmm3, %xmm0, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
	; AVX1-NEXT: vmovups %ymm0, (%rax)			; AVX1-NEXT: vmovups %ymm0, (%rax)
	; AVX1-NEXT: vmovups %ymm1, (%rax)			; AVX1-NEXT: vmovups %ymm1, (%rax)
	; AVX1-NEXT: addq $24, %rsp			; AVX1-NEXT: addq $24, %rsp
				; AVX1-NEXT: .Lcfi1:
				; AVX1-NEXT: .cfi_def_cfa_offset 8
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: avg_v64i8:			; AVX2-LABEL: avg_v64i8:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero,mem[4],zero,zero,zero,mem[5],zero,zero,zero,mem[6],zero,zero,zero,mem[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero,mem[4],zero,zero,zero,mem[5],zero,zero,zero,mem[6],zero,zero,zero,mem[7],zero,zero,zero
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero,mem[4],zero,zero,zero,mem[5],zero,zero,zero,mem[6],zero,zero,zero,mem[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero,mem[4],zero,zero,zero,mem[5],zero,zero,zero,mem[6],zero,zero,zero,mem[7],zero,zero,zero
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm2 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero,mem[4],zero,zero,zero,mem[5],zero,zero,zero,mem[6],zero,zero,zero,mem[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm2 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero,mem[4],zero,zero,zero,mem[5],zero,zero,zero,mem[6],zero,zero,zero,mem[7],zero,zero,zero
	▲ Show 20 Lines • Show All 2,367 Lines • Show Last 20 Lines

test/CodeGen/X86/avx512-vbroadcast.ll

	Show First 20 Lines • Show All 408 Lines • ▼ Show 20 Lines
	; ALL-NEXT: subq $24, %rsp			; ALL-NEXT: subq $24, %rsp
	; ALL-NEXT: .Lcfi0:			; ALL-NEXT: .Lcfi0:
	; ALL-NEXT: .cfi_def_cfa_offset 32			; ALL-NEXT: .cfi_def_cfa_offset 32
	; ALL-NEXT: vaddss %xmm0, %xmm0, %xmm0			; ALL-NEXT: vaddss %xmm0, %xmm0, %xmm0
	; ALL-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill			; ALL-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill
	; ALL-NEXT: callq func_f32			; ALL-NEXT: callq func_f32
	; ALL-NEXT: vbroadcastss (%rsp), %zmm0 # 16-byte Folded Reload			; ALL-NEXT: vbroadcastss (%rsp), %zmm0 # 16-byte Folded Reload
	; ALL-NEXT: addq $24, %rsp			; ALL-NEXT: addq $24, %rsp
				; ALL-NEXT: .Lcfi1:
				; ALL-NEXT: .cfi_def_cfa_offset 8
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%a = fadd float %x, %x			%a = fadd float %x, %x
	call void @func_f32(float %a)			call void @func_f32(float %a)
	%b = insertelement <16 x float> undef, float %a, i32 0			%b = insertelement <16 x float> undef, float %a, i32 0
	%c = shufflevector <16 x float> %b, <16 x float> undef, <16 x i32> zeroinitializer			%c = shufflevector <16 x float> %b, <16 x float> undef, <16 x i32> zeroinitializer
	ret <16 x float> %c			ret <16 x float> %c
	}			}

	declare void @func_f64(double)			declare void @func_f64(double)
	define <8 x double> @broadcast_sd_spill(double %x) {			define <8 x double> @broadcast_sd_spill(double %x) {
	; ALL-LABEL: broadcast_sd_spill:			; ALL-LABEL: broadcast_sd_spill:
	; ALL: # BB#0:			; ALL: # BB#0:
	; ALL-NEXT: subq $24, %rsp			; ALL-NEXT: subq $24, %rsp
	; ALL-NEXT: .Lcfi1:			; ALL-NEXT: .Lcfi2:
	; ALL-NEXT: .cfi_def_cfa_offset 32			; ALL-NEXT: .cfi_def_cfa_offset 32
	; ALL-NEXT: vaddsd %xmm0, %xmm0, %xmm0			; ALL-NEXT: vaddsd %xmm0, %xmm0, %xmm0
	; ALL-NEXT: vmovapd %xmm0, (%rsp) # 16-byte Spill			; ALL-NEXT: vmovapd %xmm0, (%rsp) # 16-byte Spill
	; ALL-NEXT: callq func_f64			; ALL-NEXT: callq func_f64
	; ALL-NEXT: vbroadcastsd (%rsp), %zmm0 # 16-byte Folded Reload			; ALL-NEXT: vbroadcastsd (%rsp), %zmm0 # 16-byte Folded Reload
	; ALL-NEXT: addq $24, %rsp			; ALL-NEXT: addq $24, %rsp
				; ALL-NEXT: .Lcfi3:
				; ALL-NEXT: .cfi_def_cfa_offset 8
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%a = fadd double %x, %x			%a = fadd double %x, %x
	call void @func_f64(double %a)			call void @func_f64(double %a)
	%b = insertelement <8 x double> undef, double %a, i32 0			%b = insertelement <8 x double> undef, double %a, i32 0
	%c = shufflevector <8 x double> %b, <8 x double> undef, <8 x i32> zeroinitializer			%c = shufflevector <8 x double> %b, <8 x double> undef, <8 x i32> zeroinitializer
	ret <8 x double> %c			ret <8 x double> %c
	}			}

test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll

	(283 lines not shown)
	; AVX512F-32-NEXT: subl $12, %esp			; AVX512F-32-NEXT: subl $12, %esp
	; AVX512F-32-NEXT: .Lcfi0:			; AVX512F-32-NEXT: .Lcfi0:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 16			; AVX512F-32-NEXT: .cfi_def_cfa_offset 16
	; AVX512F-32-NEXT: vpcmpeqb %zmm1, %zmm0, %k0			; AVX512F-32-NEXT: vpcmpeqb %zmm1, %zmm0, %k0
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: movl (%esp), %eax			; AVX512F-32-NEXT: movl (%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $12, %esp			; AVX512F-32-NEXT: addl $12, %esp
				; AVX512F-32-NEXT: .Lcfi1:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i64 @llvm.x86.avx512.mask.pcmpeq.b.512(<64 x i8> %a, <64 x i8> %b, i64 -1)			%res = call i64 @llvm.x86.avx512.mask.pcmpeq.b.512(<64 x i8> %a, <64 x i8> %b, i64 -1)
	ret i64 %res			ret i64 %res
	}			}

	define i64 @test_mask_pcmpeq_b(<64 x i8> %a, <64 x i8> %b, i64 %mask) {			define i64 @test_mask_pcmpeq_b(<64 x i8> %a, <64 x i8> %b, i64 %mask) {
	; AVX512BW-LABEL: test_mask_pcmpeq_b:			; AVX512BW-LABEL: test_mask_pcmpeq_b:
	; AVX512BW: ## BB#0:			; AVX512BW: ## BB#0:
	; AVX512BW-NEXT: kmovq %rdi, %k1			; AVX512BW-NEXT: kmovq %rdi, %k1
	; AVX512BW-NEXT: vpcmpeqb %zmm1, %zmm0, %k0 {%k1}			; AVX512BW-NEXT: vpcmpeqb %zmm1, %zmm0, %k0 {%k1}
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_mask_pcmpeq_b:			; AVX512F-32-LABEL: test_mask_pcmpeq_b:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: subl $12, %esp			; AVX512F-32-NEXT: subl $12, %esp
	; AVX512F-32-NEXT: .Lcfi1:			; AVX512F-32-NEXT: .Lcfi2:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 16			; AVX512F-32-NEXT: .cfi_def_cfa_offset 16
	; AVX512F-32-NEXT: kmovq {{[0-9]+}}(%esp), %k1			; AVX512F-32-NEXT: kmovq {{[0-9]+}}(%esp), %k1
	; AVX512F-32-NEXT: vpcmpeqb %zmm1, %zmm0, %k0 {%k1}			; AVX512F-32-NEXT: vpcmpeqb %zmm1, %zmm0, %k0 {%k1}
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: movl (%esp), %eax			; AVX512F-32-NEXT: movl (%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $12, %esp			; AVX512F-32-NEXT: addl $12, %esp
				; AVX512F-32-NEXT: .Lcfi3:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i64 @llvm.x86.avx512.mask.pcmpeq.b.512(<64 x i8> %a, <64 x i8> %b, i64 %mask)			%res = call i64 @llvm.x86.avx512.mask.pcmpeq.b.512(<64 x i8> %a, <64 x i8> %b, i64 %mask)
	ret i64 %res			ret i64 %res
	}			}

	declare i64 @llvm.x86.avx512.mask.pcmpeq.b.512(<64 x i8>, <64 x i8>, i64)			declare i64 @llvm.x86.avx512.mask.pcmpeq.b.512(<64 x i8>, <64 x i8>, i64)

	define i32 @test_pcmpeq_w(<32 x i16> %a, <32 x i16> %b) {			define i32 @test_pcmpeq_w(<32 x i16> %a, <32 x i16> %b) {
	(37 lines not shown)
	; AVX512BW: ## BB#0:			; AVX512BW: ## BB#0:
	; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k0			; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k0
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_pcmpgt_b:			; AVX512F-32-LABEL: test_pcmpgt_b:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: subl $12, %esp			; AVX512F-32-NEXT: subl $12, %esp
	; AVX512F-32-NEXT: .Lcfi2:			; AVX512F-32-NEXT: .Lcfi4:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 16			; AVX512F-32-NEXT: .cfi_def_cfa_offset 16
	; AVX512F-32-NEXT: vpcmpgtb %zmm1, %zmm0, %k0			; AVX512F-32-NEXT: vpcmpgtb %zmm1, %zmm0, %k0
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: movl (%esp), %eax			; AVX512F-32-NEXT: movl (%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $12, %esp			; AVX512F-32-NEXT: addl $12, %esp
				; AVX512F-32-NEXT: .Lcfi5:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i64 @llvm.x86.avx512.mask.pcmpgt.b.512(<64 x i8> %a, <64 x i8> %b, i64 -1)			%res = call i64 @llvm.x86.avx512.mask.pcmpgt.b.512(<64 x i8> %a, <64 x i8> %b, i64 -1)
	ret i64 %res			ret i64 %res
	}			}

	define i64 @test_mask_pcmpgt_b(<64 x i8> %a, <64 x i8> %b, i64 %mask) {			define i64 @test_mask_pcmpgt_b(<64 x i8> %a, <64 x i8> %b, i64 %mask) {
	; AVX512BW-LABEL: test_mask_pcmpgt_b:			; AVX512BW-LABEL: test_mask_pcmpgt_b:
	; AVX512BW: ## BB#0:			; AVX512BW: ## BB#0:
	; AVX512BW-NEXT: kmovq %rdi, %k1			; AVX512BW-NEXT: kmovq %rdi, %k1
	; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k0 {%k1}			; AVX512BW-NEXT: vpcmpgtb %zmm1, %zmm0, %k0 {%k1}
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_mask_pcmpgt_b:			; AVX512F-32-LABEL: test_mask_pcmpgt_b:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: subl $12, %esp			; AVX512F-32-NEXT: subl $12, %esp
	; AVX512F-32-NEXT: .Lcfi3:			; AVX512F-32-NEXT: .Lcfi6:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 16			; AVX512F-32-NEXT: .cfi_def_cfa_offset 16
	; AVX512F-32-NEXT: kmovq {{[0-9]+}}(%esp), %k1			; AVX512F-32-NEXT: kmovq {{[0-9]+}}(%esp), %k1
	; AVX512F-32-NEXT: vpcmpgtb %zmm1, %zmm0, %k0 {%k1}			; AVX512F-32-NEXT: vpcmpgtb %zmm1, %zmm0, %k0 {%k1}
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: movl (%esp), %eax			; AVX512F-32-NEXT: movl (%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $12, %esp			; AVX512F-32-NEXT: addl $12, %esp
				; AVX512F-32-NEXT: .Lcfi7:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i64 @llvm.x86.avx512.mask.pcmpgt.b.512(<64 x i8> %a, <64 x i8> %b, i64 %mask)			%res = call i64 @llvm.x86.avx512.mask.pcmpgt.b.512(<64 x i8> %a, <64 x i8> %b, i64 %mask)
	ret i64 %res			ret i64 %res
	}			}

	declare i64 @llvm.x86.avx512.mask.pcmpgt.b.512(<64 x i8>, <64 x i8>, i64)			declare i64 @llvm.x86.avx512.mask.pcmpgt.b.512(<64 x i8>, <64 x i8>, i64)

	define i32 @test_pcmpgt_w(<32 x i16> %a, <32 x i16> %b) {			define i32 @test_pcmpgt_w(<32 x i16> %a, <32 x i16> %b) {
	(1,180 lines not shown)
	; AVX512BW-NEXT: kxnorq %k0, %k0, %k0			; AVX512BW-NEXT: kxnorq %k0, %k0, %k0
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: addq %rcx, %rax			; AVX512BW-NEXT: addq %rcx, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_cmp_b_512:			; AVX512F-32-LABEL: test_cmp_b_512:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: subl $60, %esp			; AVX512F-32-NEXT: subl $60, %esp
	; AVX512F-32-NEXT: .Lcfi4:			; AVX512F-32-NEXT: .Lcfi8:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 64			; AVX512F-32-NEXT: .cfi_def_cfa_offset 64
	; AVX512F-32-NEXT: vpcmpeqb %zmm1, %zmm0, %k0			; AVX512F-32-NEXT: vpcmpeqb %zmm1, %zmm0, %k0
	; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: vpcmpgtb %zmm0, %zmm1, %k0			; AVX512F-32-NEXT: vpcmpgtb %zmm0, %zmm1, %k0
	; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	(14 lines not shown)
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: addl (%esp), %eax			; AVX512F-32-NEXT: addl (%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: kxnorq %k0, %k0, %k0			; AVX512F-32-NEXT: kxnorq %k0, %k0, %k0
	; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $60, %esp			; AVX512F-32-NEXT: addl $60, %esp
				; AVX512F-32-NEXT: .Lcfi9:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res0 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 0, i64 -1)			%res0 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 0, i64 -1)
	%res1 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 1, i64 -1)			%res1 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 1, i64 -1)
	%ret1 = add i64 %res0, %res1			%ret1 = add i64 %res0, %res1
	%res2 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 2, i64 -1)			%res2 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 2, i64 -1)
	%ret2 = add i64 %ret1, %res2			%ret2 = add i64 %ret1, %res2
	%res3 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 3, i64 -1)			%res3 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 3, i64 -1)
	%ret3 = add i64 %ret2, %res3			%ret3 = add i64 %ret2, %res3
	(33 lines not shown)
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: addq %rcx, %rax			; AVX512BW-NEXT: addq %rcx, %rax
	; AVX512BW-NEXT: addq %rdi, %rax			; AVX512BW-NEXT: addq %rdi, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_mask_cmp_b_512:			; AVX512F-32-LABEL: test_mask_cmp_b_512:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: pushl %ebx			; AVX512F-32-NEXT: pushl %ebx
	; AVX512F-32-NEXT: .Lcfi5:			; AVX512F-32-NEXT: .Lcfi10:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 8			; AVX512F-32-NEXT: .cfi_def_cfa_offset 8
	; AVX512F-32-NEXT: pushl %esi			; AVX512F-32-NEXT: pushl %esi
	; AVX512F-32-NEXT: .Lcfi6:			; AVX512F-32-NEXT: .Lcfi11:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 12			; AVX512F-32-NEXT: .cfi_def_cfa_offset 12
	; AVX512F-32-NEXT: subl $60, %esp			; AVX512F-32-NEXT: subl $60, %esp
	; AVX512F-32-NEXT: .Lcfi7:			; AVX512F-32-NEXT: .Lcfi12:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 72			; AVX512F-32-NEXT: .cfi_def_cfa_offset 72
	; AVX512F-32-NEXT: .Lcfi8:			; AVX512F-32-NEXT: .Lcfi13:
	; AVX512F-32-NEXT: .cfi_offset %esi, -12			; AVX512F-32-NEXT: .cfi_offset %esi, -12
	; AVX512F-32-NEXT: .Lcfi9:			; AVX512F-32-NEXT: .Lcfi14:
	; AVX512F-32-NEXT: .cfi_offset %ebx, -8			; AVX512F-32-NEXT: .cfi_offset %ebx, -8
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %ecx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; AVX512F-32-NEXT: movb %cl, %al			; AVX512F-32-NEXT: movb %cl, %al
	; AVX512F-32-NEXT: shrb $5, %al			; AVX512F-32-NEXT: shrb $5, %al
	; AVX512F-32-NEXT: andb $1, %al			; AVX512F-32-NEXT: andb $1, %al
	; AVX512F-32-NEXT: movb %cl, %bl			; AVX512F-32-NEXT: movb %cl, %bl
	; AVX512F-32-NEXT: andb $15, %bl			; AVX512F-32-NEXT: andb $15, %bl
	; AVX512F-32-NEXT: movb %cl, %dl			; AVX512F-32-NEXT: movb %cl, %dl
	(726 lines not shown)
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: kmovq %k1, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k1, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl %esi, %eax			; AVX512F-32-NEXT: addl %esi, %eax
	; AVX512F-32-NEXT: adcxl %ecx, %edx			; AVX512F-32-NEXT: adcxl %ecx, %edx
	; AVX512F-32-NEXT: addl $60, %esp			; AVX512F-32-NEXT: addl $60, %esp
				; AVX512F-32-NEXT: .Lcfi15:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 12
	; AVX512F-32-NEXT: popl %esi			; AVX512F-32-NEXT: popl %esi
				; AVX512F-32-NEXT: .Lcfi16:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 8
	; AVX512F-32-NEXT: popl %ebx			; AVX512F-32-NEXT: popl %ebx
				; AVX512F-32-NEXT: .Lcfi17:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res0 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 0, i64 %mask)			%res0 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 0, i64 %mask)
	%res1 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 1, i64 %mask)			%res1 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 1, i64 %mask)
	%ret1 = add i64 %res0, %res1			%ret1 = add i64 %res0, %res1
	%res2 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 2, i64 %mask)			%res2 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 2, i64 %mask)
	%ret2 = add i64 %ret1, %res2			%ret2 = add i64 %ret1, %res2
	%res3 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 3, i64 %mask)			%res3 = call i64 @llvm.x86.avx512.mask.cmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 3, i64 %mask)
	%ret3 = add i64 %ret2, %res3			%ret3 = add i64 %ret2, %res3
	(33 lines not shown)
	; AVX512BW-NEXT: kxnorq %k0, %k0, %k0			; AVX512BW-NEXT: kxnorq %k0, %k0, %k0
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: addq %rcx, %rax			; AVX512BW-NEXT: addq %rcx, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_ucmp_b_512:			; AVX512F-32-LABEL: test_ucmp_b_512:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: subl $60, %esp			; AVX512F-32-NEXT: subl $60, %esp
	; AVX512F-32-NEXT: .Lcfi10:			; AVX512F-32-NEXT: .Lcfi18:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 64			; AVX512F-32-NEXT: .cfi_def_cfa_offset 64
	; AVX512F-32-NEXT: vpcmpeqb %zmm1, %zmm0, %k0			; AVX512F-32-NEXT: vpcmpeqb %zmm1, %zmm0, %k0
	; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: vpcmpltub %zmm1, %zmm0, %k0			; AVX512F-32-NEXT: vpcmpltub %zmm1, %zmm0, %k0
	; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	(14 lines not shown)
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: addl (%esp), %eax			; AVX512F-32-NEXT: addl (%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: kxnorq %k0, %k0, %k0			; AVX512F-32-NEXT: kxnorq %k0, %k0, %k0
	; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $60, %esp			; AVX512F-32-NEXT: addl $60, %esp
				; AVX512F-32-NEXT: .Lcfi19:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res0 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 0, i64 -1)			%res0 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 0, i64 -1)
	%res1 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 1, i64 -1)			%res1 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 1, i64 -1)
	%ret1 = add i64 %res0, %res1			%ret1 = add i64 %res0, %res1
	%res2 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 2, i64 -1)			%res2 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 2, i64 -1)
	%ret2 = add i64 %ret1, %res2			%ret2 = add i64 %ret1, %res2
	%res3 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 3, i64 -1)			%res3 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 3, i64 -1)
	%ret3 = add i64 %ret2, %res3			%ret3 = add i64 %ret2, %res3
	(33 lines not shown)
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: addq %rcx, %rax			; AVX512BW-NEXT: addq %rcx, %rax
	; AVX512BW-NEXT: addq %rdi, %rax			; AVX512BW-NEXT: addq %rdi, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_mask_x86_avx512_ucmp_b_512:			; AVX512F-32-LABEL: test_mask_x86_avx512_ucmp_b_512:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: pushl %ebx			; AVX512F-32-NEXT: pushl %ebx
	; AVX512F-32-NEXT: .Lcfi11:			; AVX512F-32-NEXT: .Lcfi20:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 8			; AVX512F-32-NEXT: .cfi_def_cfa_offset 8
	; AVX512F-32-NEXT: pushl %esi			; AVX512F-32-NEXT: pushl %esi
	; AVX512F-32-NEXT: .Lcfi12:			; AVX512F-32-NEXT: .Lcfi21:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 12			; AVX512F-32-NEXT: .cfi_def_cfa_offset 12
	; AVX512F-32-NEXT: subl $60, %esp			; AVX512F-32-NEXT: subl $60, %esp
	; AVX512F-32-NEXT: .Lcfi13:			; AVX512F-32-NEXT: .Lcfi22:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 72			; AVX512F-32-NEXT: .cfi_def_cfa_offset 72
	; AVX512F-32-NEXT: .Lcfi14:			; AVX512F-32-NEXT: .Lcfi23:
	; AVX512F-32-NEXT: .cfi_offset %esi, -12			; AVX512F-32-NEXT: .cfi_offset %esi, -12
	; AVX512F-32-NEXT: .Lcfi15:			; AVX512F-32-NEXT: .Lcfi24:
	; AVX512F-32-NEXT: .cfi_offset %ebx, -8			; AVX512F-32-NEXT: .cfi_offset %ebx, -8
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %ecx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; AVX512F-32-NEXT: movb %cl, %al			; AVX512F-32-NEXT: movb %cl, %al
	; AVX512F-32-NEXT: shrb $5, %al			; AVX512F-32-NEXT: shrb $5, %al
	; AVX512F-32-NEXT: andb $1, %al			; AVX512F-32-NEXT: andb $1, %al
	; AVX512F-32-NEXT: movb %cl, %bl			; AVX512F-32-NEXT: movb %cl, %bl
	; AVX512F-32-NEXT: andb $15, %bl			; AVX512F-32-NEXT: andb $15, %bl
	; AVX512F-32-NEXT: movb %cl, %dl			; AVX512F-32-NEXT: movb %cl, %dl
	(726 lines not shown)
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: kmovq %k1, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k1, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl %esi, %eax			; AVX512F-32-NEXT: addl %esi, %eax
	; AVX512F-32-NEXT: adcxl %ecx, %edx			; AVX512F-32-NEXT: adcxl %ecx, %edx
	; AVX512F-32-NEXT: addl $60, %esp			; AVX512F-32-NEXT: addl $60, %esp
				; AVX512F-32-NEXT: .Lcfi25:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 12
	; AVX512F-32-NEXT: popl %esi			; AVX512F-32-NEXT: popl %esi
				; AVX512F-32-NEXT: .Lcfi26:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 8
	; AVX512F-32-NEXT: popl %ebx			; AVX512F-32-NEXT: popl %ebx
				; AVX512F-32-NEXT: .Lcfi27:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res0 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 0, i64 %mask)			%res0 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 0, i64 %mask)
	%res1 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 1, i64 %mask)			%res1 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 1, i64 %mask)
	%ret1 = add i64 %res0, %res1			%ret1 = add i64 %res0, %res1
	%res2 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 2, i64 %mask)			%res2 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 2, i64 %mask)
	%ret2 = add i64 %ret1, %res2			%ret2 = add i64 %ret1, %res2
	%res3 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 3, i64 %mask)			%res3 = call i64 @llvm.x86.avx512.mask.ucmp.b.512(<64 x i8> %a0, <64 x i8> %a1, i32 3, i64 %mask)
	%ret3 = add i64 %ret2, %res3			%ret3 = add i64 %ret2, %res3
	(290 lines not shown)

test/CodeGen/X86/avx512bw-intrinsics.ll

	(1,594 lines not shown)
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 16			; AVX512F-32-NEXT: .cfi_def_cfa_offset 16
	; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k0			; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k0
	; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1			; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
	; AVX512F-32-NEXT: kunpckdq %k0, %k1, %k0			; AVX512F-32-NEXT: kunpckdq %k0, %k1, %k0
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: movl (%esp), %eax			; AVX512F-32-NEXT: movl (%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $12, %esp			; AVX512F-32-NEXT: addl $12, %esp
				; AVX512F-32-NEXT: .Lcfi1:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i64 @llvm.x86.avx512.kunpck.dq(i64 %x0, i64 %x1)			%res = call i64 @llvm.x86.avx512.kunpck.dq(i64 %x0, i64 %x1)
	ret i64 %res			ret i64 %res
	}			}

	declare i64 @llvm.x86.avx512.cvtb2mask.512(<64 x i8>)			declare i64 @llvm.x86.avx512.cvtb2mask.512(<64 x i8>)

	define i64@test_int_x86_avx512_cvtb2mask_512(<64 x i8> %x0) {			define i64@test_int_x86_avx512_cvtb2mask_512(<64 x i8> %x0) {
	; AVX512BW-LABEL: test_int_x86_avx512_cvtb2mask_512:			; AVX512BW-LABEL: test_int_x86_avx512_cvtb2mask_512:
	; AVX512BW: ## BB#0:			; AVX512BW: ## BB#0:
	; AVX512BW-NEXT: vpmovb2m %zmm0, %k0			; AVX512BW-NEXT: vpmovb2m %zmm0, %k0
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_int_x86_avx512_cvtb2mask_512:			; AVX512F-32-LABEL: test_int_x86_avx512_cvtb2mask_512:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: subl $12, %esp			; AVX512F-32-NEXT: subl $12, %esp
	; AVX512F-32-NEXT: .Lcfi1:			; AVX512F-32-NEXT: .Lcfi2:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 16			; AVX512F-32-NEXT: .cfi_def_cfa_offset 16
	; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0			; AVX512F-32-NEXT: vpmovb2m %zmm0, %k0
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: movl (%esp), %eax			; AVX512F-32-NEXT: movl (%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $12, %esp			; AVX512F-32-NEXT: addl $12, %esp
				; AVX512F-32-NEXT: .Lcfi3:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i64 @llvm.x86.avx512.cvtb2mask.512(<64 x i8> %x0)			%res = call i64 @llvm.x86.avx512.cvtb2mask.512(<64 x i8> %x0)
	ret i64 %res			ret i64 %res
	}			}

	declare i32 @llvm.x86.avx512.cvtw2mask.512(<32 x i16>)			declare i32 @llvm.x86.avx512.cvtw2mask.512(<32 x i16>)

	define i32@test_int_x86_avx512_cvtw2mask_512(<32 x i16> %x0) {			define i32@test_int_x86_avx512_cvtw2mask_512(<32 x i16> %x0) {
	(161 lines not shown)
	; AVX512BW-NEXT: vptestmb %zmm1, %zmm0, %k0			; AVX512BW-NEXT: vptestmb %zmm1, %zmm0, %k0
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: addq %rcx, %rax			; AVX512BW-NEXT: addq %rcx, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_int_x86_avx512_ptestm_b_512:			; AVX512F-32-LABEL: test_int_x86_avx512_ptestm_b_512:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: subl $20, %esp			; AVX512F-32-NEXT: subl $20, %esp
	; AVX512F-32-NEXT: .Lcfi2:			; AVX512F-32-NEXT: .Lcfi4:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 24			; AVX512F-32-NEXT: .cfi_def_cfa_offset 24
	; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k0			; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k0
	; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1			; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
	; AVX512F-32-NEXT: kunpckdq %k0, %k1, %k1			; AVX512F-32-NEXT: kunpckdq %k0, %k1, %k1
	; AVX512F-32-NEXT: vptestmb %zmm1, %zmm0, %k0 {%k1}			; AVX512F-32-NEXT: vptestmb %zmm1, %zmm0, %k0 {%k1}
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: vptestmb %zmm1, %zmm0, %k0			; AVX512F-32-NEXT: vptestmb %zmm1, %zmm0, %k0
	; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: movl (%esp), %eax			; AVX512F-32-NEXT: movl (%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $20, %esp			; AVX512F-32-NEXT: addl $20, %esp
				; AVX512F-32-NEXT: .Lcfi5:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i64 @llvm.x86.avx512.ptestm.b.512(<64 x i8> %x0, <64 x i8> %x1, i64 %x2)			%res = call i64 @llvm.x86.avx512.ptestm.b.512(<64 x i8> %x0, <64 x i8> %x1, i64 %x2)
	%res1 = call i64 @llvm.x86.avx512.ptestm.b.512(<64 x i8> %x0, <64 x i8> %x1, i64-1)			%res1 = call i64 @llvm.x86.avx512.ptestm.b.512(<64 x i8> %x0, <64 x i8> %x1, i64-1)
	%res2 = add i64 %res, %res1			%res2 = add i64 %res, %res1
	ret i64 %res2			ret i64 %res2
	}			}

	declare i32 @llvm.x86.avx512.ptestm.w.512(<32 x i16>, <32 x i16>, i32)			declare i32 @llvm.x86.avx512.ptestm.w.512(<32 x i16>, <32 x i16>, i32)
	(35 lines not shown)
	; AVX512BW-NEXT: vptestnmb %zmm1, %zmm0, %k0			; AVX512BW-NEXT: vptestnmb %zmm1, %zmm0, %k0
	; AVX512BW-NEXT: kmovq %k0, %rax			; AVX512BW-NEXT: kmovq %k0, %rax
	; AVX512BW-NEXT: addq %rcx, %rax			; AVX512BW-NEXT: addq %rcx, %rax
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
	;			;
	; AVX512F-32-LABEL: test_int_x86_avx512_ptestnm_b_512:			; AVX512F-32-LABEL: test_int_x86_avx512_ptestnm_b_512:
	; AVX512F-32: # BB#0:			; AVX512F-32: # BB#0:
	; AVX512F-32-NEXT: subl $20, %esp			; AVX512F-32-NEXT: subl $20, %esp
	; AVX512F-32-NEXT: .Lcfi3:			; AVX512F-32-NEXT: .Lcfi6:
	; AVX512F-32-NEXT: .cfi_def_cfa_offset 24			; AVX512F-32-NEXT: .cfi_def_cfa_offset 24
	; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k0			; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k0
	; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1			; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
	; AVX512F-32-NEXT: kunpckdq %k0, %k1, %k1			; AVX512F-32-NEXT: kunpckdq %k0, %k1, %k1
	; AVX512F-32-NEXT: vptestnmb %zmm1, %zmm0, %k0 {%k1}			; AVX512F-32-NEXT: vptestnmb %zmm1, %zmm0, %k0 {%k1}
	; AVX512F-32-NEXT: kmovq %k0, (%esp)			; AVX512F-32-NEXT: kmovq %k0, (%esp)
	; AVX512F-32-NEXT: vptestnmb %zmm1, %zmm0, %k0			; AVX512F-32-NEXT: vptestnmb %zmm1, %zmm0, %k0
	; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)			; AVX512F-32-NEXT: kmovq %k0, {{[0-9]+}}(%esp)
	; AVX512F-32-NEXT: movl (%esp), %eax			; AVX512F-32-NEXT: movl (%esp), %eax
	; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: movl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax			; AVX512F-32-NEXT: addl {{[0-9]+}}(%esp), %eax
	; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx			; AVX512F-32-NEXT: adcxl {{[0-9]+}}(%esp), %edx
	; AVX512F-32-NEXT: addl $20, %esp			; AVX512F-32-NEXT: addl $20, %esp
				; AVX512F-32-NEXT: .Lcfi7:
				; AVX512F-32-NEXT: .cfi_def_cfa_offset 4
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%res = call i64 @llvm.x86.avx512.ptestnm.b.512(<64 x i8> %x0, <64 x i8> %x1, i64 %x2)			%res = call i64 @llvm.x86.avx512.ptestnm.b.512(<64 x i8> %x0, <64 x i8> %x1, i64 %x2)
	%res1 = call i64 @llvm.x86.avx512.ptestnm.b.512(<64 x i8> %x0, <64 x i8> %x1, i64-1)			%res1 = call i64 @llvm.x86.avx512.ptestnm.b.512(<64 x i8> %x0, <64 x i8> %x1, i64-1)
	%res2 = add i64 %res, %res1			%res2 = add i64 %res, %res1
	ret i64 %res2			ret i64 %res2
	}			}

	declare i32 @llvm.x86.avx512.ptestnm.w.512(<32 x i16>, <32 x i16>, i32 %x2)			declare i32 @llvm.x86.avx512.ptestnm.w.512(<32 x i16>, <32 x i16>, i32 %x2)
	(402 lines not shown)

test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll

(27 lines not shown)
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpbroadcastd %xmm1, %xmm0 {%k1}		; X32-NEXT: vpbroadcastd %xmm1, %xmm0 {%k1}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi1:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_broadcastd_epi32:		; X64-LABEL: test_mm_mask_broadcastd_epi32:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vpbroadcastd %xmm1, %xmm0 {%k1}		; X64-NEXT: vpbroadcastd %xmm1, %xmm0 {%k1}
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg0 = bitcast <2 x i64> %a0 to <4 x i32>		%arg0 = bitcast <2 x i64> %a0 to <4 x i32>
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%arg2 = bitcast <2 x i64> %a2 to <4 x i32>		%arg2 = bitcast <2 x i64> %a2 to <4 x i32>
%res0 = shufflevector <4 x i32> %arg2, <4 x i32> undef, <4 x i32> zeroinitializer		%res0 = shufflevector <4 x i32> %arg2, <4 x i32> undef, <4 x i32> zeroinitializer
%res1 = select <4 x i1> %arg1, <4 x i32> %res0, <4 x i32> %arg0		%res1 = select <4 x i1> %arg1, <4 x i32> %res0, <4 x i32> %arg0
%res2 = bitcast <4 x i32> %res1 to <2 x i64>		%res2 = bitcast <4 x i32> %res1 to <2 x i64>
ret <2 x i64> %res2		ret <2 x i64> %res2
}		}

define <2 x i64> @test_mm_maskz_broadcastd_epi32(i8 %a0, <2 x i64> %a1) {		define <2 x i64> @test_mm_maskz_broadcastd_epi32(i8 %a0, <2 x i64> %a1) {
; X32-LABEL: test_mm_maskz_broadcastd_epi32:		; X32-LABEL: test_mm_maskz_broadcastd_epi32:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi1:		; X32-NEXT: .Lcfi2:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpbroadcastd %xmm0, %xmm0 {%k1} {z}		; X32-NEXT: vpbroadcastd %xmm0, %xmm0 {%k1} {z}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi3:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_broadcastd_epi32:		; X64-LABEL: test_mm_maskz_broadcastd_epi32:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[... 80 lines not shown ...]
; X64-NEXT: retq
%res = shufflevector <2 x i64> %a0, <2 x i64> undef, <2 x i32> zeroinitializer		%res = shufflevector <2 x i64> %a0, <2 x i64> undef, <2 x i32> zeroinitializer
ret <2 x i64> %res		ret <2 x i64> %res
}		}

define <2 x i64> @test_mm_mask_broadcastq_epi64(<2 x i64> %a0, i8 %a1, <2 x i64> %a2) {		define <2 x i64> @test_mm_mask_broadcastq_epi64(<2 x i64> %a0, i8 %a1, <2 x i64> %a2) {
; X32-LABEL: test_mm_mask_broadcastq_epi64:		; X32-LABEL: test_mm_mask_broadcastq_epi64:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi2:		; X32-NEXT: .Lcfi4:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $3, %al		; X32-NEXT: andb $3, %al
; X32-NEXT: movb %al, {{[0-9]+}}(%esp)		; X32-NEXT: movb %al, {{[0-9]+}}(%esp)
; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpbroadcastq %xmm1, %xmm0 {%k1}		; X32-NEXT: vpbroadcastq %xmm1, %xmm0 {%k1}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi5:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_broadcastq_epi64:		; X64-LABEL: test_mm_mask_broadcastq_epi64:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $3, %dil		; X64-NEXT: andb $3, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vpbroadcastq %xmm1, %xmm0 {%k1}		; X64-NEXT: vpbroadcastq %xmm1, %xmm0 {%k1}
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i2		%trn1 = trunc i8 %a1 to i2
%arg1 = bitcast i2 %trn1 to <2 x i1>		%arg1 = bitcast i2 %trn1 to <2 x i1>
%res0 = shufflevector <2 x i64> %a2, <2 x i64> undef, <2 x i32> zeroinitializer		%res0 = shufflevector <2 x i64> %a2, <2 x i64> undef, <2 x i32> zeroinitializer
%res1 = select <2 x i1> %arg1, <2 x i64> %res0, <2 x i64> %a0		%res1 = select <2 x i1> %arg1, <2 x i64> %res0, <2 x i64> %a0
ret <2 x i64> %res1		ret <2 x i64> %res1
}		}

define <2 x i64> @test_mm_maskz_broadcastq_epi64(i8 %a0, <2 x i64> %a1) {		define <2 x i64> @test_mm_maskz_broadcastq_epi64(i8 %a0, <2 x i64> %a1) {
; X32-LABEL: test_mm_maskz_broadcastq_epi64:		; X32-LABEL: test_mm_maskz_broadcastq_epi64:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi3:		; X32-NEXT: .Lcfi6:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $3, %al		; X32-NEXT: andb $3, %al
; X32-NEXT: movb %al, {{[0-9]+}}(%esp)		; X32-NEXT: movb %al, {{[0-9]+}}(%esp)
; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpbroadcastq %xmm0, %xmm0 {%k1} {z}		; X32-NEXT: vpbroadcastq %xmm0, %xmm0 {%k1} {z}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi7:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_broadcastq_epi64:		; X64-LABEL: test_mm_maskz_broadcastq_epi64:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $3, %dil		; X64-NEXT: andb $3, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[... 19 lines not shown ...]
; X64-NEXT: retq
%res = shufflevector <2 x i64> %a0, <2 x i64> undef, <4 x i32> zeroinitializer		%res = shufflevector <2 x i64> %a0, <2 x i64> undef, <4 x i32> zeroinitializer
ret <4 x i64> %res		ret <4 x i64> %res
}		}

define <4 x i64> @test_mm256_mask_broadcastq_epi64(<4 x i64> %a0, i8 %a1, <2 x i64> %a2) {		define <4 x i64> @test_mm256_mask_broadcastq_epi64(<4 x i64> %a0, i8 %a1, <2 x i64> %a2) {
; X32-LABEL: test_mm256_mask_broadcastq_epi64:		; X32-LABEL: test_mm256_mask_broadcastq_epi64:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi4:		; X32-NEXT: .Lcfi8:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpbroadcastq %xmm1, %ymm0 {%k1}		; X32-NEXT: vpbroadcastq %xmm1, %ymm0 {%k1}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi9:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_mask_broadcastq_epi64:		; X64-LABEL: test_mm256_mask_broadcastq_epi64:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vpbroadcastq %xmm1, %ymm0 {%k1}		; X64-NEXT: vpbroadcastq %xmm1, %ymm0 {%k1}
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <2 x i64> %a2, <2 x i64> undef, <4 x i32> zeroinitializer		%res0 = shufflevector <2 x i64> %a2, <2 x i64> undef, <4 x i32> zeroinitializer
%res1 = select <4 x i1> %arg1, <4 x i64> %res0, <4 x i64> %a0		%res1 = select <4 x i1> %arg1, <4 x i64> %res0, <4 x i64> %a0
ret <4 x i64> %res1		ret <4 x i64> %res1
}		}

define <4 x i64> @test_mm256_maskz_broadcastq_epi64(i8 %a0, <2 x i64> %a1) {		define <4 x i64> @test_mm256_maskz_broadcastq_epi64(i8 %a0, <2 x i64> %a1) {
; X32-LABEL: test_mm256_maskz_broadcastq_epi64:		; X32-LABEL: test_mm256_maskz_broadcastq_epi64:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi5:		; X32-NEXT: .Lcfi10:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpbroadcastq %xmm0, %ymm0 {%k1} {z}		; X32-NEXT: vpbroadcastq %xmm0, %ymm0 {%k1} {z}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi11:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_maskz_broadcastq_epi64:		; X64-LABEL: test_mm256_maskz_broadcastq_epi64:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[... 19 lines not shown ...]
; X64-NEXT: retq
%res = shufflevector <2 x double> %a0, <2 x double> undef, <2 x i32> zeroinitializer		%res = shufflevector <2 x double> %a0, <2 x double> undef, <2 x i32> zeroinitializer
ret <2 x double> %res		ret <2 x double> %res
}		}

define <2 x double> @test_mm_mask_broadcastsd_pd(<2 x double> %a0, i8 %a1, <2 x double> %a2) {		define <2 x double> @test_mm_mask_broadcastsd_pd(<2 x double> %a0, i8 %a1, <2 x double> %a2) {
; X32-LABEL: test_mm_mask_broadcastsd_pd:		; X32-LABEL: test_mm_mask_broadcastsd_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi6:		; X32-NEXT: .Lcfi12:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $3, %al		; X32-NEXT: andb $3, %al
; X32-NEXT: movb %al, {{[0-9]+}}(%esp)		; X32-NEXT: movb %al, {{[0-9]+}}(%esp)
; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovddup {{.*#+}} xmm0 {%k1} = xmm1[0,0]		; X32-NEXT: vmovddup {{.*#+}} xmm0 {%k1} = xmm1[0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi13:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_broadcastsd_pd:		; X64-LABEL: test_mm_mask_broadcastsd_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $3, %dil		; X64-NEXT: andb $3, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vmovddup {{.*#+}} xmm0 {%k1} = xmm1[0,0]		; X64-NEXT: vmovddup {{.*#+}} xmm0 {%k1} = xmm1[0,0]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i2		%trn1 = trunc i8 %a1 to i2
%arg1 = bitcast i2 %trn1 to <2 x i1>		%arg1 = bitcast i2 %trn1 to <2 x i1>
%res0 = shufflevector <2 x double> %a2, <2 x double> undef, <2 x i32> zeroinitializer		%res0 = shufflevector <2 x double> %a2, <2 x double> undef, <2 x i32> zeroinitializer
%res1 = select <2 x i1> %arg1, <2 x double> %res0, <2 x double> %a0		%res1 = select <2 x i1> %arg1, <2 x double> %res0, <2 x double> %a0
ret <2 x double> %res1		ret <2 x double> %res1
}		}

define <2 x double> @test_mm_maskz_broadcastsd_pd(i8 %a0, <2 x double> %a1) {		define <2 x double> @test_mm_maskz_broadcastsd_pd(i8 %a0, <2 x double> %a1) {
; X32-LABEL: test_mm_maskz_broadcastsd_pd:		; X32-LABEL: test_mm_maskz_broadcastsd_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi7:		; X32-NEXT: .Lcfi14:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $3, %al		; X32-NEXT: andb $3, %al
; X32-NEXT: movb %al, {{[0-9]+}}(%esp)		; X32-NEXT: movb %al, {{[0-9]+}}(%esp)
; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovddup {{.*#+}} xmm0 {%k1} {z} = xmm0[0,0]		; X32-NEXT: vmovddup {{.*#+}} xmm0 {%k1} {z} = xmm0[0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi15:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_broadcastsd_pd:		; X64-LABEL: test_mm_maskz_broadcastsd_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $3, %dil		; X64-NEXT: andb $3, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[... 19 lines not shown ...]
; X64-NEXT: retq
%res = shufflevector <2 x double> %a0, <2 x double> undef, <4 x i32> zeroinitializer		%res = shufflevector <2 x double> %a0, <2 x double> undef, <4 x i32> zeroinitializer
ret <4 x double> %res		ret <4 x double> %res
}		}

define <4 x double> @test_mm256_mask_broadcastsd_pd(<4 x double> %a0, i8 %a1, <2 x double> %a2) {		define <4 x double> @test_mm256_mask_broadcastsd_pd(<4 x double> %a0, i8 %a1, <2 x double> %a2) {
; X32-LABEL: test_mm256_mask_broadcastsd_pd:		; X32-LABEL: test_mm256_mask_broadcastsd_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi8:		; X32-NEXT: .Lcfi16:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vbroadcastsd %xmm1, %ymm0 {%k1}		; X32-NEXT: vbroadcastsd %xmm1, %ymm0 {%k1}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi17:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_mask_broadcastsd_pd:		; X64-LABEL: test_mm256_mask_broadcastsd_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vbroadcastsd %xmm1, %ymm0 {%k1}		; X64-NEXT: vbroadcastsd %xmm1, %ymm0 {%k1}
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <2 x double> %a2, <2 x double> undef, <4 x i32> zeroinitializer		%res0 = shufflevector <2 x double> %a2, <2 x double> undef, <4 x i32> zeroinitializer
%res1 = select <4 x i1> %arg1, <4 x double> %res0, <4 x double> %a0		%res1 = select <4 x i1> %arg1, <4 x double> %res0, <4 x double> %a0
ret <4 x double> %res1		ret <4 x double> %res1
}		}

define <4 x double> @test_mm256_maskz_broadcastsd_pd(i8 %a0, <2 x double> %a1) {		define <4 x double> @test_mm256_maskz_broadcastsd_pd(i8 %a0, <2 x double> %a1) {
; X32-LABEL: test_mm256_maskz_broadcastsd_pd:		; X32-LABEL: test_mm256_maskz_broadcastsd_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi9:		; X32-NEXT: .Lcfi18:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vbroadcastsd %xmm0, %ymm0 {%k1} {z}		; X32-NEXT: vbroadcastsd %xmm0, %ymm0 {%k1} {z}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi19:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_maskz_broadcastsd_pd:		; X64-LABEL: test_mm256_maskz_broadcastsd_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[... 19 lines not shown ...]
; X64-NEXT: retq
%res = shufflevector <4 x float> %a0, <4 x float> undef, <4 x i32> zeroinitializer		%res = shufflevector <4 x float> %a0, <4 x float> undef, <4 x i32> zeroinitializer
ret <4 x float> %res		ret <4 x float> %res
}		}

define <4 x float> @test_mm_mask_broadcastss_ps(<4 x float> %a0, i8 %a1, <4 x float> %a2) {		define <4 x float> @test_mm_mask_broadcastss_ps(<4 x float> %a0, i8 %a1, <4 x float> %a2) {
; X32-LABEL: test_mm_mask_broadcastss_ps:		; X32-LABEL: test_mm_mask_broadcastss_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi10:		; X32-NEXT: .Lcfi20:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vbroadcastss %xmm1, %xmm0 {%k1}		; X32-NEXT: vbroadcastss %xmm1, %xmm0 {%k1}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi21:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_broadcastss_ps:		; X64-LABEL: test_mm_mask_broadcastss_ps:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vbroadcastss %xmm1, %xmm0 {%k1}		; X64-NEXT: vbroadcastss %xmm1, %xmm0 {%k1}
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <4 x float> %a2, <4 x float> undef, <4 x i32> zeroinitializer		%res0 = shufflevector <4 x float> %a2, <4 x float> undef, <4 x i32> zeroinitializer
%res1 = select <4 x i1> %arg1, <4 x float> %res0, <4 x float> %a0		%res1 = select <4 x i1> %arg1, <4 x float> %res0, <4 x float> %a0
ret <4 x float> %res1		ret <4 x float> %res1
}		}

define <4 x float> @test_mm_maskz_broadcastss_ps(i8 %a0, <4 x float> %a1) {		define <4 x float> @test_mm_maskz_broadcastss_ps(i8 %a0, <4 x float> %a1) {
; X32-LABEL: test_mm_maskz_broadcastss_ps:		; X32-LABEL: test_mm_maskz_broadcastss_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi11:		; X32-NEXT: .Lcfi22:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vbroadcastss %xmm0, %xmm0 {%k1} {z}		; X32-NEXT: vbroadcastss %xmm0, %xmm0 {%k1} {z}
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi23:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_broadcastss_ps:		; X64-LABEL: test_mm_maskz_broadcastss_ps:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[... 71 lines not shown ...]
; X64-NEXT: retq
%res = shufflevector <2 x double> %a0, <2 x double> undef, <2 x i32> zeroinitializer		%res = shufflevector <2 x double> %a0, <2 x double> undef, <2 x i32> zeroinitializer
ret <2 x double> %res		ret <2 x double> %res
}		}

define <2 x double> @test_mm_mask_movddup_pd(<2 x double> %a0, i8 %a1, <2 x double> %a2) {		define <2 x double> @test_mm_mask_movddup_pd(<2 x double> %a0, i8 %a1, <2 x double> %a2) {
; X32-LABEL: test_mm_mask_movddup_pd:		; X32-LABEL: test_mm_mask_movddup_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi12:		; X32-NEXT: .Lcfi24:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $3, %al		; X32-NEXT: andb $3, %al
; X32-NEXT: movb %al, {{[0-9]+}}(%esp)		; X32-NEXT: movb %al, {{[0-9]+}}(%esp)
; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovddup {{.*#+}} xmm0 {%k1} = xmm1[0,0]		; X32-NEXT: vmovddup {{.*#+}} xmm0 {%k1} = xmm1[0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi25:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_movddup_pd:		; X64-LABEL: test_mm_mask_movddup_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $3, %dil		; X64-NEXT: andb $3, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vmovddup {{.*#+}} xmm0 {%k1} = xmm1[0,0]		; X64-NEXT: vmovddup {{.*#+}} xmm0 {%k1} = xmm1[0,0]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i2		%trn1 = trunc i8 %a1 to i2
%arg1 = bitcast i2 %trn1 to <2 x i1>		%arg1 = bitcast i2 %trn1 to <2 x i1>
%res0 = shufflevector <2 x double> %a2, <2 x double> undef, <2 x i32> zeroinitializer		%res0 = shufflevector <2 x double> %a2, <2 x double> undef, <2 x i32> zeroinitializer
%res1 = select <2 x i1> %arg1, <2 x double> %res0, <2 x double> %a0		%res1 = select <2 x i1> %arg1, <2 x double> %res0, <2 x double> %a0
ret <2 x double> %res1		ret <2 x double> %res1
}		}

define <2 x double> @test_mm_maskz_movddup_pd(i8 %a0, <2 x double> %a1) {		define <2 x double> @test_mm_maskz_movddup_pd(i8 %a0, <2 x double> %a1) {
; X32-LABEL: test_mm_maskz_movddup_pd:		; X32-LABEL: test_mm_maskz_movddup_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi13:		; X32-NEXT: .Lcfi26:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $3, %al		; X32-NEXT: andb $3, %al
; X32-NEXT: movb %al, {{[0-9]+}}(%esp)		; X32-NEXT: movb %al, {{[0-9]+}}(%esp)
; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovddup {{.*#+}} xmm0 {%k1} {z} = xmm0[0,0]		; X32-NEXT: vmovddup {{.*#+}} xmm0 {%k1} {z} = xmm0[0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi27:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_movddup_pd:		; X64-LABEL: test_mm_maskz_movddup_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $3, %dil		; X64-NEXT: andb $3, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[... 19 lines not shown ...]
; X64-NEXT: retq
%res = shufflevector <4 x double> %a0, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>		%res = shufflevector <4 x double> %a0, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
ret <4 x double> %res		ret <4 x double> %res
}		}

define <4 x double> @test_mm256_mask_movddup_pd(<4 x double> %a0, i8 %a1, <4 x double> %a2) {		define <4 x double> @test_mm256_mask_movddup_pd(<4 x double> %a0, i8 %a1, <4 x double> %a2) {
; X32-LABEL: test_mm256_mask_movddup_pd:		; X32-LABEL: test_mm256_mask_movddup_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi14:		; X32-NEXT: .Lcfi28:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovddup {{.*#+}} ymm0 {%k1} = ymm1[0,0,2,2]		; X32-NEXT: vmovddup {{.*#+}} ymm0 {%k1} = ymm1[0,0,2,2]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi29:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_mask_movddup_pd:		; X64-LABEL: test_mm256_mask_movddup_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vmovddup {{.*#+}} ymm0 {%k1} = ymm1[0,0,2,2]		; X64-NEXT: vmovddup {{.*#+}} ymm0 {%k1} = ymm1[0,0,2,2]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <4 x double> %a2, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>		%res0 = shufflevector <4 x double> %a2, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
%res1 = select <4 x i1> %arg1, <4 x double> %res0, <4 x double> %a0		%res1 = select <4 x i1> %arg1, <4 x double> %res0, <4 x double> %a0
ret <4 x double> %res1		ret <4 x double> %res1
}		}

define <4 x double> @test_mm256_maskz_movddup_pd(i8 %a0, <4 x double> %a1) {		define <4 x double> @test_mm256_maskz_movddup_pd(i8 %a0, <4 x double> %a1) {
; X32-LABEL: test_mm256_maskz_movddup_pd:		; X32-LABEL: test_mm256_maskz_movddup_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi15:		; X32-NEXT: .Lcfi30:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovddup {{.*#+}} ymm0 {%k1} {z} = ymm0[0,0,2,2]		; X32-NEXT: vmovddup {{.*#+}} ymm0 {%k1} {z} = ymm0[0,0,2,2]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi31:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_maskz_movddup_pd:		; X64-LABEL: test_mm256_maskz_movddup_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[... 19 lines not shown ...]
; X64-NEXT: retq
%res = shufflevector <4 x float> %a0, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 3, i32 3>		%res = shufflevector <4 x float> %a0, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 3, i32 3>
ret <4 x float> %res		ret <4 x float> %res
}		}

define <4 x float> @test_mm_mask_movehdup_ps(<4 x float> %a0, i8 %a1, <4 x float> %a2) {		define <4 x float> @test_mm_mask_movehdup_ps(<4 x float> %a0, i8 %a1, <4 x float> %a2) {
; X32-LABEL: test_mm_mask_movehdup_ps:		; X32-LABEL: test_mm_mask_movehdup_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi16:		; X32-NEXT: .Lcfi32:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovshdup {{.*#+}} xmm0 {%k1} = xmm1[1,1,3,3]		; X32-NEXT: vmovshdup {{.*#+}} xmm0 {%k1} = xmm1[1,1,3,3]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi33:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_movehdup_ps:		; X64-LABEL: test_mm_mask_movehdup_ps:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vmovshdup {{.*#+}} xmm0 {%k1} = xmm1[1,1,3,3]		; X64-NEXT: vmovshdup {{.*#+}} xmm0 {%k1} = xmm1[1,1,3,3]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <4 x float> %a2, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 3, i32 3>		%res0 = shufflevector <4 x float> %a2, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 3, i32 3>
%res1 = select <4 x i1> %arg1, <4 x float> %res0, <4 x float> %a0		%res1 = select <4 x i1> %arg1, <4 x float> %res0, <4 x float> %a0
ret <4 x float> %res1		ret <4 x float> %res1
}		}

define <4 x float> @test_mm_maskz_movehdup_ps(i8 %a0, <4 x float> %a1) {		define <4 x float> @test_mm_maskz_movehdup_ps(i8 %a0, <4 x float> %a1) {
; X32-LABEL: test_mm_maskz_movehdup_ps:		; X32-LABEL: test_mm_maskz_movehdup_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi17:		; X32-NEXT: .Lcfi34:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovshdup {{.*#+}} xmm0 {%k1} {z} = xmm0[1,1,3,3]		; X32-NEXT: vmovshdup {{.*#+}} xmm0 {%k1} {z} = xmm0[1,1,3,3]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi35:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_movehdup_ps:		; X64-LABEL: test_mm_maskz_movehdup_ps:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[71 lines not shown]
; X64-NEXT: retq		; X64-NEXT: retq
%res = shufflevector <4 x float> %a0, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>		%res = shufflevector <4 x float> %a0, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
ret <4 x float> %res		ret <4 x float> %res
}		}

define <4 x float> @test_mm_mask_moveldup_ps(<4 x float> %a0, i8 %a1, <4 x float> %a2) {		define <4 x float> @test_mm_mask_moveldup_ps(<4 x float> %a0, i8 %a1, <4 x float> %a2) {
; X32-LABEL: test_mm_mask_moveldup_ps:		; X32-LABEL: test_mm_mask_moveldup_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi18:		; X32-NEXT: .Lcfi36:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovsldup {{.*#+}} xmm0 {%k1} = xmm1[0,0,2,2]		; X32-NEXT: vmovsldup {{.*#+}} xmm0 {%k1} = xmm1[0,0,2,2]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi37:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_moveldup_ps:		; X64-LABEL: test_mm_mask_moveldup_ps:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vmovsldup {{.*#+}} xmm0 {%k1} = xmm1[0,0,2,2]		; X64-NEXT: vmovsldup {{.*#+}} xmm0 {%k1} = xmm1[0,0,2,2]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <4 x float> %a2, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>		%res0 = shufflevector <4 x float> %a2, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
%res1 = select <4 x i1> %arg1, <4 x float> %res0, <4 x float> %a0		%res1 = select <4 x i1> %arg1, <4 x float> %res0, <4 x float> %a0
ret <4 x float> %res1		ret <4 x float> %res1
}		}

define <4 x float> @test_mm_maskz_moveldup_ps(i8 %a0, <4 x float> %a1) {		define <4 x float> @test_mm_maskz_moveldup_ps(i8 %a0, <4 x float> %a1) {
; X32-LABEL: test_mm_maskz_moveldup_ps:		; X32-LABEL: test_mm_maskz_moveldup_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi19:		; X32-NEXT: .Lcfi38:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vmovsldup {{.*#+}} xmm0 {%k1} {z} = xmm0[0,0,2,2]		; X32-NEXT: vmovsldup {{.*#+}} xmm0 {%k1} {z} = xmm0[0,0,2,2]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi39:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_moveldup_ps:		; X64-LABEL: test_mm_maskz_moveldup_ps:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[71 lines not shown]
; X64-NEXT: retq		; X64-NEXT: retq
%res = shufflevector <4 x i64> %a0, <4 x i64> undef, <4 x i32> <i32 3, i32 0, i32 0, i32 0>		%res = shufflevector <4 x i64> %a0, <4 x i64> undef, <4 x i32> <i32 3, i32 0, i32 0, i32 0>
ret <4 x i64> %res		ret <4 x i64> %res
}		}

define <4 x i64> @test_mm256_mask_permutex_epi64(<4 x i64> %a0, i8 %a1, <4 x i64> %a2) {		define <4 x i64> @test_mm256_mask_permutex_epi64(<4 x i64> %a0, i8 %a1, <4 x i64> %a2) {
; X32-LABEL: test_mm256_mask_permutex_epi64:		; X32-LABEL: test_mm256_mask_permutex_epi64:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi20:		; X32-NEXT: .Lcfi40:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpermq {{.*#+}} ymm0 {%k1} = ymm1[1,0,0,0]		; X32-NEXT: vpermq {{.*#+}} ymm0 {%k1} = ymm1[1,0,0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi41:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_mask_permutex_epi64:		; X64-LABEL: test_mm256_mask_permutex_epi64:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vpermq {{.*#+}} ymm0 {%k1} = ymm1[1,0,0,0]		; X64-NEXT: vpermq {{.*#+}} ymm0 {%k1} = ymm1[1,0,0,0]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <4 x i64> %a2, <4 x i64> undef, <4 x i32> <i32 1, i32 0, i32 0, i32 0>		%res0 = shufflevector <4 x i64> %a2, <4 x i64> undef, <4 x i32> <i32 1, i32 0, i32 0, i32 0>
%res1 = select <4 x i1> %arg1, <4 x i64> %res0, <4 x i64> %a0		%res1 = select <4 x i1> %arg1, <4 x i64> %res0, <4 x i64> %a0
ret <4 x i64> %res1		ret <4 x i64> %res1
}		}

define <4 x i64> @test_mm256_maskz_permutex_epi64(i8 %a0, <4 x i64> %a1) {		define <4 x i64> @test_mm256_maskz_permutex_epi64(i8 %a0, <4 x i64> %a1) {
; X32-LABEL: test_mm256_maskz_permutex_epi64:		; X32-LABEL: test_mm256_maskz_permutex_epi64:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi21:		; X32-NEXT: .Lcfi42:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpermq {{.*#+}} ymm0 {%k1} {z} = ymm0[1,0,0,0]		; X32-NEXT: vpermq {{.*#+}} ymm0 {%k1} {z} = ymm0[1,0,0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi43:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_maskz_permutex_epi64:		; X64-LABEL: test_mm256_maskz_permutex_epi64:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[19 lines not shown]
; X64-NEXT: retq		; X64-NEXT: retq
%res = shufflevector <4 x double> %a0, <4 x double> undef, <4 x i32> <i32 3, i32 0, i32 0, i32 0>		%res = shufflevector <4 x double> %a0, <4 x double> undef, <4 x i32> <i32 3, i32 0, i32 0, i32 0>
ret <4 x double> %res		ret <4 x double> %res
}		}

define <4 x double> @test_mm256_mask_permutex_pd(<4 x double> %a0, i8 %a1, <4 x double> %a2) {		define <4 x double> @test_mm256_mask_permutex_pd(<4 x double> %a0, i8 %a1, <4 x double> %a2) {
; X32-LABEL: test_mm256_mask_permutex_pd:		; X32-LABEL: test_mm256_mask_permutex_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi22:		; X32-NEXT: .Lcfi44:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpermpd {{.*#+}} ymm0 {%k1} = ymm1[1,0,0,0]		; X32-NEXT: vpermpd {{.*#+}} ymm0 {%k1} = ymm1[1,0,0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi45:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_mask_permutex_pd:		; X64-LABEL: test_mm256_mask_permutex_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vpermpd {{.*#+}} ymm0 {%k1} = ymm1[1,0,0,0]		; X64-NEXT: vpermpd {{.*#+}} ymm0 {%k1} = ymm1[1,0,0,0]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <4 x double> %a2, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 0, i32 0>		%res0 = shufflevector <4 x double> %a2, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 0, i32 0>
%res1 = select <4 x i1> %arg1, <4 x double> %res0, <4 x double> %a0		%res1 = select <4 x i1> %arg1, <4 x double> %res0, <4 x double> %a0
ret <4 x double> %res1		ret <4 x double> %res1
}		}

define <4 x double> @test_mm256_maskz_permutex_pd(i8 %a0, <4 x double> %a1) {		define <4 x double> @test_mm256_maskz_permutex_pd(i8 %a0, <4 x double> %a1) {
; X32-LABEL: test_mm256_maskz_permutex_pd:		; X32-LABEL: test_mm256_maskz_permutex_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi23:		; X32-NEXT: .Lcfi46:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vpermpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1,0,0,0]		; X32-NEXT: vpermpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1,0,0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi47:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_maskz_permutex_pd:		; X64-LABEL: test_mm256_maskz_permutex_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[19 lines not shown]
; X64-NEXT: retq		; X64-NEXT: retq
%res = shufflevector <2 x double> %a0, <2 x double> %a1, <2 x i32> <i32 1, i32 3>		%res = shufflevector <2 x double> %a0, <2 x double> %a1, <2 x i32> <i32 1, i32 3>
ret <2 x double> %res		ret <2 x double> %res
}		}

define <2 x double> @test_mm_mask_shuffle_pd(<2 x double> %a0, i8 %a1, <2 x double> %a2, <2 x double> %a3) {		define <2 x double> @test_mm_mask_shuffle_pd(<2 x double> %a0, i8 %a1, <2 x double> %a2, <2 x double> %a3) {
; X32-LABEL: test_mm_mask_shuffle_pd:		; X32-LABEL: test_mm_mask_shuffle_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi24:		; X32-NEXT: .Lcfi48:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $3, %al		; X32-NEXT: andb $3, %al
; X32-NEXT: movb %al, {{[0-9]+}}(%esp)		; X32-NEXT: movb %al, {{[0-9]+}}(%esp)
; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} = xmm1[1],xmm2[1]		; X32-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} = xmm1[1],xmm2[1]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi49:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_shuffle_pd:		; X64-LABEL: test_mm_mask_shuffle_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $3, %dil		; X64-NEXT: andb $3, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} = xmm1[1],xmm2[1]		; X64-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} = xmm1[1],xmm2[1]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i2		%trn1 = trunc i8 %a1 to i2
%arg1 = bitcast i2 %trn1 to <2 x i1>		%arg1 = bitcast i2 %trn1 to <2 x i1>
%res0 = shufflevector <2 x double> %a2, <2 x double> %a3, <2 x i32> <i32 1, i32 3>		%res0 = shufflevector <2 x double> %a2, <2 x double> %a3, <2 x i32> <i32 1, i32 3>
%res1 = select <2 x i1> %arg1, <2 x double> %res0, <2 x double> %a0		%res1 = select <2 x i1> %arg1, <2 x double> %res0, <2 x double> %a0
ret <2 x double> %res1		ret <2 x double> %res1
}		}

define <2 x double> @test_mm_maskz_shuffle_pd(i8 %a0, <2 x double> %a1, <2 x double> %a2) {		define <2 x double> @test_mm_maskz_shuffle_pd(i8 %a0, <2 x double> %a1, <2 x double> %a2) {
; X32-LABEL: test_mm_maskz_shuffle_pd:		; X32-LABEL: test_mm_maskz_shuffle_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi25:		; X32-NEXT: .Lcfi50:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $3, %al		; X32-NEXT: andb $3, %al
; X32-NEXT: movb %al, {{[0-9]+}}(%esp)		; X32-NEXT: movb %al, {{[0-9]+}}(%esp)
; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movzbl {{[0-9]+}}(%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} {z} = xmm0[1],xmm1[1]		; X32-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} {z} = xmm0[1],xmm1[1]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi51:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_shuffle_pd:		; X64-LABEL: test_mm_maskz_shuffle_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $3, %dil		; X64-NEXT: andb $3, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[19 lines not shown]
; X64-NEXT: retq		; X64-NEXT: retq
%res = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 1, i32 5, i32 2, i32 6>		%res = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 1, i32 5, i32 2, i32 6>
ret <4 x double> %res		ret <4 x double> %res
}		}

define <4 x double> @test_mm256_mask_shuffle_pd(<4 x double> %a0, i8 %a1, <4 x double> %a2, <4 x double> %a3) {		define <4 x double> @test_mm256_mask_shuffle_pd(<4 x double> %a0, i8 %a1, <4 x double> %a2, <4 x double> %a3) {
; X32-LABEL: test_mm256_mask_shuffle_pd:		; X32-LABEL: test_mm256_mask_shuffle_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi26:		; X32-NEXT: .Lcfi52:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vshufpd {{.*#+}} ymm0 {%k1} = ymm1[1],ymm2[1],ymm1[2],ymm2[2]		; X32-NEXT: vshufpd {{.*#+}} ymm0 {%k1} = ymm1[1],ymm2[1],ymm1[2],ymm2[2]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi53:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_mask_shuffle_pd:		; X64-LABEL: test_mm256_mask_shuffle_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vshufpd {{.*#+}} ymm0 {%k1} = ymm1[1],ymm2[1],ymm1[2],ymm2[2]		; X64-NEXT: vshufpd {{.*#+}} ymm0 {%k1} = ymm1[1],ymm2[1],ymm1[2],ymm2[2]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <4 x double> %a2, <4 x double> %a3, <4 x i32> <i32 1, i32 5, i32 2, i32 6>		%res0 = shufflevector <4 x double> %a2, <4 x double> %a3, <4 x i32> <i32 1, i32 5, i32 2, i32 6>
%res1 = select <4 x i1> %arg1, <4 x double> %res0, <4 x double> %a0		%res1 = select <4 x i1> %arg1, <4 x double> %res0, <4 x double> %a0
ret <4 x double> %res1		ret <4 x double> %res1
}		}

define <4 x double> @test_mm256_maskz_shuffle_pd(i8 %a0, <4 x double> %a1, <4 x double> %a2) {		define <4 x double> @test_mm256_maskz_shuffle_pd(i8 %a0, <4 x double> %a1, <4 x double> %a2) {
; X32-LABEL: test_mm256_maskz_shuffle_pd:		; X32-LABEL: test_mm256_maskz_shuffle_pd:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi27:		; X32-NEXT: .Lcfi54:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vshufpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1],ymm1[1],ymm0[2],ymm1[2]		; X32-NEXT: vshufpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1],ymm1[1],ymm0[2],ymm1[2]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi55:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm256_maskz_shuffle_pd:		; X64-LABEL: test_mm256_maskz_shuffle_pd:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[19 lines not shown]
; X64-NEXT: retq		; X64-NEXT: retq
%res = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 0, i32 1, i32 4, i32 4>		%res = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 0, i32 1, i32 4, i32 4>
ret <4 x float> %res		ret <4 x float> %res
}		}

define <4 x float> @test_mm_mask_shuffle_ps(<4 x float> %a0, i8 %a1, <4 x float> %a2, <4 x float> %a3) {		define <4 x float> @test_mm_mask_shuffle_ps(<4 x float> %a0, i8 %a1, <4 x float> %a2, <4 x float> %a3) {
; X32-LABEL: test_mm_mask_shuffle_ps:		; X32-LABEL: test_mm_mask_shuffle_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi28:		; X32-NEXT: .Lcfi56:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vshufps {{.*#+}} xmm0 {%k1} = xmm1[0,1],xmm2[0,0]		; X32-NEXT: vshufps {{.*#+}} xmm0 {%k1} = xmm1[0,1],xmm2[0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi57:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_mask_shuffle_ps:		; X64-LABEL: test_mm_mask_shuffle_ps:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
; X64-NEXT: vshufps {{.*#+}} xmm0 {%k1} = xmm1[0,1],xmm2[0,0]		; X64-NEXT: vshufps {{.*#+}} xmm0 {%k1} = xmm1[0,1],xmm2[0,0]
; X64-NEXT: retq		; X64-NEXT: retq
%trn1 = trunc i8 %a1 to i4		%trn1 = trunc i8 %a1 to i4
%arg1 = bitcast i4 %trn1 to <4 x i1>		%arg1 = bitcast i4 %trn1 to <4 x i1>
%res0 = shufflevector <4 x float> %a2, <4 x float> %a3, <4 x i32> <i32 0, i32 1, i32 4, i32 4>		%res0 = shufflevector <4 x float> %a2, <4 x float> %a3, <4 x i32> <i32 0, i32 1, i32 4, i32 4>
%res1 = select <4 x i1> %arg1, <4 x float> %res0, <4 x float> %a0		%res1 = select <4 x i1> %arg1, <4 x float> %res0, <4 x float> %a0
ret <4 x float> %res1		ret <4 x float> %res1
}		}

define <4 x float> @test_mm_maskz_shuffle_ps(i8 %a0, <4 x float> %a1, <4 x float> %a2) {		define <4 x float> @test_mm_maskz_shuffle_ps(i8 %a0, <4 x float> %a1, <4 x float> %a2) {
; X32-LABEL: test_mm_maskz_shuffle_ps:		; X32-LABEL: test_mm_maskz_shuffle_ps:
; X32: # BB#0:		; X32: # BB#0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: .Lcfi29:		; X32-NEXT: .Lcfi58:
; X32-NEXT: .cfi_def_cfa_offset 8		; X32-NEXT: .cfi_def_cfa_offset 8
; X32-NEXT: movb {{[0-9]+}}(%esp), %al		; X32-NEXT: movb {{[0-9]+}}(%esp), %al
; X32-NEXT: andb $15, %al		; X32-NEXT: andb $15, %al
; X32-NEXT: movb %al, (%esp)		; X32-NEXT: movb %al, (%esp)
; X32-NEXT: movzbl (%esp), %eax		; X32-NEXT: movzbl (%esp), %eax
; X32-NEXT: kmovw %eax, %k1		; X32-NEXT: kmovw %eax, %k1
; X32-NEXT: vshufps {{.*#+}} xmm0 {%k1} {z} = xmm0[0,1],xmm1[0,0]		; X32-NEXT: vshufps {{.*#+}} xmm0 {%k1} {z} = xmm0[0,1],xmm1[0,0]
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
		; X32-NEXT: .Lcfi59:
		; X32-NEXT: .cfi_def_cfa_offset 4
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_maskz_shuffle_ps:		; X64-LABEL: test_mm_maskz_shuffle_ps:
; X64: # BB#0:		; X64: # BB#0:
; X64-NEXT: andb $15, %dil		; X64-NEXT: andb $15, %dil
; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)		; X64-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax		; X64-NEXT: movzbl -{{[0-9]+}}(%rsp), %eax
; X64-NEXT: kmovw %eax, %k1		; X64-NEXT: kmovw %eax, %k1
[62 lines not shown]

test/CodeGen/X86/avx512vl-vbroadcast.ll

	; NOTE: Assertions have been autogenerated by update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f -mattr=+avx512vl| FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f -mattr=+avx512vl| FileCheck %s

	declare void @func_f32(float)			declare void @func_f32(float)
	define <8 x float> @_256_broadcast_ss_spill(float %x) {			define <8 x float> @_256_broadcast_ss_spill(float %x) {
	; CHECK-LABEL: _256_broadcast_ss_spill:			; CHECK-LABEL: _256_broadcast_ss_spill:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: subq $24, %rsp			; CHECK-NEXT: subq $24, %rsp
	; CHECK-NEXT: .Lcfi0:			; CHECK-NEXT: .Lcfi0:
	; CHECK-NEXT: .cfi_def_cfa_offset 32			; CHECK-NEXT: .cfi_def_cfa_offset 32
	; CHECK-NEXT: vaddss %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vaddss %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill			; CHECK-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill
	; CHECK-NEXT: callq func_f32			; CHECK-NEXT: callq func_f32
	; CHECK-NEXT: vbroadcastss (%rsp), %ymm0 # 16-byte Folded Reload			; CHECK-NEXT: vbroadcastss (%rsp), %ymm0 # 16-byte Folded Reload
	; CHECK-NEXT: addq $24, %rsp			; CHECK-NEXT: addq $24, %rsp
				; CHECK-NEXT: .Lcfi1:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%a = fadd float %x, %x			%a = fadd float %x, %x
	call void @func_f32(float %a)			call void @func_f32(float %a)
	%b = insertelement <8 x float> undef, float %a, i32 0			%b = insertelement <8 x float> undef, float %a, i32 0
	%c = shufflevector <8 x float> %b, <8 x float> undef, <8 x i32> zeroinitializer			%c = shufflevector <8 x float> %b, <8 x float> undef, <8 x i32> zeroinitializer
	ret <8 x float> %c			ret <8 x float> %c
	}			}

	define <4 x float> @_128_broadcast_ss_spill(float %x) {			define <4 x float> @_128_broadcast_ss_spill(float %x) {
	; CHECK-LABEL: _128_broadcast_ss_spill:			; CHECK-LABEL: _128_broadcast_ss_spill:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: subq $24, %rsp			; CHECK-NEXT: subq $24, %rsp
	; CHECK-NEXT: .Lcfi1:			; CHECK-NEXT: .Lcfi2:
	; CHECK-NEXT: .cfi_def_cfa_offset 32			; CHECK-NEXT: .cfi_def_cfa_offset 32
	; CHECK-NEXT: vaddss %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vaddss %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill			; CHECK-NEXT: vmovaps %xmm0, (%rsp) # 16-byte Spill
	; CHECK-NEXT: callq func_f32			; CHECK-NEXT: callq func_f32
	; CHECK-NEXT: vbroadcastss (%rsp), %xmm0 # 16-byte Folded Reload			; CHECK-NEXT: vbroadcastss (%rsp), %xmm0 # 16-byte Folded Reload
	; CHECK-NEXT: addq $24, %rsp			; CHECK-NEXT: addq $24, %rsp
				; CHECK-NEXT: .Lcfi3:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%a = fadd float %x, %x			%a = fadd float %x, %x
	call void @func_f32(float %a)			call void @func_f32(float %a)
	%b = insertelement <4 x float> undef, float %a, i32 0			%b = insertelement <4 x float> undef, float %a, i32 0
	%c = shufflevector <4 x float> %b, <4 x float> undef, <4 x i32> zeroinitializer			%c = shufflevector <4 x float> %b, <4 x float> undef, <4 x i32> zeroinitializer
	ret <4 x float> %c			ret <4 x float> %c
	}			}

	declare void @func_f64(double)			declare void @func_f64(double)
	define <4 x double> @_256_broadcast_sd_spill(double %x) {			define <4 x double> @_256_broadcast_sd_spill(double %x) {
	; CHECK-LABEL: _256_broadcast_sd_spill:			; CHECK-LABEL: _256_broadcast_sd_spill:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: subq $24, %rsp			; CHECK-NEXT: subq $24, %rsp
	; CHECK-NEXT: .Lcfi2:			; CHECK-NEXT: .Lcfi4:
	; CHECK-NEXT: .cfi_def_cfa_offset 32			; CHECK-NEXT: .cfi_def_cfa_offset 32
	; CHECK-NEXT: vaddsd %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vaddsd %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: vmovapd %xmm0, (%rsp) # 16-byte Spill			; CHECK-NEXT: vmovapd %xmm0, (%rsp) # 16-byte Spill
	; CHECK-NEXT: callq func_f64			; CHECK-NEXT: callq func_f64
	; CHECK-NEXT: vbroadcastsd (%rsp), %ymm0 # 16-byte Folded Reload			; CHECK-NEXT: vbroadcastsd (%rsp), %ymm0 # 16-byte Folded Reload
	; CHECK-NEXT: addq $24, %rsp			; CHECK-NEXT: addq $24, %rsp
				; CHECK-NEXT: .Lcfi5:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%a = fadd double %x, %x			%a = fadd double %x, %x
	call void @func_f64(double %a)			call void @func_f64(double %a)
	%b = insertelement <4 x double> undef, double %a, i32 0			%b = insertelement <4 x double> undef, double %a, i32 0
	%c = shufflevector <4 x double> %b, <4 x double> undef, <4 x i32> zeroinitializer			%c = shufflevector <4 x double> %b, <4 x double> undef, <4 x i32> zeroinitializer
	ret <4 x double> %c			ret <4 x double> %c
	}			}

	[113 lines not shown]

test/CodeGen/X86/emutls-pie.ll

	[12 lines not shown]

	define i32 @my_get_xyz() {			define i32 @my_get_xyz() {
	; X32-LABEL: my_get_xyz:			; X32-LABEL: my_get_xyz:
	; X32: movl my_emutls_v_xyz@GOT(%ebx), %eax			; X32: movl my_emutls_v_xyz@GOT(%ebx), %eax
	; X32-NEXT: movl %eax, (%esp)			; X32-NEXT: movl %eax, (%esp)
	; X32-NEXT: calll my_emutls_get_address@PLT			; X32-NEXT: calll my_emutls_get_address@PLT
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: addl $8, %esp			; X32-NEXT: addl $8, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 8
	; X32-NEXT: popl %ebx			; X32-NEXT: popl %ebx
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	; X64-LABEL: my_get_xyz:			; X64-LABEL: my_get_xyz:
	; X64: movq my_emutls_v_xyz@GOTPCREL(%rip), %rdi			; X64: movq my_emutls_v_xyz@GOTPCREL(%rip), %rdi
	; X64-NEXT: callq my_emutls_get_address@PLT			; X64-NEXT: callq my_emutls_get_address@PLT
	; X64-NEXT: movl (%rax), %eax			; X64-NEXT: movl (%rax), %eax
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
				; X64-NEXT: :
				; X64-NEXT: .cfi_def_cfa_offset 8
	; X64-NEXT: retq			; X64-NEXT: retq

	entry:			entry:
	%call = call i8* @my_emutls_get_address(i8* bitcast (i8** @my_emutls_v_xyz to i8*))			%call = call i8* @my_emutls_get_address(i8* bitcast (i8** @my_emutls_v_xyz to i8*))
	%0 = bitcast i8* %call to i32*			%0 = bitcast i8* %call to i32*
	%1 = load i32, i32* %0, align 4			%1 = load i32, i32* %0, align 4
	ret i32 %1			ret i32 %1
	}			}

	@i = thread_local global i32 15			@i = thread_local global i32 15
	@i2 = external thread_local global i32			@i2 = external thread_local global i32

	define i32 @f1() {			define i32 @f1() {
	; X32-LABEL: f1:			; X32-LABEL: f1:
	; X32: leal __emutls_v.i@GOTOFF(%ebx), %eax			; X32: leal __emutls_v.i@GOTOFF(%ebx), %eax
	; X32-NEXT: movl %eax, (%esp)			; X32-NEXT: movl %eax, (%esp)
	; X32-NEXT: calll __emutls_get_address@PLT			; X32-NEXT: calll __emutls_get_address@PLT
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: addl $8, %esp			; X32-NEXT: addl $8, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 8
	; X32-NEXT: popl %ebx			; X32-NEXT: popl %ebx
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	; X64-LABEL: f1:			; X64-LABEL: f1:
	; X64: leaq __emutls_v.i(%rip), %rdi			; X64: leaq __emutls_v.i(%rip), %rdi
	; X64-NEXT: callq __emutls_get_address@PLT			; X64-NEXT: callq __emutls_get_address@PLT
	; X64-NEXT: movl (%rax), %eax			; X64-NEXT: movl (%rax), %eax
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
				; X64-NEXT: :
				; X64-NEXT: .cfi_def_cfa_offset 8
	; X64-NEXT: retq			; X64-NEXT: retq

	entry:			entry:
	%tmp1 = load i32, i32* @i			%tmp1 = load i32, i32* @i
	ret i32 %tmp1			ret i32 %tmp1
	}			}

	define i32* @f2() {			define i32* @f2() {
	[75 lines not shown]

test/CodeGen/X86/emutls.ll

	[10 lines not shown]
	declare i8* @my_emutls_get_address(i8*)			declare i8* @my_emutls_get_address(i8*)

	define i32 @my_get_xyz() {			define i32 @my_get_xyz() {
	; X32-LABEL: my_get_xyz:			; X32-LABEL: my_get_xyz:
	; X32: movl $my_emutls_v_xyz, (%esp)			; X32: movl $my_emutls_v_xyz, (%esp)
	; X32-NEXT: calll my_emutls_get_address			; X32-NEXT: calll my_emutls_get_address
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	; X64-LABEL: my_get_xyz:			; X64-LABEL: my_get_xyz:
	; X64: movl $my_emutls_v_xyz, %edi			; X64: movl $my_emutls_v_xyz, %edi
	; X64-NEXT: callq my_emutls_get_address			; X64-NEXT: callq my_emutls_get_address
	; X64-NEXT: movl (%rax), %eax			; X64-NEXT: movl (%rax), %eax
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
				; X64-NEXT: :
				; X64-NEXT: .cfi_def_cfa_offset 8
	; X64-NEXT: retq			; X64-NEXT: retq

	entry:			entry:
	%call = call i8* @my_emutls_get_address(i8* bitcast (i8** @my_emutls_v_xyz to i8*))			%call = call i8* @my_emutls_get_address(i8* bitcast (i8** @my_emutls_v_xyz to i8*))
	%0 = bitcast i8* %call to i32*			%0 = bitcast i8* %call to i32*
	%1 = load i32, i32* %0, align 4			%1 = load i32, i32* %0, align 4
	ret i32 %1			ret i32 %1
	}			}

	@i1 = thread_local global i32 15			@i1 = thread_local global i32 15
	@i2 = external thread_local global i32			@i2 = external thread_local global i32
	@i3 = internal thread_local global i32 15			@i3 = internal thread_local global i32 15
	@i4 = hidden thread_local global i32 15			@i4 = hidden thread_local global i32 15
	@i5 = external hidden thread_local global i32			@i5 = external hidden thread_local global i32
	@s1 = thread_local global i16 15			@s1 = thread_local global i16 15
	@b1 = thread_local global i8 0			@b1 = thread_local global i8 0

	define i32 @f1() {			define i32 @f1() {
	; X32-LABEL: f1:			; X32-LABEL: f1:
	; X32: movl $__emutls_v.i1, (%esp)			; X32: movl $__emutls_v.i1, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	; X64-LABEL: f1:			; X64-LABEL: f1:
	; X64: movl $__emutls_v.i1, %edi			; X64: movl $__emutls_v.i1, %edi
	; X64-NEXT: callq __emutls_get_address			; X64-NEXT: callq __emutls_get_address
	; X64-NEXT: movl (%rax), %eax			; X64-NEXT: movl (%rax), %eax
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
				; X64-NEXT: :
				; X64-NEXT: .cfi_def_cfa_offset 8
	; X64-NEXT: retq			; X64-NEXT: retq

	entry:			entry:
	%tmp1 = load i32, i32* @i1			%tmp1 = load i32, i32* @i1
	ret i32 %tmp1			ret i32 %tmp1
	}			}

	define i32* @f2() {			define i32* @f2() {
	; X32-LABEL: f2:			; X32-LABEL: f2:
	; X32: movl $__emutls_v.i1, (%esp)			; X32: movl $__emutls_v.i1, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl
	; X64-LABEL: f2:			; X64-LABEL: f2:
	; X64: movl $__emutls_v.i1, %edi			; X64: movl $__emutls_v.i1, %edi
	; X64-NEXT: callq __emutls_get_address			; X64-NEXT: callq __emutls_get_address
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
				; X64-NEXT: :
				; X64-NEXT: .cfi_def_cfa_offset 8
	; X64-NEXT: retq			; X64-NEXT: retq

	entry:			entry:
	ret i32* @i1			ret i32* @i1
	}			}

	define i32 @f3() nounwind {			define i32 @f3() nounwind {
	; X32-LABEL: f3:			; X32-LABEL: f3:
	; X32: movl $__emutls_v.i2, (%esp)			; X32: movl $__emutls_v.i2, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	%tmp1 = load i32, i32* @i2			%tmp1 = load i32, i32* @i2
	ret i32 %tmp1			ret i32 %tmp1
	}			}

	define i32* @f4() {			define i32* @f4() {
	; X32-LABEL: f4:			; X32-LABEL: f4:
	; X32: movl $__emutls_v.i2, (%esp)			; X32: movl $__emutls_v.i2, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	ret i32* @i2			ret i32* @i2
	}			}

	define i32 @f5() nounwind {			define i32 @f5() nounwind {
	; X32-LABEL: f5:			; X32-LABEL: f5:
	; X32: movl $__emutls_v.i3, (%esp)			; X32: movl $__emutls_v.i3, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	%tmp1 = load i32, i32* @i3			%tmp1 = load i32, i32* @i3
	ret i32 %tmp1			ret i32 %tmp1
	}			}

	define i32* @f6() {			define i32* @f6() {
	; X32-LABEL: f6:			; X32-LABEL: f6:
	; X32: movl $__emutls_v.i3, (%esp)			; X32: movl $__emutls_v.i3, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	ret i32* @i3			ret i32* @i3
	}			}

	define i32 @f7() {			define i32 @f7() {
	; X32-LABEL: f7:			; X32-LABEL: f7:
	; X32: movl $__emutls_v.i4, (%esp)			; X32: movl $__emutls_v.i4, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	%tmp1 = load i32, i32* @i4			%tmp1 = load i32, i32* @i4
	ret i32 %tmp1			ret i32 %tmp1
	}			}

	define i32* @f8() {			define i32* @f8() {
	; X32-LABEL: f8:			; X32-LABEL: f8:
	; X32: movl $__emutls_v.i4, (%esp)			; X32: movl $__emutls_v.i4, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	ret i32* @i4			ret i32* @i4
	}			}

	define i32 @f9() {			define i32 @f9() {
	; X32-LABEL: f9:			; X32-LABEL: f9:
	; X32: movl $__emutls_v.i5, (%esp)			; X32: movl $__emutls_v.i5, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	%tmp1 = load i32, i32* @i5			%tmp1 = load i32, i32* @i5
	ret i32 %tmp1			ret i32 %tmp1
	}			}

	define i32* @f10() {			define i32* @f10() {
	; X32-LABEL: f10:			; X32-LABEL: f10:
	; X32: movl $__emutls_v.i5, (%esp)			; X32: movl $__emutls_v.i5, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	ret i32* @i5			ret i32* @i5
	}			}

	define i16 @f11() {			define i16 @f11() {
	; X32-LABEL: f11:			; X32-LABEL: f11:
	; X32: movl $__emutls_v.s1, (%esp)			; X32: movl $__emutls_v.s1, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movzwl (%eax), %eax			; X32-NEXT: movzwl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	%tmp1 = load i16, i16* @s1			%tmp1 = load i16, i16* @s1
	ret i16 %tmp1			ret i16 %tmp1
	}			}

	define i32 @f12() {			define i32 @f12() {
	; X32-LABEL: f12:			; X32-LABEL: f12:
	; X32: movl $__emutls_v.s1, (%esp)			; X32: movl $__emutls_v.s1, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movswl (%eax), %eax			; X32-NEXT: movswl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	%tmp1 = load i16, i16* @s1			%tmp1 = load i16, i16* @s1
	%tmp2 = sext i16 %tmp1 to i32			%tmp2 = sext i16 %tmp1 to i32
	ret i32 %tmp2			ret i32 %tmp2
	}			}

	define i8 @f13() {			define i8 @f13() {
	; X32-LABEL: f13:			; X32-LABEL: f13:
	; X32: movl $__emutls_v.b1, (%esp)			; X32: movl $__emutls_v.b1, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movb (%eax), %al			; X32-NEXT: movb (%eax), %al
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	%tmp1 = load i8, i8* @b1			%tmp1 = load i8, i8* @b1
	ret i8 %tmp1			ret i8 %tmp1
	}			}

	define i32 @f14() {			define i32 @f14() {
	; X32-LABEL: f14:			; X32-LABEL: f14:
	; X32: movl $__emutls_v.b1, (%esp)			; X32: movl $__emutls_v.b1, (%esp)
	; X32-NEXT: calll __emutls_get_address			; X32-NEXT: calll __emutls_get_address
	; X32-NEXT: movsbl (%eax), %eax			; X32-NEXT: movsbl (%eax), %eax
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
				; X32-NEXT: :
				; X32-NEXT: .cfi_def_cfa_offset 4
	; X32-NEXT: retl			; X32-NEXT: retl

	entry:			entry:
	%tmp1 = load i8, i8* @b1			%tmp1 = load i8, i8* @b1
	%tmp2 = sext i8 %tmp1 to i32			%tmp2 = sext i8 %tmp1 to i32
	ret i32 %tmp2			ret i32 %tmp2
	}			}

	▲ Show 20 Lines • Show All 121 Lines • Show Last 20 Lines

test/CodeGen/X86/epilogue-cfi-fp.ll

This file was added.

				; RUN: llc -O0 %s -o - | FileCheck %s

				mkuper (inline comment): Any chance the tests can be made smaller? E.g. even if we need debug info to be available, we don't actually need the debug info, only the flag, right?
				; ModuleID = 'epilogue-cfi-fp.c'
				mkuper (inline comment): This is a good start, but the tests still look like they contain much more than needed to check the specific things they check. Both in terms of debug info and attributes, and, for the EH test, a lot of code that seems to me to be redundant.
				source_filename = "epilogue-cfi-fp.c"
				target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
				target triple = "i686-pc-linux"

				; Function Attrs: noinline nounwind
				define i32 @foo(i32 %i, i32 %j, i32 %k, i32 %l, i32 %m) #0 {

				; CHECK-LABEL: foo:
				; CHECK: popl %ebp
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa %esp, 4
				; CHECK-NEXT: retl

				entry:
				%i.addr = alloca i32, align 4
				%j.addr = alloca i32, align 4
				%k.addr = alloca i32, align 4
				%l.addr = alloca i32, align 4
				%m.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				store i32 %j, i32* %j.addr, align 4
				store i32 %k, i32* %k.addr, align 4
				store i32 %l, i32* %l.addr, align 4
				store i32 %m, i32* %m.addr, align 4
				ret i32 0
				}

				attributes #0 = { "no-frame-pointer-elim"="true" }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!3, !4, !5, !6, !7}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 5.0.0 (http://llvm.org/git/clang.git 3f8116e6a2815b1d5f3491493938d0c63c9f42c9) (http://llvm.org/git/llvm.git 4fde77f8f1a8e4482e69b6a7484bc7d1b99b3c0a)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
				!1 = !DIFile(filename: "epilogue-cfi-fp.c", directory: "epilogue-dwarf/test")
				!2 = !{}
				!3 = !{i32 1, !"NumRegisterParameters", i32 0}
				!4 = !{i32 2, !"Dwarf Version", i32 4}
				!5 = !{i32 2, !"Debug Info Version", i32 3}
				!6 = !{i32 1, !"wchar_size", i32 4}
				!7 = !{i32 7, !"PIC Level", i32 2}
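
Note on what the CHECK lines in epilogue-cfi-fp.ll assert: with a frame pointer, the CFA is normally described relative to %ebp, so once `popl %ebp` has executed the rule has to be restated relative to %esp. That is what `.cfi_def_cfa %esp, 4` does. The sketch below is an illustration only and is not code from this patch. `CfaRule` and `cfa_after_frame_pointer_pop` are hypothetical names, and the starting rule (%ebp, 8) is an assumption based on the usual i686 prologue (the same `.cfi_def_cfa_offset 8` / `.cfi_def_cfa_register %ebp` pair is visible in the AVXONLY64 checks of fast-isel-store.ll).

```python
from typing import NamedTuple


class CfaRule(NamedTuple):
    """Hypothetical model of a DWARF CFA rule: CFA = register + offset."""
    register: str
    offset: int


def cfa_after_frame_pointer_pop(rule, frame_reg="%ebp", stack_reg="%esp", word_size=4):
    """Return the CFA rule that holds right after `pop frame_reg`.

    Assumes the CFA was frame-pointer based before the pop. Once the saved
    frame pointer has been popped, the frame pointer no longer describes this
    frame, so the rule is re-expressed relative to the stack pointer.
    """
    if rule.register != frame_reg:
        raise ValueError("expected a frame-pointer based CFA rule")
    # One word (the saved frame pointer) has been removed from the stack.
    return CfaRule(stack_reg, rule.offset - word_size)


# Assumed state before `popl %ebp` on i686: CFA = %ebp + 8.
print(cfa_after_frame_pointer_pop(CfaRule("%ebp", 8)))
```

Under these assumptions the result is the rule (%esp, 4), which corresponds to the `.cfi_def_cfa %esp, 4` directive the test expects between `popl %ebp` and `retl`.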

test/CodeGen/X86/epilogue-cfi-no-fp.ll

This file was added.

				; RUN: llc -O0 < %s | FileCheck %s

				; ModuleID = 'epilogue-cfi-no-fp.c'
				source_filename = "epilogue-cfi-no-fp.c"
				target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
				target triple = "i686-pc-linux"

				; Function Attrs: noinline nounwind
				define i32 @foo(i32 %i, i32 %j, i32 %k, i32 %l, i32 %m) {
				; CHECK-LABEL: foo:
				; CHECK: addl $20, %esp
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: popl %esi
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 12
				; CHECK-NEXT: popl %edi
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 8
				; CHECK-NEXT: popl %ebx
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 4
				; CHECK-NEXT: retl
				entry:
				%i.addr = alloca i32, align 4
				%j.addr = alloca i32, align 4
				%k.addr = alloca i32, align 4
				%l.addr = alloca i32, align 4
				%m.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				store i32 %j, i32* %j.addr, align 4
				store i32 %k, i32* %k.addr, align 4
				store i32 %l, i32* %l.addr, align 4
				store i32 %m, i32* %m.addr, align 4
				ret i32 0
				}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!3, !4, !5, !6, !7}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 5.0.0 (http://llvm.org/git/clang.git 3f8116e6a2815b1d5f3491493938d0c63c9f42c9) (http://llvm.org/git/llvm.git 4fde77f8f1a8e4482e69b6a7484bc7d1b99b3c0a)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
				!1 = !DIFile(filename: "epilogue-cfi-no-fp.c", directory: "epilogue-dwarf/test")
				!2 = !{}
				!3 = !{i32 1, !"NumRegisterParameters", i32 0}
				!4 = !{i32 2, !"Dwarf Version", i32 4}
				!5 = !{i32 2, !"Debug Info Version", i32 3}
				!6 = !{i32 1, !"wchar_size", i32 4}
				!7 = !{i32 7, !"PIC Level", i32 2}
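
Note on the offsets checked in epilogue-cfi-no-fp.ll: without a frame pointer the CFA stays %esp-based, so every epilogue instruction that moves %esp is followed by a `.cfi_def_cfa_offset` with the new distance. The sketch below reproduces that arithmetic. It is an illustration only, not code from this patch; `cfa_offsets_after_epilogue` is a hypothetical helper, and the starting offset of 36 is inferred from the CHECK lines (16 after `addl $20, %esp`), not stated anywhere in the test.

```python
import re


def cfa_offsets_after_epilogue(instrs, cfa_offset, word_size=4):
    """Return (instruction, CFA offset after it) for each SP-adjusting instruction.

    Assumes an SP-based CFA (no frame pointer) and AT&T syntax. Only
    `add $imm, %esp/%rsp` and `pop %reg` are modelled; every other
    instruction is treated as leaving the stack pointer unchanged.
    """
    result = []
    for ins in instrs:
        add = re.fullmatch(r"add[lq] \$(\d+), %[er]sp", ins)
        if add:
            cfa_offset -= int(add.group(1))   # stack shrinks by the immediate
        elif re.fullmatch(r"pop[lq] %\w+", ins):
            cfa_offset -= word_size           # stack shrinks by one word
        else:
            continue
        result.append((ins, cfa_offset))
    return result


epilogue = ["addl $20, %esp", "popl %esi", "popl %edi", "popl %ebx", "retl"]
print([off for _, off in cfa_offsets_after_epilogue(epilogue, 36)])  # [16, 12, 8, 4]
```

The same helper with `word_size=8` and a starting offset of 16 gives 8 for a lone `popq %rcx`, which is the value the X64 checks in emutls.ll expect.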

test/CodeGen/X86/fast-isel-store.ll

	Show First 20 Lines • Show All 370 Lines • ▼ Show 20 Lines
	; SSE64-NEXT: .Lcfi0:			; SSE64-NEXT: .Lcfi0:
	; SSE64-NEXT: .cfi_def_cfa_offset 16			; SSE64-NEXT: .cfi_def_cfa_offset 16
	; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax			; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm1			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm1
	; SSE64-NEXT: addpd %xmm2, %xmm0			; SSE64-NEXT: addpd %xmm2, %xmm0
	; SSE64-NEXT: movupd %xmm0, (%eax)			; SSE64-NEXT: movupd %xmm0, (%eax)
	; SSE64-NEXT: movupd %xmm1, 16(%eax)			; SSE64-NEXT: movupd %xmm1, 16(%eax)
	; SSE64-NEXT: addl $12, %esp			; SSE64-NEXT: addl $12, %esp
				; SSE64-NEXT: .Lcfi1:
				; SSE64-NEXT: .cfi_def_cfa_offset 4
	; SSE64-NEXT: retl			; SSE64-NEXT: retl
	;			;
	; AVX32-LABEL: test_store_4xf64:			; AVX32-LABEL: test_store_4xf64:
	; AVX32: # BB#0:			; AVX32: # BB#0:
	; AVX32-NEXT: vaddpd %ymm1, %ymm0, %ymm0			; AVX32-NEXT: vaddpd %ymm1, %ymm0, %ymm0
	; AVX32-NEXT: vmovupd %ymm0, (%rdi)			; AVX32-NEXT: vmovupd %ymm0, (%rdi)
	; AVX32-NEXT: retq			; AVX32-NEXT: retq
	;			;
	Show All 15 Lines
	; SSE32-NEXT: addpd %xmm2, %xmm0			; SSE32-NEXT: addpd %xmm2, %xmm0
	; SSE32-NEXT: movapd %xmm0, (%rdi)			; SSE32-NEXT: movapd %xmm0, (%rdi)
	; SSE32-NEXT: movapd %xmm1, 16(%rdi)			; SSE32-NEXT: movapd %xmm1, 16(%rdi)
	; SSE32-NEXT: retq			; SSE32-NEXT: retq
	;			;
	; SSE64-LABEL: test_store_4xf64_aligned:			; SSE64-LABEL: test_store_4xf64_aligned:
	; SSE64: # BB#0:			; SSE64: # BB#0:
	; SSE64-NEXT: subl $12, %esp			; SSE64-NEXT: subl $12, %esp
	; SSE64-NEXT: .Lcfi1:			; SSE64-NEXT: .Lcfi2:
	; SSE64-NEXT: .cfi_def_cfa_offset 16			; SSE64-NEXT: .cfi_def_cfa_offset 16
	; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax			; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm1			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm1
	; SSE64-NEXT: addpd %xmm2, %xmm0			; SSE64-NEXT: addpd %xmm2, %xmm0
	; SSE64-NEXT: movapd %xmm0, (%eax)			; SSE64-NEXT: movapd %xmm0, (%eax)
	; SSE64-NEXT: movapd %xmm1, 16(%eax)			; SSE64-NEXT: movapd %xmm1, 16(%eax)
	; SSE64-NEXT: addl $12, %esp			; SSE64-NEXT: addl $12, %esp
				; SSE64-NEXT: .Lcfi3:
				; SSE64-NEXT: .cfi_def_cfa_offset 4
	; SSE64-NEXT: retl			; SSE64-NEXT: retl
	;			;
	; AVX32-LABEL: test_store_4xf64_aligned:			; AVX32-LABEL: test_store_4xf64_aligned:
	; AVX32: # BB#0:			; AVX32: # BB#0:
	; AVX32-NEXT: vaddpd %ymm1, %ymm0, %ymm0			; AVX32-NEXT: vaddpd %ymm1, %ymm0, %ymm0
	; AVX32-NEXT: vmovapd %ymm0, (%rdi)			; AVX32-NEXT: vmovapd %ymm0, (%rdi)
	; AVX32-NEXT: retq			; AVX32-NEXT: retq
	;			;
	Show All 15 Lines
	; SSE32-NEXT: movups %xmm1, 16(%rdi)			; SSE32-NEXT: movups %xmm1, 16(%rdi)
	; SSE32-NEXT: movups %xmm2, 32(%rdi)			; SSE32-NEXT: movups %xmm2, 32(%rdi)
	; SSE32-NEXT: movups %xmm3, 48(%rdi)			; SSE32-NEXT: movups %xmm3, 48(%rdi)
	; SSE32-NEXT: retq			; SSE32-NEXT: retq
	;			;
	; SSE64-LABEL: test_store_16xi32:			; SSE64-LABEL: test_store_16xi32:
	; SSE64: # BB#0:			; SSE64: # BB#0:
	; SSE64-NEXT: subl $12, %esp			; SSE64-NEXT: subl $12, %esp
	; SSE64-NEXT: .Lcfi2:			; SSE64-NEXT: .Lcfi4:
	; SSE64-NEXT: .cfi_def_cfa_offset 16			; SSE64-NEXT: .cfi_def_cfa_offset 16
	; SSE64-NEXT: movaps {{[0-9]+}}(%esp), %xmm3			; SSE64-NEXT: movaps {{[0-9]+}}(%esp), %xmm3
	; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax			; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax
	; SSE64-NEXT: movups %xmm0, (%eax)			; SSE64-NEXT: movups %xmm0, (%eax)
	; SSE64-NEXT: movups %xmm1, 16(%eax)			; SSE64-NEXT: movups %xmm1, 16(%eax)
	; SSE64-NEXT: movups %xmm2, 32(%eax)			; SSE64-NEXT: movups %xmm2, 32(%eax)
	; SSE64-NEXT: movups %xmm3, 48(%eax)			; SSE64-NEXT: movups %xmm3, 48(%eax)
	; SSE64-NEXT: addl $12, %esp			; SSE64-NEXT: addl $12, %esp
				; SSE64-NEXT: .Lcfi5:
				; SSE64-NEXT: .cfi_def_cfa_offset 4
	; SSE64-NEXT: retl			; SSE64-NEXT: retl
	;			;
	; AVXONLY32-LABEL: test_store_16xi32:			; AVXONLY32-LABEL: test_store_16xi32:
	; AVXONLY32: # BB#0:			; AVXONLY32: # BB#0:
	; AVXONLY32-NEXT: vmovups %ymm0, (%rdi)			; AVXONLY32-NEXT: vmovups %ymm0, (%rdi)
	; AVXONLY32-NEXT: vmovups %ymm1, 32(%rdi)			; AVXONLY32-NEXT: vmovups %ymm1, 32(%rdi)
	; AVXONLY32-NEXT: retq			; AVXONLY32-NEXT: retq
	;			;
	Show All 25 Lines
	; SSE32-NEXT: movaps %xmm1, 16(%rdi)			; SSE32-NEXT: movaps %xmm1, 16(%rdi)
	; SSE32-NEXT: movaps %xmm2, 32(%rdi)			; SSE32-NEXT: movaps %xmm2, 32(%rdi)
	; SSE32-NEXT: movaps %xmm3, 48(%rdi)			; SSE32-NEXT: movaps %xmm3, 48(%rdi)
	; SSE32-NEXT: retq			; SSE32-NEXT: retq
	;			;
	; SSE64-LABEL: test_store_16xi32_aligned:			; SSE64-LABEL: test_store_16xi32_aligned:
	; SSE64: # BB#0:			; SSE64: # BB#0:
	; SSE64-NEXT: subl $12, %esp			; SSE64-NEXT: subl $12, %esp
	; SSE64-NEXT: .Lcfi3:			; SSE64-NEXT: .Lcfi6:
	; SSE64-NEXT: .cfi_def_cfa_offset 16			; SSE64-NEXT: .cfi_def_cfa_offset 16
	; SSE64-NEXT: movaps {{[0-9]+}}(%esp), %xmm3			; SSE64-NEXT: movaps {{[0-9]+}}(%esp), %xmm3
	; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax			; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax
	; SSE64-NEXT: movaps %xmm0, (%eax)			; SSE64-NEXT: movaps %xmm0, (%eax)
	; SSE64-NEXT: movaps %xmm1, 16(%eax)			; SSE64-NEXT: movaps %xmm1, 16(%eax)
	; SSE64-NEXT: movaps %xmm2, 32(%eax)			; SSE64-NEXT: movaps %xmm2, 32(%eax)
	; SSE64-NEXT: movaps %xmm3, 48(%eax)			; SSE64-NEXT: movaps %xmm3, 48(%eax)
	; SSE64-NEXT: addl $12, %esp			; SSE64-NEXT: addl $12, %esp
				; SSE64-NEXT: .Lcfi7:
				; SSE64-NEXT: .cfi_def_cfa_offset 4
	; SSE64-NEXT: retl			; SSE64-NEXT: retl
	;			;
	; AVXONLY32-LABEL: test_store_16xi32_aligned:			; AVXONLY32-LABEL: test_store_16xi32_aligned:
	; AVXONLY32: # BB#0:			; AVXONLY32: # BB#0:
	; AVXONLY32-NEXT: vmovaps %ymm0, (%rdi)			; AVXONLY32-NEXT: vmovaps %ymm0, (%rdi)
	; AVXONLY32-NEXT: vmovaps %ymm1, 32(%rdi)			; AVXONLY32-NEXT: vmovaps %ymm1, 32(%rdi)
	; AVXONLY32-NEXT: retq			; AVXONLY32-NEXT: retq
	;			;
	Show All 25 Lines
	; SSE32-NEXT: movups %xmm1, 16(%rdi)			; SSE32-NEXT: movups %xmm1, 16(%rdi)
	; SSE32-NEXT: movups %xmm2, 32(%rdi)			; SSE32-NEXT: movups %xmm2, 32(%rdi)
	; SSE32-NEXT: movups %xmm3, 48(%rdi)			; SSE32-NEXT: movups %xmm3, 48(%rdi)
	; SSE32-NEXT: retq			; SSE32-NEXT: retq
	;			;
	; SSE64-LABEL: test_store_16xf32:			; SSE64-LABEL: test_store_16xf32:
	; SSE64: # BB#0:			; SSE64: # BB#0:
	; SSE64-NEXT: subl $12, %esp			; SSE64-NEXT: subl $12, %esp
	; SSE64-NEXT: .Lcfi4:			; SSE64-NEXT: .Lcfi8:
	; SSE64-NEXT: .cfi_def_cfa_offset 16			; SSE64-NEXT: .cfi_def_cfa_offset 16
	; SSE64-NEXT: movaps {{[0-9]+}}(%esp), %xmm3			; SSE64-NEXT: movaps {{[0-9]+}}(%esp), %xmm3
	; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax			; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax
	; SSE64-NEXT: movups %xmm0, (%eax)			; SSE64-NEXT: movups %xmm0, (%eax)
	; SSE64-NEXT: movups %xmm1, 16(%eax)			; SSE64-NEXT: movups %xmm1, 16(%eax)
	; SSE64-NEXT: movups %xmm2, 32(%eax)			; SSE64-NEXT: movups %xmm2, 32(%eax)
	; SSE64-NEXT: movups %xmm3, 48(%eax)			; SSE64-NEXT: movups %xmm3, 48(%eax)
	; SSE64-NEXT: addl $12, %esp			; SSE64-NEXT: addl $12, %esp
				; SSE64-NEXT: .Lcfi9:
				; SSE64-NEXT: .cfi_def_cfa_offset 4
	; SSE64-NEXT: retl			; SSE64-NEXT: retl
	;			;
	; AVXONLY32-LABEL: test_store_16xf32:			; AVXONLY32-LABEL: test_store_16xf32:
	; AVXONLY32: # BB#0:			; AVXONLY32: # BB#0:
	; AVXONLY32-NEXT: vmovups %ymm0, (%rdi)			; AVXONLY32-NEXT: vmovups %ymm0, (%rdi)
	; AVXONLY32-NEXT: vmovups %ymm1, 32(%rdi)			; AVXONLY32-NEXT: vmovups %ymm1, 32(%rdi)
	; AVXONLY32-NEXT: retq			; AVXONLY32-NEXT: retq
	;			;
	Show All 25 Lines
	; SSE32-NEXT: movaps %xmm1, 16(%rdi)			; SSE32-NEXT: movaps %xmm1, 16(%rdi)
	; SSE32-NEXT: movaps %xmm2, 32(%rdi)			; SSE32-NEXT: movaps %xmm2, 32(%rdi)
	; SSE32-NEXT: movaps %xmm3, 48(%rdi)			; SSE32-NEXT: movaps %xmm3, 48(%rdi)
	; SSE32-NEXT: retq			; SSE32-NEXT: retq
	;			;
	; SSE64-LABEL: test_store_16xf32_aligned:			; SSE64-LABEL: test_store_16xf32_aligned:
	; SSE64: # BB#0:			; SSE64: # BB#0:
	; SSE64-NEXT: subl $12, %esp			; SSE64-NEXT: subl $12, %esp
	; SSE64-NEXT: .Lcfi5:			; SSE64-NEXT: .Lcfi10:
	; SSE64-NEXT: .cfi_def_cfa_offset 16			; SSE64-NEXT: .cfi_def_cfa_offset 16
	; SSE64-NEXT: movaps {{[0-9]+}}(%esp), %xmm3			; SSE64-NEXT: movaps {{[0-9]+}}(%esp), %xmm3
	; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax			; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax
	; SSE64-NEXT: movaps %xmm0, (%eax)			; SSE64-NEXT: movaps %xmm0, (%eax)
	; SSE64-NEXT: movaps %xmm1, 16(%eax)			; SSE64-NEXT: movaps %xmm1, 16(%eax)
	; SSE64-NEXT: movaps %xmm2, 32(%eax)			; SSE64-NEXT: movaps %xmm2, 32(%eax)
	; SSE64-NEXT: movaps %xmm3, 48(%eax)			; SSE64-NEXT: movaps %xmm3, 48(%eax)
	; SSE64-NEXT: addl $12, %esp			; SSE64-NEXT: addl $12, %esp
				; SSE64-NEXT: .Lcfi11:
				; SSE64-NEXT: .cfi_def_cfa_offset 4
	; SSE64-NEXT: retl			; SSE64-NEXT: retl
	;			;
	; AVXONLY32-LABEL: test_store_16xf32_aligned:			; AVXONLY32-LABEL: test_store_16xf32_aligned:
	; AVXONLY32: # BB#0:			; AVXONLY32: # BB#0:
	; AVXONLY32-NEXT: vmovaps %ymm0, (%rdi)			; AVXONLY32-NEXT: vmovaps %ymm0, (%rdi)
	; AVXONLY32-NEXT: vmovaps %ymm1, 32(%rdi)			; AVXONLY32-NEXT: vmovaps %ymm1, 32(%rdi)
	; AVXONLY32-NEXT: retq			; AVXONLY32-NEXT: retq
	;			;
	Show All 29 Lines
	; SSE32-NEXT: movupd %xmm1, 16(%rdi)			; SSE32-NEXT: movupd %xmm1, 16(%rdi)
	; SSE32-NEXT: movupd %xmm2, 32(%rdi)			; SSE32-NEXT: movupd %xmm2, 32(%rdi)
	; SSE32-NEXT: movupd %xmm3, 48(%rdi)			; SSE32-NEXT: movupd %xmm3, 48(%rdi)
	; SSE32-NEXT: retq			; SSE32-NEXT: retq
	;			;
	; SSE64-LABEL: test_store_8xf64:			; SSE64-LABEL: test_store_8xf64:
	; SSE64: # BB#0:			; SSE64: # BB#0:
	; SSE64-NEXT: subl $12, %esp			; SSE64-NEXT: subl $12, %esp
	; SSE64-NEXT: .Lcfi6:			; SSE64-NEXT: .Lcfi12:
	; SSE64-NEXT: .cfi_def_cfa_offset 16			; SSE64-NEXT: .cfi_def_cfa_offset 16
	; SSE64-NEXT: movapd {{[0-9]+}}(%esp), %xmm3			; SSE64-NEXT: movapd {{[0-9]+}}(%esp), %xmm3
	; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax			; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm3			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm3
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm2			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm2
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm1			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm1
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm0			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm0
	; SSE64-NEXT: movupd %xmm0, (%eax)			; SSE64-NEXT: movupd %xmm0, (%eax)
	; SSE64-NEXT: movupd %xmm1, 16(%eax)			; SSE64-NEXT: movupd %xmm1, 16(%eax)
	; SSE64-NEXT: movupd %xmm2, 32(%eax)			; SSE64-NEXT: movupd %xmm2, 32(%eax)
	; SSE64-NEXT: movupd %xmm3, 48(%eax)			; SSE64-NEXT: movupd %xmm3, 48(%eax)
	; SSE64-NEXT: addl $12, %esp			; SSE64-NEXT: addl $12, %esp
				; SSE64-NEXT: .Lcfi13:
				; SSE64-NEXT: .cfi_def_cfa_offset 4
	; SSE64-NEXT: retl			; SSE64-NEXT: retl
	;			;
	; AVXONLY32-LABEL: test_store_8xf64:			; AVXONLY32-LABEL: test_store_8xf64:
	; AVXONLY32: # BB#0:			; AVXONLY32: # BB#0:
	; AVXONLY32-NEXT: vaddpd %ymm3, %ymm1, %ymm1			; AVXONLY32-NEXT: vaddpd %ymm3, %ymm1, %ymm1
	; AVXONLY32-NEXT: vaddpd %ymm2, %ymm0, %ymm0			; AVXONLY32-NEXT: vaddpd %ymm2, %ymm0, %ymm0
	; AVXONLY32-NEXT: vmovupd %ymm0, (%rdi)			; AVXONLY32-NEXT: vmovupd %ymm0, (%rdi)
	; AVXONLY32-NEXT: vmovupd %ymm1, 32(%rdi)			; AVXONLY32-NEXT: vmovupd %ymm1, 32(%rdi)
	Show All 13 Lines
	; AVXONLY64-NEXT: subl $32, %esp			; AVXONLY64-NEXT: subl $32, %esp
	; AVXONLY64-NEXT: movl 8(%ebp), %eax			; AVXONLY64-NEXT: movl 8(%ebp), %eax
	; AVXONLY64-NEXT: vaddpd 40(%ebp), %ymm1, %ymm1			; AVXONLY64-NEXT: vaddpd 40(%ebp), %ymm1, %ymm1
	; AVXONLY64-NEXT: vaddpd %ymm2, %ymm0, %ymm0			; AVXONLY64-NEXT: vaddpd %ymm2, %ymm0, %ymm0
	; AVXONLY64-NEXT: vmovupd %ymm0, (%eax)			; AVXONLY64-NEXT: vmovupd %ymm0, (%eax)
	; AVXONLY64-NEXT: vmovupd %ymm1, 32(%eax)			; AVXONLY64-NEXT: vmovupd %ymm1, 32(%eax)
	; AVXONLY64-NEXT: movl %ebp, %esp			; AVXONLY64-NEXT: movl %ebp, %esp
	; AVXONLY64-NEXT: popl %ebp			; AVXONLY64-NEXT: popl %ebp
				; AVXONLY64-NEXT: .Lcfi3:
				; AVXONLY64-NEXT: .cfi_def_cfa %esp, 4
	; AVXONLY64-NEXT: retl			; AVXONLY64-NEXT: retl
	;			;
	; AVX51232-LABEL: test_store_8xf64:			; AVX51232-LABEL: test_store_8xf64:
	; AVX51232: # BB#0:			; AVX51232: # BB#0:
	; AVX51232-NEXT: vaddpd %zmm1, %zmm0, %zmm0			; AVX51232-NEXT: vaddpd %zmm1, %zmm0, %zmm0
	; AVX51232-NEXT: vmovupd %zmm0, (%rdi)			; AVX51232-NEXT: vmovupd %zmm0, (%rdi)
	; AVX51232-NEXT: retq			; AVX51232-NEXT: retq
	;			;
	Show All 19 Lines
	; SSE32-NEXT: movapd %xmm1, 16(%rdi)			; SSE32-NEXT: movapd %xmm1, 16(%rdi)
	; SSE32-NEXT: movapd %xmm2, 32(%rdi)			; SSE32-NEXT: movapd %xmm2, 32(%rdi)
	; SSE32-NEXT: movapd %xmm3, 48(%rdi)			; SSE32-NEXT: movapd %xmm3, 48(%rdi)
	; SSE32-NEXT: retq			; SSE32-NEXT: retq
	;			;
	; SSE64-LABEL: test_store_8xf64_aligned:			; SSE64-LABEL: test_store_8xf64_aligned:
	; SSE64: # BB#0:			; SSE64: # BB#0:
	; SSE64-NEXT: subl $12, %esp			; SSE64-NEXT: subl $12, %esp
	; SSE64-NEXT: .Lcfi7:			; SSE64-NEXT: .Lcfi14:
	; SSE64-NEXT: .cfi_def_cfa_offset 16			; SSE64-NEXT: .cfi_def_cfa_offset 16
	; SSE64-NEXT: movapd {{[0-9]+}}(%esp), %xmm3			; SSE64-NEXT: movapd {{[0-9]+}}(%esp), %xmm3
	; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax			; SSE64-NEXT: movl {{[0-9]+}}(%esp), %eax
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm3			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm3
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm2			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm2
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm1			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm1
	; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm0			; SSE64-NEXT: addpd {{[0-9]+}}(%esp), %xmm0
	; SSE64-NEXT: movapd %xmm0, (%eax)			; SSE64-NEXT: movapd %xmm0, (%eax)
	; SSE64-NEXT: movapd %xmm1, 16(%eax)			; SSE64-NEXT: movapd %xmm1, 16(%eax)
	; SSE64-NEXT: movapd %xmm2, 32(%eax)			; SSE64-NEXT: movapd %xmm2, 32(%eax)
	; SSE64-NEXT: movapd %xmm3, 48(%eax)			; SSE64-NEXT: movapd %xmm3, 48(%eax)
	; SSE64-NEXT: addl $12, %esp			; SSE64-NEXT: addl $12, %esp
				; SSE64-NEXT: .Lcfi15:
				; SSE64-NEXT: .cfi_def_cfa_offset 4
	; SSE64-NEXT: retl			; SSE64-NEXT: retl
	;			;
	; AVXONLY32-LABEL: test_store_8xf64_aligned:			; AVXONLY32-LABEL: test_store_8xf64_aligned:
	; AVXONLY32: # BB#0:			; AVXONLY32: # BB#0:
	; AVXONLY32-NEXT: vaddpd %ymm3, %ymm1, %ymm1			; AVXONLY32-NEXT: vaddpd %ymm3, %ymm1, %ymm1
	; AVXONLY32-NEXT: vaddpd %ymm2, %ymm0, %ymm0			; AVXONLY32-NEXT: vaddpd %ymm2, %ymm0, %ymm0
	; AVXONLY32-NEXT: vmovapd %ymm0, (%rdi)			; AVXONLY32-NEXT: vmovapd %ymm0, (%rdi)
	; AVXONLY32-NEXT: vmovapd %ymm1, 32(%rdi)			; AVXONLY32-NEXT: vmovapd %ymm1, 32(%rdi)
	; AVXONLY32-NEXT: retq			; AVXONLY32-NEXT: retq
	;			;
	; AVXONLY64-LABEL: test_store_8xf64_aligned:			; AVXONLY64-LABEL: test_store_8xf64_aligned:
	; AVXONLY64: # BB#0:			; AVXONLY64: # BB#0:
	; AVXONLY64-NEXT: pushl %ebp			; AVXONLY64-NEXT: pushl %ebp
	; AVXONLY64-NEXT: .Lcfi3:
	; AVXONLY64-NEXT: .cfi_def_cfa_offset 8
	; AVXONLY64-NEXT: .Lcfi4:			; AVXONLY64-NEXT: .Lcfi4:
				; AVXONLY64-NEXT: .cfi_def_cfa_offset 8
				; AVXONLY64-NEXT: .Lcfi5:
	; AVXONLY64-NEXT: .cfi_offset %ebp, -8			; AVXONLY64-NEXT: .cfi_offset %ebp, -8
	; AVXONLY64-NEXT: movl %esp, %ebp			; AVXONLY64-NEXT: movl %esp, %ebp
	; AVXONLY64-NEXT: .Lcfi5:			; AVXONLY64-NEXT: .Lcfi6:
	; AVXONLY64-NEXT: .cfi_def_cfa_register %ebp			; AVXONLY64-NEXT: .cfi_def_cfa_register %ebp
	; AVXONLY64-NEXT: andl $-32, %esp			; AVXONLY64-NEXT: andl $-32, %esp
	; AVXONLY64-NEXT: subl $32, %esp			; AVXONLY64-NEXT: subl $32, %esp
	; AVXONLY64-NEXT: movl 8(%ebp), %eax			; AVXONLY64-NEXT: movl 8(%ebp), %eax
	; AVXONLY64-NEXT: vaddpd 40(%ebp), %ymm1, %ymm1			; AVXONLY64-NEXT: vaddpd 40(%ebp), %ymm1, %ymm1
	; AVXONLY64-NEXT: vaddpd %ymm2, %ymm0, %ymm0			; AVXONLY64-NEXT: vaddpd %ymm2, %ymm0, %ymm0
	; AVXONLY64-NEXT: vmovapd %ymm0, (%eax)			; AVXONLY64-NEXT: vmovapd %ymm0, (%eax)
	; AVXONLY64-NEXT: vmovapd %ymm1, 32(%eax)			; AVXONLY64-NEXT: vmovapd %ymm1, 32(%eax)
	; AVXONLY64-NEXT: movl %ebp, %esp			; AVXONLY64-NEXT: movl %ebp, %esp
	; AVXONLY64-NEXT: popl %ebp			; AVXONLY64-NEXT: popl %ebp
				; AVXONLY64-NEXT: .Lcfi7:
				; AVXONLY64-NEXT: .cfi_def_cfa %esp, 4
	; AVXONLY64-NEXT: retl			; AVXONLY64-NEXT: retl
	;			;
	; AVX51232-LABEL: test_store_8xf64_aligned:			; AVX51232-LABEL: test_store_8xf64_aligned:
	; AVX51232: # BB#0:			; AVX51232: # BB#0:
	; AVX51232-NEXT: vaddpd %zmm1, %zmm0, %zmm0			; AVX51232-NEXT: vaddpd %zmm1, %zmm0, %zmm0
	; AVX51232-NEXT: vmovapd %zmm0, (%rdi)			; AVX51232-NEXT: vmovapd %zmm0, (%rdi)
	; AVX51232-NEXT: retq			; AVX51232-NEXT: retq
	;			;
	Show All 10 Lines

test/CodeGen/X86/frame-lowering-debug-intrinsic-2.ll

Show All 12 Lines	entry:
%4 = extractvalue { i64, i1 } %3, 0		%4 = extractvalue { i64, i1 } %3, 0
%5 = tail call i64 @fn1(i64 %4, i64 %2)		%5 = tail call i64 @fn1(i64 %4, i64 %2)
tail call void (...) @printf()		tail call void (...) @printf()
tail call void (...) @printf(i64 1, i64 2, i64 3, i64 4, i32 0, i64 0, i64 %4, i64 %5)		tail call void (...) @printf(i64 1, i64 2, i64 3, i64 4, i32 0, i64 0, i64 %4, i64 %5)
ret void		ret void
}		}

; CHECK-LABEL: noDebug		; CHECK-LABEL: noDebug
; CHECK: addq $24, %rsp		; CHECK: addq $16, %rsp
		; CHECK: addq $8, %rsp
; CHECK: popq %rbx		; CHECK: popq %rbx
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset 16
; CHECK-NEXT: popq %r14		; CHECK-NEXT: popq %r14
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq		; CHECK-NEXT: retq


define void @withDebug() !dbg !18 {		define void @withDebug() !dbg !18 {
entry:		entry:
%0 = load i64, i64* @a, align 8		%0 = load i64, i64* @a, align 8
%1 = load i64, i64* @a, align 8		%1 = load i64, i64* @a, align 8
%2 = load i64, i64* @a, align 8		%2 = load i64, i64* @a, align 8
%3 = tail call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %0, i64 %1)		%3 = tail call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %0, i64 %1)
%4 = extractvalue { i64, i1 } %3, 0		%4 = extractvalue { i64, i1 } %3, 0
%5 = tail call i64 @fn1(i64 %4, i64 %2)		%5 = tail call i64 @fn1(i64 %4, i64 %2)
tail call void @llvm.dbg.value(metadata i64 %4, i64 0, metadata !23, metadata !33), !dbg !34		tail call void @llvm.dbg.value(metadata i64 %4, i64 0, metadata !23, metadata !33), !dbg !34
tail call void @llvm.dbg.value(metadata i64 %5, i64 0, metadata !22, metadata !33), !dbg !35		tail call void @llvm.dbg.value(metadata i64 %5, i64 0, metadata !22, metadata !33), !dbg !35
tail call void (...) @printf()		tail call void (...) @printf()
tail call void (...) @printf(i64 1, i64 2, i64 3, i64 4, i32 0, i64 0, i64 %4, i64 %5)		tail call void (...) @printf(i64 1, i64 2, i64 3, i64 4, i32 0, i64 0, i64 %4, i64 %5)
ret void		ret void
}		}

; CHECK-LABEL: withDebug		; CHECK-LABEL: withDebug
; CHECK: #DEBUG_VALUE: test:j <- %RBX		; CHECK: #DEBUG_VALUE: test:j <- %RBX
; CHECK-NEXT: addq $24, %rsp		; CHECK-NEXT: addq $16, %rsp
		; CHECK: addq $8, %rsp
; CHECK: popq %rbx		; CHECK: popq %rbx
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset 16
; CHECK-NEXT: popq %r14		; CHECK-NEXT: popq %r14
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq		; CHECK-NEXT: retq

declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)		declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)
declare i64 @fn1(i64, i64)		declare i64 @fn1(i64, i64)

declare void @printf(...)		declare void @printf(...)

declare void @llvm.dbg.value(metadata, i64, metadata, metadata)		declare void @llvm.dbg.value(metadata, i64, metadata, metadata)
[18 lines not shown]

test/CodeGen/X86/frame-lowering-debug-intrinsic.ll

	; Test ensuring debug intrinsics do not affect generated function prologue.			; Test ensuring debug intrinsics do not affect generated function prologue.
	;			;
	; RUN: llc -O1 -mtriple=x86_64-unknown-unknown -o - %s | FileCheck %s			; RUN: llc -O1 -mtriple=x86_64-unknown-unknown -o - %s | FileCheck %s

	define i64 @fn1NoDebug(i64 %a) {			define i64 @fn1NoDebug(i64 %a) {
	%call = call i64 @fn(i64 %a, i64 0)			%call = call i64 @fn(i64 %a, i64 0)
	ret i64 %call			ret i64 %call
	}			}

	; CHECK-LABEL: fn1NoDebug			; CHECK-LABEL: fn1NoDebug
	; CHECK: popq %rcx			; CHECK: popq %rcx
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	define i64 @fn1WithDebug(i64 %a) !dbg !4 {			define i64 @fn1WithDebug(i64 %a) !dbg !4 {
	%call = call i64 @fn(i64 %a, i64 0)			%call = call i64 @fn(i64 %a, i64 0)
	tail call void @llvm.dbg.value(metadata i64 %call, i64 0, metadata !5, metadata !6), !dbg !7			tail call void @llvm.dbg.value(metadata i64 %call, i64 0, metadata !5, metadata !6), !dbg !7
	ret i64 %call			ret i64 %call
	}			}

	; CHECK-LABEL: fn1WithDebug			; CHECK-LABEL: fn1WithDebug
	; CHECK: popq %rcx			; CHECK: popq %rcx
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	%struct.Buffer = type { i8, [63 x i8] }			%struct.Buffer = type { i8, [63 x i8] }

	define void @fn2NoDebug(%struct.Buffer* byval align 64 %p1) {			define void @fn2NoDebug(%struct.Buffer* byval align 64 %p1) {
	ret void			ret void
	}			}

	; CHECK-LABEL: fn2NoDebug			; CHECK-LABEL: fn2NoDebug
	; CHECK: and			; CHECK: and
	; CHECK-NOT: add			; CHECK-NOT: add
	; CHECK-NOT: sub			; CHECK-NOT: sub
	; CHECK: mov			; CHECK: mov
	; CHECK-NEXT: pop			; CHECK-NEXT: pop
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa %rsp, 8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	define void @fn2WithDebug(%struct.Buffer* byval align 64 %p1) !dbg !8 {			define void @fn2WithDebug(%struct.Buffer* byval align 64 %p1) !dbg !8 {
	call void @llvm.dbg.declare(metadata %struct.Buffer* %p1, metadata !9, metadata !6), !dbg !10			call void @llvm.dbg.declare(metadata %struct.Buffer* %p1, metadata !9, metadata !6), !dbg !10
	ret void			ret void
	}			}

	; CHECK-LABEL: fn2WithDebug			; CHECK-LABEL: fn2WithDebug
	; CHECK: and			; CHECK: and
	; CHECK-NOT: add			; CHECK-NOT: add
	; CHECK-NOT: sub			; CHECK-NOT: sub
	; CHECK: mov			; CHECK: mov
	; CHECK-NEXT: pop			; CHECK-NEXT: pop
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa %rsp, 8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	declare i64 @fn(i64, i64)			declare i64 @fn(i64, i64)

	declare void @llvm.dbg.value(metadata, i64, metadata, metadata)			declare void @llvm.dbg.value(metadata, i64, metadata, metadata)
	declare void @llvm.dbg.declare(metadata, metadata, metadata)			declare void @llvm.dbg.declare(metadata, metadata, metadata)

	!llvm.dbg.cu = !{!0}			!llvm.dbg.cu = !{!0}
	[13 lines not shown]
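
The new CHECK lines in this test show two shapes of epilogue CFI: `.cfi_def_cfa_offset 8` after `popq %rcx` in the `fn1*` functions, and `.cfi_def_cfa %rsp, 8` after the final `pop` in the `fn2*` functions, which realign the stack. A minimal sketch of that rule follows. It is not the patch's implementation; the `Cfa` class and `after_pop` helper are hypothetical names, and the sketch assumes the frame pointer was pushed directly below the return address.

```python
from dataclasses import dataclass


@dataclass
class Cfa:
    """Hypothetical model of the CFA rule: CFA = register + offset."""
    register: str
    offset: int


def after_pop(cfa, popped, frame_reg, stack_reg, slot):
    """Return (new CFA rule, directive text or None) after popping `popped`.

    Assumption: when the CFA is frame-pointer based, the frame pointer was
    saved immediately below the return address, so after restoring it only
    the return address slot remains between the stack pointer and the CFA.
    """
    if cfa.register == frame_reg and popped == frame_reg:
        # The frame pointer now holds the caller's value, so the CFA has to
        # be re-expressed relative to the stack pointer.
        new = Cfa(stack_reg, slot)
        return new, f".cfi_def_cfa %{stack_reg}, {slot}"
    if cfa.register == stack_reg:
        # The stack pointer moved up by one slot, so the offset shrinks.
        new = Cfa(stack_reg, cfa.offset - slot)
        return new, f".cfi_def_cfa_offset {new.offset}"
    # CFA is frame-pointer based and another register was popped: no change.
    return cfa, None


# fn1NoDebug: no frame pointer, one 8-byte slot popped into %rcx.
_, fn1_directive = after_pop(Cfa("rsp", 16), "rcx", "rbp", "rsp", 8)
# fn2NoDebug: frame pointer in use, epilogue ends with a pop of %rbp.
_, fn2_directive = after_pop(Cfa("rbp", 16), "rbp", "rbp", "rsp", 8)
print(fn1_directive)  # .cfi_def_cfa_offset 8
print(fn2_directive)  # .cfi_def_cfa %rsp, 8
```

The same helper with `slot=4` and `esp`/`ebp` gives the 32-bit forms seen in the other tests of this diff, such as `.cfi_def_cfa %esp, 4`.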

test/CodeGen/X86/haddsub-2.ll

[730 lines not shown]
; SSE3-NEXT: punpcklwd {{.*#+}} xmm6 = xmm6[0],xmm12[0],xmm6[1],xmm12[1],xmm6[2],xmm12[2],xmm6[3],xmm12[3]		; SSE3-NEXT: punpcklwd {{.*#+}} xmm6 = xmm6[0],xmm12[0],xmm6[1],xmm12[1],xmm6[2],xmm12[2],xmm6[3],xmm12[3]
; SSE3-NEXT: punpcklwd {{.*#+}} xmm5 = xmm5[0],xmm13[0],xmm5[1],xmm13[1],xmm5[2],xmm13[2],xmm5[3],xmm13[3]		; SSE3-NEXT: punpcklwd {{.*#+}} xmm5 = xmm5[0],xmm13[0],xmm5[1],xmm13[1],xmm5[2],xmm13[2],xmm5[3],xmm13[3]
; SSE3-NEXT: punpckldq {{.*#+}} xmm5 = xmm5[0],xmm6[0],xmm5[1],xmm6[1]		; SSE3-NEXT: punpckldq {{.*#+}} xmm5 = xmm5[0],xmm6[0],xmm5[1],xmm6[1]
; SSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm14[0],xmm2[1],xmm14[1],xmm2[2],xmm14[2],xmm2[3],xmm14[3]		; SSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm14[0],xmm2[1],xmm14[1],xmm2[2],xmm14[2],xmm2[3],xmm14[3]
; SSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm15[0],xmm1[1],xmm15[1],xmm1[2],xmm15[2],xmm1[3],xmm15[3]		; SSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm15[0],xmm1[1],xmm15[1],xmm1[2],xmm15[2],xmm1[3],xmm15[3]
; SSE3-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]		; SSE3-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
; SSE3-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm5[0]		; SSE3-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm5[0]
; SSE3-NEXT: popq %rbx		; SSE3-NEXT: popq %rbx
		; SSE3-NEXT: .Lcfi12:
		; SSE3-NEXT: .cfi_def_cfa_offset 48
; SSE3-NEXT: popq %r12		; SSE3-NEXT: popq %r12
		; SSE3-NEXT: .Lcfi13:
		; SSE3-NEXT: .cfi_def_cfa_offset 40
; SSE3-NEXT: popq %r13		; SSE3-NEXT: popq %r13
		; SSE3-NEXT: .Lcfi14:
		; SSE3-NEXT: .cfi_def_cfa_offset 32
; SSE3-NEXT: popq %r14		; SSE3-NEXT: popq %r14
		; SSE3-NEXT: .Lcfi15:
		; SSE3-NEXT: .cfi_def_cfa_offset 24
; SSE3-NEXT: popq %r15		; SSE3-NEXT: popq %r15
		; SSE3-NEXT: .Lcfi16:
		; SSE3-NEXT: .cfi_def_cfa_offset 16
; SSE3-NEXT: popq %rbp		; SSE3-NEXT: popq %rbp
		; SSE3-NEXT: .Lcfi17:
		; SSE3-NEXT: .cfi_def_cfa_offset 8
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: avx2_vphadd_w_test:		; SSSE3-LABEL: avx2_vphadd_w_test:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: phaddw %xmm1, %xmm0		; SSSE3-NEXT: phaddw %xmm1, %xmm0
; SSSE3-NEXT: phaddw %xmm3, %xmm2		; SSSE3-NEXT: phaddw %xmm3, %xmm2
; SSSE3-NEXT: movdqa %xmm2, %xmm1		; SSSE3-NEXT: movdqa %xmm2, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
[506 lines not shown]
; AVX2-NEXT: retq		; AVX2-NEXT: retq
%vecinit29 = insertelement <8 x i32> %vecinit25, i32 %add28, i32 7		%vecinit29 = insertelement <8 x i32> %vecinit25, i32 %add28, i32 7
ret <8 x i32> %vecinit29		ret <8 x i32> %vecinit29
}		}

define <16 x i16> @avx2_hadd_w(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @avx2_hadd_w(<16 x i16> %a, <16 x i16> %b) {
; SSE3-LABEL: avx2_hadd_w:		; SSE3-LABEL: avx2_hadd_w:
; SSE3: # BB#0:		; SSE3: # BB#0:
; SSE3-NEXT: pushq %rbp		; SSE3-NEXT: pushq %rbp
; SSE3-NEXT: .Lcfi12:		; SSE3-NEXT: .Lcfi18:
; SSE3-NEXT: .cfi_def_cfa_offset 16		; SSE3-NEXT: .cfi_def_cfa_offset 16
; SSE3-NEXT: pushq %r15		; SSE3-NEXT: pushq %r15
; SSE3-NEXT: .Lcfi13:		; SSE3-NEXT: .Lcfi19:
; SSE3-NEXT: .cfi_def_cfa_offset 24		; SSE3-NEXT: .cfi_def_cfa_offset 24
; SSE3-NEXT: pushq %r14		; SSE3-NEXT: pushq %r14
; SSE3-NEXT: .Lcfi14:		; SSE3-NEXT: .Lcfi20:
; SSE3-NEXT: .cfi_def_cfa_offset 32		; SSE3-NEXT: .cfi_def_cfa_offset 32
; SSE3-NEXT: pushq %r13		; SSE3-NEXT: pushq %r13
; SSE3-NEXT: .Lcfi15:		; SSE3-NEXT: .Lcfi21:
; SSE3-NEXT: .cfi_def_cfa_offset 40		; SSE3-NEXT: .cfi_def_cfa_offset 40
; SSE3-NEXT: pushq %r12		; SSE3-NEXT: pushq %r12
; SSE3-NEXT: .Lcfi16:		; SSE3-NEXT: .Lcfi22:
; SSE3-NEXT: .cfi_def_cfa_offset 48		; SSE3-NEXT: .cfi_def_cfa_offset 48
; SSE3-NEXT: pushq %rbx		; SSE3-NEXT: pushq %rbx
; SSE3-NEXT: .Lcfi17:		; SSE3-NEXT: .Lcfi23:
; SSE3-NEXT: .cfi_def_cfa_offset 56		; SSE3-NEXT: .cfi_def_cfa_offset 56
; SSE3-NEXT: .Lcfi18:		; SSE3-NEXT: .Lcfi24:
; SSE3-NEXT: .cfi_offset %rbx, -56		; SSE3-NEXT: .cfi_offset %rbx, -56
; SSE3-NEXT: .Lcfi19:		; SSE3-NEXT: .Lcfi25:
; SSE3-NEXT: .cfi_offset %r12, -48		; SSE3-NEXT: .cfi_offset %r12, -48
; SSE3-NEXT: .Lcfi20:		; SSE3-NEXT: .Lcfi26:
; SSE3-NEXT: .cfi_offset %r13, -40		; SSE3-NEXT: .cfi_offset %r13, -40
; SSE3-NEXT: .Lcfi21:		; SSE3-NEXT: .Lcfi27:
; SSE3-NEXT: .cfi_offset %r14, -32		; SSE3-NEXT: .cfi_offset %r14, -32
; SSE3-NEXT: .Lcfi22:		; SSE3-NEXT: .Lcfi28:
; SSE3-NEXT: .cfi_offset %r15, -24		; SSE3-NEXT: .cfi_offset %r15, -24
; SSE3-NEXT: .Lcfi23:		; SSE3-NEXT: .Lcfi29:
; SSE3-NEXT: .cfi_offset %rbp, -16		; SSE3-NEXT: .cfi_offset %rbp, -16
; SSE3-NEXT: movd %xmm0, %eax		; SSE3-NEXT: movd %xmm0, %eax
; SSE3-NEXT: pextrw $1, %xmm0, %r10d		; SSE3-NEXT: pextrw $1, %xmm0, %r10d
; SSE3-NEXT: addl %eax, %r10d		; SSE3-NEXT: addl %eax, %r10d
; SSE3-NEXT: pextrw $2, %xmm0, %eax		; SSE3-NEXT: pextrw $2, %xmm0, %eax
; SSE3-NEXT: pextrw $3, %xmm0, %r11d		; SSE3-NEXT: pextrw $3, %xmm0, %r11d
; SSE3-NEXT: addl %eax, %r11d		; SSE3-NEXT: addl %eax, %r11d
; SSE3-NEXT: pextrw $4, %xmm0, %eax		; SSE3-NEXT: pextrw $4, %xmm0, %eax
[68 lines not shown]
; SSE3-NEXT: punpcklwd {{.*#+}} xmm6 = xmm6[0],xmm12[0],xmm6[1],xmm12[1],xmm6[2],xmm12[2],xmm6[3],xmm12[3]		; SSE3-NEXT: punpcklwd {{.*#+}} xmm6 = xmm6[0],xmm12[0],xmm6[1],xmm12[1],xmm6[2],xmm12[2],xmm6[3],xmm12[3]
; SSE3-NEXT: punpcklwd {{.*#+}} xmm5 = xmm5[0],xmm13[0],xmm5[1],xmm13[1],xmm5[2],xmm13[2],xmm5[3],xmm13[3]		; SSE3-NEXT: punpcklwd {{.*#+}} xmm5 = xmm5[0],xmm13[0],xmm5[1],xmm13[1],xmm5[2],xmm13[2],xmm5[3],xmm13[3]
; SSE3-NEXT: punpckldq {{.*#+}} xmm5 = xmm5[0],xmm6[0],xmm5[1],xmm6[1]		; SSE3-NEXT: punpckldq {{.*#+}} xmm5 = xmm5[0],xmm6[0],xmm5[1],xmm6[1]
; SSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm14[0],xmm2[1],xmm14[1],xmm2[2],xmm14[2],xmm2[3],xmm14[3]		; SSE3-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm14[0],xmm2[1],xmm14[1],xmm2[2],xmm14[2],xmm2[3],xmm14[3]
; SSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm15[0],xmm1[1],xmm15[1],xmm1[2],xmm15[2],xmm1[3],xmm15[3]		; SSE3-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm15[0],xmm1[1],xmm15[1],xmm1[2],xmm15[2],xmm1[3],xmm15[3]
; SSE3-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]		; SSE3-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
; SSE3-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm5[0]		; SSE3-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm5[0]
; SSE3-NEXT: popq %rbx		; SSE3-NEXT: popq %rbx
		; SSE3-NEXT: .Lcfi30:
		; SSE3-NEXT: .cfi_def_cfa_offset 48
; SSE3-NEXT: popq %r12		; SSE3-NEXT: popq %r12
		; SSE3-NEXT: .Lcfi31:
		; SSE3-NEXT: .cfi_def_cfa_offset 40
; SSE3-NEXT: popq %r13		; SSE3-NEXT: popq %r13
		; SSE3-NEXT: .Lcfi32:
		; SSE3-NEXT: .cfi_def_cfa_offset 32
; SSE3-NEXT: popq %r14		; SSE3-NEXT: popq %r14
		; SSE3-NEXT: .Lcfi33:
		; SSE3-NEXT: .cfi_def_cfa_offset 24
; SSE3-NEXT: popq %r15		; SSE3-NEXT: popq %r15
		; SSE3-NEXT: .Lcfi34:
		; SSE3-NEXT: .cfi_def_cfa_offset 16
; SSE3-NEXT: popq %rbp		; SSE3-NEXT: popq %rbp
		; SSE3-NEXT: .Lcfi35:
		; SSE3-NEXT: .cfi_def_cfa_offset 8
; SSE3-NEXT: retq		; SSE3-NEXT: retq
;		;
; SSSE3-LABEL: avx2_hadd_w:		; SSSE3-LABEL: avx2_hadd_w:
; SSSE3: # BB#0:		; SSSE3: # BB#0:
; SSSE3-NEXT: phaddw %xmm2, %xmm0		; SSSE3-NEXT: phaddw %xmm2, %xmm0
; SSSE3-NEXT: phaddw %xmm3, %xmm1		; SSSE3-NEXT: phaddw %xmm3, %xmm1
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
[79 lines not shown]

test/CodeGen/X86/hipe-cc64.ll

	[85 lines not shown]
	; Sanity-check the tail call sequence. Number of arguments was chosen as to			; Sanity-check the tail call sequence. Number of arguments was chosen as to
	; expose a bug where the tail call sequence clobbered the stack.			; expose a bug where the tail call sequence clobbered the stack.
	define cc 11 { i64, i64, i64 } @tailcaller(i64 %hp, i64 %p) #0 {			define cc 11 { i64, i64, i64 } @tailcaller(i64 %hp, i64 %p) #0 {
	; CHECK: movl $15, %esi			; CHECK: movl $15, %esi
	; CHECK-NEXT: movl $31, %edx			; CHECK-NEXT: movl $31, %edx
	; CHECK-NEXT: movl $47, %ecx			; CHECK-NEXT: movl $47, %ecx
	; CHECK-NEXT: movl $63, %r8d			; CHECK-NEXT: movl $63, %r8d
	; CHECK-NEXT: popq %rax			; CHECK-NEXT: popq %rax
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: jmp tailcallee			; CHECK-NEXT: jmp tailcallee
	%ret = tail call cc11 { i64, i64, i64 } @tailcallee(i64 %hp, i64 %p, i64 15,			%ret = tail call cc11 { i64, i64, i64 } @tailcallee(i64 %hp, i64 %p, i64 15,
	i64 31, i64 47, i64 63, i64 79) #1			i64 31, i64 47, i64 63, i64 79) #1
	ret { i64, i64, i64 } %ret			ret { i64, i64, i64 } %ret
	}			}

	!hipe.literals = !{ !0, !1, !2 }			!hipe.literals = !{ !0, !1, !2 }
	!0 = !{ !"P_NSP_LIMIT", i32 160 }			!0 = !{ !"P_NSP_LIMIT", i32 160 }
	!1 = !{ !"X86_LEAF_WORDS", i32 24 }			!1 = !{ !"X86_LEAF_WORDS", i32 24 }
	!2 = !{ !"AMD64_LEAF_WORDS", i32 24 }			!2 = !{ !"AMD64_LEAF_WORDS", i32 24 }
	@clos = external constant i64			@clos = external constant i64
	declare cc 11 void @bar(i64, i64, i64, i64, i64, i64)			declare cc 11 void @bar(i64, i64, i64, i64, i64, i64)
	declare cc 11 { i64, i64, i64 } @tailcallee(i64, i64, i64, i64, i64, i64, i64)			declare cc 11 { i64, i64, i64 } @tailcallee(i64, i64, i64, i64, i64, i64, i64)

test/CodeGen/X86/imul.ll

	[303 lines not shown]
	; X86-NEXT: shll $5, %esi			; X86-NEXT: shll $5, %esi
	; X86-NEXT: subl %eax, %esi			; X86-NEXT: subl %eax, %esi
	; X86-NEXT: movl $-31, %edx			; X86-NEXT: movl $-31, %edx
	; X86-NEXT: movl %ecx, %eax			; X86-NEXT: movl %ecx, %eax
	; X86-NEXT: mull %edx			; X86-NEXT: mull %edx
	; X86-NEXT: subl %ecx, %edx			; X86-NEXT: subl %ecx, %edx
	; X86-NEXT: subl %esi, %edx			; X86-NEXT: subl %esi, %edx
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
				; X86-NEXT: .Lcfi2:
				; X86-NEXT: .cfi_def_cfa_offset 4
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	%tmp3 = mul i64 %a, -31			%tmp3 = mul i64 %a, -31
	ret i64 %tmp3			ret i64 %tmp3
	}			}


	define i64 @test6(i64 %a) {			define i64 @test6(i64 %a) {
	[26 lines not shown]
	; X64-NEXT: shlq $5, %rax			; X64-NEXT: shlq $5, %rax
	; X64-NEXT: leaq (%rax,%rdi), %rax			; X64-NEXT: leaq (%rax,%rdi), %rax
	; X64-NEXT: negq %rax			; X64-NEXT: negq %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: test7:			; X86-LABEL: test7:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %esi			; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi2:
	; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: .Lcfi3:			; X86-NEXT: .Lcfi3:
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .Lcfi4:
	; X86-NEXT: .cfi_offset %esi, -8			; X86-NEXT: .cfi_offset %esi, -8
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl %eax, %esi			; X86-NEXT: movl %eax, %esi
	; X86-NEXT: shll $5, %esi			; X86-NEXT: shll $5, %esi
	; X86-NEXT: addl %eax, %esi			; X86-NEXT: addl %eax, %esi
	; X86-NEXT: movl $-33, %edx			; X86-NEXT: movl $-33, %edx
	; X86-NEXT: movl %ecx, %eax			; X86-NEXT: movl %ecx, %eax
	; X86-NEXT: mull %edx			; X86-NEXT: mull %edx
	; X86-NEXT: subl %ecx, %edx			; X86-NEXT: subl %ecx, %edx
	; X86-NEXT: subl %esi, %edx			; X86-NEXT: subl %esi, %edx
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
				; X86-NEXT: .Lcfi5:
				; X86-NEXT: .cfi_def_cfa_offset 4
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	%tmp3 = mul i64 %a, -33			%tmp3 = mul i64 %a, -33
	ret i64 %tmp3			ret i64 %tmp3
	}			}

	define i64 @testOverflow(i64 %a) {			define i64 @testOverflow(i64 %a) {
	; X64-LABEL: testOverflow:			; X64-LABEL: testOverflow:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movabsq $9223372036854775807, %rax # imm = 0x7FFFFFFFFFFFFFFF			; X64-NEXT: movabsq $9223372036854775807, %rax # imm = 0x7FFFFFFFFFFFFFFF
	; X64-NEXT: imulq %rdi, %rax			; X64-NEXT: imulq %rdi, %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: testOverflow:			; X86-LABEL: testOverflow:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %esi			; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi4:			; X86-NEXT: .Lcfi6:
	; X86-NEXT: .cfi_def_cfa_offset 8			; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: .Lcfi5:			; X86-NEXT: .Lcfi7:
	; X86-NEXT: .cfi_offset %esi, -8			; X86-NEXT: .cfi_offset %esi, -8
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movl $-1, %edx			; X86-NEXT: movl $-1, %edx
	; X86-NEXT: movl %ecx, %eax			; X86-NEXT: movl %ecx, %eax
	; X86-NEXT: mull %edx			; X86-NEXT: mull %edx
	; X86-NEXT: movl %ecx, %esi			; X86-NEXT: movl %ecx, %esi
	; X86-NEXT: shll $31, %esi			; X86-NEXT: shll $31, %esi
	; X86-NEXT: subl %ecx, %esi			; X86-NEXT: subl %ecx, %esi
	; X86-NEXT: addl %esi, %edx			; X86-NEXT: addl %esi, %edx
	; X86-NEXT: subl {{[0-9]+}}(%esp), %edx			; X86-NEXT: subl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
				; X86-NEXT: .Lcfi8:
				; X86-NEXT: .cfi_def_cfa_offset 4
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	%tmp3 = mul i64 %a, 9223372036854775807			%tmp3 = mul i64 %a, 9223372036854775807
	ret i64 %tmp3			ret i64 %tmp3
	}			}
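
Most of the `.LcfiN` churn in this file is renumbering: each epilogue directive added by the patch takes a label, so every later label in the file shifts by the number of epilogue labels emitted before it. A small sketch of that bookkeeping follows. The helper name `renumber_cfi_labels` is hypothetical, and the first entry stands in for the function whose prologue is not shown in this hunk; its two prologue labels are an assumption, inferred from its epilogue label being `.Lcfi2`.

```python
def renumber_cfi_labels(functions):
    """Map each function to (old prologue, new prologue, new epilogue) indices.

    `functions` is a list of (name, prologue_labels, epilogue_labels) in file
    order. Before the patch only prologue labels exist; after it, epilogue
    labels are interleaved and push later indices up.
    """
    old_next = 0
    new_next = 0
    layout = {}
    for name, prologue, epilogue in functions:
        old_pro = list(range(old_next, old_next + prologue))
        old_next += prologue
        new_pro = list(range(new_next, new_next + prologue))
        new_next += prologue
        new_epi = list(range(new_next, new_next + epilogue))
        new_next += epilogue
        layout[name] = (old_pro, new_pro, new_epi)
    return layout


# X86 (32-bit) functions in imul.ll that push %esi, in file order.
layout = renumber_cfi_labels([
    ("earlier", 2, 1),       # assumed: prologue labels not visible in the hunk
    ("test7", 2, 1),
    ("testOverflow", 2, 1),
])
for name, (old_pro, new_pro, new_epi) in layout.items():
    print(name, old_pro, "->", new_pro, "epilogue", new_epi)
```

Under these assumptions the result agrees with the hunks above: `test7` moves from `.Lcfi2`/`.Lcfi3` to `.Lcfi3`/`.Lcfi4` and gains `.Lcfi5`, while `testOverflow` moves from `.Lcfi4`/`.Lcfi5` to `.Lcfi6`/`.Lcfi7` and gains `.Lcfi8`.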

test/CodeGen/X86/legalize-shift-64.ll

	[119 lines not shown]
	; CHECK-NEXT: movl %ebx, %ebp			; CHECK-NEXT: movl %ebx, %ebp
	; CHECK-NEXT: xorl %ebx, %ebx			; CHECK-NEXT: xorl %ebx, %ebx
	; CHECK-NEXT: .LBB4_4:			; CHECK-NEXT: .LBB4_4:
	; CHECK-NEXT: movl %ebp, 12(%eax)			; CHECK-NEXT: movl %ebp, 12(%eax)
	; CHECK-NEXT: movl %ebx, 8(%eax)			; CHECK-NEXT: movl %ebx, 8(%eax)
	; CHECK-NEXT: movl %esi, 4(%eax)			; CHECK-NEXT: movl %esi, 4(%eax)
	; CHECK-NEXT: movl %edi, (%eax)			; CHECK-NEXT: movl %edi, (%eax)
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
				; CHECK-NEXT: .Lcfi8:
				; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: popl %edi			; CHECK-NEXT: popl %edi
				; CHECK-NEXT: .Lcfi9:
				; CHECK-NEXT: .cfi_def_cfa_offset 12
	; CHECK-NEXT: popl %ebx			; CHECK-NEXT: popl %ebx
				; CHECK-NEXT: .Lcfi10:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: popl %ebp			; CHECK-NEXT: popl %ebp
				; CHECK-NEXT: .Lcfi11:
				; CHECK-NEXT: .cfi_def_cfa_offset 4
	; CHECK-NEXT: retl $4			; CHECK-NEXT: retl $4
	%shl = shl <2 x i64> %A, %B			%shl = shl <2 x i64> %A, %B
	ret <2 x i64> %shl			ret <2 x i64> %shl
	}			}

	; PR16108			; PR16108
	define i32 @test6() {			define i32 @test6() {
	; CHECK-LABEL: test6:			; CHECK-LABEL: test6:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: pushl %ebp			; CHECK-NEXT: pushl %ebp
	; CHECK-NEXT: .Lcfi8:			; CHECK-NEXT: .Lcfi12:
	; CHECK-NEXT: .cfi_def_cfa_offset 8			; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: .Lcfi9:			; CHECK-NEXT: .Lcfi13:
	; CHECK-NEXT: .cfi_offset %ebp, -8			; CHECK-NEXT: .cfi_offset %ebp, -8
	; CHECK-NEXT: movl %esp, %ebp			; CHECK-NEXT: movl %esp, %ebp
	; CHECK-NEXT: .Lcfi10:			; CHECK-NEXT: .Lcfi14:
	; CHECK-NEXT: .cfi_def_cfa_register %ebp			; CHECK-NEXT: .cfi_def_cfa_register %ebp
	; CHECK-NEXT: andl $-8, %esp			; CHECK-NEXT: andl $-8, %esp
	; CHECK-NEXT: subl $16, %esp			; CHECK-NEXT: subl $16, %esp
	; CHECK-NEXT: movl $1, {{[0-9]+}}(%esp)			; CHECK-NEXT: movl $1, {{[0-9]+}}(%esp)
	; CHECK-NEXT: movl $0, {{[0-9]+}}(%esp)			; CHECK-NEXT: movl $0, {{[0-9]+}}(%esp)
	; CHECK-NEXT: movl $1, (%esp)			; CHECK-NEXT: movl $1, (%esp)
	; CHECK-NEXT: movl $1, %eax			; CHECK-NEXT: movl $1, %eax
	; CHECK-NEXT: xorl %ecx, %ecx			; CHECK-NEXT: xorl %ecx, %ecx
	[12 lines not shown]
	; CHECK-NEXT: # BB#3: # %if.then			; CHECK-NEXT: # BB#3: # %if.then
	; CHECK-NEXT: movl $1, %eax			; CHECK-NEXT: movl $1, %eax
	; CHECK-NEXT: jmp .LBB5_4			; CHECK-NEXT: jmp .LBB5_4
	; CHECK-NEXT: .LBB5_5: # %if.end			; CHECK-NEXT: .LBB5_5: # %if.end
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: .LBB5_4: # %if.then			; CHECK-NEXT: .LBB5_4: # %if.then
	; CHECK-NEXT: movl %ebp, %esp			; CHECK-NEXT: movl %ebp, %esp
	; CHECK-NEXT: popl %ebp			; CHECK-NEXT: popl %ebp
				; CHECK-NEXT: .Lcfi15:
				; CHECK-NEXT: .cfi_def_cfa %esp, 4
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	%x = alloca i32, align 4			%x = alloca i32, align 4
	%t = alloca i64, align 8			%t = alloca i64, align 8
	store i32 1, i32* %x, align 4			store i32 1, i32* %x, align 4
	store i64 1, i64* %t, align 8 ;; DEAD			store i64 1, i64* %t, align 8 ;; DEAD
	%load = load i32, i32* %x, align 4			%load = load i32, i32* %x, align 4
	%shl = shl i32 %load, 8			%shl = shl i32 %load, 8
	%add = add i32 %shl, -224			%add = add i32 %shl, -224
	[12 lines not shown]

test/CodeGen/X86/load-combine.ll

	[372 lines not shown]
	; CHECK-NEXT: orl %ecx, %esi			; CHECK-NEXT: orl %ecx, %esi
	; CHECK-NEXT: movzbl 2(%eax), %ecx			; CHECK-NEXT: movzbl 2(%eax), %ecx
	; CHECK-NEXT: shll $8, %ecx			; CHECK-NEXT: shll $8, %ecx
	; CHECK-NEXT: orl %esi, %ecx			; CHECK-NEXT: orl %esi, %ecx
	; CHECK-NEXT: movzbl 3(%eax), %eax			; CHECK-NEXT: movzbl 3(%eax), %eax
	; CHECK-NEXT: orl %ecx, %eax			; CHECK-NEXT: orl %ecx, %eax
	; CHECK-NEXT: orl %edx, %eax			; CHECK-NEXT: orl %edx, %eax
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
				; CHECK-NEXT: .Lcfi2:
				; CHECK-NEXT: .cfi_def_cfa_offset 4
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	;			;
	; CHECK64-LABEL: load_i32_by_i8_bswap_uses:			; CHECK64-LABEL: load_i32_by_i8_bswap_uses:
	; CHECK64: # BB#0:			; CHECK64: # BB#0:
	; CHECK64-NEXT: movzbl (%rdi), %eax			; CHECK64-NEXT: movzbl (%rdi), %eax
	; CHECK64-NEXT: shll $24, %eax			; CHECK64-NEXT: shll $24, %eax
	; CHECK64-NEXT: movzbl 1(%rdi), %ecx			; CHECK64-NEXT: movzbl 1(%rdi), %ecx
	; CHECK64-NEXT: movl %ecx, %edx			; CHECK64-NEXT: movl %ecx, %edx
	[88 lines not shown]
	; res1 = ((i32) p[0] << 24) | ((i32) p[1] << 16)			; res1 = ((i32) p[0] << 24) | ((i32) p[1] << 16)
	; *q = 0;			; *q = 0;
	; res2 = ((i32) p[2] << 8) | (i32) p[3]			; res2 = ((i32) p[2] << 8) | (i32) p[3]
	; res1 | res2			; res1 | res2
	define i32 @load_i32_by_i8_bswap_store_in_between(i32* %arg, i32* %arg1) {			define i32 @load_i32_by_i8_bswap_store_in_between(i32* %arg, i32* %arg1) {
	; CHECK-LABEL: load_i32_by_i8_bswap_store_in_between:			; CHECK-LABEL: load_i32_by_i8_bswap_store_in_between:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: pushl %esi			; CHECK-NEXT: pushl %esi
	; CHECK-NEXT: .Lcfi2:
	; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: .Lcfi3:			; CHECK-NEXT: .Lcfi3:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
				; CHECK-NEXT: .Lcfi4:
	; CHECK-NEXT: .cfi_offset %esi, -8			; CHECK-NEXT: .cfi_offset %esi, -8
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movzbl (%ecx), %edx			; CHECK-NEXT: movzbl (%ecx), %edx
	; CHECK-NEXT: shll $24, %edx			; CHECK-NEXT: shll $24, %edx
	; CHECK-NEXT: movzbl 1(%ecx), %esi			; CHECK-NEXT: movzbl 1(%ecx), %esi
	; CHECK-NEXT: movl $0, (%eax)			; CHECK-NEXT: movl $0, (%eax)
	; CHECK-NEXT: shll $16, %esi			; CHECK-NEXT: shll $16, %esi
	; CHECK-NEXT: orl %edx, %esi			; CHECK-NEXT: orl %edx, %esi
	; CHECK-NEXT: movzbl 2(%ecx), %edx			; CHECK-NEXT: movzbl 2(%ecx), %edx
	; CHECK-NEXT: shll $8, %edx			; CHECK-NEXT: shll $8, %edx
	; CHECK-NEXT: orl %esi, %edx			; CHECK-NEXT: orl %esi, %edx
	; CHECK-NEXT: movzbl 3(%ecx), %eax			; CHECK-NEXT: movzbl 3(%ecx), %eax
	; CHECK-NEXT: orl %edx, %eax			; CHECK-NEXT: orl %edx, %eax
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
				; CHECK-NEXT: .Lcfi5:
				; CHECK-NEXT: .cfi_def_cfa_offset 4
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	;			;
	; CHECK64-LABEL: load_i32_by_i8_bswap_store_in_between:			; CHECK64-LABEL: load_i32_by_i8_bswap_store_in_between:
	; CHECK64: # BB#0:			; CHECK64: # BB#0:
	; CHECK64-NEXT: movzbl (%rdi), %eax			; CHECK64-NEXT: movzbl (%rdi), %eax
	; CHECK64-NEXT: shll $24, %eax			; CHECK64-NEXT: shll $24, %eax
	; CHECK64-NEXT: movzbl 1(%rdi), %ecx			; CHECK64-NEXT: movzbl 1(%rdi), %ecx
	; CHECK64-NEXT: movl $0, (%rsi)			; CHECK64-NEXT: movl $0, (%rsi)
	[804 lines not shown]

test/CodeGen/X86/masked_gather_scatter.ll

	[1,758 lines not shown]
	; KNL_32-NEXT: vmovdqa64 8(%ebp), %zmm1			; KNL_32-NEXT: vmovdqa64 8(%ebp), %zmm1
	; KNL_32-NEXT: kshiftrw $8, %k1, %k2			; KNL_32-NEXT: kshiftrw $8, %k1, %k2
	; KNL_32-NEXT: vpgatherdq (,%ymm0), %zmm2 {%k1}			; KNL_32-NEXT: vpgatherdq (,%ymm0), %zmm2 {%k1}
	; KNL_32-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; KNL_32-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; KNL_32-NEXT: vpgatherdq (,%ymm0), %zmm1 {%k2}			; KNL_32-NEXT: vpgatherdq (,%ymm0), %zmm1 {%k2}
	; KNL_32-NEXT: vmovdqa64 %zmm2, %zmm0			; KNL_32-NEXT: vmovdqa64 %zmm2, %zmm0
	; KNL_32-NEXT: movl %ebp, %esp			; KNL_32-NEXT: movl %ebp, %esp
	; KNL_32-NEXT: popl %ebp			; KNL_32-NEXT: popl %ebp
				; KNL_32-NEXT: .Lcfi3:
				; KNL_32-NEXT: .cfi_def_cfa %esp, 4
	; KNL_32-NEXT: retl			; KNL_32-NEXT: retl
	;			;
	; SKX-LABEL: test_gather_16i64:			; SKX-LABEL: test_gather_16i64:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vpmovsxbd %xmm2, %zmm2			; SKX-NEXT: vpmovsxbd %xmm2, %zmm2
	; SKX-NEXT: vpslld $31, %zmm2, %zmm2			; SKX-NEXT: vpslld $31, %zmm2, %zmm2
	; SKX-NEXT: vptestmd %zmm2, %zmm2, %k1			; SKX-NEXT: vptestmd %zmm2, %zmm2, %k1
	; SKX-NEXT: kshiftrw $8, %k1, %k2			; SKX-NEXT: kshiftrw $8, %k1, %k2
	; SKX-NEXT: vpgatherqq (,%zmm0), %zmm3 {%k1}			; SKX-NEXT: vpgatherqq (,%zmm0), %zmm3 {%k1}
	; SKX-NEXT: vpgatherqq (,%zmm1), %zmm4 {%k2}			; SKX-NEXT: vpgatherqq (,%zmm1), %zmm4 {%k2}
	; SKX-NEXT: vmovdqa64 %zmm3, %zmm0			; SKX-NEXT: vmovdqa64 %zmm3, %zmm0
	; SKX-NEXT: vmovdqa64 %zmm4, %zmm1			; SKX-NEXT: vmovdqa64 %zmm4, %zmm1
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; SKX_32-LABEL: test_gather_16i64:			; SKX_32-LABEL: test_gather_16i64:
	; SKX_32: # BB#0:			; SKX_32: # BB#0:
	; SKX_32-NEXT: pushl %ebp			; SKX_32-NEXT: pushl %ebp
	; SKX_32-NEXT: .Lcfi1:
	; SKX_32-NEXT: .cfi_def_cfa_offset 8
	; SKX_32-NEXT: .Lcfi2:			; SKX_32-NEXT: .Lcfi2:
				; SKX_32-NEXT: .cfi_def_cfa_offset 8
				; SKX_32-NEXT: .Lcfi3:
	; SKX_32-NEXT: .cfi_offset %ebp, -8			; SKX_32-NEXT: .cfi_offset %ebp, -8
	; SKX_32-NEXT: movl %esp, %ebp			; SKX_32-NEXT: movl %esp, %ebp
	; SKX_32-NEXT: .Lcfi3:			; SKX_32-NEXT: .Lcfi4:
	; SKX_32-NEXT: .cfi_def_cfa_register %ebp			; SKX_32-NEXT: .cfi_def_cfa_register %ebp
	; SKX_32-NEXT: andl $-64, %esp			; SKX_32-NEXT: andl $-64, %esp
	; SKX_32-NEXT: subl $64, %esp			; SKX_32-NEXT: subl $64, %esp
	; SKX_32-NEXT: vpmovsxbd %xmm1, %zmm1			; SKX_32-NEXT: vpmovsxbd %xmm1, %zmm1
	; SKX_32-NEXT: vpslld $31, %zmm1, %zmm1			; SKX_32-NEXT: vpslld $31, %zmm1, %zmm1
	; SKX_32-NEXT: vptestmd %zmm1, %zmm1, %k1			; SKX_32-NEXT: vptestmd %zmm1, %zmm1, %k1
	; SKX_32-NEXT: vmovdqa64 8(%ebp), %zmm1			; SKX_32-NEXT: vmovdqa64 8(%ebp), %zmm1
	; SKX_32-NEXT: kshiftrw $8, %k1, %k2			; SKX_32-NEXT: kshiftrw $8, %k1, %k2
	; SKX_32-NEXT: vpgatherdq (,%ymm0), %zmm2 {%k1}			; SKX_32-NEXT: vpgatherdq (,%ymm0), %zmm2 {%k1}
	; SKX_32-NEXT: vextracti32x8 $1, %zmm0, %ymm0			; SKX_32-NEXT: vextracti32x8 $1, %zmm0, %ymm0
	; SKX_32-NEXT: vpgatherdq (,%ymm0), %zmm1 {%k2}			; SKX_32-NEXT: vpgatherdq (,%ymm0), %zmm1 {%k2}
	; SKX_32-NEXT: vmovdqa64 %zmm2, %zmm0			; SKX_32-NEXT: vmovdqa64 %zmm2, %zmm0
	; SKX_32-NEXT: movl %ebp, %esp			; SKX_32-NEXT: movl %ebp, %esp
	; SKX_32-NEXT: popl %ebp			; SKX_32-NEXT: popl %ebp
				; SKX_32-NEXT: .Lcfi5:
				; SKX_32-NEXT: .cfi_def_cfa %esp, 4
	; SKX_32-NEXT: retl			; SKX_32-NEXT: retl
	%res = call <16 x i64> @llvm.masked.gather.v16i64.v16p0i64(<16 x i64*> %ptrs, i32 4, <16 x i1> %mask, <16 x i64> %src0)			%res = call <16 x i64> @llvm.masked.gather.v16i64.v16p0i64(<16 x i64*> %ptrs, i32 4, <16 x i1> %mask, <16 x i64> %src0)
	ret <16 x i64> %res			ret <16 x i64> %res
	}			}
	declare <16 x i64> @llvm.masked.gather.v16i64.v16p0i64(<16 x i64*> %ptrs, i32, <16 x i1> %mask, <16 x i64> %src0)			declare <16 x i64> @llvm.masked.gather.v16i64.v16p0i64(<16 x i64*> %ptrs, i32, <16 x i1> %mask, <16 x i64> %src0)
	define <16 x float> @test_gather_16f32(<16 x float*> %ptrs, <16 x i1> %mask, <16 x float> %src0) {			define <16 x float> @test_gather_16f32(<16 x float*> %ptrs, <16 x i1> %mask, <16 x float> %src0) {
	; KNL_64-LABEL: test_gather_16f32:			; KNL_64-LABEL: test_gather_16f32:
	; KNL_64: # BB#0:			; KNL_64: # BB#0:
	[50 lines not shown]
	; KNL_64-NEXT: vgatherqpd (,%zmm1), %zmm4 {%k2}			; KNL_64-NEXT: vgatherqpd (,%zmm1), %zmm4 {%k2}
	; KNL_64-NEXT: vmovapd %zmm3, %zmm0			; KNL_64-NEXT: vmovapd %zmm3, %zmm0
	; KNL_64-NEXT: vmovapd %zmm4, %zmm1			; KNL_64-NEXT: vmovapd %zmm4, %zmm1
	; KNL_64-NEXT: retq			; KNL_64-NEXT: retq
	;			;
	; KNL_32-LABEL: test_gather_16f64:			; KNL_32-LABEL: test_gather_16f64:
	; KNL_32: # BB#0:			; KNL_32: # BB#0:
	; KNL_32-NEXT: pushl %ebp			; KNL_32-NEXT: pushl %ebp
	; KNL_32-NEXT: .Lcfi3:
	; KNL_32-NEXT: .cfi_def_cfa_offset 8
	; KNL_32-NEXT: .Lcfi4:			; KNL_32-NEXT: .Lcfi4:
				; KNL_32-NEXT: .cfi_def_cfa_offset 8
				; KNL_32-NEXT: .Lcfi5:
	; KNL_32-NEXT: .cfi_offset %ebp, -8			; KNL_32-NEXT: .cfi_offset %ebp, -8
	; KNL_32-NEXT: movl %esp, %ebp			; KNL_32-NEXT: movl %esp, %ebp
	; KNL_32-NEXT: .Lcfi5:			; KNL_32-NEXT: .Lcfi6:
	; KNL_32-NEXT: .cfi_def_cfa_register %ebp			; KNL_32-NEXT: .cfi_def_cfa_register %ebp
	; KNL_32-NEXT: andl $-64, %esp			; KNL_32-NEXT: andl $-64, %esp
	; KNL_32-NEXT: subl $64, %esp			; KNL_32-NEXT: subl $64, %esp
	; KNL_32-NEXT: vpmovsxbd %xmm1, %zmm1			; KNL_32-NEXT: vpmovsxbd %xmm1, %zmm1
	; KNL_32-NEXT: vpslld $31, %zmm1, %zmm1			; KNL_32-NEXT: vpslld $31, %zmm1, %zmm1
	; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1			; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1
	; KNL_32-NEXT: vmovapd 8(%ebp), %zmm1			; KNL_32-NEXT: vmovapd 8(%ebp), %zmm1
	; KNL_32-NEXT: kshiftrw $8, %k1, %k2			; KNL_32-NEXT: kshiftrw $8, %k1, %k2
	; KNL_32-NEXT: vgatherdpd (,%ymm0), %zmm2 {%k1}			; KNL_32-NEXT: vgatherdpd (,%ymm0), %zmm2 {%k1}
	; KNL_32-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; KNL_32-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; KNL_32-NEXT: vgatherdpd (,%ymm0), %zmm1 {%k2}			; KNL_32-NEXT: vgatherdpd (,%ymm0), %zmm1 {%k2}
	; KNL_32-NEXT: vmovapd %zmm2, %zmm0			; KNL_32-NEXT: vmovapd %zmm2, %zmm0
	; KNL_32-NEXT: movl %ebp, %esp			; KNL_32-NEXT: movl %ebp, %esp
	; KNL_32-NEXT: popl %ebp			; KNL_32-NEXT: popl %ebp
				; KNL_32-NEXT: .Lcfi7:
				; KNL_32-NEXT: .cfi_def_cfa %esp, 4
	; KNL_32-NEXT: retl			; KNL_32-NEXT: retl
	;			;
	; SKX-LABEL: test_gather_16f64:			; SKX-LABEL: test_gather_16f64:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vpmovsxbd %xmm2, %zmm2			; SKX-NEXT: vpmovsxbd %xmm2, %zmm2
	; SKX-NEXT: vpslld $31, %zmm2, %zmm2			; SKX-NEXT: vpslld $31, %zmm2, %zmm2
	; SKX-NEXT: vptestmd %zmm2, %zmm2, %k1			; SKX-NEXT: vptestmd %zmm2, %zmm2, %k1
	; SKX-NEXT: kshiftrw $8, %k1, %k2			; SKX-NEXT: kshiftrw $8, %k1, %k2
	; SKX-NEXT: vgatherqpd (,%zmm0), %zmm3 {%k1}			; SKX-NEXT: vgatherqpd (,%zmm0), %zmm3 {%k1}
	; SKX-NEXT: vgatherqpd (,%zmm1), %zmm4 {%k2}			; SKX-NEXT: vgatherqpd (,%zmm1), %zmm4 {%k2}
	; SKX-NEXT: vmovapd %zmm3, %zmm0			; SKX-NEXT: vmovapd %zmm3, %zmm0
	; SKX-NEXT: vmovapd %zmm4, %zmm1			; SKX-NEXT: vmovapd %zmm4, %zmm1
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; SKX_32-LABEL: test_gather_16f64:			; SKX_32-LABEL: test_gather_16f64:
	; SKX_32: # BB#0:			; SKX_32: # BB#0:
	; SKX_32-NEXT: pushl %ebp			; SKX_32-NEXT: pushl %ebp
	; SKX_32-NEXT: .Lcfi4:			; SKX_32-NEXT: .Lcfi6:
	; SKX_32-NEXT: .cfi_def_cfa_offset 8			; SKX_32-NEXT: .cfi_def_cfa_offset 8
	; SKX_32-NEXT: .Lcfi5:			; SKX_32-NEXT: .Lcfi7:
	; SKX_32-NEXT: .cfi_offset %ebp, -8			; SKX_32-NEXT: .cfi_offset %ebp, -8
	; SKX_32-NEXT: movl %esp, %ebp			; SKX_32-NEXT: movl %esp, %ebp
	; SKX_32-NEXT: .Lcfi6:			; SKX_32-NEXT: .Lcfi8:
	; SKX_32-NEXT: .cfi_def_cfa_register %ebp			; SKX_32-NEXT: .cfi_def_cfa_register %ebp
	; SKX_32-NEXT: andl $-64, %esp			; SKX_32-NEXT: andl $-64, %esp
	; SKX_32-NEXT: subl $64, %esp			; SKX_32-NEXT: subl $64, %esp
	; SKX_32-NEXT: vpmovsxbd %xmm1, %zmm1			; SKX_32-NEXT: vpmovsxbd %xmm1, %zmm1
	; SKX_32-NEXT: vpslld $31, %zmm1, %zmm1			; SKX_32-NEXT: vpslld $31, %zmm1, %zmm1
	; SKX_32-NEXT: vptestmd %zmm1, %zmm1, %k1			; SKX_32-NEXT: vptestmd %zmm1, %zmm1, %k1
	; SKX_32-NEXT: vmovapd 8(%ebp), %zmm1			; SKX_32-NEXT: vmovapd 8(%ebp), %zmm1
	; SKX_32-NEXT: kshiftrw $8, %k1, %k2			; SKX_32-NEXT: kshiftrw $8, %k1, %k2
	; SKX_32-NEXT: vgatherdpd (,%ymm0), %zmm2 {%k1}			; SKX_32-NEXT: vgatherdpd (,%ymm0), %zmm2 {%k1}
	; SKX_32-NEXT: vextracti32x8 $1, %zmm0, %ymm0			; SKX_32-NEXT: vextracti32x8 $1, %zmm0, %ymm0
	; SKX_32-NEXT: vgatherdpd (,%ymm0), %zmm1 {%k2}			; SKX_32-NEXT: vgatherdpd (,%ymm0), %zmm1 {%k2}
	; SKX_32-NEXT: vmovapd %zmm2, %zmm0			; SKX_32-NEXT: vmovapd %zmm2, %zmm0
	; SKX_32-NEXT: movl %ebp, %esp			; SKX_32-NEXT: movl %ebp, %esp
	; SKX_32-NEXT: popl %ebp			; SKX_32-NEXT: popl %ebp
				; SKX_32-NEXT: .Lcfi9:
				; SKX_32-NEXT: .cfi_def_cfa %esp, 4
	; SKX_32-NEXT: retl			; SKX_32-NEXT: retl
	%res = call <16 x double> @llvm.masked.gather.v16f64.v16p0f64(<16 x double*> %ptrs, i32 4, <16 x i1> %mask, <16 x double> %src0)			%res = call <16 x double> @llvm.masked.gather.v16f64.v16p0f64(<16 x double*> %ptrs, i32 4, <16 x i1> %mask, <16 x double> %src0)
	ret <16 x double> %res			ret <16 x double> %res
	}			}
	declare <16 x double> @llvm.masked.gather.v16f64.v16p0f64(<16 x double*> %ptrs, i32, <16 x i1> %mask, <16 x double> %src0)			declare <16 x double> @llvm.masked.gather.v16f64.v16p0f64(<16 x double*> %ptrs, i32, <16 x i1> %mask, <16 x double> %src0)
	define void @test_scatter_16i32(<16 x i32*> %ptrs, <16 x i1> %mask, <16 x i32> %src0) {			define void @test_scatter_16i32(<16 x i32*> %ptrs, <16 x i1> %mask, <16 x i32> %src0) {
	; KNL_64-LABEL: test_scatter_16i32:			; KNL_64-LABEL: test_scatter_16i32:
	; KNL_64: # BB#0:			; KNL_64: # BB#0:
	[49 lines hidden]
	; KNL_64-NEXT: vpscatterqq %zmm3, (,%zmm0) {%k1}			; KNL_64-NEXT: vpscatterqq %zmm3, (,%zmm0) {%k1}
	; KNL_64-NEXT: vpscatterqq %zmm4, (,%zmm1) {%k2}			; KNL_64-NEXT: vpscatterqq %zmm4, (,%zmm1) {%k2}
	; KNL_64-NEXT: vzeroupper			; KNL_64-NEXT: vzeroupper
	; KNL_64-NEXT: retq			; KNL_64-NEXT: retq
	;			;
	; KNL_32-LABEL: test_scatter_16i64:			; KNL_32-LABEL: test_scatter_16i64:
	; KNL_32: # BB#0:			; KNL_32: # BB#0:
	; KNL_32-NEXT: pushl %ebp			; KNL_32-NEXT: pushl %ebp
	; KNL_32-NEXT: .Lcfi6:			; KNL_32-NEXT: .Lcfi8:
	; KNL_32-NEXT: .cfi_def_cfa_offset 8			; KNL_32-NEXT: .cfi_def_cfa_offset 8
	; KNL_32-NEXT: .Lcfi7:			; KNL_32-NEXT: .Lcfi9:
	; KNL_32-NEXT: .cfi_offset %ebp, -8			; KNL_32-NEXT: .cfi_offset %ebp, -8
	; KNL_32-NEXT: movl %esp, %ebp			; KNL_32-NEXT: movl %esp, %ebp
	; KNL_32-NEXT: .Lcfi8:			; KNL_32-NEXT: .Lcfi10:
	; KNL_32-NEXT: .cfi_def_cfa_register %ebp			; KNL_32-NEXT: .cfi_def_cfa_register %ebp
	; KNL_32-NEXT: andl $-64, %esp			; KNL_32-NEXT: andl $-64, %esp
	; KNL_32-NEXT: subl $64, %esp			; KNL_32-NEXT: subl $64, %esp
	; KNL_32-NEXT: vpmovsxbd %xmm1, %zmm1			; KNL_32-NEXT: vpmovsxbd %xmm1, %zmm1
	; KNL_32-NEXT: vpslld $31, %zmm1, %zmm1			; KNL_32-NEXT: vpslld $31, %zmm1, %zmm1
	; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1			; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1
	; KNL_32-NEXT: vmovdqa64 8(%ebp), %zmm1			; KNL_32-NEXT: vmovdqa64 8(%ebp), %zmm1
	; KNL_32-NEXT: kshiftrw $8, %k1, %k2			; KNL_32-NEXT: kshiftrw $8, %k1, %k2
	; KNL_32-NEXT: vpscatterdq %zmm2, (,%ymm0) {%k1}			; KNL_32-NEXT: vpscatterdq %zmm2, (,%ymm0) {%k1}
	; KNL_32-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; KNL_32-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; KNL_32-NEXT: vpscatterdq %zmm1, (,%ymm0) {%k2}			; KNL_32-NEXT: vpscatterdq %zmm1, (,%ymm0) {%k2}
	; KNL_32-NEXT: movl %ebp, %esp			; KNL_32-NEXT: movl %ebp, %esp
	; KNL_32-NEXT: popl %ebp			; KNL_32-NEXT: popl %ebp
				; KNL_32-NEXT: .Lcfi11:
				; KNL_32-NEXT: .cfi_def_cfa %esp, 4
	; KNL_32-NEXT: vzeroupper			; KNL_32-NEXT: vzeroupper
	; KNL_32-NEXT: retl			; KNL_32-NEXT: retl
	;			;
	; SKX-LABEL: test_scatter_16i64:			; SKX-LABEL: test_scatter_16i64:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vpmovsxbd %xmm2, %zmm2			; SKX-NEXT: vpmovsxbd %xmm2, %zmm2
	; SKX-NEXT: vpslld $31, %zmm2, %zmm2			; SKX-NEXT: vpslld $31, %zmm2, %zmm2
	; SKX-NEXT: vptestmd %zmm2, %zmm2, %k1			; SKX-NEXT: vptestmd %zmm2, %zmm2, %k1
	; SKX-NEXT: kshiftrw $8, %k1, %k2			; SKX-NEXT: kshiftrw $8, %k1, %k2
	; SKX-NEXT: vpscatterqq %zmm3, (,%zmm0) {%k1}			; SKX-NEXT: vpscatterqq %zmm3, (,%zmm0) {%k1}
	; SKX-NEXT: vpscatterqq %zmm4, (,%zmm1) {%k2}			; SKX-NEXT: vpscatterqq %zmm4, (,%zmm1) {%k2}
	; SKX-NEXT: vzeroupper			; SKX-NEXT: vzeroupper
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; SKX_32-LABEL: test_scatter_16i64:			; SKX_32-LABEL: test_scatter_16i64:
	; SKX_32: # BB#0:			; SKX_32: # BB#0:
	; SKX_32-NEXT: pushl %ebp			; SKX_32-NEXT: pushl %ebp
	; SKX_32-NEXT: .Lcfi7:			; SKX_32-NEXT: .Lcfi10:
	; SKX_32-NEXT: .cfi_def_cfa_offset 8			; SKX_32-NEXT: .cfi_def_cfa_offset 8
	; SKX_32-NEXT: .Lcfi8:			; SKX_32-NEXT: .Lcfi11:
	; SKX_32-NEXT: .cfi_offset %ebp, -8			; SKX_32-NEXT: .cfi_offset %ebp, -8
	; SKX_32-NEXT: movl %esp, %ebp			; SKX_32-NEXT: movl %esp, %ebp
	; SKX_32-NEXT: .Lcfi9:			; SKX_32-NEXT: .Lcfi12:
	; SKX_32-NEXT: .cfi_def_cfa_register %ebp			; SKX_32-NEXT: .cfi_def_cfa_register %ebp
	; SKX_32-NEXT: andl $-64, %esp			; SKX_32-NEXT: andl $-64, %esp
	; SKX_32-NEXT: subl $64, %esp			; SKX_32-NEXT: subl $64, %esp
	; SKX_32-NEXT: vpmovsxbd %xmm1, %zmm1			; SKX_32-NEXT: vpmovsxbd %xmm1, %zmm1
	; SKX_32-NEXT: vpslld $31, %zmm1, %zmm1			; SKX_32-NEXT: vpslld $31, %zmm1, %zmm1
	; SKX_32-NEXT: vptestmd %zmm1, %zmm1, %k1			; SKX_32-NEXT: vptestmd %zmm1, %zmm1, %k1
	; SKX_32-NEXT: vmovdqa64 8(%ebp), %zmm1			; SKX_32-NEXT: vmovdqa64 8(%ebp), %zmm1
	; SKX_32-NEXT: kshiftrw $8, %k1, %k2			; SKX_32-NEXT: kshiftrw $8, %k1, %k2
	; SKX_32-NEXT: vpscatterdq %zmm2, (,%ymm0) {%k1}			; SKX_32-NEXT: vpscatterdq %zmm2, (,%ymm0) {%k1}
	; SKX_32-NEXT: vextracti32x8 $1, %zmm0, %ymm0			; SKX_32-NEXT: vextracti32x8 $1, %zmm0, %ymm0
	; SKX_32-NEXT: vpscatterdq %zmm1, (,%ymm0) {%k2}			; SKX_32-NEXT: vpscatterdq %zmm1, (,%ymm0) {%k2}
	; SKX_32-NEXT: movl %ebp, %esp			; SKX_32-NEXT: movl %ebp, %esp
	; SKX_32-NEXT: popl %ebp			; SKX_32-NEXT: popl %ebp
				; SKX_32-NEXT: .Lcfi13:
				; SKX_32-NEXT: .cfi_def_cfa %esp, 4
	; SKX_32-NEXT: vzeroupper			; SKX_32-NEXT: vzeroupper
	; SKX_32-NEXT: retl			; SKX_32-NEXT: retl
	call void @llvm.masked.scatter.v16i64.v16p0i64(<16 x i64> %src0, <16 x i64*> %ptrs, i32 4, <16 x i1> %mask)			call void @llvm.masked.scatter.v16i64.v16p0i64(<16 x i64> %src0, <16 x i64*> %ptrs, i32 4, <16 x i1> %mask)
	ret void			ret void
	}			}
	declare void @llvm.masked.scatter.v16i64.v16p0i64(<16 x i64> %src0, <16 x i64*> %ptrs, i32, <16 x i1> %mask)			declare void @llvm.masked.scatter.v16i64.v16p0i64(<16 x i64> %src0, <16 x i64*> %ptrs, i32, <16 x i1> %mask)
	define void @test_scatter_16f32(<16 x float*> %ptrs, <16 x i1> %mask, <16 x float> %src0) {			define void @test_scatter_16f32(<16 x float*> %ptrs, <16 x i1> %mask, <16 x float> %src0) {
	; KNL_64-LABEL: test_scatter_16f32:			; KNL_64-LABEL: test_scatter_16f32:
	[51 lines hidden]
	; KNL_64-NEXT: vscatterqpd %zmm3, (,%zmm0) {%k1}			; KNL_64-NEXT: vscatterqpd %zmm3, (,%zmm0) {%k1}
	; KNL_64-NEXT: vscatterqpd %zmm4, (,%zmm1) {%k2}			; KNL_64-NEXT: vscatterqpd %zmm4, (,%zmm1) {%k2}
	; KNL_64-NEXT: vzeroupper			; KNL_64-NEXT: vzeroupper
	; KNL_64-NEXT: retq			; KNL_64-NEXT: retq
	;			;
	; KNL_32-LABEL: test_scatter_16f64:			; KNL_32-LABEL: test_scatter_16f64:
	; KNL_32: # BB#0:			; KNL_32: # BB#0:
	; KNL_32-NEXT: pushl %ebp			; KNL_32-NEXT: pushl %ebp
	; KNL_32-NEXT: .Lcfi9:			; KNL_32-NEXT: .Lcfi12:
	; KNL_32-NEXT: .cfi_def_cfa_offset 8			; KNL_32-NEXT: .cfi_def_cfa_offset 8
	; KNL_32-NEXT: .Lcfi10:			; KNL_32-NEXT: .Lcfi13:
	; KNL_32-NEXT: .cfi_offset %ebp, -8			; KNL_32-NEXT: .cfi_offset %ebp, -8
	; KNL_32-NEXT: movl %esp, %ebp			; KNL_32-NEXT: movl %esp, %ebp
	; KNL_32-NEXT: .Lcfi11:			; KNL_32-NEXT: .Lcfi14:
	; KNL_32-NEXT: .cfi_def_cfa_register %ebp			; KNL_32-NEXT: .cfi_def_cfa_register %ebp
	; KNL_32-NEXT: andl $-64, %esp			; KNL_32-NEXT: andl $-64, %esp
	; KNL_32-NEXT: subl $64, %esp			; KNL_32-NEXT: subl $64, %esp
	; KNL_32-NEXT: vpmovsxbd %xmm1, %zmm1			; KNL_32-NEXT: vpmovsxbd %xmm1, %zmm1
	; KNL_32-NEXT: vpslld $31, %zmm1, %zmm1			; KNL_32-NEXT: vpslld $31, %zmm1, %zmm1
	; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1			; KNL_32-NEXT: vptestmd %zmm1, %zmm1, %k1
	; KNL_32-NEXT: vmovapd 8(%ebp), %zmm1			; KNL_32-NEXT: vmovapd 8(%ebp), %zmm1
	; KNL_32-NEXT: kshiftrw $8, %k1, %k2			; KNL_32-NEXT: kshiftrw $8, %k1, %k2
	; KNL_32-NEXT: vscatterdpd %zmm2, (,%ymm0) {%k1}			; KNL_32-NEXT: vscatterdpd %zmm2, (,%ymm0) {%k1}
	; KNL_32-NEXT: vextracti64x4 $1, %zmm0, %ymm0			; KNL_32-NEXT: vextracti64x4 $1, %zmm0, %ymm0
	; KNL_32-NEXT: vscatterdpd %zmm1, (,%ymm0) {%k2}			; KNL_32-NEXT: vscatterdpd %zmm1, (,%ymm0) {%k2}
	; KNL_32-NEXT: movl %ebp, %esp			; KNL_32-NEXT: movl %ebp, %esp
	; KNL_32-NEXT: popl %ebp			; KNL_32-NEXT: popl %ebp
				; KNL_32-NEXT: .Lcfi15:
				; KNL_32-NEXT: .cfi_def_cfa %esp, 4
	; KNL_32-NEXT: vzeroupper			; KNL_32-NEXT: vzeroupper
	; KNL_32-NEXT: retl			; KNL_32-NEXT: retl
	;			;
	; SKX-LABEL: test_scatter_16f64:			; SKX-LABEL: test_scatter_16f64:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vpmovsxbd %xmm2, %zmm2			; SKX-NEXT: vpmovsxbd %xmm2, %zmm2
	; SKX-NEXT: vpslld $31, %zmm2, %zmm2			; SKX-NEXT: vpslld $31, %zmm2, %zmm2
	; SKX-NEXT: vptestmd %zmm2, %zmm2, %k1			; SKX-NEXT: vptestmd %zmm2, %zmm2, %k1
	; SKX-NEXT: kshiftrw $8, %k1, %k2			; SKX-NEXT: kshiftrw $8, %k1, %k2
	; SKX-NEXT: vscatterqpd %zmm3, (,%zmm0) {%k1}			; SKX-NEXT: vscatterqpd %zmm3, (,%zmm0) {%k1}
	; SKX-NEXT: vscatterqpd %zmm4, (,%zmm1) {%k2}			; SKX-NEXT: vscatterqpd %zmm4, (,%zmm1) {%k2}
	; SKX-NEXT: vzeroupper			; SKX-NEXT: vzeroupper
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; SKX_32-LABEL: test_scatter_16f64:			; SKX_32-LABEL: test_scatter_16f64:
	; SKX_32: # BB#0:			; SKX_32: # BB#0:
	; SKX_32-NEXT: pushl %ebp			; SKX_32-NEXT: pushl %ebp
	; SKX_32-NEXT: .Lcfi10:			; SKX_32-NEXT: .Lcfi14:
	; SKX_32-NEXT: .cfi_def_cfa_offset 8			; SKX_32-NEXT: .cfi_def_cfa_offset 8
	; SKX_32-NEXT: .Lcfi11:			; SKX_32-NEXT: .Lcfi15:
	; SKX_32-NEXT: .cfi_offset %ebp, -8			; SKX_32-NEXT: .cfi_offset %ebp, -8
	; SKX_32-NEXT: movl %esp, %ebp			; SKX_32-NEXT: movl %esp, %ebp
	; SKX_32-NEXT: .Lcfi12:			; SKX_32-NEXT: .Lcfi16:
	; SKX_32-NEXT: .cfi_def_cfa_register %ebp			; SKX_32-NEXT: .cfi_def_cfa_register %ebp
	; SKX_32-NEXT: andl $-64, %esp			; SKX_32-NEXT: andl $-64, %esp
	; SKX_32-NEXT: subl $64, %esp			; SKX_32-NEXT: subl $64, %esp
	; SKX_32-NEXT: vpmovsxbd %xmm1, %zmm1			; SKX_32-NEXT: vpmovsxbd %xmm1, %zmm1
	; SKX_32-NEXT: vpslld $31, %zmm1, %zmm1			; SKX_32-NEXT: vpslld $31, %zmm1, %zmm1
	; SKX_32-NEXT: vptestmd %zmm1, %zmm1, %k1			; SKX_32-NEXT: vptestmd %zmm1, %zmm1, %k1
	; SKX_32-NEXT: vmovapd 8(%ebp), %zmm1			; SKX_32-NEXT: vmovapd 8(%ebp), %zmm1
	; SKX_32-NEXT: kshiftrw $8, %k1, %k2			; SKX_32-NEXT: kshiftrw $8, %k1, %k2
	; SKX_32-NEXT: vscatterdpd %zmm2, (,%ymm0) {%k1}			; SKX_32-NEXT: vscatterdpd %zmm2, (,%ymm0) {%k1}
	; SKX_32-NEXT: vextracti32x8 $1, %zmm0, %ymm0			; SKX_32-NEXT: vextracti32x8 $1, %zmm0, %ymm0
	; SKX_32-NEXT: vscatterdpd %zmm1, (,%ymm0) {%k2}			; SKX_32-NEXT: vscatterdpd %zmm1, (,%ymm0) {%k2}
	; SKX_32-NEXT: movl %ebp, %esp			; SKX_32-NEXT: movl %ebp, %esp
	; SKX_32-NEXT: popl %ebp			; SKX_32-NEXT: popl %ebp
				; SKX_32-NEXT: .Lcfi17:
				; SKX_32-NEXT: .cfi_def_cfa %esp, 4
	; SKX_32-NEXT: vzeroupper			; SKX_32-NEXT: vzeroupper
	; SKX_32-NEXT: retl			; SKX_32-NEXT: retl
	call void @llvm.masked.scatter.v16f64.v16p0f64(<16 x double> %src0, <16 x double*> %ptrs, i32 4, <16 x i1> %mask)			call void @llvm.masked.scatter.v16f64.v16p0f64(<16 x double> %src0, <16 x double*> %ptrs, i32 4, <16 x i1> %mask)
	ret void			ret void
	}			}
	declare void @llvm.masked.scatter.v16f64.v16p0f64(<16 x double> %src0, <16 x double*> %ptrs, i32, <16 x i1> %mask)			declare void @llvm.masked.scatter.v16f64.v16p0f64(<16 x double> %src0, <16 x double*> %ptrs, i32, <16 x i1> %mask)

	define <4 x i64> @test_pr28312(<4 x i64*> %p1, <4 x i1> %k, <4 x i1> %k2,<4 x i64> %d) {			define <4 x i64> @test_pr28312(<4 x i64*> %p1, <4 x i1> %k, <4 x i1> %k2,<4 x i64> %d) {
	[10 lines hidden]
	; KNL_64-NEXT: vpgatherqq (,%zmm0), %zmm1 {%k1}			; KNL_64-NEXT: vpgatherqq (,%zmm0), %zmm1 {%k1}
	; KNL_64-NEXT: vpaddq %ymm1, %ymm1, %ymm0			; KNL_64-NEXT: vpaddq %ymm1, %ymm1, %ymm0
	; KNL_64-NEXT: vpaddq %ymm0, %ymm1, %ymm0			; KNL_64-NEXT: vpaddq %ymm0, %ymm1, %ymm0
	; KNL_64-NEXT: retq			; KNL_64-NEXT: retq
	;			;
	; KNL_32-LABEL: test_pr28312:			; KNL_32-LABEL: test_pr28312:
	; KNL_32: # BB#0:			; KNL_32: # BB#0:
	; KNL_32-NEXT: pushl %ebp			; KNL_32-NEXT: pushl %ebp
	; KNL_32-NEXT: .Lcfi12:			; KNL_32-NEXT: .Lcfi16:
	; KNL_32-NEXT: .cfi_def_cfa_offset 8			; KNL_32-NEXT: .cfi_def_cfa_offset 8
	; KNL_32-NEXT: .Lcfi13:			; KNL_32-NEXT: .Lcfi17:
	; KNL_32-NEXT: .cfi_offset %ebp, -8			; KNL_32-NEXT: .cfi_offset %ebp, -8
	; KNL_32-NEXT: movl %esp, %ebp			; KNL_32-NEXT: movl %esp, %ebp
	; KNL_32-NEXT: .Lcfi14:			; KNL_32-NEXT: .Lcfi18:
	; KNL_32-NEXT: .cfi_def_cfa_register %ebp			; KNL_32-NEXT: .cfi_def_cfa_register %ebp
	; KNL_32-NEXT: andl $-32, %esp			; KNL_32-NEXT: andl $-32, %esp
	; KNL_32-NEXT: subl $32, %esp			; KNL_32-NEXT: subl $32, %esp
	; KNL_32-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>			; KNL_32-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>
	; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1			; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
	; KNL_32-NEXT: vpsrad $31, %xmm1, %xmm1			; KNL_32-NEXT: vpsrad $31, %xmm1, %xmm1
	; KNL_32-NEXT: vpmovsxdq %xmm1, %ymm1			; KNL_32-NEXT: vpmovsxdq %xmm1, %ymm1
	; KNL_32-NEXT: vpxord %zmm2, %zmm2, %zmm2			; KNL_32-NEXT: vpxord %zmm2, %zmm2, %zmm2
	; KNL_32-NEXT: vinserti64x4 $0, %ymm1, %zmm2, %zmm1			; KNL_32-NEXT: vinserti64x4 $0, %ymm1, %zmm2, %zmm1
	; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0			; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
	; KNL_32-NEXT: vpsllq $63, %zmm1, %zmm1			; KNL_32-NEXT: vpsllq $63, %zmm1, %zmm1
	; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1			; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
	; KNL_32-NEXT: vpgatherqq (,%zmm0), %zmm1 {%k1}			; KNL_32-NEXT: vpgatherqq (,%zmm0), %zmm1 {%k1}
	; KNL_32-NEXT: vpaddq %ymm1, %ymm1, %ymm0			; KNL_32-NEXT: vpaddq %ymm1, %ymm1, %ymm0
	; KNL_32-NEXT: vpaddq %ymm0, %ymm1, %ymm0			; KNL_32-NEXT: vpaddq %ymm0, %ymm1, %ymm0
	; KNL_32-NEXT: movl %ebp, %esp			; KNL_32-NEXT: movl %ebp, %esp
	; KNL_32-NEXT: popl %ebp			; KNL_32-NEXT: popl %ebp
				; KNL_32-NEXT: .Lcfi19:
				; KNL_32-NEXT: .cfi_def_cfa %esp, 4
	; KNL_32-NEXT: retl			; KNL_32-NEXT: retl
	;			;
	; SKX-LABEL: test_pr28312:			; SKX-LABEL: test_pr28312:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vpslld $31, %xmm1, %xmm1			; SKX-NEXT: vpslld $31, %xmm1, %xmm1
	; SKX-NEXT: vptestmd %xmm1, %xmm1, %k1			; SKX-NEXT: vptestmd %xmm1, %xmm1, %k1
	; SKX-NEXT: vpgatherqq (,%ymm0), %ymm1 {%k1}			; SKX-NEXT: vpgatherqq (,%ymm0), %ymm1 {%k1}
	; SKX-NEXT: vpaddq %ymm1, %ymm1, %ymm0			; SKX-NEXT: vpaddq %ymm1, %ymm1, %ymm0
	; SKX-NEXT: vpaddq %ymm0, %ymm1, %ymm0			; SKX-NEXT: vpaddq %ymm0, %ymm1, %ymm0
	; SKX-NEXT: retq			; SKX-NEXT: retq
	;			;
	; SKX_32-LABEL: test_pr28312:			; SKX_32-LABEL: test_pr28312:
	; SKX_32: # BB#0:			; SKX_32: # BB#0:
	; SKX_32-NEXT: pushl %ebp			; SKX_32-NEXT: pushl %ebp
	; SKX_32-NEXT: .Lcfi13:			; SKX_32-NEXT: .Lcfi18:
	; SKX_32-NEXT: .cfi_def_cfa_offset 8			; SKX_32-NEXT: .cfi_def_cfa_offset 8
	; SKX_32-NEXT: .Lcfi14:			; SKX_32-NEXT: .Lcfi19:
	; SKX_32-NEXT: .cfi_offset %ebp, -8			; SKX_32-NEXT: .cfi_offset %ebp, -8
	; SKX_32-NEXT: movl %esp, %ebp			; SKX_32-NEXT: movl %esp, %ebp
	; SKX_32-NEXT: .Lcfi15:			; SKX_32-NEXT: .Lcfi20:
	; SKX_32-NEXT: .cfi_def_cfa_register %ebp			; SKX_32-NEXT: .cfi_def_cfa_register %ebp
	; SKX_32-NEXT: andl $-32, %esp			; SKX_32-NEXT: andl $-32, %esp
	; SKX_32-NEXT: subl $32, %esp			; SKX_32-NEXT: subl $32, %esp
	; SKX_32-NEXT: vpslld $31, %xmm1, %xmm1			; SKX_32-NEXT: vpslld $31, %xmm1, %xmm1
	; SKX_32-NEXT: vptestmd %xmm1, %xmm1, %k1			; SKX_32-NEXT: vptestmd %xmm1, %xmm1, %k1
	; SKX_32-NEXT: vpgatherdq (,%xmm0), %ymm1 {%k1}			; SKX_32-NEXT: vpgatherdq (,%xmm0), %ymm1 {%k1}
	; SKX_32-NEXT: vpaddq %ymm1, %ymm1, %ymm0			; SKX_32-NEXT: vpaddq %ymm1, %ymm1, %ymm0
	; SKX_32-NEXT: vpaddq %ymm0, %ymm1, %ymm0			; SKX_32-NEXT: vpaddq %ymm0, %ymm1, %ymm0
	; SKX_32-NEXT: movl %ebp, %esp			; SKX_32-NEXT: movl %ebp, %esp
	; SKX_32-NEXT: popl %ebp			; SKX_32-NEXT: popl %ebp
				; SKX_32-NEXT: .Lcfi21:
				; SKX_32-NEXT: .cfi_def_cfa %esp, 4
	; SKX_32-NEXT: retl			; SKX_32-NEXT: retl
	%g1 = call <4 x i64> @llvm.masked.gather.v4i64.v4p0i64(<4 x i64*> %p1, i32 8, <4 x i1> %k, <4 x i64> undef)			%g1 = call <4 x i64> @llvm.masked.gather.v4i64.v4p0i64(<4 x i64*> %p1, i32 8, <4 x i1> %k, <4 x i64> undef)
	%g2 = call <4 x i64> @llvm.masked.gather.v4i64.v4p0i64(<4 x i64*> %p1, i32 8, <4 x i1> %k, <4 x i64> undef)			%g2 = call <4 x i64> @llvm.masked.gather.v4i64.v4p0i64(<4 x i64*> %p1, i32 8, <4 x i1> %k, <4 x i64> undef)
	%g3 = call <4 x i64> @llvm.masked.gather.v4i64.v4p0i64(<4 x i64*> %p1, i32 8, <4 x i1> %k, <4 x i64> undef)			%g3 = call <4 x i64> @llvm.masked.gather.v4i64.v4p0i64(<4 x i64*> %p1, i32 8, <4 x i1> %k, <4 x i64> undef)
	%a = add <4 x i64> %g1, %g2			%a = add <4 x i64> %g1, %g2
	%b = add <4 x i64> %a, %g3			%b = add <4 x i64> %a, %g3
	ret <4 x i64> %b			ret <4 x i64> %b
	}			}
	declare <4 x i64> @llvm.masked.gather.v4i64.v4p0i64(<4 x i64*>, i32, <4 x i1>, <4 x i64>)			declare <4 x i64> @llvm.masked.gather.v4i64.v4p0i64(<4 x i64*>, i32, <4 x i1>, <4 x i64>)

test/CodeGen/X86/memset-nonzero.ll

	[143 lines hidden]
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: pushq %rax			; SSE-NEXT: pushq %rax
	; SSE-NEXT: .Lcfi0:			; SSE-NEXT: .Lcfi0:
	; SSE-NEXT: .cfi_def_cfa_offset 16			; SSE-NEXT: .cfi_def_cfa_offset 16
	; SSE-NEXT: movl $42, %esi			; SSE-NEXT: movl $42, %esi
	; SSE-NEXT: movl $256, %edx # imm = 0x100			; SSE-NEXT: movl $256, %edx # imm = 0x100
	; SSE-NEXT: callq memset			; SSE-NEXT: callq memset
	; SSE-NEXT: popq %rax			; SSE-NEXT: popq %rax
				; SSE-NEXT: .Lcfi1:
				; SSE-NEXT: .cfi_def_cfa_offset 8
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; SSE2FAST-LABEL: memset_256_nonzero_bytes:			; SSE2FAST-LABEL: memset_256_nonzero_bytes:
	; SSE2FAST: # BB#0:			; SSE2FAST: # BB#0:
	; SSE2FAST-NEXT: movaps {{.*#+}} xmm0 = [42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42]			; SSE2FAST-NEXT: movaps {{.*#+}} xmm0 = [42,42,42,42,42,42,42,42,42,42,42,42,42,42,42,42]
	; SSE2FAST-NEXT: movups %xmm0, 240(%rdi)			; SSE2FAST-NEXT: movups %xmm0, 240(%rdi)
	; SSE2FAST-NEXT: movups %xmm0, 224(%rdi)			; SSE2FAST-NEXT: movups %xmm0, 224(%rdi)
	; SSE2FAST-NEXT: movups %xmm0, 208(%rdi)			; SSE2FAST-NEXT: movups %xmm0, 208(%rdi)
	[301 lines hidden]

test/CodeGen/X86/merge-consecutive-loads-128.ll

	[70 lines hidden]
	; X32-SSE1-NEXT: movl 12(%ecx), %esi			; X32-SSE1-NEXT: movl 12(%ecx), %esi
	; X32-SSE1-NEXT: movl 16(%ecx), %edi			; X32-SSE1-NEXT: movl 16(%ecx), %edi
	; X32-SSE1-NEXT: movl 20(%ecx), %ecx			; X32-SSE1-NEXT: movl 20(%ecx), %ecx
	; X32-SSE1-NEXT: movl %ecx, 12(%eax)			; X32-SSE1-NEXT: movl %ecx, 12(%eax)
	; X32-SSE1-NEXT: movl %edi, 8(%eax)			; X32-SSE1-NEXT: movl %edi, 8(%eax)
	; X32-SSE1-NEXT: movl %esi, 4(%eax)			; X32-SSE1-NEXT: movl %esi, 4(%eax)
	; X32-SSE1-NEXT: movl %edx, (%eax)			; X32-SSE1-NEXT: movl %edx, (%eax)
	; X32-SSE1-NEXT: popl %esi			; X32-SSE1-NEXT: popl %esi
				; X32-SSE1-NEXT: .Lcfi4:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: popl %edi			; X32-SSE1-NEXT: popl %edi
				; X32-SSE1-NEXT: .Lcfi5:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 4
	; X32-SSE1-NEXT: retl $4			; X32-SSE1-NEXT: retl $4
	;			;
	; X32-SSE41-LABEL: merge_2i64_i64_12:			; X32-SSE41-LABEL: merge_2i64_i64_12:
	; X32-SSE41: # BB#0:			; X32-SSE41: # BB#0:
	; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE41-NEXT: movups 8(%eax), %xmm0			; X32-SSE41-NEXT: movups 8(%eax), %xmm0
	; X32-SSE41-NEXT: retl			; X32-SSE41-NEXT: retl
	%ptr0 = getelementptr inbounds i64, i64* %ptr, i64 1			%ptr0 = getelementptr inbounds i64, i64* %ptr, i64 1
	[284 lines hidden]
	; AVX-LABEL: merge_4i32_i32_23u5:			; AVX-LABEL: merge_4i32_i32_23u5:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovups 8(%rdi), %xmm0			; AVX-NEXT: vmovups 8(%rdi), %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; X32-SSE1-LABEL: merge_4i32_i32_23u5:			; X32-SSE1-LABEL: merge_4i32_i32_23u5:
	; X32-SSE1: # BB#0:			; X32-SSE1: # BB#0:
	; X32-SSE1-NEXT: pushl %esi			; X32-SSE1-NEXT: pushl %esi
	; X32-SSE1-NEXT: .Lcfi4:			; X32-SSE1-NEXT: .Lcfi6:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 8			; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: .Lcfi5:			; X32-SSE1-NEXT: .Lcfi7:
	; X32-SSE1-NEXT: .cfi_offset %esi, -8			; X32-SSE1-NEXT: .cfi_offset %esi, -8
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-SSE1-NEXT: movl 8(%ecx), %edx			; X32-SSE1-NEXT: movl 8(%ecx), %edx
	; X32-SSE1-NEXT: movl 12(%ecx), %esi			; X32-SSE1-NEXT: movl 12(%ecx), %esi
	; X32-SSE1-NEXT: movl 20(%ecx), %ecx			; X32-SSE1-NEXT: movl 20(%ecx), %ecx
	; X32-SSE1-NEXT: movl %esi, 4(%eax)			; X32-SSE1-NEXT: movl %esi, 4(%eax)
	; X32-SSE1-NEXT: movl %edx, (%eax)			; X32-SSE1-NEXT: movl %edx, (%eax)
	; X32-SSE1-NEXT: movl %ecx, 12(%eax)			; X32-SSE1-NEXT: movl %ecx, 12(%eax)
	; X32-SSE1-NEXT: popl %esi			; X32-SSE1-NEXT: popl %esi
				; X32-SSE1-NEXT: .Lcfi8:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 4
	; X32-SSE1-NEXT: retl $4			; X32-SSE1-NEXT: retl $4
	;			;
	; X32-SSE41-LABEL: merge_4i32_i32_23u5:			; X32-SSE41-LABEL: merge_4i32_i32_23u5:
	; X32-SSE41: # BB#0:			; X32-SSE41: # BB#0:
	; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE41-NEXT: movups 8(%eax), %xmm0			; X32-SSE41-NEXT: movups 8(%eax), %xmm0
	; X32-SSE41-NEXT: retl			; X32-SSE41-NEXT: retl
	%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 2			%ptr0 = getelementptr inbounds i32, i32* %ptr, i64 2
	[121 lines hidden]
	; AVX-LABEL: merge_8i16_i16_23u567u9:			; AVX-LABEL: merge_8i16_i16_23u567u9:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovups 4(%rdi), %xmm0			; AVX-NEXT: vmovups 4(%rdi), %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; X32-SSE1-LABEL: merge_8i16_i16_23u567u9:			; X32-SSE1-LABEL: merge_8i16_i16_23u567u9:
	; X32-SSE1: # BB#0:			; X32-SSE1: # BB#0:
	; X32-SSE1-NEXT: pushl %ebp			; X32-SSE1-NEXT: pushl %ebp
	; X32-SSE1-NEXT: .Lcfi6:			; X32-SSE1-NEXT: .Lcfi9:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 8			; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: pushl %ebx			; X32-SSE1-NEXT: pushl %ebx
	; X32-SSE1-NEXT: .Lcfi7:			; X32-SSE1-NEXT: .Lcfi10:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 12			; X32-SSE1-NEXT: .cfi_def_cfa_offset 12
	; X32-SSE1-NEXT: pushl %edi			; X32-SSE1-NEXT: pushl %edi
	; X32-SSE1-NEXT: .Lcfi8:			; X32-SSE1-NEXT: .Lcfi11:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 16			; X32-SSE1-NEXT: .cfi_def_cfa_offset 16
	; X32-SSE1-NEXT: pushl %esi			; X32-SSE1-NEXT: pushl %esi
	; X32-SSE1-NEXT: .Lcfi9:			; X32-SSE1-NEXT: .Lcfi12:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 20			; X32-SSE1-NEXT: .cfi_def_cfa_offset 20
	; X32-SSE1-NEXT: .Lcfi10:			; X32-SSE1-NEXT: .Lcfi13:
	; X32-SSE1-NEXT: .cfi_offset %esi, -20			; X32-SSE1-NEXT: .cfi_offset %esi, -20
	; X32-SSE1-NEXT: .Lcfi11:			; X32-SSE1-NEXT: .Lcfi14:
	; X32-SSE1-NEXT: .cfi_offset %edi, -16			; X32-SSE1-NEXT: .cfi_offset %edi, -16
	; X32-SSE1-NEXT: .Lcfi12:			; X32-SSE1-NEXT: .Lcfi15:
	; X32-SSE1-NEXT: .cfi_offset %ebx, -12			; X32-SSE1-NEXT: .cfi_offset %ebx, -12
	; X32-SSE1-NEXT: .Lcfi13:			; X32-SSE1-NEXT: .Lcfi16:
	; X32-SSE1-NEXT: .cfi_offset %ebp, -8			; X32-SSE1-NEXT: .cfi_offset %ebp, -8
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-SSE1-NEXT: movzwl 4(%ecx), %edx			; X32-SSE1-NEXT: movzwl 4(%ecx), %edx
	; X32-SSE1-NEXT: movzwl 6(%ecx), %esi			; X32-SSE1-NEXT: movzwl 6(%ecx), %esi
	; X32-SSE1-NEXT: movzwl 10(%ecx), %edi			; X32-SSE1-NEXT: movzwl 10(%ecx), %edi
	; X32-SSE1-NEXT: movzwl 12(%ecx), %ebx			; X32-SSE1-NEXT: movzwl 12(%ecx), %ebx
	; X32-SSE1-NEXT: movzwl 14(%ecx), %ebp			; X32-SSE1-NEXT: movzwl 14(%ecx), %ebp
	; X32-SSE1-NEXT: movzwl 18(%ecx), %ecx			; X32-SSE1-NEXT: movzwl 18(%ecx), %ecx
	; X32-SSE1-NEXT: movw %bp, 10(%eax)			; X32-SSE1-NEXT: movw %bp, 10(%eax)
	; X32-SSE1-NEXT: movw %bx, 8(%eax)			; X32-SSE1-NEXT: movw %bx, 8(%eax)
	; X32-SSE1-NEXT: movw %cx, 14(%eax)			; X32-SSE1-NEXT: movw %cx, 14(%eax)
	; X32-SSE1-NEXT: movw %si, 2(%eax)			; X32-SSE1-NEXT: movw %si, 2(%eax)
	; X32-SSE1-NEXT: movw %dx, (%eax)			; X32-SSE1-NEXT: movw %dx, (%eax)
	; X32-SSE1-NEXT: movw %di, 6(%eax)			; X32-SSE1-NEXT: movw %di, 6(%eax)
	; X32-SSE1-NEXT: popl %esi			; X32-SSE1-NEXT: popl %esi
				; X32-SSE1-NEXT: .Lcfi17:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 16
	; X32-SSE1-NEXT: popl %edi			; X32-SSE1-NEXT: popl %edi
				; X32-SSE1-NEXT: .Lcfi18:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 12
	; X32-SSE1-NEXT: popl %ebx			; X32-SSE1-NEXT: popl %ebx
				; X32-SSE1-NEXT: .Lcfi19:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: popl %ebp			; X32-SSE1-NEXT: popl %ebp
				; X32-SSE1-NEXT: .Lcfi20:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 4
	; X32-SSE1-NEXT: retl $4			; X32-SSE1-NEXT: retl $4
	;			;
	; X32-SSE41-LABEL: merge_8i16_i16_23u567u9:			; X32-SSE41-LABEL: merge_8i16_i16_23u567u9:
	; X32-SSE41: # BB#0:			; X32-SSE41: # BB#0:
	; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE41-NEXT: movups 4(%eax), %xmm0			; X32-SSE41-NEXT: movups 4(%eax), %xmm0
	; X32-SSE41-NEXT: retl			; X32-SSE41-NEXT: retl
	%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 2			%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 2
	[61 lines hidden]
	; AVX-LABEL: merge_8i16_i16_45u7zzzz:			; AVX-LABEL: merge_8i16_i16_45u7zzzz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; X32-SSE1-LABEL: merge_8i16_i16_45u7zzzz:			; X32-SSE1-LABEL: merge_8i16_i16_45u7zzzz:
	; X32-SSE1: # BB#0:			; X32-SSE1: # BB#0:
	; X32-SSE1-NEXT: pushl %esi			; X32-SSE1-NEXT: pushl %esi
	; X32-SSE1-NEXT: .Lcfi14:			; X32-SSE1-NEXT: .Lcfi21:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 8			; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: .Lcfi15:			; X32-SSE1-NEXT: .Lcfi22:
	; X32-SSE1-NEXT: .cfi_offset %esi, -8			; X32-SSE1-NEXT: .cfi_offset %esi, -8
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-SSE1-NEXT: movzwl 8(%ecx), %edx			; X32-SSE1-NEXT: movzwl 8(%ecx), %edx
	; X32-SSE1-NEXT: movzwl 10(%ecx), %esi			; X32-SSE1-NEXT: movzwl 10(%ecx), %esi
	; X32-SSE1-NEXT: movzwl 14(%ecx), %ecx			; X32-SSE1-NEXT: movzwl 14(%ecx), %ecx
	; X32-SSE1-NEXT: movw %si, 2(%eax)			; X32-SSE1-NEXT: movw %si, 2(%eax)
	; X32-SSE1-NEXT: movw %dx, (%eax)			; X32-SSE1-NEXT: movw %dx, (%eax)
	; X32-SSE1-NEXT: movw %cx, 6(%eax)			; X32-SSE1-NEXT: movw %cx, 6(%eax)
	; X32-SSE1-NEXT: movw $0, 14(%eax)			; X32-SSE1-NEXT: movw $0, 14(%eax)
	; X32-SSE1-NEXT: movw $0, 12(%eax)			; X32-SSE1-NEXT: movw $0, 12(%eax)
	; X32-SSE1-NEXT: movw $0, 10(%eax)			; X32-SSE1-NEXT: movw $0, 10(%eax)
	; X32-SSE1-NEXT: movw $0, 8(%eax)			; X32-SSE1-NEXT: movw $0, 8(%eax)
	; X32-SSE1-NEXT: popl %esi			; X32-SSE1-NEXT: popl %esi
				; X32-SSE1-NEXT: .Lcfi23:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 4
	; X32-SSE1-NEXT: retl $4			; X32-SSE1-NEXT: retl $4
	;			;
	; X32-SSE41-LABEL: merge_8i16_i16_45u7zzzz:			; X32-SSE41-LABEL: merge_8i16_i16_45u7zzzz:
	; X32-SSE41: # BB#0:			; X32-SSE41: # BB#0:
	; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE41-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X32-SSE41-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32-SSE41-NEXT: retl			; X32-SSE41-NEXT: retl
	%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 4			%ptr0 = getelementptr inbounds i16, i16* %ptr, i64 4
	[21 lines hidden]
	; AVX-LABEL: merge_16i8_i8_01u3456789ABCDuF:			; AVX-LABEL: merge_16i8_i8_01u3456789ABCDuF:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovups (%rdi), %xmm0			; AVX-NEXT: vmovups (%rdi), %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; X32-SSE1-LABEL: merge_16i8_i8_01u3456789ABCDuF:			; X32-SSE1-LABEL: merge_16i8_i8_01u3456789ABCDuF:
	; X32-SSE1: # BB#0:			; X32-SSE1: # BB#0:
	; X32-SSE1-NEXT: pushl %ebx			; X32-SSE1-NEXT: pushl %ebx
	; X32-SSE1-NEXT: .Lcfi16:			; X32-SSE1-NEXT: .Lcfi24:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 8			; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: subl $12, %esp			; X32-SSE1-NEXT: subl $12, %esp
	; X32-SSE1-NEXT: .Lcfi17:			; X32-SSE1-NEXT: .Lcfi25:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 20			; X32-SSE1-NEXT: .cfi_def_cfa_offset 20
	; X32-SSE1-NEXT: .Lcfi18:			; X32-SSE1-NEXT: .Lcfi26:
	; X32-SSE1-NEXT: .cfi_offset %ebx, -8			; X32-SSE1-NEXT: .cfi_offset %ebx, -8
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-SSE1-NEXT: movb (%ecx), %dl			; X32-SSE1-NEXT: movb (%ecx), %dl
	; X32-SSE1-NEXT: movb %dl, {{[0-9]+}}(%esp) # 1-byte Spill			; X32-SSE1-NEXT: movb %dl, {{[0-9]+}}(%esp) # 1-byte Spill
	; X32-SSE1-NEXT: movb 1(%ecx), %dl			; X32-SSE1-NEXT: movb 1(%ecx), %dl
	; X32-SSE1-NEXT: movb %dl, {{[0-9]+}}(%esp) # 1-byte Spill			; X32-SSE1-NEXT: movb %dl, {{[0-9]+}}(%esp) # 1-byte Spill
	; X32-SSE1-NEXT: movb 3(%ecx), %dl			; X32-SSE1-NEXT: movb 3(%ecx), %dl
	[34 lines hidden]
	; X32-SSE1-NEXT: movb %cl, 4(%eax)			; X32-SSE1-NEXT: movb %cl, 4(%eax)
	; X32-SSE1-NEXT: movb {{[0-9]+}}(%esp), %cl # 1-byte Reload			; X32-SSE1-NEXT: movb {{[0-9]+}}(%esp), %cl # 1-byte Reload
	; X32-SSE1-NEXT: movb %cl, 1(%eax)			; X32-SSE1-NEXT: movb %cl, 1(%eax)
	; X32-SSE1-NEXT: movb {{[0-9]+}}(%esp), %cl # 1-byte Reload			; X32-SSE1-NEXT: movb {{[0-9]+}}(%esp), %cl # 1-byte Reload
	; X32-SSE1-NEXT: movb %cl, (%eax)			; X32-SSE1-NEXT: movb %cl, (%eax)
	; X32-SSE1-NEXT: movb {{[0-9]+}}(%esp), %cl # 1-byte Reload			; X32-SSE1-NEXT: movb {{[0-9]+}}(%esp), %cl # 1-byte Reload
	; X32-SSE1-NEXT: movb %cl, 3(%eax)			; X32-SSE1-NEXT: movb %cl, 3(%eax)
	; X32-SSE1-NEXT: addl $12, %esp			; X32-SSE1-NEXT: addl $12, %esp
				; X32-SSE1-NEXT: .Lcfi27:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: popl %ebx			; X32-SSE1-NEXT: popl %ebx
				; X32-SSE1-NEXT: .Lcfi28:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 4
	; X32-SSE1-NEXT: retl $4			; X32-SSE1-NEXT: retl $4
	;			;
	; X32-SSE41-LABEL: merge_16i8_i8_01u3456789ABCDuF:			; X32-SSE41-LABEL: merge_16i8_i8_01u3456789ABCDuF:
	; X32-SSE41: # BB#0:			; X32-SSE41: # BB#0:
	; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE41-NEXT: movups (%eax), %xmm0			; X32-SSE41-NEXT: movups (%eax), %xmm0
	; X32-SSE41-NEXT: retl			; X32-SSE41-NEXT: retl
	%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 0			%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 0
	[100 lines hidden]
	; AVX-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:			; AVX-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; X32-SSE1-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:			; X32-SSE1-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:
	; X32-SSE1: # BB#0:			; X32-SSE1: # BB#0:
	; X32-SSE1-NEXT: pushl %ebx			; X32-SSE1-NEXT: pushl %ebx
	; X32-SSE1-NEXT: .Lcfi19:			; X32-SSE1-NEXT: .Lcfi29:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 8			; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: pushl %eax			; X32-SSE1-NEXT: pushl %eax
	; X32-SSE1-NEXT: .Lcfi20:			; X32-SSE1-NEXT: .Lcfi30:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 12			; X32-SSE1-NEXT: .cfi_def_cfa_offset 12
	; X32-SSE1-NEXT: .Lcfi21:			; X32-SSE1-NEXT: .Lcfi31:
	; X32-SSE1-NEXT: .cfi_offset %ebx, -8			; X32-SSE1-NEXT: .cfi_offset %ebx, -8
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-SSE1-NEXT: movb (%ecx), %dl			; X32-SSE1-NEXT: movb (%ecx), %dl
	; X32-SSE1-NEXT: movb %dl, {{[0-9]+}}(%esp) # 1-byte Spill			; X32-SSE1-NEXT: movb %dl, {{[0-9]+}}(%esp) # 1-byte Spill
	; X32-SSE1-NEXT: movb 1(%ecx), %dh			; X32-SSE1-NEXT: movb 1(%ecx), %dh
	; X32-SSE1-NEXT: movb 2(%ecx), %bl			; X32-SSE1-NEXT: movb 2(%ecx), %bl
	; X32-SSE1-NEXT: movb 3(%ecx), %bh			; X32-SSE1-NEXT: movb 3(%ecx), %bh
	; X32-SSE1-NEXT: movb 6(%ecx), %dl			; X32-SSE1-NEXT: movb 6(%ecx), %dl
	; X32-SSE1-NEXT: movb 7(%ecx), %cl			; X32-SSE1-NEXT: movb 7(%ecx), %cl
	; X32-SSE1-NEXT: movb %cl, 7(%eax)			; X32-SSE1-NEXT: movb %cl, 7(%eax)
	; X32-SSE1-NEXT: movb %dl, 6(%eax)			; X32-SSE1-NEXT: movb %dl, 6(%eax)
	; X32-SSE1-NEXT: movb %bh, 3(%eax)			; X32-SSE1-NEXT: movb %bh, 3(%eax)
	; X32-SSE1-NEXT: movb %bl, 2(%eax)			; X32-SSE1-NEXT: movb %bl, 2(%eax)
	; X32-SSE1-NEXT: movb %dh, 1(%eax)			; X32-SSE1-NEXT: movb %dh, 1(%eax)
	; X32-SSE1-NEXT: movb {{[0-9]+}}(%esp), %cl # 1-byte Reload			; X32-SSE1-NEXT: movb {{[0-9]+}}(%esp), %cl # 1-byte Reload
	; X32-SSE1-NEXT: movb %cl, (%eax)			; X32-SSE1-NEXT: movb %cl, (%eax)
	; X32-SSE1-NEXT: movb $0, 15(%eax)			; X32-SSE1-NEXT: movb $0, 15(%eax)
	; X32-SSE1-NEXT: movb $0, 14(%eax)			; X32-SSE1-NEXT: movb $0, 14(%eax)
	; X32-SSE1-NEXT: movb $0, 13(%eax)			; X32-SSE1-NEXT: movb $0, 13(%eax)
	; X32-SSE1-NEXT: addl $4, %esp			; X32-SSE1-NEXT: addl $4, %esp
				; X32-SSE1-NEXT: .Lcfi32:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: popl %ebx			; X32-SSE1-NEXT: popl %ebx
				; X32-SSE1-NEXT: .Lcfi33:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 4
	; X32-SSE1-NEXT: retl $4			; X32-SSE1-NEXT: retl $4
	;			;
	; X32-SSE41-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:			; X32-SSE41-LABEL: merge_16i8_i8_0123uu67uuuuuzzz:
	; X32-SSE41: # BB#0:			; X32-SSE41: # BB#0:
	; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE41-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; X32-SSE41-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32-SSE41-NEXT: retl			; X32-SSE41-NEXT: retl
	%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 0			%ptr0 = getelementptr inbounds i8, i8* %ptr, i64 0
	; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; X32-SSE1-LABEL: merge_2i64_i64_12_volatile:			; X32-SSE1-LABEL: merge_2i64_i64_12_volatile:
	; X32-SSE1: # BB#0:			; X32-SSE1: # BB#0:
	; X32-SSE1-NEXT: pushl %edi			; X32-SSE1-NEXT: pushl %edi
	; X32-SSE1-NEXT: .Lcfi22:			; X32-SSE1-NEXT: .Lcfi34:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 8			; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: pushl %esi			; X32-SSE1-NEXT: pushl %esi
	; X32-SSE1-NEXT: .Lcfi23:			; X32-SSE1-NEXT: .Lcfi35:
	; X32-SSE1-NEXT: .cfi_def_cfa_offset 12			; X32-SSE1-NEXT: .cfi_def_cfa_offset 12
	; X32-SSE1-NEXT: .Lcfi24:			; X32-SSE1-NEXT: .Lcfi36:
	; X32-SSE1-NEXT: .cfi_offset %esi, -12			; X32-SSE1-NEXT: .cfi_offset %esi, -12
	; X32-SSE1-NEXT: .Lcfi25:			; X32-SSE1-NEXT: .Lcfi37:
	; X32-SSE1-NEXT: .cfi_offset %edi, -8			; X32-SSE1-NEXT: .cfi_offset %edi, -8
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-SSE1-NEXT: movl 8(%ecx), %edx			; X32-SSE1-NEXT: movl 8(%ecx), %edx
	; X32-SSE1-NEXT: movl 12(%ecx), %esi			; X32-SSE1-NEXT: movl 12(%ecx), %esi
	; X32-SSE1-NEXT: movl 16(%ecx), %edi			; X32-SSE1-NEXT: movl 16(%ecx), %edi
	; X32-SSE1-NEXT: movl 20(%ecx), %ecx			; X32-SSE1-NEXT: movl 20(%ecx), %ecx
	; X32-SSE1-NEXT: movl %ecx, 12(%eax)			; X32-SSE1-NEXT: movl %ecx, 12(%eax)
	; X32-SSE1-NEXT: movl %edi, 8(%eax)			; X32-SSE1-NEXT: movl %edi, 8(%eax)
	; X32-SSE1-NEXT: movl %esi, 4(%eax)			; X32-SSE1-NEXT: movl %esi, 4(%eax)
	; X32-SSE1-NEXT: movl %edx, (%eax)			; X32-SSE1-NEXT: movl %edx, (%eax)
	; X32-SSE1-NEXT: popl %esi			; X32-SSE1-NEXT: popl %esi
				; X32-SSE1-NEXT: .Lcfi38:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE1-NEXT: popl %edi			; X32-SSE1-NEXT: popl %edi
				; X32-SSE1-NEXT: .Lcfi39:
				; X32-SSE1-NEXT: .cfi_def_cfa_offset 4
	; X32-SSE1-NEXT: retl $4			; X32-SSE1-NEXT: retl $4
	;			;
	; X32-SSE41-LABEL: merge_2i64_i64_12_volatile:			; X32-SSE41-LABEL: merge_2i64_i64_12_volatile:
	; X32-SSE41: # BB#0:			; X32-SSE41: # BB#0:
	; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-SSE41-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X32-SSE41-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; X32-SSE41-NEXT: pinsrd $1, 12(%eax), %xmm0			; X32-SSE41-NEXT: pinsrd $1, 12(%eax), %xmm0
	; X32-SSE41-NEXT: pinsrd $2, 16(%eax), %xmm0			; X32-SSE41-NEXT: pinsrd $2, 16(%eax), %xmm0

test/CodeGen/X86/movtopush.ll

	; LINUX: .cfi_adjust_cfa_offset 4			; LINUX: .cfi_adjust_cfa_offset 4
	; LINUX: pushl $3			; LINUX: pushl $3
	; LINUX: .cfi_adjust_cfa_offset 4			; LINUX: .cfi_adjust_cfa_offset 4
	; LINUX: pushl $2			; LINUX: pushl $2
	; LINUX: .cfi_adjust_cfa_offset 4			; LINUX: .cfi_adjust_cfa_offset 4
	; LINUX: pushl $1			; LINUX: pushl $1
	; LINUX: .cfi_adjust_cfa_offset 4			; LINUX: .cfi_adjust_cfa_offset 4
	; LINUX: calll good			; LINUX: calll good
	; LINUX: addl $28, %esp			; LINUX: addl $16, %esp
	; LINUX: .cfi_adjust_cfa_offset -16			; LINUX: .cfi_adjust_cfa_offset -16
				; LINUX: addl $12, %esp
				; LINUX: .cfi_def_cfa_offset 4
	; LINUX-NOT: add			; LINUX-NOT: add
	; LINUX: retl			; LINUX: retl
	define void @pr27140() optsize {			define void @pr27140() optsize {
	entry:			entry:
	tail call void @good(i32 1, i32 2, i32 3, i32 4)			tail call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}


test/CodeGen/X86/mul-constant-result.ll

; X86-NEXT: decl %ecx		; X86-NEXT: decl %ecx
; X86-NEXT: cmpl $31, %ecx		; X86-NEXT: cmpl $31, %ecx
; X86-NEXT: ja .LBB0_39		; X86-NEXT: ja .LBB0_39
; X86-NEXT: # BB#5:		; X86-NEXT: # BB#5:
; X86-NEXT: jmpl *.LJTI0_0(,%ecx,4)		; X86-NEXT: jmpl *.LJTI0_0(,%ecx,4)
; X86-NEXT: .LBB0_6:		; X86-NEXT: .LBB0_6:
; X86-NEXT: addl %eax, %eax		; X86-NEXT: addl %eax, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi2:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_39:		; X86-NEXT: .LBB0_39:
		; X86-NEXT: .Lcfi3:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: xorl %eax, %eax		; X86-NEXT: xorl %eax, %eax
; X86-NEXT: .LBB0_40:		; X86-NEXT: .LBB0_40:
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi4:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_7:		; X86-NEXT: .LBB0_7:
		; X86-NEXT: .Lcfi5:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,2), %eax		; X86-NEXT: leal (%eax,%eax,2), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi6:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_8:		; X86-NEXT: .LBB0_8:
		; X86-NEXT: .Lcfi7:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: shll $2, %eax		; X86-NEXT: shll $2, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi8:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_9:		; X86-NEXT: .LBB0_9:
		; X86-NEXT: .Lcfi9:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,4), %eax		; X86-NEXT: leal (%eax,%eax,4), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi10:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_10:		; X86-NEXT: .LBB0_10:
		; X86-NEXT: .Lcfi11:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: addl %eax, %eax		; X86-NEXT: addl %eax, %eax
; X86-NEXT: leal (%eax,%eax,2), %eax		; X86-NEXT: leal (%eax,%eax,2), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi12:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_11:		; X86-NEXT: .LBB0_11:
		; X86-NEXT: .Lcfi13:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (,%eax,8), %ecx		; X86-NEXT: leal (,%eax,8), %ecx
; X86-NEXT: jmp .LBB0_12		; X86-NEXT: jmp .LBB0_12
; X86-NEXT: .LBB0_13:		; X86-NEXT: .LBB0_13:
; X86-NEXT: shll $3, %eax		; X86-NEXT: shll $3, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi14:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_14:		; X86-NEXT: .LBB0_14:
		; X86-NEXT: .Lcfi15:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,8), %eax		; X86-NEXT: leal (%eax,%eax,8), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi16:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_15:		; X86-NEXT: .LBB0_15:
		; X86-NEXT: .Lcfi17:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: addl %eax, %eax		; X86-NEXT: addl %eax, %eax
; X86-NEXT: leal (%eax,%eax,4), %eax		; X86-NEXT: leal (%eax,%eax,4), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi18:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_16:		; X86-NEXT: .LBB0_16:
		; X86-NEXT: .Lcfi19:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,4), %ecx		; X86-NEXT: leal (%eax,%eax,4), %ecx
; X86-NEXT: leal (%eax,%ecx,2), %eax		; X86-NEXT: leal (%eax,%ecx,2), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi20:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_17:		; X86-NEXT: .LBB0_17:
		; X86-NEXT: .Lcfi21:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: shll $2, %eax		; X86-NEXT: shll $2, %eax
; X86-NEXT: leal (%eax,%eax,2), %eax		; X86-NEXT: leal (%eax,%eax,2), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi22:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_18:		; X86-NEXT: .LBB0_18:
		; X86-NEXT: .Lcfi23:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,2), %ecx		; X86-NEXT: leal (%eax,%eax,2), %ecx
; X86-NEXT: leal (%eax,%ecx,4), %eax		; X86-NEXT: leal (%eax,%ecx,4), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi24:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_19:		; X86-NEXT: .LBB0_19:
		; X86-NEXT: .Lcfi25:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,2), %ecx		; X86-NEXT: leal (%eax,%eax,2), %ecx
; X86-NEXT: jmp .LBB0_20		; X86-NEXT: jmp .LBB0_20
; X86-NEXT: .LBB0_21:		; X86-NEXT: .LBB0_21:
; X86-NEXT: leal (%eax,%eax,4), %eax		; X86-NEXT: leal (%eax,%eax,4), %eax
; X86-NEXT: leal (%eax,%eax,2), %eax		; X86-NEXT: leal (%eax,%eax,2), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi26:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_22:		; X86-NEXT: .LBB0_22:
		; X86-NEXT: .Lcfi27:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: shll $4, %eax		; X86-NEXT: shll $4, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi28:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_23:		; X86-NEXT: .LBB0_23:
		; X86-NEXT: .Lcfi29:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: movl %eax, %ecx		; X86-NEXT: movl %eax, %ecx
; X86-NEXT: shll $4, %ecx		; X86-NEXT: shll $4, %ecx
; X86-NEXT: addl %ecx, %eax		; X86-NEXT: addl %ecx, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi30:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_24:		; X86-NEXT: .LBB0_24:
		; X86-NEXT: .Lcfi31:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: addl %eax, %eax		; X86-NEXT: addl %eax, %eax
; X86-NEXT: leal (%eax,%eax,8), %eax		; X86-NEXT: leal (%eax,%eax,8), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi32:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_25:		; X86-NEXT: .LBB0_25:
		; X86-NEXT: .Lcfi33:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,4), %ecx		; X86-NEXT: leal (%eax,%eax,4), %ecx
; X86-NEXT: shll $2, %ecx		; X86-NEXT: shll $2, %ecx
; X86-NEXT: jmp .LBB0_12		; X86-NEXT: jmp .LBB0_12
; X86-NEXT: .LBB0_26:		; X86-NEXT: .LBB0_26:
; X86-NEXT: shll $2, %eax		; X86-NEXT: shll $2, %eax
; X86-NEXT: leal (%eax,%eax,4), %eax		; X86-NEXT: leal (%eax,%eax,4), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi34:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_27:		; X86-NEXT: .LBB0_27:
		; X86-NEXT: .Lcfi35:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,4), %ecx		; X86-NEXT: leal (%eax,%eax,4), %ecx
; X86-NEXT: leal (%eax,%ecx,4), %eax		; X86-NEXT: leal (%eax,%ecx,4), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi36:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_28:		; X86-NEXT: .LBB0_28:
		; X86-NEXT: .Lcfi37:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,4), %ecx		; X86-NEXT: leal (%eax,%eax,4), %ecx
; X86-NEXT: .LBB0_20:		; X86-NEXT: .LBB0_20:
; X86-NEXT: leal (%eax,%ecx,4), %ecx		; X86-NEXT: leal (%eax,%ecx,4), %ecx
; X86-NEXT: addl %ecx, %eax		; X86-NEXT: addl %ecx, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi38:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_29:		; X86-NEXT: .LBB0_29:
		; X86-NEXT: .Lcfi39:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,2), %ecx		; X86-NEXT: leal (%eax,%eax,2), %ecx
; X86-NEXT: shll $3, %ecx		; X86-NEXT: shll $3, %ecx
; X86-NEXT: jmp .LBB0_12		; X86-NEXT: jmp .LBB0_12
; X86-NEXT: .LBB0_30:		; X86-NEXT: .LBB0_30:
; X86-NEXT: shll $3, %eax		; X86-NEXT: shll $3, %eax
; X86-NEXT: leal (%eax,%eax,2), %eax		; X86-NEXT: leal (%eax,%eax,2), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi40:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_31:		; X86-NEXT: .LBB0_31:
		; X86-NEXT: .Lcfi41:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,4), %eax		; X86-NEXT: leal (%eax,%eax,4), %eax
; X86-NEXT: leal (%eax,%eax,4), %eax		; X86-NEXT: leal (%eax,%eax,4), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi42:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_32:		; X86-NEXT: .LBB0_32:
		; X86-NEXT: .Lcfi43:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,8), %ecx		; X86-NEXT: leal (%eax,%eax,8), %ecx
; X86-NEXT: leal (%ecx,%ecx,2), %ecx		; X86-NEXT: leal (%ecx,%ecx,2), %ecx
; X86-NEXT: jmp .LBB0_12		; X86-NEXT: jmp .LBB0_12
; X86-NEXT: .LBB0_33:		; X86-NEXT: .LBB0_33:
; X86-NEXT: leal (%eax,%eax,8), %eax		; X86-NEXT: leal (%eax,%eax,8), %eax
; X86-NEXT: leal (%eax,%eax,2), %eax		; X86-NEXT: leal (%eax,%eax,2), %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi44:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_34:		; X86-NEXT: .LBB0_34:
		; X86-NEXT: .Lcfi45:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,8), %ecx		; X86-NEXT: leal (%eax,%eax,8), %ecx
; X86-NEXT: leal (%ecx,%ecx,2), %ecx		; X86-NEXT: leal (%ecx,%ecx,2), %ecx
; X86-NEXT: addl %ecx, %eax		; X86-NEXT: addl %ecx, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi46:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_35:		; X86-NEXT: .LBB0_35:
		; X86-NEXT: .Lcfi47:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: leal (%eax,%eax,8), %ecx		; X86-NEXT: leal (%eax,%eax,8), %ecx
; X86-NEXT: leal (%ecx,%ecx,2), %ecx		; X86-NEXT: leal (%ecx,%ecx,2), %ecx
; X86-NEXT: addl %eax, %ecx		; X86-NEXT: addl %eax, %ecx
; X86-NEXT: addl %ecx, %eax		; X86-NEXT: addl %ecx, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi48:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_36:		; X86-NEXT: .LBB0_36:
		; X86-NEXT: .Lcfi49:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: movl %eax, %ecx		; X86-NEXT: movl %eax, %ecx
; X86-NEXT: shll $5, %ecx		; X86-NEXT: shll $5, %ecx
; X86-NEXT: subl %eax, %ecx		; X86-NEXT: subl %eax, %ecx
; X86-NEXT: jmp .LBB0_12		; X86-NEXT: jmp .LBB0_12
; X86-NEXT: .LBB0_37:		; X86-NEXT: .LBB0_37:
; X86-NEXT: movl %eax, %ecx		; X86-NEXT: movl %eax, %ecx
; X86-NEXT: shll $5, %ecx		; X86-NEXT: shll $5, %ecx
; X86-NEXT: .LBB0_12:		; X86-NEXT: .LBB0_12:
; X86-NEXT: subl %eax, %ecx		; X86-NEXT: subl %eax, %ecx
; X86-NEXT: movl %ecx, %eax		; X86-NEXT: movl %ecx, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi50:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
; X86-NEXT: .LBB0_38:		; X86-NEXT: .LBB0_38:
		; X86-NEXT: .Lcfi51:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: shll $5, %eax		; X86-NEXT: shll $5, %eax
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi52:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-HSW-LABEL: mult:		; X64-HSW-LABEL: mult:
; X64-HSW: # BB#0:		; X64-HSW: # BB#0:
; X64-HSW-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>		; X64-HSW-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
; X64-HSW-NEXT: cmpl $1, %esi		; X64-HSW-NEXT: cmpl $1, %esi
; X64-HSW-NEXT: movl $1, %ecx		; X64-HSW-NEXT: movl $1, %ecx
; X64-HSW-NEXT: movl %esi, %eax		; X64-HSW-NEXT: movl %esi, %eax
; <label>:70: ; preds = %2, %69, %67, %65, %63, %61, %59, %57, %55, %53, %51, %49, %47, %45, %43, %41, %39, %37, %35, %33, %31, %29, %27, %25, %23, %21, %19, %17, %15, %13, %11, %9, %7
ret i32 %71		ret i32 %71
}		}

; Function Attrs: norecurse nounwind readnone uwtable		; Function Attrs: norecurse nounwind readnone uwtable
define i32 @foo() local_unnamed_addr #0 {		define i32 @foo() local_unnamed_addr #0 {
; X86-LABEL: foo:		; X86-LABEL: foo:
; X86: # BB#0:		; X86: # BB#0:
; X86-NEXT: pushl %ebx		; X86-NEXT: pushl %ebx
; X86-NEXT: .Lcfi2:		; X86-NEXT: .Lcfi53:
; X86-NEXT: .cfi_def_cfa_offset 8		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: pushl %edi		; X86-NEXT: pushl %edi
; X86-NEXT: .Lcfi3:		; X86-NEXT: .Lcfi54:
; X86-NEXT: .cfi_def_cfa_offset 12		; X86-NEXT: .cfi_def_cfa_offset 12
; X86-NEXT: pushl %esi		; X86-NEXT: pushl %esi
; X86-NEXT: .Lcfi4:		; X86-NEXT: .Lcfi55:
; X86-NEXT: .cfi_def_cfa_offset 16		; X86-NEXT: .cfi_def_cfa_offset 16
; X86-NEXT: .Lcfi5:		; X86-NEXT: .Lcfi56:
; X86-NEXT: .cfi_offset %esi, -16		; X86-NEXT: .cfi_offset %esi, -16
; X86-NEXT: .Lcfi6:		; X86-NEXT: .Lcfi57:
; X86-NEXT: .cfi_offset %edi, -12		; X86-NEXT: .cfi_offset %edi, -12
; X86-NEXT: .Lcfi7:		; X86-NEXT: .Lcfi58:
; X86-NEXT: .cfi_offset %ebx, -8		; X86-NEXT: .cfi_offset %ebx, -8
; X86-NEXT: pushl $0		; X86-NEXT: pushl $0
; X86-NEXT: .Lcfi8:		; X86-NEXT: .Lcfi59:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $1		; X86-NEXT: pushl $1
; X86-NEXT: .Lcfi9:		; X86-NEXT: .Lcfi60:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi10:		; X86-NEXT: .Lcfi61:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %esi		; X86-NEXT: movl %eax, %esi
; X86-NEXT: xorl $1, %esi		; X86-NEXT: xorl $1, %esi
; X86-NEXT: pushl $1		; X86-NEXT: pushl $1
; X86-NEXT: .Lcfi11:		; X86-NEXT: .Lcfi62:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $2		; X86-NEXT: pushl $2
; X86-NEXT: .Lcfi12:		; X86-NEXT: .Lcfi63:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi13:		; X86-NEXT: .Lcfi64:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $2, %edi		; X86-NEXT: xorl $2, %edi
; X86-NEXT: pushl $1		; X86-NEXT: pushl $1
; X86-NEXT: .Lcfi14:		; X86-NEXT: .Lcfi65:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $3		; X86-NEXT: pushl $3
; X86-NEXT: .Lcfi15:		; X86-NEXT: .Lcfi66:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi16:		; X86-NEXT: .Lcfi67:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $3, %ebx		; X86-NEXT: xorl $3, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $2		; X86-NEXT: pushl $2
; X86-NEXT: .Lcfi17:		; X86-NEXT: .Lcfi68:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $4		; X86-NEXT: pushl $4
; X86-NEXT: .Lcfi18:		; X86-NEXT: .Lcfi69:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi19:		; X86-NEXT: .Lcfi70:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $4, %edi		; X86-NEXT: xorl $4, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $2		; X86-NEXT: pushl $2
; X86-NEXT: .Lcfi20:		; X86-NEXT: .Lcfi71:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $5		; X86-NEXT: pushl $5
; X86-NEXT: .Lcfi21:		; X86-NEXT: .Lcfi72:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi22:		; X86-NEXT: .Lcfi73:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $5, %ebx		; X86-NEXT: xorl $5, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $3		; X86-NEXT: pushl $3
; X86-NEXT: .Lcfi23:		; X86-NEXT: .Lcfi74:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $6		; X86-NEXT: pushl $6
; X86-NEXT: .Lcfi24:		; X86-NEXT: .Lcfi75:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi25:		; X86-NEXT: .Lcfi76:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $6, %edi		; X86-NEXT: xorl $6, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $3		; X86-NEXT: pushl $3
; X86-NEXT: .Lcfi26:		; X86-NEXT: .Lcfi77:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $7		; X86-NEXT: pushl $7
; X86-NEXT: .Lcfi27:		; X86-NEXT: .Lcfi78:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi28:		; X86-NEXT: .Lcfi79:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $7, %ebx		; X86-NEXT: xorl $7, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $4		; X86-NEXT: pushl $4
; X86-NEXT: .Lcfi29:		; X86-NEXT: .Lcfi80:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $8		; X86-NEXT: pushl $8
; X86-NEXT: .Lcfi30:		; X86-NEXT: .Lcfi81:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi31:		; X86-NEXT: .Lcfi82:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $8, %edi		; X86-NEXT: xorl $8, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $4		; X86-NEXT: pushl $4
; X86-NEXT: .Lcfi32:		; X86-NEXT: .Lcfi83:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $9		; X86-NEXT: pushl $9
; X86-NEXT: .Lcfi33:		; X86-NEXT: .Lcfi84:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi34:		; X86-NEXT: .Lcfi85:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $9, %ebx		; X86-NEXT: xorl $9, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $5		; X86-NEXT: pushl $5
; X86-NEXT: .Lcfi35:		; X86-NEXT: .Lcfi86:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $10		; X86-NEXT: pushl $10
; X86-NEXT: .Lcfi36:		; X86-NEXT: .Lcfi87:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi37:		; X86-NEXT: .Lcfi88:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $10, %edi		; X86-NEXT: xorl $10, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $5		; X86-NEXT: pushl $5
; X86-NEXT: .Lcfi38:		; X86-NEXT: .Lcfi89:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $11		; X86-NEXT: pushl $11
; X86-NEXT: .Lcfi39:		; X86-NEXT: .Lcfi90:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi40:		; X86-NEXT: .Lcfi91:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $11, %ebx		; X86-NEXT: xorl $11, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $6		; X86-NEXT: pushl $6
; X86-NEXT: .Lcfi41:		; X86-NEXT: .Lcfi92:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $12		; X86-NEXT: pushl $12
; X86-NEXT: .Lcfi42:		; X86-NEXT: .Lcfi93:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi43:		; X86-NEXT: .Lcfi94:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $12, %edi		; X86-NEXT: xorl $12, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $6		; X86-NEXT: pushl $6
; X86-NEXT: .Lcfi44:		; X86-NEXT: .Lcfi95:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $13		; X86-NEXT: pushl $13
; X86-NEXT: .Lcfi45:		; X86-NEXT: .Lcfi96:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi46:		; X86-NEXT: .Lcfi97:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $13, %ebx		; X86-NEXT: xorl $13, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $7		; X86-NEXT: pushl $7
; X86-NEXT: .Lcfi47:		; X86-NEXT: .Lcfi98:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $14		; X86-NEXT: pushl $14
; X86-NEXT: .Lcfi48:		; X86-NEXT: .Lcfi99:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi49:		; X86-NEXT: .Lcfi100:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $14, %edi		; X86-NEXT: xorl $14, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $7		; X86-NEXT: pushl $7
; X86-NEXT: .Lcfi50:		; X86-NEXT: .Lcfi101:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $15		; X86-NEXT: pushl $15
; X86-NEXT: .Lcfi51:		; X86-NEXT: .Lcfi102:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi52:		; X86-NEXT: .Lcfi103:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $15, %ebx		; X86-NEXT: xorl $15, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $8		; X86-NEXT: pushl $8
; X86-NEXT: .Lcfi53:		; X86-NEXT: .Lcfi104:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $16		; X86-NEXT: pushl $16
; X86-NEXT: .Lcfi54:		; X86-NEXT: .Lcfi105:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi55:		; X86-NEXT: .Lcfi106:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $16, %edi		; X86-NEXT: xorl $16, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $8		; X86-NEXT: pushl $8
; X86-NEXT: .Lcfi56:		; X86-NEXT: .Lcfi107:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $17		; X86-NEXT: pushl $17
; X86-NEXT: .Lcfi57:		; X86-NEXT: .Lcfi108:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi58:		; X86-NEXT: .Lcfi109:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $17, %ebx		; X86-NEXT: xorl $17, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $9		; X86-NEXT: pushl $9
; X86-NEXT: .Lcfi59:		; X86-NEXT: .Lcfi110:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $18		; X86-NEXT: pushl $18
; X86-NEXT: .Lcfi60:		; X86-NEXT: .Lcfi111:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi61:		; X86-NEXT: .Lcfi112:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $18, %edi		; X86-NEXT: xorl $18, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $9		; X86-NEXT: pushl $9
; X86-NEXT: .Lcfi62:		; X86-NEXT: .Lcfi113:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $19		; X86-NEXT: pushl $19
; X86-NEXT: .Lcfi63:		; X86-NEXT: .Lcfi114:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi64:		; X86-NEXT: .Lcfi115:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $19, %ebx		; X86-NEXT: xorl $19, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $10		; X86-NEXT: pushl $10
; X86-NEXT: .Lcfi65:		; X86-NEXT: .Lcfi116:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $20		; X86-NEXT: pushl $20
; X86-NEXT: .Lcfi66:		; X86-NEXT: .Lcfi117:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi67:		; X86-NEXT: .Lcfi118:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $20, %edi		; X86-NEXT: xorl $20, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $10		; X86-NEXT: pushl $10
; X86-NEXT: .Lcfi68:		; X86-NEXT: .Lcfi119:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $21		; X86-NEXT: pushl $21
; X86-NEXT: .Lcfi69:		; X86-NEXT: .Lcfi120:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi70:		; X86-NEXT: .Lcfi121:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $21, %ebx		; X86-NEXT: xorl $21, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $11		; X86-NEXT: pushl $11
; X86-NEXT: .Lcfi71:		; X86-NEXT: .Lcfi122:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $22		; X86-NEXT: pushl $22
; X86-NEXT: .Lcfi72:		; X86-NEXT: .Lcfi123:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi73:		; X86-NEXT: .Lcfi124:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $22, %edi		; X86-NEXT: xorl $22, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $11		; X86-NEXT: pushl $11
; X86-NEXT: .Lcfi74:		; X86-NEXT: .Lcfi125:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $23		; X86-NEXT: pushl $23
; X86-NEXT: .Lcfi75:		; X86-NEXT: .Lcfi126:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi76:		; X86-NEXT: .Lcfi127:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $23, %ebx		; X86-NEXT: xorl $23, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $12		; X86-NEXT: pushl $12
; X86-NEXT: .Lcfi77:		; X86-NEXT: .Lcfi128:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $24		; X86-NEXT: pushl $24
; X86-NEXT: .Lcfi78:		; X86-NEXT: .Lcfi129:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi79:		; X86-NEXT: .Lcfi130:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $24, %edi		; X86-NEXT: xorl $24, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $12		; X86-NEXT: pushl $12
; X86-NEXT: .Lcfi80:		; X86-NEXT: .Lcfi131:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $25		; X86-NEXT: pushl $25
; X86-NEXT: .Lcfi81:		; X86-NEXT: .Lcfi132:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi82:		; X86-NEXT: .Lcfi133:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $25, %ebx		; X86-NEXT: xorl $25, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $13		; X86-NEXT: pushl $13
; X86-NEXT: .Lcfi83:		; X86-NEXT: .Lcfi134:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $26		; X86-NEXT: pushl $26
; X86-NEXT: .Lcfi84:		; X86-NEXT: .Lcfi135:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi85:		; X86-NEXT: .Lcfi136:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $26, %edi		; X86-NEXT: xorl $26, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $13		; X86-NEXT: pushl $13
; X86-NEXT: .Lcfi86:		; X86-NEXT: .Lcfi137:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $27		; X86-NEXT: pushl $27
; X86-NEXT: .Lcfi87:		; X86-NEXT: .Lcfi138:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi88:		; X86-NEXT: .Lcfi139:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $27, %ebx		; X86-NEXT: xorl $27, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $14		; X86-NEXT: pushl $14
; X86-NEXT: .Lcfi89:		; X86-NEXT: .Lcfi140:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $28		; X86-NEXT: pushl $28
; X86-NEXT: .Lcfi90:		; X86-NEXT: .Lcfi141:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi91:		; X86-NEXT: .Lcfi142:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $28, %edi		; X86-NEXT: xorl $28, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $14		; X86-NEXT: pushl $14
; X86-NEXT: .Lcfi92:		; X86-NEXT: .Lcfi143:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $29		; X86-NEXT: pushl $29
; X86-NEXT: .Lcfi93:		; X86-NEXT: .Lcfi144:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi94:		; X86-NEXT: .Lcfi145:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $29, %ebx		; X86-NEXT: xorl $29, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: pushl $15		; X86-NEXT: pushl $15
; X86-NEXT: .Lcfi95:		; X86-NEXT: .Lcfi146:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $30		; X86-NEXT: pushl $30
; X86-NEXT: .Lcfi96:		; X86-NEXT: .Lcfi147:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi97:		; X86-NEXT: .Lcfi148:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %edi		; X86-NEXT: movl %eax, %edi
; X86-NEXT: xorl $30, %edi		; X86-NEXT: xorl $30, %edi
; X86-NEXT: orl %ebx, %edi		; X86-NEXT: orl %ebx, %edi
; X86-NEXT: pushl $15		; X86-NEXT: pushl $15
; X86-NEXT: .Lcfi98:		; X86-NEXT: .Lcfi149:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $31		; X86-NEXT: pushl $31
; X86-NEXT: .Lcfi99:		; X86-NEXT: .Lcfi150:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi100:		; X86-NEXT: .Lcfi151:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: movl %eax, %ebx		; X86-NEXT: movl %eax, %ebx
; X86-NEXT: xorl $31, %ebx		; X86-NEXT: xorl $31, %ebx
; X86-NEXT: orl %edi, %ebx		; X86-NEXT: orl %edi, %ebx
; X86-NEXT: orl %esi, %ebx		; X86-NEXT: orl %esi, %ebx
; X86-NEXT: pushl $16		; X86-NEXT: pushl $16
; X86-NEXT: .Lcfi101:		; X86-NEXT: .Lcfi152:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: pushl $32		; X86-NEXT: pushl $32
; X86-NEXT: .Lcfi102:		; X86-NEXT: .Lcfi153:
; X86-NEXT: .cfi_adjust_cfa_offset 4		; X86-NEXT: .cfi_adjust_cfa_offset 4
; X86-NEXT: calll mult		; X86-NEXT: calll mult
; X86-NEXT: addl $8, %esp		; X86-NEXT: addl $8, %esp
; X86-NEXT: .Lcfi103:		; X86-NEXT: .Lcfi154:
; X86-NEXT: .cfi_adjust_cfa_offset -8		; X86-NEXT: .cfi_adjust_cfa_offset -8
; X86-NEXT: xorl $32, %eax		; X86-NEXT: xorl $32, %eax
; X86-NEXT: orl %ebx, %eax		; X86-NEXT: orl %ebx, %eax
; X86-NEXT: movl $-1, %eax		; X86-NEXT: movl $-1, %eax
; X86-NEXT: jne .LBB1_2		; X86-NEXT: jne .LBB1_2
; X86-NEXT: # BB#1:		; X86-NEXT: # BB#1:
; X86-NEXT: xorl %eax, %eax		; X86-NEXT: xorl %eax, %eax
; X86-NEXT: .LBB1_2:		; X86-NEXT: .LBB1_2:
; X86-NEXT: popl %esi		; X86-NEXT: popl %esi
		; X86-NEXT: .Lcfi155:
		; X86-NEXT: .cfi_def_cfa_offset 12
; X86-NEXT: popl %edi		; X86-NEXT: popl %edi
		; X86-NEXT: .Lcfi156:
		; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: popl %ebx		; X86-NEXT: popl %ebx
		; X86-NEXT: .Lcfi157:
		; X86-NEXT: .cfi_def_cfa_offset 4
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-HSW-LABEL: foo:		; X64-HSW-LABEL: foo:
; X64-HSW: # BB#0:		; X64-HSW: # BB#0:
; X64-HSW-NEXT: pushq %rbp		; X64-HSW-NEXT: pushq %rbp
; X64-HSW-NEXT: .Lcfi0:		; X64-HSW-NEXT: .Lcfi0:
; X64-HSW-NEXT: .cfi_def_cfa_offset 16		; X64-HSW-NEXT: .cfi_def_cfa_offset 16
; X64-HSW-NEXT: pushq %r15		; X64-HSW-NEXT: pushq %r15
(207 lines hidden)
; X64-HSW-NEXT: movl $32, %edi		; X64-HSW-NEXT: movl $32, %edi
; X64-HSW-NEXT: movl $16, %esi		; X64-HSW-NEXT: movl $16, %esi
; X64-HSW-NEXT: callq mult		; X64-HSW-NEXT: callq mult
; X64-HSW-NEXT: xorl $32, %eax		; X64-HSW-NEXT: xorl $32, %eax
; X64-HSW-NEXT: orl %ebx, %eax		; X64-HSW-NEXT: orl %ebx, %eax
; X64-HSW-NEXT: movl $-1, %eax		; X64-HSW-NEXT: movl $-1, %eax
; X64-HSW-NEXT: cmovel %r12d, %eax		; X64-HSW-NEXT: cmovel %r12d, %eax
; X64-HSW-NEXT: popq %rbx		; X64-HSW-NEXT: popq %rbx
		; X64-HSW-NEXT: .Lcfi10:
		; X64-HSW-NEXT: .cfi_def_cfa_offset 40
; X64-HSW-NEXT: popq %r12		; X64-HSW-NEXT: popq %r12
		; X64-HSW-NEXT: .Lcfi11:
		; X64-HSW-NEXT: .cfi_def_cfa_offset 32
; X64-HSW-NEXT: popq %r14		; X64-HSW-NEXT: popq %r14
		; X64-HSW-NEXT: .Lcfi12:
		; X64-HSW-NEXT: .cfi_def_cfa_offset 24
; X64-HSW-NEXT: popq %r15		; X64-HSW-NEXT: popq %r15
		; X64-HSW-NEXT: .Lcfi13:
		; X64-HSW-NEXT: .cfi_def_cfa_offset 16
; X64-HSW-NEXT: popq %rbp		; X64-HSW-NEXT: popq %rbp
		; X64-HSW-NEXT: .Lcfi14:
		; X64-HSW-NEXT: .cfi_def_cfa_offset 8
; X64-HSW-NEXT: retq		; X64-HSW-NEXT: retq
%1 = tail call i32 @mult(i32 1, i32 0)		%1 = tail call i32 @mult(i32 1, i32 0)
%2 = icmp ne i32 %1, 1		%2 = icmp ne i32 %1, 1
%3 = tail call i32 @mult(i32 2, i32 1)		%3 = tail call i32 @mult(i32 2, i32 1)
%4 = icmp ne i32 %3, 2		%4 = icmp ne i32 %3, 2
%5 = or i1 %2, %4		%5 = or i1 %2, %4
%6 = tail call i32 @mult(i32 3, i32 1)		%6 = tail call i32 @mult(i32 3, i32 1)
%7 = icmp ne i32 %6, 3		%7 = icmp ne i32 %6, 3
(93 lines hidden)

test/CodeGen/X86/mul-i256.ll

	(187 lines hidden)
	; X32-NEXT: movl %ecx, 20(%ebx)			; X32-NEXT: movl %ecx, 20(%ebx)
	; X32-NEXT: movl %edx, 24(%ebx)			; X32-NEXT: movl %edx, 24(%ebx)
	; X32-NEXT: movl %eax, 28(%ebx)			; X32-NEXT: movl %eax, 28(%ebx)
	; X32-NEXT: leal -12(%ebp), %esp			; X32-NEXT: leal -12(%ebp), %esp
	; X32-NEXT: popl %esi			; X32-NEXT: popl %esi
	; X32-NEXT: popl %edi			; X32-NEXT: popl %edi
	; X32-NEXT: popl %ebx			; X32-NEXT: popl %ebx
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
				; X32-NEXT: .Lcfi6:
				; X32-NEXT: .cfi_def_cfa %esp, 4
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test:			; X64-LABEL: test:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: pushq %r15			; X64-NEXT: pushq %r15
	; X64-NEXT: .Lcfi0:			; X64-NEXT: .Lcfi0:
	; X64-NEXT: .cfi_def_cfa_offset 16			; X64-NEXT: .cfi_def_cfa_offset 16
	; X64-NEXT: pushq %r14			; X64-NEXT: pushq %r14
	(58 lines hidden)
	; X64-NEXT: adcq %rcx, %rdx			; X64-NEXT: adcq %rcx, %rdx
	; X64-NEXT: addq %r10, %rax			; X64-NEXT: addq %r10, %rax
	; X64-NEXT: adcq %rdi, %rdx			; X64-NEXT: adcq %rdi, %rdx
	; X64-NEXT: movq %r14, (%r9)			; X64-NEXT: movq %r14, (%r9)
	; X64-NEXT: movq %r11, 8(%r9)			; X64-NEXT: movq %r11, 8(%r9)
	; X64-NEXT: movq %rax, 16(%r9)			; X64-NEXT: movq %rax, 16(%r9)
	; X64-NEXT: movq %rdx, 24(%r9)			; X64-NEXT: movq %rdx, 24(%r9)
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
				; X64-NEXT: .Lcfi6:
				; X64-NEXT: .cfi_def_cfa_offset 24
	; X64-NEXT: popq %r14			; X64-NEXT: popq %r14
				; X64-NEXT: .Lcfi7:
				; X64-NEXT: .cfi_def_cfa_offset 16
	; X64-NEXT: popq %r15			; X64-NEXT: popq %r15
				; X64-NEXT: .Lcfi8:
				; X64-NEXT: .cfi_def_cfa_offset 8
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%av = load i256, i256* %a			%av = load i256, i256* %a
	%bv = load i256, i256* %b			%bv = load i256, i256* %b
	%r = mul i256 %av, %bv			%r = mul i256 %av, %bv
	store i256 %r, i256* %out			store i256 %r, i256* %out
	ret void			ret void
	}			}

	attributes #0 = { norecurse nounwind uwtable }			attributes #0 = { norecurse nounwind uwtable }

test/CodeGen/X86/pr21792.ll

	(23 lines hidden)
	; CHECK-NEXT: leaq stuff(%r9), %rsi			; CHECK-NEXT: leaq stuff(%r9), %rsi
	; CHECK-NEXT: andl $2032, %edx # imm = 0x7F0			; CHECK-NEXT: andl $2032, %edx # imm = 0x7F0
	; CHECK-NEXT: leaq stuff(%rdx), %rdx			; CHECK-NEXT: leaq stuff(%rdx), %rdx
	; CHECK-NEXT: leaq stuff(%rcx), %rcx			; CHECK-NEXT: leaq stuff(%rcx), %rcx
	; CHECK-NEXT: leaq stuff+8(%rax), %r8			; CHECK-NEXT: leaq stuff+8(%rax), %r8
	; CHECK-NEXT: leaq stuff+8(%r9), %r9			; CHECK-NEXT: leaq stuff+8(%r9), %r9
	; CHECK-NEXT: callq toto			; CHECK-NEXT: callq toto
	; CHECK-NEXT: popq %rax			; CHECK-NEXT: popq %rax
				; CHECK-NEXT: .Lcfi1:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%tmp2 = bitcast <4 x float> %vx to <2 x i64>			%tmp2 = bitcast <4 x float> %vx to <2 x i64>
	%and.i = and <2 x i64> %tmp2, <i64 8727373547504, i64 8727373547504>			%and.i = and <2 x i64> %tmp2, <i64 8727373547504, i64 8727373547504>
	%tmp3 = bitcast <2 x i64> %and.i to <4 x i32>			%tmp3 = bitcast <2 x i64> %and.i to <4 x i32>
	%index.sroa.0.0.vec.extract = extractelement <4 x i32> %tmp3, i32 0			%index.sroa.0.0.vec.extract = extractelement <4 x i32> %tmp3, i32 0
	%idx.ext = sext i32 %index.sroa.0.0.vec.extract to i64			%idx.ext = sext i32 %index.sroa.0.0.vec.extract to i64
	%add.ptr = getelementptr inbounds i8, i8* bitcast ([256 x double]* @stuff to i8*), i64 %idx.ext			%add.ptr = getelementptr inbounds i8, i8* bitcast ([256 x double]* @stuff to i8*), i64 %idx.ext
	(22 lines hidden)

test/CodeGen/X86/pr29112.ll

	(60 lines hidden)
	; CHECK-NEXT: vmovaps %xmm9, (%rsp)			; CHECK-NEXT: vmovaps %xmm9, (%rsp)
	; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm3 # 16-byte Reload			; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm3 # 16-byte Reload
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: callq foo			; CHECK-NEXT: callq foo
	; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm1 # 16-byte Reload			; CHECK-NEXT: vmovaps {{[0-9]+}}(%rsp), %xmm1 # 16-byte Reload
	; CHECK-NEXT: vaddps {{[0-9]+}}(%rsp), %xmm1, %xmm1 # 16-byte Folded Reload			; CHECK-NEXT: vaddps {{[0-9]+}}(%rsp), %xmm1, %xmm1 # 16-byte Folded Reload
	; CHECK-NEXT: vaddps %xmm0, %xmm1, %xmm0			; CHECK-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; CHECK-NEXT: addq $88, %rsp			; CHECK-NEXT: addq $88, %rsp
				; CHECK-NEXT: .Lcfi1:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%a1 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 4, i32 20, i32 1, i32 17>			%a1 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 4, i32 20, i32 1, i32 17>

	%a2 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 4, i32 21, i32 1, i32 17>			%a2 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 4, i32 21, i32 1, i32 17>
	%a5 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 4, i32 20, i32 1, i32 27>			%a5 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 4, i32 20, i32 1, i32 27>
	%a6 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 3, i32 20, i32 1, i32 17>			%a6 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 3, i32 20, i32 1, i32 17>
	%a7 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 4, i32 21, i32 1, i32 17>			%a7 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 4, i32 21, i32 1, i32 17>
	%a8 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 5, i32 20, i32 1, i32 19>			%a8 = shufflevector <16 x float>%c1, <16 x float>%c2, <4 x i32> <i32 5, i32 20, i32 1, i32 19>
	(30 lines hidden)

test/CodeGen/X86/pr30430.ll

	(105 lines hidden)
	; CHECK-NEXT: vmovss %xmm9, {{[0-9]+}}(%rsp) # 4-byte Spill			; CHECK-NEXT: vmovss %xmm9, {{[0-9]+}}(%rsp) # 4-byte Spill
	; CHECK-NEXT: vmovss %xmm10, {{[0-9]+}}(%rsp) # 4-byte Spill			; CHECK-NEXT: vmovss %xmm10, {{[0-9]+}}(%rsp) # 4-byte Spill
	; CHECK-NEXT: vmovss %xmm11, {{[0-9]+}}(%rsp) # 4-byte Spill			; CHECK-NEXT: vmovss %xmm11, {{[0-9]+}}(%rsp) # 4-byte Spill
	; CHECK-NEXT: vmovss %xmm12, {{[0-9]+}}(%rsp) # 4-byte Spill			; CHECK-NEXT: vmovss %xmm12, {{[0-9]+}}(%rsp) # 4-byte Spill
	; CHECK-NEXT: vmovss %xmm13, {{[0-9]+}}(%rsp) # 4-byte Spill			; CHECK-NEXT: vmovss %xmm13, {{[0-9]+}}(%rsp) # 4-byte Spill
	; CHECK-NEXT: vmovss %xmm14, (%rsp) # 4-byte Spill			; CHECK-NEXT: vmovss %xmm14, (%rsp) # 4-byte Spill
	; CHECK-NEXT: movq %rbp, %rsp			; CHECK-NEXT: movq %rbp, %rsp
	; CHECK-NEXT: popq %rbp			; CHECK-NEXT: popq %rbp
				; CHECK-NEXT: .Lcfi3:
				; CHECK-NEXT: .cfi_def_cfa %rsp, 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%__A.addr.i = alloca float, align 4			%__A.addr.i = alloca float, align 4
	%__B.addr.i = alloca float, align 4			%__B.addr.i = alloca float, align 4
	%__C.addr.i = alloca float, align 4			%__C.addr.i = alloca float, align 4
	%__D.addr.i = alloca float, align 4			%__D.addr.i = alloca float, align 4
	%__E.addr.i = alloca float, align 4			%__E.addr.i = alloca float, align 4
	%__F.addr.i = alloca float, align 4			%__F.addr.i = alloca float, align 4
	(111 lines hidden)

test/CodeGen/X86/pr32241.ll

	(48 lines hidden)
	; CHECK-NEXT: .LBB0_4: # %lor.end5			; CHECK-NEXT: .LBB0_4: # %lor.end5
	; CHECK-NEXT: movb {{[0-9]+}}(%esp), %al # 1-byte Reload			; CHECK-NEXT: movb {{[0-9]+}}(%esp), %al # 1-byte Reload
	; CHECK-NEXT: andb $1, %al			; CHECK-NEXT: andb $1, %al
	; CHECK-NEXT: movzbl %al, %ecx			; CHECK-NEXT: movzbl %al, %ecx
	; CHECK-NEXT: movw %cx, %dx			; CHECK-NEXT: movw %cx, %dx
	; CHECK-NEXT: movw %dx, {{[0-9]+}}(%esp)			; CHECK-NEXT: movw %dx, {{[0-9]+}}(%esp)
	; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: addl $24, %esp			; CHECK-NEXT: addl $24, %esp
				; CHECK-NEXT: .Lcfi3:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
				; CHECK-NEXT: .Lcfi4:
				; CHECK-NEXT: .cfi_def_cfa_offset 4
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	entry:			entry:
	%aa = alloca i16, align 2			%aa = alloca i16, align 2
	%bb = alloca i16, align 2			%bb = alloca i16, align 2
	%cc = alloca i16, align 2			%cc = alloca i16, align 2
	store i16 10959, i16* %aa, align 2			store i16 10959, i16* %aa, align 2
	store i16 -15498, i16* %bb, align 2			store i16 -15498, i16* %bb, align 2
	store i16 19417, i16* %cc, align 2			store i16 19417, i16* %cc, align 2
	(29 lines hidden)

test/CodeGen/X86/pr32256.ll

	(22 lines hidden)
	; CHECK-NEXT: movb %al, %cl			; CHECK-NEXT: movb %al, %cl
	; CHECK-NEXT: movb %cl, (%esp) # 1-byte Spill			; CHECK-NEXT: movb %cl, (%esp) # 1-byte Spill
	; CHECK-NEXT: jmp .LBB0_2			; CHECK-NEXT: jmp .LBB0_2
	; CHECK-NEXT: .LBB0_2: # %land.end			; CHECK-NEXT: .LBB0_2: # %land.end
	; CHECK-NEXT: movb (%esp), %al # 1-byte Reload			; CHECK-NEXT: movb (%esp), %al # 1-byte Reload
	; CHECK-NEXT: andb $1, %al			; CHECK-NEXT: andb $1, %al
	; CHECK-NEXT: movb %al, {{[0-9]+}}(%esp)			; CHECK-NEXT: movb %al, {{[0-9]+}}(%esp)
	; CHECK-NEXT: addl $2, %esp			; CHECK-NEXT: addl $2, %esp
				; CHECK-NEXT: .Lcfi1:
				; CHECK-NEXT: .cfi_def_cfa_offset 4
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	entry:			entry:
	%b = alloca i8, align 1			%b = alloca i8, align 1
	%0 = load i8, i8* @c, align 1			%0 = load i8, i8* @c, align 1
	%tobool = trunc i8 %0 to i1			%tobool = trunc i8 %0 to i1
	%lnot = xor i1 %tobool, true			%lnot = xor i1 %tobool, true
	br i1 %lnot, label %land.rhs, label %land.end			br i1 %lnot, label %land.rhs, label %land.end

	(9 lines hidden)

test/CodeGen/X86/pr32329.ll

	(58 lines hidden)
	; X86-NEXT: cmovnel %ecx, %esi			; X86-NEXT: cmovnel %ecx, %esi
	; X86-NEXT: cmpl %edx, %edi			; X86-NEXT: cmpl %edx, %edi
	; X86-NEXT: movl %ebp, var_50+4			; X86-NEXT: movl %ebp, var_50+4
	; X86-NEXT: setge var_205			; X86-NEXT: setge var_205
	; X86-NEXT: movl %esi, var_50			; X86-NEXT: movl %esi, var_50
	; X86-NEXT: imull %eax, %ebx			; X86-NEXT: imull %eax, %ebx
	; X86-NEXT: movb %bl, var_218			; X86-NEXT: movb %bl, var_218
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
				; X86-NEXT: .Lcfi8:
				; X86-NEXT: .cfi_def_cfa_offset 16
	; X86-NEXT: popl %edi			; X86-NEXT: popl %edi
				; X86-NEXT: .Lcfi9:
				; X86-NEXT: .cfi_def_cfa_offset 12
	; X86-NEXT: popl %ebx			; X86-NEXT: popl %ebx
				; X86-NEXT: .Lcfi10:
				; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: popl %ebp			; X86-NEXT: popl %ebp
				; X86-NEXT: .Lcfi11:
				; X86-NEXT: .cfi_def_cfa_offset 4
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: foo:			; X64-LABEL: foo:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movl {{.*}}(%rip), %eax			; X64-NEXT: movl {{.*}}(%rip), %eax
	; X64-NEXT: movsbl {{.*}}(%rip), %r9d			; X64-NEXT: movsbl {{.*}}(%rip), %r9d
	; X64-NEXT: movzwl {{.*}}(%rip), %r8d			; X64-NEXT: movzwl {{.*}}(%rip), %r8d
	; X64-NEXT: movl {{.*}}(%rip), %esi			; X64-NEXT: movl {{.*}}(%rip), %esi
	(49 lines hidden)

test/CodeGen/X86/pr32345.ll

	(84 lines hidden)
	; 6860-NEXT: movb %al, %cl			; 6860-NEXT: movb %al, %cl
	; 6860-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload			; 6860-NEXT: movl {{[0-9]+}}(%esp), %eax # 4-byte Reload
	; 6860-NEXT: movb %cl, (%eax)			; 6860-NEXT: movb %cl, (%eax)
	; 6860-NEXT: leal -12(%ebp), %esp			; 6860-NEXT: leal -12(%ebp), %esp
	; 6860-NEXT: popl %esi			; 6860-NEXT: popl %esi
	; 6860-NEXT: popl %edi			; 6860-NEXT: popl %edi
	; 6860-NEXT: popl %ebx			; 6860-NEXT: popl %ebx
	; 6860-NEXT: popl %ebp			; 6860-NEXT: popl %ebp
				; 6860-NEXT: .Lcfi6:
				; 6860-NEXT: .cfi_def_cfa %esp, 4
	; 6860-NEXT: retl			; 6860-NEXT: retl
	;			;
	; X64-LABEL: foo:			; X64-LABEL: foo:
	; X64: # BB#0: # %bb			; X64: # BB#0: # %bb
	; X64-NEXT: movzwl {{.*}}(%rip), %ecx			; X64-NEXT: movzwl {{.*}}(%rip), %ecx
	; X64-NEXT: movw {{.*}}(%rip), %ax			; X64-NEXT: movw {{.*}}(%rip), %ax
	; X64-NEXT: xorw %cx, %ax			; X64-NEXT: xorw %cx, %ax
	; X64-NEXT: xorl %ecx, %eax			; X64-NEXT: xorl %ecx, %eax
	(30 lines hidden)
	; 686-NEXT: testb $32, %cl			; 686-NEXT: testb $32, %cl
	; 686-NEXT: jne .LBB0_2			; 686-NEXT: jne .LBB0_2
	; 686-NEXT: # BB#1: # %bb			; 686-NEXT: # BB#1: # %bb
	; 686-NEXT: movl %eax, %edx			; 686-NEXT: movl %eax, %edx
	; 686-NEXT: .LBB0_2: # %bb			; 686-NEXT: .LBB0_2: # %bb
	; 686-NEXT: movb %dl, (%eax)			; 686-NEXT: movb %dl, (%eax)
	; 686-NEXT: movl %ebp, %esp			; 686-NEXT: movl %ebp, %esp
	; 686-NEXT: popl %ebp			; 686-NEXT: popl %ebp
				; 686-NEXT: .Lcfi3:
				; 686-NEXT: .cfi_def_cfa %esp, 4
	; 686-NEXT: retl			; 686-NEXT: retl
	bb:			bb:
	%tmp = alloca i64, align 8			%tmp = alloca i64, align 8
	%tmp1 = load i16, i16* @var_22, align 2			%tmp1 = load i16, i16* @var_22, align 2
	%tmp2 = zext i16 %tmp1 to i32			%tmp2 = zext i16 %tmp1 to i32
	%tmp3 = load i16, i16* @var_27, align 2			%tmp3 = load i16, i16* @var_27, align 2
	%tmp4 = zext i16 %tmp3 to i32			%tmp4 = zext i16 %tmp3 to i32
	%tmp5 = xor i32 %tmp2, %tmp4			%tmp5 = xor i32 %tmp2, %tmp4
	(23 lines hidden)

test/CodeGen/X86/pr32451.ll

	(27 lines hidden)
	; CHECK-NEXT: andb $1, %bl			; CHECK-NEXT: andb $1, %bl
	; CHECK-NEXT: movzbl %bl, %edx			; CHECK-NEXT: movzbl %bl, %edx
	; CHECK-NEXT: movl %edx, (%esp)			; CHECK-NEXT: movl %edx, (%esp)
	; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp) # 4-byte Spill			; CHECK-NEXT: movl %eax, {{[0-9]+}}(%esp) # 4-byte Spill
	; CHECK-NEXT: calll jl_box_int32			; CHECK-NEXT: calll jl_box_int32
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx # 4-byte Reload			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %ecx # 4-byte Reload
	; CHECK-NEXT: movl %eax, (%ecx)			; CHECK-NEXT: movl %eax, (%ecx)
	; CHECK-NEXT: addl $16, %esp			; CHECK-NEXT: addl $16, %esp
				; CHECK-NEXT: .Lcfi3:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: popl %ebx			; CHECK-NEXT: popl %ebx
				; CHECK-NEXT: .Lcfi4:
				; CHECK-NEXT: .cfi_def_cfa_offset 4
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	top:			top:
	%3 = alloca i8***			%3 = alloca i8***
	store volatile i8* %1, i8** %3			store volatile i8* %1, i8** %3
	%4 = call i8*** @julia.gc_root_decl()			%4 = call i8*** @julia.gc_root_decl()
	%5 = call i8**** @jl_get_ptls_states()			%5 = call i8**** @jl_get_ptls_states()
	%6 = bitcast i8** %5 to i8*			%6 = bitcast i8** %5 to i8*
	%7 = getelementptr i8, i8* %6, i64 3			%7 = getelementptr i8, i8* %6, i64 3
	(19 lines hidden)

test/CodeGen/X86/pr9743.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -disable-fp-elim -asm-verbose=0 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -disable-fp-elim -asm-verbose=0 | FileCheck %s

	define void @f() {			define void @f() {
	ret void			ret void
	}			}

	; CHECK: .cfi_startproc			; CHECK: .cfi_startproc
	; CHECK-NEXT: pushq			; CHECK-NEXT: pushq
	; CHECK-NEXT: :			; CHECK-NEXT: :
	; CHECK-NEXT: .cfi_def_cfa_offset 16			; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: :			; CHECK-NEXT: :
	; CHECK-NEXT: .cfi_offset %rbp, -16			; CHECK-NEXT: .cfi_offset %rbp, -16
	; CHECK-NEXT: movq %rsp, %rbp			; CHECK-NEXT: movq %rsp, %rbp
	; CHECK-NEXT: :			; CHECK-NEXT: :
	; CHECK-NEXT: .cfi_def_cfa_register %rbp			; CHECK-NEXT: .cfi_def_cfa_register %rbp
	; CHECK-NEXT: popq %rbp			; CHECK-NEXT: popq %rbp
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa %rsp, 8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

test/CodeGen/X86/push-cfi-debug.ll

	(17 lines hidden)
	; CHECK: subl $8, %esp			; CHECK: subl $8, %esp
	; CHECK: .cfi_adjust_cfa_offset 8			; CHECK: .cfi_adjust_cfa_offset 8
	; CHECK: pushl $4			; CHECK: pushl $4
	; CHECK: .cfi_adjust_cfa_offset 4			; CHECK: .cfi_adjust_cfa_offset 4
	; CHECK: pushl $3			; CHECK: pushl $3
	; CHECK: .cfi_adjust_cfa_offset 4			; CHECK: .cfi_adjust_cfa_offset 4
	; CHECK: calll stdfoo			; CHECK: calll stdfoo
	; CHECK: .cfi_adjust_cfa_offset -8			; CHECK: .cfi_adjust_cfa_offset -8
	; CHECK: addl $20, %esp			; CHECK: addl $8, %esp
	; CHECK: .cfi_adjust_cfa_offset -8			; CHECK: .cfi_adjust_cfa_offset -8
				; CHECK: addl $12, %esp
				; CHECK: .cfi_def_cfa_offset 4
	define void @test1() #0 !dbg !4 {			define void @test1() #0 !dbg !4 {
	entry:			entry:
	tail call void @foo(i32 1, i32 2) #1, !dbg !10			tail call void @foo(i32 1, i32 2) #1, !dbg !10
	tail call x86_stdcallcc void @stdfoo(i32 3, i32 4) #1, !dbg !11			tail call x86_stdcallcc void @stdfoo(i32 3, i32 4) #1, !dbg !11
	ret void, !dbg !12			ret void, !dbg !12
	}			}

	attributes #0 = { nounwind optsize }			attributes #0 = { nounwind optsize }
	(17 lines hidden)

test/CodeGen/X86/push-cfi-obj.ll

	; RUN: llc < %s -mtriple=i686-pc-linux -filetype=obj | llvm-readobj -s -sr -sd | FileCheck %s -check-prefix=LINUX			; RUN: llc < %s -mtriple=i686-pc-linux -filetype=obj | llvm-readobj -s -sr -sd | FileCheck %s -check-prefix=LINUX
	; RUN: llc < %s -mtriple=i686-darwin-macosx10.7 -filetype=obj | llvm-readobj -sections | FileCheck -check-prefix=DARWIN %s			; RUN: llc < %s -mtriple=i686-darwin-macosx10.7 -filetype=obj | llvm-readobj -sections | FileCheck -check-prefix=DARWIN %s

	; On darwin, check that we manage to generate the compact unwind section			; On darwin, check that we manage to generate the compact unwind section
	; DARWIN: Name: __compact_unwind			; DARWIN: Name: __compact_unwind
	; DARWIN: Segment: __LD			; DARWIN: Segment: __LD

	; LINUX: Name: .eh_frame			; LINUX: Name: .eh_frame
	; LINUX-NEXT: Type: SHT_PROGBITS (0x1)			; LINUX-NEXT: Type: SHT_PROGBITS (0x1)
	; LINUX-NEXT: Flags [ (0x2)			; LINUX-NEXT: Flags [ (0x2)
	; LINUX-NEXT: SHF_ALLOC (0x2)			; LINUX-NEXT: SHF_ALLOC (0x2)
	; LINUX-NEXT: ]			; LINUX-NEXT: ]
	; LINUX-NEXT: Address: 0x0			; LINUX-NEXT: Address: 0x0
	; LINUX-NEXT: Offset: 0x68			; LINUX-NEXT: Offset: 0x68
	; LINUX-NEXT: Size: 64			; LINUX-NEXT: Size: 72
	; LINUX-NEXT: Link: 0			; LINUX-NEXT: Link: 0
	; LINUX-NEXT: Info: 0			; LINUX-NEXT: Info: 0
	; LINUX-NEXT: AddressAlignment: 4			; LINUX-NEXT: AddressAlignment: 4
	; LINUX-NEXT: EntrySize: 0			; LINUX-NEXT: EntrySize: 0
	; LINUX-NEXT: Relocations [			; LINUX-NEXT: Relocations [
	; LINUX-NEXT: ]			; LINUX-NEXT: ]
	; LINUX-NEXT: SectionData (			; LINUX-NEXT: SectionData (
	; LINUX-NEXT: 0000: 1C000000 00000000 017A504C 5200017C |.........zPLR..||			; LINUX-NEXT: 0000: 1C000000 00000000 017A504C 5200017C |.........zPLR..||
	; LINUX-NEXT: 0010: 08070000 00000000 1B0C0404 88010000 |................|			; LINUX-NEXT: 0010: 08070000 00000000 1B0C0404 88010000 |................|
	; LINUX-NEXT: 0020: 1C000000 24000000 00000000 1D000000 |....$...........|			; LINUX-NEXT: 0020: 24000000 24000000 00000000 1D000000 |$...$...........|
	; LINUX-NEXT: 0030: 04000000 00410E08 8502420D 05432E10 |.....A....B..C..|			; LINUX-NEXT: 0030: 04000000 00410E08 8502420D 05432E10 |.....A....B..C..|
				; LINUX-NEXT: 0040: 540C0404 410C0508 |T...A...|
	; LINUX-NEXT: )			; LINUX-NEXT: )

	declare i32 @__gxx_personality_v0(...)			declare i32 @__gxx_personality_v0(...)
	declare void @good(i32 %a, i32 %b, i32 %c, i32 %d)			declare void @good(i32 %a, i32 %b, i32 %c, i32 %d)

	define void @test() #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {			define void @test() #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
	entry:			entry:
	invoke void @good(i32 1, i32 2, i32 3, i32 4)			invoke void @good(i32 1, i32 2, i32 3, i32 4)
	to label %continue unwind label %cleanup			to label %continue unwind label %cleanup
	continue:			continue:
	ret void			ret void
	cleanup:			cleanup:
	landingpad { i8*, i32 }			landingpad { i8*, i32 }
	cleanup			cleanup
	ret void			ret void
	}			}

	attributes #0 = { optsize "no-frame-pointer-elim"="true" }			attributes #0 = { optsize "no-frame-pointer-elim"="true" }
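
The eight bytes added at offset 0x40 of the expected `.eh_frame` data can be read as DWARF call frame instructions. The following is a minimal Python sketch, not part of the patch, that decodes only the handful of opcodes appearing in this FDE. The helper names `read_uleb128` and `decode_cfa` are hypothetical, and the opcode values are taken from the DWARF standard plus the GNU `DW_CFA_GNU_args_size` extension.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


# Opcodes whose operands are all ULEB128 values: (name, operand count).
# Only the opcodes that occur in this test are listed.
SIMPLE_OPS = {
    0x00: ("DW_CFA_nop", 0),
    0x0C: ("DW_CFA_def_cfa", 2),
    0x0D: ("DW_CFA_def_cfa_register", 1),
    0x0E: ("DW_CFA_def_cfa_offset", 1),
    0x2E: ("DW_CFA_GNU_args_size", 1),
}


def decode_cfa(data):
    """Decode a byte string of CFA instructions into a list of tuples."""
    out = []
    pos = 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        high, low = op & 0xC0, op & 0x3F
        if high == 0x40:
            # DW_CFA_advance_loc: the delta is stored in the low six bits.
            out.append(("DW_CFA_advance_loc", low))
        elif high == 0x80:
            # DW_CFA_offset: register in the low six bits, factored offset follows.
            offset, pos = read_uleb128(data, pos)
            out.append(("DW_CFA_offset", low, offset))
        elif op in SIMPLE_OPS:
            name, count = SIMPLE_OPS[op]
            args = []
            for _ in range(count):
                value, pos = read_uleb128(data, pos)
                args.append(value)
            out.append((name, *args))
        else:
            raise ValueError(f"opcode 0x{op:02X} is not handled by this sketch")
    return out


# Bytes newly expected at offset 0x40 in the LINUX section data.
for instruction in decode_cfa(bytes.fromhex("540C0404410C0508")):
    print(instruction)
```

On i386 Linux, DWARF register 4 is `%esp` and register 5 is `%ebp`, so the new bytes decode to a `def_cfa %esp, 4` after the epilogue's `popl %ebp`, followed one byte later by a `def_cfa %ebp, 8` that restores the frame-pointer rule for the block laid out after the return.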

test/CodeGen/X86/push-cfi.ll

	; LINUX-NEXT: .cfi_adjust_cfa_offset 4			; LINUX-NEXT: .cfi_adjust_cfa_offset 4
	; LINUX-NEXT: pushl $2			; LINUX-NEXT: pushl $2
	; LINUX-NEXT: Lcfi{{[0-9]+}}:			; LINUX-NEXT: Lcfi{{[0-9]+}}:
	; LINUX-NEXT: .cfi_adjust_cfa_offset 4			; LINUX-NEXT: .cfi_adjust_cfa_offset 4
	; LINUX-NEXT: pushl $1			; LINUX-NEXT: pushl $1
	; LINUX-NEXT: Lcfi{{[0-9]+}}:			; LINUX-NEXT: Lcfi{{[0-9]+}}:
	; LINUX-NEXT: .cfi_adjust_cfa_offset 4			; LINUX-NEXT: .cfi_adjust_cfa_offset 4
	; LINUX-NEXT: call			; LINUX-NEXT: call
	; LINUX-NEXT: addl $28, %esp			; LINUX-NEXT: addl $16, %esp
	; LINUX: .cfi_adjust_cfa_offset -16			; LINUX: .cfi_adjust_cfa_offset -16
				; LINUX: addl $12, %esp
	; DARWIN-NOT: .cfi_escape			; DARWIN-NOT: .cfi_escape
	; DARWIN-NOT: pushl			; DARWIN-NOT: pushl
	define void @test2_nofp() #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {			define void @test2_nofp() #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
	entry:			entry:
	call void @good(i32 1, i32 2, i32 3, i32 4)			call void @good(i32 1, i32 2, i32 3, i32 4)
	ret void			ret void
	}			}


test/CodeGen/X86/return-ext.ll

Show First 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	entry:
%1 = zext i1 %0 to i32		%1 = zext i1 %0 to i32
ret i32 %1		ret i32 %1

; The high 24 bits of %eax from a function returning i1 are undefined.		; The high 24 bits of %eax from a function returning i1 are undefined.
; CHECK-LABEL: use_i1:		; CHECK-LABEL: use_i1:
; CHECK: call		; CHECK: call
; CHECK-NEXT: movzbl		; CHECK-NEXT: movzbl
; CHECK-NEXT: {{pop|add}}		; CHECK-NEXT: {{pop|add}}
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset {{4|8}}
; CHECK-NEXT: ret		; CHECK-NEXT: ret
}		}

define i32 @use_i8() {		define i32 @use_i8() {
entry:		entry:
%0 = call i8 @unsigned_i8();		%0 = call i8 @unsigned_i8();
%1 = zext i8 %0 to i32		%1 = zext i8 %0 to i32
ret i32 %1		ret i32 %1

; The high 24 bits of %eax from a function returning i8 are undefined.		; The high 24 bits of %eax from a function returning i8 are undefined.
; CHECK-LABEL: use_i8:		; CHECK-LABEL: use_i8:
; CHECK: call		; CHECK: call
; CHECK-NEXT: movzbl		; CHECK-NEXT: movzbl
; CHECK-NEXT: {{pop|add}}		; CHECK-NEXT: {{pop|add}}
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset {{4|8}}
; CHECK-NEXT: ret		; CHECK-NEXT: ret
}		}

define i32 @use_i16() {		define i32 @use_i16() {
entry:		entry:
%0 = call i16 @unsigned_i16();		%0 = call i16 @unsigned_i16();
%1 = zext i16 %0 to i32		%1 = zext i16 %0 to i32
ret i32 %1		ret i32 %1

; The high 16 bits of %eax from a function returning i16 are undefined.		; The high 16 bits of %eax from a function returning i16 are undefined.
; CHECK-LABEL: use_i16:		; CHECK-LABEL: use_i16:
; CHECK: call		; CHECK: call
; CHECK-NEXT: movzwl		; CHECK-NEXT: movzwl
; CHECK-NEXT: {{pop|add}}		; CHECK-NEXT: {{pop|add}}
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset {{4|8}}
; CHECK-NEXT: ret		; CHECK-NEXT: ret
}		}

test/CodeGen/X86/rtm.ll

	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: pushq %rax			; X64-NEXT: pushq %rax
	; X64-NEXT: .Lcfi0:			; X64-NEXT: .Lcfi0:
	; X64-NEXT: .cfi_def_cfa_offset 16			; X64-NEXT: .cfi_def_cfa_offset 16
	; X64-NEXT: movl %edi, {{[0-9]+}}(%rsp)			; X64-NEXT: movl %edi, {{[0-9]+}}(%rsp)
	; X64-NEXT: xabort $1			; X64-NEXT: xabort $1
	; X64-NEXT: callq f1			; X64-NEXT: callq f1
	; X64-NEXT: popq %rax			; X64-NEXT: popq %rax
				; X64-NEXT: .Lcfi1:
				; X64-NEXT: .cfi_def_cfa_offset 8
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%x.addr = alloca i32, align 4			%x.addr = alloca i32, align 4
	store i32 %x, i32* %x.addr, align 4			store i32 %x, i32* %x.addr, align 4
	call void @llvm.x86.xabort(i8 1)			call void @llvm.x86.xabort(i8 1)
	call void @f1()			call void @f1()
	ret void			ret void
	}			}

test/CodeGen/X86/setcc-lowering.ll

	; KNL-32-NEXT: cmovlw %dx, %si			; KNL-32-NEXT: cmovlw %dx, %si
	; KNL-32-NEXT: kmovw %esi, %k1			; KNL-32-NEXT: kmovw %esi, %k1
	; KNL-32-NEXT: kandw %k0, %k1, %k1			; KNL-32-NEXT: kandw %k0, %k1, %k1
	; KNL-32-NEXT: kmovw %k1, %esi			; KNL-32-NEXT: kmovw %k1, %esi
	; KNL-32-NEXT: testw %si, %si			; KNL-32-NEXT: testw %si, %si
	; KNL-32-NEXT: jne .LBB1_1			; KNL-32-NEXT: jne .LBB1_1
	; KNL-32-NEXT: # BB#2: # %for_exit600			; KNL-32-NEXT: # BB#2: # %for_exit600
	; KNL-32-NEXT: popl %esi			; KNL-32-NEXT: popl %esi
				; KNL-32-NEXT: .Lcfi2:
				; KNL-32-NEXT: .cfi_def_cfa_offset 4
	; KNL-32-NEXT: retl			; KNL-32-NEXT: retl
	allocas:			allocas:
	br label %for_test11.preheader			br label %for_test11.preheader

	for_test11.preheader: ; preds = %for_test11.preheader, %allocas			for_test11.preheader: ; preds = %for_test11.preheader, %allocas
	br i1 undef, label %for_loop599, label %for_test11.preheader			br i1 undef, label %for_loop599, label %for_test11.preheader

	for_loop599: ; preds = %for_loop599, %for_test11.preheader			for_loop599: ; preds = %for_loop599, %for_test11.preheader

test/CodeGen/X86/statepoint-call-lowering.ll


	define i1 @test_relocate(i32 addrspace(1)* %a) gc "statepoint-example" {			define i1 @test_relocate(i32 addrspace(1)* %a) gc "statepoint-example" {
	; CHECK-LABEL: test_relocate			; CHECK-LABEL: test_relocate
	; Check that an unused relocate has no code-generation impact			; Check that an unused relocate has no code-generation impact
	; CHECK: pushq %rax			; CHECK: pushq %rax
	; CHECK: callq return_i1			; CHECK: callq return_i1
	; CHECK-NEXT: .Ltmp5:			; CHECK-NEXT: .Ltmp5:
	; CHECK-NEXT: popq %rcx			; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: .Lcfi11:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%safepoint_token = tail call token (i64, i32, i1 ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_i1f(i64 0, i32 0, i1 ()* @return_i1, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a)			%safepoint_token = tail call token (i64, i32, i1 ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_i1f(i64 0, i32 0, i1 ()* @return_i1, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a)
	%call1 = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %safepoint_token, i32 7, i32 7)			%call1 = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %safepoint_token, i32 7, i32 7)
	%call2 = call zeroext i1 @llvm.experimental.gc.result.i1(token %safepoint_token)			%call2 = call zeroext i1 @llvm.experimental.gc.result.i1(token %safepoint_token)
	ret i1 %call2			ret i1 %call2
	}			}


test/CodeGen/X86/statepoint-gctransition-call-lowering.ll


	define i1 @test_relocate(i32 addrspace(1)* %a) gc "statepoint-example" {			define i1 @test_relocate(i32 addrspace(1)* %a) gc "statepoint-example" {
	; CHECK-LABEL: test_relocate			; CHECK-LABEL: test_relocate
	; Check that an unused relocate has no code-generation impact			; Check that an unused relocate has no code-generation impact
	; CHECK: pushq %rax			; CHECK: pushq %rax
	; CHECK: callq return_i1			; CHECK: callq return_i1
	; CHECK-NEXT: .Ltmp4:			; CHECK-NEXT: .Ltmp4:
	; CHECK-NEXT: popq %rcx			; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%safepoint_token = tail call token (i64, i32, i1 ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_i1f(i64 0, i32 0, i1 ()* @return_i1, i32 0, i32 1, i32 0, i32 0, i32 addrspace(1)* %a)			%safepoint_token = tail call token (i64, i32, i1 ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_i1f(i64 0, i32 0, i1 ()* @return_i1, i32 0, i32 1, i32 0, i32 0, i32 addrspace(1)* %a)
	%call1 = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %safepoint_token, i32 7, i32 7)			%call1 = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %safepoint_token, i32 7, i32 7)
	%call2 = call zeroext i1 @llvm.experimental.gc.result.i1(token %safepoint_token)			%call2 = call zeroext i1 @llvm.experimental.gc.result.i1(token %safepoint_token)
	ret i1 %call2			ret i1 %call2
	}			}


test/CodeGen/X86/statepoint-invoke.ll

Show First 20 Lines • Show All 136 Lines • ▼ Show 20 Lines	entry:
; CHECK: callq some_call		; CHECK: callq some_call
%sp1 = invoke token (i64, i32, void (i64 addrspace(1)*)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i64f(i64 0, i32 0, void (i64 addrspace(1)*)* @some_call, i32 1, i32 0, i64 addrspace(1)* %val1, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i64 addrspace(1)* null, i64 addrspace(1)* undef)		%sp1 = invoke token (i64, i32, void (i64 addrspace(1)*)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i64f(i64 0, i32 0, void (i64 addrspace(1)*)* @some_call, i32 1, i32 0, i64 addrspace(1)* %val1, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i64 addrspace(1)* null, i64 addrspace(1)* undef)
to label %normal_return unwind label %exceptional_return		to label %normal_return unwind label %exceptional_return

normal_return:		normal_return:
; CHECK-LABEL: %normal_return		; CHECK-LABEL: %normal_return
; CHECK: xorl %eax, %eax		; CHECK: xorl %eax, %eax
; CHECK-NEXT: popq		; CHECK-NEXT: popq
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%null.relocated = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token %sp1, i32 13, i32 13)		%null.relocated = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token %sp1, i32 13, i32 13)
%undef.relocated = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token %sp1, i32 14, i32 14)		%undef.relocated = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token %sp1, i32 14, i32 14)
ret i64 addrspace(1)* %null.relocated		ret i64 addrspace(1)* %null.relocated

exceptional_return:		exceptional_return:
%landing_pad = landingpad token		%landing_pad = landingpad token
cleanup		cleanup
Show All 11 Lines	entry:
%c = inttoptr i64 15 to i64 addrspace(1)*		%c = inttoptr i64 15 to i64 addrspace(1)*
; CHECK: callq		; CHECK: callq
%sp = invoke token (i64, i32, void (i64 addrspace(1)*)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i64f(i64 0, i32 0, void (i64 addrspace(1)*)* @some_call, i32 1, i32 0, i64 addrspace(1)* %val1, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %aa, i64 addrspace(1)* %c)		%sp = invoke token (i64, i32, void (i64 addrspace(1)*)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp1i64f(i64 0, i32 0, void (i64 addrspace(1)*)* @some_call, i32 1, i32 0, i64 addrspace(1)* %val1, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %aa, i64 addrspace(1)* %c)
to label %normal_return unwind label %exceptional_return		to label %normal_return unwind label %exceptional_return

normal_return:		normal_return:
; CHECK: leaq		; CHECK: leaq
; CHECK-NEXT: popq		; CHECK-NEXT: popq
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%aa.rel = call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %sp, i32 13, i32 13)		%aa.rel = call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %sp, i32 13, i32 13)
%aa.converted = bitcast i32 addrspace(1)* %aa.rel to i64 addrspace(1)*		%aa.converted = bitcast i32 addrspace(1)* %aa.rel to i64 addrspace(1)*
ret i64 addrspace(1)* %aa.converted		ret i64 addrspace(1)* %aa.converted

exceptional_return:		exceptional_return:
; CHECK: movl $15		; CHECK: movl $15
; CHECK-NEXT: popq		; CHECK-NEXT: popq
		; CHECK-NEXT: :
		; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%landing_pad = landingpad token		%landing_pad = landingpad token
cleanup		cleanup
%aa.rel2 = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token %landing_pad, i32 14, i32 14)		%aa.rel2 = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token %landing_pad, i32 14, i32 14)
ret i64 addrspace(1)* %aa.rel2		ret i64 addrspace(1)* %aa.rel2
}		}

declare token @llvm.experimental.gc.statepoint.p0f_isVoidp1i64f(i64, i32, void (i64 addrspace(1)*)*, i32, i32, ...)		declare token @llvm.experimental.gc.statepoint.p0f_isVoidp1i64f(i64, i32, void (i64 addrspace(1)*)*, i32, i32, ...)
declare token @llvm.experimental.gc.statepoint.p0f_p1i64p1i64f(i64, i32, i64 addrspace(1)* (i64 addrspace(1)*)*, i32, i32, ...)		declare token @llvm.experimental.gc.statepoint.p0f_p1i64p1i64f(i64, i32, i64 addrspace(1)* (i64 addrspace(1)*)*, i32, i32, ...)

declare i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token, i32, i32)		declare i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(token, i32, i32)
declare i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token, i32, i32)		declare i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token, i32, i32)
declare i64 addrspace(1)* @llvm.experimental.gc.result.p1i64(token)		declare i64 addrspace(1)* @llvm.experimental.gc.result.p1i64(token)

test/CodeGen/X86/throws-cfi-fp.ll

This file was added.

				; RUN: llc %s -o - | FileCheck %s

				; ModuleID = 'throws-cfi-fp.cpp'
				source_filename = "throws-cfi-fp.cpp"
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				$__clang_call_terminate = comdat any

				@_ZL11ShouldThrow = internal unnamed_addr global i1 false, align 1
				@_ZTIi = external constant i8*
				@str = private unnamed_addr constant [20 x i8] c"Threw an exception!\00"

				; Function Attrs: uwtable
				define void @_Z6throwsv() #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {

				; CHECK-LABEL: _Z6throwsv:
				; CHECK: popq %rbp
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa %rsp, 8
				; CHECK-NEXT: retq
				; CHECK-NEXT: .LBB0_1:
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa %rbp, 16

				entry:
				%.b5 = load i1, i1* @_ZL11ShouldThrow, align 1
				br i1 %.b5, label %if.then, label %try.cont

				if.then: ; preds = %entry
				%exception = tail call i8* @__cxa_allocate_exception(i64 4)
				%0 = bitcast i8* %exception to i32*
				store i32 1, i32* %0, align 16
				invoke void @__cxa_throw(i8* %exception, i8* bitcast (i8** @_ZTIi to i8*), i8* null)
				to label %unreachable unwind label %lpad

				lpad: ; preds = %if.then
				%1 = landingpad { i8*, i32 }
				catch i8* null
				%2 = extractvalue { i8*, i32 } %1, 0
				%3 = tail call i8* @__cxa_begin_catch(i8* %2)
				%puts = tail call i32 @puts(i8* getelementptr inbounds ([20 x i8], [20 x i8]* @str, i64 0, i64 0))
				invoke void @__cxa_rethrow() #4
				to label %unreachable unwind label %lpad1

				lpad1: ; preds = %lpad
				%4 = landingpad { i8*, i32 }
				cleanup
				invoke void @__cxa_end_catch()
				to label %eh.resume unwind label %terminate.lpad

				try.cont: ; preds = %entry
				ret void

				eh.resume: ; preds = %lpad1
				resume { i8*, i32 } %4

				terminate.lpad: ; preds = %lpad1
				%5 = landingpad { i8*, i32 }
				catch i8* null
				%6 = extractvalue { i8*, i32 } %5, 0
				tail call void @__clang_call_terminate(i8* %6) #5
				unreachable

				unreachable: ; preds = %lpad, %if.then
				unreachable
				}

				declare i8* @__cxa_allocate_exception(i64)

				declare void @__cxa_throw(i8*, i8*, i8*)

				declare i32 @__gxx_personality_v0(...)

				declare i8* @__cxa_begin_catch(i8*)

				declare void @__cxa_rethrow()

				declare void @__cxa_end_catch()

				; Function Attrs: noinline noreturn nounwind
				declare void @__clang_call_terminate(i8*)

				declare void @_ZSt9terminatev()

				; Function Attrs: nounwind
				declare i32 @puts(i8* nocapture readonly) #3

				attributes #0 = { "no-frame-pointer-elim"="true" }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!7, !8, !9}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 5.0.0 (http://llvm.org/git/clang.git 3f8116e6a2815b1d5f3491493938d0c63c9f42c9) (http://llvm.org/git/llvm.git 4fde77f8f1a8e4482e69b6a7484bc7d1b99b3c0a)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
				!1 = !DIFile(filename: "throws-cfi-fp.cpp", directory: "epilogue-dwarf/test")
				!2 = !{}
				!3 = !{!4}
				!4 = !DIGlobalVariableExpression(var: !5)
				!5 = distinct !DIGlobalVariable(name: "ShouldThrow", linkageName: "_ZL11ShouldThrow", scope: !0, file: !1, line: 2, type: !6, isLocal: true, isDefinition: true)
				!6 = !DIBasicType(name: "bool", size: 8, encoding: DW_ATE_boolean)
				!7 = !{i32 2, !"Dwarf Version", i32 4}
				!8 = !{i32 2, !"Debug Info Version", i32 3}
				!9 = !{i32 1, !"wchar_size", i32 4}

test/CodeGen/X86/throws-cfi-no-fp.ll

This file was added.

				; RUN: llc %s -o - | FileCheck %s

				; ModuleID = 'throws-cfi-no-fp.cpp'
				source_filename = "throws-cfi-no-fp.cpp"
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				$__clang_call_terminate = comdat any

				@_ZL11ShouldThrow = internal unnamed_addr global i1 false, align 1
				@_ZTIi = external constant i8*
				@str = private unnamed_addr constant [20 x i8] c"Threw an exception!\00"

				; Function Attrs: uwtable
				define void @_Z6throwsv() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {

				; CHECK-LABEL: _Z6throwsv:
				; CHECK: popq %rbx
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 8
				; CHECK-NEXT: retq
				; CHECK-NEXT: .LBB0_1:
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 16

				entry:
				%.b5 = load i1, i1* @_ZL11ShouldThrow, align 1
				br i1 %.b5, label %if.then, label %try.cont

				if.then: ; preds = %entry
				%exception = tail call i8* @__cxa_allocate_exception(i64 4)
				%0 = bitcast i8* %exception to i32*
				store i32 1, i32* %0, align 16
				invoke void @__cxa_throw(i8* %exception, i8* bitcast (i8** @_ZTIi to i8*), i8* null)
				to label %unreachable unwind label %lpad

				lpad: ; preds = %if.then
				%1 = landingpad { i8*, i32 }
				catch i8* null
				%2 = extractvalue { i8*, i32 } %1, 0
				%3 = tail call i8* @__cxa_begin_catch(i8* %2)
				%puts = tail call i32 @puts(i8* getelementptr inbounds ([20 x i8], [20 x i8]* @str, i64 0, i64 0))
				invoke void @__cxa_rethrow() #4
				to label %unreachable unwind label %lpad1

				lpad1: ; preds = %lpad
				%4 = landingpad { i8*, i32 }
				cleanup
				invoke void @__cxa_end_catch()
				to label %eh.resume unwind label %terminate.lpad

				try.cont: ; preds = %entry
				ret void

				eh.resume: ; preds = %lpad1
				resume { i8*, i32 } %4

				terminate.lpad: ; preds = %lpad1
				%5 = landingpad { i8*, i32 }
				catch i8* null
				%6 = extractvalue { i8*, i32 } %5, 0
				tail call void @__clang_call_terminate(i8* %6)
				unreachable

				unreachable: ; preds = %lpad, %if.then
				unreachable
				}

				declare i8* @__cxa_allocate_exception(i64)

				declare void @__cxa_throw(i8*, i8*, i8*)

				declare i32 @__gxx_personality_v0(...)

				declare i8* @__cxa_begin_catch(i8*)

				declare void @__cxa_rethrow()

				declare void @__cxa_end_catch()

				; Function Attrs: noinline noreturn nounwind
				declare void @__clang_call_terminate(i8*)

				declare void @_ZSt9terminatev()


				; Function Attrs: nounwind
				declare i32 @puts(i8* nocapture readonly)

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!7, !8, !9}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 5.0.0 (http://llvm.org/git/clang.git 3f8116e6a2815b1d5f3491493938d0c63c9f42c9) (http://llvm.org/git/llvm.git 4fde77f8f1a8e4482e69b6a7484bc7d1b99b3c0a)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
				!1 = !DIFile(filename: "throws-cfi-no-fp.cpp", directory: "epilogue-dwarf/test")
				!2 = !{}
				!3 = !{!4}
				!4 = !DIGlobalVariableExpression(var: !5)
				!5 = distinct !DIGlobalVariable(name: "ShouldThrow", linkageName: "_ZL11ShouldThrow", scope: !0, file: !1, line: 2, type: !6, isLocal: true, isDefinition: true)
				!6 = !DIBasicType(name: "bool", size: 8, encoding: DW_ATE_boolean)
				!7 = !{i32 2, !"Dwarf Version", i32 4}
				!8 = !{i32 2, !"Debug Info Version", i32 3}
				!9 = !{i32 1, !"wchar_size", i32 4}

test/CodeGen/X86/vector-sext.ll

	; AVX1-NEXT: vpinsrw $5, %edx, %xmm1, %xmm1			; AVX1-NEXT: vpinsrw $5, %edx, %xmm1, %xmm1
	; AVX1-NEXT: shlq $57, %rsi			; AVX1-NEXT: shlq $57, %rsi
	; AVX1-NEXT: sarq $63, %rsi			; AVX1-NEXT: sarq $63, %rsi
	; AVX1-NEXT: vpinsrw $6, %esi, %xmm1, %xmm1			; AVX1-NEXT: vpinsrw $6, %esi, %xmm1, %xmm1
	; AVX1-NEXT: shrq $7, %rbp			; AVX1-NEXT: shrq $7, %rbp
	; AVX1-NEXT: vpinsrw $7, %ebp, %xmm1, %xmm1			; AVX1-NEXT: vpinsrw $7, %ebp, %xmm1, %xmm1
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: popq %rbx			; AVX1-NEXT: popq %rbx
				; AVX1-NEXT: .Lcfi12:
				; AVX1-NEXT: .cfi_def_cfa_offset 48
	; AVX1-NEXT: popq %r12			; AVX1-NEXT: popq %r12
				; AVX1-NEXT: .Lcfi13:
				; AVX1-NEXT: .cfi_def_cfa_offset 40
	; AVX1-NEXT: popq %r13			; AVX1-NEXT: popq %r13
				; AVX1-NEXT: .Lcfi14:
				; AVX1-NEXT: .cfi_def_cfa_offset 32
	; AVX1-NEXT: popq %r14			; AVX1-NEXT: popq %r14
				; AVX1-NEXT: .Lcfi15:
				; AVX1-NEXT: .cfi_def_cfa_offset 24
	; AVX1-NEXT: popq %r15			; AVX1-NEXT: popq %r15
				; AVX1-NEXT: .Lcfi16:
				; AVX1-NEXT: .cfi_def_cfa_offset 16
	; AVX1-NEXT: popq %rbp			; AVX1-NEXT: popq %rbp
				; AVX1-NEXT: .Lcfi17:
				; AVX1-NEXT: .cfi_def_cfa_offset 8
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: load_sext_16i1_to_16i16:			; AVX2-LABEL: load_sext_16i1_to_16i16:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: pushq %rbp			; AVX2-NEXT: pushq %rbp
	; AVX2-NEXT: .Lcfi0:			; AVX2-NEXT: .Lcfi0:
	; AVX2-NEXT: .cfi_def_cfa_offset 16			; AVX2-NEXT: .cfi_def_cfa_offset 16
	; AVX2-NEXT: pushq %r15			; AVX2-NEXT: pushq %r15
	▲ Show 20 Lines • Show All 82 Lines • ▼ Show 20 Lines
	; AVX2-NEXT: vpinsrw $5, %edx, %xmm1, %xmm1			; AVX2-NEXT: vpinsrw $5, %edx, %xmm1, %xmm1
	; AVX2-NEXT: shlq $57, %rsi			; AVX2-NEXT: shlq $57, %rsi
	; AVX2-NEXT: sarq $63, %rsi			; AVX2-NEXT: sarq $63, %rsi
	; AVX2-NEXT: vpinsrw $6, %esi, %xmm1, %xmm1			; AVX2-NEXT: vpinsrw $6, %esi, %xmm1, %xmm1
	; AVX2-NEXT: shrq $7, %rbp			; AVX2-NEXT: shrq $7, %rbp
	; AVX2-NEXT: vpinsrw $7, %ebp, %xmm1, %xmm1			; AVX2-NEXT: vpinsrw $7, %ebp, %xmm1, %xmm1
	; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0			; AVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
	; AVX2-NEXT: popq %rbx			; AVX2-NEXT: popq %rbx
				; AVX2-NEXT: .Lcfi12:
				; AVX2-NEXT: .cfi_def_cfa_offset 48
	; AVX2-NEXT: popq %r12			; AVX2-NEXT: popq %r12
				; AVX2-NEXT: .Lcfi13:
				; AVX2-NEXT: .cfi_def_cfa_offset 40
	; AVX2-NEXT: popq %r13			; AVX2-NEXT: popq %r13
				; AVX2-NEXT: .Lcfi14:
				; AVX2-NEXT: .cfi_def_cfa_offset 32
	; AVX2-NEXT: popq %r14			; AVX2-NEXT: popq %r14
				; AVX2-NEXT: .Lcfi15:
				; AVX2-NEXT: .cfi_def_cfa_offset 24
	; AVX2-NEXT: popq %r15			; AVX2-NEXT: popq %r15
				; AVX2-NEXT: .Lcfi16:
				; AVX2-NEXT: .cfi_def_cfa_offset 16
	; AVX2-NEXT: popq %rbp			; AVX2-NEXT: popq %rbp
				; AVX2-NEXT: .Lcfi17:
				; AVX2-NEXT: .cfi_def_cfa_offset 8
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: load_sext_16i1_to_16i16:			; AVX512F-LABEL: load_sext_16i1_to_16i16:
	; AVX512F: # BB#0: # %entry			; AVX512F: # BB#0: # %entry
	; AVX512F-NEXT: kmovw (%rdi), %k1			; AVX512F-NEXT: kmovw (%rdi), %k1
	; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}			; AVX512F-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
	; AVX512F-NEXT: vpmovdw %zmm0, %ymm0			; AVX512F-NEXT: vpmovdw %zmm0, %ymm0
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	▲ Show 20 Lines • Show All 1,380 Lines • ▼ Show 20 Lines
	; X32-SSE41-LABEL: sext_2i8_to_i32:			; X32-SSE41-LABEL: sext_2i8_to_i32:
	; X32-SSE41: # BB#0: # %entry			; X32-SSE41: # BB#0: # %entry
	; X32-SSE41-NEXT: pushl %eax			; X32-SSE41-NEXT: pushl %eax
	; X32-SSE41-NEXT: .Lcfi0:			; X32-SSE41-NEXT: .Lcfi0:
	; X32-SSE41-NEXT: .cfi_def_cfa_offset 8			; X32-SSE41-NEXT: .cfi_def_cfa_offset 8
	; X32-SSE41-NEXT: pmovsxbw %xmm0, %xmm0			; X32-SSE41-NEXT: pmovsxbw %xmm0, %xmm0
	; X32-SSE41-NEXT: movd %xmm0, %eax			; X32-SSE41-NEXT: movd %xmm0, %eax
	; X32-SSE41-NEXT: popl %ecx			; X32-SSE41-NEXT: popl %ecx
				; X32-SSE41-NEXT: .Lcfi1:
				; X32-SSE41-NEXT: .cfi_def_cfa_offset 4
	; X32-SSE41-NEXT: retl			; X32-SSE41-NEXT: retl
	entry:			entry:
	%Shuf = shufflevector <16 x i8> %A, <16 x i8> undef, <2 x i32> <i32 0, i32 1>			%Shuf = shufflevector <16 x i8> %A, <16 x i8> undef, <2 x i32> <i32 0, i32 1>
	%Ex = sext <2 x i8> %Shuf to <2 x i16>			%Ex = sext <2 x i8> %Shuf to <2 x i16>
	%Bc = bitcast <2 x i16> %Ex to i32			%Bc = bitcast <2 x i16> %Ex to i32
	ret i32 %Bc			ret i32 %Bc
	}			}
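
The `.cfi_def_cfa_offset` values expected after each `popq` in the `load_sext_16i1_to_16i16` epilogues above follow simple arithmetic: the CFA offset starts at the return address slot plus one slot per pushed register, and shrinks by one slot per pop. A small Python sketch of that arithmetic, assuming a frame without a frame pointer and no remaining stack adjustment at the point of the pops (the function name `epilogue_cfa_offsets` is hypothetical):

```python
def epilogue_cfa_offsets(num_pushed_registers, slot_size=8):
    """Return the CFA offset expected after each pop in the epilogue.

    Assumes the only stack contents at this point are the return
    address and the pushed registers, each occupying `slot_size` bytes.
    """
    # Offset before the first pop: return address plus every pushed register.
    offset = slot_size * (num_pushed_registers + 1)
    offsets = []
    for _ in range(num_pushed_registers):
        offset -= slot_size  # each pop releases one slot
        offsets.append(offset)
    return offsets


# Six registers popped on x86-64 (%rbx, %r12, %r13, %r14, %r15, %rbp).
print(epilogue_cfa_offsets(6))      # [48, 40, 32, 24, 16, 8]
# One register popped on 32-bit x86, as in sext_2i8_to_i32.
print(epilogue_cfa_offsets(1, 4))   # [4]
```

The first call reproduces the sequence 48, 40, 32, 24, 16, 8 checked in the AVX1 and AVX2 blocks, and the second reproduces the single `.cfi_def_cfa_offset 4` in the X32-SSE41 block.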


test/CodeGen/X86/vector-shuffle-avx512.ll

	; KNL32-NEXT: .cfi_def_cfa_register %ebp			; KNL32-NEXT: .cfi_def_cfa_register %ebp
	; KNL32-NEXT: andl $-32, %esp			; KNL32-NEXT: andl $-32, %esp
	; KNL32-NEXT: subl $32, %esp			; KNL32-NEXT: subl $32, %esp
	; KNL32-NEXT: vpbroadcastw {{\.LCPI.*}}, %ymm3			; KNL32-NEXT: vpbroadcastw {{\.LCPI.*}}, %ymm3
	; KNL32-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0			; KNL32-NEXT: vpblendvb %ymm3, %ymm2, %ymm0, %ymm0
	; KNL32-NEXT: vpblendvb %ymm3, 8(%ebp), %ymm1, %ymm1			; KNL32-NEXT: vpblendvb %ymm3, 8(%ebp), %ymm1, %ymm1
	; KNL32-NEXT: movl %ebp, %esp			; KNL32-NEXT: movl %ebp, %esp
	; KNL32-NEXT: popl %ebp			; KNL32-NEXT: popl %ebp
				; KNL32-NEXT: .Lcfi3:
				; KNL32-NEXT: .cfi_def_cfa %esp, 4
	; KNL32-NEXT: retl			; KNL32-NEXT: retl
	entry:			entry:
	%0 = shufflevector <64 x i8> %A, <64 x i8> %W, <64 x i32> <i32 64, i32 1, i32 66, i32 3, i32 68, i32 5, i32 70, i32 7, i32 72, i32 9, i32 74, i32 11, i32 76, i32 13, i32 78, i32 15, i32 80, i32 17, i32 82, i32 19, i32 84, i32 21, i32 86, i32 23, i32 88, i32 25, i32 90, i32 27, i32 92, i32 29, i32 94, i32 31, i32 96, i32 33, i32 98, i32 35, i32 100, i32 37, i32 102, i32 39, i32 104, i32 41, i32 106, i32 43, i32 108, i32 45, i32 110, i32 47, i32 112, i32 49, i32 114, i32 51, i32 116, i32 53, i32 118, i32 55, i32 120, i32 57, i32 122, i32 59, i32 124, i32 61, i32 126, i32 63>			%0 = shufflevector <64 x i8> %A, <64 x i8> %W, <64 x i32> <i32 64, i32 1, i32 66, i32 3, i32 68, i32 5, i32 70, i32 7, i32 72, i32 9, i32 74, i32 11, i32 76, i32 13, i32 78, i32 15, i32 80, i32 17, i32 82, i32 19, i32 84, i32 21, i32 86, i32 23, i32 88, i32 25, i32 90, i32 27, i32 92, i32 29, i32 94, i32 31, i32 96, i32 33, i32 98, i32 35, i32 100, i32 37, i32 102, i32 39, i32 104, i32 41, i32 106, i32 43, i32 108, i32 45, i32 110, i32 47, i32 112, i32 49, i32 114, i32 51, i32 116, i32 53, i32 118, i32 55, i32 120, i32 57, i32 122, i32 59, i32 124, i32 61, i32 126, i32 63>
	ret <64 x i8> %0			ret <64 x i8> %0
	}			}

	define <32 x i16> @test_mm512_mask_blend_epi16(<32 x i16> %A, <32 x i16> %W){			define <32 x i16> @test_mm512_mask_blend_epi16(<32 x i16> %A, <32 x i16> %W){
	; SKX64-LABEL: test_mm512_mask_blend_epi16:			; SKX64-LABEL: test_mm512_mask_blend_epi16:
	Show All 14 Lines
	; SKX32-NEXT: movl $-1431655766, %eax # imm = 0xAAAAAAAA			; SKX32-NEXT: movl $-1431655766, %eax # imm = 0xAAAAAAAA
	; SKX32-NEXT: kmovd %eax, %k1			; SKX32-NEXT: kmovd %eax, %k1
	; SKX32-NEXT: vpblendmw %zmm0, %zmm1, %zmm0 {%k1}			; SKX32-NEXT: vpblendmw %zmm0, %zmm1, %zmm0 {%k1}
	; SKX32-NEXT: retl			; SKX32-NEXT: retl
	;			;
	; KNL32-LABEL: test_mm512_mask_blend_epi16:			; KNL32-LABEL: test_mm512_mask_blend_epi16:
	; KNL32: # BB#0: # %entry			; KNL32: # BB#0: # %entry
	; KNL32-NEXT: pushl %ebp			; KNL32-NEXT: pushl %ebp
	; KNL32-NEXT: .Lcfi3:
	; KNL32-NEXT: .cfi_def_cfa_offset 8
	; KNL32-NEXT: .Lcfi4:			; KNL32-NEXT: .Lcfi4:
				; KNL32-NEXT: .cfi_def_cfa_offset 8
				; KNL32-NEXT: .Lcfi5:
	; KNL32-NEXT: .cfi_offset %ebp, -8			; KNL32-NEXT: .cfi_offset %ebp, -8
	; KNL32-NEXT: movl %esp, %ebp			; KNL32-NEXT: movl %esp, %ebp
	; KNL32-NEXT: .Lcfi5:			; KNL32-NEXT: .Lcfi6:
	; KNL32-NEXT: .cfi_def_cfa_register %ebp			; KNL32-NEXT: .cfi_def_cfa_register %ebp
	; KNL32-NEXT: andl $-32, %esp			; KNL32-NEXT: andl $-32, %esp
	; KNL32-NEXT: subl $32, %esp			; KNL32-NEXT: subl $32, %esp
	; KNL32-NEXT: vpblendw {{.*#+}} ymm0 = ymm2[0],ymm0[1],ymm2[2],ymm0[3],ymm2[4],ymm0[5],ymm2[6],ymm0[7],ymm2[8],ymm0[9],ymm2[10],ymm0[11],ymm2[12],ymm0[13],ymm2[14],ymm0[15]			; KNL32-NEXT: vpblendw {{.*#+}} ymm0 = ymm2[0],ymm0[1],ymm2[2],ymm0[3],ymm2[4],ymm0[5],ymm2[6],ymm0[7],ymm2[8],ymm0[9],ymm2[10],ymm0[11],ymm2[12],ymm0[13],ymm2[14],ymm0[15]
	; KNL32-NEXT: vpblendw {{.*#+}} ymm1 = mem[0],ymm1[1],mem[2],ymm1[3],mem[4],ymm1[5],mem[6],ymm1[7],mem[8],ymm1[9],mem[10],ymm1[11],mem[12],ymm1[13],mem[14],ymm1[15]			; KNL32-NEXT: vpblendw {{.*#+}} ymm1 = mem[0],ymm1[1],mem[2],ymm1[3],mem[4],ymm1[5],mem[6],ymm1[7],mem[8],ymm1[9],mem[10],ymm1[11],mem[12],ymm1[13],mem[14],ymm1[15]
	; KNL32-NEXT: movl %ebp, %esp			; KNL32-NEXT: movl %ebp, %esp
	; KNL32-NEXT: popl %ebp			; KNL32-NEXT: popl %ebp
				; KNL32-NEXT: .Lcfi7:
				; KNL32-NEXT: .cfi_def_cfa %esp, 4
	; KNL32-NEXT: retl			; KNL32-NEXT: retl
	entry:			entry:
	%0 = shufflevector <32 x i16> %A, <32 x i16> %W, <32 x i32> <i32 32, i32 1, i32 34, i32 3, i32 36, i32 5, i32 38, i32 7, i32 40, i32 9, i32 42, i32 11, i32 44, i32 13, i32 46, i32 15, i32 48, i32 17, i32 50, i32 19, i32 52, i32 21, i32 54, i32 23, i32 56, i32 25, i32 58, i32 27, i32 60, i32 29, i32 62, i32 31>			%0 = shufflevector <32 x i16> %A, <32 x i16> %W, <32 x i32> <i32 32, i32 1, i32 34, i32 3, i32 36, i32 5, i32 38, i32 7, i32 40, i32 9, i32 42, i32 11, i32 44, i32 13, i32 46, i32 15, i32 48, i32 17, i32 50, i32 19, i32 52, i32 21, i32 54, i32 23, i32 56, i32 25, i32 58, i32 27, i32 60, i32 29, i32 62, i32 31>
	ret <32 x i16> %0			ret <32 x i16> %0
	}			}

	define <16 x i32> @test_mm512_mask_blend_epi32(<16 x i32> %A, <16 x i32> %W){			define <16 x i32> @test_mm512_mask_blend_epi32(<16 x i32> %A, <16 x i32> %W){
	; SKX64-LABEL: test_mm512_mask_blend_epi32:			; SKX64-LABEL: test_mm512_mask_blend_epi32:

test/CodeGen/X86/vector-shuffle-v1.ll

	; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k0			; AVX512F-NEXT: vptestmd %zmm0, %zmm0, %k0
	; AVX512F-NEXT: kmovw %k0, (%rsp)			; AVX512F-NEXT: kmovw %k0, (%rsp)
	; AVX512F-NEXT: movl (%rsp), %ecx			; AVX512F-NEXT: movl (%rsp), %ecx
	; AVX512F-NEXT: movq %rcx, %rax			; AVX512F-NEXT: movq %rcx, %rax
	; AVX512F-NEXT: shlq $32, %rax			; AVX512F-NEXT: shlq $32, %rax
	; AVX512F-NEXT: orq %rcx, %rax			; AVX512F-NEXT: orq %rcx, %rax
	; AVX512F-NEXT: movq %rbp, %rsp			; AVX512F-NEXT: movq %rbp, %rsp
	; AVX512F-NEXT: popq %rbp			; AVX512F-NEXT: popq %rbp
				; AVX512F-NEXT: .Lcfi3:
				; AVX512F-NEXT: .cfi_def_cfa %rsp, 8
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; VL_BW_DQ-LABEL: shuf64i1_zero:			; VL_BW_DQ-LABEL: shuf64i1_zero:
	; VL_BW_DQ: # BB#0:			; VL_BW_DQ: # BB#0:
	; VL_BW_DQ-NEXT: kmovq %rdi, %k0			; VL_BW_DQ-NEXT: kmovq %rdi, %k0
	; VL_BW_DQ-NEXT: vpmovm2b %k0, %zmm0			; VL_BW_DQ-NEXT: vpmovm2b %k0, %zmm0
	; VL_BW_DQ-NEXT: vpbroadcastb %xmm0, %zmm0			; VL_BW_DQ-NEXT: vpbroadcastb %xmm0, %zmm0

test/CodeGen/X86/wide-integer-cmp.ll

	; CHECK-NEXT: cmpl {{[0-9]+}}(%esp), %edx			; CHECK-NEXT: cmpl {{[0-9]+}}(%esp), %edx
	; CHECK-NEXT: sbbl {{[0-9]+}}(%esp), %esi			; CHECK-NEXT: sbbl {{[0-9]+}}(%esp), %esi
	; CHECK-NEXT: sbbl {{[0-9]+}}(%esp), %eax			; CHECK-NEXT: sbbl {{[0-9]+}}(%esp), %eax
	; CHECK-NEXT: sbbl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: sbbl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: jge .LBB4_2			; CHECK-NEXT: jge .LBB4_2
	; CHECK-NEXT: # BB#1: # %bb1			; CHECK-NEXT: # BB#1: # %bb1
	; CHECK-NEXT: movl $1, %eax			; CHECK-NEXT: movl $1, %eax
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
				; CHECK-NEXT: .Lcfi2:
				; CHECK-NEXT: .cfi_def_cfa_offset 4
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	; CHECK-NEXT: .LBB4_2: # %bb2			; CHECK-NEXT: .LBB4_2: # %bb2
				; CHECK-NEXT: .Lcfi3:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: movl $2, %eax			; CHECK-NEXT: movl $2, %eax
	; CHECK-NEXT: popl %esi			; CHECK-NEXT: popl %esi
				; CHECK-NEXT: .Lcfi4:
				; CHECK-NEXT: .cfi_def_cfa_offset 4
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	entry:			entry:
	%cmp = icmp slt i128 %a, %b			%cmp = icmp slt i128 %a, %b
	br i1 %cmp, label %bb1, label %bb2			br i1 %cmp, label %bb1, label %bb2
	bb1:			bb1:
	ret i32 1			ret i32 1
	bb2:			bb2:
	ret i32 2			ret i32 2

test/CodeGen/X86/x86-framelowering-trap.ll

	; RUN: llc %s -o - | FileCheck %s			; RUN: llc %s -o - | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; CHECK-LABEL: bar:			; CHECK-LABEL: bar:
	; CHECK: pushq			; CHECK: pushq
	; CHECK: ud2			; CHECK: ud2
	; CHECK-NEXT: popq			; CHECK-NEXT: popq
				; CHECK-NEXT: :
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	define void @bar() {			define void @bar() {
	entry:			entry:
	call void @callee()			call void @callee()
	call void @llvm.trap()			call void @llvm.trap()
	ret void			ret void
	}			}

	; Function Attrs: noreturn nounwind			; Function Attrs: noreturn nounwind
	declare void @llvm.trap()			declare void @llvm.trap()

	declare void @callee()			declare void @callee()

test/CodeGen/X86/x86-no_caller_saved_registers-preserve.ll

	; CHECK-NEXT: .cfi_offset %rdx, -16			; CHECK-NEXT: .cfi_offset %rdx, -16
	; CHECK-NEXT: .Lcfi2:			; CHECK-NEXT: .Lcfi2:
	; CHECK-NEXT: .cfi_offset %xmm1, -32			; CHECK-NEXT: .cfi_offset %xmm1, -32
	; CHECK-NEXT: #APP			; CHECK-NEXT: #APP
	; CHECK-NEXT: #NO_APP			; CHECK-NEXT: #NO_APP
	; CHECK-NEXT: movl $4, %eax			; CHECK-NEXT: movl $4, %eax
	; CHECK-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1 # 16-byte Reload			; CHECK-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1 # 16-byte Reload
	; CHECK-NEXT: popq %rdx			; CHECK-NEXT: popq %rdx
				; CHECK-NEXT: .Lcfi3:
				; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	call void asm sideeffect "", "~{rax},~{rdx},~{xmm1},~{rdi},~{rsi},~{xmm0}"()			call void asm sideeffect "", "~{rax},~{rdx},~{xmm1},~{rdi},~{rsi},~{xmm0}"()
	ret i32 4			ret i32 4
	}			}

	;; Because "bar" has 'no_caller_saved_registers' attribute, function "foo"			;; Because "bar" has 'no_caller_saved_registers' attribute, function "foo"
	;; doesn't need to preserve registers except for the arguments passed			;; doesn't need to preserve registers except for the arguments passed
	;; to "bar" (%ESI, %EDI and %XMM0).			;; to "bar" (%ESI, %EDI and %XMM0).

Revision Contents

Diff 104367
