This is an archive of the discontinued LLVM Phabricator instance.

Differential D30941

Better testing of schedule model instruction latencies/throughputs
ClosedPublic

Authored by avt77 on Mar 14 2017, 7:51 AM.

Details

Reviewers

spatel
RKSimon
mkuper
craig.topper
javed.absar
hfinkel

Summary

This patch should close the issue raised in PR32216. It introduces a new llc switch, "-print-latency", which adds comments with latencies to the output .s files. The patch raises several questions:

- The terminology is a real mess: the sources mix up throughput and latency completely, and we should fix that.
- The choices made by our sched model are sometimes unclear, because the selected alternative code sequence sometimes looks worse than the initial code.
- There are some issues with FMA support inside the model (in fact it does not work at the moment).
- Some latency values reported by the model look rather strange.

This is the initial version of the patch and I'm sure all the above issues will be resolved soon.

Diff Detail

Event Timeline

avt77 created this revision. Mar 14 2017, 7:51 AM

RKSimon added reviewers: craig.topper, hfinkel, mkuper. Mar 14 2017, 8:39 AM

RKSimon added a subscriber: llvm-commits.

RKSimon added inline comments. Mar 14 2017, 8:44 AM

lib/Target/X86/InstPrinter/X86InstComments.h
22 ↗	(On Diff #91724)	Style guide - AC_PRINT_LATENCY
test/CodeGen/X86/recip-fastmath2.ll
20	Where have the retq instructions gone?

hfinkel added inline comments. Mar 14 2017, 9:28 AM

lib/Target/X86/X86MCInstLower.cpp

1269

This seems really useful, but is not target dependent. Can you please move this hook into the target-independent code? Maybe in void AsmPrinter::EmitFunctionBody(), around here:

default:
  EmitInstruction(&MI);
  break;

(right before the call to EmitInstruction).

1278

Can you call TII->getInstrLatency here instead of computing it in this loop?

(if you just call SCModel->computeInstrLatency, as suggested below, this will take care of itself).

1291

Can't you just call SCModel->computeInstrLatency and return the result? The logic there seems like exactly what you want:

unsigned
TargetSchedModel::computeInstrLatency(const MachineInstr *MI,
                                      bool UseDefaultDefLatency) const {
  // For the itinerary model, fall back to the old subtarget hook.
  // Allow subtargets to compute Bundle latencies outside the machine model.
  if (hasInstrItineraries() || MI->isBundle() ||
      (!hasInstrSchedModel() && !UseDefaultDefLatency))
    return TII->getInstrLatency(&InstrItins, *MI);
 
  if (hasInstrSchedModel()) {
    const MCSchedClassDesc *SCDesc = resolveSchedClass(MI);
    if (SCDesc->isValid())
      return computeInstrLatency(*SCDesc);
  }
  return TII->defaultDefLatency(SchedModel, *MI);
}

How feasible is it to include the instruction's throughput in the comment as well?

hfinkel, could you help me? First of all could you give me a link(s) to any doc(s) related to our MCSchedModel except sources?

Next, I was told that ResourceCycles here:

class ProcWriteResources<list<ProcResourceKind> resources> {

list<ProcResourceKind> ProcResources = resources;
list<int> ResourceCycles = [];
int Latency = 1;
int NumMicroOps = 1;

could be used as Throughput of the given instruction. Is it right? Does it mean I could include it in generated comment as well? If YES I suppose it should be the max of the Cycles, right?

I don't know if there are any docs besides the code and the code comments, but I think you are correct - the max of ResourceCycles is the inverse throughput for the instruction:

This is from include/llvm/Target/TargetSchedule.td :

// Optionally, ResourceCycles indicates the number of cycles the
// resource is consumed. Each ResourceCycles item is paired with the
// ProcResource item at the same position in its list. Since
// ResourceCycles are rarely specialized, the list may be
// incomplete. By default, resources are consumed for a single cycle,
// regardless of latency, which models a fully pipelined processing
// unit. A value of 0 for ResourceCycles means that the resource must
// be available but is not consumed, which is only relevant for
// unbuffered resources.

And this is in MachineScheduler.cpp:

// For reserved resources, record the highest cycle using the resource.
// For top-down scheduling, this is the cycle in which we schedule this
// instruction plus the number of cycles the operations reserves the
// resource.

We could abbreviate the comment string that you are adding like: [7:2].
I'm biased because that's the way I've always formatted [ latency : inverse throughput ], but I think that people that care about CPU timing will recognize that format, so you don't have to print out the words "latency" or "throughput".
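As a rough illustration of the abbreviated format, the sketch below takes the inverse throughput to be the largest ResourceCycles entry (which only holds when the instruction dispatches through a single resource) and prints it next to the latency. The helpers `inverseThroughput` and `formatSched` are hypothetical names used for illustration and are not part of LLVM.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: approximate the inverse throughput as the largest
// ResourceCycles entry. An empty list models the documented default of a
// single cycle per resource.
int inverseThroughput(const std::vector<int> &ResourceCycles) {
  if (ResourceCycles.empty())
    return 1;
  return *std::max_element(ResourceCycles.begin(), ResourceCycles.end());
}

// Hypothetical helper: render the comment as "[latency:inverse throughput]".
std::string formatSched(int Latency, const std::vector<int> &ResourceCycles) {
  return "[" + std::to_string(Latency) + ":" +
         std::to_string(inverseThroughput(ResourceCycles)) + "]";
}
```

With a latency of 7 and ResourceCycles of {1, 2}, this produces the string [7:2].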

In D30941#701621, @spatel wrote:
I don't know if there are any docs besides the code and the code comments,

The documentation is primarily in the header and TableGen files (for better or worse).

but I think you are correct - the max of ResourceCycles is the inverse throughput for the instruction:

That's correct if the instruction can only dispatch through that one resource.

First, for itineraries, I think you can do something like this:

double Unknown = std::numeric_limits<double>::infinity();
double Throughput = Unknown;
if (IID.isEmpty())
  return Throughput;

for (const InstrStage *IS = IID.beginStage(ItinClassIndx),
           *E = IID.endStage(ItinClassIndx); IS != E; ++IS) {
  unsigned Cycles = IS->getCycles();
  if (!Cycles)
    continue;

  Throughput = std::min(Throughput, popcnt(IS->getUnits()) * 1.0/Cycles);
}

return Throughput;

For resource descriptions, I think that you want the inverse of ResourceCycles multiplied by the number of applicable resources. Something like this:

for (MCWriteProcResEntry *WPR = STI.getWriteProcResBegin(SCClass),
                                                     *WEnd = STI.getWriteProcResEnd(SCClass); WPR != WEnd; ++WPR) {
  unsigned Cycles = WPR->Cycles;
  if (!Cycles)
    return Unknown;

  unsigned NumUnits = SCModel->getProcResource(WPR->ProcResourceIdx)->NumUnits;
  Throughput = std::min(Throughput, NumUnits * 1.0/Cycles);
}
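The resource-based calculation can be sketched outside LLVM as follows. This is a minimal stand-alone version under stated assumptions: `ResourceUse` and `computeThroughput` are hypothetical names, the struct stands in for the write-resource entries, and `std::optional` (C++17) stands in for an "unknown" result instead of an infinity sentinel.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical stand-in for one write-resource entry: how many identical
// units the resource has and for how many cycles the instruction holds one.
struct ResourceUse {
  unsigned NumUnits;
  unsigned Cycles;
};

// Throughput in instructions per cycle, limited by the scarcest resource.
// Returns std::nullopt when it cannot be determined: no entries at all, or
// an entry with zero cycles (the "Unknown" case in the loop above).
std::optional<double> computeThroughput(const std::vector<ResourceUse> &Uses) {
  std::optional<double> Throughput;
  for (const ResourceUse &U : Uses) {
    if (U.Cycles == 0)
      return std::nullopt;
    double T = static_cast<double>(U.NumUnits) / U.Cycles;
    Throughput = Throughput ? std::min(*Throughput, T) : T;
  }
  return Throughput;
}
```

For an instruction that uses a two-unit resource for one cycle and a single-unit resource for four cycles, the second resource is the bottleneck and the result is 0.25 instructions per cycle, i.e. a reciprocal throughput of 4.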

The implementation was moved to target independent area and all Hal's comments were applied. I did not do anything with Throughput: it will be done in the patch.

In D30941#702652, @avt77 wrote:

The implementation was moved to target independent area and all Hal's comments were applied. I did not do anything with Throughput: it will be done in the patch.

Do you mean that the throughput will be done in a follow-up patch (or a later revision of this one)? Either is fine with me, although if we're going to add more than latencies we might pick a different name. More later...

In D30941#702657, @hfinkel wrote:

Do you mean that the throughput will be done in a follow-up patch (or a later revision of this one)? Either is fine with me, although if we're going to add more than latencies we might pick a different name. More later...

If possible please can we use the term 'print schedule' now so that future improvements (throughput, etc.) make sense

hfinkel, yes I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option like "-print-schedule"?

In D30941#703612, @avt77 wrote:

hfinkel, yes I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option like "-print-schedule"?

It is not clear to me why you won't just always do this when in verbose-asm mode. Thoughts on not having a separate option at all?

Throughput calculation is implemented.

In D30941#703613, @hfinkel wrote:

It is not clear to me why you won't just always do this when in verbose-asm mode. Thoughts on not having a separate option at all?

We can do it without a special option, of course. Should I remove it and use "verbose-asm mode" instead?

RKSimon added inline comments. Mar 20 2017, 11:31 AM

include/llvm/CodeGen/TargetSchedule.h
192	Add doxygen comment.

Hal,
I removed the special option (-print-schedule) and tried to run check-all. The result was very unpleasant but predictable:

Expected Passes    : 29409
Expected Failures  : 160
Unsupported Tests  : 442
Unexpected Failures: 643

Should I fix all these 643 failures, or is it better to keep the option?

No. Mostly because it is not entirely mechanical (it does not make sense to update the tests without understanding whether the results make sense; the person who updates the test should hopefully have that background). FWIW, this also does not make sense for higher-level targets (e.g. NVPTX).

I recommend the following:

- Add a target callback to enable the printing of the scheduling information during verbose mode.
- Provide a command-line option (cl::opt) that can override that target callback (for easy testing), as in:

  bool Enable = OptionName.getNumOccurrences() ? OptionName : STI->shouldPrintStuff();

- Enable this only on x86 (fix those tests, if any) (*)

Once we think this works reasonably well, we can encourage other backend maintainers to look at updating their tests and enabling it on their targets if they'd like.

(*) Given that FileCheck does prefix matching, and we're adding this information last on the line, I'm curious to understand what the tests are matching that is causing these failures.
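The override logic in the recommendation can be illustrated without LLVM. In this minimal sketch, an empty `std::optional<bool>` plays the role of "the option did not occur on the command line"; `FlagValue` and `shouldPrintSchedInfo` are hypothetical names, not LLVM APIs.

```cpp
#include <optional>

// Hypothetical stand-in for a command-line flag: empty when the flag was not
// given on the command line, otherwise the value the user passed.
using FlagValue = std::optional<bool>;

// An explicitly given flag wins; otherwise fall back to the target's default.
bool shouldPrintSchedInfo(FlagValue Flag, bool TargetDefault) {
  return Flag.has_value() ? *Flag : TargetDefault;
}
```

The point of the pattern is that a test can force the output on or off regardless of the target, while ordinary runs follow whatever the target callback returns.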

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
776	I think that one ? is enough (three looks excessive to me).
778	One only ? here please.

RKSimon added a comment. Mar 21 2017, 6:36 AM

This comment was removed by RKSimon.

I did everything according to Hal's requirements except one: the default value of the "print-schedule" switch is false, because otherwise we have "Unexpected Failures: 530", and those are X86 tests only. The problem is very simple: update_llc_test_checks.py generates CHECKs like this:

; XOP-AVX1-NEXT: vextractf128 $1, %ymm2, %xmm5

It means FileCheck has to check the whole line, but this patch adds the comment at the end of the line and as a result the line can't be checked properly.
Finally, what we have now:

- we have the option "-print-schedule", which prints [latency:throughput] in the output (default is false)
- we have enablePrintSchedInfo() - default is false
- X86 overrides enablePrintSchedInfo()

Hope that's exactly what was required.

In D30941#707464, @avt77 wrote:

I did everything according to Hal's requirements except one: the default value of the "print-schedule" switch is false, because otherwise we have "Unexpected Failures: 530", and those are X86 tests only. The problem is very simple: update_llc_test_checks.py generates CHECKs like this:

; XOP-AVX1-NEXT: vextractf128 $1, %ymm2, %xmm5

I did not realize that CHECK-NEXT always matched the whole line. That's interesting.

In any case, if this is an update_llc_test_checks.py problem, why don't you use it to update the tests?

include/llvm/CodeGen/AsmPrinter.h
118	Variable names should start with a capital letter: EnablePrintSchedInfo
include/llvm/MC/MCTargetOptions.h
50 ↗	(On Diff #92624)	Remove this option.
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1405	We don't add { } around single-statement blocks.
1407	When you remove the '}' here, I'd keep the blank line ;)
1408	The command-line option should be declared in this file. That way you can predicate using the command-line option value on getNumOccurrences() as I had indicated previously. Plus, there's no need to have the option name depend on the tool. Full disclosure: There is another option: You can use a tristate setup, look for DefaultOnOff in lib/CodeGen/AsmPrinter/DwarfDebug.cpp.
lib/CodeGen/TargetSchedule.cpp
342	Please remove this comment and the UseDefaultRThroughput variable. We can always add it later if we'd like.
348	This Unknown value should never be returned (this is an implementation detail). Please return an Optional<double>.

In D30941#707504, @hfinkel wrote:

I did not realize that CHECK-NEXT always matched the whole line. That's interesting.

If you'd like to CHECK the prefix only you should use something like

CHECK: vextractf128 $1, %ymm2, %xmm5{{.*}}

In any case, if this is an update_llc_test_checks.py problem, why don't you use it to update the tests?

I can use it to update the tests but it means I should update 530 tests. Is it acceptable? Should I do it? For me it is not a problem but is it OK for review?

I think that you should update them. I assume we'd want to do it at some point. The reason not to do it would be that the numbers aren't generally right yet. The goal is that, because of the MemoryCombiner, etc., these numbers will affect not only scheduling but also instruction selection (etc.). We want to make them very visible in the tests. So, let's make them very visible... (if others disagree, please say so).

javed.absar added a subscriber: javed.absar. Mar 22 2017, 9:30 AM

The problem with the failed tests arose because of the new lines of comments added as a result of this patch. I was wrong when I said that FileCheck does not allow adding new comments at EOL.
I redesigned the patch to make it possible to add Latency:Throughput at the end of an existing comment (if any). As a result I was forced to change the API of EmitInstruction in MCStreamer. I don't like this change because there are a lot of subclasses of MCStreamer, but it works perfectly and may be useful for other targets.
I regenerated 34 tests (with the help of update_llc_test_checks.py) and now we have only 16 failed tests: I'm going to fix them asap.

Herald added a reviewer: javed.absar. Mar 30 2017, 10:01 AM

Please can you put back the "## sched: [LAT:RCP]" sched prefix?

If at all possible I'd like this first patch to just do the minimum - provide a -print-sched option (default = false) that adds the scheduling comments. I'd probably not even include the comment 'appending' in this first patch.

After this initial patch I'm much more interested in then getting a high amount of test coverage of as many instructions as possible across each scheduler model - imo that is what we need to improve first to try and find the existing issues with the models. If we don't and we then fix them later, it will require the regeneration of a lot of tests that don't actually care about the schedule.

Only then should we begin investigating how to include this in more asm output.

According to the requirements from Simon, I inserted the prefix "sched: " for scheduler comments and made "false" the default value of the -print-schedule option. As a result I restored the original versions of all X86 tests except two, which demonstrate the changes. Now we don't have any failed tests.

A number of minor style comments.

@hfinkel does this look alright to you as a patch for initial support for scheduler comments?

include/llvm/CodeGen/AsmPrinter.h
117	style - newline before comment, add '.' to end of comment.
include/llvm/CodeGen/TargetSchedule.h
192	newline
include/llvm/Target/TargetSubtargetInfo.h
23	Any clang-format / include 'juggling' should be done as a NFC cleanup commit, not as part of a larger patch.
147	Comment says 'enable' but function says 'support' - which is it? Full stop at end of comment (style)
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
771	Add brief comment explaining what you're doing.
lib/CodeGen/TargetSchedule.cpp
281	Shouldn't this be an assert?
lib/MC/MCAsmStreamer.cpp
106–109	Explain how PrintSchedInfo will be used in comment
1586	Comment this
lib/MC/MCObjectStreamer.cpp
241	You're inconsistent leaving some unused argument without names and other with.
lib/Object/RecordStreamer.cpp
82	'B' is meaningless - please use a real name to explain its usage.
lib/Target/Mips/MCTargetDesc/MipsELFStreamer.h
49	Why have you included a default value here??
lib/Target/X86/InstPrinter/X86InstComments.h
26 ↗	(On Diff #93611)	Should this be here or inside the cpp file(s) ?
lib/Target/X86/X86Subtarget.cpp
378 ↗	(On Diff #93611)	Probably best to bring this inside creatSchedInfoStr?
379 ↗	(On Diff #93611)	typo: createSchedInfoStr
lib/Target/X86/X86Subtarget.h
631	newline

avt77 added inline comments. Apr 7 2017, 9:51 AM

lib/CodeGen/TargetSchedule.cpp
281	No, that was the issue: we can have opcodes with invalid SCDesc

I hope I have addressed all the comments raised by RKSimon.
hfinkel, what do you think about it?

RKSimon added inline comments. Apr 7 2017, 11:19 AM

lib/MC/MCAsmStreamer.cpp
106–109	Don't use @param - you have to provide all params otherwise -Wdocumentation fires.
lib/Target/X86/X86Subtarget.cpp
380 ↗	(On Diff #94538)	const char *SchedPrefix = " sched: [";

I fixed the latest requirements from RKSimon. Please, give me your feedback.

hfinkel added inline comments. Apr 11 2017, 11:25 AM

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1412	Why is this commented out? You have a target hook and it defaults to false, so that should be fine.
lib/Target/X86/X86Subtarget.cpp
394 ↗	(On Diff #94784)	Why are these functions X86-specific?
lib/Target/X86/X86Subtarget.h
627	Put the false and the TODO here.

I implemented all the requirements from hfinkel.
Please review again.

LGTM, thanks!

lib/Target/X86/X86Subtarget.h
627	I'd just say: // TODO: Update the regression tests and return true.

This revision is now accepted and ready to land. Apr 12 2017, 8:42 AM

llvm/include/llvm/MC/MCSubtargetInfo.h:36:7: warning: 'llvm::MCSubtargetInfo' has virtual functions but non-virtual destructor [-Wnon-virtual-dtor]
Currently running into warnings relating to this

Should be fixed by rL300322

Perfect, thanks for following up @RKSimon

rL300311

Revision Contents

Path                                                          Size
include/llvm/CodeGen/AsmPrinter.h                             3 lines
include/llvm/CodeGen/TargetSchedule.h                         4 lines
include/llvm/MC/MCObjectStreamer.h                            3 lines
include/llvm/MC/MCStreamer.h                                  4 lines
include/llvm/MC/MCSubtargetInfo.h                             11 lines
include/llvm/Target/TargetSubtargetInfo.h                     8 lines
lib/CodeGen/AsmPrinter/AsmPrinter.cpp                         46 lines
lib/CodeGen/TargetSchedule.cpp                                71 lines
lib/CodeGen/TargetSubtargetInfo.cpp                           46 lines
lib/MC/MCAsmStreamer.cpp                                      34 lines
lib/MC/MCObjectStreamer.cpp                                   2 lines
lib/MC/MCStreamer.cpp                                         4 lines
lib/Object/RecordStreamer.h                                   3 lines
lib/Object/RecordStreamer.cpp                                 2 lines
lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp        4 lines
lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp                4 lines
lib/Target/Hexagon/MCTargetDesc/HexagonMCELFStreamer.h        3 lines
lib/Target/Hexagon/MCTargetDesc/HexagonMCELFStreamer.cpp      2 lines
lib/Target/Mips/MCTargetDesc/MipsELFStreamer.h                3 lines
lib/Target/Mips/MCTargetDesc/MipsELFStreamer.cpp              2 lines
lib/Target/Mips/MCTargetDesc/MipsNaClELFStreamer.cpp          4 lines
lib/Target/X86/InstPrinter/X86InstComments.cpp                2 lines
lib/Target/X86/X86MCInstLower.cpp                             18 lines
lib/Target/X86/X86Subtarget.h                                 3 lines
test/CodeGen/X86/recip-fastmath.ll                            412 lines
test/CodeGen/X86/recip-fastmath2.ll                           814 lines

Diff 94973

include/llvm/CodeGen/AsmPrinter.h

...
   /// default, this is equal to CurrentFnSym.
   MCSymbol *CurrentFnSymForSize = nullptr;

   /// Map global GOT equivalent MCSymbols to GlobalVariables and keep track of
   /// its number of uses by other globals.
   typedef std::pair<const GlobalVariable *, unsigned> GOTEquivUsePair;
   MapVector<const MCSymbol *, GOTEquivUsePair> GlobalGOTEquivs;

+  /// Enable print [latency:throughput] in output
+  bool EnablePrintSchedInfo = false;
+
 private:
   MCSymbol *CurrentFnBegin = nullptr;
   MCSymbol *CurrentFnEnd = nullptr;
   MCSymbol *CurExceptionSym = nullptr;

   // The garbage collection metadata printer table.
   void *GCMetadataPrinters = nullptr; // Really a DenseMap.

   /// Emit comments in assembly output if this is true.
...

include/llvm/CodeGen/TargetSchedule.h

...
   unsigned computeInstrLatency(unsigned Opcode) const;


   /// \brief Output dependency latency of a pair of defs of the same register.
   ///
   /// This is typically one cycle.
   unsigned computeOutputLatency(const MachineInstr *DefMI, unsigned DefIdx,
                                 const MachineInstr *DepMI) const;
+
+  /// \brief Compute the reciprocal throughput of the given instruction.
+  Optional<double> computeInstrRThroughput(const MachineInstr *MI) const;
+  Optional<double> computeInstrRThroughput(unsigned Opcode) const;
 };

 } // end namespace llvm

 #endif // LLVM_CODEGEN_TARGETSCHEDULE_H

include/llvm/MC/MCObjectStreamer.h

...
   virtual void EmitLabel(MCSymbol *Symbol, SMLoc Loc, MCFragment *F);
   void EmitAssignment(MCSymbol *Symbol, const MCExpr *Value) override;
   void EmitValueImpl(const MCExpr *Value, unsigned Size,
                      SMLoc Loc = SMLoc()) override;
   void EmitULEB128Value(const MCExpr *Value) override;
   void EmitSLEB128Value(const MCExpr *Value) override;
   void EmitWeakReference(MCSymbol *Alias, const MCSymbol *Symbol) override;
   void ChangeSection(MCSection *Section, const MCExpr *Subsection) override;
-  void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo& STI) override;
+  void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
+                       bool = false) override;

   /// \brief Emit an instruction to a special fragment, because this instruction
   /// can change its size during relaxation.
   virtual void EmitInstToFragment(const MCInst &Inst, const MCSubtargetInfo &);

   void EmitBundleAlignMode(unsigned AlignPow2) override;
   void EmitBundleLock(bool AlignToEnd) override;
   void EmitBundleUnlock() override;
...

include/llvm/MC/MCStreamer.h

...
   /// Returns true if the relocation could not be emitted because Name is not
   /// known.
   virtual bool EmitRelocDirective(const MCExpr &Offset, StringRef Name,
                                   const MCExpr *Expr, SMLoc Loc) {
     return true;
   }

   /// \brief Emit the given \p Instruction into the current section.
-  virtual void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI);
+  /// PrintSchedInfo == true then schedul comment should be added to output
+  virtual void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
+                               bool PrintSchedInfo = false);

   /// \brief Set the bundle alignment mode from now on in the section.
   /// The argument is the power of 2 to which the alignment is set. The
   /// value 0 means turn the bundle alignment off.
   virtual void EmitBundleAlignMode(unsigned AlignPow2);

   /// \brief The following instructions are a bundle-locked group.
   ///
...

include/llvm/MC/MCSubtargetInfo.h

Show All 20 Lines
#include "llvm/MC/MCSchedule.h"		#include "llvm/MC/MCSchedule.h"
#include "llvm/MC/SubtargetFeature.h"		#include "llvm/MC/SubtargetFeature.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <string>		#include <string>

namespace llvm {		namespace llvm {
		class MachineInstr;
		class MCInst;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
///		///
/// MCSubtargetInfo - Generic base class for all target subtargets.		/// MCSubtargetInfo - Generic base class for all target subtargets.
///		///
class MCSubtargetInfo {		class MCSubtargetInfo {
Triple TargetTriple; // Target triple		Triple TargetTriple; // Target triple
std::string CPU; // CPU being targeted.		std::string CPU; // CPU being targeted.
public:
/// Initialize an InstrItineraryData instance.		/// Initialize an InstrItineraryData instance.
void initInstrItins(InstrItineraryData &InstrItins) const;		void initInstrItins(InstrItineraryData &InstrItins) const;

/// Check whether the CPU string is valid.		/// Check whether the CPU string is valid.
bool isCPUStringValid(StringRef CPU) const {		bool isCPUStringValid(StringRef CPU) const {
auto Found = std::lower_bound(ProcDesc.begin(), ProcDesc.end(), CPU);		auto Found = std::lower_bound(ProcDesc.begin(), ProcDesc.end(), CPU);
return Found != ProcDesc.end() && StringRef(Found->Key) == CPU;		return Found != ProcDesc.end() && StringRef(Found->Key) == CPU;
}		}

		/// Returns string representation of scheduler comment
		virtual std::string getSchedInfoStr(const MachineInstr &MI) const {
		return std::string();
		}

		virtual std::string getSchedInfoStr(MCInst const &MCI) const {
		return std::string();
		}
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_MC_MCSUBTARGETINFO_H		#endif // LLVM_MC_MCSUBTARGETINFO_H

include/llvm/Target/TargetSubtargetInfo.h

#define LLVM_TARGET_TARGETSUBTARGETINFO_H		#define LLVM_TARGET_TARGETSUBTARGETINFO_H

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/CodeGen/PBQPRAConstraint.h"		#include "llvm/CodeGen/PBQPRAConstraint.h"
#include "llvm/CodeGen/SchedulerRegistry.h"		#include "llvm/CodeGen/SchedulerRegistry.h"
#include "llvm/CodeGen/ScheduleDAGMutation.h"		#include "llvm/CodeGen/ScheduleDAGMutation.h"
		#include "llvm/MC/MCInst.h"
		RKSimon: Any clang-format / include 'juggling' should be done as a NFC cleanup commit, not as part of a larger patch.
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/CodeGen.h"		#include "llvm/Support/CodeGen.h"
#include <memory>		#include <memory>
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {

class CallLowering;		class CallLowering;
public:
/// \brief True if the subtarget should run MachineScheduler after aggressive		/// \brief True if the subtarget should run MachineScheduler after aggressive
/// coalescing.		/// coalescing.
///		///
/// This currently replaces the SelectionDAG scheduler with the "source" order		/// This currently replaces the SelectionDAG scheduler with the "source" order
/// scheduler (though see below for an option to turn this off and use the		/// scheduler (though see below for an option to turn this off and use the
/// TargetLowering preference). It does not yet disable the postRA scheduler.		/// TargetLowering preference). It does not yet disable the postRA scheduler.
virtual bool enableMachineScheduler() const;		virtual bool enableMachineScheduler() const;

		/// \brief Support printing of [latency:throughput] comment in output .S file.
		RKSimon: Comment says 'enable' but function says 'support' - which is it? Full stop at end of comment (style)
		virtual bool supportPrintSchedInfo() const { return false; }

/// \brief True if the machine scheduler should disable the TLI preference		/// \brief True if the machine scheduler should disable the TLI preference
/// for preRA scheduling with the source level scheduler.		/// for preRA scheduling with the source level scheduler.
virtual bool enableMachineSchedDefaultSched() const { return true; }		virtual bool enableMachineSchedDefaultSched() const { return true; }

/// \brief True if the subtarget should enable joining global copies.		/// \brief True if the subtarget should enable joining global copies.
///		///
/// By default this is enabled if the machine scheduler is enabled, but		/// By default this is enabled if the machine scheduler is enabled, but
/// can be overridden.		/// can be overridden.
public:
virtual std::unique_ptr<PBQPRAConstraint> getCustomPBQPConstraints() const {		virtual std::unique_ptr<PBQPRAConstraint> getCustomPBQPConstraints() const {
return nullptr;		return nullptr;
}		}

/// Enable tracking of subregister liveness in register allocator.		/// Enable tracking of subregister liveness in register allocator.
/// Please use MachineRegisterInfo::subRegLivenessEnabled() instead where		/// Please use MachineRegisterInfo::subRegLivenessEnabled() instead where
/// possible.		/// possible.
virtual bool enableSubRegLiveness() const { return false; }		virtual bool enableSubRegLiveness() const { return false; }

		/// Returns string representation of scheduler comment
		std::string getSchedInfoStr(const MachineInstr &MI) const override;
		std::string getSchedInfoStr(MCInst const &MCI) const override;
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TARGET_TARGETSUBTARGETINFO_H		#endif // LLVM_TARGET_TARGETSUBTARGETINFO_H

lib/CodeGen/AsmPrinter/AsmPrinter.cpp

static const char *const EHTimerName = "write_exception";		static const char *const EHTimerName = "write_exception";
static const char *const EHTimerDescription = "DWARF Exception Writer";		static const char *const EHTimerDescription = "DWARF Exception Writer";
static const char *const CodeViewLineTablesGroupName = "linetables";		static const char *const CodeViewLineTablesGroupName = "linetables";
static const char *const CodeViewLineTablesGroupDescription =		static const char *const CodeViewLineTablesGroupDescription =
"CodeView Line Tables";		"CodeView Line Tables";

STATISTIC(EmittedInsts, "Number of machine instrs printed");		STATISTIC(EmittedInsts, "Number of machine instrs printed");

		static cl::opt<bool>
		PrintSchedule("print-schedule", cl::Hidden, cl::init(false),
		cl::desc("Print 'sched: [latency:throughput]' in .s output"));

char AsmPrinter::ID = 0;		char AsmPrinter::ID = 0;

typedef DenseMap<GCStrategy*, std::unique_ptr<GCMetadataPrinter>> gcp_map_type;		typedef DenseMap<GCStrategy*, std::unique_ptr<GCMetadataPrinter>> gcp_map_type;
static gcp_map_type &getGCMap(void *&P) {		static gcp_map_type &getGCMap(void *&P) {
if (!P)		if (!P)
P = new gcp_map_type();		P = new gcp_map_type();
return *(gcp_map_type*)P;		return *(gcp_map_type*)P;
}		}
void AsmPrinter::EmitFunctionEntryLabel() {
if (CurrentFnSym->isDefined())		if (CurrentFnSym->isDefined())
report_fatal_error("'" + Twine(CurrentFnSym->getName()) +		report_fatal_error("'" + Twine(CurrentFnSym->getName()) +
"' label emitted multiple times to assembly file");		"' label emitted multiple times to assembly file");

return OutStreamer->EmitLabel(CurrentFnSym);		return OutStreamer->EmitLabel(CurrentFnSym);
}		}

/// emitComments - Pretty-print comments for instructions.		/// emitComments - Pretty-print comments for instructions.
static void emitComments(const MachineInstr &MI, raw_ostream &CommentOS) {		static void emitComments(const MachineInstr &MI, raw_ostream &CommentOS,
		AsmPrinter *AP) {
const MachineFunction *MF = MI.getParent()->getParent();		const MachineFunction *MF = MI.getParent()->getParent();
const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();		const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();

// Check for spills and reloads		// Check for spills and reloads
int FI;		int FI;

const MachineFrameInfo &MFI = MF->getFrameInfo();		const MachineFrameInfo &MFI = MF->getFrameInfo();
		bool Commented = false;

// We assume a single instruction only has a spill or reload, not		// We assume a single instruction only has a spill or reload, not
// both.		// both.
const MachineMemOperand *MMO;		const MachineMemOperand *MMO;
if (TII->isLoadFromStackSlotPostFE(MI, FI)) {		if (TII->isLoadFromStackSlotPostFE(MI, FI)) {
if (MFI.isSpillSlotObjectIndex(FI)) {		if (MFI.isSpillSlotObjectIndex(FI)) {
MMO = *MI.memoperands_begin();		MMO = *MI.memoperands_begin();
CommentOS << MMO->getSize() << "-byte Reload\n";		CommentOS << MMO->getSize() << "-byte Reload";
		Commented = true;
}		}
} else if (TII->hasLoadFromStackSlot(MI, MMO, FI)) {		} else if (TII->hasLoadFromStackSlot(MI, MMO, FI)) {
if (MFI.isSpillSlotObjectIndex(FI))		if (MFI.isSpillSlotObjectIndex(FI)) {
CommentOS << MMO->getSize() << "-byte Folded Reload\n";		CommentOS << MMO->getSize() << "-byte Folded Reload";
		Commented = true;
		}
} else if (TII->isStoreToStackSlotPostFE(MI, FI)) {		} else if (TII->isStoreToStackSlotPostFE(MI, FI)) {
if (MFI.isSpillSlotObjectIndex(FI)) {		if (MFI.isSpillSlotObjectIndex(FI)) {
MMO = *MI.memoperands_begin();		MMO = *MI.memoperands_begin();
CommentOS << MMO->getSize() << "-byte Spill\n";		CommentOS << MMO->getSize() << "-byte Spill";
		Commented = true;
}		}
} else if (TII->hasStoreToStackSlot(MI, MMO, FI)) {		} else if (TII->hasStoreToStackSlot(MI, MMO, FI)) {
if (MFI.isSpillSlotObjectIndex(FI))		if (MFI.isSpillSlotObjectIndex(FI)) {
CommentOS << MMO->getSize() << "-byte Folded Spill\n";		CommentOS << MMO->getSize() << "-byte Folded Spill";
		Commented = true;
		}
}		}

// Check for spill-induced copies		// Check for spill-induced copies
if (MI.getAsmPrinterFlag(MachineInstr::ReloadReuse))		if (MI.getAsmPrinterFlag(MachineInstr::ReloadReuse)) {
CommentOS << " Reload Reuse\n";		Commented = true;
		CommentOS << " Reload Reuse";
		}

		if (Commented && AP->EnablePrintSchedInfo)
		RKSimon: Add brief comment explaining what you're doing.
		// If any comment was added above and we need sched info comment then
		// add this new comment just after the above comment w/o "\n" between them.
		CommentOS << " " << MF->getSubtarget().getSchedInfoStr(MI) << "\n";
		else if (Commented)
		CommentOS << "\n";
		hfinkel: I think that one ? is enough (three looks excessive to me).
}		}

		hfinkel: One only ? here please.
/// emitImplicitDef - This method emits the specified machine instruction		/// emitImplicitDef - This method emits the specified machine instruction
/// that is an implicit def.		/// that is an implicit def.
void AsmPrinter::emitImplicitDef(const MachineInstr *MI) const {		void AsmPrinter::emitImplicitDef(const MachineInstr *MI) const {
unsigned RegNo = MI->getOperand(0).getReg();		unsigned RegNo = MI->getOperand(0).getReg();

SmallString<128> Str;		SmallString<128> Str;
raw_svector_ostream OS(Str);		raw_svector_ostream OS(Str);
OS << "implicit-def: "		OS << "implicit-def: "
for (auto &MI : MBB) {
NamedRegionTimer T(HI.TimerName, HI.TimerDescription,		NamedRegionTimer T(HI.TimerName, HI.TimerDescription,
HI.TimerGroupName, HI.TimerGroupDescription,		HI.TimerGroupName, HI.TimerGroupDescription,
TimePassesIsEnabled);		TimePassesIsEnabled);
HI.Handler->beginInstruction(&MI);		HI.Handler->beginInstruction(&MI);
}		}
}		}

if (isVerbose())		if (isVerbose())
emitComments(MI, OutStreamer->GetCommentOS());		emitComments(MI, OutStreamer->GetCommentOS(), this);

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case TargetOpcode::CFI_INSTRUCTION:		case TargetOpcode::CFI_INSTRUCTION:
emitCFIInstruction(MI);		emitCFIInstruction(MI);
break;		break;

case TargetOpcode::LOCAL_ESCAPE:		case TargetOpcode::LOCAL_ESCAPE:
emitFrameAlloc(MI);		emitFrameAlloc(MI);
void AsmPrinter::SetupMachineFunction(MachineFunction &MF) {
if (!MF.getLandingPads().empty() || MMI->hasDebugInfo() ||		if (!MF.getLandingPads().empty() || MMI->hasDebugInfo() ||
MF.hasEHFunclets() || NeedsLocalForSize) {		MF.hasEHFunclets() || NeedsLocalForSize) {
CurrentFnBegin = createTempSymbol("func_begin");		CurrentFnBegin = createTempSymbol("func_begin");
if (NeedsLocalForSize)		if (NeedsLocalForSize)
CurrentFnSymForSize = CurrentFnBegin;		CurrentFnSymForSize = CurrentFnBegin;
}		}

ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE();		ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE();
if (isVerbose())		if (isVerbose())
		hfinkel: We don't add { } around single-statement blocks.
LI = &getAnalysis<MachineLoopInfo>();		LI = &getAnalysis<MachineLoopInfo>();

		hfinkel: When you remove the '}' here, I'd keep the blank line ;)
		const TargetSubtargetInfo &STI = MF.getSubtarget();
		hfinkel: The command-line option should be declared in this file. That way you can predicate using the command-line option value on getNumOccurrences() as I had indicated previously. Plus, there's no need to have the option name depend on the tool. Full disclosure: There is another option: You can use a tristate setup, look for DefaultOnOff in lib/CodeGen/AsmPrinter/DwarfDebug.cpp.
		EnablePrintSchedInfo = PrintSchedule.getNumOccurrences()
		? PrintSchedule
		: STI.supportPrintSchedInfo();
}		}
		hfinkel: Why is this commented out? You have a target hook and it defaults to false, so that should be fine.

namespace {		namespace {

// Keep track the alignment, constpool entries per Section.		// Keep track the alignment, constpool entries per Section.
struct SectionCPs {		struct SectionCPs {
MCSection *S;		MCSection *S;
unsigned Alignment;		unsigned Alignment;
SmallVector<unsigned, 4> CPEs;		SmallVector<unsigned, 4> CPEs;
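The EnablePrintSchedInfo selection in SetupMachineFunction above gives an explicit command-line occurrence priority over the target hook, which is the getNumOccurrences() pattern hfinkel asks for. A minimal standalone sketch of that precedence follows. The BoolOption struct and resolvePrintSchedInfo function are hypothetical stand-ins for illustration, not LLVM API.

```cpp
// Hypothetical stand-in for a cl::opt<bool>: it records the parsed value and
// how many times the flag appeared on the command line.
struct BoolOption {
  bool Value = false;
  unsigned NumOccurrences = 0;
};

// If the flag was given explicitly, its value wins (even when it is false);
// if it was never given, the target's default decides.
inline bool resolvePrintSchedInfo(const BoolOption &Opt, bool TargetDefault) {
  return Opt.NumOccurrences ? Opt.Value : TargetDefault;
}
```

With this shape, passing the flag with a false value can switch the comments off for a target whose hook returns true, which a plain boolean OR of the two sources could not express.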

lib/CodeGen/TargetSchedule.cpp

unsigned TargetSchedModel::computeInstrLatency(unsigned Opcode) const {
assert(hasInstrSchedModel() && "Only call this function with a SchedModel");		assert(hasInstrSchedModel() && "Only call this function with a SchedModel");

unsigned SCIdx = TII->get(Opcode).getSchedClass();		unsigned SCIdx = TII->get(Opcode).getSchedClass();
const MCSchedClassDesc *SCDesc = SchedModel.getSchedClassDesc(SCIdx);		const MCSchedClassDesc *SCDesc = SchedModel.getSchedClassDesc(SCIdx);

if (SCDesc->isValid() && !SCDesc->isVariant())		if (SCDesc->isValid() && !SCDesc->isVariant())
return computeInstrLatency(*SCDesc);		return computeInstrLatency(*SCDesc);

llvm_unreachable("No MI sched latency");		if (SCDesc->isValid()) {
		assert (!SCDesc->isVariant() && "No MI sched latency: SCDesc->isVariant()");
		RKSimon: Shouldn't this be an assert?
		avt77 (author): No, that was the issue: we can have opcodes with invalid SCDesc
		return computeInstrLatency(*SCDesc);
		}
		return 0;
}		}

unsigned		unsigned
TargetSchedModel::computeInstrLatency(const MachineInstr *MI,		TargetSchedModel::computeInstrLatency(const MachineInstr *MI,
bool UseDefaultDefLatency) const {		bool UseDefaultDefLatency) const {
// For the itinerary model, fall back to the old subtarget hook.		// For the itinerary model, fall back to the old subtarget hook.
// Allow subtargets to compute Bundle latencies outside the machine model.		// Allow subtargets to compute Bundle latencies outside the machine model.
if (hasInstrItineraries() || MI->isBundle() ||		if (hasInstrItineraries() || MI->isBundle() ||
if (SCDesc->isValid()) {
*PRE = STI->getWriteProcResEnd(SCDesc); PRI != PRE; ++PRI) {		*PRE = STI->getWriteProcResEnd(SCDesc); PRI != PRE; ++PRI) {
if (!SchedModel.getProcResource(PRI->ProcResourceIdx)->BufferSize)		if (!SchedModel.getProcResource(PRI->ProcResourceIdx)->BufferSize)
return 1;		return 1;
}		}
}		}
}		}
return 0;		return 0;
}		}

		static Optional<double>
		getRTroughputFromItineraries(unsigned schedClass,
		const InstrItineraryData *IID){
		double Unknown = std::numeric_limits<double>::infinity();
		hfinkel: Please remove this comment and the UseDefaultRThroughput variable. We can always add it later if we'd like.
		double Throughput = Unknown;

		for (const InstrStage *IS = IID->beginStage(schedClass),
		*E = IID->endStage(schedClass);
		IS != E; ++IS) {
		unsigned Cycles = IS->getCycles();
		hfinkel: This Unknown value should never be returned (this is an implementation detail). Please return an Optional<double>.
		if (!Cycles)
		continue;
		Throughput =
		std::min(Throughput, countPopulation(IS->getUnits()) * 1.0 / Cycles);
		}
		// We need reciprocal throughput that's why we return such value.
		return 1 / Throughput;
		}

		static Optional<double>
		getRTroughputFromInstrSchedModel(const MCSchedClassDesc *SCDesc,
		const TargetSubtargetInfo *STI,
		const MCSchedModel &SchedModel) {
		double Unknown = std::numeric_limits<double>::infinity();
		double Throughput = Unknown;

		for (const MCWriteProcResEntry *WPR = STI->getWriteProcResBegin(SCDesc),
		*WEnd = STI->getWriteProcResEnd(SCDesc);
		WPR != WEnd; ++WPR) {
		unsigned Cycles = WPR->Cycles;
		if (!Cycles)
		return Optional<double>();

		unsigned NumUnits =
		SchedModel.getProcResource(WPR->ProcResourceIdx)->NumUnits;
		Throughput = std::min(Throughput, NumUnits * 1.0 / Cycles);
		}
		// We need reciprocal throughput that's why we return such value.
		return 1 / Throughput;
		}

		Optional<double>
		TargetSchedModel::computeInstrRThroughput(const MachineInstr *MI) const {
		if (hasInstrItineraries())
		return getRTroughputFromItineraries(MI->getDesc().getSchedClass(),
		getInstrItineraries());
		if (hasInstrSchedModel())
		return getRTroughputFromInstrSchedModel(resolveSchedClass(MI), STI,
		SchedModel);
		return Optional<double>();
		}

		Optional<double>
		TargetSchedModel::computeInstrRThroughput(unsigned Opcode) const {
		unsigned SchedClass = TII->get(Opcode).getSchedClass();
		if (hasInstrItineraries())
		return getRTroughputFromItineraries(SchedClass, getInstrItineraries());
		if (hasInstrSchedModel()) {
		const MCSchedClassDesc *SCDesc = SchedModel.getSchedClassDesc(SchedClass);
		if (SCDesc->isValid() && !SCDesc->isVariant())
		return getRTroughputFromInstrSchedModel(SCDesc, STI, SchedModel);
		}
		return Optional<double>();
		}
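The two helpers above share one idea: the most constrained resource bounds throughput at NumUnits / Cycles instructions per cycle, and the reported value is the inverse of that bound. A minimal standalone sketch of the computation follows, assuming C++17 and using std::optional in place of llvm::Optional. The ResourceUse struct and reciprocalThroughput function are hypothetical names for illustration. Unlike the code in the diff, the sketch skips unusable entries in both cases and checks for the infinity sentinel before inverting, so the sentinel is never returned to the caller.

```cpp
#include <algorithm>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical, simplified view of one resource a scheduling class consumes.
struct ResourceUse {
  unsigned NumUnits; // identical units available for this resource
  unsigned Cycles;   // cycles the instruction occupies one unit
};

// Reciprocal throughput in cycles per instruction, or nullopt when no usable
// entry exists.
inline std::optional<double>
reciprocalThroughput(const std::vector<ResourceUse> &Uses) {
  const double Unknown = std::numeric_limits<double>::infinity();
  double Throughput = Unknown;
  for (const ResourceUse &U : Uses) {
    if (U.Cycles == 0 || U.NumUnits == 0)
      continue; // entry reserves nothing or has no units: no constraint here
    Throughput =
        std::min(Throughput, static_cast<double>(U.NumUnits) / U.Cycles);
  }
  if (Throughput == Unknown)
    return std::nullopt; // the sentinel is an implementation detail
  return 1.0 / Throughput;
}
```

For example, an instruction that holds one of two identical units for one cycle can issue twice per cycle, giving a reciprocal throughput of 0.5.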

lib/CodeGen/TargetSubtargetInfo.cpp

	//===-- TargetSubtargetInfo.cpp - General Target Information ---------------==//			//===-- TargetSubtargetInfo.cpp - General Target Information ---------------==//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	/// \file This file describes the general parts of a Subtarget.			/// \file This file describes the general parts of a Subtarget.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/TargetSchedule.h"
				#include "llvm/Support/raw_ostream.h"
	#include "llvm/Target/TargetSubtargetInfo.h"			#include "llvm/Target/TargetSubtargetInfo.h"
	using namespace llvm;			using namespace llvm;

	//---------------------------------------------------------------------------			//---------------------------------------------------------------------------
	// TargetSubtargetInfo Class			// TargetSubtargetInfo Class
	//			//
	TargetSubtargetInfo::TargetSubtargetInfo(			TargetSubtargetInfo::TargetSubtargetInfo(
	const Triple &TT, StringRef CPU, StringRef FS,			const Triple &TT, StringRef CPU, StringRef FS,

	bool TargetSubtargetInfo::enablePostRAScheduler() const {			bool TargetSubtargetInfo::enablePostRAScheduler() const {
	return getSchedModel().PostRAScheduler;			return getSchedModel().PostRAScheduler;
	}			}

	bool TargetSubtargetInfo::useAA() const {			bool TargetSubtargetInfo::useAA() const {
	return false;			return false;
	}			}

				static std::string createSchedInfoStr(unsigned Latency,
				Optional<double> RThroughput) {
				static const char *SchedPrefix = " sched: [";
				std::string Comment;
				raw_string_ostream CS(Comment);
				if (Latency > 0 && RThroughput.hasValue())
				CS << SchedPrefix << Latency << format(":%2.2f", RThroughput.getValue())
				<< "]";
				else if (Latency > 0)
				CS << SchedPrefix << Latency << ":?]";
				else if (RThroughput.hasValue())
				CS << SchedPrefix << "?:" << RThroughput.getValue() << "]";
				CS.flush();
				return Comment;
				}

				/// Returns string representation of scheduler comment
				std::string TargetSubtargetInfo::getSchedInfoStr(const MachineInstr &MI) const {
				if (MI.isPseudo() || MI.isTerminator())
				return std::string();
				// We don't cache TSchedModel because it depends on TargetInstrInfo
				// that could be changed during the compilation
				TargetSchedModel TSchedModel;
				TSchedModel.init(getSchedModel(), this, getInstrInfo());
				unsigned Latency = TSchedModel.computeInstrLatency(&MI);
				Optional<double> RThroughput = TSchedModel.computeInstrRThroughput(&MI);
				return createSchedInfoStr(Latency, RThroughput);
				}

				/// Returns string representation of scheduler comment
				std::string TargetSubtargetInfo::getSchedInfoStr(MCInst const &MCI) const {
				// We don't cache TSchedModel because it depends on TargetInstrInfo
				// that could be changed during the compilation
				TargetSchedModel TSchedModel;
				TSchedModel.init(getSchedModel(), this, getInstrInfo());
				if (!TSchedModel.hasInstrSchedModel())
				return std::string();
				unsigned Latency = TSchedModel.computeInstrLatency(MCI.getOpcode());
				Optional<double> RThroughput =
				TSchedModel.computeInstrRThroughput(MCI.getOpcode());
				return createSchedInfoStr(Latency, RThroughput);
				}
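The comment text built by createSchedInfoStr above has the shape " sched: [latency:rthroughput]", with '?' standing in for whichever field is unknown and an empty string when both are. A minimal standalone sketch of that formatting follows, assuming C++17 with std::optional in place of llvm::Optional and snprintf in place of llvm::format. The function name formatSchedInfo is hypothetical. The sketch applies two-decimal formatting to the throughput in every branch, which is an assumption; the code in the diff formats it only when the latency is also known.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Builds " sched: [latency:rthroughput]". A latency of 0 is treated as
// unknown. Returns an empty string when neither field is known.
inline std::string formatSchedInfo(unsigned Latency,
                                   std::optional<double> RThroughput) {
  if (Latency == 0 && !RThroughput)
    return std::string();

  std::string Out = " sched: [";
  Out += Latency > 0 ? std::to_string(Latency) : std::string("?");
  Out += ":";
  if (RThroughput) {
    char Buf[64];
    // snprintf truncates rather than overflows if the value is very large.
    std::snprintf(Buf, sizeof(Buf), "%.2f", *RThroughput);
    Out += Buf;
  } else {
    Out += "?";
  }
  Out += "]";
  return Out;
}
```

Returning an empty string for the fully unknown case lets the caller skip the comment without a separate flag, which matches how the streamer checks SI.empty() before writing.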

lib/MC/MCAsmStreamer.cpp

public:

/// AddComment - Add a comment that can be emitted to the generated .s		/// AddComment - Add a comment that can be emitted to the generated .s
/// file if applicable as a QoI issue to make the output of the compiler		/// file if applicable as a QoI issue to make the output of the compiler
/// more readable. This only affects the MCAsmStreamer, and only when		/// more readable. This only affects the MCAsmStreamer, and only when
/// verbose assembly output is enabled.		/// verbose assembly output is enabled.
void AddComment(const Twine &T, bool EOL = true) override;		void AddComment(const Twine &T, bool EOL = true) override;

/// AddEncodingComment - Add a comment showing the encoding of an instruction.		/// AddEncodingComment - Add a comment showing the encoding of an instruction.
void AddEncodingComment(const MCInst &Inst, const MCSubtargetInfo &);		/// If PrintSchedInfo - is true then the comment sched:[x:y] should
		// be added to output if it's being supported by target
		void AddEncodingComment(const MCInst &Inst, const MCSubtargetInfo &,
		bool PrintSchedInfo);
		RKSimon: Explain how PrintSchedInfo will be used in comment
		RKSimon: Don't use @param - you have to provide all params otherwise Wdocumentation fires.

/// GetCommentOS - Return a raw_ostream that comments can be written to.		/// GetCommentOS - Return a raw_ostream that comments can be written to.
/// Unlike AddComment, you are required to terminate comments with \n if you		/// Unlike AddComment, you are required to terminate comments with \n if you
/// use this method.		/// use this method.
raw_ostream &GetCommentOS() override {		raw_ostream &GetCommentOS() override {
if (!IsVerboseAsm)		if (!IsVerboseAsm)
return nulls(); // Discard comments unless in verbose asm mode.		return nulls(); // Discard comments unless in verbose asm mode.
return CommentStream;		return CommentStream;
public:
void EmitWinCFISaveReg(unsigned Register, unsigned Offset) override;		void EmitWinCFISaveReg(unsigned Register, unsigned Offset) override;
void EmitWinCFISaveXMM(unsigned Register, unsigned Offset) override;		void EmitWinCFISaveXMM(unsigned Register, unsigned Offset) override;
void EmitWinCFIPushFrame(bool Code) override;		void EmitWinCFIPushFrame(bool Code) override;
void EmitWinCFIEndProlog() override;		void EmitWinCFIEndProlog() override;

void EmitWinEHHandler(const MCSymbol *Sym, bool Unwind, bool Except) override;		void EmitWinEHHandler(const MCSymbol *Sym, bool Unwind, bool Except) override;
void EmitWinEHHandlerData() override;		void EmitWinEHHandlerData() override;

void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI) override;		void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
		bool PrintSchedInfo) override;

void EmitBundleAlignMode(unsigned AlignPow2) override;		void EmitBundleAlignMode(unsigned AlignPow2) override;
void EmitBundleLock(bool AlignToEnd) override;		void EmitBundleLock(bool AlignToEnd) override;
void EmitBundleUnlock() override;		void EmitBundleUnlock() override;

bool EmitRelocDirective(const MCExpr &Offset, StringRef Name,		bool EmitRelocDirective(const MCExpr &Offset, StringRef Name,
const MCExpr *Expr, SMLoc Loc) override;		const MCExpr *Expr, SMLoc Loc) override;

void MCAsmStreamer::EmitWinCFIEndProlog() {		void MCAsmStreamer::EmitWinCFIEndProlog() {
MCStreamer::EmitWinCFIEndProlog();		MCStreamer::EmitWinCFIEndProlog();

OS << "\t.seh_endprologue";		OS << "\t.seh_endprologue";
EmitEOL();		EmitEOL();
}		}

void MCAsmStreamer::AddEncodingComment(const MCInst &Inst,		void MCAsmStreamer::AddEncodingComment(const MCInst &Inst,
const MCSubtargetInfo &STI) {		const MCSubtargetInfo &STI,
		bool PrintSchedInfo) {
raw_ostream &OS = GetCommentOS();		raw_ostream &OS = GetCommentOS();
SmallString<256> Code;		SmallString<256> Code;
SmallVector<MCFixup, 4> Fixups;		SmallVector<MCFixup, 4> Fixups;
raw_svector_ostream VecOS(Code);		raw_svector_ostream VecOS(Code);
Emitter->encodeInstruction(Inst, VecOS, Fixups, STI);		Emitter->encodeInstruction(Inst, VecOS, Fixups, STI);

// If we are showing fixups, create symbolic markers in the encoded		// If we are showing fixups, create symbolic markers in the encoded
// representation. We do this by making a per-bit map to the fixup item index,		// representation. We do this by making a per-bit map to the fixup item index,
if (MapEntry != uint8_t(~0U)) {
if (uint8_t MapEntry = FixupMap[FixupBit]) {		if (uint8_t MapEntry = FixupMap[FixupBit]) {
assert(Bit == 0 && "Encoder wrote into fixed up bit!");		assert(Bit == 0 && "Encoder wrote into fixed up bit!");
OS << char('A' + MapEntry - 1);		OS << char('A' + MapEntry - 1);
} else		} else
OS << Bit;		OS << Bit;
}		}
}		}
}		}
OS << "]\n";		OS << "]";
		// If we are not going to add fixup or schedul comments after this point then
		RKSimon: Comment this
		// we have to end the current comment line with "\n".
		if (Fixups.size() || !PrintSchedInfo)
		OS << "\n";

for (unsigned i = 0, e = Fixups.size(); i != e; ++i) {		for (unsigned i = 0, e = Fixups.size(); i != e; ++i) {
MCFixup &F = Fixups[i];		MCFixup &F = Fixups[i];
const MCFixupKindInfo &Info = AsmBackend->getFixupKindInfo(F.getKind());		const MCFixupKindInfo &Info = AsmBackend->getFixupKindInfo(F.getKind());
OS << " fixup " << char('A' + i) << " - " << "offset: " << F.getOffset()		OS << " fixup " << char('A' + i) << " - " << "offset: " << F.getOffset()
<< ", value: " << *F.getValue() << ", kind: " << Info.Name << "\n";		<< ", value: " << *F.getValue() << ", kind: " << Info.Name << "\n";
}		}
}		}

void MCAsmStreamer::EmitInstruction(const MCInst &Inst,		void MCAsmStreamer::EmitInstruction(const MCInst &Inst,
const MCSubtargetInfo &STI) {		const MCSubtargetInfo &STI,
		bool PrintSchedInfo) {
assert(getCurrentSectionOnly() &&		assert(getCurrentSectionOnly() &&
"Cannot emit contents before setting section!");		"Cannot emit contents before setting section!");

// Show the encoding in a comment if we have a code emitter.		// Show the encoding in a comment if we have a code emitter.
if (Emitter)		if (Emitter)
AddEncodingComment(Inst, STI);		AddEncodingComment(Inst, STI, PrintSchedInfo);

// Show the MCInst if enabled.		// Show the MCInst if enabled.
if (ShowInst) {		if (ShowInst) {
		if (PrintSchedInfo)
		GetCommentOS() << "\n";
Inst.dump_pretty(GetCommentOS(), InstPrinter.get(), "\n ");		Inst.dump_pretty(GetCommentOS(), InstPrinter.get(), "\n ");
GetCommentOS() << "\n";		GetCommentOS() << "\n";
}		}

if(getTargetStreamer())		if(getTargetStreamer())
getTargetStreamer()->prettyPrintAsm(*InstPrinter, OS, Inst, STI);		getTargetStreamer()->prettyPrintAsm(*InstPrinter, OS, Inst, STI);
else		else
InstPrinter->printInst(&Inst, OS, "", STI);		InstPrinter->printInst(&Inst, OS, "", STI);

		if (PrintSchedInfo) {
		std::string SI = STI.getSchedInfoStr(Inst);
		if (!SI.empty())
		GetCommentOS() << SI;
		}

		StringRef Comments = CommentToEmit;
		if (Comments.size() && Comments.back() != '\n')
		GetCommentOS() << "\n";

EmitEOL();		EmitEOL();
}		}

void MCAsmStreamer::EmitBundleAlignMode(unsigned AlignPow2) {		void MCAsmStreamer::EmitBundleAlignMode(unsigned AlignPow2) {
OS << "\t.bundle_align_mode " << AlignPow2;		OS << "\t.bundle_align_mode " << AlignPow2;
EmitEOL();		EmitEOL();
}		}


lib/MC/MCObjectStreamer.cpp

void MCObjectStreamer::EmitAssignment(MCSymbol *Symbol, const MCExpr *Value) {
MCStreamer::EmitAssignment(Symbol, Value);		MCStreamer::EmitAssignment(Symbol, Value);
}		}

bool MCObjectStreamer::mayHaveInstructions(MCSection &Sec) const {		bool MCObjectStreamer::mayHaveInstructions(MCSection &Sec) const {
return Sec.hasInstructions();		return Sec.hasInstructions();
}		}

void MCObjectStreamer::EmitInstruction(const MCInst &Inst,		void MCObjectStreamer::EmitInstruction(const MCInst &Inst,
const MCSubtargetInfo &STI) {		const MCSubtargetInfo &STI, bool) {
		RKSimon: You're inconsistent leaving some unused argument without names and other with.
MCStreamer::EmitInstruction(Inst, STI);		MCStreamer::EmitInstruction(Inst, STI);

MCSection *Sec = getCurrentSectionOnly();		MCSection *Sec = getCurrentSectionOnly();
Sec->setHasInstructions(true);		Sec->setHasInstructions(true);

// Now that a machine instruction has been assembled into this section, make		// Now that a machine instruction has been assembled into this section, make
// a line entry for any .loc directive that has been seen.		// a line entry for any .loc directive that has been seen.
MCCVLineEntry::Make(this);		MCCVLineEntry::Make(this);

lib/MC/MCStreamer.cpp

Show First 20 Lines • Show All 771 Lines • ▼ Show 20 Lines	case MCExpr::SymbolRef:
break;		break;

case MCExpr::Unary:		case MCExpr::Unary:
visitUsedExpr(*cast<MCUnaryExpr>(Expr).getSubExpr());		visitUsedExpr(*cast<MCUnaryExpr>(Expr).getSubExpr());
break;		break;
}		}
}		}

void MCStreamer::EmitInstruction(const MCInst &Inst,		void MCStreamer::EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
const MCSubtargetInfo &STI) {		bool) {
// Scan for values.		// Scan for values.
for (unsigned i = Inst.getNumOperands(); i--;)		for (unsigned i = Inst.getNumOperands(); i--;)
if (Inst.getOperand(i).isExpr())		if (Inst.getOperand(i).isExpr())
visitUsedExpr(*Inst.getOperand(i).getExpr());		visitUsedExpr(*Inst.getOperand(i).getExpr());
}		}

void MCStreamer::emitAbsoluteSymbolDiff(const MCSymbol *Hi, const MCSymbol *Lo,		void MCStreamer::emitAbsoluteSymbolDiff(const MCSymbol *Hi, const MCSymbol *Lo,
unsigned Size) {		unsigned Size) {

lib/Object/RecordStreamer.h

Show All 28 Lines	private:
void markUsed(const MCSymbol &Symbol);		void markUsed(const MCSymbol &Symbol);
void visitUsedSymbol(const MCSymbol &Sym) override;		void visitUsedSymbol(const MCSymbol &Sym) override;

public:		public:
typedef StringMap<State>::const_iterator const_iterator;		typedef StringMap<State>::const_iterator const_iterator;
const_iterator begin();		const_iterator begin();
const_iterator end();		const_iterator end();
RecordStreamer(MCContext &Context);		RecordStreamer(MCContext &Context);
void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI) override;		void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
		bool) override;
void EmitLabel(MCSymbol *Symbol, SMLoc Loc = SMLoc()) override;		void EmitLabel(MCSymbol *Symbol, SMLoc Loc = SMLoc()) override;
void EmitAssignment(MCSymbol *Symbol, const MCExpr *Value) override;		void EmitAssignment(MCSymbol *Symbol, const MCExpr *Value) override;
bool EmitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override;		bool EmitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override;
void EmitZerofill(MCSection *Section, MCSymbol *Symbol, uint64_t Size,		void EmitZerofill(MCSection *Section, MCSymbol *Symbol, uint64_t Size,
unsigned ByteAlignment) override;		unsigned ByteAlignment) override;
void EmitCommonSymbol(MCSymbol *Symbol, uint64_t Size,		void EmitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
unsigned ByteAlignment) override;		unsigned ByteAlignment) override;
/// Record .symver aliases for later processing.		/// Record .symver aliases for later processing.

lib/Object/RecordStreamer.cpp

Show First 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	RecordStreamer::const_iterator RecordStreamer::begin() {
return Symbols.begin();		return Symbols.begin();
}		}

RecordStreamer::const_iterator RecordStreamer::end() { return Symbols.end(); }		RecordStreamer::const_iterator RecordStreamer::end() { return Symbols.end(); }

RecordStreamer::RecordStreamer(MCContext &Context) : MCStreamer(Context) {}		RecordStreamer::RecordStreamer(MCContext &Context) : MCStreamer(Context) {}

void RecordStreamer::EmitInstruction(const MCInst &Inst,		void RecordStreamer::EmitInstruction(const MCInst &Inst,
const MCSubtargetInfo &STI) {		const MCSubtargetInfo &STI, bool) {
MCStreamer::EmitInstruction(Inst, STI);		MCStreamer::EmitInstruction(Inst, STI);
		RKSimon: 'B' is meaningless - please use a real name to explain its usage.
}		}

void RecordStreamer::EmitLabel(MCSymbol *Symbol, SMLoc Loc) {		void RecordStreamer::EmitLabel(MCSymbol *Symbol, SMLoc Loc) {
MCStreamer::EmitLabel(Symbol);		MCStreamer::EmitLabel(Symbol);
markDefined(*Symbol);		markDefined(*Symbol);
}		}

void RecordStreamer::EmitAssignment(MCSymbol *Symbol, const MCExpr *Value) {		void RecordStreamer::EmitAssignment(MCSymbol *Symbol, const MCExpr *Value) {

lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	void ChangeSection(MCSection Section, const MCExpr Subsection) override {
LastEMS = LastMappingSymbols.lookup(Section);		LastEMS = LastMappingSymbols.lookup(Section);

MCELFStreamer::ChangeSection(Section, Subsection);		MCELFStreamer::ChangeSection(Section, Subsection);
}		}

/// This function is the one used to emit instruction data into the ELF		/// This function is the one used to emit instruction data into the ELF
/// streamer. We override it to add the appropriate mapping symbol if		/// streamer. We override it to add the appropriate mapping symbol if
/// necessary.		/// necessary.
void EmitInstruction(const MCInst &Inst,		void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
const MCSubtargetInfo &STI) override {		bool) override {
EmitA64MappingSymbol();		EmitA64MappingSymbol();
MCELFStreamer::EmitInstruction(Inst, STI);		MCELFStreamer::EmitInstruction(Inst, STI);
}		}

/// Emit a 32-bit value as an instruction. This is only used for the .inst		/// Emit a 32-bit value as an instruction. This is only used for the .inst
/// directive, EmitInstruction should be used in other cases.		/// directive, EmitInstruction should be used in other cases.
void emitInst(uint32_t Inst) {		void emitInst(uint32_t Inst) {
char Buffer[4];		char Buffer[4];

lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp

Show First 20 Lines • Show All 471 Lines • ▼ Show 20 Lines	if (LastMappingSymbol != LastMappingSymbols.end()) {
return;		return;
}		}
LastEMSInfo.reset(new ElfMappingSymbolInfo(SMLoc(), nullptr, 0));		LastEMSInfo.reset(new ElfMappingSymbolInfo(SMLoc(), nullptr, 0));
}		}

/// This function is the one used to emit instruction data into the ELF		/// This function is the one used to emit instruction data into the ELF
/// streamer. We override it to add the appropriate mapping symbol if		/// streamer. We override it to add the appropriate mapping symbol if
/// necessary.		/// necessary.
void EmitInstruction(const MCInst& Inst,		void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
const MCSubtargetInfo &STI) override {		bool) override {
if (IsThumb)		if (IsThumb)
EmitThumbMappingSymbol();		EmitThumbMappingSymbol();
else		else
EmitARMMappingSymbol();		EmitARMMappingSymbol();

MCELFStreamer::EmitInstruction(Inst, STI);		MCELFStreamer::EmitInstruction(Inst, STI);
}		}


lib/Target/Hexagon/MCTargetDesc/HexagonMCELFStreamer.h

Show All 28 Lines	public:

HexagonMCELFStreamer(MCContext &Context,		HexagonMCELFStreamer(MCContext &Context,
MCAsmBackend &TAB,		MCAsmBackend &TAB,
raw_pwrite_stream &OS, MCCodeEmitter *Emitter,		raw_pwrite_stream &OS, MCCodeEmitter *Emitter,
MCAssembler *Assembler) :		MCAssembler *Assembler) :
MCELFStreamer(Context, TAB, OS, Emitter),		MCELFStreamer(Context, TAB, OS, Emitter),
MCII (createHexagonMCInstrInfo()) {}		MCII (createHexagonMCInstrInfo()) {}

void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI) override;		void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
		bool) override;
void EmitSymbol(const MCInst &Inst);		void EmitSymbol(const MCInst &Inst);
void HexagonMCEmitLocalCommonSymbol(MCSymbol *Symbol, uint64_t Size,		void HexagonMCEmitLocalCommonSymbol(MCSymbol *Symbol, uint64_t Size,
unsigned ByteAlignment,		unsigned ByteAlignment,
unsigned AccessSize);		unsigned AccessSize);
void HexagonMCEmitCommonSymbol(MCSymbol *Symbol, uint64_t Size,		void HexagonMCEmitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
unsigned ByteAlignment, unsigned AccessSize);		unsigned ByteAlignment, unsigned AccessSize);
};		};

MCStreamer *createHexagonELFStreamer(Triple const &TT, MCContext &Context,		MCStreamer *createHexagonELFStreamer(Triple const &TT, MCContext &Context,
MCAsmBackend &MAB, raw_pwrite_stream &OS,		MCAsmBackend &MAB, raw_pwrite_stream &OS,
MCCodeEmitter *CE);		MCCodeEmitter *CE);

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_HEXAGON_MCTARGETDESC_HEXAGONMCELFSTREAMER_H		#endif // LLVM_LIB_TARGET_HEXAGON_MCTARGETDESC_HEXAGONMCELFSTREAMER_H

lib/Target/Hexagon/MCTargetDesc/HexagonMCELFStreamer.cpp


	static cl::opt<unsigned> GPSize			static cl::opt<unsigned> GPSize
	("gpsize", cl::NotHidden,			("gpsize", cl::NotHidden,
	cl::desc("Global Pointer Addressing Size. The default size is 8."),			cl::desc("Global Pointer Addressing Size. The default size is 8."),
	cl::Prefix,			cl::Prefix,
	cl::init(8));			cl::init(8));

	void HexagonMCELFStreamer::EmitInstruction(const MCInst &MCB,			void HexagonMCELFStreamer::EmitInstruction(const MCInst &MCB,
	const MCSubtargetInfo &STI) {			const MCSubtargetInfo &STI, bool) {
	assert(MCB.getOpcode() == Hexagon::BUNDLE);			assert(MCB.getOpcode() == Hexagon::BUNDLE);
	assert(HexagonMCInstrInfo::bundleSize(MCB) <= HEXAGON_PACKET_SIZE);			assert(HexagonMCInstrInfo::bundleSize(MCB) <= HEXAGON_PACKET_SIZE);
	assert(HexagonMCInstrInfo::bundleSize(MCB) > 0);			assert(HexagonMCInstrInfo::bundleSize(MCB) > 0);
	bool Extended = false;			bool Extended = false;
	for (auto &I : HexagonMCInstrInfo::bundleInstructions(MCB)) {			for (auto &I : HexagonMCInstrInfo::bundleInstructions(MCB)) {
	MCInst *MCI = const_cast<MCInst *>(I.getInst());			MCInst *MCI = const_cast<MCInst *>(I.getInst());
	if (Extended) {			if (Extended) {
	if (HexagonMCInstrInfo::isDuplex(MCII, MCI)) {			if (HexagonMCInstrInfo::isDuplex(MCII, MCI)) {

lib/Target/Mips/MCTargetDesc/MipsELFStreamer.h

Show All 39 Lines	MipsELFStreamer(MCContext &Context, MCAsmBackend &MAB, raw_pwrite_stream &OS,
MipsOptionRecords.push_back(		MipsOptionRecords.push_back(
std::unique_ptr<MipsRegInfoRecord>(RegInfoRecord));		std::unique_ptr<MipsRegInfoRecord>(RegInfoRecord));
}		}

/// Overriding this function allows us to add arbitrary behaviour before the		/// Overriding this function allows us to add arbitrary behaviour before the
/// \p Inst is actually emitted. For example, we can inspect the operands and		/// \p Inst is actually emitted. For example, we can inspect the operands and
/// gather sufficient information that allows us to reason about the register		/// gather sufficient information that allows us to reason about the register
/// usage for the translation unit.		/// usage for the translation unit.
void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI) override;		void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
		bool = false) override;
		RKSimon: Why have you included a default value here?

/// Overriding this function allows us to record all labels that should be		/// Overriding this function allows us to record all labels that should be
/// marked as microMIPS. Based on this data marking is done in		/// marked as microMIPS. Based on this data marking is done in
/// EmitInstruction.		/// EmitInstruction.
void EmitLabel(MCSymbol *Symbol, SMLoc Loc = SMLoc()) override;		void EmitLabel(MCSymbol *Symbol, SMLoc Loc = SMLoc()) override;

/// Overriding this function allows us to dismiss all labels that are		/// Overriding this function allows us to dismiss all labels that are
/// candidates for marking as microMIPS when .section directive is processed.		/// candidates for marking as microMIPS when .section directive is processed.
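
RKSimon's question about the `bool = false` default on the EmitInstruction override may be easier to follow with a small standalone sketch. C++ picks default arguments from the static type of the call expression, not from the override that ends up running, so a default repeated (or changed) on an override can behave differently from what a reader expects. The types below are hypothetical stand-ins, not the real MCStreamer classes; only the shape of the signature follows the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the base streamer; it owns the default value.
struct BaseStreamer {
  virtual ~BaseStreamer() = default;
  virtual std::string emit(int /*Inst*/, bool PrintSchedInfo = false) {
    return PrintSchedInfo ? "base+sched" : "base";
  }
};

// Hypothetical stand-in for a target streamer. Giving the override its own
// default is legal, but that default is only used when the call is made
// through an expression whose static type is DerivedStreamer.
struct DerivedStreamer : BaseStreamer {
  std::string emit(int /*Inst*/, bool PrintSchedInfo = true) override {
    return PrintSchedInfo ? "derived+sched" : "derived";
  }
};

// Demonstrates which default each call site picks up.
inline void demoDefaultBinding() {
  DerivedStreamer D;
  BaseStreamer &B = D;
  // Dynamic dispatch reaches DerivedStreamer::emit, but the default
  // argument comes from BaseStreamer's declaration (false).
  assert(B.emit(0) == "derived");
  // Same object, called through its own type: the override's default (true).
  assert(D.emit(0) == "derived+sched");
}
```

Because callers of a streamer normally go through the base interface, a default on the override is at best redundant, which is presumably why the reviewer asks for it to be dropped.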

lib/Target/Mips/MCTargetDesc/MipsELFStreamer.cpp

	#include "llvm/MC/MCInst.h"			#include "llvm/MC/MCInst.h"
	#include "llvm/MC/MCSymbolELF.h"			#include "llvm/MC/MCSymbolELF.h"
	#include "llvm/Support/Casting.h"			#include "llvm/Support/Casting.h"
	#include "llvm/Support/ELF.h"			#include "llvm/Support/ELF.h"

	using namespace llvm;			using namespace llvm;

	void MipsELFStreamer::EmitInstruction(const MCInst &Inst,			void MipsELFStreamer::EmitInstruction(const MCInst &Inst,
	const MCSubtargetInfo &STI) {			const MCSubtargetInfo &STI, bool) {
	MCELFStreamer::EmitInstruction(Inst, STI);			MCELFStreamer::EmitInstruction(Inst, STI);

	MCContext &Context = getContext();			MCContext &Context = getContext();
	const MCRegisterInfo *MCRegInfo = Context.getRegisterInfo();			const MCRegisterInfo *MCRegInfo = Context.getRegisterInfo();

	for (unsigned OpIndex = 0; OpIndex < Inst.getNumOperands(); ++OpIndex) {			for (unsigned OpIndex = 0; OpIndex < Inst.getNumOperands(); ++OpIndex) {
	const MCOperand &Op = Inst.getOperand(OpIndex);			const MCOperand &Op = Inst.getOperand(OpIndex);


lib/Target/Mips/MCTargetDesc/MipsNaClELFStreamer.cpp

Show First 20 Lines • Show All 133 Lines • ▼ Show 20 Lines	if (MaskAfter) {
emitMask(SPReg, LoadStoreStackMaskReg, STI);		emitMask(SPReg, LoadStoreStackMaskReg, STI);
}		}
EmitBundleUnlock();		EmitBundleUnlock();
}		}

public:		public:
/// This function is the one used to emit instruction data into the ELF		/// This function is the one used to emit instruction data into the ELF
/// streamer. We override it to mask dangerous instructions.		/// streamer. We override it to mask dangerous instructions.
void EmitInstruction(const MCInst &Inst,		void EmitInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
const MCSubtargetInfo &STI) override {		bool) override {
// Sandbox indirect jumps.		// Sandbox indirect jumps.
if (isIndirectJump(Inst)) {		if (isIndirectJump(Inst)) {
if (PendingCall)		if (PendingCall)
report_fatal_error("Dangerous instruction in branch delay slot!");		report_fatal_error("Dangerous instruction in branch delay slot!");
sandboxIndirectJump(Inst, STI);		sandboxIndirectJump(Inst, STI);
return;		return;
}		}


lib/Target/X86/InstPrinter/X86InstComments.cpp

Show First 20 Lines • Show All 1,183 Lines • ▼ Show 20 Lines	while (i != e && (int)ShuffleMask[i] != SM_SentinelZero &&
OS << "u";		OS << "u";
else		else
OS << ShuffleMask[i] % ShuffleMask.size();		OS << ShuffleMask[i] % ShuffleMask.size();
++i;		++i;
}		}
OS << ']';		OS << ']';
--i; // For loop increments element #.		--i; // For loop increments element #.
}		}
//MI->print(OS, 0);
OS << "\n";

// We successfully added a comment to this instruction.		// We successfully added a comment to this instruction.
return true;		return true;
}		}

lib/Target/X86/X86MCInstLower.cpp

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	void X86AsmPrinter::StackMapShadowTracker::emitShadowPadding(
if (InShadow && CurrentShadowSize < RequiredShadowSize) {		if (InShadow && CurrentShadowSize < RequiredShadowSize) {
InShadow = false;		InShadow = false;
EmitNops(OutStreamer, RequiredShadowSize - CurrentShadowSize,		EmitNops(OutStreamer, RequiredShadowSize - CurrentShadowSize,
MF->getSubtarget<X86Subtarget>().is64Bit(), STI);		MF->getSubtarget<X86Subtarget>().is64Bit(), STI);
}		}
}		}

void X86AsmPrinter::EmitAndCountInstruction(MCInst &Inst) {		void X86AsmPrinter::EmitAndCountInstruction(MCInst &Inst) {
OutStreamer->EmitInstruction(Inst, getSubtargetInfo());		OutStreamer->EmitInstruction(Inst, getSubtargetInfo(), EnablePrintSchedInfo);
SMShadowTracker.count(Inst, getSubtargetInfo(), CodeEmitter.get());		SMShadowTracker.count(Inst, getSubtargetInfo(), CodeEmitter.get());
}		}

X86MCInstLower::X86MCInstLower(const MachineFunction &mf,		X86MCInstLower::X86MCInstLower(const MachineFunction &mf,
X86AsmPrinter &asmprinter)		X86AsmPrinter &asmprinter)
: Ctx(mf.getContext()), MF(mf), TM(mf.getTarget()), MAI(*TM.getMCAsmInfo()),		: Ctx(mf.getContext()), MF(mf), TM(mf.getTarget()), MAI(*TM.getMCAsmInfo()),
AsmPrinter(asmprinter) {}		AsmPrinter(asmprinter) {}

▲ Show 20 Lines • Show All 1,147 Lines • ▼ Show 20 Lines	void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {

// Add a comment about EVEX-2-VEX compression for AVX-512 instrs that		// Add a comment about EVEX-2-VEX compression for AVX-512 instrs that
// are compressed from EVEX encoding to VEX encoding.		// are compressed from EVEX encoding to VEX encoding.
if (TM.Options.MCOptions.ShowMCEncoding) {		if (TM.Options.MCOptions.ShowMCEncoding) {
if (MI->getAsmPrinterFlags() & AC_EVEX_2_VEX)		if (MI->getAsmPrinterFlags() & AC_EVEX_2_VEX)
OutStreamer->AddComment("EVEX TO VEX Compression ", false);		OutStreamer->AddComment("EVEX TO VEX Compression ", false);
}		}

switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
		hfinkel: This seems really useful, but is not target dependent. Can you please move this hook into the target-independent code? Maybe in void AsmPrinter::EmitFunctionBody(), around here: default: EmitInstruction(&MI); break; (right before the call to EmitInstruction).
case TargetOpcode::DBG_VALUE:		case TargetOpcode::DBG_VALUE:
llvm_unreachable("Should be handled target independently");		llvm_unreachable("Should be handled target independently");

// Emit nothing here but a comment if we can.		// Emit nothing here but a comment if we can.
case X86::Int_MemBarrier:		case X86::Int_MemBarrier:
OutStreamer->emitRawComment("MEMBARRIER");		OutStreamer->emitRawComment("MEMBARRIER");
return;		return;


		hfinkel: Can you call TII->getInstrLatency here instead of computing it in this loop? (If you just call SCModel->computeInstrLatency, as suggested below, this will take care of itself.)
case X86::EH_RETURN:		case X86::EH_RETURN:
case X86::EH_RETURN64: {		case X86::EH_RETURN64: {
// Lower these as normal, but add some comments.		// Lower these as normal, but add some comments.
unsigned Reg = MI->getOperand(0).getReg();		unsigned Reg = MI->getOperand(0).getReg();
OutStreamer->AddComment(StringRef("eh_return, addr: %") +		OutStreamer->AddComment(StringRef("eh_return, addr: %") +
X86ATTInstPrinter::getRegisterName(Reg));		X86ATTInstPrinter::getRegisterName(Reg));
break;		break;
}		}
case X86::CLEANUPRET: {		case X86::CLEANUPRET: {
// Lower these as normal, but add some comments.		// Lower these as normal, but add some comments.
OutStreamer->AddComment("CLEANUPRET");		OutStreamer->AddComment("CLEANUPRET");
break;		break;
}		}
		hfinkel: Can't you just call SCModel->computeInstrLatency and return the result? The logic there seems like exactly what you want:
		unsigned TargetSchedModel::computeInstrLatency(const MachineInstr *MI,
		                                               bool UseDefaultDefLatency) const {
		  // For the itinerary model, fall back to the old subtarget hook.
		  // Allow subtargets to compute Bundle latencies outside the machine model.
		  if (hasInstrItineraries() || MI->isBundle() ||
		      (!hasInstrSchedModel() && !UseDefaultDefLatency))
		    return TII->getInstrLatency(&InstrItins, *MI);
		  if (hasInstrSchedModel()) {
		    const MCSchedClassDesc *SCDesc = resolveSchedClass(MI);
		    if (SCDesc->isValid())
		      return computeInstrLatency(*SCDesc);
		  }
		  return TII->defaultDefLatency(SchedModel, *MI);
		}

case X86::CATCHRET: {		case X86::CATCHRET: {
// Lower these as normal, but add some comments.		// Lower these as normal, but add some comments.
OutStreamer->AddComment("CATCHRET");		OutStreamer->AddComment("CATCHRET");
break;		break;
}		}

case X86::TAILJMPr:		case X86::TAILJMPr:
▲ Show 20 Lines • Show All 224 Lines • ▼ Show 20 Lines	case X86::VPSHUFBZrmkz: {
assert(MI->getNumOperands() >= 6 &&		assert(MI->getNumOperands() >= 6 &&
"We should always have at least 6 operands!");		"We should always have at least 6 operands!");

const MachineOperand &MaskOp = MI->getOperand(MaskIdx);		const MachineOperand &MaskOp = MI->getOperand(MaskIdx);
if (auto C = getConstantFromPool(MI, MaskOp)) {		if (auto C = getConstantFromPool(MI, MaskOp)) {
SmallVector<int, 64> Mask;		SmallVector<int, 64> Mask;
DecodePSHUFBMask(C, Mask);		DecodePSHUFBMask(C, Mask);
if (!Mask.empty())		if (!Mask.empty())
OutStreamer->AddComment(getShuffleComment(MI, SrcIdx, SrcIdx, Mask));		OutStreamer->AddComment(getShuffleComment(MI, SrcIdx, SrcIdx, Mask),
		!EnablePrintSchedInfo);
}		}
break;		break;
}		}

case X86::VPERMILPSrm:		case X86::VPERMILPSrm:
case X86::VPERMILPSYrm:		case X86::VPERMILPSYrm:
case X86::VPERMILPSZ128rm:		case X86::VPERMILPSZ128rm:
case X86::VPERMILPSZ128rmk:		case X86::VPERMILPSZ128rmk:
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	case X86::VPERMILPDZrmkz: {
assert(MI->getNumOperands() >= 6 &&		assert(MI->getNumOperands() >= 6 &&
"We should always have at least 6 operands!");		"We should always have at least 6 operands!");

const MachineOperand &MaskOp = MI->getOperand(MaskIdx);		const MachineOperand &MaskOp = MI->getOperand(MaskIdx);
if (auto C = getConstantFromPool(MI, MaskOp)) {		if (auto C = getConstantFromPool(MI, MaskOp)) {
SmallVector<int, 16> Mask;		SmallVector<int, 16> Mask;
DecodeVPERMILPMask(C, ElSize, Mask);		DecodeVPERMILPMask(C, ElSize, Mask);
if (!Mask.empty())		if (!Mask.empty())
OutStreamer->AddComment(getShuffleComment(MI, SrcIdx, SrcIdx, Mask));		OutStreamer->AddComment(getShuffleComment(MI, SrcIdx, SrcIdx, Mask),
		!EnablePrintSchedInfo);
}		}
break;		break;
}		}

case X86::VPERMIL2PDrm:		case X86::VPERMIL2PDrm:
case X86::VPERMIL2PSrm:		case X86::VPERMIL2PSrm:
case X86::VPERMIL2PDYrm:		case X86::VPERMIL2PDYrm:
case X86::VPERMIL2PSYrm: {		case X86::VPERMIL2PSYrm: {
Show All 13 Lines	case X86::VPERMIL2PSYrm: {
case X86::VPERMIL2PDrm: case X86::VPERMIL2PDYrm: ElSize = 64; break;		case X86::VPERMIL2PDrm: case X86::VPERMIL2PDYrm: ElSize = 64; break;
}		}

const MachineOperand &MaskOp = MI->getOperand(6);		const MachineOperand &MaskOp = MI->getOperand(6);
if (auto C = getConstantFromPool(MI, MaskOp)) {		if (auto C = getConstantFromPool(MI, MaskOp)) {
SmallVector<int, 16> Mask;		SmallVector<int, 16> Mask;
DecodeVPERMIL2PMask(C, (unsigned)CtrlOp.getImm(), ElSize, Mask);		DecodeVPERMIL2PMask(C, (unsigned)CtrlOp.getImm(), ElSize, Mask);
if (!Mask.empty())		if (!Mask.empty())
OutStreamer->AddComment(getShuffleComment(MI, 1, 2, Mask));		OutStreamer->AddComment(getShuffleComment(MI, 1, 2, Mask),
		!EnablePrintSchedInfo);
}		}
break;		break;
}		}

case X86::VPPERMrrm: {		case X86::VPPERMrrm: {
if (!OutStreamer->isVerboseAsm())		if (!OutStreamer->isVerboseAsm())
break;		break;
assert(MI->getNumOperands() >= 7 &&		assert(MI->getNumOperands() >= 7 &&
"We should always have at least 7 operands!");		"We should always have at least 7 operands!");

const MachineOperand &MaskOp = MI->getOperand(6);		const MachineOperand &MaskOp = MI->getOperand(6);
if (auto C = getConstantFromPool(MI, MaskOp)) {		if (auto C = getConstantFromPool(MI, MaskOp)) {
SmallVector<int, 16> Mask;		SmallVector<int, 16> Mask;
DecodeVPPERMMask(C, Mask);		DecodeVPPERMMask(C, Mask);
if (!Mask.empty())		if (!Mask.empty())
OutStreamer->AddComment(getShuffleComment(MI, 1, 2, Mask));		OutStreamer->AddComment(getShuffleComment(MI, 1, 2, Mask),
		!EnablePrintSchedInfo);
}		}
break;		break;
}		}

#define MOV_CASE(Prefix, Suffix) \		#define MOV_CASE(Prefix, Suffix) \
case X86::Prefix##MOVAPD##Suffix##rm: \		case X86::Prefix##MOVAPD##Suffix##rm: \
case X86::Prefix##MOVAPS##Suffix##rm: \		case X86::Prefix##MOVAPS##Suffix##rm: \
case X86::Prefix##MOVUPD##Suffix##rm: \		case X86::Prefix##MOVUPD##Suffix##rm: \
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	if (auto C = getConstantFromPool(MI, MI->getOperand(4))) {
else if (CDS->getElementType()->isFloatTy())		else if (CDS->getElementType()->isFloatTy())
CS << CDS->getElementAsFloat(i);		CS << CDS->getElementAsFloat(i);
else if (CDS->getElementType()->isDoubleTy())		else if (CDS->getElementType()->isDoubleTy())
CS << CDS->getElementAsDouble(i);		CS << CDS->getElementAsDouble(i);
else		else
CS << "?";		CS << "?";
}		}
CS << "]";		CS << "]";
OutStreamer->AddComment(CS.str());		OutStreamer->AddComment(CS.str(), !EnablePrintSchedInfo);
} else if (auto *CV = dyn_cast<ConstantVector>(C)) {		} else if (auto *CV = dyn_cast<ConstantVector>(C)) {
CS << "<";		CS << "<";
for (int i = 0, NumOperands = CV->getNumOperands(); i < NumOperands; ++i) {		for (int i = 0, NumOperands = CV->getNumOperands(); i < NumOperands; ++i) {
if (i != 0)		if (i != 0)
CS << ",";		CS << ",";
Constant *COp = CV->getOperand(i);		Constant *COp = CV->getOperand(i);
if (isa<UndefValue>(COp)) {		if (isa<UndefValue>(COp)) {
CS << "u";		CS << "u";
Show All 15 Lines	if (auto C = getConstantFromPool(MI, MI->getOperand(4))) {
SmallString<32> Str;		SmallString<32> Str;
CF->getValueAPF().toString(Str);		CF->getValueAPF().toString(Str);
CS << Str;		CS << Str;
} else {		} else {
CS << "?";		CS << "?";
}		}
}		}
CS << ">";		CS << ">";
OutStreamer->AddComment(CS.str());		OutStreamer->AddComment(CS.str(), !EnablePrintSchedInfo);
}		}
}		}
break;		break;
}		}

MCInst TmpInst;		MCInst TmpInst;
MCInstLowering.Lower(MI, TmpInst);		MCInstLowering.Lower(MI, TmpInst);


lib/Target/X86/X86Subtarget.h

Show First 20 Lines • Show All 618 Lines • ▼ Show 20 Lines	public:

/// This function returns true if the target has sincos() routine in its		/// This function returns true if the target has sincos() routine in its
/// compiler runtime or math libraries.		/// compiler runtime or math libraries.
bool hasSinCos() const;		bool hasSinCos() const;

/// Enable the MachineScheduler pass for all X86 subtargets.		/// Enable the MachineScheduler pass for all X86 subtargets.
bool enableMachineScheduler() const override { return true; }		bool enableMachineScheduler() const override { return true; }

		// TODO: in fact it's true but we keep false to avoid massive test changes
		hfinkel: Put the false and the TODO here.
		hfinkel: I'd just say: // TODO: Update the regression tests and return true.
		bool supportPrintSchedInfo() const override { return false; }

bool enableEarlyIfConversion() const override;		bool enableEarlyIfConversion() const override;

		RKSimon: newline
/// Return the instruction itineraries based on the subtarget selection.		/// Return the instruction itineraries based on the subtarget selection.
const InstrItineraryData *getInstrItineraryData() const override {		const InstrItineraryData *getInstrItineraryData() const override {
return &InstrItins;		return &InstrItins;
}		}

AntiDepBreakMode getAntiDepBreakMode() const override {		AntiDepBreakMode getAntiDepBreakMode() const override {
return TargetSubtargetInfo::ANTIDEP_CRITICAL;		return TargetSubtargetInfo::ANTIDEP_CRITICAL;
}		}
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_X86_X86SUBTARGET_H		#endif // LLVM_LIB_TARGET_X86_X86SUBTARGET_H

test/CodeGen/X86/recip-fastmath.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=CHECK --check-prefix=SSE --check-prefix=SSE-RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=CHECK --check-prefix=SSE --check-prefix=SSE-RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX-RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX-RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+fma | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=FMA-RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+fma | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=FMA-RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=btver2 | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=BTVER2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=btver2 -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=BTVER2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=sandybridge| FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=SANDY			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=sandybridge -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=SANDY
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=haswell | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=HASWELL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=haswell -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=HASWELL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=haswell -mattr=-fma | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=HASWELL-NO-FMA			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=haswell -mattr=-fma | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=HASWELL-NO-FMA
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=knl | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX512 --check-prefix=KNL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=knl -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX512 --check-prefix=KNL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=skx | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX512 --check-prefix=SKX			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=skx -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX512 --check-prefix=SKX

	; If the target's divss/divps instructions are substantially			; If the target's divss/divps instructions are substantially
	; slower than rcpss/rcpps with a Newton-Raphson refinement,			; slower than rcpss/rcpps with a Newton-Raphson refinement,
	; we should generate the estimate sequence.			; we should generate the estimate sequence.

	; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )			; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )
	; for details about the accuracy, speed, and implementation			; for details about the accuracy, speed, and implementation
	; differences of x86 reciprocal estimates.			; differences of x86 reciprocal estimates.

	define float @f32_no_estimate(float %x) #0 {			define float @f32_no_estimate(float %x) #0 {
	; SSE-LABEL: f32_no_estimate:			; SSE-LABEL: f32_no_estimate:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; SSE-NEXT: divss %xmm0, %xmm1			; SSE-NEXT: divss %xmm0, %xmm1
	; SSE-NEXT: movaps %xmm1, %xmm0			; SSE-NEXT: movaps %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: f32_no_estimate:			; AVX-RECIP-LABEL: f32_no_estimate:
	; AVX: # BB#0:			; AVX-RECIP: # BB#0:
	; AVX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero			; AVX-RECIP-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; AVX-NEXT: vdivss %xmm0, %xmm1, %xmm0			; AVX-RECIP-NEXT: vdivss %xmm0, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-RECIP-NEXT: retq
				;
				; FMA-RECIP-LABEL: f32_no_estimate:
				; FMA-RECIP: # BB#0:
				; FMA-RECIP-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; FMA-RECIP-NEXT: vdivss %xmm0, %xmm1, %xmm0
				; FMA-RECIP-NEXT: retq
				;
				; BTVER2-LABEL: f32_no_estimate:
				; BTVER2: # BB#0:
				; BTVER2-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero sched: [5:1.00]
				; BTVER2-NEXT: vdivss %xmm0, %xmm1, %xmm0 # sched: [19:19.00]
				; BTVER2-NEXT: retq # sched: [4:1.00]
				;
				; SANDY-LABEL: f32_no_estimate:
				; SANDY: # BB#0:
				; SANDY-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero sched: [4:0.50]
				; SANDY-NEXT: vdivss %xmm0, %xmm1, %xmm0 # sched: [12:1.00]
				; SANDY-NEXT: retq # sched: [5:1.00]
				;
				; HASWELL-LABEL: f32_no_estimate:
				; HASWELL: # BB#0:
				; HASWELL-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero sched: [4:0.50]
				; HASWELL-NEXT: vdivss %xmm0, %xmm1, %xmm0 # sched: [12:1.00]
				; HASWELL-NEXT: retq # sched: [1:1.00]
				;
				; HASWELL-NO-FMA-LABEL: f32_no_estimate:
				; HASWELL-NO-FMA: # BB#0:
				; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; HASWELL-NO-FMA-NEXT: vdivss %xmm0, %xmm1, %xmm0
				; HASWELL-NO-FMA-NEXT: retq
				;
				; AVX512-LABEL: f32_no_estimate:
				; AVX512: # BB#0:
				; AVX512-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero sched: [4:0.50]
				; AVX512-NEXT: vdivss %xmm0, %xmm1, %xmm0 # sched: [12:1.00]
				; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast float 1.0, %x			%div = fdiv fast float 1.0, %x
	ret float %div			ret float %div
	}			}

	define float @f32_one_step(float %x) #1 {			define float @f32_one_step(float %x) #1 {
	; SSE-LABEL: f32_one_step:			; SSE-LABEL: f32_one_step:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpss %xmm0, %xmm2			; SSE-NEXT: rcpss %xmm0, %xmm2
	[19 lines not shown]
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm1
	; FMA-RECIP-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: f32_one_step:			; BTVER2-LABEL: f32_one_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; BTVER2-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [5:1.00]
	; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubss %xmm0, %xmm2, %xmm0			; BTVER2-NEXT: vsubss %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: f32_one_step:			; SANDY-LABEL: f32_one_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; SANDY-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; SANDY-NEXT: vsubss %xmm0, %xmm2, %xmm0			; SANDY-NEXT: vsubss %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: f32_one_step:			; HASWELL-LABEL: f32_one_step:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; HASWELL-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: f32_one_step:			; HASWELL-NO-FMA-LABEL: f32_one_step:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1
	; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0
	; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
	; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm2, %xmm0			; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm2, %xmm0
	; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq
	;			;
	; AVX512-LABEL: f32_one_step:			; AVX512-LABEL: f32_one_step:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1			; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1
	; AVX512-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; AVX512-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast float 1.0, %x			%div = fdiv fast float 1.0, %x
	ret float %div			ret float %div
	}			}

	define float @f32_two_step(float %x) #2 {			define float @f32_two_step(float %x) #2 {
	; SSE-LABEL: f32_two_step:			; SSE-LABEL: f32_two_step:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpss %xmm0, %xmm2			; SSE-NEXT: rcpss %xmm0, %xmm2
	[33 lines not shown]
	; FMA-RECIP-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3			; FMA-RECIP-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3
	; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3			; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3
	; FMA-RECIP-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: f32_two_step:			; BTVER2-LABEL: f32_two_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero			; BTVER2-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero sched: [5:1.00]
	; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm2			; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vsubss %xmm2, %xmm3, %xmm2			; BTVER2-NEXT: vsubss %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm2, %xmm1, %xmm2			; BTVER2-NEXT: vmulss %xmm2, %xmm1, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vaddss %xmm2, %xmm1, %xmm1			; BTVER2-NEXT: vaddss %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubss %xmm0, %xmm3, %xmm0			; BTVER2-NEXT: vsubss %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: f32_two_step:			; SANDY-LABEL: f32_two_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm2			; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm2 # sched: [5:1.00]
	; SANDY-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero			; SANDY-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero sched: [4:0.50]
	; SANDY-NEXT: vsubss %xmm2, %xmm3, %xmm2			; SANDY-NEXT: vsubss %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm2, %xmm1, %xmm2			; SANDY-NEXT: vmulss %xmm2, %xmm1, %xmm2 # sched: [5:1.00]
	; SANDY-NEXT: vaddss %xmm2, %xmm1, %xmm1			; SANDY-NEXT: vaddss %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vsubss %xmm0, %xmm3, %xmm0			; SANDY-NEXT: vsubss %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: f32_two_step:			; HASWELL-LABEL: f32_two_step:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; HASWELL-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; HASWELL-NEXT: vmovaps %xmm1, %xmm3			; HASWELL-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; HASWELL-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3			; HASWELL-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3
	; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3			; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3
	; HASWELL-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0			; HASWELL-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0
	; HASWELL-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0			; HASWELL-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: f32_two_step:			; HASWELL-NO-FMA-LABEL: f32_two_step:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1
	; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm2			; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm2
	; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero			; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
	; HASWELL-NO-FMA-NEXT: vsubss %xmm2, %xmm3, %xmm2			; HASWELL-NO-FMA-NEXT: vsubss %xmm2, %xmm3, %xmm2
	; HASWELL-NO-FMA-NEXT: vmulss %xmm2, %xmm1, %xmm2			; HASWELL-NO-FMA-NEXT: vmulss %xmm2, %xmm1, %xmm2
	; HASWELL-NO-FMA-NEXT: vaddss %xmm2, %xmm1, %xmm1			; HASWELL-NO-FMA-NEXT: vaddss %xmm2, %xmm1, %xmm1
	; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0
	; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm3, %xmm0			; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm3, %xmm0
	; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq
	;			;
	; AVX512-LABEL: f32_two_step:			; AVX512-LABEL: f32_two_step:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1			; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1
	; AVX512-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; AVX512-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; AVX512-NEXT: vmovaps %xmm1, %xmm3			; AVX512-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; AVX512-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3			; AVX512-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3
	; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3			; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3
	; AVX512-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0			; AVX512-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0
	; AVX512-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0			; AVX512-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast float 1.0, %x			%div = fdiv fast float 1.0, %x
	ret float %div			ret float %div
	}			}

	define <4 x float> @v4f32_no_estimate(<4 x float> %x) #0 {			define <4 x float> @v4f32_no_estimate(<4 x float> %x) #0 {
	; SSE-LABEL: v4f32_no_estimate:			; SSE-LABEL: v4f32_no_estimate:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SSE-NEXT: movaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	[10 lines not shown]
	; FMA-RECIP-LABEL: v4f32_no_estimate:			; FMA-RECIP-LABEL: v4f32_no_estimate:
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; FMA-RECIP-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	; FMA-RECIP-NEXT: vdivps %xmm0, %xmm1, %xmm0			; FMA-RECIP-NEXT: vdivps %xmm0, %xmm1, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_no_estimate:			; BTVER2-LABEL: v4f32_no_estimate:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vdivps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vdivps %xmm0, %xmm1, %xmm0 # sched: [19:19.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v4f32_no_estimate:			; SANDY-LABEL: v4f32_no_estimate:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vdivps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vdivps %xmm0, %xmm1, %xmm0 # sched: [12:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v4f32_no_estimate:			; HASWELL-LABEL: v4f32_no_estimate:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm1			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm1 # sched: [4:0.50]
	; HASWELL-NEXT: vdivps %xmm0, %xmm1, %xmm0			; HASWELL-NEXT: vdivps %xmm0, %xmm1, %xmm0 # sched: [12:1.00]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v4f32_no_estimate:			; HASWELL-NO-FMA-LABEL: v4f32_no_estimate:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm1			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm1
	; HASWELL-NO-FMA-NEXT: vdivps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vdivps %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq
	;			;
	; AVX512-LABEL: v4f32_no_estimate:			; AVX512-LABEL: v4f32_no_estimate:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vbroadcastss {{.*}}(%rip), %xmm1			; AVX512-NEXT: vbroadcastss {{.*}}(%rip), %xmm1 # sched: [4:0.50]
	; AVX512-NEXT: vdivps %xmm0, %xmm1, %xmm0			; AVX512-NEXT: vdivps %xmm0, %xmm1, %xmm0 # sched: [12:1.00]
	; AVX512-NEXT: retq			; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <4 x float> %div			ret <4 x float> %div
	}			}

	define <4 x float> @v4f32_one_step(<4 x float> %x) #1 {			define <4 x float> @v4f32_one_step(<4 x float> %x) #1 {
	; SSE-LABEL: v4f32_one_step:			; SSE-LABEL: v4f32_one_step:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm0, %xmm2			; SSE-NEXT: rcpps %xmm0, %xmm2
	[19 lines not shown]
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1			; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_one_step:			; BTVER2-LABEL: v4f32_one_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0			; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v4f32_one_step:			; SANDY-LABEL: v4f32_one_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %xmm0, %xmm1			; SANDY-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %xmm0, %xmm2, %xmm0			; SANDY-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v4f32_one_step:			; HASWELL-LABEL: v4f32_one_step:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0			; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0
	; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v4f32_one_step:			; HASWELL-NO-FMA-LABEL: v4f32_one_step:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1
	; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm2
	; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm2, %xmm0			; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm2, %xmm0
	; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq
	;			;
	; KNL-LABEL: v4f32_one_step:			; KNL-LABEL: v4f32_one_step:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %xmm0, %xmm1			; KNL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; KNL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0			; KNL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0
	; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v4f32_one_step:			; SKX-LABEL: v4f32_one_step:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %xmm0, %xmm1			; SKX-NEXT: vrcp14ps %xmm0, %xmm1
	; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to4}, %xmm1, %xmm0			; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to4}, %xmm1, %xmm0
	; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <4 x float> %div			ret <4 x float> %div
	}			}

	define <4 x float> @v4f32_two_step(<4 x float> %x) #2 {			define <4 x float> @v4f32_two_step(<4 x float> %x) #2 {
	; SSE-LABEL: v4f32_two_step:			; SSE-LABEL: v4f32_two_step:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm0, %xmm2			; SSE-NEXT: rcpps %xmm0, %xmm2
	[33 lines not shown]
	; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3			; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_two_step:			; BTVER2-LABEL: v4f32_two_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm2			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm2, %xmm3, %xmm2			; BTVER2-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm2, %xmm1, %xmm2			; BTVER2-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm2, %xmm1, %xmm1			; BTVER2-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm3, %xmm0			; BTVER2-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v4f32_two_step:			; SANDY-LABEL: v4f32_two_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %xmm0, %xmm1			; SANDY-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm2			; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %xmm2, %xmm3, %xmm2			; SANDY-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm2, %xmm1, %xmm2			; SANDY-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %xmm2, %xmm1, %xmm1			; SANDY-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vsubps %xmm0, %xmm3, %xmm0			; SANDY-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v4f32_two_step:			; HASWELL-LABEL: v4f32_two_step:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; HASWELL-NEXT: vmovaps %xmm1, %xmm3			; HASWELL-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3			; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3
	; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; HASWELL-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; HASWELL-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v4f32_two_step:			; HASWELL-NO-FMA-LABEL: v4f32_two_step:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1
	; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm2			; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm2
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm3			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm3
	; HASWELL-NO-FMA-NEXT: vsubps %xmm2, %xmm3, %xmm2			; HASWELL-NO-FMA-NEXT: vsubps %xmm2, %xmm3, %xmm2
	; HASWELL-NO-FMA-NEXT: vmulps %xmm2, %xmm1, %xmm2			; HASWELL-NO-FMA-NEXT: vmulps %xmm2, %xmm1, %xmm2
	; HASWELL-NO-FMA-NEXT: vaddps %xmm2, %xmm1, %xmm1			; HASWELL-NO-FMA-NEXT: vaddps %xmm2, %xmm1, %xmm1
	; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0
	; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm3, %xmm0			; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm3, %xmm0
	; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq
	;			;
	; KNL-LABEL: v4f32_two_step:			; KNL-LABEL: v4f32_two_step:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %xmm0, %xmm1			; KNL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; KNL-NEXT: vmovaps %xmm1, %xmm3			; KNL-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; KNL-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3			; KNL-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3
	; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; KNL-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; KNL-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; KNL-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; KNL-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v4f32_two_step:			; SKX-LABEL: v4f32_two_step:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %xmm0, %xmm1			; SKX-NEXT: vrcp14ps %xmm0, %xmm1
	; SKX-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; SKX-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; SKX-NEXT: vmovaps %xmm1, %xmm3			; SKX-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; SKX-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3			; SKX-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3
	; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; SKX-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; SKX-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; SKX-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; SKX-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <4 x float> %div			ret <4 x float> %div
	}			}

	define <8 x float> @v8f32_no_estimate(<8 x float> %x) #0 {			define <8 x float> @v8f32_no_estimate(<8 x float> %x) #0 {
	; SSE-LABEL: v8f32_no_estimate:			; SSE-LABEL: v8f32_no_estimate:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SSE-NEXT: movaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	[13 lines not shown]
	; FMA-RECIP-LABEL: v8f32_no_estimate:			; FMA-RECIP-LABEL: v8f32_no_estimate:
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	; FMA-RECIP-NEXT: vdivps %ymm0, %ymm1, %ymm0			; FMA-RECIP-NEXT: vdivps %ymm0, %ymm1, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_no_estimate:			; BTVER2-LABEL: v8f32_no_estimate:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vdivps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vdivps %ymm0, %ymm1, %ymm0 # sched: [19:19.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_no_estimate:			; SANDY-LABEL: v8f32_no_estimate:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vdivps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vdivps %ymm0, %ymm1, %ymm0 # sched: [12:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v8f32_no_estimate:			; HASWELL-LABEL: v8f32_no_estimate:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm1			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm1 # sched: [5:1.00]
	; HASWELL-NEXT: vdivps %ymm0, %ymm1, %ymm0			; HASWELL-NEXT: vdivps %ymm0, %ymm1, %ymm0 # sched: [19:2.00]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v8f32_no_estimate:			; HASWELL-NO-FMA-LABEL: v8f32_no_estimate:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm1			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm1
	; HASWELL-NO-FMA-NEXT: vdivps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vdivps %ymm0, %ymm1, %ymm0
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq
	;			;
	; AVX512-LABEL: v8f32_no_estimate:			; AVX512-LABEL: v8f32_no_estimate:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vbroadcastss {{.*}}(%rip), %ymm1			; AVX512-NEXT: vbroadcastss {{.*}}(%rip), %ymm1 # sched: [5:1.00]
	; AVX512-NEXT: vdivps %ymm0, %ymm1, %ymm0			; AVX512-NEXT: vdivps %ymm0, %ymm1, %ymm0 # sched: [19:2.00]
	; AVX512-NEXT: retq			; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <8 x float> %div			ret <8 x float> %div
	}			}

	define <8 x float> @v8f32_one_step(<8 x float> %x) #1 {			define <8 x float> @v8f32_one_step(<8 x float> %x) #1 {
	; SSE-LABEL: v8f32_one_step:			; SSE-LABEL: v8f32_one_step:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm0, %xmm4			; SSE-NEXT: rcpps %xmm0, %xmm4
	[26 lines not shown]
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm1			; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm1
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %ymm1, %ymm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %ymm1, %ymm0
	; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_one_step:			; BTVER2-LABEL: v8f32_one_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0			; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_one_step:			; SANDY-LABEL: v8f32_one_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0			; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v8f32_one_step:			; HASWELL-LABEL: v8f32_one_step:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0			; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0
	; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v8f32_one_step:			; HASWELL-NO-FMA-LABEL: v8f32_one_step:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1
	; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm2
	; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm2, %ymm0			; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm2, %ymm0
	; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0
	; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq
	;			;
	; KNL-LABEL: v8f32_one_step:			; KNL-LABEL: v8f32_one_step:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %ymm0, %ymm1			; KNL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; KNL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0			; KNL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0
	; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v8f32_one_step:			; SKX-LABEL: v8f32_one_step:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %ymm0, %ymm1			; SKX-NEXT: vrcp14ps %ymm0, %ymm1
	; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to8}, %ymm1, %ymm0			; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to8}, %ymm1, %ymm0
	; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <8 x float> %div			ret <8 x float> %div
	}			}

	define <8 x float> @v8f32_two_step(<8 x float> %x) #2 {			define <8 x float> @v8f32_two_step(<8 x float> %x) #2 {
	; SSE-LABEL: v8f32_two_step:			; SSE-LABEL: v8f32_two_step:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movaps %xmm1, %xmm2			; SSE-NEXT: movaps %xmm1, %xmm2
	; FMA-RECIP-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3			; FMA-RECIP-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3
	; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3			; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3
	; FMA-RECIP-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; FMA-RECIP-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; FMA-RECIP-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_two_step:			; BTVER2-LABEL: v8f32_two_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm2			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %ymm2, %ymm3, %ymm2			; BTVER2-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm2, %ymm1, %ymm2			; BTVER2-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %ymm2, %ymm1, %ymm1			; BTVER2-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm3, %ymm0			; BTVER2-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_two_step:			; SANDY-LABEL: v8f32_two_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm2			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm2, %ymm3, %ymm2			; SANDY-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm2, %ymm1, %ymm2			; SANDY-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %ymm2, %ymm1, %ymm1			; SANDY-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vsubps %ymm0, %ymm3, %ymm0			; SANDY-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v8f32_two_step:			; HASWELL-LABEL: v8f32_two_step:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; HASWELL-NEXT: vmovaps %ymm1, %ymm3			; HASWELL-NEXT: vmovaps %ymm1, %ymm3 # sched: [1:1.00]
	; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3			; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3
	; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3			; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3
	; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; HASWELL-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; HASWELL-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v8f32_two_step:			; HASWELL-NO-FMA-LABEL: v8f32_two_step:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1
	; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm2			; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm2
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm3			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm3
	; HASWELL-NO-FMA-NEXT: vsubps %ymm2, %ymm3, %ymm2			; HASWELL-NO-FMA-NEXT: vsubps %ymm2, %ymm3, %ymm2
	; HASWELL-NO-FMA-NEXT: vmulps %ymm2, %ymm1, %ymm2			; HASWELL-NO-FMA-NEXT: vmulps %ymm2, %ymm1, %ymm2
	; HASWELL-NO-FMA-NEXT: vaddps %ymm2, %ymm1, %ymm1			; HASWELL-NO-FMA-NEXT: vaddps %ymm2, %ymm1, %ymm1
	; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0
	; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm3, %ymm0			; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm3, %ymm0
	; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0
	; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq
	;			;
	; KNL-LABEL: v8f32_two_step:			; KNL-LABEL: v8f32_two_step:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %ymm0, %ymm1			; KNL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; KNL-NEXT: vmovaps %ymm1, %ymm3			; KNL-NEXT: vmovaps %ymm1, %ymm3 # sched: [1:1.00]
	; KNL-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3			; KNL-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3
	; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3			; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3
	; KNL-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; KNL-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; KNL-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; KNL-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v8f32_two_step:			; SKX-LABEL: v8f32_two_step:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %ymm0, %ymm1			; SKX-NEXT: vrcp14ps %ymm0, %ymm1
	; SKX-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; SKX-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; SKX-NEXT: vmovaps %ymm1, %ymm3			; SKX-NEXT: vmovaps %ymm1, %ymm3 # sched: [1:1.00]
	; SKX-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3			; SKX-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3
	; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3			; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3
	; SKX-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; SKX-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; SKX-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; SKX-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <8 x float> %div			ret <8 x float> %div
	}			}

	attributes #0 = { "unsafe-fp-math"="true" "reciprocal-estimates"="!divf,!vec-divf" }			attributes #0 = { "unsafe-fp-math"="true" "reciprocal-estimates"="!divf,!vec-divf" }
	attributes #1 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf,vec-divf" }			attributes #1 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf,vec-divf" }
	attributes #2 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf:2,vec-divf:2" }			attributes #2 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf:2,vec-divf:2" }
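
Note on reading the new check lines: the right-hand column appends a trailing `sched: [N:M.MM]` comment to many instructions. The sketch below is one way to pull those two numbers out of llc output for a quick sanity check. It is not part of the patch. Reading the first field as latency and the second as (reciprocal) throughput is an assumption based on the patch title, and the helper names `parse_sched` and `total_latency` are hypothetical.

```python
import re

# Matches the trailing annotation, with or without its own leading '#',
# e.g. "# sched: [5:1.00]" or "... zero,zero sched: [4:0.50]".
SCHED_RE = re.compile(r"sched: \[(\d+):(\d+\.\d+)\]\s*$")


def parse_sched(line):
    """Return (first_field, second_field) from one assembly line, or None.

    Assumption: the first field is the latency in cycles and the second is
    the (reciprocal) throughput; the diff itself does not spell this out.
    """
    match = SCHED_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def total_latency(lines):
    """Sum the first field over all annotated lines.

    This treats the sequence as fully serial, so it is only a crude upper
    bound, not what the scheduler itself computes.
    """
    parsed = (parse_sched(line) for line in lines)
    return sum(p[0] for p in parsed if p is not None)


if __name__ == "__main__":
    # Lines copied from the HASWELL checks for v8f32_no_estimate.
    haswell = [
        "vbroadcastss {{.*}}(%rip), %ymm1 # sched: [5:1.00]",
        "vdivps %ymm0, %ymm1, %ymm0 # sched: [19:2.00]",
        "retq # sched: [1:1.00]",
    ]
    print(total_latency(haswell))
```

Lines with no annotation (for example the FMA instructions in the HASWELL, KNL and SKX blocks) return None and are skipped, which makes the gaps in model coverage easy to spot.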

test/CodeGen/X86/recip-fastmath2.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=CHECK --check-prefix=SSE --check-prefix=SSE-RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=SSE --check-prefix=SSE-RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX-RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX-RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+fma | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=FMA-RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx,+fma -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=FMA-RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=btver2 | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=BTVER2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=btver2 -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=BTVER2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=sandybridge| FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=SANDY			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=sandybridge -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=SANDY
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=haswell | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=HASWELL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=haswell -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=HASWELL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=haswell -mattr=-fma | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=HASWELL-NO-FMA			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=haswell -print-schedule -mattr=-fma | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=HASWELL-NO-FMA
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=knl | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX512 --check-prefix=KNL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=knl -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX512 --check-prefix=KNL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=skx | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX512 --check-prefix=SKX			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=skx -print-schedule | FileCheck %s --check-prefix=CHECK --check-prefix=AVX --check-prefix=AVX512 --check-prefix=SKX

	; It's the extra tests coverage for recip as discussed on D26855.			; It's the extra tests coverage for recip as discussed on D26855.

	define float @f32_no_step_2(float %x) #3 {			define float @f32_no_step_2(float %x) #3 {
	; SSE-LABEL: f32_no_step_2:			; SSE-LABEL: f32_no_step_2:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpss %xmm0, %xmm0			; SSE-NEXT: rcpss %xmm0, %xmm0
	; SSE-NEXT: mulss {{.*}}(%rip), %xmm0			; SSE-NEXT: mulss {{.*}}(%rip), %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
				RKSimon (inline comment): Where have the retq instructions gone?
	; AVX-RECIP-LABEL: f32_no_step_2:			; AVX-RECIP-LABEL: f32_no_step_2:
	; AVX-RECIP: # BB#0:			; AVX-RECIP: # BB#0:
	; AVX-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm0			; AVX-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm0
	; AVX-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; AVX-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; AVX-RECIP-NEXT: retq			; AVX-RECIP-NEXT: retq
	;			;
	; FMA-RECIP-LABEL: f32_no_step_2:			; FMA-RECIP-LABEL: f32_no_step_2:
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm0			; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm0
	; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: f32_no_step_2:			; BTVER2-LABEL: f32_no_step_2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm0			; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; BTVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: f32_no_step_2:			; SANDY-LABEL: f32_no_step_2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm0			; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; SANDY-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: f32_no_step_2:			; HASWELL-LABEL: f32_no_step_2:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm0			; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm0 # sched: [5:1.00]
	; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: f32_no_step_2:			; HASWELL-NO-FMA-LABEL: f32_no_step_2:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm0 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; AVX512-LABEL: f32_no_step_2:			; AVX512-LABEL: f32_no_step_2:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; AVX512-NEXT: retq			; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast float 1234.0, %x			%div = fdiv fast float 1234.0, %x
	ret float %div			ret float %div
	}			}

	define float @f32_one_step_2(float %x) #1 {			define float @f32_one_step_2(float %x) #1 {
	; SSE-LABEL: f32_one_step_2:			; SSE-LABEL: f32_one_step_2:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpss %xmm0, %xmm2			; SSE-NEXT: rcpss %xmm0, %xmm2
	; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; FMA-RECIP-NEXT: vrcpss %xmm0, %xmm0, %xmm1
	; FMA-RECIP-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: f32_one_step_2:			; BTVER2-LABEL: f32_one_step_2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; BTVER2-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [5:1.00]
	; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubss %xmm0, %xmm2, %xmm0			; BTVER2-NEXT: vsubss %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; BTVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: f32_one_step_2:			; SANDY-LABEL: f32_one_step_2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; SANDY-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; SANDY-NEXT: vsubss %xmm0, %xmm2, %xmm0			; SANDY-NEXT: vsubss %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; SANDY-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: f32_one_step_2:			; HASWELL-LABEL: f32_one_step_2:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; HASWELL-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: f32_one_step_2:			; HASWELL-NO-FMA-LABEL: f32_one_step_2:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm2, %xmm0			; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; AVX512-LABEL: f32_one_step_2:			; AVX512-LABEL: f32_one_step_2:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1			; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1
	; AVX512-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; AVX512-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; AVX512-NEXT: retq			; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast float 3456.0, %x			%div = fdiv fast float 3456.0, %x
	ret float %div			ret float %div
	}			}

	define float @f32_one_step_2_divs(float %x) #1 {			define float @f32_one_step_2_divs(float %x) #1 {
	; SSE-LABEL: f32_one_step_2_divs:			; SSE-LABEL: f32_one_step_2_divs:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpss %xmm0, %xmm1			; SSE-NEXT: rcpss %xmm0, %xmm1
	; FMA-RECIP-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1			; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1
	; FMA-RECIP-NEXT: vmulss %xmm0, %xmm1, %xmm0			; FMA-RECIP-NEXT: vmulss %xmm0, %xmm1, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: f32_one_step_2_divs:			; BTVER2-LABEL: f32_one_step_2_divs:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; BTVER2-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [5:1.00]
	; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubss %xmm0, %xmm2, %xmm0			; BTVER2-NEXT: vsubss %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1			; BTVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1 # sched: [7:1.00]
	; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: f32_one_step_2_divs:			; SANDY-LABEL: f32_one_step_2_divs:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; SANDY-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; SANDY-NEXT: vsubss %xmm0, %xmm2, %xmm0			; SANDY-NEXT: vsubss %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1			; SANDY-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:1.00]
	; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: f32_one_step_2_divs:			; HASWELL-LABEL: f32_one_step_2_divs:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; HASWELL-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1			; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:0.50]
	; HASWELL-NEXT: vmulss %xmm0, %xmm1, %xmm0			; HASWELL-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: f32_one_step_2_divs:			; HASWELL-NO-FMA-LABEL: f32_one_step_2_divs:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm2, %xmm0			; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:0.50]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; AVX512-LABEL: f32_one_step_2_divs:			; AVX512-LABEL: f32_one_step_2_divs:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1			; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1
	; AVX512-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0			; AVX512-NEXT: vfnmadd213ss {{.*}}(%rip), %xmm1, %xmm0
	; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0			; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm0
	; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1			; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:0.50]
	; AVX512-NEXT: vmulss %xmm0, %xmm1, %xmm0			; AVX512-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; AVX512-NEXT: retq			; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast float 3456.0, %x			%div = fdiv fast float 3456.0, %x
	%div2 = fdiv fast float %div, %x			%div2 = fdiv fast float %div, %x
	ret float %div2			ret float %div2
	}			}

	define float @f32_two_step_2(float %x) #2 {			define float @f32_two_step_2(float %x) #2 {
	; SSE-LABEL: f32_two_step_2:			; SSE-LABEL: f32_two_step_2:
	; SSE: # BB#0:			; SSE: # BB#0:
	; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3			; FMA-RECIP-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3
	; FMA-RECIP-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; FMA-RECIP-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: f32_two_step_2:			; BTVER2-LABEL: f32_two_step_2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero			; BTVER2-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero sched: [5:1.00]
	; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; BTVER2-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm2			; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vsubss %xmm2, %xmm3, %xmm2			; BTVER2-NEXT: vsubss %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm2, %xmm1, %xmm2			; BTVER2-NEXT: vmulss %xmm2, %xmm1, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vaddss %xmm2, %xmm1, %xmm1			; BTVER2-NEXT: vaddss %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubss %xmm0, %xmm3, %xmm0			; BTVER2-NEXT: vsubss %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; BTVER2-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: f32_two_step_2:			; SANDY-LABEL: f32_two_step_2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; SANDY-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm2			; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm2 # sched: [5:1.00]
	; SANDY-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero			; SANDY-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero sched: [4:0.50]
	; SANDY-NEXT: vsubss %xmm2, %xmm3, %xmm2			; SANDY-NEXT: vsubss %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm2, %xmm1, %xmm2			; SANDY-NEXT: vmulss %xmm2, %xmm1, %xmm2 # sched: [5:1.00]
	; SANDY-NEXT: vaddss %xmm2, %xmm1, %xmm1			; SANDY-NEXT: vaddss %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vsubss %xmm0, %xmm3, %xmm0			; SANDY-NEXT: vsubss %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; SANDY-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: f32_two_step_2:			; HASWELL-LABEL: f32_two_step_2:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; HASWELL-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; HASWELL-NEXT: vmovaps %xmm1, %xmm3			; HASWELL-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; HASWELL-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3			; HASWELL-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3
	; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3			; HASWELL-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3
	; HASWELL-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0			; HASWELL-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0
	; HASWELL-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0			; HASWELL-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0
	; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: f32_two_step_2:			; HASWELL-NO-FMA-LABEL: f32_two_step_2:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpss %xmm0, %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm2			; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm2 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero			; HASWELL-NO-FMA-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero sched: [4:0.50]
	; HASWELL-NO-FMA-NEXT: vsubss %xmm2, %xmm3, %xmm2			; HASWELL-NO-FMA-NEXT: vsubss %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm2, %xmm1, %xmm2			; HASWELL-NO-FMA-NEXT: vmulss %xmm2, %xmm1, %xmm2 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vaddss %xmm2, %xmm1, %xmm1			; HASWELL-NO-FMA-NEXT: vaddss %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm3, %xmm0			; HASWELL-NO-FMA-NEXT: vsubss %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddss %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; AVX512-LABEL: f32_two_step_2:			; AVX512-LABEL: f32_two_step_2:
	; AVX512: # BB#0:			; AVX512: # BB#0:
	; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1			; AVX512-NEXT: vrcp14ss %xmm0, %xmm0, %xmm1
	; AVX512-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero			; AVX512-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero sched: [4:0.50]
	; AVX512-NEXT: vmovaps %xmm1, %xmm3			; AVX512-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; AVX512-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3			; AVX512-NEXT: vfnmadd213ss %xmm2, %xmm0, %xmm3
	; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3			; AVX512-NEXT: vfmadd132ss %xmm1, %xmm1, %xmm3
	; AVX512-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0			; AVX512-NEXT: vfnmadd213ss %xmm2, %xmm3, %xmm0
	; AVX512-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0			; AVX512-NEXT: vfmadd132ss %xmm3, %xmm3, %xmm0
	; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: vmulss {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; AVX512-NEXT: retq			; AVX512-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast float 6789.0, %x			%div = fdiv fast float 6789.0, %x
	ret float %div			ret float %div
	}			}

	define <4 x float> @v4f32_one_step2(<4 x float> %x) #1 {			define <4 x float> @v4f32_one_step2(<4 x float> %x) #1 {
	; SSE-LABEL: v4f32_one_step2:			; SSE-LABEL: v4f32_one_step2:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm0, %xmm2			; SSE-NEXT: rcpps %xmm0, %xmm2
	[... 22 lines not shown ...]
	; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1			; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_one_step2:			; BTVER2-LABEL: v4f32_one_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0			; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v4f32_one_step2:			; SANDY-LABEL: v4f32_one_step2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %xmm0, %xmm1			; SANDY-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %xmm0, %xmm2, %xmm0			; SANDY-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; SANDY-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v4f32_one_step2:			; HASWELL-LABEL: v4f32_one_step2:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0			; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0
	; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; HASWELL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v4f32_one_step2:			; HASWELL-NO-FMA-LABEL: v4f32_one_step2:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm2, %xmm0			; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; KNL-LABEL: v4f32_one_step2:			; KNL-LABEL: v4f32_one_step2:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %xmm0, %xmm1			; KNL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; KNL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0			; KNL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0
	; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; KNL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; KNL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v4f32_one_step2:			; SKX-LABEL: v4f32_one_step2:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %xmm0, %xmm1			; SKX-NEXT: vrcp14ps %xmm0, %xmm1
	; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to4}, %xmm1, %xmm0			; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to4}, %xmm1, %xmm0
	; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; SKX-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; SKX-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x			%div = fdiv fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x
	ret <4 x float> %div			ret <4 x float> %div
	}			}

	define <4 x float> @v4f32_one_step_2_divs(<4 x float> %x) #1 {			define <4 x float> @v4f32_one_step_2_divs(<4 x float> %x) #1 {
	; SSE-LABEL: v4f32_one_step_2_divs:			; SSE-LABEL: v4f32_one_step_2_divs:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm0, %xmm1			; SSE-NEXT: rcpps %xmm0, %xmm1
	[... 25 lines not shown ...]
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1
	; FMA-RECIP-NEXT: vmulps %xmm0, %xmm1, %xmm0			; FMA-RECIP-NEXT: vmulps %xmm0, %xmm1, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_one_step_2_divs:			; BTVER2-LABEL: v4f32_one_step_2_divs:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0			; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1			; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1 # sched: [7:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v4f32_one_step_2_divs:			; SANDY-LABEL: v4f32_one_step_2_divs:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %xmm0, %xmm1			; SANDY-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %xmm0, %xmm2, %xmm0			; SANDY-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1			; SANDY-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:1.00]
	; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v4f32_one_step_2_divs:			; HASWELL-LABEL: v4f32_one_step_2_divs:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0			; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0
	; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; HASWELL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1			; HASWELL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:0.50]
	; HASWELL-NEXT: vmulps %xmm0, %xmm1, %xmm0			; HASWELL-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v4f32_one_step_2_divs:			; HASWELL-NO-FMA-LABEL: v4f32_one_step_2_divs:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm2, %xmm0			; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:0.50]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; KNL-LABEL: v4f32_one_step_2_divs:			; KNL-LABEL: v4f32_one_step_2_divs:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %xmm0, %xmm1			; KNL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; KNL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0			; KNL-NEXT: vfnmadd213ps %xmm2, %xmm1, %xmm0
	; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; KNL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1			; KNL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:0.50]
	; KNL-NEXT: vmulps %xmm0, %xmm1, %xmm0			; KNL-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v4f32_one_step_2_divs:			; SKX-LABEL: v4f32_one_step_2_divs:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %xmm0, %xmm1			; SKX-NEXT: vrcp14ps %xmm0, %xmm1
	; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to4}, %xmm1, %xmm0			; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to4}, %xmm1, %xmm0
	; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; SKX-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1			; SKX-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1 # sched: [9:0.50]
	; SKX-NEXT: vmulps %xmm0, %xmm1, %xmm0			; SKX-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x			%div = fdiv fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x
	%div2 = fdiv fast <4 x float> %div, %x			%div2 = fdiv fast <4 x float> %div, %x
	ret <4 x float> %div2			ret <4 x float> %div2
	}			}

	define <4 x float> @v4f32_two_step2(<4 x float> %x) #2 {			define <4 x float> @v4f32_two_step2(<4 x float> %x) #2 {
	; SSE-LABEL: v4f32_two_step2:			; SSE-LABEL: v4f32_two_step2:
	; SSE: # BB#0:			; SSE: # BB#0:
	[... 37 lines not shown ...]
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_two_step2:			; BTVER2-LABEL: v4f32_two_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm2			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm2, %xmm3, %xmm2			; BTVER2-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm2, %xmm1, %xmm2			; BTVER2-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm2, %xmm1, %xmm1			; BTVER2-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm3, %xmm0			; BTVER2-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v4f32_two_step2:			; SANDY-LABEL: v4f32_two_step2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %xmm0, %xmm1			; SANDY-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm2			; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %xmm2, %xmm3, %xmm2			; SANDY-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm2, %xmm1, %xmm2			; SANDY-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %xmm2, %xmm1, %xmm1			; SANDY-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0			; SANDY-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vsubps %xmm0, %xmm3, %xmm0			; SANDY-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0			; SANDY-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; SANDY-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v4f32_two_step2:			; HASWELL-LABEL: v4f32_two_step2:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; HASWELL-NEXT: vmovaps %xmm1, %xmm3			; HASWELL-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3			; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3
	; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; HASWELL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; HASWELL-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; HASWELL-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; HASWELL-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; HASWELL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v4f32_two_step2:			; HASWELL-NO-FMA-LABEL: v4f32_two_step2:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1			; HASWELL-NO-FMA-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm2			; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm3			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %xmm3 # sched: [4:0.50]
	; HASWELL-NO-FMA-NEXT: vsubps %xmm2, %xmm3, %xmm2			; HASWELL-NO-FMA-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm2, %xmm1, %xmm2			; HASWELL-NO-FMA-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vaddps %xmm2, %xmm1, %xmm1			; HASWELL-NO-FMA-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm3, %xmm0			; HASWELL-NO-FMA-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [5:0.50]
	; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0			; HASWELL-NO-FMA-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; KNL-LABEL: v4f32_two_step2:			; KNL-LABEL: v4f32_two_step2:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %xmm0, %xmm1			; KNL-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; KNL-NEXT: vmovaps %xmm1, %xmm3			; KNL-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; KNL-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3			; KNL-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3
	; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; KNL-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; KNL-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; KNL-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; KNL-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; KNL-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; KNL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; KNL-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v4f32_two_step2:			; SKX-LABEL: v4f32_two_step2:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %xmm0, %xmm1			; SKX-NEXT: vrcp14ps %xmm0, %xmm1
	; SKX-NEXT: vbroadcastss {{.*}}(%rip), %xmm2			; SKX-NEXT: vbroadcastss {{.*}}(%rip), %xmm2 # sched: [4:0.50]
	; SKX-NEXT: vmovaps %xmm1, %xmm3			; SKX-NEXT: vmovaps %xmm1, %xmm3 # sched: [1:1.00]
	; SKX-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3			; SKX-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3
	; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; SKX-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; SKX-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; SKX-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; SKX-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; SKX-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; SKX-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; SKX-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [9:0.50]
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x			%div = fdiv fast <4 x float> <float 1.0, float 2.0, float 3.0, float 4.0>, %x
	ret <4 x float> %div			ret <4 x float> %div
	}			}

	define <8 x float> @v8f32_one_step2(<8 x float> %x) #1 {			define <8 x float> @v8f32_one_step2(<8 x float> %x) #1 {
	; SSE-LABEL: v8f32_one_step2:			; SSE-LABEL: v8f32_one_step2:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm1, %xmm4			; SSE-NEXT: rcpps %xmm1, %xmm4
	[... 30 lines not shown ...]
	; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm1			; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm1
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %ymm1, %ymm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %ymm1, %ymm0
	; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_one_step2:			; BTVER2-LABEL: v8f32_one_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0			; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_one_step2:			; SANDY-LABEL: v8f32_one_step2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0			; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v8f32_one_step2:			; HASWELL-LABEL: v8f32_one_step2:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0			; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0
	; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; HASWELL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; HASWELL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v8f32_one_step2:			; HASWELL-NO-FMA-LABEL: v8f32_one_step2:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm2, %ymm0			; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; KNL-LABEL: v8f32_one_step2:			; KNL-LABEL: v8f32_one_step2:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %ymm0, %ymm1			; KNL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; KNL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0			; KNL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0
	; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; KNL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; KNL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v8f32_one_step2:			; SKX-LABEL: v8f32_one_step2:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %ymm0, %ymm1			; SKX-NEXT: vrcp14ps %ymm0, %ymm1
	; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to8}, %ymm1, %ymm0			; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to8}, %ymm1, %ymm0
	; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; SKX-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; SKX-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <8 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>, %x
	ret <8 x float> %div			ret <8 x float> %div
	}			}

	define <8 x float> @v8f32_one_step_2_divs(<8 x float> %x) #1 {			define <8 x float> @v8f32_one_step_2_divs(<8 x float> %x) #1 {
	; SSE-LABEL: v8f32_one_step_2_divs:			; SSE-LABEL: v8f32_one_step_2_divs:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm0, %xmm2			; SSE-NEXT: rcpps %xmm0, %xmm2
	[34 lines not shown]
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %ymm1, %ymm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %ymm1, %ymm0
	; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1
	; FMA-RECIP-NEXT: vmulps %ymm0, %ymm1, %ymm0			; FMA-RECIP-NEXT: vmulps %ymm0, %ymm1, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_one_step_2_divs:			; BTVER2-LABEL: v8f32_one_step_2_divs:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0			; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1			; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1 # sched: [7:1.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_one_step_2_divs:			; SANDY-LABEL: v8f32_one_step_2_divs:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0			; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1			; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1 # sched: [9:1.00]
	; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v8f32_one_step_2_divs:			; HASWELL-LABEL: v8f32_one_step_2_divs:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0			; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0
	; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; HASWELL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1			; HASWELL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1 # sched: [9:1.00]
	; HASWELL-NEXT: vmulps %ymm0, %ymm1, %ymm0			; HASWELL-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v8f32_one_step_2_divs:			; HASWELL-NO-FMA-LABEL: v8f32_one_step_2_divs:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm2, %ymm0			; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1			; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1 # sched: [9:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; KNL-LABEL: v8f32_one_step_2_divs:			; KNL-LABEL: v8f32_one_step_2_divs:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %ymm0, %ymm1			; KNL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; KNL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0			; KNL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0
	; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; KNL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1			; KNL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1 # sched: [9:1.00]
	; KNL-NEXT: vmulps %ymm0, %ymm1, %ymm0			; KNL-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v8f32_one_step_2_divs:			; SKX-LABEL: v8f32_one_step_2_divs:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %ymm0, %ymm1			; SKX-NEXT: vrcp14ps %ymm0, %ymm1
	; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to8}, %ymm1, %ymm0			; SKX-NEXT: vfnmadd213ps {{.*}}(%rip){1to8}, %ymm1, %ymm0
	; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; SKX-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1			; SKX-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1 # sched: [9:1.00]
	; SKX-NEXT: vmulps %ymm0, %ymm1, %ymm0			; SKX-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <8 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>, %x
	%div2 = fdiv fast <8 x float> %div, %x			%div2 = fdiv fast <8 x float> %div, %x
	ret <8 x float> %div2			ret <8 x float> %div2
	}			}
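
Note on the `# sched: [N:M]` suffixes added on the right-hand side of this diff: going by the revision title, the first field appears to be the latency and the second the throughput value reported by the scheduling model. That reading is an assumption; the test itself does not spell it out. A minimal Python sketch (`parse_sched` and `unannotated` are hypothetical helpers, not part of the patch) that extracts the pair from a check line and lists the instructions that received no annotation, such as the FMA lines above:

```python
import re

# Matches the "sched: [9:1.00]" suffix. Treating the two fields as
# (latency, throughput) is an assumption taken from the revision title.
SCHED_RE = re.compile(r"sched:\s*\[(\d+):(\d+(?:\.\d+)?)\]")

def parse_sched(line):
    """Return (latency, throughput) from a check line, or None if absent."""
    match = SCHED_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))

def unannotated(lines):
    """Return the -NEXT check lines that carry no sched annotation."""
    return [l for l in lines if "-NEXT:" in l and parse_sched(l) is None]

checks = [
    "; HASWELL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]",
    "; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm1, %ymm0",
    "; HASWELL-NEXT: retq # sched: [1:1.00]",
]
print([parse_sched(l) for l in checks])
print(unannotated(checks))
```

Run over the full expected output, a helper like `unannotated` would make it easy to see which opcodes the model does not yet cover.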

	define <8 x float> @v8f32_two_step2(<8 x float> %x) #2 {			define <8 x float> @v8f32_two_step2(<8 x float> %x) #2 {
	; SSE-LABEL: v8f32_two_step2:			; SSE-LABEL: v8f32_two_step2:
	; SSE: # BB#0:			; SSE: # BB#0:
	[51 lines not shown]
	; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3			; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3
	; FMA-RECIP-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; FMA-RECIP-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; FMA-RECIP-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_two_step2:			; BTVER2-LABEL: v8f32_two_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm2			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %ymm2, %ymm3, %ymm2			; BTVER2-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm2, %ymm1, %ymm2			; BTVER2-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %ymm2, %ymm1, %ymm1			; BTVER2-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm3, %ymm0			; BTVER2-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_two_step2:			; SANDY-LABEL: v8f32_two_step2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm2			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; SANDY-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm2, %ymm3, %ymm2			; SANDY-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm2, %ymm1, %ymm2			; SANDY-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %ymm2, %ymm1, %ymm1			; SANDY-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vsubps %ymm0, %ymm3, %ymm0			; SANDY-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0			; SANDY-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v8f32_two_step2:			; HASWELL-LABEL: v8f32_two_step2:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; HASWELL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; HASWELL-NEXT: vmovaps %ymm1, %ymm3			; HASWELL-NEXT: vmovaps %ymm1, %ymm3 # sched: [1:1.00]
	; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3			; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3
	; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3			; HASWELL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3
	; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; HASWELL-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; HASWELL-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; HASWELL-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; HASWELL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; HASWELL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v8f32_two_step2:			; HASWELL-NO-FMA-LABEL: v8f32_two_step2:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1			; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm2			; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm3			; HASWELL-NO-FMA-NEXT: vbroadcastss {{.*}}(%rip), %ymm3 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vsubps %ymm2, %ymm3, %ymm2			; HASWELL-NO-FMA-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm2, %ymm1, %ymm2			; HASWELL-NO-FMA-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vaddps %ymm2, %ymm1, %ymm1			; HASWELL-NO-FMA-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm3, %ymm0			; HASWELL-NO-FMA-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [5:1.00]
	; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0			; HASWELL-NO-FMA-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; KNL-LABEL: v8f32_two_step2:			; KNL-LABEL: v8f32_two_step2:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %ymm0, %ymm1			; KNL-NEXT: vrcpps %ymm0, %ymm1 # sched: [7:2.00]
	; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; KNL-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; KNL-NEXT: vmovaps %ymm1, %ymm3			; KNL-NEXT: vmovaps %ymm1, %ymm3 # sched: [1:1.00]
	; KNL-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3			; KNL-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3
	; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3			; KNL-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3
	; KNL-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; KNL-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; KNL-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; KNL-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; KNL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; KNL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v8f32_two_step2:			; SKX-LABEL: v8f32_two_step2:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %ymm0, %ymm1			; SKX-NEXT: vrcp14ps %ymm0, %ymm1
	; SKX-NEXT: vbroadcastss {{.*}}(%rip), %ymm2			; SKX-NEXT: vbroadcastss {{.*}}(%rip), %ymm2 # sched: [5:1.00]
	; SKX-NEXT: vmovaps %ymm1, %ymm3			; SKX-NEXT: vmovaps %ymm1, %ymm3 # sched: [1:1.00]
	; SKX-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3			; SKX-NEXT: vfnmadd213ps %ymm2, %ymm0, %ymm3
	; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3			; SKX-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm3
	; SKX-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; SKX-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; SKX-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; SKX-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; SKX-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; SKX-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <8 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>, %x
	ret <8 x float> %div			ret <8 x float> %div
	}			}

	define <8 x float> @v8f32_no_step(<8 x float> %x) #3 {			define <8 x float> @v8f32_no_step(<8 x float> %x) #3 {
	; SSE-LABEL: v8f32_no_step:			; SSE-LABEL: v8f32_no_step:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm0, %xmm0			; SSE-NEXT: rcpps %xmm0, %xmm0
	; SSE-NEXT: rcpps %xmm1, %xmm1			; SSE-NEXT: rcpps %xmm1, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-RECIP-LABEL: v8f32_no_step:			; AVX-RECIP-LABEL: v8f32_no_step:
	; AVX-RECIP: # BB#0:			; AVX-RECIP: # BB#0:
	; AVX-RECIP-NEXT: vrcpps %ymm0, %ymm0			; AVX-RECIP-NEXT: vrcpps %ymm0, %ymm0
	; AVX-RECIP-NEXT: retq			; AVX-RECIP-NEXT: retq
	;			;
	; FMA-RECIP-LABEL: v8f32_no_step:			; FMA-RECIP-LABEL: v8f32_no_step:
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm0			; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_no_step:			; BTVER2-LABEL: v8f32_no_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vrcpps %ymm0, %ymm0			; BTVER2-NEXT: vrcpps %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_no_step:			; SANDY-LABEL: v8f32_no_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm0			; SANDY-NEXT: vrcpps %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v8f32_no_step:			; HASWELL-LABEL: v8f32_no_step:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %ymm0, %ymm0			; HASWELL-NEXT: vrcpps %ymm0, %ymm0 # sched: [7:2.00]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v8f32_no_step:			; HASWELL-NO-FMA-LABEL: v8f32_no_step:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm0 # sched: [7:2.00]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; KNL-LABEL: v8f32_no_step:			; KNL-LABEL: v8f32_no_step:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %ymm0, %ymm0			; KNL-NEXT: vrcpps %ymm0, %ymm0 # sched: [7:2.00]
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v8f32_no_step:			; SKX-LABEL: v8f32_no_step:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %ymm0, %ymm0			; SKX-NEXT: vrcp14ps %ymm0, %ymm0
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <8 x float> %div			ret <8 x float> %div
	}			}

	define <8 x float> @v8f32_no_step2(<8 x float> %x) #3 {			define <8 x float> @v8f32_no_step2(<8 x float> %x) #3 {
	; SSE-LABEL: v8f32_no_step2:			; SSE-LABEL: v8f32_no_step2:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: rcpps %xmm1, %xmm1			; SSE-NEXT: rcpps %xmm1, %xmm1
	[11 lines not shown]
	; FMA-RECIP-LABEL: v8f32_no_step2:			; FMA-RECIP-LABEL: v8f32_no_step2:
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm0			; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_no_step2:			; BTVER2-LABEL: v8f32_no_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vrcpps %ymm0, %ymm0			; BTVER2-NEXT: vrcpps %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_no_step2:			; SANDY-LABEL: v8f32_no_step2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm0			; SANDY-NEXT: vrcpps %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; SANDY-NEXT: retq			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: v8f32_no_step2:			; HASWELL-LABEL: v8f32_no_step2:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrcpps %ymm0, %ymm0			; HASWELL-NEXT: vrcpps %ymm0, %ymm0 # sched: [7:2.00]
	; HASWELL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; HASWELL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; HASWELL-NEXT: retq			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; HASWELL-NO-FMA-LABEL: v8f32_no_step2:			; HASWELL-NO-FMA-LABEL: v8f32_no_step2:
	; HASWELL-NO-FMA: # BB#0:			; HASWELL-NO-FMA: # BB#0:
	; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vrcpps %ymm0, %ymm0 # sched: [7:2.00]
	; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; HASWELL-NO-FMA-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; HASWELL-NO-FMA-NEXT: retq			; HASWELL-NO-FMA-NEXT: retq # sched: [1:1.00]
	;			;
	; KNL-LABEL: v8f32_no_step2:			; KNL-LABEL: v8f32_no_step2:
	; KNL: # BB#0:			; KNL: # BB#0:
	; KNL-NEXT: vrcpps %ymm0, %ymm0			; KNL-NEXT: vrcpps %ymm0, %ymm0 # sched: [7:2.00]
	; KNL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; KNL-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; KNL-NEXT: retq			; KNL-NEXT: retq # sched: [1:1.00]
	;			;
	; SKX-LABEL: v8f32_no_step2:			; SKX-LABEL: v8f32_no_step2:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vrcp14ps %ymm0, %ymm0			; SKX-NEXT: vrcp14ps %ymm0, %ymm0
	; SKX-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; SKX-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; SKX-NEXT: retq			; SKX-NEXT: retq # sched: [1:1.00]
	%div = fdiv fast <8 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>, %x
	ret <8 x float> %div			ret <8 x float> %div
	}			}

	attributes #0 = { "unsafe-fp-math"="true" "reciprocal-estimates"="!divf,!vec-divf" }			attributes #0 = { "unsafe-fp-math"="true" "reciprocal-estimates"="!divf,!vec-divf" }
	attributes #1 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf,vec-divf" }			attributes #1 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf,vec-divf" }
	attributes #2 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf:2,vec-divf:2" }			attributes #2 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf:2,vec-divf:2" }
	attributes #3 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf:0,vec-divf:0" }			attributes #3 = { "unsafe-fp-math"="true" "reciprocal-estimates"="divf:0,vec-divf:0" }
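
The attribute groups above vary the number of refinement steps requested for the reciprocal estimate (`divf:2` asks for two, `divf:0` for none). That is why the `two_step` checks repeat the multiply/subtract/multiply/add group twice, while the `no_step` checks contain only the estimate, plus a multiply by the numerator where it is not all ones. A rough Python sketch of that refinement, assuming the standard Newton-Raphson update for 1/x; `refine_reciprocal`, `fast_divide`, and the `estimate_error` parameter are hypothetical and only stand in for the hardware estimate instruction:

```python
def refine_reciprocal(x, estimate, steps):
    """Apply `steps` Newton-Raphson iterations to an estimate of 1/x."""
    for _ in range(steps):
        # Same shape as the non-FMA check sequence:
        #   vmulps (x * est), vsubps (1 - x*est), vmulps (est * ...), vaddps (est + ...)
        estimate = estimate + estimate * (1.0 - x * estimate)
    return estimate

def fast_divide(numerator, x, steps, estimate_error=1e-3):
    """Approximate numerator / x from a deliberately inexact reciprocal."""
    # Hypothetical stand-in for the hardware estimate: the exact
    # reciprocal perturbed by a fixed relative error.
    estimate = (1.0 / x) * (1.0 + estimate_error)
    # The final multiply corresponds to the trailing vmulps by the constant vector.
    return numerator * refine_reciprocal(x, estimate, steps)

for steps in (0, 1, 2):
    approx = fast_divide(2.0, 4.0, steps)
    print(steps, abs(approx - 0.5))
```

Each step roughly squares the relative error of the estimate, so the printed error shrinks quickly as `steps` grows.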

Revision Contents

Diff 94973

include/llvm/CodeGen/AsmPrinter.h

include/llvm/CodeGen/TargetSchedule.h

include/llvm/MC/MCObjectStreamer.h

include/llvm/MC/MCStreamer.h

include/llvm/MC/MCSubtargetInfo.h

include/llvm/Target/TargetSubtargetInfo.h

lib/CodeGen/AsmPrinter/AsmPrinter.cpp

lib/CodeGen/TargetSchedule.cpp

lib/CodeGen/TargetSubtargetInfo.cpp

lib/MC/MCAsmStreamer.cpp

lib/MC/MCObjectStreamer.cpp

lib/MC/MCStreamer.cpp

lib/Object/RecordStreamer.h

lib/Object/RecordStreamer.cpp

lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp

lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp

lib/Target/Hexagon/MCTargetDesc/HexagonMCELFStreamer.h

lib/Target/Hexagon/MCTargetDesc/HexagonMCELFStreamer.cpp

lib/Target/Mips/MCTargetDesc/MipsELFStreamer.h

lib/Target/Mips/MCTargetDesc/MipsELFStreamer.cpp

lib/Target/Mips/MCTargetDesc/MipsNaClELFStreamer.cpp

lib/Target/X86/InstPrinter/X86InstComments.cpp

lib/Target/X86/X86MCInstLower.cpp

lib/Target/X86/X86Subtarget.h

test/CodeGen/X86/recip-fastmath.ll

test/CodeGen/X86/recip-fastmath2.ll
