This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

include/llvm/CodeGen/MachineScheduler.h
lib/CodeGen/MachineScheduler.cpp
lib/Target/AMDGPU/SIMachineScheduler.cpp
lib/Target/SystemZ/SystemZMachineScheduler.h
lib/Target/SystemZ/SystemZMachineScheduler.cpp
lib/Target/SystemZ/SystemZTargetMachine.cpp
test/CodeGen/SystemZ/vec-cmp-cmp-logic-select.ll
test/CodeGen/SystemZ/vec-cmpsel.ll
test/CodeGen/SystemZ/vec-ctpop-01.ll

Differential D43329

[SystemZ, MachineScheduler] Refactor GenericScheduler::tryCandidate() to reuse parts in a new SystemZ scheduling strategy.
Needs Review · Public

Authored by jonpa on Feb 15 2018, 2:31 AM.


Details

Reviewers

uweigand

Summary

Short: Patch with a new SystemZ SchedStrategy for Pre-RA scheduling, with refactored GenericScheduler to reuse all of tryCandidate(). Advice needed on how to know that SU uses the vector unit.

In continuation of the discussion we had last November (http://lists.llvm.org/pipermail/llvm-dev/2017-November/119250.html), I am now much closer to actually committing something for the SystemZ backend. As then, I want to do an extra latency check on specific SUs in tryCandidate(). If it would be accepted - as Andy previously explained quite clearly it is not - this might have looked something like:

--- a/lib/CodeGen/MachineScheduler.cpp
+++ b/lib/CodeGen/MachineScheduler.cpp
@@ -2968,14 +2968,20 @@ void GenericScheduler::tryCandidate(SchedCandidate &Cand,
   // Avoid increasing the max pressure of the entire region.
   if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CurrentMax,
                                                Cand.RPDelta.CurrentMax,
                                                TryCand, Cand, RegMax, TRI,
                                                DAG->MF))
     return;

+  // Let target give priority to latency for certain instructions, e.g. those
+  // using a particular pipeline.
+  if (RegionPolicy.LatencyBoost && ST.tryAggrLatency(TryCand.SU) &&
+      tryLatency(TryCand, Cand, *Zone))
+    return;
+
   if (SameBoundary) {
     // Avoid critical resource consumption and balance the schedule.

This is quite simple, but another "knob" to turn, instead of a truly flexible tryCandidate() method. I agree with Andy that tuning the pre-RA scheduling heuristics is important enough to motivate a target to derive its own strategy, so that's what I did now instead by:

Refactor tryCandidate() into digestible parts so that a target that wants to override this method with minor modifications can do so easily. I tried to do a separation into the logical parts, but I am of course open to alternatives, including name changing of the new methods.

tryLatency() becomes a protected method of GenericSchedulerBase instead of a static function in MachineScheduler.cpp, so that the derived SystemZ strategy can reuse it. More static functions may follow this pattern when needed, such as tryLess(), tryGreater(), etc.

A new class SystemZPreRASchedStrategy. Its tryCandidate() method basically needs to check if SU uses the vector unit (Z13_VecUnit / Z14_VecUnit), and if so call tryLatency(). I am not sure what the best way to do this check would be, and the current way of using a string comparison on the MCProcResourceDesc::Name is of course temporary. It would have been nice to have enums for the execution units, but Tablegen does not print them. Another alternative might be to use a TSFlag bit for this in the instruction descriptor, but that would mean duplicating information unnecessarily. What is the recommended way of identifying a specific processor resource? I don't see this being done anywhere. Is there another way, such as giving the VecUnit some type of flag or value?
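
As a rough illustration of the temporary name-based check described in the last item above, the sketch below scans the processor resources an instruction uses and compares their names against the vector unit names. `ProcResourceDesc` and `usesVectorUnitByName` are hypothetical stand-ins and not the LLVM API; only the resource names Z13_VecUnit and Z14_VecUnit come from the description.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for MCProcResourceDesc; only the Name field is modeled.
struct ProcResourceDesc {
  const char *Name;
};

// Returns true if any resource index in UsedIdxs names a vector unit.
// This is the slow, name-based variant: string comparisons per resource.
bool usesVectorUnitByName(const std::vector<ProcResourceDesc> &Table,
                          const std::vector<unsigned> &UsedIdxs) {
  for (unsigned Idx : UsedIdxs) {
    assert(Idx < Table.size() && "resource index out of range");
    const char *Name = Table[Idx].Name;
    if (std::strcmp(Name, "Z13_VecUnit") == 0 ||
        std::strcmp(Name, "Z14_VecUnit") == 0)
      return true;
  }
  return false;
}
```

The cost of this variant is the string comparison on every query, which is why the summary calls it temporary.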

Coming back to the original mail discussion, this will perhaps be somewhat awkward to maintain over time - in order to keep up with the developments in the base class, one has to 1) diff both tryCandidate(), and also 2) check what calls to DAG->addMutation() are done in the generic createGenericSchedLive().

As before, there are some nice improvements on benchmarks with this :-)

SystemZ tests updated.

Diff Detail

Event Timeline

jonpa created this revision. Feb 15 2018, 2:31 AM

Herald added subscribers: javed.absar, MatzeB. Feb 15 2018, 2:31 AM

> Refactor tryCandidate() into digestible parts so that a target that wants to override this method with minor modifications can do so easily. I tried to do a separation into the logical parts, but I am of course open to alternatives, including name changing of the new methods.

Thanks for picking this up! This is along the lines of what I outlined in the email thread and what I was planning to do, but did not find the time so far.

I left some inline comments and I think it would be best if you could split this up into a MachineScheduler patch and a SystemZ patch. One advantage of having target specific schedulers IMO should be that target maintainers can more easily accept changes without worrying about other targets.

> Coming back to the original mail discussion, this will perhaps be somewhat awkward to maintain over time - in order to keep up with the developments in the base class, one has to 1) diff both tryCandidate(), and also 2) check what calls to DAG->addMutation() are done in the generic createGenericSchedLive().

I do not think this will be a huge issue, as there are not many changes to the generic heuristics (and with more target-specific schedulers, I expect even less need to tweak the generic heuristics). As for DAG mutations, at least the AArch64 backend already manages that itself and I do not think this has been a problem so far.

include/llvm/CodeGen/MachineScheduler.h
894–895	I am not sure what the benefit of that is and at the moment it seems like one setting too much to me. I think it is not unreasonable to expect people to provide their own tryCandidate implementation if they want custom latency heuristics.
988–989	Usually camel case is used for function names, so I think it would be better to have tryCandidateRegPressure - same for other methods below.
1000	I think for now it seems like making tryCandidate a virtual function gives people enough freedom to ship their own heuristics while re-using most of the existing bits. I don't see the benefit of making the helper functions class members however. Also, tryCandidate should not modify the scheduler state, so I think it should be `const`. And as this is part of the interface now and we expect people to extend it, it would be good to document it.
lib/CodeGen/MachineScheduler.cpp
2599	Splitting the code into multiple helper functions would be a great opportunity to document the heuristics :)

jonpa added inline comments. Feb 17 2018, 9:30 AM

include/llvm/CodeGen/MachineScheduler.h
894–895	Well... why have duplicated code in a Target backend, if it can be helped? To me, it is generally reasonable to assume that a target starts out with enabling the generic mischeduler, and then as a next step experiment with it by adding / removing / reordering heuristics. What harm is it then to provide the means to do so with protected member methods?
1000	How are those bits going to be reused if not by inheritance? How do you picture SystemZ adding this simple heuristic (see top of page) while reusing all else?

fhahn added inline comments. Feb 19 2018, 5:58 AM

include/llvm/CodeGen/MachineScheduler.h
894–895	IMO with too much subclassing and overriding, it can be quite complicated to see what's going on - but I think that comes down to personal preference, mostly. And that's just me, I suppose with those kinds of things @MatzeB and @atrick 's thoughts are much more important ;) I also think it would be nice to have all heuristic functions somewhere together, to make it easier to keep track of them.
1000	"How do you picture SystemZ adding this simple heuristic (see top of page) while reusing all else?" They do not have to be member functions, they could be regular exported functions. They don't access the scheduler object (I think), so I don't think they would need to be member functions. To me, it seems simpler not to couple the scheduler and the heuristics too much, but that's again mostly personal preference.

jonpa added inline comments. Feb 19 2018, 8:13 AM

include/llvm/CodeGen/MachineScheduler.h
894–895	I tried adding const to all the new methods, which seemed to increase readability at least to me. tryLatency() is used both pre- and post-RA, so it needs to be in the base. I guess tryGreater() and other such small methods should also all go in the base class. Personally, I think it makes sense to have these type of functions as part of the class, since after all that is a scheduling strategy class for which these methods actually are very central. But like you said as well, that's just me...
1000	I tried removing the 'static' from such a function in MachineScheduler.cpp, and then adding an 'extern' declaration in SystemZMachineScheduler.cpp, but that led to a lot of linker errors. How would you do this?

fhahn added inline comments. Feb 19 2018, 8:55 AM

include/llvm/CodeGen/MachineScheduler.h
1000	ah sorry. I meant removing static and moving the declaration to the header file.

General thoughts:

Factoring code textually across components is a non-goal in itself. The resulting abstraction is often less clear and harder to maintain. It's always tempting to factor code that appears repetitious but isn't actually a serious maintainability problem. Code factoring should be driven by logical boundaries, not textual repetition.

Inheritance as a code reuse strategy usually results in less clarity and less maintainability. I prefer object composition or free standing functions and only use inheritance when polymorphism is required.

(I know that's not very helpful advice without examples, so just take it FWIW.)

MachineScheduler thoughts:

Creating small utility functions that can be easily combined into a useful scheduling strategy for targets is great.

Breaking up the top-level tryCandidate and moving the boilerplate and pesky details into smaller helpers looks good.

So, on these points, the patch is a positive contribution.

An important design principle is that when GenericScheduler's implementation changes it should *not* affect targets that have already been tuned and are overriding the scheduling strategy.

See the problem that inheritance creates? As a code reuse strategy, it violates the decoupling between Target and CodeGen.

In particular, arbitrarily grouping the heuristics into RegPressure vs. RegPressure2 and Latency vs. Latency2 is unhelpful. Each heuristic entry point that you expose to the Target should have clear semantics that aren't likely to change as GenericScheduler evolves. The contract for each entry point should be clear.

So, for example, tryCandidateClusteredWeak makes sense. It isn't going to be reimplemented as something else. You can't do this for "Latency" and "RegPressure" because that could mean a number of different things. I think the challenge here is to come up with a name for each heuristic that makes sense to expose outside of the GenericScheduler implementation.

You do not *need* to expose all heuristics used by the GenericScheduler directly to targets. It's not hard for targets to copy the few lines of code for a heuristic. Copying the code doesn't create a maintenance problem, it solves one. Just expose the heuristics that are clear and easy to describe.

> Creating small utility functions that can be easily combined into a useful scheduling strategy for targets is great.

I followed Florian's and your suggestion and simply removed the static keyword and put the declaration in the header file. I then realized I had to pass some member objects as arguments instead when needed. For instance, tryCandidate_RegPress() calls DAG->isTrackingPressure(), which is a ScheduleDAGMILive method. Since the SchedBoundary *Zone argument only has a ScheduleDAGMI *DAG member, it cannot be used directly. Also, tryCandidate_Latency() needs the Rem object passed.

Regardless of which utility functions we end up with, I suppose this is acceptable and still preferred to inheritance?

> An important design principle is that when GenericScheduler's implementation changes it should *not* affect targets that have already been tuned and are overriding the scheduling strategy.

Just to express my thoughts: I see this point in the sense that if a target truly had a perfect set-in-stone tuning, it would be disastrous to change anything in an uncontrollable way. But given that mischeduler is relatively new and evolving, and a target may merely have been able to improve benchmarks with a minor modification, I think it's more natural to think that a target would really want to be in on the improvements to come in the future. In other words, *not* to decouple. Take register pressure, for instance. I don't think a target that does not have its own register pressure heuristics would want to fall behind if the common code changes in the future. That change should then be general goodness, or it should be put in a target specific strategy, right?

So to me personally, working on just one backend, it would be slightly preferred to have e.g. the tryCandidate_RegPress() function in the target strategy, so that if somebody improved it, my target would immediately get that improvement. I don't want to decouple this, since I was merely adding a heuristic with lesser priority.

On the other hand, providing just the smaller utility functions and then doing a copy-and-paste of tryCandidate() should probably work quite well in practice, as you say. In the very long run, this would also give maximum flexibility for each target.
I suppose then we could just have one or two of those new methods, like tryCandidate_Clustered_Weak().

BTW, I am still open to suggestions on the SystemZ specific issue of answering the question "does this SU use MCProcResource X?" -- see top of page.

> Just to express my thoughts: I see this point in the sense that if a target truly had a perfect set-in-stone tuning, it would be disastrous to change anything in an uncontrollable way. But given that mischeduler is relatively new and evolving, and a target may merely have been able to improve benchmarks with a minor modification, I think it's more natural to think that a target would really want to be in on the improvements to come in the future. In other words, *not* to decouple. Take register pressure, for instance. I don't think a target that does not have its own register pressure heuristics would want to fall behind if the common code changes in the future. That change should then be general goodness, or it should be put in a target specific strategy, right?

> So to me personally, working on just one backend, it would be slightly preferred to have e.g. the tryCandidate_RegPress() function in the target strategy, so that if somebody improved it, my target would immediately get that improvement. I don't want to decouple this, since I was merely adding a heuristic with lesser priority.

> On the other hand, providing just the smaller utility functions and then doing a copy-and-paste of tryCandidate() should probably work quite well in practice, as you say. In the very long run, this would also give maximum flexibility for each target.
> I suppose then we could just have one or two of those new methods, like tryCandidate_Clustered_Weak().

OK. I think it's very hard to group heuristics in a meaningful way. Some repetition of code on the target side will make it easier to maintain. Backing up a bit, my broader concern is that when Targets become too dependent on the incidental behavior of machine independent code, it really inhibits changes to the machine independent code.

> BTW, I am still open to suggestions on the SystemZ specific issue of answering the question "does this SU use MCProcResource X?" -- see top of page.

I haven't worked with the code in years, but here's my take. In your subtarget you can define your own symbolic constant, like SystemZVectorUnitIdx = 6. You can assert that 0 == strcmp(getProcResource(SystemZVectorUnitIdx)->Name, "Z14_VecUnit"). In the rare event that you add resources to this subtarget, the assert will force you to update this constant.

You can define a method on your Subtarget roughly like this:

auto *resource = getWriteProcResBegin(SC);
if (resource->ProcResourceIdx == SystemZVectorUnitIdx)

Hope that works.

-Andy
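
The suggestion above could be sketched roughly as follows, assuming simplified stand-in types rather than the real LLVM classes. `ProcResourceDesc`, `WriteProcResEntry`, `SubtargetModel`, and the index value 2 are all hypothetical and chosen only for illustration. Unlike the two-line snippet, this sketch walks every resource entry of the scheduling class instead of only the first one, which is an assumption about what a target would want.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for MCProcResourceDesc and MCWriteProcResEntry.
struct ProcResourceDesc {
  const char *Name;
};
struct WriteProcResEntry {
  unsigned ProcResourceIdx;
  unsigned Cycles;
};

// Assumed position of the vector unit in the resource table (illustrative).
constexpr unsigned VectorUnitIdx = 2;

struct SubtargetModel {
  std::vector<ProcResourceDesc> ProcResources;

  // Guard: false if the table was changed without updating VectorUnitIdx.
  bool vectorUnitIdxIsValid() const {
    return VectorUnitIdx < ProcResources.size() &&
           std::strcmp(ProcResources[VectorUnitIdx].Name, "Z14_VecUnit") == 0;
  }

  // True if any resource written by the scheduling class is the vector unit.
  bool usesVectorUnit(const std::vector<WriteProcResEntry> &Writes) const {
    assert(vectorUnitIdxIsValid() && "VectorUnitIdx is out of date");
    for (const WriteProcResEntry &W : Writes)
      if (W.ProcResourceIdx == VectorUnitIdx)
        return true;
    return false;
  }
};
```

The per-query work is now an integer comparison, and the name check only runs inside the assert.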

Thanks for review and advice, patch updated.

> OK. I think it's very hard to group heuristics in a meaningful way. Some repetition of code on the target side will make it easier to maintain. Backing up a bit, my broader concern is that when Targets become too dependent on the incidental behavior of machine independent code, it really inhibits changes to the machine independent code.

I have updated the patch per your guidelines where the target copies tryCandidate() while reusing only small utility functions like tryLess() etc, which are now declared in MachineScheduler.h.
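
A minimal sketch of that arrangement, assuming heavily simplified stand-ins for the scheduler types: the target owns its candidate comparison and inserts its own step, while tryLess() is the only shared piece. The struct fields, the reduced `CandReason` enum, and `targetTryCandidate` are hypothetical; the body of tryLess() follows the shape shown in the diff for MachineScheduler.cpp.

```cpp
#include <cassert>

// Simplified, hypothetical stand-ins for the scheduler types.
enum CandReason { NoCand, RegMax, TopDepthReduce, NodeOrder };

struct SchedCandidate {
  CandReason Reason = NoCand;
  int MaxPressureDelta = 0;    // stand-in for RPDelta.CurrentMax
  int Depth = 0;               // stand-in for SU->getDepth()
  unsigned NodeNum = 0;        // stand-in for SU->NodeNum
  bool UsesVectorUnit = false; // result of the target's resource check
};

// Shared utility: returns true if this comparison decides the order.
bool tryLess(int TryVal, int CandVal, SchedCandidate &TryCand,
             SchedCandidate &Cand, CandReason Reason) {
  if (TryVal < CandVal) {
    TryCand.Reason = Reason; // TryCand wins for this reason.
    return true;
  }
  if (TryVal > CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason; // Cand stays; record the stronger reason.
    return true;
  }
  return false; // Tie: let the next heuristic decide.
}

// Target-owned comparison. In this sketch, TryCand.Reason != NoCand on return
// means TryCand should replace Cand.
void targetTryCandidate(SchedCandidate &Cand, SchedCandidate &TryCand) {
  assert(TryCand.Reason == NoCand && "TryCand must start without a reason");

  // Step copied from the generic strategy: avoid raising max pressure.
  if (tryLess(TryCand.MaxPressureDelta, Cand.MaxPressureDelta, TryCand, Cand,
              RegMax))
    return;

  // Target-specific step: a latency-style check for vector-unit users only.
  if (TryCand.UsesVectorUnit &&
      tryLess(TryCand.Depth, Cand.Depth, TryCand, Cand, TopDepthReduce))
    return;

  // Fall back to the original instruction order.
  if (TryCand.NodeNum < Cand.NodeNum)
    TryCand.Reason = NodeOrder;
}
```

Because the ordering of steps lives in the target's own function, a later change to the generic ordering would not silently alter the target's behavior, which is the decoupling argued for earlier in the thread.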

> ... In your subtarget you can define your own symbolic constant...

OK, I tried that, which is at least a slight improvement over looking it up every time.

I had to wrap the AMDGPU tryLess() and tryGreater() methods in a local namespace to avoid conflict.
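
To illustrate the kind of name clash this avoids, here is a small sketch with simplified signatures. The nested namespace name `SISched` and the function `pickLocal` are hypothetical, and the helpers take only two integers so the example stays self-contained. Inside the nested namespace, unqualified lookup finds the local tryLess() first, so the shared one in the enclosing namespace is hidden.

```cpp
namespace llvm {

// Stand-in for the shared helper that is now visible through a common header.
constexpr bool tryLess(int TryVal, int CandVal) { return TryVal < CandVal; }

namespace SISched { // hypothetical namespace name

// Target-local helper with the same name; here it also treats ties as a win.
constexpr bool tryLess(int TryVal, int CandVal) { return TryVal <= CandVal; }

// Unqualified lookup finds SISched::tryLess first, which hides llvm::tryLess.
constexpr bool pickLocal(int TryVal, int CandVal) {
  return tryLess(TryVal, CandVal);
}

} // end namespace SISched

} // end namespace llvm

// The two helpers disagree on ties, which shows which one was called.
static_assert(!llvm::tryLess(2, 2), "shared helper rejects ties");
static_assert(llvm::SISched::pickLocal(2, 2), "local helper is used");
```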

Do you still want me to split this into a separate patch without the SystemZ part? (You don't have to approve that)

Herald added subscribers: nhaehnle, arsenm. Feb 26 2018, 7:33 AM

This looks fine to me based on a quick review. I don't know if @fhahn or @MatzeB still want to weigh in. Not sure if anyone else needs to review the SystemZ specific code or if you effectively own that.

This revision is now accepted and ready to land. Feb 26 2018, 10:54 AM

Great, thanks Jonas! LGTM too.

lib/Target/AMDGPU/SIMachineScheduler.cpp
157	nit: not sure, should nested namespaces be separated by newlines?

Thanks for review!

> Not sure if anyone else needs to review the SystemZ specific code or if you effectively own that.

Uli is the reviewer for the SystemZ part, as usual.

NFC update to put nested namespaces on separate lines.

lib/Target/AMDGPU/SIMachineScheduler.cpp
157	I copied this from the PowerPC backend, but looking at the "Namespace Indentation" section of the coding guidelines, it seems that at least in the example there newlines are used, so it seems you are right.

Hi Jonas,

are you planning on committing this patch in the near future? I suppose you are waiting until @uweigand had a look at the SystemZ specific changes?

Recently, a few people have been looking into tweaking scheduler heuristics, and after this change goes in we could easily add an experimental scheduler for AArch64 to experiment with tweaking heuristics.

Thanks again for putting up the patch!

Florian

In D43329#1064275, @fhahn wrote:

> Hi Jonas,
>
> are you planning on committing this patch in the near future? I suppose you are waiting until @uweigand had a look at the SystemZ specific changes?
>
> Recently, a few people have been looking into tweaking scheduler heuristics, and after this change goes in we could easily add an experimental scheduler for AArch64 to experiment with tweaking heuristics.
>
> Thanks again for putting up the patch!
>
> Florian

I committed the common code parts as r329884. We are holding off on the SystemZ part for now, since on second thought we should probably enable bi-directional scheduling before enabling tryLatency() in a specialized case (without bi-directional scheduling, the second tryLatency() call is never made).

Thanks for review.

Since the common code changes have now been committed, this is an update to only keep the SystemZ specific parts.

This revision now requires review to proceed. Apr 12 2018, 1:27 AM

nhaehnle removed a subscriber: nhaehnle. Apr 12 2018, 8:32 AM

Revision Contents

Path                                               Size
include/llvm/CodeGen/MachineScheduler.h            27 lines
lib/CodeGen/MachineScheduler.cpp                   46 lines
lib/Target/AMDGPU/SIMachineScheduler.cpp           58 lines
lib/Target/SystemZ/SystemZMachineScheduler.h       15 lines
lib/Target/SystemZ/SystemZMachineScheduler.cpp     128 lines
lib/Target/SystemZ/SystemZTargetMachine.cpp        11 lines
test/CodeGen/SystemZ/vec-cmp-cmp-logic-select.ll   300 lines
test/CodeGen/SystemZ/vec-cmpsel.ll                 76 lines
test/CodeGen/SystemZ/vec-ctpop-01.ll               8 lines

Diff 136042

include/llvm/CodeGen/MachineScheduler.h

[...] protected:
const TargetRegisterInfo *TRI = nullptr;		const TargetRegisterInfo *TRI = nullptr;

SchedRemainder Rem;		SchedRemainder Rem;

GenericSchedulerBase(const MachineSchedContext *C) : Context(C) {}		GenericSchedulerBase(const MachineSchedContext *C) : Context(C) {}

void setPolicy(CandPolicy &Policy, bool IsPostRA, SchedBoundary &CurrZone,		void setPolicy(CandPolicy &Policy, bool IsPostRA, SchedBoundary &CurrZone,
SchedBoundary *OtherZone);		SchedBoundary *OtherZone);

#ifndef NDEBUG		#ifndef NDEBUG
void traceCandidate(const SchedCandidate &Cand);		void traceCandidate(const SchedCandidate &Cand);
#endif		#endif
};		};

		// Utility functions used by heuristics in tryCand().
		bool tryLess(int TryVal, int CandVal,
		GenericSchedulerBase::SchedCandidate &TryCand,
		GenericSchedulerBase::SchedCandidate &Cand,
		GenericSchedulerBase::CandReason Reason);
		bool tryGreater(int TryVal, int CandVal,
		GenericSchedulerBase::SchedCandidate &TryCand,
		GenericSchedulerBase::SchedCandidate &Cand,
		GenericSchedulerBase::CandReason Reason);
		bool tryLatency(GenericSchedulerBase::SchedCandidate &TryCand,
		GenericSchedulerBase::SchedCandidate &Cand,
		SchedBoundary &Zone);
		bool tryPressure(const PressureChange &TryP,
		const PressureChange &CandP,
		GenericSchedulerBase::SchedCandidate &TryCand,
		GenericSchedulerBase::SchedCandidate &Cand,
		GenericSchedulerBase::CandReason Reason,
		const TargetRegisterInfo *TRI,
		const MachineFunction &MF);
		unsigned getWeakLeft(const SUnit *SU, bool isTop);
		int biasPhysRegCopy(const SUnit *SU, bool isTop);

/// GenericScheduler shrinks the unscheduled zone using heuristics to balance		/// GenericScheduler shrinks the unscheduled zone using heuristics to balance
/// the schedule.		/// the schedule.
class GenericScheduler : public GenericSchedulerBase {		class GenericScheduler : public GenericSchedulerBase {
public:		public:
GenericScheduler(const MachineSchedContext *C):		GenericScheduler(const MachineSchedContext *C):
GenericSchedulerBase(C), Top(SchedBoundary::TopQID, "TopQ"),		GenericSchedulerBase(C), Top(SchedBoundary::TopQID, "TopQ"),
Bot(SchedBoundary::BotQID, "BotQ") {}		Bot(SchedBoundary::BotQID, "BotQ") {}

[...] protected:
SchedCandidate BotCand;		SchedCandidate BotCand;

void checkAcyclicLatency();		void checkAcyclicLatency();

void initCandidate(SchedCandidate &Cand, SUnit *SU, bool AtTop,		void initCandidate(SchedCandidate &Cand, SUnit *SU, bool AtTop,
const RegPressureTracker &RPTracker,		const RegPressureTracker &RPTracker,
RegPressureTracker &TempTracker);		RegPressureTracker &TempTracker);

void tryCandidate(SchedCandidate &Cand,		virtual void tryCandidate(SchedCandidate &Cand, SchedCandidate &TryCand,
SchedCandidate &TryCand,		SchedBoundary *Zone) const;
SchedBoundary *Zone);

SUnit *pickNodeBidirectional(bool &IsTopNode);		SUnit *pickNodeBidirectional(bool &IsTopNode);

void pickNodeFromQueue(SchedBoundary &Zone,		void pickNodeFromQueue(SchedBoundary &Zone,
const CandPolicy &ZonePolicy,		const CandPolicy &ZonePolicy,
const RegPressureTracker &RPTracker,		const RegPressureTracker &RPTracker,
SchedCandidate &Candidate);		SchedCandidate &Candidate);

void reschedulePhysRegCopies(SUnit *SU, bool isTop);		void reschedulePhysRegCopies(SUnit *SU, bool isTop);
};		};

/// PostGenericScheduler - Interface to the scheduling algorithm used by		/// PostGenericScheduler - Interface to the scheduling algorithm used by
/// ScheduleDAGMI.		/// ScheduleDAGMI.
///		///
/// Callbacks from ScheduleDAGMI:		/// Callbacks from ScheduleDAGMI:
/// initPolicy -> initialize(DAG) -> registerRoots -> pickNode ...		/// initPolicy -> initialize(DAG) -> registerRoots -> pickNode ...
class PostGenericScheduler : public GenericSchedulerBase {		class PostGenericScheduler : public GenericSchedulerBase {
ScheduleDAGMI *DAG;		ScheduleDAGMI *DAG;
SchedBoundary Top;		SchedBoundary Top;
[...]

lib/CodeGen/MachineScheduler.cpp

[...] void GenericSchedulerBase::traceCandidate(const SchedCandidate &Cand) {
if (Latency)		if (Latency)
dbgs() << " " << Latency << " cycles ";		dbgs() << " " << Latency << " cycles ";
else		else
dbgs() << " ";		dbgs() << " ";
dbgs() << '\n';		dbgs() << '\n';
}		}
#endif		#endif

		namespace llvm {
/// Return true if this heuristic determines order.		/// Return true if this heuristic determines order.
static bool tryLess(int TryVal, int CandVal,		bool tryLess(int TryVal, int CandVal,
GenericSchedulerBase::SchedCandidate &TryCand,		GenericSchedulerBase::SchedCandidate &TryCand,
GenericSchedulerBase::SchedCandidate &Cand,		GenericSchedulerBase::SchedCandidate &Cand,
GenericSchedulerBase::CandReason Reason) {		GenericSchedulerBase::CandReason Reason) {
if (TryVal < CandVal) {		if (TryVal < CandVal) {
TryCand.Reason = Reason;		TryCand.Reason = Reason;
return true;		return true;
}		}
if (TryVal > CandVal) {		if (TryVal > CandVal) {
if (Cand.Reason > Reason)		if (Cand.Reason > Reason)
Cand.Reason = Reason;		Cand.Reason = Reason;
return true;		return true;
}		}
return false;		return false;
}		}

static bool tryGreater(int TryVal, int CandVal,		bool tryGreater(int TryVal, int CandVal,
GenericSchedulerBase::SchedCandidate &TryCand,		GenericSchedulerBase::SchedCandidate &TryCand,
GenericSchedulerBase::SchedCandidate &Cand,		GenericSchedulerBase::SchedCandidate &Cand,
GenericSchedulerBase::CandReason Reason) {		GenericSchedulerBase::CandReason Reason) {
if (TryVal > CandVal) {		if (TryVal > CandVal) {
TryCand.Reason = Reason;		TryCand.Reason = Reason;
return true;		return true;
}		}
if (TryVal < CandVal) {		if (TryVal < CandVal) {
if (Cand.Reason > Reason)		if (Cand.Reason > Reason)
Cand.Reason = Reason;		Cand.Reason = Reason;
return true;		return true;
}		}
return false;		return false;
}		}

static bool tryLatency(GenericSchedulerBase::SchedCandidate &TryCand,		bool tryLatency(GenericSchedulerBase::SchedCandidate &TryCand,
GenericSchedulerBase::SchedCandidate &Cand,		GenericSchedulerBase::SchedCandidate &Cand,
SchedBoundary &Zone) {		SchedBoundary &Zone) {
if (Zone.isTop()) {		if (Zone.isTop()) {
if (Cand.SU->getDepth() > Zone.getScheduledLatency()) {		if (Cand.SU->getDepth() > Zone.getScheduledLatency()) {
if (tryLess(TryCand.SU->getDepth(), Cand.SU->getDepth(),		if (tryLess(TryCand.SU->getDepth(), Cand.SU->getDepth(),
TryCand, Cand, GenericSchedulerBase::TopDepthReduce))		TryCand, Cand, GenericSchedulerBase::TopDepthReduce))
return true;		return true;
}		}
if (tryGreater(TryCand.SU->getHeight(), Cand.SU->getHeight(),		if (tryGreater(TryCand.SU->getHeight(), Cand.SU->getHeight(),
TryCand, Cand, GenericSchedulerBase::TopPathReduce))		TryCand, Cand, GenericSchedulerBase::TopPathReduce))
return true;		return true;
} else {		} else {
if (Cand.SU->getHeight() > Zone.getScheduledLatency()) {		if (Cand.SU->getHeight() > Zone.getScheduledLatency()) {
if (tryLess(TryCand.SU->getHeight(), Cand.SU->getHeight(),		if (tryLess(TryCand.SU->getHeight(), Cand.SU->getHeight(),
TryCand, Cand, GenericSchedulerBase::BotHeightReduce))		TryCand, Cand, GenericSchedulerBase::BotHeightReduce))
return true;		return true;
}		}
if (tryGreater(TryCand.SU->getDepth(), Cand.SU->getDepth(),		if (tryGreater(TryCand.SU->getDepth(), Cand.SU->getDepth(),
TryCand, Cand, GenericSchedulerBase::BotPathReduce))		TryCand, Cand, GenericSchedulerBase::BotPathReduce))
return true;		return true;
}		}
return false;		return false;
}		}
		} // end namespace llvm
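
As a step towards documenting the heuristic, the decision order in tryLatency() can be written out on plain data. This is a sketch under stated assumptions: `Node`, `Candidate` and `Zone` are hypothetical reductions of `SUnit`, `SchedCandidate` and `SchedBoundary`, and depth, height and scheduled latency are plain integers.

```cpp
#include <cassert>

// Reduced reason list, in the same relative order as used by tryLatency().
enum CandReason { NoCand, BotHeightReduce, BotPathReduce, TopDepthReduce,
                  TopPathReduce };

struct Node { int Depth; int Height; };          // stand-in for SUnit
struct Candidate {                               // stand-in for SchedCandidate
  const Node *SU = nullptr;
  CandReason Reason = NoCand;
};
struct Zone { bool IsTop; int ScheduledLatency; }; // stand-in for SchedBoundary

static bool tryGreater(int TryVal, int CandVal, Candidate &TryCand,
                       Candidate &Cand, CandReason Reason) {
  if (TryVal > CandVal) { TryCand.Reason = Reason; return true; }
  if (TryVal < CandVal) {
    if (Cand.Reason > Reason) Cand.Reason = Reason;
    return true;
  }
  return false;
}

static bool tryLess(int TryVal, int CandVal, Candidate &TryCand,
                    Candidate &Cand, CandReason Reason) {
  return tryGreater(CandVal, TryVal, TryCand, Cand, Reason);
}

bool tryLatency(Candidate &TryCand, Candidate &Cand, const Zone &Z) {
  assert(TryCand.SU && Cand.SU && "both candidates must reference a node");
  if (Z.IsTop) {
    // If the current best would start later than the latency already
    // scheduled in this zone, prefer the node with the smaller depth.
    if (Cand.SU->Depth > Z.ScheduledLatency &&
        tryLess(TryCand.SU->Depth, Cand.SU->Depth, TryCand, Cand,
                TopDepthReduce))
      return true;
    // Otherwise prefer the node with the longer remaining path below it.
    return tryGreater(TryCand.SU->Height, Cand.SU->Height, TryCand, Cand,
                      TopPathReduce);
  }
  // Bottom-up scheduling mirrors the above with height and depth swapped.
  if (Cand.SU->Height > Z.ScheduledLatency &&
      tryLess(TryCand.SU->Height, Cand.SU->Height, TryCand, Cand,
              BotHeightReduce))
    return true;
  return tryGreater(TryCand.SU->Depth, Cand.SU->Depth, TryCand, Cand,
                    BotPathReduce);
}
```

The first test in each direction only fires once the best candidate already exceeds the scheduled latency. Below that threshold the choice falls to the remaining critical path length alone.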

static void tracePick(GenericSchedulerBase::CandReason Reason, bool IsTop) {		static void tracePick(GenericSchedulerBase::CandReason Reason, bool IsTop) {
DEBUG(dbgs() << "Pick " << (IsTop ? "Top " : "Bot ")		DEBUG(dbgs() << "Pick " << (IsTop ? "Top " : "Bot ")
<< GenericSchedulerBase::getReasonStr(Reason) << '\n');		<< GenericSchedulerBase::getReasonStr(Reason) << '\n');
}		}

static void tracePick(const GenericSchedulerBase::SchedCandidate &Cand) {		static void tracePick(const GenericSchedulerBase::SchedCandidate &Cand) {
tracePick(Cand.Reason, Cand.AtTop);		tracePick(Cand.Reason, Cand.AtTop);
[... 138 lines not shown ...]	void GenericScheduler::registerRoots() {
}		}

if (EnableCyclicPath && SchedModel->getMicroOpBufferSize() > 0) {		if (EnableCyclicPath && SchedModel->getMicroOpBufferSize() > 0) {
Rem.CyclicCritPath = DAG->computeCyclicCriticalPath();		Rem.CyclicCritPath = DAG->computeCyclicCriticalPath();
checkAcyclicLatency();		checkAcyclicLatency();
}		}
}		}

static bool tryPressure(const PressureChange &TryP,		namespace llvm {
		bool tryPressure(const PressureChange &TryP,
const PressureChange &CandP,		const PressureChange &CandP,
GenericSchedulerBase::SchedCandidate &TryCand,		GenericSchedulerBase::SchedCandidate &TryCand,
GenericSchedulerBase::SchedCandidate &Cand,		GenericSchedulerBase::SchedCandidate &Cand,
GenericSchedulerBase::CandReason Reason,		GenericSchedulerBase::CandReason Reason,
const TargetRegisterInfo *TRI,		const TargetRegisterInfo *TRI,
const MachineFunction &MF) {		const MachineFunction &MF) {
// If one candidate decreases and the other increases, go with it.		// If one candidate decreases and the other increases, go with it.
// Invalid candidates have UnitInc==0.		// Invalid candidates have UnitInc==0.
if (tryGreater(TryP.getUnitInc() < 0, CandP.getUnitInc() < 0, TryCand, Cand,		if (tryGreater(TryP.getUnitInc() < 0, CandP.getUnitInc() < 0, TryCand, Cand,
Reason)) {		Reason)) {
return true;		return true;
}		}
// Do not compare the magnitude of pressure changes between top and bottom		// Do not compare the magnitude of pressure changes between top and bottom
// boundary.		// boundary.
[... 16 lines not shown ...]	int CandRank = CandP.isValid() ? TRI->getRegPressureSetScore(MF, CandPSet) :
std::numeric_limits<int>::max();		std::numeric_limits<int>::max();

// If the candidates are decreasing pressure, reverse priority.		// If the candidates are decreasing pressure, reverse priority.
if (TryP.getUnitInc() < 0)		if (TryP.getUnitInc() < 0)
std::swap(TryRank, CandRank);		std::swap(TryRank, CandRank);
return tryGreater(TryRank, CandRank, TryCand, Cand, Reason);		return tryGreater(TryRank, CandRank, TryCand, Cand, Reason);
}		}

static unsigned getWeakLeft(const SUnit *SU, bool isTop) {		unsigned getWeakLeft(const SUnit *SU, bool isTop) {
return (isTop) ? SU->WeakPredsLeft : SU->WeakSuccsLeft;		return (isTop) ? SU->WeakPredsLeft : SU->WeakSuccsLeft;
}		}

/// Minimize physical register live ranges. Regalloc wants them adjacent to		/// Minimize physical register live ranges. Regalloc wants them adjacent to
/// their physreg def/use.		/// their physreg def/use.
///		///
/// FIXME: This is an unnecessary check on the critical path. Most are root/leaf		/// FIXME: This is an unnecessary check on the critical path. Most are root/leaf
/// copies which can be prescheduled. The rest (e.g. x86 MUL) could be bundled		/// copies which can be prescheduled. The rest (e.g. x86 MUL) could be bundled
/// with the operation that produces or consumes the physreg. We'll do this when		/// with the operation that produces or consumes the physreg. We'll do this when
/// regalloc has support for parallel copies.		/// regalloc has support for parallel copies.
static int biasPhysRegCopy(const SUnit *SU, bool isTop) {		int biasPhysRegCopy(const SUnit *SU, bool isTop) {
const MachineInstr *MI = SU->getInstr();		const MachineInstr *MI = SU->getInstr();
if (!MI->isCopy())		if (!MI->isCopy())
return 0;		return 0;

unsigned ScheduledOper = isTop ? 1 : 0;		unsigned ScheduledOper = isTop ? 1 : 0;
unsigned UnscheduledOper = isTop ? 0 : 1;		unsigned UnscheduledOper = isTop ? 0 : 1;
// If we have already scheduled the physreg produce/consumer, immediately		// If we have already scheduled the physreg produce/consumer, immediately
// schedule the copy.		// schedule the copy.
if (TargetRegisterInfo::isPhysicalRegister(		if (TargetRegisterInfo::isPhysicalRegister(
MI->getOperand(ScheduledOper).getReg()))		MI->getOperand(ScheduledOper).getReg()))
return 1;		return 1;
// If the physreg is at the boundary, defer it. Otherwise schedule it		// If the physreg is at the boundary, defer it. Otherwise schedule it
// immediately to free the dependent. We can hoist the copy later.		// immediately to free the dependent. We can hoist the copy later.
bool AtBoundary = isTop ? !SU->NumSuccsLeft : !SU->NumPredsLeft;		bool AtBoundary = isTop ? !SU->NumSuccsLeft : !SU->NumPredsLeft;
if (TargetRegisterInfo::isPhysicalRegister(		if (TargetRegisterInfo::isPhysicalRegister(
MI->getOperand(UnscheduledOper).getReg()))		MI->getOperand(UnscheduledOper).getReg()))
return AtBoundary ? -1 : 1;		return AtBoundary ? -1 : 1;
return 0;		return 0;
}		}
		} // end namespace llvm
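
The three possible results of biasPhysRegCopy() are easier to see on a reduced model. The sketch below assumes a hypothetical `SimpleCopy` record in place of `SUnit` and `MachineInstr`. It relies on a COPY having its destination as operand 0 and its source as operand 1, which is how the operand indices in the function above are read here.

```cpp
#include <cassert>

// Hypothetical flattened view of an SUnit wrapping a possible COPY.
struct SimpleCopy {
  bool IsCopy;
  bool DstIsPhysReg;   // operand 0 of the COPY
  bool SrcIsPhysReg;   // operand 1 of the COPY
  unsigned NumPredsLeft;
  unsigned NumSuccsLeft;
};

// Returns +1 to schedule the copy now, -1 to defer it, 0 for no bias.
int biasPhysRegCopy(const SimpleCopy &SU, bool IsTop) {
  if (!SU.IsCopy)
    return 0;
  // Top-down, the source side has already been scheduled; bottom-up, the
  // destination side has.
  bool ScheduledIsPhys = IsTop ? SU.SrcIsPhysReg : SU.DstIsPhysReg;
  bool UnscheduledIsPhys = IsTop ? SU.DstIsPhysReg : SU.SrcIsPhysReg;
  if (ScheduledIsPhys)
    return 1;
  bool AtBoundary = IsTop ? SU.NumSuccsLeft == 0 : SU.NumPredsLeft == 0;
  if (UnscheduledIsPhys)
    return AtBoundary ? -1 : 1;
  return 0;
}

// Small usage check covering the three outcomes for top-down scheduling.
void checkBiasExamples() {
  SimpleCopy FromPhys{true, false, true, 0, 1};
  SimpleCopy ToPhysAtEnd{true, true, false, 0, 0};
  SimpleCopy VirtToVirt{true, false, false, 0, 1};
  assert(biasPhysRegCopy(FromPhys, /*IsTop=*/true) == 1);
  assert(biasPhysRegCopy(ToPhysAtEnd, /*IsTop=*/true) == -1);
  assert(biasPhysRegCopy(VirtToVirt, /*IsTop=*/true) == 0);
}
```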

void GenericScheduler::initCandidate(SchedCandidate &Cand, SUnit *SU,		void GenericScheduler::initCandidate(SchedCandidate &Cand, SUnit *SU,
bool AtTop,		bool AtTop,
const RegPressureTracker &RPTracker,		const RegPressureTracker &RPTracker,
RegPressureTracker &TempTracker) {		RegPressureTracker &TempTracker) {
Cand.SU = SU;		Cand.SU = SU;
Cand.AtTop = AtTop;		Cand.AtTop = AtTop;
if (DAG->isTrackingPressure()) {		if (DAG->isTrackingPressure()) {
[... 34 lines not shown ...]
/// statistically analyze.		/// statistically analyze.
///		///
/// \param Cand provides the policy and current best candidate.		/// \param Cand provides the policy and current best candidate.
/// \param TryCand refers to the next SUnit candidate, otherwise uninitialized.		/// \param TryCand refers to the next SUnit candidate, otherwise uninitialized.
/// \param Zone describes the scheduled zone that we are extending, or nullptr		/// \param Zone describes the scheduled zone that we are extending, or nullptr
// if Cand is from a different zone than TryCand.		// if Cand is from a different zone than TryCand.
void GenericScheduler::tryCandidate(SchedCandidate &Cand,		void GenericScheduler::tryCandidate(SchedCandidate &Cand,
SchedCandidate &TryCand,		SchedCandidate &TryCand,
SchedBoundary *Zone) {		SchedBoundary *Zone) const {
// Initialize the candidate if needed.		// Initialize the candidate if needed.
if (!Cand.isValid()) {		if (!Cand.isValid()) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
return;		return;
}		}

if (tryGreater(biasPhysRegCopy(TryCand.SU, TryCand.AtTop),		if (tryGreater(biasPhysRegCopy(TryCand.SU, TryCand.AtTop),
biasPhysRegCopy(Cand.SU, Cand.AtTop),		biasPhysRegCopy(Cand.SU, Cand.AtTop),
[... 747 lines not shown ...]

lib/Target/AMDGPU/SIMachineScheduler.cpp

[... 148 lines not shown ...]	static const char *getReasonStr(SIScheduleCandReason Reason) {
case Depth: return "DEPTH";		case Depth: return "DEPTH";
case NodeOrder: return "ORDER";		case NodeOrder: return "ORDER";
}		}
llvm_unreachable("Unknown reason!");		llvm_unreachable("Unknown reason!");
}		}

#endif		#endif

		namespace llvm {
		fhahn: nit: not sure, should nested namespaces be separated by newlines?
		jonpa (author): I copied this from the PowerPC backend, but looking at the "Namespace Indentation" section of the coding guidelines, it seems that at least in the example there newlines are used, so it seems you are right.
		namespace SISched {
static bool tryLess(int TryVal, int CandVal,		static bool tryLess(int TryVal, int CandVal,
SISchedulerCandidate &TryCand,		SISchedulerCandidate &TryCand,
SISchedulerCandidate &Cand,		SISchedulerCandidate &Cand,
SIScheduleCandReason Reason) {		SIScheduleCandReason Reason) {
if (TryVal < CandVal) {		if (TryVal < CandVal) {
TryCand.Reason = Reason;		TryCand.Reason = Reason;
return true;		return true;
}		}
[... 17 lines not shown ...]	static bool tryGreater(int TryVal, int CandVal,
if (TryVal < CandVal) {		if (TryVal < CandVal) {
if (Cand.Reason > Reason)		if (Cand.Reason > Reason)
Cand.Reason = Reason;		Cand.Reason = Reason;
return true;		return true;
}		}
Cand.setRepeat(Reason);		Cand.setRepeat(Reason);
return false;		return false;
}		}
		} // end namespace SISched
		} // end namespace llvm

// SIScheduleBlock //		// SIScheduleBlock //

void SIScheduleBlock::addUnit(SUnit *SU) {		void SIScheduleBlock::addUnit(SUnit *SU) {
NodeNum2Index[SU->NodeNum] = SUnits.size();		NodeNum2Index[SU->NodeNum] = SUnits.size();
SUnits.push_back(SU);		SUnits.push_back(SU);
}		}

[... 9 lines not shown ...]	void SIScheduleBlock::tryCandidateTopDown(SISchedCandidate &Cand,
SISchedCandidate &TryCand) {		SISchedCandidate &TryCand) {
// Initialize the candidate if needed.		// Initialize the candidate if needed.
if (!Cand.isValid()) {		if (!Cand.isValid()) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
return;		return;
}		}

if (Cand.SGPRUsage > 60 &&		if (Cand.SGPRUsage > 60 &&
tryLess(TryCand.SGPRUsage, Cand.SGPRUsage, TryCand, Cand, RegUsage))		SISched::tryLess(TryCand.SGPRUsage, Cand.SGPRUsage,
		TryCand, Cand, RegUsage))
return;		return;

// Schedule low latency instructions as top as possible.		// Schedule low latency instructions as top as possible.
// Order of priority is:		// Order of priority is:
// . Low latency instructions which do not depend on other low latency		// . Low latency instructions which do not depend on other low latency
// instructions we haven't waited for		// instructions we haven't waited for
// . Other instructions which do not depend on low latency instructions		// . Other instructions which do not depend on low latency instructions
// we haven't waited for		// we haven't waited for
// . Low latencies		// . Low latencies
// . All other instructions		// . All other instructions
// Goal is to get: low latency instructions - independent instructions		// Goal is to get: low latency instructions - independent instructions
// - (eventually some more low latency instructions)		// - (eventually some more low latency instructions)
// - instructions that depend on the first low latency instructions.		// - instructions that depend on the first low latency instructions.
// If in the block there is a lot of constant loads, the SGPR usage		// If in the block there is a lot of constant loads, the SGPR usage
// could go quite high, thus above the arbitrary limit of 60 will encourage		// could go quite high, thus above the arbitrary limit of 60 will encourage
// use the already loaded constants (in order to release some SGPRs) before		// use the already loaded constants (in order to release some SGPRs) before
// loading more.		// loading more.
if (tryLess(TryCand.HasLowLatencyNonWaitedParent,		if (SISched::tryLess(TryCand.HasLowLatencyNonWaitedParent,
Cand.HasLowLatencyNonWaitedParent,		Cand.HasLowLatencyNonWaitedParent,
TryCand, Cand, SIScheduleCandReason::Depth))		TryCand, Cand, SIScheduleCandReason::Depth))
return;		return;

if (tryGreater(TryCand.IsLowLatency, Cand.IsLowLatency,		if (SISched::tryGreater(TryCand.IsLowLatency, Cand.IsLowLatency,
TryCand, Cand, SIScheduleCandReason::Depth))		TryCand, Cand, SIScheduleCandReason::Depth))
return;		return;

if (TryCand.IsLowLatency &&		if (TryCand.IsLowLatency &&
tryLess(TryCand.LowLatencyOffset, Cand.LowLatencyOffset,		SISched::tryLess(TryCand.LowLatencyOffset, Cand.LowLatencyOffset,
TryCand, Cand, SIScheduleCandReason::Depth))		TryCand, Cand, SIScheduleCandReason::Depth))
return;		return;

if (tryLess(TryCand.VGPRUsage, Cand.VGPRUsage, TryCand, Cand, RegUsage))		if (SISched::tryLess(TryCand.VGPRUsage, Cand.VGPRUsage,
		TryCand, Cand, RegUsage))
return;		return;

// Fall through to original instruction order.		// Fall through to original instruction order.
if (TryCand.SU->NodeNum < Cand.SU->NodeNum) {		if (TryCand.SU->NodeNum < Cand.SU->NodeNum) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
}		}
}		}

[... 1,315 lines not shown ...]
bool SIScheduleBlockScheduler::tryCandidateLatency(SIBlockSchedCandidate &Cand,		bool SIScheduleBlockScheduler::tryCandidateLatency(SIBlockSchedCandidate &Cand,
SIBlockSchedCandidate &TryCand) {		SIBlockSchedCandidate &TryCand) {
if (!Cand.isValid()) {		if (!Cand.isValid()) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
return true;		return true;
}		}

// Try to hide high latencies.		// Try to hide high latencies.
if (tryLess(TryCand.LastPosHighLatParentScheduled,		if (SISched::tryLess(TryCand.LastPosHighLatParentScheduled,
Cand.LastPosHighLatParentScheduled, TryCand, Cand, Latency))		Cand.LastPosHighLatParentScheduled, TryCand, Cand, Latency))
return true;		return true;
// Schedule high latencies early so you can hide them better.		// Schedule high latencies early so you can hide them better.
if (tryGreater(TryCand.IsHighLatency, Cand.IsHighLatency,		if (SISched::tryGreater(TryCand.IsHighLatency, Cand.IsHighLatency,
TryCand, Cand, Latency))		TryCand, Cand, Latency))
return true;		return true;
if (TryCand.IsHighLatency && tryGreater(TryCand.Height, Cand.Height,		if (TryCand.IsHighLatency && SISched::tryGreater(TryCand.Height, Cand.Height,
TryCand, Cand, Depth))		TryCand, Cand, Depth))
return true;		return true;
if (tryGreater(TryCand.NumHighLatencySuccessors,		if (SISched::tryGreater(TryCand.NumHighLatencySuccessors,
Cand.NumHighLatencySuccessors,		Cand.NumHighLatencySuccessors,
TryCand, Cand, Successor))		TryCand, Cand, Successor))
return true;		return true;
return false;		return false;
}		}

bool SIScheduleBlockScheduler::tryCandidateRegUsage(SIBlockSchedCandidate &Cand,		bool SIScheduleBlockScheduler::tryCandidateRegUsage(SIBlockSchedCandidate &Cand,
SIBlockSchedCandidate &TryCand) {		SIBlockSchedCandidate &TryCand) {
if (!Cand.isValid()) {		if (!Cand.isValid()) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
return true;		return true;
}		}

if (tryLess(TryCand.VGPRUsageDiff > 0, Cand.VGPRUsageDiff > 0,		if (SISched::tryLess(TryCand.VGPRUsageDiff > 0, Cand.VGPRUsageDiff > 0,
TryCand, Cand, RegUsage))		TryCand, Cand, RegUsage))
return true;		return true;
if (tryGreater(TryCand.NumSuccessors > 0,		if (SISched::tryGreater(TryCand.NumSuccessors > 0,
Cand.NumSuccessors > 0,		Cand.NumSuccessors > 0,
TryCand, Cand, Successor))		TryCand, Cand, Successor))
return true;		return true;
if (tryGreater(TryCand.Height, Cand.Height, TryCand, Cand, Depth))		if (SISched::tryGreater(TryCand.Height, Cand.Height, TryCand, Cand, Depth))
return true;		return true;
if (tryLess(TryCand.VGPRUsageDiff, Cand.VGPRUsageDiff,		if (SISched::tryLess(TryCand.VGPRUsageDiff, Cand.VGPRUsageDiff,
TryCand, Cand, RegUsage))		TryCand, Cand, RegUsage))
return true;		return true;
return false;		return false;
}		}

SIScheduleBlock *SIScheduleBlockScheduler::pickBlock() {		SIScheduleBlock *SIScheduleBlockScheduler::pickBlock() {
SIBlockSchedCandidate Cand;		SIBlockSchedCandidate Cand;
std::vector<SIScheduleBlock*>::iterator Best;		std::vector<SIScheduleBlock*>::iterator Best;
SIScheduleBlock *Block;		SIScheduleBlock *Block;
[... 437 lines not shown ...]

lib/Target/SystemZ/SystemZMachineScheduler.h

[... 136 lines not shown ...]	public:
/// SU has had all predecessor dependencies resolved. Put it into		/// SU has had all predecessor dependencies resolved. Put it into
/// Available.		/// Available.
void releaseTopNode(SUnit *SU) override;		void releaseTopNode(SUnit *SU) override;

/// Currently only scheduling top-down, so this method is empty.		/// Currently only scheduling top-down, so this method is empty.
void releaseBottomNode(SUnit *SU) override {};		void releaseBottomNode(SUnit *SU) override {};
};		};

		class SystemZPreRASchedStrategy : public GenericScheduler {
		const SystemZSubtarget *ST;

		// The VectorUnit index is 6 for both z13 and z14.
		const unsigned SystemZVectorUnitIdx = 6;

		public:
		SystemZPreRASchedStrategy(const MachineSchedContext *C) :
		GenericScheduler(C), ST(&C->MF->getSubtarget<SystemZSubtarget>()) {}

		void tryCandidate(SchedCandidate &Cand,
		SchedCandidate &TryCand,
		SchedBoundary *Zone) const override;
		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_SYSTEMZ_SYSTEMZMACHINESCHEDULER_H		#endif // LLVM_LIB_TARGET_SYSTEMZ_SYSTEMZMACHINESCHEDULER_H

lib/Target/SystemZ/SystemZMachineScheduler.cpp

[... 250 lines not shown ...]	void SystemZPostRASchedStrategy::releaseTopNode(SUnit *SU) {
// pickNode().		// pickNode().
const MCSchedClassDesc *SC = HazardRec->getSchedClass(SU);		const MCSchedClassDesc *SC = HazardRec->getSchedClass(SU);
bool AffectsGrouping = (SC->isValid() && (SC->BeginGroup || SC->EndGroup));		bool AffectsGrouping = (SC->isValid() && (SC->BeginGroup || SC->EndGroup));
SU->isScheduleHigh = (AffectsGrouping || SU->isUnbuffered);		SU->isScheduleHigh = (AffectsGrouping || SU->isUnbuffered);

// Put all released SUs in the Available set.		// Put all released SUs in the Available set.
Available.insert(SU);		Available.insert(SU);
}		}


		//////////// Pre-RA scheduling

		// This is mostly copied from MachineScheduler.cpp.
		void SystemZPreRASchedStrategy::
		tryCandidate(SchedCandidate &Cand,
		SchedCandidate &TryCand,
		SchedBoundary *Zone) const {
		// Initialize the candidate if needed.
		if (!Cand.isValid()) {
		TryCand.Reason = NodeOrder;
		return;
		}

		if (tryGreater(biasPhysRegCopy(TryCand.SU, TryCand.AtTop),
		biasPhysRegCopy(Cand.SU, Cand.AtTop),
		TryCand, Cand, PhysRegCopy))
		return;

		// Avoid exceeding the target's limit.
		if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.Excess,
		Cand.RPDelta.Excess,
		TryCand, Cand, RegExcess, TRI,
		DAG->MF))
		return;

		// Avoid increasing the max critical pressure in the scheduled region.
		if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CriticalMax,
		Cand.RPDelta.CriticalMax,
		TryCand, Cand, RegCritical, TRI,
		DAG->MF))
		return;

		// We only compare a subset of features when comparing nodes between
		// Top and Bottom boundary. Some properties are simply incomparable, in many
		// other instances we should only override the other boundary if something
		// is a clear good pick on one boundary. Skip heuristics that are more
		// "tie-breaking" in nature.
		bool SameBoundary = Zone != nullptr;
		if (SameBoundary) {
		// For loops that are acyclic path limited, aggressively schedule for
		// latency. Within a single cycle, whenever CurrMOps > 0, allow normal
		// heuristics to take precedence.
		if (Rem.IsAcyclicLatencyLimited && !Zone->getCurrMOps() &&
		tryLatency(TryCand, Cand, *Zone))
		return;

		// Prioritize instructions that read unbuffered resources by stall cycles.
		if (tryLess(Zone->getLatencyStallCycles(TryCand.SU),
		Zone->getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))
		return;
		}

		// Keep clustered nodes together to encourage downstream peephole
		// optimizations which may reduce resource requirements.
		//
		// This is a best effort to set things up for a post-RA pass. Optimizations
		// like generating loads of multiple registers should ideally be done within
		// the scheduler pass by combining the loads during DAG postprocessing.
		const SUnit *CandNextClusterSU =
		Cand.AtTop ? DAG->getNextClusterSucc() : DAG->getNextClusterPred();
		const SUnit *TryCandNextClusterSU =
		TryCand.AtTop ? DAG->getNextClusterSucc() : DAG->getNextClusterPred();
		if (tryGreater(TryCand.SU == TryCandNextClusterSU,
		Cand.SU == CandNextClusterSU,
		TryCand, Cand, Cluster))
		return;

		if (SameBoundary) {
		// Weak edges are for clustering and other constraints.
		if (tryLess(getWeakLeft(TryCand.SU, TryCand.AtTop),
		getWeakLeft(Cand.SU, Cand.AtTop),
		TryCand, Cand, Weak))
		return;
		}

		// Avoid increasing the max pressure of the entire region.
		if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CurrentMax,
		Cand.RPDelta.CurrentMax,
		TryCand, Cand, RegMax, TRI,
		DAG->MF))
		return;

		// SystemZ specific: Latency boost for instructions using the vector unit.
		if (ST->hasVector()) {
		assert ((std::
		string(SchedModel->getProcResource(SystemZVectorUnitIdx)->Name)
		.find("VecUnit") != std::string::npos) &&
		"Hard coded index for vector unit changed!");
		bool VectorPipeline = false;
		const MCSchedClassDesc *SC = DAG->getSchedClass(TryCand.SU);
		for (TargetSchedModel::ProcResIter
		PI = SchedModel->getWriteProcResBegin(SC),
		PE = SchedModel->getWriteProcResEnd(SC); PI != PE; ++PI) {
		if (PI->ProcResourceIdx == SystemZVectorUnitIdx) {
		VectorPipeline = true;
		break;
		}
		}
		if (VectorPipeline && tryLatency(TryCand, Cand, *Zone))
		return;
		}

		if (SameBoundary) {
		// Avoid critical resource consumption and balance the schedule.
		TryCand.initResourceDelta(DAG, SchedModel);
		if (tryLess(TryCand.ResDelta.CritResources, Cand.ResDelta.CritResources,
		TryCand, Cand, ResourceReduce))
		return;
		if (tryGreater(TryCand.ResDelta.DemandedResources,
		Cand.ResDelta.DemandedResources,
		TryCand, Cand, ResourceDemand))
		return;

		// Avoid serializing long latency dependence chains.
		// For acyclic path limited loops, latency was already checked above.
		if (!RegionPolicy.DisableLatencyHeuristic && TryCand.Policy.ReduceLatency &&
		!Rem.IsAcyclicLatencyLimited && tryLatency(TryCand, Cand, *Zone))
		return;

		// Fall through to original instruction order.
		if ((Zone->isTop() && TryCand.SU->NodeNum < Cand.SU->NodeNum)
		|| (!Zone->isTop() && TryCand.SU->NodeNum > Cand.SU->NodeNum)) {
		TryCand.Reason = NodeOrder;
		}
		}
		}
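
On the open question of how to tell that an SU uses the vector unit: one option is to find the unit's index by name once, rather than hard-coding 6 and asserting on the name for every candidate. The sketch below only illustrates that idea on plain data. `ProcResource`, `WriteProcRes` and `SchedClass` are hypothetical reductions of the MC scheduling model types, the unit names are made up, and `shouldBoostLatency` is an invented helper. The `SameBoundary` parameter reflects that tryCandidate() documents `Zone` as possibly nullptr, so the sketch skips the boost in that case instead of dereferencing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical reductions of the scheduling model records.
struct ProcResource { const char *Name; };
struct WriteProcRes { unsigned ProcResourceIdx; unsigned Cycles; };
struct SchedClass { std::vector<WriteProcRes> Writes; };

// Returns the index of the first unit whose name contains Needle, or -1.
int findProcResourceIdx(const std::vector<ProcResource> &Units,
                        const char *Needle) {
  assert(Needle && *Needle && "need a non-empty name fragment");
  for (std::size_t I = 0; I < Units.size(); ++I)
    if (std::strstr(Units[I].Name, Needle))
      return static_cast<int>(I);
  return -1;
}

// True if any write resource of the scheduling class is the given unit.
bool usesResource(const SchedClass &SC, unsigned Idx) {
  for (const WriteProcRes &W : SC.Writes)
    if (W.ProcResourceIdx == Idx)
      return true;
  return false;
}

// Boost latency only for candidates in the same zone (Zone != nullptr in
// tryCandidate) whose scheduling class occupies the vector unit.
bool shouldBoostLatency(const SchedClass &SC, int VecIdx, bool SameBoundary) {
  return SameBoundary && VecIdx >= 0 &&
         usesResource(SC, static_cast<unsigned>(VecIdx));
}
```

With the index computed once per strategy instance, the per-candidate work is reduced to the loop over write resources.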

lib/Target/SystemZ/SystemZTargetMachine.cpp

[... 161 lines not shown ...]	public:
SystemZPassConfig(SystemZTargetMachine &TM, PassManagerBase &PM)		SystemZPassConfig(SystemZTargetMachine &TM, PassManagerBase &PM)
: TargetPassConfig(TM, PM) {}		: TargetPassConfig(TM, PM) {}

SystemZTargetMachine &getSystemZTargetMachine() const {		SystemZTargetMachine &getSystemZTargetMachine() const {
return getTM<SystemZTargetMachine>();		return getTM<SystemZTargetMachine>();
}		}

ScheduleDAGInstrs *		ScheduleDAGInstrs *
		createMachineScheduler(MachineSchedContext *C) const override {
		// To run the generic pre-RA scheduler use: -misched=converge
		ScheduleDAGMILive *DAG =
		new ScheduleDAGMILive(C, llvm::make_unique<SystemZPreRASchedStrategy>(C));

		// Use same DAG mutators as are applied in createGenericSchedLive().
		DAG->addMutation(createCopyConstrainDAGMutation(DAG->TII, DAG->TRI));
		return DAG;
		}

		ScheduleDAGInstrs *
createPostMachineScheduler(MachineSchedContext *C) const override {		createPostMachineScheduler(MachineSchedContext *C) const override {
return new ScheduleDAGMI(C,		return new ScheduleDAGMI(C,
llvm::make_unique<SystemZPostRASchedStrategy>(C),		llvm::make_unique<SystemZPostRASchedStrategy>(C),
/*RemoveKillFlags=*/true);		/*RemoveKillFlags=*/true);
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addInstSelector() override;		bool addInstSelector() override;
[... 86 lines not shown ...]

test/CodeGen/SystemZ/vec-cmp-cmp-logic-select.ll

[... 58 lines not shown ...]
; CHECK-DAG: vceqb [[REG0:%v[0-9]+]], %v24, %v26		; CHECK-DAG: vceqb [[REG0:%v[0-9]+]], %v24, %v26
; CHECK-DAG: vuphb [[REG2:%v[0-9]+]], [[REG0]]		; CHECK-DAG: vuphb [[REG2:%v[0-9]+]], [[REG0]]
; CHECK-DAG: vmrlg [[REG1:%v[0-9]+]], [[REG0]], [[REG0]]		; CHECK-DAG: vmrlg [[REG1:%v[0-9]+]], [[REG0]], [[REG0]]
; CHECK-DAG: vuphb [[REG1]], [[REG1]]		; CHECK-DAG: vuphb [[REG1]], [[REG1]]
; CHECK-DAG: vceqh [[REG3:%v[0-9]+]], %v28, %v25		; CHECK-DAG: vceqh [[REG3:%v[0-9]+]], %v28, %v25
; CHECK-DAG: vceqh [[REG4:%v[0-9]+]], %v30, %v27		; CHECK-DAG: vceqh [[REG4:%v[0-9]+]], %v30, %v27
; CHECK-DAG: vl [[REG5:%v[0-9]+]], 176(%r15)		; CHECK-DAG: vl [[REG5:%v[0-9]+]], 176(%r15)
; CHECK-DAG: vl [[REG6:%v[0-9]+]], 160(%r15)		; CHECK-DAG: vl [[REG6:%v[0-9]+]], 160(%r15)
; CHECK-DAG: vo [[REG7:%v[0-9]+]], %v2, [[REG4]]		; CHECK-DAG: vo [[REG7:%v[0-9]+]], [[REG1]], [[REG4]]
; CHECK-DAG: vo [[REG8:%v[0-9]+]], [[REG2]], [[REG3]]		; CHECK-DAG: vo [[REG8:%v[0-9]+]], [[REG2]], [[REG3]]
; CHECK-DAG: vsel %v24, %v29, [[REG6]], [[REG8]]		; CHECK-DAG: vsel %v24, %v29, [[REG6]], [[REG8]]
; CHECK-DAG: vsel %v26, %v31, [[REG5]], [[REG7]]		; CHECK-DAG: vsel %v26, %v31, [[REG5]], [[REG7]]
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
%cmp0 = icmp eq <16 x i8> %val1, %val2		%cmp0 = icmp eq <16 x i8> %val1, %val2
%cmp1 = icmp eq <16 x i16> %val3, %val4		%cmp1 = icmp eq <16 x i16> %val3, %val4
%and = or <16 x i1> %cmp0, %cmp1		%and = or <16 x i1> %cmp0, %cmp1
%sel = select <16 x i1> %and, <16 x i16> %val5, <16 x i16> %val6		%sel = select <16 x i1> %and, <16 x i16> %val5, <16 x i16> %val6
[... 358 lines not shown ...]	; CHECK-NEXT: br %r14
ret <2 x i64> %sel		ret <2 x i64> %sel
}		}

define <4 x i32> @fun23(<4 x i64> %val1, <4 x i64> %val2, <4 x i32> %val3, <4 x i32> %val4, <4 x i32> %val5, <4 x i32> %val6) {		define <4 x i32> @fun23(<4 x i64> %val1, <4 x i64> %val2, <4 x i32> %val3, <4 x i32> %val4, <4 x i32> %val5, <4 x i32> %val6) {
; CHECK-LABEL: fun23:		; CHECK-LABEL: fun23:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vceqg %v0, %v26, %v30		; CHECK-NEXT: vceqg %v0, %v26, %v30
; CHECK-NEXT: vceqg %v1, %v24, %v28		; CHECK-NEXT: vceqg %v1, %v24, %v28
; CHECK-NEXT: vpkg %v0, %v1, %v0		; CHECK-DAG: vpkg %v0, %v1, %v0
; CHECK-NEXT: vceqf %v1, %v25, %v27		; CHECK-DAG: vceqf [[REG0:%v[0-9]+]], %v25, %v27
; CHECK-NEXT: vx %v0, %v0, %v1		; CHECK-NEXT: vx %v0, %v0, [[REG0]]
; CHECK-NEXT: vsel %v24, %v29, %v31, %v0		; CHECK-NEXT: vsel %v24, %v29, %v31, %v0
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
%cmp0 = icmp eq <4 x i64> %val1, %val2		%cmp0 = icmp eq <4 x i64> %val1, %val2
%cmp1 = icmp eq <4 x i32> %val3, %val4		%cmp1 = icmp eq <4 x i32> %val3, %val4
%and = xor <4 x i1> %cmp0, %cmp1		%and = xor <4 x i1> %cmp0, %cmp1
%sel = select <4 x i1> %and, <4 x i32> %val5, <4 x i32> %val6		%sel = select <4 x i1> %and, <4 x i32> %val5, <4 x i32> %val6
ret <4 x i32> %sel		ret <4 x i32> %sel
}		}
[... 21 lines not shown ...]	; CHECK-NEXT: br %r14
ret <4 x i64> %sel		ret <4 x i64> %sel
}		}

define <2 x float> @fun25(<2 x float> %val1, <2 x float> %val2, <2 x double> %val3, <2 x double> %val4, <2 x float> %val5, <2 x float> %val6) {		define <2 x float> @fun25(<2 x float> %val1, <2 x float> %val2, <2 x double> %val3, <2 x double> %val4, <2 x float> %val5, <2 x float> %val6) {
; CHECK-LABEL: fun25:		; CHECK-LABEL: fun25:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf %v0, %v26, %v26		; CHECK-NEXT: vmrlf %v0, %v26, %v26
; CHECK-NEXT: vmrlf %v1, %v24, %v24		; CHECK-NEXT: vmrlf %v1, %v24, %v24
; CHECK-NEXT: vldeb %v0, %v0		; CHECK-DAG: vldeb %v0, %v0
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb %v1, %v1
; CHECK-NEXT: vfchdb %v0, %v1, %v0		; CHECK-DAG: vfchdb %v0, %v1, %v0
; CHECK-NEXT: vmrhf %v1, %v26, %v26		; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vmrhf %v2, %v24, %v24		; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb [[REG0]], [[REG0]]
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG1]], [[REG1]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
; CHECK-NEXT: vpkg %v0, %v1, %v0		; CHECK-DAG: vpkg %v0, %v1, %v0
; CHECK-NEXT: vfchdb %v1, %v28, %v30		; CHECK-DAG: vfchdb [[REG2:%v[0-9]+]], %v28, %v30
; CHECK-NEXT: vpkg %v1, %v1, %v1		; CHECK-DAG: vpkg [[REG2]], [[REG2]], [[REG2]]
; CHECK-NEXT: vo %v0, %v0, %v1		; CHECK-NEXT: vo %v0, %v0, [[REG2]]
; CHECK-NEXT: vsel %v24, %v25, %v27, %v0		; CHECK-NEXT: vsel %v24, %v25, %v27, %v0
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
;		;
; CHECK-Z14-LABEL: fun25:		; CHECK-Z14-LABEL: fun25:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchdb %v1, %v28, %v30		; CHECK-Z14-NEXT: vfchdb %v1, %v28, %v30
; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26		; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26
; CHECK-Z14-NEXT: vpkg %v1, %v1, %v1		; CHECK-Z14-NEXT: vpkg %v1, %v1, %v1
; CHECK-Z14-NEXT: vo %v0, %v0, %v1		; CHECK-Z14-NEXT: vo %v0, %v0, %v1
; CHECK-Z14-NEXT: vsel %v24, %v25, %v27, %v0		; CHECK-Z14-NEXT: vsel %v24, %v25, %v27, %v0
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14
%cmp0 = fcmp ogt <2 x float> %val1, %val2		%cmp0 = fcmp ogt <2 x float> %val1, %val2
%cmp1 = fcmp ogt <2 x double> %val3, %val4		%cmp1 = fcmp ogt <2 x double> %val3, %val4
%and = or <2 x i1> %cmp0, %cmp1		%and = or <2 x i1> %cmp0, %cmp1
%sel = select <2 x i1> %and, <2 x float> %val5, <2 x float> %val6		%sel = select <2 x i1> %and, <2 x float> %val5, <2 x float> %val6
ret <2 x float> %sel		ret <2 x float> %sel
}		}

define <2 x double> @fun26(<2 x float> %val1, <2 x float> %val2, <2 x double> %val3, <2 x double> %val4, <2 x double> %val5, <2 x double> %val6) {		define <2 x double> @fun26(<2 x float> %val1, <2 x float> %val2, <2 x double> %val3, <2 x double> %val4, <2 x double> %val5, <2 x double> %val6) {
; CHECK-LABEL: fun26:		; CHECK-LABEL: fun26:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf %v0, %v26, %v26		; CHECK-NEXT: vmrlf %v0, %v26, %v26
; CHECK-NEXT: vmrlf %v1, %v24, %v24		; CHECK-NEXT: vmrlf %v1, %v24, %v24
; CHECK-NEXT: vldeb %v0, %v0		; CHECK-DAG: vldeb %v0, %v0
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb %v1, %v1
; CHECK-NEXT: vfchdb %v0, %v1, %v0		; CHECK-DAG: vfchdb %v0, %v1, %v0
; CHECK-NEXT: vmrhf %v1, %v26, %v26		; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vmrhf %v2, %v24, %v24		; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb [[REG0]], [[REG0]]
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG1]], [[REG1]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
; CHECK-NEXT: vpkg %v0, %v1, %v0		; CHECK-DAG: vpkg %v0, %v1, %v0
; CHECK-NEXT: vuphf %v0, %v0		; CHECK-DAG: vuphf %v0, %v0
; CHECK-NEXT: vfchdb %v1, %v28, %v30		; CHECK-DAG: vfchdb [[REG2:%v[0-9]+]], %v28, %v30
; CHECK-NEXT: vo %v0, %v0, %v1		; CHECK-NEXT: vo %v0, %v0, [[REG2]]
; CHECK-NEXT: vsel %v24, %v25, %v27, %v0		; CHECK-NEXT: vsel %v24, %v25, %v27, %v0
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
;		;
; CHECK-Z14-LABEL: fun26:		; CHECK-Z14-LABEL: fun26:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26		; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26
; CHECK-Z14-NEXT: vuphf %v0, %v0		; CHECK-Z14-DAG: vuphf %v0, %v0
; CHECK-Z14-NEXT: vfchdb %v1, %v28, %v30		; CHECK-Z14-DAG: vfchdb %v1, %v28, %v30
; CHECK-Z14-NEXT: vo %v0, %v0, %v1		; CHECK-Z14-NEXT: vo %v0, %v0, %v1
; CHECK-Z14-NEXT: vsel %v24, %v25, %v27, %v0		; CHECK-Z14-NEXT: vsel %v24, %v25, %v27, %v0
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14
%cmp0 = fcmp ogt <2 x float> %val1, %val2		%cmp0 = fcmp ogt <2 x float> %val1, %val2
%cmp1 = fcmp ogt <2 x double> %val3, %val4		%cmp1 = fcmp ogt <2 x double> %val3, %val4
%and = or <2 x i1> %cmp0, %cmp1		%and = or <2 x i1> %cmp0, %cmp1
%sel = select <2 x i1> %and, <2 x double> %val5, <2 x double> %val6		%sel = select <2 x i1> %and, <2 x double> %val5, <2 x double> %val6
ret <2 x double> %sel		ret <2 x double> %sel
[35 lines not shown]
; CHECK-DAG: vmrlf [[REG12:%v[0-9]+]], %v30, %v30		; CHECK-DAG: vmrlf [[REG12:%v[0-9]+]], %v30, %v30
; CHECK-DAG: vmrlf [[REG13:%v[0-9]+]], %v28, %v28		; CHECK-DAG: vmrlf [[REG13:%v[0-9]+]], %v28, %v28
; CHECK-DAG: vldeb [[REG14:%v[0-9]+]], [[REG12]]		; CHECK-DAG: vldeb [[REG14:%v[0-9]+]], [[REG12]]
; CHECK-DAG: vldeb [[REG15:%v[0-9]+]], [[REG13]]		; CHECK-DAG: vldeb [[REG15:%v[0-9]+]], [[REG13]]
; CHECK-DAG: vfchdb [[REG16:%v[0-9]+]], [[REG15]], [[REG14]]		; CHECK-DAG: vfchdb [[REG16:%v[0-9]+]], [[REG15]], [[REG14]]
; CHECK-DAG: vmrhf [[REG17:%v[0-9]+]], %v30, %v30		; CHECK-DAG: vmrhf [[REG17:%v[0-9]+]], %v30, %v30
; CHECK-DAG: vldeb [[REG19:%v[0-9]+]], [[REG17]]		; CHECK-DAG: vldeb [[REG19:%v[0-9]+]], [[REG17]]
; CHECK-DAG: vldeb [[REG20:%v[0-9]+]], [[REG8]]		; CHECK-DAG: vldeb [[REG20:%v[0-9]+]], [[REG8]]
; CHECK-NEXT: vfchdb %v2, [[REG20]], [[REG19]]		; CHECK-NEXT: vfchdb [[REG22:%v[0-9]+]], [[REG20]], [[REG19]]
; CHECK-NEXT: vpkg [[REG21:%v[0-9]+]], %v2, [[REG16]]		; CHECK-NEXT: vpkg [[REG21:%v[0-9]+]], [[REG22]], [[REG16]]
; CHECK-NEXT: vx %v0, [[REG11]], [[REG21]]		; CHECK-NEXT: vx %v0, [[REG11]], [[REG21]]
; CHECK-NEXT: vsel %v24, %v25, %v27, %v0		; CHECK-NEXT: vsel %v24, %v25, %v27, %v0
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
;		;
; CHECK-Z14-LABEL: fun28:		; CHECK-Z14-LABEL: fun28:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26		; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26
; CHECK-Z14-NEXT: vfchsb %v1, %v28, %v30		; CHECK-Z14-NEXT: vfchsb %v1, %v28, %v30
; CHECK-Z14-NEXT: vx %v0, %v0, %v1		; CHECK-Z14-NEXT: vx %v0, %v0, %v1
; CHECK-Z14-NEXT: vsel %v24, %v25, %v27, %v0		; CHECK-Z14-NEXT: vsel %v24, %v25, %v27, %v0
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14
%cmp0 = fcmp ogt <4 x float> %val1, %val2		%cmp0 = fcmp ogt <4 x float> %val1, %val2
%cmp1 = fcmp ogt <4 x float> %val3, %val4		%cmp1 = fcmp ogt <4 x float> %val3, %val4
%and = xor <4 x i1> %cmp0, %cmp1		%and = xor <4 x i1> %cmp0, %cmp1
%sel = select <4 x i1> %and, <4 x float> %val5, <4 x float> %val6		%sel = select <4 x i1> %and, <4 x float> %val5, <4 x float> %val6
ret <4 x float> %sel		ret <4 x float> %sel
}		}

define <4 x double> @fun29(<4 x float> %val1, <4 x float> %val2, <4 x float> %val3, <4 x float> %val4, <4 x double> %val5, <4 x double> %val6) {		define <4 x double> @fun29(<4 x float> %val1, <4 x float> %val2, <4 x float> %val3, <4 x float> %val4, <4 x double> %val5, <4 x double> %val6) {
; CHECK-LABEL: fun29:		; CHECK-LABEL: fun29:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf %v0, %v26, %v26		; CHECK-NEXT: vmrlf %v0, %v26, %v26
; CHECK-NEXT: vmrlf %v1, %v24, %v24		; CHECK-NEXT: vmrlf %v1, %v24, %v24
; CHECK-NEXT: vldeb %v0, %v0		; CHECK-DAG: vldeb %v0, %v0
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb %v1, %v1
; CHECK-NEXT: vfchdb %v0, %v1, %v0		; CHECK-DAG: vfchdb %v0, %v1, %v0
; CHECK-NEXT: vmrhf %v1, %v26, %v26		; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vmrhf %v2, %v24, %v24		; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb [[REG0]], [[REG0]]
; CHECK-NEXT: vmrhf %v3, %v28, %v28		; CHECK-DAG: vmrhf [[REG2:%v[0-9]+]], %v28, %v28
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG1]], [[REG1]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb [[REG3:%v[0-9]+]], [[REG1]], [[REG0]]
; CHECK-NEXT: vpkg %v0, %v1, %v0		; CHECK-DAG: vpkg %v0, [[REG3]], %v0
; CHECK-NEXT: vmrlf %v1, %v30, %v30		; CHECK-DAG: vmrlf %v1, %v30, %v30
; CHECK-NEXT: vmrlf %v2, %v28, %v28		; CHECK-DAG: vmrlf [[REG4:%v[0-9]+]], %v28, %v28
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb %v1, %v1
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG4]], [[REG4]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb %v1, [[REG4]], %v1
; CHECK-NEXT: vmrhf %v2, %v30, %v30		; CHECK-DAG: vmrhf [[REG5:%v[0-9]+]], %v30, %v30
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-NEXT: vldeb [[REG5]], [[REG5]]
; CHECK-NEXT: vldeb %v3, %v3		; CHECK-NEXT: vldeb [[REG2]], [[REG2]]
; CHECK-NEXT: vfchdb %v2, %v3, %v2		; CHECK-NEXT: vfchdb [[REG6:%v[0-9]+]], [[REG2]], [[REG5]]
; CHECK-NEXT: vpkg %v1, %v2, %v1		; CHECK-NEXT: vpkg %v1, [[REG6]], %v1
; CHECK-NEXT: vx %v0, %v0, %v1		; CHECK-NEXT: vx %v0, %v0, %v1
; CHECK-NEXT: vmrlg %v1, %v0, %v0		; CHECK-NEXT: vmrlg %v1, %v0, %v0
; CHECK-NEXT: vuphf %v1, %v1		; CHECK-DAG: vuphf %v1, %v1
; CHECK-NEXT: vuphf %v0, %v0		; CHECK-DAG: vuphf %v0, %v0
; CHECK-NEXT: vsel %v24, %v25, %v29, %v0		; CHECK-NEXT: vsel %v24, %v25, %v29, %v0
; CHECK-NEXT: vsel %v26, %v27, %v31, %v1		; CHECK-NEXT: vsel %v26, %v27, %v31, %v1
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
;		;
; CHECK-Z14-LABEL: fun29:		; CHECK-Z14-LABEL: fun29:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26		; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26
; CHECK-Z14-NEXT: vfchsb %v1, %v28, %v30		; CHECK-Z14-NEXT: vfchsb %v1, %v28, %v30
; CHECK-Z14-NEXT: vx %v0, %v0, %v1		; CHECK-Z14-NEXT: vx %v0, %v0, %v1
; CHECK-Z14-NEXT: vmrlg %v1, %v0, %v0		; CHECK-Z14-NEXT: vmrlg %v1, %v0, %v0
; CHECK-Z14-NEXT: vuphf %v1, %v1		; CHECK-Z14-DAG: vuphf %v1, %v1
; CHECK-Z14-NEXT: vuphf %v0, %v0		; CHECK-Z14-DAG: vuphf %v0, %v0
; CHECK-Z14-NEXT: vsel %v24, %v25, %v29, %v0		; CHECK-Z14-NEXT: vsel %v24, %v25, %v29, %v0
; CHECK-Z14-NEXT: vsel %v26, %v27, %v31, %v1		; CHECK-Z14-NEXT: vsel %v26, %v27, %v31, %v1
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14
%cmp0 = fcmp ogt <4 x float> %val1, %val2		%cmp0 = fcmp ogt <4 x float> %val1, %val2
%cmp1 = fcmp ogt <4 x float> %val3, %val4		%cmp1 = fcmp ogt <4 x float> %val3, %val4
%and = xor <4 x i1> %cmp0, %cmp1		%and = xor <4 x i1> %cmp0, %cmp1
%sel = select <4 x i1> %and, <4 x double> %val5, <4 x double> %val6		%sel = select <4 x i1> %and, <4 x double> %val5, <4 x double> %val6
ret <4 x double> %sel		ret <4 x double> %sel
}		}

define <8 x float> @fun30(<8 x float> %val1, <8 x float> %val2, <8 x double> %val3, <8 x double> %val4, <8 x float> %val5, <8 x float> %val6) {		define <8 x float> @fun30(<8 x float> %val1, <8 x float> %val2, <8 x double> %val3, <8 x double> %val4, <8 x float> %val5, <8 x float> %val6) {
; CHECK-LABEL: fun30:		; CHECK-LABEL: fun30:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf %v16, %v28, %v28		; CHECK-DAG: vmrlf [[REG0:%v[0-9]+]], %v28, %v28
; CHECK-NEXT: vmrlf %v17, %v24, %v24		; CHECK-DAG: vmrlf [[REG1:%v[0-9]+]], %v24, %v24
; CHECK-NEXT: vldeb %v16, %v16		; CHECK-DAG: vldeb [[REG0]], [[REG0]]
; CHECK-NEXT: vldeb %v17, %v17		; CHECK-DAG: vldeb [[REG1]], [[REG1]]
; CHECK-NEXT: vfchdb %v16, %v17, %v16		; CHECK-DAG: vfchdb [[REG2:%v[0-9]+]], [[REG1]], [[REG0]]
; CHECK-NEXT: vmrhf %v17, %v28, %v28		; CHECK-DAG: vmrhf %v17, %v28, %v28
; CHECK-NEXT: vmrhf %v18, %v24, %v24		; CHECK-DAG: vmrhf %v18, %v24, %v24
; CHECK-NEXT: vldeb %v17, %v17		; CHECK-DAG: vldeb %v17, %v17
; CHECK-NEXT: vl %v4, 192(%r15)		; CHECK-DAG: vl [[REG3:%v[0-9]+]], 192(%r15)
; CHECK-NEXT: vldeb %v18, %v18		; CHECK-DAG: vldeb %v18, %v18
; CHECK-NEXT: vl %v5, 208(%r15)		; CHECK-DAG: vl [[REG4:%v[0-9]+]], 208(%r15)
; CHECK-NEXT: vl %v6, 160(%r15)		; CHECK-DAG: vl [[REG5:%v[0-9]+]], 160(%r15)
; CHECK-NEXT: vl %v7, 176(%r15)		; CHECK-DAG: vl [[REG6:%v[0-9]+]], 176(%r15)
; CHECK-NEXT: vl %v0, 272(%r15)		; CHECK-DAG: vl [[REG7:%v[0-9]+]], 272(%r15)
; CHECK-NEXT: vl %v1, 240(%r15)		; CHECK-DAG: vl [[REG8:%v[0-9]+]], 240(%r15)
; CHECK-NEXT: vfchdb %v17, %v18, %v17		; CHECK-DAG: vfchdb [[REG9:%v[0-9]+]], %v18, %v17
; CHECK-NEXT: vl %v2, 256(%r15)		; CHECK-DAG: vl [[REG10:%v[0-9]+]], 256(%r15)
; CHECK-NEXT: vl %v3, 224(%r15)		; CHECK-DAG: vl [[REG11:%v[0-9]+]], 224(%r15)
; CHECK-NEXT: vpkg %v16, %v17, %v16		; CHECK-DAG: vpkg [[REG12:%v[0-9]+]], [[REG9]], [[REG2]]
; CHECK-NEXT: vmrlf %v17, %v30, %v30		; CHECK-DAG: vmrlf [[REG13:%v[0-9]+]], %v30, %v30
; CHECK-NEXT: vmrlf %v18, %v26, %v26		; CHECK-DAG: vmrlf [[REG14:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vmrhf %v19, %v26, %v26		; CHECK-DAG: vmrhf [[REG15:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vfchdb %v7, %v27, %v7		; CHECK-DAG: vfchdb [[REG16:%v[0-9]+]], %v27, [[REG6]]
; CHECK-NEXT: vfchdb %v6, %v25, %v6		; CHECK-DAG: vfchdb [[REG17:%v[0-9]+]], %v25, [[REG5]]
; CHECK-NEXT: vfchdb %v5, %v31, %v5		; CHECK-DAG: vfchdb [[REG18:%v[0-9]+]], %v31, [[REG4]]
; CHECK-NEXT: vfchdb %v4, %v29, %v4		; CHECK-DAG: vfchdb [[REG19:%v[0-9]+]], %v29, [[REG3]]
; CHECK-NEXT: vpkg %v6, %v6, %v7		; CHECK-DAG: vpkg [[REG20:%v[0-9]+]], [[REG17]], [[REG16]]
; CHECK-NEXT: vpkg %v4, %v4, %v5		; CHECK-DAG: vpkg [[REG21:%v[0-9]+]], [[REG19]], [[REG18]]
; CHECK-NEXT: vn %v5, %v16, %v6		; CHECK-DAG: vn [[REG22:%v[0-9]+]], [[REG12]], [[REG20]]
; CHECK-NEXT: vsel %v24, %v3, %v2, %v5		; CHECK-DAG: vsel %v24, [[REG11]], [[REG10]], [[REG22]]
; CHECK-NEXT: vldeb %v17, %v17		; CHECK-DAG: vldeb [[REG13]], [[REG13]]
; CHECK-NEXT: vldeb %v18, %v18		; CHECK-DAG: vldeb [[REG14]], [[REG14]]
; CHECK-NEXT: vfchdb %v17, %v18, %v17		; CHECK-DAG: vfchdb [[REG23:%v[0-9]+]], [[REG14]], [[REG13]]
; CHECK-NEXT: vmrhf %v18, %v30, %v30		; CHECK-DAG: vmrhf [[REG24:%v[0-9]+]], %v30, %v30
; CHECK-NEXT: vldeb %v18, %v18		; CHECK-DAG: vldeb [[REG24]], [[REG24]]
; CHECK-NEXT: vldeb %v19, %v19		; CHECK-DAG: vldeb [[REG15]], [[REG15]]
; CHECK-NEXT: vfchdb %v18, %v19, %v18		; CHECK-DAG: vfchdb [[REG25:%v[0-9]+]], [[REG15]], [[REG24]]
; CHECK-NEXT: vpkg %v17, %v18, %v17		; CHECK-DAG: vpkg [[REG26:%v[0-9]+]], [[REG25]], [[REG23]]
; CHECK-NEXT: vn %v4, %v17, %v4		; CHECK-DAG: vn [[REG27:%v[0-9]+]], [[REG26]], [[REG21]]
; CHECK-NEXT: vsel %v26, %v1, %v0, %v4		; CHECK-DAG: vsel %v26, [[REG8]], [[REG7]], [[REG27]]
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
;		;
; CHECK-Z14-LABEL: fun30:		; CHECK-Z14-LABEL: fun30:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vl %v4, 192(%r15)		; CHECK-Z14-NEXT: vl [[REG0:%v[0-9]+]], 192(%r15)
; CHECK-Z14-NEXT: vl %v5, 208(%r15)		; CHECK-Z14-NEXT: vl [[REG1:%v[0-9]+]], 208(%r15)
; CHECK-Z14-NEXT: vl %v6, 160(%r15)		; CHECK-Z14-NEXT: vl [[REG2:%v[0-9]+]], 160(%r15)
; CHECK-Z14-NEXT: vl %v7, 176(%r15)		; CHECK-Z14-NEXT: vl [[REG3:%v[0-9]+]], 176(%r15)
; CHECK-Z14-NEXT: vfchdb %v7, %v27, %v7		; CHECK-Z14-NEXT: vfchdb [[REG4:%v[0-9]+]], %v27, [[REG3]]
; CHECK-Z14-NEXT: vfchdb %v6, %v25, %v6		; CHECK-Z14-NEXT: vfchdb [[REG5:%v[0-9]+]], %v25, [[REG2]]
; CHECK-Z14-NEXT: vfchdb %v5, %v31, %v5		; CHECK-Z14-NEXT: vfchdb [[REG6:%v[0-9]+]], %v31, [[REG1]]
; CHECK-Z14-NEXT: vfchdb %v4, %v29, %v4		; CHECK-Z14-NEXT: vfchdb [[REG7:%v[0-9]+]], %v29, [[REG0]]
; CHECK-Z14-NEXT: vfchsb %v16, %v24, %v28		; CHECK-Z14-NEXT: vfchsb [[REG8:%v[0-9]+]], %v24, %v28
; CHECK-Z14-NEXT: vfchsb %v17, %v26, %v30		; CHECK-Z14-NEXT: vfchsb [[REG9:%v[0-9]+]], %v26, %v30
; CHECK-Z14-NEXT: vpkg %v6, %v6, %v7		; CHECK-Z14-NEXT: vpkg [[REG10:%v[0-9]+]], [[REG5]], [[REG4]]
; CHECK-Z14-NEXT: vpkg %v4, %v4, %v5		; CHECK-Z14-NEXT: vpkg [[REG11:%v[0-9]+]], [[REG7]], [[REG6]]
; CHECK-Z14-NEXT: vl %v0, 272(%r15)		; CHECK-Z14-NEXT: vl %v0, 272(%r15)
; CHECK-Z14-NEXT: vl %v1, 240(%r15)		; CHECK-Z14-NEXT: vl %v1, 240(%r15)
; CHECK-Z14-NEXT: vl %v2, 256(%r15)		; CHECK-Z14-NEXT: vl %v2, 256(%r15)
; CHECK-Z14-NEXT: vl %v3, 224(%r15)		; CHECK-Z14-NEXT: vl [[REG14:%v[0-9]+]], 224(%r15)
; CHECK-Z14-NEXT: vn %v4, %v17, %v4		; CHECK-Z14-NEXT: vn [[REG12:%v[0-9]+]], [[REG9]], [[REG11]]
; CHECK-Z14-NEXT: vn %v5, %v16, %v6		; CHECK-Z14-NEXT: vn [[REG13:%v[0-9]+]], [[REG8]], [[REG10]]
; CHECK-Z14-NEXT: vsel %v24, %v3, %v2, %v5		; CHECK-Z14-NEXT: vsel %v24, [[REG14]], %v2, [[REG13]]
; CHECK-Z14-NEXT: vsel %v26, %v1, %v0, %v4		; CHECK-Z14-NEXT: vsel %v26, %v1, %v0, [[REG12]]
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14
%cmp0 = fcmp ogt <8 x float> %val1, %val2		%cmp0 = fcmp ogt <8 x float> %val1, %val2
%cmp1 = fcmp ogt <8 x double> %val3, %val4		%cmp1 = fcmp ogt <8 x double> %val3, %val4
%and = and <8 x i1> %cmp0, %cmp1		%and = and <8 x i1> %cmp0, %cmp1
%sel = select <8 x i1> %and, <8 x float> %val5, <8 x float> %val6		%sel = select <8 x i1> %and, <8 x float> %val5, <8 x float> %val6
ret <8 x float> %sel		ret <8 x float> %sel
}		}

[26 lines not shown]
%and = xor <2 x i1> %cmp0, %cmp1		%and = xor <2 x i1> %cmp0, %cmp1
%sel = select <2 x i1> %and, <2 x double> %val5, <2 x double> %val6		%sel = select <2 x i1> %and, <2 x double> %val5, <2 x double> %val6
ret <2 x double> %sel		ret <2 x double> %sel
}		}

define <4 x float> @fun33(<4 x double> %val1, <4 x double> %val2, <4 x float> %val3, <4 x float> %val4, <4 x float> %val5, <4 x float> %val6) {		define <4 x float> @fun33(<4 x double> %val1, <4 x double> %val2, <4 x float> %val3, <4 x float> %val4, <4 x float> %val5, <4 x float> %val6) {
; CHECK-LABEL: fun33:		; CHECK-LABEL: fun33:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vfchdb %v0, %v26, %v30		; CHECK-DAG: vfchdb %v0, %v26, %v30
; CHECK-NEXT: vfchdb %v1, %v24, %v28		; CHECK-DAG: vfchdb %v1, %v24, %v28
; CHECK-NEXT: vpkg %v0, %v1, %v0		; CHECK-DAG: vpkg %v0, %v1, %v0
; CHECK-NEXT: vmrlf %v1, %v27, %v27		; CHECK-DAG: vmrlf [[REG0:%v[0-9]+]], %v27, %v27
; CHECK-NEXT: vmrlf %v2, %v25, %v25		; CHECK-DAG: vmrlf [[REG1:%v[0-9]+]], %v25, %v25
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb [[REG2:%v[0-9]+]], [[REG0]]
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG3:%v[0-9]+]], [[REG1]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb [[REG4:%v[0-9]+]], [[REG3]], [[REG2]]
; CHECK-NEXT: vmrhf %v2, %v27, %v27		; CHECK-DAG: vmrhf [[REG5:%v[0-9]+]], %v27, %v27
; CHECK-NEXT: vmrhf %v3, %v25, %v25		; CHECK-DAG: vmrhf [[REG6:%v[0-9]+]], %v25, %v25
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG7:%v[0-9]+]], [[REG5]]
; CHECK-NEXT: vldeb %v3, %v3		; CHECK-DAG: vldeb [[REG8:%v[0-9]+]], [[REG6]]
; CHECK-NEXT: vfchdb %v2, %v3, %v2		; CHECK-DAG: vfchdb %v2, [[REG8]], [[REG7]]
; CHECK-NEXT: vpkg %v1, %v2, %v1		; CHECK-NEXT: vpkg %v1, %v2, [[REG4]]
; CHECK-NEXT: vn %v0, %v0, %v1		; CHECK-NEXT: vn %v0, %v0, %v1
; CHECK-NEXT: vsel %v24, %v29, %v31, %v0		; CHECK-NEXT: vsel %v24, %v29, %v31, %v0
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
;		;
; CHECK-Z14-LABEL: fun33:		; CHECK-Z14-LABEL: fun33:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchdb %v0, %v26, %v30		; CHECK-Z14-NEXT: vfchdb %v0, %v26, %v30
; CHECK-Z14-NEXT: vfchdb %v1, %v24, %v28		; CHECK-Z14-NEXT: vfchdb %v1, %v24, %v28
; CHECK-Z14-NEXT: vpkg %v0, %v1, %v0		; CHECK-Z14-DAG: vpkg %v0, %v1, %v0
; CHECK-Z14-NEXT: vfchsb %v1, %v25, %v27		; CHECK-Z14-DAG: vfchsb [[REG0:%v[0-9]+]], %v25, %v27
; CHECK-Z14-NEXT: vn %v0, %v0, %v1		; CHECK-Z14-NEXT: vn %v0, %v0, [[REG0]]
; CHECK-Z14-NEXT: vsel %v24, %v29, %v31, %v0		; CHECK-Z14-NEXT: vsel %v24, %v29, %v31, %v0
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14
%cmp0 = fcmp ogt <4 x double> %val1, %val2		%cmp0 = fcmp ogt <4 x double> %val1, %val2
%cmp1 = fcmp ogt <4 x float> %val3, %val4		%cmp1 = fcmp ogt <4 x float> %val3, %val4
%and = and <4 x i1> %cmp0, %cmp1		%and = and <4 x i1> %cmp0, %cmp1
%sel = select <4 x i1> %and, <4 x float> %val5, <4 x float> %val6		%sel = select <4 x i1> %and, <4 x float> %val5, <4 x float> %val6
ret <4 x float> %sel		ret <4 x float> %sel
}		}

define <4 x double> @fun34(<4 x double> %val1, <4 x double> %val2, <4 x float> %val3, <4 x float> %val4, <4 x double> %val5, <4 x double> %val6) {		define <4 x double> @fun34(<4 x double> %val1, <4 x double> %val2, <4 x float> %val3, <4 x float> %val4, <4 x double> %val5, <4 x double> %val6) {
; CHECK-LABEL: fun34:		; CHECK-LABEL: fun34:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf [[REG0:%v[0-9]+]], %v27, %v27		; CHECK-DAG: vmrlf [[REG0:%v[0-9]+]], %v27, %v27
; CHECK-NEXT: vmrlf [[REG1:%v[0-9]+]], %v25, %v25		; CHECK-DAG: vmrlf [[REG1:%v[0-9]+]], %v25, %v25
; CHECK-NEXT: vldeb [[REG2:%v[0-9]+]], [[REG0]]		; CHECK-DAG: vldeb [[REG2:%v[0-9]+]], [[REG0]]
; CHECK-NEXT: vldeb [[REG3:%v[0-9]+]], [[REG1]]		; CHECK-DAG: vldeb [[REG3:%v[0-9]+]], [[REG1]]
; CHECK-NEXT: vfchdb [[REG4:%v[0-9]+]], [[REG3]], [[REG2]]		; CHECK-DAG: vfchdb [[REG4:%v[0-9]+]], [[REG3]], [[REG2]]
; CHECK-NEXT: vmrhf [[REG5:%v[0-9]+]], %v27, %v27		; CHECK-DAG: vmrhf [[REG5:%v[0-9]+]], %v27, %v27
; CHECK-NEXT: vmrhf [[REG6:%v[0-9]+]], %v25, %v25		; CHECK-DAG: vmrhf [[REG6:%v[0-9]+]], %v25, %v25
; CHECK-DAG: vldeb [[REG7:%v[0-9]+]], [[REG5]]		; CHECK-DAG: vldeb [[REG7:%v[0-9]+]], [[REG5]]
; CHECK-DAG: vl [[REG8:%v[0-9]+]], 176(%r15)		; CHECK-DAG: vl [[REG8:%v[0-9]+]], 176(%r15)
; CHECK-DAG: vldeb [[REG9:%v[0-9]+]], [[REG6]]		; CHECK-DAG: vldeb [[REG9:%v[0-9]+]], [[REG6]]
; CHECK-DAG: vl [[REG10:%v[0-9]+]], 160(%r15)		; CHECK-DAG: vl [[REG10:%v[0-9]+]], 160(%r15)
; CHECK-DAG: vfchdb [[REG11:%v[0-9]+]], [[REG9]], [[REG7]]		; CHECK-DAG: vfchdb [[REG11:%v[0-9]+]], [[REG9]], [[REG7]]
; CHECK-DAG: vpkg [[REG12:%v[0-9]+]], [[REG11]], [[REG4]]		; CHECK-DAG: vpkg [[REG12:%v[0-9]+]], [[REG11]], [[REG4]]
; CHECK-DAG: vuphf [[REG13:%v[0-9]+]], [[REG12]]		; CHECK-DAG: vuphf [[REG13:%v[0-9]+]], [[REG12]]
; CHECK-DAG: vmrlg [[REG14:%v[0-9]+]], [[REG12]], [[REG12]]		; CHECK-DAG: vmrlg [[REG14:%v[0-9]+]], [[REG12]], [[REG12]]
; CHECK-NEXT: vfchdb [[REG15:%v[0-9]+]], %v24, %v28		; CHECK-NEXT: vfchdb [[REG15:%v[0-9]+]], %v24, %v28
; CHECK-NEXT: vfchdb [[REG16:%v[0-9]+]], %v26, %v30		; CHECK-NEXT: vfchdb [[REG16:%v[0-9]+]], %v26, %v30
; CHECK-NEXT: vuphf [[REG17:%v[0-9]+]], [[REG14]]		; CHECK-NEXT: vuphf [[REG17:%v[0-9]+]], [[REG14]]
; CHECK-NEXT: vn [[REG18:%v[0-9]+]], [[REG16]], [[REG17]]		; CHECK-DAG: vn [[REG18:%v[0-9]+]], [[REG16]], [[REG17]]
; CHECK-NEXT: vn [[REG19:%v[0-9]+]], [[REG15]], [[REG13]]		; CHECK-DAG: vn [[REG19:%v[0-9]+]], [[REG15]], [[REG13]]
; CHECK-NEXT: vsel %v24, %v29, [[REG10]], [[REG19]]		; CHECK-NEXT: vsel %v24, %v29, [[REG10]], [[REG19]]
; CHECK-NEXT: vsel %v26, %v31, [[REG8]], [[REG18]]		; CHECK-NEXT: vsel %v26, %v31, [[REG8]], [[REG18]]
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14
;		;
; CHECK-Z14-LABEL: fun34:		; CHECK-Z14-LABEL: fun34:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchsb %v4, %v25, %v27		; CHECK-Z14-NEXT: vfchsb [[REG0:%v[0-9]+]], %v25, %v27
; CHECK-Z14-NEXT: vuphf %v5, %v4		; CHECK-Z14-NEXT: vuphf [[REG1:%v[0-9]+]], [[REG0]]
; CHECK-Z14-NEXT: vmrlg %v4, %v4, %v4		; CHECK-Z14-NEXT: vmrlg [[REG0]], [[REG0]], [[REG0]]
; CHECK-Z14-NEXT: vfchdb %v2, %v24, %v28		; CHECK-Z14-NEXT: vfchdb [[REG2:%v[0-9]+]], %v24, %v28
; CHECK-Z14-NEXT: vfchdb %v3, %v26, %v30		; CHECK-Z14-NEXT: vfchdb [[REG3:%v[0-9]+]], %v26, %v30
; CHECK-Z14-NEXT: vuphf %v4, %v4		; CHECK-Z14-NEXT: vuphf [[REG0]], [[REG0]]
; CHECK-Z14-NEXT: vl %v0, 176(%r15)		; CHECK-Z14-NEXT: vl %v0, 176(%r15)
; CHECK-Z14-NEXT: vl %v1, 160(%r15)		; CHECK-Z14-NEXT: vl [[REG4:%v[0-9]+]], 160(%r15)
; CHECK-Z14-NEXT: vn %v3, %v3, %v4		; CHECK-Z14-DAG: vn [[REG5:%v[0-9]+]], [[REG3]], [[REG0]]
; CHECK-Z14-NEXT: vn %v2, %v2, %v5		; CHECK-Z14-DAG: vn [[REG6:%v[0-9]+]], [[REG2]], [[REG1]]
; CHECK-Z14-NEXT: vsel %v24, %v29, %v1, %v2		; CHECK-Z14-NEXT: vsel %v24, %v29, [[REG4]], [[REG6]]
; CHECK-Z14-NEXT: vsel %v26, %v31, %v0, %v3		; CHECK-Z14-NEXT: vsel %v26, %v31, %v0, [[REG5]]
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14
%cmp0 = fcmp ogt <4 x double> %val1, %val2		%cmp0 = fcmp ogt <4 x double> %val1, %val2
%cmp1 = fcmp ogt <4 x float> %val3, %val4		%cmp1 = fcmp ogt <4 x float> %val3, %val4
%and = and <4 x i1> %cmp0, %cmp1		%and = and <4 x i1> %cmp0, %cmp1
%sel = select <4 x i1> %and, <4 x double> %val5, <4 x double> %val6		%sel = select <4 x i1> %and, <4 x double> %val5, <4 x double> %val6
ret <4 x double> %sel		ret <4 x double> %sel
}		}

test/CodeGen/SystemZ/vec-cmpsel.ll

[310 lines not shown]
%cmp = icmp eq <4 x i64> %val1, %val2		%cmp = icmp eq <4 x i64> %val1, %val2
%sel = select <4 x i1> %cmp, <4 x i64> %val3, <4 x i64> %val4		%sel = select <4 x i1> %cmp, <4 x i64> %val3, <4 x i64> %val4
ret <4 x i64> %sel		ret <4 x i64> %sel
}		}

define <2 x float> @fun25(<2 x float> %val1, <2 x float> %val2, <2 x float> %val3, <2 x float> %val4) {		define <2 x float> @fun25(<2 x float> %val1, <2 x float> %val2, <2 x float> %val3, <2 x float> %val4) {
; CHECK-LABEL: fun25:		; CHECK-LABEL: fun25:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf %v0, %v26, %v26		; CHECK-DAG: vmrlf %v0, %v26, %v26
; CHECK-NEXT: vmrlf %v1, %v24, %v24		; CHECK-DAG: vmrlf %v1, %v24, %v24
; CHECK-NEXT: vldeb %v0, %v0		; CHECK-DAG: vldeb %v0, %v0
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb %v1, %v1
; CHECK-NEXT: vfchdb %v0, %v1, %v0		; CHECK-DAG: vfchdb [[REG0:%v[0-9]+]], %v1, %v0
; CHECK-NEXT: vmrhf %v1, %v26, %v26		; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vmrhf %v2, %v24, %v24		; CHECK-DAG: vmrhf [[REG2:%v[0-9]+]], %v24, %v24
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb [[REG1]], [[REG1]]
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG2]], [[REG2]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb [[REG3:%v[0-9]+]], [[REG2]], [[REG1]]
; CHECK-NEXT: vpkg %v0, %v1, %v0		; CHECK-NEXT: vpkg %v0, [[REG3]], [[REG0]]
; CHECK-NEXT: vsel %v24, %v28, %v30, %v0		; CHECK-NEXT: vsel %v24, %v28, %v30, %v0
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14

; CHECK-Z14-LABEL: fun25:		; CHECK-Z14-LABEL: fun25:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26		; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26
; CHECK-Z14-NEXT: vsel %v24, %v28, %v30, %v0		; CHECK-Z14-NEXT: vsel %v24, %v28, %v30, %v0
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14

%cmp = fcmp ogt <2 x float> %val1, %val2		%cmp = fcmp ogt <2 x float> %val1, %val2
%sel = select <2 x i1> %cmp, <2 x float> %val3, <2 x float> %val4		%sel = select <2 x i1> %cmp, <2 x float> %val3, <2 x float> %val4
ret <2 x float> %sel		ret <2 x float> %sel
}		}

define <2 x double> @fun26(<2 x float> %val1, <2 x float> %val2, <2 x double> %val3, <2 x double> %val4) {		define <2 x double> @fun26(<2 x float> %val1, <2 x float> %val2, <2 x double> %val3, <2 x double> %val4) {
; CHECK-LABEL: fun26:		; CHECK-LABEL: fun26:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf %v0, %v26, %v26		; CHECK-DAG: vmrlf %v0, %v26, %v26
; CHECK-NEXT: vmrlf %v1, %v24, %v24		; CHECK-DAG: vmrlf %v1, %v24, %v24
; CHECK-NEXT: vldeb %v0, %v0		; CHECK-DAG: vldeb %v0, %v0
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb %v1, %v1
; CHECK-NEXT: vfchdb %v0, %v1, %v0		; CHECK-DAG: vfchdb %v0, %v1, %v0
; CHECK-NEXT: vmrhf %v1, %v26, %v26		; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vmrhf %v2, %v24, %v24		; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb [[REG0]], [[REG0]]
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG1]], [[REG1]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
; CHECK-NEXT: vpkg %v0, %v1, %v0		; CHECK-NEXT: vpkg %v0, %v1, %v0
; CHECK-NEXT: vuphf %v0, %v0		; CHECK-NEXT: vuphf %v0, %v0
; CHECK-NEXT: vsel %v24, %v28, %v30, %v0		; CHECK-NEXT: vsel %v24, %v28, %v30, %v0
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14

; CHECK-Z14-LABEL: fun26:		; CHECK-Z14-LABEL: fun26:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26		; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26
[21 lines not shown]
ret <2 x float> %sel		ret <2 x float> %sel
}		}

define <4 x float> @fun28(<4 x float> %val1, <4 x float> %val2, <4 x float> %val3, <4 x float> %val4) {		define <4 x float> @fun28(<4 x float> %val1, <4 x float> %val2, <4 x float> %val3, <4 x float> %val4) {
; CHECK-LABEL: fun28:		; CHECK-LABEL: fun28:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf %v0, %v26, %v26		; CHECK-NEXT: vmrlf %v0, %v26, %v26
; CHECK-NEXT: vmrlf %v1, %v24, %v24		; CHECK-NEXT: vmrlf %v1, %v24, %v24
; CHECK-NEXT: vldeb %v0, %v0		; CHECK-DAG: vldeb %v0, %v0
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb %v1, %v1
; CHECK-NEXT: vfchdb %v0, %v1, %v0		; CHECK-DAG: vfchdb %v0, %v1, %v0
; CHECK-NEXT: vmrhf %v1, %v26, %v26		; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vmrhf %v2, %v24, %v24		; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb [[REG0]], [[REG0]]
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG1]], [[REG1]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
; CHECK-NEXT: vpkg %v0, %v1, %v0		; CHECK-NEXT: vpkg %v0, %v1, %v0
; CHECK-NEXT: vsel %v24, %v28, %v30, %v0		; CHECK-NEXT: vsel %v24, %v28, %v30, %v0
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14

; CHECK-Z14-LABEL: fun28:		; CHECK-Z14-LABEL: fun28:
; CHECK-Z14: # %bb.0:		; CHECK-Z14: # %bb.0:
; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26		; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26
; CHECK-Z14-NEXT: vsel %v24, %v28, %v30, %v0		; CHECK-Z14-NEXT: vsel %v24, %v28, %v30, %v0
; CHECK-Z14-NEXT: br %r14		; CHECK-Z14-NEXT: br %r14

%cmp = fcmp ogt <4 x float> %val1, %val2		%cmp = fcmp ogt <4 x float> %val1, %val2
%sel = select <4 x i1> %cmp, <4 x float> %val3, <4 x float> %val4		%sel = select <4 x i1> %cmp, <4 x float> %val3, <4 x float> %val4
ret <4 x float> %sel		ret <4 x float> %sel
}		}

define <4 x double> @fun29(<4 x float> %val1, <4 x float> %val2, <4 x double> %val3, <4 x double> %val4) {		define <4 x double> @fun29(<4 x float> %val1, <4 x float> %val2, <4 x double> %val3, <4 x double> %val4) {
; CHECK-LABEL: fun29:		; CHECK-LABEL: fun29:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vmrlf %v0, %v26, %v26		; CHECK-NEXT: vmrlf %v0, %v26, %v26
; CHECK-NEXT: vmrlf %v1, %v24, %v24		; CHECK-NEXT: vmrlf %v1, %v24, %v24
; CHECK-NEXT: vldeb %v0, %v0		; CHECK-DAG: vldeb %v0, %v0
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb %v1, %v1
; CHECK-NEXT: vfchdb %v0, %v1, %v0		; CHECK-DAG: vfchdb %v0, %v1, %v0
; CHECK-NEXT: vmrhf %v1, %v26, %v26		; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
; CHECK-NEXT: vmrhf %v2, %v24, %v24		; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
; CHECK-NEXT: vldeb %v1, %v1		; CHECK-DAG: vldeb [[REG0]], [[REG0]]
; CHECK-NEXT: vldeb %v2, %v2		; CHECK-DAG: vldeb [[REG1]], [[REG1]]
; CHECK-NEXT: vfchdb %v1, %v2, %v1		; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
; CHECK-NEXT: vpkg [[REG0:%v[0-9]+]], %v1, %v0		; CHECK-DAG: vpkg [[REG0:%v[0-9]+]], %v1, %v0
; CHECK-DAG: vmrlg [[REG1:%v[0-9]+]], [[REG0]], [[REG0]]		; CHECK-DAG: vmrlg [[REG1:%v[0-9]+]], [[REG0]], [[REG0]]
; CHECK-DAG: vuphf [[REG1]], [[REG1]]		; CHECK-DAG: vuphf [[REG1]], [[REG1]]
; CHECK-DAG: vuphf [[REG2:%v[0-9]+]], [[REG0]]		; CHECK-DAG: vuphf [[REG2:%v[0-9]+]], [[REG0]]
; CHECK-NEXT: vsel %v24, %v28, %v25, [[REG2]]		; CHECK-NEXT: vsel %v24, %v28, %v25, [[REG2]]
; CHECK-NEXT: vsel %v26, %v30, %v27, [[REG1]]		; CHECK-NEXT: vsel %v26, %v30, %v27, [[REG1]]
; CHECK-NEXT: br %r14		; CHECK-NEXT: br %r14

; CHECK-Z14-LABEL: fun29:		; CHECK-Z14-LABEL: fun29:
[76 lines not shown]

test/CodeGen/SystemZ/vec-ctpop-01.ll

[24 lines not shown]
	; CHECK: br %r14			; CHECK: br %r14

	%popcnt = call <8 x i16> @llvm.ctpop.v8i16(<8 x i16> %a)			%popcnt = call <8 x i16> @llvm.ctpop.v8i16(<8 x i16> %a)
	ret <8 x i16> %popcnt			ret <8 x i16> %popcnt
	}			}

	define <4 x i32> @f3(<4 x i32> %a) {			define <4 x i32> @f3(<4 x i32> %a) {
	; CHECK-LABEL: f3:			; CHECK-LABEL: f3:
	; CHECK: vpopct [[T1:%v[0-9]+]], %v24, 0			; CHECK-DAG: vpopct [[T1:%v[0-9]+]], %v24, 0
	; CHECK: vgbm [[T2:%v[0-9]+]], 0			; CHECK-DAG: vgbm [[T2:%v[0-9]+]], 0
	; CHECK: vsumb %v24, [[T1]], [[T2]]			; CHECK: vsumb %v24, [[T1]], [[T2]]
	; CHECK: br %r14			; CHECK: br %r14

	%popcnt = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> %a)			%popcnt = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> %a)
	ret <4 x i32> %popcnt			ret <4 x i32> %popcnt
	}			}

	define <2 x i64> @f4(<2 x i64> %a) {			define <2 x i64> @f4(<2 x i64> %a) {
	; CHECK-LABEL: f4:			; CHECK-LABEL: f4:
	; CHECK: vpopct [[T1:%v[0-9]+]], %v24, 0			; CHECK-DAG: vpopct [[T1:%v[0-9]+]], %v24, 0
	; CHECK: vgbm [[T2:%v[0-9]+]], 0			; CHECK-DAG: vgbm [[T2:%v[0-9]+]], 0
	; CHECK: vsumb [[T3:%v[0-9]+]], [[T1]], [[T2]]			; CHECK: vsumb [[T3:%v[0-9]+]], [[T1]], [[T2]]
	; CHECK: vsumgf %v24, [[T3]], [[T2]]			; CHECK: vsumgf %v24, [[T3]], [[T2]]
	; CHECK: br %r14			; CHECK: br %r14

	%popcnt = call <2 x i64> @llvm.ctpop.v2i64(<2 x i64> %a)			%popcnt = call <2 x i64> @llvm.ctpop.v2i64(<2 x i64> %a)
	ret <2 x i64> %popcnt			ret <2 x i64> %popcnt
	}			}

Revision Contents

Diff 136042

include/llvm/CodeGen/MachineScheduler.h

lib/CodeGen/MachineScheduler.cpp

lib/Target/AMDGPU/SIMachineScheduler.cpp

lib/Target/SystemZ/SystemZMachineScheduler.h

lib/Target/SystemZ/SystemZMachineScheduler.cpp

lib/Target/SystemZ/SystemZTargetMachine.cpp

test/CodeGen/SystemZ/vec-cmp-cmp-logic-select.ll

test/CodeGen/SystemZ/vec-cmpsel.ll

test/CodeGen/SystemZ/vec-ctpop-01.ll
