This is an archive of the discontinued LLVM Phabricator instance.

Differential D43329

[SystemZ, MachineScheduler] Refactor GenericScheduler::tryCandidate() to reuse parts in a new SystemZ scheduling strategy.
Needs Review · Public

Authored by jonpa on Feb 15 2018, 2:31 AM.

Details

Reviewers

uweigand

Summary

Short: Patch with a new SystemZ SchedStrategy for Pre-RA scheduling, with a refactored GenericScheduler to reuse all of tryCandidate(). Advice needed on how to know that an SU uses the vector unit.

In continuation of the discussion we had last November (http://lists.llvm.org/pipermail/llvm-dev/2017-November/119250.html), I am now much closer to actually committing something for the SystemZ backend. As then, I want to do an extra latency check on specific SUs in tryCandidate(). If a change to the generic code were acceptable (as Andy previously explained quite clearly, it is not), it might have looked something like:

--- a/lib/CodeGen/MachineScheduler.cpp
+++ b/lib/CodeGen/MachineScheduler.cpp
@@ -2968,14 +2968,20 @@ void GenericScheduler::tryCandidate(SchedCandidate &Cand,
   // Avoid increasing the max pressure of the entire region.
   if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CurrentMax,
                                                Cand.RPDelta.CurrentMax,
                                                TryCand, Cand, RegMax, TRI,
                                                DAG->MF))
     return;

+  // Let target give priority to latency for certain instructions, e.g. those
+  // using a particular pipeline.
+  if (RegionPolicy.LatencyBoost && ST.tryAggrLatency(TryCand.SU) &&
+      tryLatency(TryCand, Cand, *Zone))
+    return;
+
   if (SameBoundary) {
     // Avoid critical resource consumption and balance the schedule.

This is quite simple, but it is another "knob" to turn, instead of a truly flexible tryCandidate() method. I agree with Andy that tuning the pre-RA scheduling heuristics is important enough to motivate a target to derive its own strategy, so that's what I did now instead by:

Refactor tryCandidate() into digestible parts so that a target that wants to override this method with minor modifications can do so easily. I tried to do a separation into the logical parts, but I am of course open to alternatives, including name changing of the new methods.

tryLatency() becomes a protected method of GenericSchedulerBase instead of a static function in MachineScheduler.cpp, so that the derived SystemZ strategy can reuse it. More static functions may follow this pattern when needed, such as tryLess(), tryGreater(), etc.

A new class SystemZPreRASchedStrategy. Its tryCandidate() method basically needs to check if SU uses the vector unit (Z13_VecUnit / Z14_VecUnit), and if so call tryLatency(). I am not sure what the best way to do this check would be, and the current way of using a string comparison on the MCProcResourceDesc::Name is of course temporary. It would have been nice to have enums for the execution units, but Tablegen does not print them. Another alternative might be to use a TSFlag bit for this in the instruction descriptor, but that would mean duplicating information unnecessarily. What is the recommended way of identifying a specific processor resource? I don't see this being done anywhere. Is there another way, such as giving the VecUnit some type of flag or value?

Coming back to the original mail discussion, this will perhaps be somewhat awkward to maintain over time - in order to keep up with the developments in the base class, one has to 1) diff both tryCandidate(), and also 2) check what calls to DAG->addMutation() are done in the generic createGenericSchedLive().

As before, there are some nice improvements on benchmarks with this :-)

SystemZ tests updated.

Diff Detail

Event Timeline

jonpa created this revision. Feb 15 2018, 2:31 AM

Herald added subscribers: javed.absar, MatzeB. Feb 15 2018, 2:31 AM

Refactor tryCandidate() into digestible parts so that a target that wants to override this method with minor modifications can do so easily. I tried to do a separation into the logical parts, but I am of course open to alternatives, including name changing of the new methods.

Thanks for picking this up! This is along the lines of what I outlined in the email thread and what I was planning to do, but I have not found the time so far.

I left some inline comments and I think it would be best if you could split this up into a MachineScheduler patch and a SystemZ patch. One advantage of having target specific schedulers IMO should be that target maintainers can more easily accept changes without worrying about other targets.

Coming back to the original mail discussion, this will perhaps be somewhat awkward to maintain over time - in order to keep up with the developments in the base class, one has to 1) diff both tryCandidate(), and also 2) check what calls to DAG->addMutation() are done in the generic createGenericSchedLive().

I do not think this will be a huge issue, as there are not many changes to the generic heuristics (and with more target-specific schedulers, I expect even less need to tweak the generic heuristics). As for DAG mutations, at least the AArch64 backend already manages that itself and I do not think this has been a problem so far.

include/llvm/CodeGen/MachineScheduler.h
894–895 ↗	(On Diff #134392)	I am not sure what the benefit of that is and at the moment it seems like one setting too much to me. I think it is not unreasonable to expect people to provide their own tryCandidate implementation if they want custom latency heuristics.
969 ↗	(On Diff #134392)	Usually camel case is used for function names, so I think it would be better to have tryCandidateRegPressure - same for other methods below.
981 ↗	(On Diff #134392)	I think for now it seems like making tryCandidate a virtual function gives people enough freedom to ship their own heuristics while re-using most of the existing bits. I don't see the benefit of making the helper functions class members however. Also, tryCandidate should not modify the scheduler state, so I think it should be `const`. And as this is part of the interface now and we expect people to extend it, it would be good to document it.
lib/CodeGen/MachineScheduler.cpp
2602 ↗	(On Diff #134392)	Splitting the code into multiple helper functions would be great opportunity to document the heuristics :)

jonpa added inline comments. Feb 17 2018, 9:30 AM

include/llvm/CodeGen/MachineScheduler.h
894–895 ↗	(On Diff #134392)	Well... why have duplicated code in a Target backend, if it can be helped? To me, it is generally reasonable to assume that a target starts out with enabling the generic mischeduler, and then as a next step experiment with it by adding / removing / reordering heuristics. What harm is it then to provide the means to do so with protected member methods?
981 ↗	(On Diff #134392)	How are those bits going to be reused if not by inheritance? How do you picture SystemZ adding this simple heuristic (see top of page) while reusing all else?

fhahn added inline comments. Feb 19 2018, 5:58 AM

include/llvm/CodeGen/MachineScheduler.h
894–895 ↗	(On Diff #134392)	IMO with too much subclassing and overriding, it can be quite complicated to see what's going on - but I think that comes down to personal preference, mostly. And that's just me, I suppose with those kinds of things @MatzeB and @atrick 's thoughts are much more important ;) I also think it would be nice to have all heuristic functions somewhere together, to make it easier to keep track of them.
981 ↗	(On Diff #134392)	"How do you picture SystemZ adding this simple heuristic (see top of page) while reusing all else?" They do not have to be member functions; they could be regular exported functions. They don't access the scheduler object (I think), so I don't think they would need to be member functions. To me, it seems simpler not to couple the scheduler and the heuristics too much, but that's again mostly personal preference.

jonpa added inline comments. Feb 19 2018, 8:13 AM

include/llvm/CodeGen/MachineScheduler.h
894–895 ↗	(On Diff #134392)	I tried adding const to all the new methods, which seemed to increase readability at least to me. tryLatency() is used both pre- and post-RA, so it needs to be in the base. I guess tryGreater() and other such small methods should also all go in the base class. Personally, I think it makes sense to have these type of functions as part of the class, since after all that is a scheduling strategy class for which these methods actually are very central. But like you said as well, that's just me...
981 ↗	(On Diff #134392)	I tried removing the 'static' from such a function in MachineScheduler.cpp, and then adding an 'extern' declaration in SystemZMachineScheduler.cpp, but that led to a lot of linker errors. How would you do this?

fhahn added inline comments. Feb 19 2018, 8:55 AM

include/llvm/CodeGen/MachineScheduler.h
981 ↗	(On Diff #134392)	ah sorry. I meant removing static and moving the declaration to the header file.
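
To make the suggestion concrete, here is a minimal self-contained sketch of a helper that is declared in a shared header and defined without `static`. `CandReason` and `SchedCandidate` are simplified stand-ins rather than LLVM's real definitions, and the comparison logic is an assumption about how a `tryLess()`/`tryGreater()` style helper behaves; in the real tree the declarations would presumably sit in MachineScheduler.h and the definitions in MachineScheduler.cpp.

```cpp
#include <cassert>

// Simplified stand-ins for the scheduler types (hypothetical, not LLVM's own).
// Lower enumerator values stand for higher-priority reasons.
enum CandReason { NoCand, PhysRegCopy, RegExcess, Stall, Cluster, Weak, NodeOrder };

struct SchedCandidate {
  CandReason Reason = NoCand;
};

// --- Part that would go in the shared header ---
bool tryLess(int TryVal, int CandVal, SchedCandidate &TryCand,
             SchedCandidate &Cand, CandReason Reason);
bool tryGreater(int TryVal, int CandVal, SchedCandidate &TryCand,
                SchedCandidate &Cand, CandReason Reason);

// --- Part that would go in the .cpp file, with 'static' removed ---
// Returns true if this heuristic settled the comparison in either direction.
bool tryLess(int TryVal, int CandVal, SchedCandidate &TryCand,
             SchedCandidate &Cand, CandReason Reason) {
  if (TryVal < CandVal) {
    TryCand.Reason = Reason; // TryCand wins on this heuristic.
    return true;
  }
  if (TryVal > CandVal) {
    // Cand stays the pick; record the stronger reason for keeping it.
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  return false; // Tie: let the next heuristic decide.
}

bool tryGreater(int TryVal, int CandVal, SchedCandidate &TryCand,
                SchedCandidate &Cand, CandReason Reason) {
  if (TryVal > CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal < CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  return false;
}
```

Because the helpers only touch the two candidates passed in, nothing ties them to a particular strategy class, which is the decoupling argued for above.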

General thoughts:

Factoring code textually across components is a non-goal in itself. The resulting abstraction is often less clear and harder to maintain. It's always tempting to factor code that appears repetitious but isn't actually a serious maintainability problem. Code factoring should be driven by logical boundaries, not textual repetition.

Inheritance as a code reuse strategy usually results in less clarity and less maintainability. I prefer object composition or free standing functions and only use inheritance when polymorphism is required.

(I know that's not very helpful advice without examples, so just take it FWIW.)

MachineScheduler thoughts:

Creating small utility functions that can be easily combined into a useful scheduling strategy for targets is great.

Breaking up the top-level tryCandidate and moving the boilerplate and pesky details into smaller helpers looks good.

So, on these points, the patch is a positive contribution.

An important design principle is that when GenericScheduler's implementation changes it should *not* affect targets that have already been tuned and are overriding the scheduling strategy.

See the problem that inheritance creates? As a code reuse strategy, it violates the decoupling between Target and CodeGen.

In particular, arbitrarily grouping the heuristics into RegPressure vs. RegPressure2 and Latency vs. Latency2 is unhelpful. Each heuristic entry point that you expose to the Target should have clear semantics that aren't likely to change as GenericScheduler evolves. The contract for each entry point should be clear.

So, for example, tryCandidateClusteredWeak makes sense. It isn't going to be reimplemented as something else. You can't do this for "Latency" and "RegPressure" because that could mean a number of different things. I think the challenge here is to come up with a name for each heuristic that makes sense to expose outside of the GenericScheduler implementation.

You do not *need* to expose all heuristics used by the GenericScheduler directly to targets. It's not hard for targets to copy the few lines of code for a heuristic. Copying the code doesn't create a maintenance problem, it solves one. Just expose the heuristics that are clear and easy to describe.

Creating small utility functions that can be easily combined into a useful scheduling strategy for targets is great.

I followed Florian's and your suggestion and simply removed the static keyword and put the declaration in the header file. I then realized I had to pass some member objects as arguments instead when needed. For instance, tryCandidate_RegPress() calls DAG->isTrackingPressure(), which is a ScheduleDAGMILive method. Since the SchedBoundary *Zone argument only has a ScheduleDAGMI *DAG member, it cannot be used directly. Also, tryCandidate_Latency() needs the Rem object passed.

Regardless of which utility functions we end up with, I suppose this is acceptable and still preferred to inheritance?

An important design principle is that when GenericScheduler's implementation changes it should *not* affect targets that have already been tuned and are overriding the scheduling strategy.

Just to express my thoughts: I see this point in the sense that if a target truly had a perfect set-in-stone tuning, it would be disastrous to change anything in an uncontrollable way. But given that mischeduler is relatively new and evolving, and a target may merely have been able to improve benchmarks with a minor modification, I think it's more natural to think that a target would really want to be in on the improvements to come in the future. In other words, *not* to decouple. Take register pressure, for instance. I don't think a target that does not have its own register pressure heuristics would want to fall behind if the common code changes in the future. That change should then be general goodness, or it should be put in a target specific strategy, right?

So to me personally, working on just one backend, it would be slightly preferred to have e.g. the tryCandidate_RegPress() function in the target strategy, so that if somebody improved it, my target would immediately get that improvement. I don't want to decouple this, since I was merely adding a heuristic with lesser priority.

On the other hand, providing just the smaller utility functions and then doing a copy-and-paste of tryCandidate() should probably work quite well in practice, as you say. In the very long run, this would also give maximum flexibility for each target.
I suppose then we could just have one or two of those new methods, like tryCandidate_Clustered_Weak().

BTW, I am still open to suggestions on the SystemZ specific issue of answering the question "does this SU use MCProcResource X?" -- see top of page.

Just to express my thoughts: I see this point in the sense that if a target truly had a perfect set-in-stone tuning, it would be disastrous to change anything in an uncontrollable way. But given that mischeduler is relatively new and evolving, and a target may merely have been able to improve benchmarks with a minor modification, I think it's more natural to think that a target would really want to be in on the improvements to come in the future. In other words, *not* to decouple. Take register pressure, for instance. I don't think a target that does not have its own register pressure heuristics would want to fall behind if the common code changes in the future. That change should then be general goodness, or it should be put in a target specific strategy, right?

So to me personally, working on just one backend, it would be slightly preferred to have e.g. the tryCandidate_RegPress() function in the target strategy, so that if somebody improved it, my target would immediately get that improvement. I don't want to decouple this, since I was merely adding a heuristic with lesser priority.

On the other hand, providing just the smaller utility functions and then doing a copy-and-paste of tryCandidate() should probably work quite well in practice, as you say. In the very long run, this would also give maximum flexibility for each target.
I suppose then we could just have one or two of those new methods, like tryCandidate_Clustered_Weak().

OK. I think it's very hard to group heuristics in a meaningful way. Some repetition of code on the target side will make it easier to maintain. Backing up a bit, my broader concern is that when Targets become too dependent on the incidental behavior of machine independent code, it really inhibits changes to the machine independent code.

BTW, I am still open to suggestions on the SystemZ specific issue of answering the question "does this SU use MCProcResource X?" -- see top of page.

I haven't worked with the code in years, but here's my take. In your subtarget you can define your own symbolic constant, like SystemZVectorUnitIdx = 6. You can assert that 0 == strcmp(getProcResource(SystemZVectorUnitIdx).Name, "Z14_VecUnit"). In the rare event that you add resources to this subtarget, the assert will force you to update this constant.

You can define a method on your Subtarget roughly like this:

auto *resource = getWriteProcResBegin(SC);
if (resource->ProcResourceIdx == SystemZVectorUnitIdx)

Hope that works.

-Andy
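
As a rough illustration of the approach described above, the following self-contained sketch uses stand-in structs rather than the real LLVM classes. `ProcResourceDesc`, `WriteProcResEntry`, `SchedModelStub`, `indexNamesVectorUnit`, and `usesResource` are hypothetical names; only `ProcResourceIdx`, `Name`, the index 6, and the `VecUnit` resource names come from this discussion. It scans every resource entry of a scheduling class rather than only the first one, and guards the hard-coded index by checking the resource name.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Stand-ins for the scheduling-model records (hypothetical simplifications).
struct ProcResourceDesc {
  const char *Name;
};

struct WriteProcResEntry {
  unsigned ProcResourceIdx;
};

struct SchedModelStub {
  std::vector<ProcResourceDesc> Resources;
  const ProcResourceDesc &getProcResource(unsigned Idx) const {
    return Resources[Idx];
  }
};

// Symbolic constant for the vector unit, as suggested in the comment above.
constexpr unsigned SystemZVectorUnitIdx = 6;

// Guard for the hard-coded index: true only if the resource at that index
// still looks like the vector unit. Intended to be used inside an assert.
bool indexNamesVectorUnit(const SchedModelStub &Model) {
  return SystemZVectorUnitIdx < Model.Resources.size() &&
         std::strstr(Model.getProcResource(SystemZVectorUnitIdx).Name,
                     "VecUnit") != nullptr;
}

// Answers "does this scheduling class use resource Idx?" by scanning all of
// its write-resource entries.
bool usesResource(const std::vector<WriteProcResEntry> &Writes, unsigned Idx) {
  for (const WriteProcResEntry &W : Writes)
    if (W.ProcResourceIdx == Idx)
      return true;
  return false;
}
```

A strategy could evaluate `assert(indexNamesVectorUnit(Model))` once and then call `usesResource(Writes, SystemZVectorUnitIdx)` per candidate, which avoids a string comparison on every query.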

Thanks for review and advice, patch updated.

OK. I think it's very hard to group heuristics in a meaningful way. Some repetition of code on the target side will make it easier to maintain. Backing up a bit, my broader concern is that when Targets become too dependent on the incidental behavior of machine independent code, it really inhibits changes to the machine independent code.

I have updated the patch per your guidelines where the target copies tryCandidate() while reusing only small utility functions like tryLess() etc, which are now declared in MachineScheduler.h.
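
The resulting shape, a target-owned tryCandidate() assembled from small shared helpers with one target-specific step added, can be sketched as follows. This is a simplified, self-contained model: `Candidate`, its fields, the `VectorLatency` reason, and `tryCandidateSketch` are hypothetical stand-ins, and only a few of the real heuristics are represented.

```cpp
#include <cassert>

// Lower enumerator values stand for higher-priority reasons (stand-in enum).
enum CandReason { NoCand, RegExcess, Stall, VectorLatency, NodeOrder };

// Hypothetical, flattened view of what the heuristics look at.
struct Candidate {
  CandReason Reason = NoCand;
  int PressureExcess = 0;      // register pressure above the target limit
  int StallCycles = 0;         // cycles the node would stall if picked now
  int Height = 0;              // latency of the remaining dependence chain
  bool UsesVectorUnit = false; // target-specific property
  unsigned NodeNum = 0;        // original instruction order
};

// Shared utility helpers, modeled on the tryLess()/tryGreater() pattern.
bool tryLess(int TryVal, int CandVal, Candidate &TryCand, Candidate &Cand,
             CandReason Reason) {
  if (TryVal < CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal > CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  return false;
}

bool tryGreater(int TryVal, int CandVal, Candidate &TryCand, Candidate &Cand,
                CandReason Reason) {
  // "Greater wins" is "less wins" with the two values swapped.
  return tryLess(CandVal, TryVal, TryCand, Cand, Reason);
}

// Target-owned copy of the heuristic chain. After the call, a TryCand.Reason
// other than NoCand means TryCand should replace Cand.
void tryCandidateSketch(Candidate &Cand, Candidate &TryCand) {
  if (tryLess(TryCand.PressureExcess, Cand.PressureExcess, TryCand, Cand,
              RegExcess))
    return;
  if (tryLess(TryCand.StallCycles, Cand.StallCycles, TryCand, Cand, Stall))
    return;
  // Target-specific step: prefer the longer remaining chain, but only when
  // the new candidate runs on the vector unit.
  if (TryCand.UsesVectorUnit &&
      tryGreater(TryCand.Height, Cand.Height, TryCand, Cand, VectorLatency))
    return;
  // Fall through to original instruction order.
  if (TryCand.NodeNum < Cand.NodeNum)
    TryCand.Reason = NodeOrder;
}
```

The position of the target-specific step in the chain is the tuning decision; since the chain lives in the target, moving it does not affect any other backend.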

... In your subtarget you can define your own symbolic constant...

OK, I tried that, which is at least a slight improvement over looking it up every time.

I had to wrap the AMDGPU tryLess() and tryGreater() methods in a local namespace to avoid conflict.

Do you still want me to split this into a separate patch without the SystemZ part? (You don't have to approve that)

Herald added subscribers: nhaehnle, arsenm. Feb 26 2018, 7:33 AM

This looks fine to me based on a quick review. I don't know if @fhahn or @MatzeB still want to weigh in. Not sure if anyone else needs to review the SystemZ specific code or if you effectively own that.

This revision is now accepted and ready to land. Feb 26 2018, 10:54 AM

Great, thanks Jonas! LGTM too.

lib/Target/AMDGPU/SIMachineScheduler.cpp
157 ↗	(On Diff #135902)	nit: not sure, should nested namespaces be separated by newlines?

Thanks for review!

Not sure if anyone else needs to review the SystemZ specific code or if you effectively own that.

Uli is the reviewer for the SystemZ part, as usual.

NFC update to put nested namespaces on separate lines.

lib/Target/AMDGPU/SIMachineScheduler.cpp
157 ↗	(On Diff #135902)	I copied this from the PowerPC backend, but looking at the "Namespace Indentation" section of the coding guidelines, it seems that at least in the example there newlines are used, so it seems you are right.

Hi Jonas,

are you planning on committing this patch in the near future? I suppose you are waiting until @uweigand had a look at the SystemZ specific changes?

Recently, a few people have been looking into tweaking scheduler heuristics, and after this change goes in we could easily add an experimental scheduler for AArch64 to experiment with tweaking heuristics.

Thanks again for putting up the patch!

Florian

In D43329#1064275, @fhahn wrote:

Hi Jonas,

are you planning on committing this patch in the near future? I suppose you are waiting until @uweigand had a look at the SystemZ specific changes?

Recently, a few people have been looking into tweaking scheduler heuristics, and after this change goes in we could easily add an experimental scheduler for AArch64 to experiment with tweaking heuristics.

Thanks again for putting up the patch!

Florian

I committed the common code parts as r329884. We are holding off on the SystemZ part for a bit, since on second thought we should probably enable bi-directional scheduling before enabling tryLatency() in a specialized case (without bi-directional scheduling, the second tryLatency() call is never made).

Thanks for review.

Since the common code changes have now been committed, this is an update to only keep the SystemZ specific parts.

This revision now requires review to proceed. Apr 12 2018, 1:27 AM

nhaehnle removed a subscriber: nhaehnle. Apr 12 2018, 8:32 AM

Revision Contents

Path	Size
lib/Target/SystemZ/SystemZMachineScheduler.h	15 lines
lib/Target/SystemZ/SystemZMachineScheduler.cpp	170 lines
lib/Target/SystemZ/SystemZTargetMachine.cpp	11 lines
test/CodeGen/SystemZ/vec-cmp-cmp-logic-select.ll	300 lines
test/CodeGen/SystemZ/vec-cmpsel.ll	76 lines
test/CodeGen/SystemZ/vec-ctpop-01.ll	8 lines

Diff 142126

lib/Target/SystemZ/SystemZMachineScheduler.h

Context not available.
	void releaseBottomNode(SUnit *SU) override {};	void releaseBottomNode(SUnit *SU) override {};
	};	};

		class SystemZPreRASchedStrategy : public GenericScheduler {
		const SystemZSubtarget *ST;

		// The VectorUnit index is 6 for both z13 and z14.
		const unsigned SystemZVectorUnitIdx = 6;

		public:
		SystemZPreRASchedStrategy(const MachineSchedContext *C) :
		GenericScheduler(C), ST(&C->MF->getSubtarget<SystemZSubtarget>()) {}

		void tryCandidate(SchedCandidate &Cand,
		SchedCandidate &TryCand,
		SchedBoundary *Zone) const override;
		};

	} // end namespace llvm	} // end namespace llvm

	#endif // LLVM_LIB_TARGET_SYSTEMZ_SYSTEMZMACHINESCHEDULER_H	#endif // LLVM_LIB_TARGET_SYSTEMZ_SYSTEMZMACHINESCHEDULER_H
Context not available.

lib/Target/SystemZ/SystemZMachineScheduler.cpp

Context not available.
	// Put all released SUs in the Available set.	// Put all released SUs in the Available set.
	Available.insert(SU);	Available.insert(SU);
	}	}


		//////////// Pre-RA scheduling

		// EXPERIMENTAL
		#include "llvm/Support/CommandLine.h"
		static cl::opt<bool> CHECK_ACYC("check-acyc", cl::init(false));
		static cl::opt<bool> VEC_LAT("vec-lat", cl::init(false));

		// This is mostly copied from MachineScheduler.cpp.
		void SystemZPreRASchedStrategy::
		tryCandidate(SchedCandidate &Cand,
		SchedCandidate &TryCand,
		SchedBoundary *Zone) const {
		// Initialize the candidate if needed.
		if (!Cand.isValid()) {
		TryCand.Reason = NodeOrder;
		return;
		}

		if (tryGreater(biasPhysRegCopy(TryCand.SU, TryCand.AtTop),
		biasPhysRegCopy(Cand.SU, Cand.AtTop),
		TryCand, Cand, PhysRegCopy))
		return;

		// Avoid exceeding the target's limit.
		if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.Excess,
		Cand.RPDelta.Excess,
		TryCand, Cand, RegExcess, TRI,
		DAG->MF))
		return;

		// Avoid increasing the max critical pressure in the scheduled region.
		if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CriticalMax,
		Cand.RPDelta.CriticalMax,
		TryCand, Cand, RegCritical, TRI,
		DAG->MF))
		return;

		// We only compare a subset of features when comparing nodes between
		// Top and Bottom boundary. Some properties are simply incomparable, in many
		// other instances we should only override the other boundary if something
		// is a clear good pick on one boundary. Skip heuristics that are more
		// "tie-breaking" in nature.
		bool SameBoundary = Zone != nullptr;
		if (SameBoundary) {
		// For loops that are acyclic path limited, aggressively schedule for
		// latency. Within an single cycle, whenever CurrMOps > 0, allow normal
		// heuristics to take precedence.
		if (Rem.IsAcyclicLatencyLimited && !Zone->getCurrMOps() &&
		tryLatency(TryCand, Cand, *Zone))
		return;

		// Prioritize instructions that read unbuffered resources by stall cycles.
		if (tryLess(Zone->getLatencyStallCycles(TryCand.SU),
		Zone->getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))
		return;
		}

		// Keep clustered nodes together to encourage downstream peephole
		// optimizations which may reduce resource requirements.
		//
		// This is a best effort to set things up for a post-RA pass. Optimizations
		// like generating loads of multiple registers should ideally be done within
		// the scheduler pass by combining the loads during DAG postprocessing.
		const SUnit *CandNextClusterSU =
		Cand.AtTop ? DAG->getNextClusterSucc() : DAG->getNextClusterPred();
		const SUnit *TryCandNextClusterSU =
		TryCand.AtTop ? DAG->getNextClusterSucc() : DAG->getNextClusterPred();
		if (tryGreater(TryCand.SU == TryCandNextClusterSU,
		Cand.SU == CandNextClusterSU,
		TryCand, Cand, Cluster))
		return;

		if (SameBoundary) {
		// Weak edges are for clustering and other constraints.
		if (tryLess(getWeakLeft(TryCand.SU, TryCand.AtTop),
		getWeakLeft(Cand.SU, Cand.AtTop),
		TryCand, Cand, Weak))
		return;
		}

		// Avoid increasing the max pressure of the entire region.
		if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CurrentMax,
		Cand.RPDelta.CurrentMax,
		TryCand, Cand, RegMax, TRI,
		DAG->MF))
		return;

		// SystemZ specific: Latency boost for instructions using the vector unit.
		bool DO = (!Rem.IsAcyclicLatencyLimited || !CHECK_ACYC);
		if (!VEC_LAT){
		if (ST->hasVector() && DO) {
		assert ((std::
		string(SchedModel->getProcResource(SystemZVectorUnitIdx)->Name)
		.find("VecUnit") != std::string::npos) &&
		"Hard coded index for vector unit changed!");
		bool VectorPipeline = false;
		const MCSchedClassDesc *SC = DAG->getSchedClass(TryCand.SU);
		for (TargetSchedModel::ProcResIter
		PI = SchedModel->getWriteProcResBegin(SC),
		PE = SchedModel->getWriteProcResEnd(SC); PI != PE; ++PI) {
		if (PI->ProcResourceIdx == SystemZVectorUnitIdx) {
		VectorPipeline = true;
		break;
		}
		}
		if (VectorPipeline && tryLatency(TryCand, Cand, *Zone))
		return;
		}
		} else {
		bool VecPipe_TryC = false;
		const MCSchedClassDesc *SC = DAG->getSchedClass(TryCand.SU);
		for (TargetSchedModel::ProcResIter
		PI = SchedModel->getWriteProcResBegin(SC),
		PE = SchedModel->getWriteProcResEnd(SC); PI != PE; ++PI) {
		if (PI->ProcResourceIdx == SystemZVectorUnitIdx) {
		VecPipe_TryC = true;
		break;
		}
		}
		bool VecPipe_Cand = false;
		SC = DAG->getSchedClass(Cand.SU);
		for (TargetSchedModel::ProcResIter
		PI = SchedModel->getWriteProcResBegin(SC),
		PE = SchedModel->getWriteProcResEnd(SC); PI != PE; ++PI) {
		if (PI->ProcResourceIdx == SystemZVectorUnitIdx) {
		VecPipe_Cand = true;
		break;
		}
		}

		CandReason Reason_Cand = Cand.Reason;
		CandReason Reason_TryCand = TryCand.Reason;
		if (VecPipe_TryC && tryLatency(TryCand, Cand, *Zone)) {
		if (Reason_Cand != Cand.Reason) {
		if (!VecPipe_Cand)
		Cand.Reason = Reason_Cand;
		else
		return;
		} else if (Reason_TryCand != TryCand.Reason) {
		return;
		}
		}
		}

		if (SameBoundary) {
		// Avoid critical resource consumption and balance the schedule.
		TryCand.initResourceDelta(DAG, SchedModel);
		if (tryLess(TryCand.ResDelta.CritResources, Cand.ResDelta.CritResources,
		TryCand, Cand, ResourceReduce))
		return;
		if (tryGreater(TryCand.ResDelta.DemandedResources,
		Cand.ResDelta.DemandedResources,
		TryCand, Cand, ResourceDemand))
		return;

		// Avoid serializing long latency dependence chains.
		// For acyclic path limited loops, latency was already checked above.
		if (!RegionPolicy.DisableLatencyHeuristic && TryCand.Policy.ReduceLatency &&
		!Rem.IsAcyclicLatencyLimited && tryLatency(TryCand, Cand, *Zone))
		return;

		// Fall through to original instruction order.
		if ((Zone->isTop() && TryCand.SU->NodeNum < Cand.SU->NodeNum)
		|| (!Zone->isTop() && TryCand.SU->NodeNum > Cand.SU->NodeNum)) {
		TryCand.Reason = NodeOrder;
		}
		}
		}
Context not available.

lib/Target/SystemZ/SystemZTargetMachine.cpp

Context not available.
	}	}

	ScheduleDAGInstrs *	ScheduleDAGInstrs *
		createMachineScheduler(MachineSchedContext *C) const override {
		// To run the generic pre-RA scheduler use: -misched=converge
		ScheduleDAGMILive *DAG =
		new ScheduleDAGMILive(C, llvm::make_unique<SystemZPreRASchedStrategy>(C));

		// Use same DAG mutators as are applied in createGenericSchedLive().
		DAG->addMutation(createCopyConstrainDAGMutation(DAG->TII, DAG->TRI));
		return DAG;
		}

		ScheduleDAGInstrs *
	createPostMachineScheduler(MachineSchedContext *C) const override {	createPostMachineScheduler(MachineSchedContext *C) const override {
	return new ScheduleDAGMI(C,	return new ScheduleDAGMI(C,
	llvm::make_unique<SystemZPostRASchedStrategy>(C),	llvm::make_unique<SystemZPostRASchedStrategy>(C),
Context not available.

test/CodeGen/SystemZ/vec-cmp-cmp-logic-select.ll

Context not available.
	; CHECK-DAG: vceqh [[REG4:%v[0-9]+]], %v30, %v27	; CHECK-DAG: vceqh [[REG4:%v[0-9]+]], %v30, %v27
	; CHECK-DAG: vl [[REG5:%v[0-9]+]], 176(%r15)	; CHECK-DAG: vl [[REG5:%v[0-9]+]], 176(%r15)
	; CHECK-DAG: vl [[REG6:%v[0-9]+]], 160(%r15)	; CHECK-DAG: vl [[REG6:%v[0-9]+]], 160(%r15)
	; CHECK-DAG: vo [[REG7:%v[0-9]+]], %v2, [[REG4]]	; CHECK-DAG: vo [[REG7:%v[0-9]+]], [[REG1]], [[REG4]]
	; CHECK-DAG: vo [[REG8:%v[0-9]+]], [[REG2]], [[REG3]]	; CHECK-DAG: vo [[REG8:%v[0-9]+]], [[REG2]], [[REG3]]
	; CHECK-DAG: vsel %v24, %v29, [[REG6]], [[REG8]]	; CHECK-DAG: vsel %v24, %v29, [[REG6]], [[REG8]]
	; CHECK-DAG: vsel %v26, %v31, [[REG5]], [[REG7]]	; CHECK-DAG: vsel %v26, %v31, [[REG5]], [[REG7]]
Context not available.
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vceqg %v0, %v26, %v30	; CHECK-NEXT: vceqg %v0, %v26, %v30
	; CHECK-NEXT: vceqg %v1, %v24, %v28	; CHECK-NEXT: vceqg %v1, %v24, %v28
	; CHECK-NEXT: vpkg %v0, %v1, %v0	; CHECK-DAG: vpkg %v0, %v1, %v0
	; CHECK-NEXT: vceqf %v1, %v25, %v27	; CHECK-DAG: vceqf [[REG0:%v[0-9]+]], %v25, %v27
	; CHECK-NEXT: vx %v0, %v0, %v1	; CHECK-NEXT: vx %v0, %v0, [[REG0]]
	; CHECK-NEXT: vsel %v24, %v29, %v31, %v0	; CHECK-NEXT: vsel %v24, %v29, %v31, %v0
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
	%cmp0 = icmp eq <4 x i64> %val1, %val2	%cmp0 = icmp eq <4 x i64> %val1, %val2
Context not available.
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf %v0, %v26, %v26	; CHECK-NEXT: vmrlf %v0, %v26, %v26
	; CHECK-NEXT: vmrlf %v1, %v24, %v24	; CHECK-NEXT: vmrlf %v1, %v24, %v24
	; CHECK-NEXT: vldeb %v0, %v0	; CHECK-DAG: vldeb %v0, %v0
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb %v1, %v1
	; CHECK-NEXT: vfchdb %v0, %v1, %v0	; CHECK-DAG: vfchdb %v0, %v1, %v0
	; CHECK-NEXT: vmrhf %v1, %v26, %v26	; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vmrhf %v2, %v24, %v24	; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb [[REG0]], [[REG0]]
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG1]], [[REG1]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
	; CHECK-NEXT: vpkg %v0, %v1, %v0	; CHECK-DAG: vpkg %v0, %v1, %v0
	; CHECK-NEXT: vfchdb %v1, %v28, %v30	; CHECK-DAG: vfchdb [[REG2:%v[0-9]+]], %v28, %v30
	; CHECK-NEXT: vpkg %v1, %v1, %v1	; CHECK-DAG: vpkg [[REG2]], [[REG2]], [[REG2]]
	; CHECK-NEXT: vo %v0, %v0, %v1	; CHECK-NEXT: vo %v0, %v0, [[REG2]]
	; CHECK-NEXT: vsel %v24, %v25, %v27, %v0	; CHECK-NEXT: vsel %v24, %v25, %v27, %v0
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
	;	;
Context not available.
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf %v0, %v26, %v26	; CHECK-NEXT: vmrlf %v0, %v26, %v26
	; CHECK-NEXT: vmrlf %v1, %v24, %v24	; CHECK-NEXT: vmrlf %v1, %v24, %v24
	; CHECK-NEXT: vldeb %v0, %v0	; CHECK-DAG: vldeb %v0, %v0
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb %v1, %v1
	; CHECK-NEXT: vfchdb %v0, %v1, %v0	; CHECK-DAG: vfchdb %v0, %v1, %v0
	; CHECK-NEXT: vmrhf %v1, %v26, %v26	; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vmrhf %v2, %v24, %v24	; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb [[REG0]], [[REG0]]
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG1]], [[REG1]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
	; CHECK-NEXT: vpkg %v0, %v1, %v0	; CHECK-DAG: vpkg %v0, %v1, %v0
	; CHECK-NEXT: vuphf %v0, %v0	; CHECK-DAG: vuphf %v0, %v0
	; CHECK-NEXT: vfchdb %v1, %v28, %v30	; CHECK-DAG: vfchdb [[REG2:%v[0-9]+]], %v28, %v30
	; CHECK-NEXT: vo %v0, %v0, %v1	; CHECK-NEXT: vo %v0, %v0, [[REG2]]
	; CHECK-NEXT: vsel %v24, %v25, %v27, %v0	; CHECK-NEXT: vsel %v24, %v25, %v27, %v0
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
	;	;
	; CHECK-Z14-LABEL: fun26:	; CHECK-Z14-LABEL: fun26:
	; CHECK-Z14: # %bb.0:	; CHECK-Z14: # %bb.0:
	; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26	; CHECK-Z14-NEXT: vfchsb %v0, %v24, %v26
	; CHECK-Z14-NEXT: vuphf %v0, %v0	; CHECK-Z14-DAG: vuphf %v0, %v0
	; CHECK-Z14-NEXT: vfchdb %v1, %v28, %v30	; CHECK-Z14-DAG: vfchdb %v1, %v28, %v30
	; CHECK-Z14-NEXT: vo %v0, %v0, %v1	; CHECK-Z14-NEXT: vo %v0, %v0, %v1
	; CHECK-Z14-NEXT: vsel %v24, %v25, %v27, %v0	; CHECK-Z14-NEXT: vsel %v24, %v25, %v27, %v0
	; CHECK-Z14-NEXT: br %r14	; CHECK-Z14-NEXT: br %r14
Context not available.
	; CHECK-DAG: vmrhf [[REG17:%v[0-9]+]], %v30, %v30	; CHECK-DAG: vmrhf [[REG17:%v[0-9]+]], %v30, %v30
	; CHECK-DAG: vldeb [[REG19:%v[0-9]+]], [[REG17]]	; CHECK-DAG: vldeb [[REG19:%v[0-9]+]], [[REG17]]
	; CHECK-DAG: vldeb [[REG20:%v[0-9]+]], [[REG8]]	; CHECK-DAG: vldeb [[REG20:%v[0-9]+]], [[REG8]]
	; CHECK-NEXT: vfchdb %v2, [[REG20]], [[REG19]]	; CHECK-NEXT: vfchdb [[REG22:%v[0-9]+]], [[REG20]], [[REG19]]
	; CHECK-NEXT: vpkg [[REG21:%v[0-9]+]], %v2, [[REG16]]	; CHECK-NEXT: vpkg [[REG21:%v[0-9]+]], [[REG22]], [[REG16]]
	; CHECK-NEXT: vx %v0, [[REG11]], [[REG21]]	; CHECK-NEXT: vx %v0, [[REG11]], [[REG21]]
	; CHECK-NEXT: vsel %v24, %v25, %v27, %v0	; CHECK-NEXT: vsel %v24, %v25, %v27, %v0
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
Context not available.
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf %v0, %v26, %v26	; CHECK-NEXT: vmrlf %v0, %v26, %v26
	; CHECK-NEXT: vmrlf %v1, %v24, %v24	; CHECK-NEXT: vmrlf %v1, %v24, %v24
	; CHECK-NEXT: vldeb %v0, %v0	; CHECK-DAG: vldeb %v0, %v0
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb %v1, %v1
	; CHECK-NEXT: vfchdb %v0, %v1, %v0	; CHECK-DAG: vfchdb %v0, %v1, %v0
	; CHECK-NEXT: vmrhf %v1, %v26, %v26	; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vmrhf %v2, %v24, %v24	; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb [[REG0]], [[REG0]]
	; CHECK-NEXT: vmrhf %v3, %v28, %v28	; CHECK-DAG: vmrhf [[REG2:%v[0-9]+]], %v28, %v28
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG1]], [[REG1]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb [[REG3:%v[0-9]+]], [[REG1]], [[REG0]]
	; CHECK-NEXT: vpkg %v0, %v1, %v0	; CHECK-DAG: vpkg %v0, [[REG3]], %v0
	; CHECK-NEXT: vmrlf %v1, %v30, %v30	; CHECK-DAG: vmrlf %v1, %v30, %v30
	; CHECK-NEXT: vmrlf %v2, %v28, %v28	; CHECK-DAG: vmrlf [[REG4:%v[0-9]+]], %v28, %v28
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb %v1, %v1
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG4]], [[REG4]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb %v1, [[REG4]], %v1
	; CHECK-NEXT: vmrhf %v2, %v30, %v30	; CHECK-DAG: vmrhf [[REG5:%v[0-9]+]], %v30, %v30
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-NEXT: vldeb [[REG5]], [[REG5]]
	; CHECK-NEXT: vldeb %v3, %v3	; CHECK-NEXT: vldeb [[REG2]], [[REG2]]
	; CHECK-NEXT: vfchdb %v2, %v3, %v2	; CHECK-NEXT: vfchdb [[REG6:%v[0-9]+]], [[REG2]], [[REG5]]
	; CHECK-NEXT: vpkg %v1, %v2, %v1	; CHECK-NEXT: vpkg %v1, [[REG6]], %v1
	; CHECK-NEXT: vx %v0, %v0, %v1	; CHECK-NEXT: vx %v0, %v0, %v1
	; CHECK-NEXT: vmrlg %v1, %v0, %v0	; CHECK-NEXT: vmrlg %v1, %v0, %v0
	; CHECK-NEXT: vuphf %v1, %v1	; CHECK-DAG: vuphf %v1, %v1
	; CHECK-NEXT: vuphf %v0, %v0	; CHECK-DAG: vuphf %v0, %v0
	; CHECK-NEXT: vsel %v24, %v25, %v29, %v0	; CHECK-NEXT: vsel %v24, %v25, %v29, %v0
	; CHECK-NEXT: vsel %v26, %v27, %v31, %v1	; CHECK-NEXT: vsel %v26, %v27, %v31, %v1
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
Context not available.
	; CHECK-Z14-NEXT: vfchsb %v1, %v28, %v30	; CHECK-Z14-NEXT: vfchsb %v1, %v28, %v30
	; CHECK-Z14-NEXT: vx %v0, %v0, %v1	; CHECK-Z14-NEXT: vx %v0, %v0, %v1
	; CHECK-Z14-NEXT: vmrlg %v1, %v0, %v0	; CHECK-Z14-NEXT: vmrlg %v1, %v0, %v0
	; CHECK-Z14-NEXT: vuphf %v1, %v1	; CHECK-Z14-DAG: vuphf %v1, %v1
	; CHECK-Z14-NEXT: vuphf %v0, %v0	; CHECK-Z14-DAG: vuphf %v0, %v0
	; CHECK-Z14-NEXT: vsel %v24, %v25, %v29, %v0	; CHECK-Z14-NEXT: vsel %v24, %v25, %v29, %v0
	; CHECK-Z14-NEXT: vsel %v26, %v27, %v31, %v1	; CHECK-Z14-NEXT: vsel %v26, %v27, %v31, %v1
	; CHECK-Z14-NEXT: br %r14	; CHECK-Z14-NEXT: br %r14
Context not available.
	define <8 x float> @fun30(<8 x float> %val1, <8 x float> %val2, <8 x double> %val3, <8 x double> %val4, <8 x float> %val5, <8 x float> %val6) {	define <8 x float> @fun30(<8 x float> %val1, <8 x float> %val2, <8 x double> %val3, <8 x double> %val4, <8 x float> %val5, <8 x float> %val6) {
	; CHECK-LABEL: fun30:	; CHECK-LABEL: fun30:
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf %v16, %v28, %v28	; CHECK-DAG: vmrlf [[REG0:%v[0-9]+]], %v28, %v28
	; CHECK-NEXT: vmrlf %v17, %v24, %v24	; CHECK-DAG: vmrlf [[REG1:%v[0-9]+]], %v24, %v24
	; CHECK-NEXT: vldeb %v16, %v16	; CHECK-DAG: vldeb [[REG0]], [[REG0]]
	; CHECK-NEXT: vldeb %v17, %v17	; CHECK-DAG: vldeb [[REG1]], [[REG1]]
	; CHECK-NEXT: vfchdb %v16, %v17, %v16	; CHECK-DAG: vfchdb [[REG2:%v[0-9]+]], [[REG1]], [[REG0]]
	; CHECK-NEXT: vmrhf %v17, %v28, %v28	; CHECK-DAG: vmrhf %v17, %v28, %v28
	; CHECK-NEXT: vmrhf %v18, %v24, %v24	; CHECK-DAG: vmrhf %v18, %v24, %v24
	; CHECK-NEXT: vldeb %v17, %v17	; CHECK-DAG: vldeb %v17, %v17
	; CHECK-NEXT: vl %v4, 192(%r15)	; CHECK-DAG: vl [[REG3:%v[0-9]+]], 192(%r15)
	; CHECK-NEXT: vldeb %v18, %v18	; CHECK-DAG: vldeb %v18, %v18
	; CHECK-NEXT: vl %v5, 208(%r15)	; CHECK-DAG: vl [[REG4:%v[0-9]+]], 208(%r15)
	; CHECK-NEXT: vl %v6, 160(%r15)	; CHECK-DAG: vl [[REG5:%v[0-9]+]], 160(%r15)
	; CHECK-NEXT: vl %v7, 176(%r15)	; CHECK-DAG: vl [[REG6:%v[0-9]+]], 176(%r15)
	; CHECK-NEXT: vl %v0, 272(%r15)	; CHECK-DAG: vl [[REG7:%v[0-9]+]], 272(%r15)
	; CHECK-NEXT: vl %v1, 240(%r15)	; CHECK-DAG: vl [[REG8:%v[0-9]+]], 240(%r15)
	; CHECK-NEXT: vfchdb %v17, %v18, %v17	; CHECK-DAG: vfchdb [[REG9:%v[0-9]+]], %v18, %v17
	; CHECK-NEXT: vl %v2, 256(%r15)	; CHECK-DAG: vl [[REG10:%v[0-9]+]], 256(%r15)
	; CHECK-NEXT: vl %v3, 224(%r15)	; CHECK-DAG: vl [[REG11:%v[0-9]+]], 224(%r15)
	; CHECK-NEXT: vpkg %v16, %v17, %v16	; CHECK-DAG: vpkg [[REG12:%v[0-9]+]], [[REG9]], [[REG2]]
	; CHECK-NEXT: vmrlf %v17, %v30, %v30	; CHECK-DAG: vmrlf [[REG13:%v[0-9]+]], %v30, %v30
	; CHECK-NEXT: vmrlf %v18, %v26, %v26	; CHECK-DAG: vmrlf [[REG14:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vmrhf %v19, %v26, %v26	; CHECK-DAG: vmrhf [[REG15:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vfchdb %v7, %v27, %v7	; CHECK-DAG: vfchdb [[REG16:%v[0-9]+]], %v27, [[REG6]]
	; CHECK-NEXT: vfchdb %v6, %v25, %v6	; CHECK-DAG: vfchdb [[REG17:%v[0-9]+]], %v25, [[REG5]]
	; CHECK-NEXT: vfchdb %v5, %v31, %v5	; CHECK-DAG: vfchdb [[REG18:%v[0-9]+]], %v31, [[REG4]]
	; CHECK-NEXT: vfchdb %v4, %v29, %v4	; CHECK-DAG: vfchdb [[REG19:%v[0-9]+]], %v29, [[REG3]]
	; CHECK-NEXT: vpkg %v6, %v6, %v7	; CHECK-DAG: vpkg [[REG20:%v[0-9]+]], [[REG17]], [[REG16]]
	; CHECK-NEXT: vpkg %v4, %v4, %v5	; CHECK-DAG: vpkg [[REG21:%v[0-9]+]], [[REG19]], [[REG18]]
	; CHECK-NEXT: vn %v5, %v16, %v6	; CHECK-DAG: vn [[REG22:%v[0-9]+]], [[REG12]], [[REG20]]
	; CHECK-NEXT: vsel %v24, %v3, %v2, %v5	; CHECK-DAG: vsel %v24, [[REG11]], [[REG10]], [[REG22]]
	; CHECK-NEXT: vldeb %v17, %v17	; CHECK-DAG: vldeb [[REG13]], [[REG13]]
	; CHECK-NEXT: vldeb %v18, %v18	; CHECK-DAG: vldeb [[REG14]], [[REG14]]
	; CHECK-NEXT: vfchdb %v17, %v18, %v17	; CHECK-DAG: vfchdb [[REG23:%v[0-9]+]], [[REG14]], [[REG13]]
	; CHECK-NEXT: vmrhf %v18, %v30, %v30	; CHECK-DAG: vmrhf [[REG24:%v[0-9]+]], %v30, %v30
	; CHECK-NEXT: vldeb %v18, %v18	; CHECK-DAG: vldeb [[REG24]], [[REG24]]
	; CHECK-NEXT: vldeb %v19, %v19	; CHECK-DAG: vldeb [[REG15]], [[REG15]]
	; CHECK-NEXT: vfchdb %v18, %v19, %v18	; CHECK-DAG: vfchdb [[REG25:%v[0-9]+]], [[REG15]], [[REG24]]
	; CHECK-NEXT: vpkg %v17, %v18, %v17	; CHECK-DAG: vpkg [[REG26:%v[0-9]+]], [[REG25]], [[REG23]]
	; CHECK-NEXT: vn %v4, %v17, %v4	; CHECK-DAG: vn [[REG27:%v[0-9]+]], [[REG26]], [[REG21]]
	; CHECK-NEXT: vsel %v26, %v1, %v0, %v4	; CHECK-DAG: vsel %v26, [[REG8]], [[REG7]], [[REG27]]
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
	;	;
	; CHECK-Z14-LABEL: fun30:	; CHECK-Z14-LABEL: fun30:
	; CHECK-Z14: # %bb.0:	; CHECK-Z14: # %bb.0:
	; CHECK-Z14-NEXT: vl %v4, 192(%r15)	; CHECK-Z14-NEXT: vl [[REG0:%v[0-9]+]], 192(%r15)
	; CHECK-Z14-NEXT: vl %v5, 208(%r15)	; CHECK-Z14-NEXT: vl [[REG1:%v[0-9]+]], 208(%r15)
	; CHECK-Z14-NEXT: vl %v6, 160(%r15)	; CHECK-Z14-NEXT: vl [[REG2:%v[0-9]+]], 160(%r15)
	; CHECK-Z14-NEXT: vl %v7, 176(%r15)	; CHECK-Z14-NEXT: vl [[REG3:%v[0-9]+]], 176(%r15)
	; CHECK-Z14-NEXT: vfchdb %v7, %v27, %v7	; CHECK-Z14-NEXT: vfchdb [[REG4:%v[0-9]+]], %v27, [[REG3]]
	; CHECK-Z14-NEXT: vfchdb %v6, %v25, %v6	; CHECK-Z14-NEXT: vfchdb [[REG5:%v[0-9]+]], %v25, [[REG2]]
	; CHECK-Z14-NEXT: vfchdb %v5, %v31, %v5	; CHECK-Z14-NEXT: vfchdb [[REG6:%v[0-9]+]], %v31, [[REG1]]
	; CHECK-Z14-NEXT: vfchdb %v4, %v29, %v4	; CHECK-Z14-NEXT: vfchdb [[REG7:%v[0-9]+]], %v29, [[REG0]]
	; CHECK-Z14-NEXT: vfchsb %v16, %v24, %v28	; CHECK-Z14-NEXT: vfchsb [[REG8:%v[0-9]+]], %v24, %v28
	; CHECK-Z14-NEXT: vfchsb %v17, %v26, %v30	; CHECK-Z14-NEXT: vfchsb [[REG9:%v[0-9]+]], %v26, %v30
	; CHECK-Z14-NEXT: vpkg %v6, %v6, %v7	; CHECK-Z14-NEXT: vpkg [[REG10:%v[0-9]+]], [[REG5]], [[REG4]]
	; CHECK-Z14-NEXT: vpkg %v4, %v4, %v5	; CHECK-Z14-NEXT: vpkg [[REG11:%v[0-9]+]], [[REG7]], [[REG6]]
	; CHECK-Z14-NEXT: vl %v0, 272(%r15)	; CHECK-Z14-NEXT: vl %v0, 272(%r15)
	; CHECK-Z14-NEXT: vl %v1, 240(%r15)	; CHECK-Z14-NEXT: vl %v1, 240(%r15)
	; CHECK-Z14-NEXT: vl %v2, 256(%r15)	; CHECK-Z14-NEXT: vl %v2, 256(%r15)
	; CHECK-Z14-NEXT: vl %v3, 224(%r15)	; CHECK-Z14-NEXT: vl [[REG14:%v[0-9]+]], 224(%r15)
	; CHECK-Z14-NEXT: vn %v4, %v17, %v4	; CHECK-Z14-NEXT: vn [[REG12:%v[0-9]+]], [[REG9]], [[REG11]]
	; CHECK-Z14-NEXT: vn %v5, %v16, %v6	; CHECK-Z14-NEXT: vn [[REG13:%v[0-9]+]], [[REG8]], [[REG10]]
	; CHECK-Z14-NEXT: vsel %v24, %v3, %v2, %v5	; CHECK-Z14-NEXT: vsel %v24, [[REG14]], %v2, [[REG13]]
	; CHECK-Z14-NEXT: vsel %v26, %v1, %v0, %v4	; CHECK-Z14-NEXT: vsel %v26, %v1, %v0, [[REG12]]
	; CHECK-Z14-NEXT: br %r14	; CHECK-Z14-NEXT: br %r14
	%cmp0 = fcmp ogt <8 x float> %val1, %val2	%cmp0 = fcmp ogt <8 x float> %val1, %val2
	%cmp1 = fcmp ogt <8 x double> %val3, %val4	%cmp1 = fcmp ogt <8 x double> %val3, %val4
Context not available.
	define <4 x float> @fun33(<4 x double> %val1, <4 x double> %val2, <4 x float> %val3, <4 x float> %val4, <4 x float> %val5, <4 x float> %val6) {	define <4 x float> @fun33(<4 x double> %val1, <4 x double> %val2, <4 x float> %val3, <4 x float> %val4, <4 x float> %val5, <4 x float> %val6) {
	; CHECK-LABEL: fun33:	; CHECK-LABEL: fun33:
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vfchdb %v0, %v26, %v30	; CHECK-DAG: vfchdb %v0, %v26, %v30
	; CHECK-NEXT: vfchdb %v1, %v24, %v28	; CHECK-DAG: vfchdb %v1, %v24, %v28
	; CHECK-NEXT: vpkg %v0, %v1, %v0	; CHECK-DAG: vpkg %v0, %v1, %v0
	; CHECK-NEXT: vmrlf %v1, %v27, %v27	; CHECK-DAG: vmrlf [[REG0:%v[0-9]+]], %v27, %v27
	; CHECK-NEXT: vmrlf %v2, %v25, %v25	; CHECK-DAG: vmrlf [[REG1:%v[0-9]+]], %v25, %v25
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb [[REG2:%v[0-9]+]], [[REG0]]
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG3:%v[0-9]+]], [[REG1]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb [[REG4:%v[0-9]+]], [[REG3]], [[REG2]]
	; CHECK-NEXT: vmrhf %v2, %v27, %v27	; CHECK-DAG: vmrhf [[REG5:%v[0-9]+]], %v27, %v27
	; CHECK-NEXT: vmrhf %v3, %v25, %v25	; CHECK-DAG: vmrhf [[REG6:%v[0-9]+]], %v25, %v25
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG7:%v[0-9]+]], [[REG5]]
	; CHECK-NEXT: vldeb %v3, %v3	; CHECK-DAG: vldeb [[REG8:%v[0-9]+]], [[REG6]]
	; CHECK-NEXT: vfchdb %v2, %v3, %v2	; CHECK-DAG: vfchdb %v2, [[REG8]], [[REG7]]
	; CHECK-NEXT: vpkg %v1, %v2, %v1	; CHECK-NEXT: vpkg %v1, %v2, [[REG4]]
	; CHECK-NEXT: vn %v0, %v0, %v1	; CHECK-NEXT: vn %v0, %v0, %v1
	; CHECK-NEXT: vsel %v24, %v29, %v31, %v0	; CHECK-NEXT: vsel %v24, %v29, %v31, %v0
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
Context not available.
	; CHECK-Z14: # %bb.0:	; CHECK-Z14: # %bb.0:
	; CHECK-Z14-NEXT: vfchdb %v0, %v26, %v30	; CHECK-Z14-NEXT: vfchdb %v0, %v26, %v30
	; CHECK-Z14-NEXT: vfchdb %v1, %v24, %v28	; CHECK-Z14-NEXT: vfchdb %v1, %v24, %v28
	; CHECK-Z14-NEXT: vpkg %v0, %v1, %v0	; CHECK-Z14-DAG: vpkg %v0, %v1, %v0
	; CHECK-Z14-NEXT: vfchsb %v1, %v25, %v27	; CHECK-Z14-DAG: vfchsb [[REG0:%v[0-9]+]], %v25, %v27
	; CHECK-Z14-NEXT: vn %v0, %v0, %v1	; CHECK-Z14-NEXT: vn %v0, %v0, [[REG0]]
	; CHECK-Z14-NEXT: vsel %v24, %v29, %v31, %v0	; CHECK-Z14-NEXT: vsel %v24, %v29, %v31, %v0
	; CHECK-Z14-NEXT: br %r14	; CHECK-Z14-NEXT: br %r14
	%cmp0 = fcmp ogt <4 x double> %val1, %val2	%cmp0 = fcmp ogt <4 x double> %val1, %val2
Context not available.
	define <4 x double> @fun34(<4 x double> %val1, <4 x double> %val2, <4 x float> %val3, <4 x float> %val4, <4 x double> %val5, <4 x double> %val6) {	define <4 x double> @fun34(<4 x double> %val1, <4 x double> %val2, <4 x float> %val3, <4 x float> %val4, <4 x double> %val5, <4 x double> %val6) {
	; CHECK-LABEL: fun34:	; CHECK-LABEL: fun34:
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf [[REG0:%v[0-9]+]], %v27, %v27	; CHECK-DAG: vmrlf [[REG0:%v[0-9]+]], %v27, %v27
	; CHECK-NEXT: vmrlf [[REG1:%v[0-9]+]], %v25, %v25	; CHECK-DAG: vmrlf [[REG1:%v[0-9]+]], %v25, %v25
	; CHECK-NEXT: vldeb [[REG2:%v[0-9]+]], [[REG0]]	; CHECK-DAG: vldeb [[REG2:%v[0-9]+]], [[REG0]]
	; CHECK-NEXT: vldeb [[REG3:%v[0-9]+]], [[REG1]]	; CHECK-DAG: vldeb [[REG3:%v[0-9]+]], [[REG1]]
	; CHECK-NEXT: vfchdb [[REG4:%v[0-9]+]], [[REG3]], [[REG2]]	; CHECK-DAG: vfchdb [[REG4:%v[0-9]+]], [[REG3]], [[REG2]]
	; CHECK-NEXT: vmrhf [[REG5:%v[0-9]+]], %v27, %v27	; CHECK-DAG: vmrhf [[REG5:%v[0-9]+]], %v27, %v27
	; CHECK-NEXT: vmrhf [[REG6:%v[0-9]+]], %v25, %v25	; CHECK-DAG: vmrhf [[REG6:%v[0-9]+]], %v25, %v25
	; CHECK-DAG: vldeb [[REG7:%v[0-9]+]], [[REG5]]	; CHECK-DAG: vldeb [[REG7:%v[0-9]+]], [[REG5]]
	; CHECK-DAG: vl [[REG8:%v[0-9]+]], 176(%r15)	; CHECK-DAG: vl [[REG8:%v[0-9]+]], 176(%r15)
	; CHECK-DAG: vldeb [[REG9:%v[0-9]+]], [[REG6]]	; CHECK-DAG: vldeb [[REG9:%v[0-9]+]], [[REG6]]
Context not available.
	; CHECK-NEXT: vfchdb [[REG15:%v[0-9]+]], %v24, %v28	; CHECK-NEXT: vfchdb [[REG15:%v[0-9]+]], %v24, %v28
	; CHECK-NEXT: vfchdb [[REG16:%v[0-9]+]], %v26, %v30	; CHECK-NEXT: vfchdb [[REG16:%v[0-9]+]], %v26, %v30
	; CHECK-NEXT: vuphf [[REG17:%v[0-9]+]], [[REG14]]	; CHECK-NEXT: vuphf [[REG17:%v[0-9]+]], [[REG14]]
	; CHECK-NEXT: vn [[REG18:%v[0-9]+]], [[REG16]], [[REG17]]	; CHECK-DAG: vn [[REG18:%v[0-9]+]], [[REG16]], [[REG17]]
	; CHECK-NEXT: vn [[REG19:%v[0-9]+]], [[REG15]], [[REG13]]	; CHECK-DAG: vn [[REG19:%v[0-9]+]], [[REG15]], [[REG13]]
	; CHECK-NEXT: vsel %v24, %v29, [[REG10]], [[REG19]]	; CHECK-NEXT: vsel %v24, %v29, [[REG10]], [[REG19]]
	; CHECK-NEXT: vsel %v26, %v31, [[REG8]], [[REG18]]	; CHECK-NEXT: vsel %v26, %v31, [[REG8]], [[REG18]]
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
	;	;
	; CHECK-Z14-LABEL: fun34:	; CHECK-Z14-LABEL: fun34:
	; CHECK-Z14: # %bb.0:	; CHECK-Z14: # %bb.0:
	; CHECK-Z14-NEXT: vfchsb %v4, %v25, %v27	; CHECK-Z14-NEXT: vfchsb [[REG0:%v[0-9]+]], %v25, %v27
	; CHECK-Z14-NEXT: vuphf %v5, %v4	; CHECK-Z14-NEXT: vuphf [[REG1:%v[0-9]+]], [[REG0]]
	; CHECK-Z14-NEXT: vmrlg %v4, %v4, %v4	; CHECK-Z14-NEXT: vmrlg [[REG0]], [[REG0]], [[REG0]]
	; CHECK-Z14-NEXT: vfchdb %v2, %v24, %v28	; CHECK-Z14-NEXT: vfchdb [[REG2:%v[0-9]+]], %v24, %v28
	; CHECK-Z14-NEXT: vfchdb %v3, %v26, %v30	; CHECK-Z14-NEXT: vfchdb [[REG3:%v[0-9]+]], %v26, %v30
	; CHECK-Z14-NEXT: vuphf %v4, %v4	; CHECK-Z14-NEXT: vuphf [[REG0]], [[REG0]]
	; CHECK-Z14-NEXT: vl %v0, 176(%r15)	; CHECK-Z14-NEXT: vl %v0, 176(%r15)
	; CHECK-Z14-NEXT: vl %v1, 160(%r15)	; CHECK-Z14-NEXT: vl [[REG4:%v[0-9]+]], 160(%r15)
	; CHECK-Z14-NEXT: vn %v3, %v3, %v4	; CHECK-Z14-DAG: vn [[REG5:%v[0-9]+]], [[REG3]], [[REG0]]
	; CHECK-Z14-NEXT: vn %v2, %v2, %v5	; CHECK-Z14-DAG: vn [[REG6:%v[0-9]+]], [[REG2]], [[REG1]]
	; CHECK-Z14-NEXT: vsel %v24, %v29, %v1, %v2	; CHECK-Z14-NEXT: vsel %v24, %v29, [[REG4]], [[REG6]]
	; CHECK-Z14-NEXT: vsel %v26, %v31, %v0, %v3	; CHECK-Z14-NEXT: vsel %v26, %v31, %v0, [[REG5]]
	; CHECK-Z14-NEXT: br %r14	; CHECK-Z14-NEXT: br %r14
	%cmp0 = fcmp ogt <4 x double> %val1, %val2	%cmp0 = fcmp ogt <4 x double> %val1, %val2
	%cmp1 = fcmp ogt <4 x float> %val3, %val4	%cmp1 = fcmp ogt <4 x float> %val3, %val4
Context not available.

test/CodeGen/SystemZ/vec-cmpsel.ll

Context not available.
	define <2 x float> @fun25(<2 x float> %val1, <2 x float> %val2, <2 x float> %val3, <2 x float> %val4) {	define <2 x float> @fun25(<2 x float> %val1, <2 x float> %val2, <2 x float> %val3, <2 x float> %val4) {
	; CHECK-LABEL: fun25:	; CHECK-LABEL: fun25:
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf %v0, %v26, %v26	; CHECK-DAG: vmrlf %v0, %v26, %v26
	; CHECK-NEXT: vmrlf %v1, %v24, %v24	; CHECK-DAG: vmrlf %v1, %v24, %v24
	; CHECK-NEXT: vldeb %v0, %v0	; CHECK-DAG: vldeb %v0, %v0
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb %v1, %v1
	; CHECK-NEXT: vfchdb %v0, %v1, %v0	; CHECK-DAG: vfchdb [[REG0:%v[0-9]+]], %v1, %v0
	; CHECK-NEXT: vmrhf %v1, %v26, %v26	; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vmrhf %v2, %v24, %v24	; CHECK-DAG: vmrhf [[REG2:%v[0-9]+]], %v24, %v24
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb [[REG1]], [[REG1]]
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG2]], [[REG2]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb [[REG3:%v[0-9]+]], [[REG2]], [[REG1]]
	; CHECK-NEXT: vpkg %v0, %v1, %v0	; CHECK-NEXT: vpkg %v0, [[REG3]], [[REG0]]
	; CHECK-NEXT: vsel %v24, %v28, %v30, %v0	; CHECK-NEXT: vsel %v24, %v28, %v30, %v0
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14

Context not available.
	define <2 x double> @fun26(<2 x float> %val1, <2 x float> %val2, <2 x double> %val3, <2 x double> %val4) {	define <2 x double> @fun26(<2 x float> %val1, <2 x float> %val2, <2 x double> %val3, <2 x double> %val4) {
	; CHECK-LABEL: fun26:	; CHECK-LABEL: fun26:
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf %v0, %v26, %v26	; CHECK-DAG: vmrlf %v0, %v26, %v26
	; CHECK-NEXT: vmrlf %v1, %v24, %v24	; CHECK-DAG: vmrlf %v1, %v24, %v24
	; CHECK-NEXT: vldeb %v0, %v0	; CHECK-DAG: vldeb %v0, %v0
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb %v1, %v1
	; CHECK-NEXT: vfchdb %v0, %v1, %v0	; CHECK-DAG: vfchdb %v0, %v1, %v0
	; CHECK-NEXT: vmrhf %v1, %v26, %v26	; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vmrhf %v2, %v24, %v24	; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb [[REG0]], [[REG0]]
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG1]], [[REG1]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
	; CHECK-NEXT: vpkg %v0, %v1, %v0	; CHECK-NEXT: vpkg %v0, %v1, %v0
	; CHECK-NEXT: vuphf %v0, %v0	; CHECK-NEXT: vuphf %v0, %v0
	; CHECK-NEXT: vsel %v24, %v28, %v30, %v0	; CHECK-NEXT: vsel %v24, %v28, %v30, %v0
Context not available.
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf %v0, %v26, %v26	; CHECK-NEXT: vmrlf %v0, %v26, %v26
	; CHECK-NEXT: vmrlf %v1, %v24, %v24	; CHECK-NEXT: vmrlf %v1, %v24, %v24
	; CHECK-NEXT: vldeb %v0, %v0	; CHECK-DAG: vldeb %v0, %v0
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb %v1, %v1
	; CHECK-NEXT: vfchdb %v0, %v1, %v0	; CHECK-DAG: vfchdb %v0, %v1, %v0
	; CHECK-NEXT: vmrhf %v1, %v26, %v26	; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vmrhf %v2, %v24, %v24	; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb [[REG0]], [[REG0]]
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG1]], [[REG1]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
	; CHECK-NEXT: vpkg %v0, %v1, %v0	; CHECK-NEXT: vpkg %v0, %v1, %v0
	; CHECK-NEXT: vsel %v24, %v28, %v30, %v0	; CHECK-NEXT: vsel %v24, %v28, %v30, %v0
	; CHECK-NEXT: br %r14	; CHECK-NEXT: br %r14
Context not available.
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: vmrlf %v0, %v26, %v26	; CHECK-NEXT: vmrlf %v0, %v26, %v26
	; CHECK-NEXT: vmrlf %v1, %v24, %v24	; CHECK-NEXT: vmrlf %v1, %v24, %v24
	; CHECK-NEXT: vldeb %v0, %v0	; CHECK-DAG: vldeb %v0, %v0
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb %v1, %v1
	; CHECK-NEXT: vfchdb %v0, %v1, %v0	; CHECK-DAG: vfchdb %v0, %v1, %v0
	; CHECK-NEXT: vmrhf %v1, %v26, %v26	; CHECK-DAG: vmrhf [[REG0:%v[0-9]+]], %v26, %v26
	; CHECK-NEXT: vmrhf %v2, %v24, %v24	; CHECK-DAG: vmrhf [[REG1:%v[0-9]+]], %v24, %v24
	; CHECK-NEXT: vldeb %v1, %v1	; CHECK-DAG: vldeb [[REG0]], [[REG0]]
	; CHECK-NEXT: vldeb %v2, %v2	; CHECK-DAG: vldeb [[REG1]], [[REG1]]
	; CHECK-NEXT: vfchdb %v1, %v2, %v1	; CHECK-DAG: vfchdb %v1, [[REG1]], [[REG0]]
	; CHECK-NEXT: vpkg [[REG0:%v[0-9]+]], %v1, %v0	; CHECK-DAG: vpkg [[REG0:%v[0-9]+]], %v1, %v0
	; CHECK-DAG: vmrlg [[REG1:%v[0-9]+]], [[REG0]], [[REG0]]	; CHECK-DAG: vmrlg [[REG1:%v[0-9]+]], [[REG0]], [[REG0]]
	; CHECK-DAG: vuphf [[REG1]], [[REG1]]	; CHECK-DAG: vuphf [[REG1]], [[REG1]]
	; CHECK-DAG: vuphf [[REG2:%v[0-9]+]], [[REG0]]	; CHECK-DAG: vuphf [[REG2:%v[0-9]+]], [[REG0]]
Context not available.

test/CodeGen/SystemZ/vec-ctpop-01.ll

Context not available.

	define <4 x i32> @f3(<4 x i32> %a) {	define <4 x i32> @f3(<4 x i32> %a) {
	; CHECK-LABEL: f3:	; CHECK-LABEL: f3:
	; CHECK: vpopct [[T1:%v[0-9]+]], %v24, 0	; CHECK-DAG: vpopct [[T1:%v[0-9]+]], %v24, 0
	; CHECK: vgbm [[T2:%v[0-9]+]], 0	; CHECK-DAG: vgbm [[T2:%v[0-9]+]], 0
	; CHECK: vsumb %v24, [[T1]], [[T2]]	; CHECK: vsumb %v24, [[T1]], [[T2]]
	; CHECK: br %r14	; CHECK: br %r14

Context not available.

	define <2 x i64> @f4(<2 x i64> %a) {	define <2 x i64> @f4(<2 x i64> %a) {
	; CHECK-LABEL: f4:	; CHECK-LABEL: f4:
	; CHECK: vpopct [[T1:%v[0-9]+]], %v24, 0	; CHECK-DAG: vpopct [[T1:%v[0-9]+]], %v24, 0
	; CHECK: vgbm [[T2:%v[0-9]+]], 0	; CHECK-DAG: vgbm [[T2:%v[0-9]+]], 0
	; CHECK: vsumb [[T3:%v[0-9]+]], [[T1]], [[T2]]	; CHECK: vsumb [[T3:%v[0-9]+]], [[T1]], [[T2]]
	; CHECK: vsumgf %v24, [[T3]], [[T2]]	; CHECK: vsumgf %v24, [[T3]], [[T2]]
	; CHECK: br %r14	; CHECK: br %r14