
[SystemZ] Improve foldMemoryOperandImpl().
ClosedPublic

Authored by jonpa on Sep 11 2019, 4:42 AM.

Details

Summary

Swap the compare operands if the LHS is spilled, while updating the CCMasks of the CC users.

This is relatively straightforward if the live-in lists for non-allocatable registers (CC) can be assumed to be correct during register allocation. The experimental verifyLiveInLists_CC() has not detected a missing CC in any live-in list so far, but such a verification is still a missing piece to add if this approach is found useful enough.

On SPEC 2006, I see:

cg             :                10184                10861     +677
lg             :               373873               373204     -669
je             :               118876               119381     +505
cgrje          :                19726                19301     -425
c              :                17535                17941     +406
l              :                73704                73319     -385
jlh            :                61456                61810     +354
cgrjlh         :                11370                11189     -181
crjlh          :                 9502                 9353     -149
crje           :                 6488                 6406      -82
jle            :                12939                13000      +61
crjhe          :                 3843                 3787      -56
cgr            :                 1805                 1754      -51
crjle          :                 2508                 2462      -46
jhe            :                10908                10953      +45
lr             :                27011                26967      -44

I interpret this to mean that this patch handles ~1000 additional cases. The compare is now folded with the load instead of with the jump, like 'lg; cgrje' => 'cg; je'. I am not sure that is really an improvement, though? Would it be worth the trouble of adding some kind of live-in list check before regalloc for non-allocatable phys reg(s)?

Diff Detail

Event Timeline

jonpa created this revision.Sep 11 2019, 4:42 AM

I interpret this to mean that this patch handles ~1000 additional cases. The compare is now folded with the load instead of with the jump, like 'lg; cgrje' => 'cg; je'. I am not sure that is really an improvement, though? Would it be worth the trouble of adding some kind of live-in list check before regalloc for non-allocatable phys reg(s)?

In general, 'lg; cgrje' should have the same performance as 'cg; je' (both have one single-issue and one dual-issue opcode on current micro-archs). However, the 'cg; je' version does not require a GPR to hold the value to be compared against, so it may be preferable due to register pressure concerns.
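For reference, the transformation under discussion can be sketched in SystemZ assembly (registers, spill-slot offset, and labels here are made up for illustration; a "less than" compare is used to show the CC mask flipping along with the operand swap):

```
# Before: the spilled LHS is reloaded, then compared with a register.
lg     %r1, 160(%r15)        # reload LHS from its spill slot
cgrjl  %r1, %r2, .Ltarget    # compare and branch if LHS < RHS
# After: operands swapped so the compare reads memory directly;
# the branch condition flips from "low" to "high" to match.
cg     %r2, 160(%r15)        # compare RHS with the spilled LHS
jh     .Ltarget              # branch if RHS > LHS
```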

I'm not sure about the isCCLiveOut. In SystemZElimCompare we simply rely on this being correct. Why can't we rely on it here as well? Can we find out the specific rules that specify when the isLiveIn flags can be relied upon and when they cannot?

lib/Target/SystemZ/SystemZInstrInfo.cpp
1840 ↗(On Diff #219689)

I'm not sure it is safe to assert here. If there are no CC users of a compare, then the compare is dead and will usually have been optimized away. But I'm not sure we can fully rely on that having happened in all cases (e.g. what about -O0?).

1849 ↗(On Diff #219689)

This logic duplicates the CC commutation operation done in SystemZISelLowering::combineCCMask, so it would be preferable to have a helper routine somewhere.

1871 ↗(On Diff #219689)

Why do you need to modify the instruction here in the first place? The caller throws it away anyway and creates the memory variant. Should this just do the commutation in place there?

lib/Target/SystemZ/SystemZRegisterInfo.cpp
81 ↗(On Diff #219689)

If we move it here, the copy in SystemZElimCompare is now no longer needed, right?

jonpa added a comment.Sep 12 2019, 7:30 AM

In general, 'lg; cgrje' should have the same performance as 'cg; je' (both have one single-issue and one dual-issue opcode on current micro-archs). However, the 'cg; je' version does not require a GPR to hold the value to be compared against, so it may be preferable due to register pressure concerns.

That's a good point. In the output I see "Spill/Reload" comments go up by 4, while Copies go down by 28 (206 files differ). On SPEC 2017, I see ~2600 cases, with "Spill/Reload" comments going up by 127 and Copies going down by 71 (371 files differ).

I don't know exactly why there is an *increase* in spill/reload instructions and a decrease in register move instructions, but from this viewpoint the change anyway does not seem all that interesting, or does it?

I'm not sure about the isCCLiveOut. In SystemZElimCompare we simply rely on this being correct. Why can't we rely on it here as well? Can we find out the specific rules that specify when the isLiveIn flags can be relied upon and when they cannot?

When I asked about this on the list in June, the answer was that the live-in lists "...were intended only for post-ra passes and are only setup during the final rewriting pass.". So it seems there is *not* a general agreement that live-in lists must be correct at all times before/during RA. However, that answer seems to refer to VirtRegMap adding the allocated physical registers to the live-in lists, while CC is non-allocatable.

I tried a simple .mir program where I made a branch use CC in a basic block while the definition (compare) was in the predecessor block. The MachineVerifier caught this as a "Bad machine code: Using an undefined physical register". If I add CC to the live-in list, it passes the initial verification. So at least in this example the MachineVerifier demanded that CC either be defined in the basic block or be in the live-in list at the point of use.

SystemZElimCompare runs after regalloc, but I can see that trusting the live-in lists there or anywhere else has the same implications. Maybe I was assuming that regalloc would set up all the live-in lists correctly after its analysis of the function.

I am still not sure whether I should go ahead and spend more time on this, given the very small change in spills/copies.

jonpa updated this revision to Diff 221338.Sep 23 2019, 8:17 AM

Also recognize the CLR and CLGR instructions, which can likewise be commuted in order to fold the spilled operand. Compare-logical instructions set CC the same way as signed comparisons, so trySwapCompareOperands() should work as is. This folds another ~200 reloads on the benchmarks.

Also, a closer look at the slight increase in spill/reload instructions in the output shows that MachineBlockPlacement is responsible for this difference, since it will not duplicate the now-bigger code snippets as willingly as it does when the load is folded. With -disable-block-placement the total difference over the benchmarks is now negative, as expected.

However, a potential deficiency in the register allocator has also been reported (https://bugs.llvm.org/show_bug.cgi?id=43405). This was a test case where a folded compare operand caused a difference in the remainder of the allocation (the reload snippet was no longer there to allocate), which ended up adding 2 extra reloads. Hopefully this is something that can be improved in the register allocator.

jonpa updated this revision to Diff 227883.Nov 5 2019, 8:12 AM

Patch rebased and using LivePhysRegs. New tests added.

I thought about using a CutOff counter in the search for CC users (a value of 50 is NFC on SPEC 2006). But then I thought that maybe this isn't really needed, since many instructions define CC, at which point the search stops...

I also realized that we could do WFCDB -> CDBR -> CDB, WFCSB -> CEBR -> CEB if the non-spilled reg is allocated to the FP bank (and possibly also constrain a not-yet-allocated reg to an FP reg). Such a compare could also be swapped, since CC=3 for FP compares would not have to be handled, right? I am thinking this could wait until this patch has been committed, but I am not sure if it would be better to do both at the same time and then run the benchmarks just once...

The instruction mix difference is about the same as before on SPEC 2006, and ~230 files are affected.

Herald added a project: Restricted Project.Nov 5 2019, 8:12 AM

Patch rebased and using LivePhysRegs. New tests added.

I thought about using a CutOff counter in the search for CC users (a value of 50 is NFC on SPEC 2006). But then I thought that maybe this isn't really needed, since many instructions define CC, at which point the search stops...

Agreed.

I also realized that we could do WFCDB -> CDBR -> CDB, WFCSB -> CEBR -> CEB if the non-spilled reg is allocated to the FP bank (and possibly also constrain a not-yet-allocated reg to an FP reg). Such a compare could also be swapped, since CC=3 for FP compares would not have to be handled, right? I am thinking this could wait until this patch has been committed, but I am not sure if it would be better to do both at the same time and then run the benchmarks just once...

We definitely can swap FP compares. Converting WFCDB to FP is a bit of a trade-off since you need to restrict the register bank ... that probably needs more performance verification. I agree extending this to FP can be done in a separate patch.

Re-added some inline comments that seem to have been lost ...

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1746

I believe we do not need this check if the loop below finds another CC def after MI. We only need to check live-outs if this MI is the last CC def in the basic block.

1761

I'm not sure it is safe to assert here. If there are no CC users of a compare, then the compare is dead and will usually have been optimized away. But I'm not sure we can fully rely on that having happened in all cases (e.g. what about -O0?).

1769

This logic duplicates the CC commutation operation done in SystemZISelLowering::combineCCMask, so it would be preferable to have a helper routine somewhere.

1792

Why do you need to modify the instruction here in the first place? The caller throws it away anyway and creates the memory variant. Should this just do the commutation in place there?

jonpa updated this revision to Diff 228100.Nov 6 2019, 10:30 AM
jonpa marked 8 inline comments as done.

Thanks for review - patch updated per suggestions.

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1746

done

1761

OK

1769

ok: getSwappedCCMask()

1792

The high-muxes patch does the exact same thing in PostRewrite in expandCmpMux() (to handle the "low-high" case), so I thought it might be a nice function to have. But I can see that it makes things easier here to instead use NeedsCommute, as then we don't need to swap the operands anymore, like you suggest.

See minor comment inline. Otherwise, this now looks good to me, but we should check performance results before checking it in ...

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1780

This does not at all depend on a SystemZInstrInfo object, so it should not require callers to pass one in. Could be either made a static member function, or maybe just a non-class function (probably in the SystemZ namespace).

jonpa updated this revision to Diff 228190.Nov 7 2019, 1:44 AM
jonpa marked 2 inline comments as done.

getSwappedCCMask() function moved out into the SystemZ namespace.

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1780

Ah, right...

jonpa updated this revision to Diff 237167.Jan 9 2020, 12:54 PM

Patch extended to also fold reloads into conditional load operands (both LOCR and SELR).

The mapping from LOC(G)R to LOC(G) gives ~250 folded instructions on SPEC 2006.

New MemFoldPseudo instructions for LOCG and LOCMux, so that SELGR/SELRMux can be mapped to them and later handled in SystemZPostRewrite, as was already done previously for e.g. AG. This gives ~100 folded cases becoming LOC on SPEC 2006 on z15. There were no cases of COPYs inserted in SystemZPostRewrite due to this in the SPEC 2006 run.

There are two new MemFoldPseudo opcodes that are not used, since we always use LOCRMux: LOCFH_MemFoldPseudo and LOC_MemFoldPseudo.

These mappings are superfluous in getMemOpcode():

LOCFHR -> LOCFH 
LOCR   -> LOC        
SELR   -> LOC_MemFoldPseudo

in getTargetMemOpcode():

LOCFH_MemFoldPseudo -> LOCFH  
LOC_MemFoldPseudo   -> LOC

New test case for folding reload into LOCG/LOCMux both on z14 and z15.

Perhaps just reuse reverseCCMask() in SystemZISelLowering.cpp instead of the new SystemZ::getSwappedCCMask() ..?

jonpa retitled this revision from [SystemZ] Swap compare operands in foldMemoryOperandImpl() if possible. to [SystemZ] Improve foldMemoryOperandImpl()..Jan 9 2020, 12:55 PM

Patch extended to also fold reloads into conditional load operands (both LOCR and SELR).

OK, makes sense.

There are two new MemFoldPseudo opcodes that are not used, since we always use LOCRMux: LOCFH_MemFoldPseudo and LOC_MemFoldPseudo.

Then we really shouldn't generate them --- see inline comments.

Perhaps just reuse reverseCCMask() in SystemZISelLowering.cpp instead of the new SystemZ::getSwappedCCMask() ..?

Yes, agreed. See also inline comments.

llvm/lib/Target/SystemZ/SystemZInstrFormats.td
2846

I think it would be better to leave this here (to allow for selectively creating _MemFoldPseudo only when we need it). Also, for consistency, we should then set OpKey/OpType in CondUnaryRSY.

3246

Move OpKey and OpType into CondBinaryRRF.

3261

For consistency, also append cls1 to mnemonic here.

4791

Append cls.

4793

Append cls.

4838

Append cls1.

4853

Append cls1.

4888

Again, I think it would be better to leave this as-is, except for adding OpKey/OpType, and then adding another multiclass below that creates the MemFoldPseudo.

5109

This should have a different name that makes it clear that it creates a MemFold pseudo. Maybe "CondUnaryRSYPairAndMemFold" (ideally, we should rename the other MemFold pseudo multiclasses to match).

We shouldn't set OpKey/OpType here, but in CondUnaryRSY.

In fact, we may want to also move setting of MemKey/MemType (for MemType "target") into the original classes as well, then we don't need to duplicate stuff here.

5110

Also, we need a CondUnaryRSYPseudoAndMemFold analogously here.

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1781

You're right, this really should use the code in reverseCCMask -- then we also wouldn't have to deal with any failure cases.

llvm/lib/Target/SystemZ/SystemZInstrInfo.td
534

Should use ...AndMemFold exactly there where we need them.

Also, the "mnemonic" for mux pseudos should probably be something like "MUXloc", then can keep simply appending "r" without extra hassle.

jonpa updated this revision to Diff 237504.Jan 11 2020, 8:41 AM
jonpa marked 12 inline comments as done.

Patch updated per review (NFC). The new unused opcodes are gone, and the only superfluous mappings now added are in getMemOpcode():

LOCFHR -> LOCFH
LOCR   -> LOC

(which are not really "wrong")

Using reverseCCMask() means the overflow bit is handled as well (and not checked for), but I suppose this must be equivalent, since the compares never set the OF bit, right (in combineCCMask and prepareCompareSwapOperands)? It seems that with this patch we are using reverseCCMask() only for CC users in the presence of a compare. So to me it would make more sense to have an assert in reverseCCMask against the OF bit being set, rather than handling it. That would be more readable overall, given that we only introduce the use of the OF bit in SystemZElimCompare later on (or maybe I am missing something?). In other words, make it explicit that the CCMask is produced by a compare and nothing else, like getSwappedCCMask() did. Would we want to use the OF (unordered) bit with FP compares?

Changed mnemonics per suggestion to MUX... This simplifies one line where just an "r" can be added, but the nested subst() in MemFoldPseudo_CondMove is not remedied. Can we still call this string 'mnemonic', or should it be 'mnem' or something to show that it is not the same as the name of the instruction (on the other hand, it is already a pseudo opcode...)?

Patch updated per review (NFC). The new unused opcodes are gone, and the only superfluous mappings now added are in getMemOpcode():

LOCFHR -> LOCFH
LOCR   -> LOC

(which are not really "wrong")

Yes, I think this is fine.

Using reverseCCMask() means the overflow bit is handled as well (and not checked for), but I suppose this must be equivalent, since the compares never set the OF bit, right (in combineCCMask and prepareCompareSwapOperands)? It seems that with this patch we are using reverseCCMask() only for CC users in the presence of a compare. So to me it would make more sense to have an assert in reverseCCMask against the OF bit being set, rather than handling it. That would be more readable overall, given that we only introduce the use of the OF bit in SystemZElimCompare later on (or maybe I am missing something?). In other words, make it explicit that the CCMask is produced by a compare and nothing else, like getSwappedCCMask() did. Would we want to use the OF (unordered) bit with FP compares?

Yes, we need the unordered bit handling for FP compares. For integer compares the bit is never set, so I think this is still fine ...

Changed mnemonics per suggestion to MUX... This simplifies one line where just an "r" can be added, but the nested subst() in MemFoldPseudo_CondMove is not remedied. Can we still call this string 'mnemonic', or should it be 'mnem' or something to show that it is not the same as the name of the instruction (on the other hand, it is already a pseudo opcode...)?

See inline comment, I think this can now be simplified.

Except for the minor tweaks pointed on inline, this now LGTM. Thanks!

llvm/lib/Target/SystemZ/SystemZInstrFormats.td
4791

I believe this can now simply be

let OpKey = !subst("loc", "sel", mnemonic)#"r"#cls;

5106

You should now be able to simply use

def "" : CondUnaryRSYPair<...>

right?

jonpa updated this revision to Diff 237721.Jan 13 2020, 10:17 AM
jonpa marked 4 inline comments as done.

Minor updates per review.

uweigand accepted this revision.Jan 13 2020, 10:39 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jan 13 2020, 10:39 AM
This revision was automatically updated to reflect the committed changes.