Details

Reviewers

uweigand

Commits

rGf481d4889378: [SystemZ] Improve foldMemoryOperandImpl().

Summary

(MS(G)RKC part of original patch)

Diff Detail

Event Timeline

jonpa created this revision. Mar 25 2020, 6:52 AM

Herald added a subscriber: hiraditya. Mar 25 2020, 6:52 AM

uweigand added inline comments. Mar 25 2020, 7:31 AM

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1201	Is this necessary for correctness, or just for performance tuning? I thought the point of the MemFold pseudos was that they actually are three-operand instructions; and if they're used with non-tied operands, that gets fixed up in a later pass. So I guess it would be correct to use them even for non-tied operands, right?

jonpa marked an inline comment as done. Mar 25 2020, 8:40 AM

jonpa added inline comments.

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1201	Yeah, so far we have aimed to only fold instructions when we see that, at least at this point, the registers match. If that changes later (register reassignment), SystemZPostRewrite will insert a COPY so that the 2-address reg/mem instruction can be used. This is a performance tuning consideration, I think. So yes, it should still work to fold into a MemFold pseudo even if the registers are different, but I haven't tried it. The current (trunk) check with getTwoOperandOpcode() finds 3-address instructions with a mapped MemFold pseudo, and these have non-tied operands, so that is certainly correct.

Simplify the check to look only for instructions with three explicit operands that are not tied (and fix the check for TIED_TO).

Moving the check for VRM up a bit in the function makes the code cleaner and is NFC (currently).

Fix a syntax error that slipped into the last diff.

I'm not sure I understand those latest changes. You seem to no longer check at all whether the target opcode actually requires tied operands, you just always tie them?

In D76771#1943370, @uweigand wrote:

I'm not sure I understand those latest changes. You seem to no longer check at all whether the target opcode actually requires tied operands, you just always tie them?

The current check 'if (SystemZ::getTwoOperandOpcode(Opcode) != -1)' finds three-address instructions that have a two-address equivalent opcode, e.g. AGRK -> AGR. I was thinking that the check 'if (NumOps == 3 && MI.getDesc().getOperandConstraint(1, MCOI::TIED_TO) == -1)' should be exactly equivalent to that and would also include the MS(G)RKC opcodes.

I suppose all these instructions should have a MemFoldPseudo, so checking for a target memory instruction to detect this case should also be equivalent.

I am also thinking that there really aren't any reg/memory instructions that are not tied (?), so this can simply be assumed to be the case, and checking for it doesn't make much sense.

This is just a heuristic to not use MemFoldPseudos unless it is fairly certain that they do not need a COPY to match the target memory instruction.

This change is NFC on benchmarks.

The more I think about it, the more it seems that the original check has always been somewhat questionable.

What do we really need to check here? The main correctness question is whether the resulting MI would be correct, i.e. have the right operand types. In general, for all the possible MemOpcode values, the memory operand must come last, so it is only valid to replace the final register operand with a memory operand. The exception to this is if the original instruction is commutable, in which case we may also first commute the (commutable pair of) operands and then replace the (now) final register operand with a memory operand. All of this has nothing to do with getTwoOperandOpcode, with existing physreg assignments or anything else. We do this commutability check for compares already, and we can do it just the same for other MemOperands.

And if this is all we do, I believe the code would still be correct.
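The correctness rule stated above (the memory operand must come last, unless the operands can first be commuted) can be sketched in isolation. This is a minimal model and not the LLVM code: `FoldCandidate`, `FoldDecision` and `canFoldOperand` are hypothetical names, and commutability is reduced to a single flag covering the last two register operands.

```cpp
#include <cassert>

// Hypothetical, simplified view of an instruction for folding purposes.
struct FoldCandidate {
  unsigned NumOps;   // number of explicit operands, destination included
  bool IsCommutable; // true if the last two register operands may be swapped
};

// Outcome of the check: is folding valid, and must we commute first?
struct FoldDecision {
  bool CanFold;
  bool NeedsCommute;
};

// A memory operand may only replace the final register operand. If the
// spilled operand is the second-to-last one, folding is still valid when
// the instruction is commutable, after swapping the two operands.
inline FoldDecision canFoldOperand(const FoldCandidate &MI, unsigned OpNum) {
  if (MI.NumOps == 0 || OpNum >= MI.NumOps)
    return {false, false};
  if (OpNum == MI.NumOps - 1)
    return {true, false};
  if (MI.IsCommutable && MI.NumOps >= 2 && OpNum == MI.NumOps - 2)
    return {true, true};
  return {false, false};
}
```

For a three-operand instruction such as `dst = ARK src1, src2`, operand 2 can always be folded in this model, operand 1 only if the instruction is commutable, and operand 0 (the destination) never.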

Now, everything else seems to be just a performance tuning question. And even here I'm not sure whether this is really doing us any good. We originally have a three-way opcode like

dst = ARK src1, src2

where src2 now lives in a spill slot (assuming the non-commuting case for simplicity). If dst and src1 are mapped to the same register, this is transformed into A (using the MemFold pseudo for A as an intermediate step). Now, if dst and src1 are *not* mapped to the same register, we currently do *not* perform the transformation, so we get

tmp = L spill-src2
dst  = ARK src1, tmp

However, if we simply did perform the transformation, we'd instead get

dst = COPY src1
dst = A dst, spill-src2

Now, it is a bit unclear which version is preferable. The first version may have one fewer microops, but it uses one more register (which may be an issue given that we're already spilling). So this would be worthwhile to check.

However, if we actually want to avoid the COPY and keep the existing code, then I believe the correct check is simply whether or not MemOperand is a MemFold pseudo. This is the only case where this can ever be a concern. (Note that getMemoryOpcode actually preserves the state of whether or not the destination is tied; I had been confused about that above.) So going back to your previous version of the check and simply only checking for MemFold pseudos should do what we want here.
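The check proposed here, "is the memory opcode a MemFold pseudo?", amounts to a two-step table lookup. The sketch below models it with toy string tables. The function names `memOpcodeMap`, `targetMemOpcodeMap` and `foldsViaMemFoldPseudo` are hypothetical, and the table entries are illustrative only; the real maps are generated by TableGen and queried through `SystemZ::getMemOpcode` and `SystemZ::getTargetMemOpcode`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using OpcodeMap = std::unordered_map<std::string, std::string>;

// Register form -> memory form (toy model of SystemZ::getMemOpcode).
// Entries are illustrative assumptions, not the generated table.
inline const OpcodeMap &memOpcodeMap() {
  static const OpcodeMap M = {
      {"ARK", "A_MemFoldPseudo"},     // 3-address: goes through a pseudo
      {"MSRKC", "MSC_MemFoldPseudo"}, // added by this patch
      {"CR", "C"},                    // compare: maps directly to the target
  };
  return M;
}

// MemFold pseudo -> target memory instruction (toy model of
// SystemZ::getTargetMemOpcode).
inline const OpcodeMap &targetMemOpcodeMap() {
  static const OpcodeMap M = {
      {"A_MemFoldPseudo", "A"},
      {"MSC_MemFoldPseudo", "MSC"},
  };
  return M;
}

// True if the memory form of Opcode is a MemFold pseudo. Only in that case
// does the "destination and source share a register" question arise.
inline bool foldsViaMemFoldPseudo(const std::string &Opcode) {
  auto Mem = memOpcodeMap().find(Opcode);
  if (Mem == memOpcodeMap().end())
    return false; // no memory form at all
  return targetMemOpcodeMap().count(Mem->second) != 0;
}
```

In this model `ARK` and `MSRKC` go through a pseudo, while `CR` maps straight to its memory form and therefore skips the register-match heuristic.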

Now, it is a bit unclear which version is preferable. The first version may have one fewer microops, but it uses one more register (which may be an issue given that we're already spilling). So this would be worthwhile to check.

I made a build of benchmarks (on top of these recent improvements) where the known allocations were ignored, which caused a lot more folds with the needed register moves:

lg             :              1090487              1081897    -8590
lgr            :               874009               882174    +8165
ag             :                37427                44978    +7551
agr            :               101720                94572    -7148
l              :               229926               225837    -4089
lr             :                56066                59363    +3297
a              :                59296                61866    +2570
ar             :                42553                40021    -2532
...

Over a nightly run, it looked to be just slightly in favor of *not* doing this; in other words, it seemed better to avoid the many register moves. But it was a close race...

However, if we actually want to avoid the COPY and keep the existing code, then I believe the correct check is simply whether or not MemOperand is a MemFold pseudo. This is the only case where this can ever be a concern. (Note that getMemoryOpcode actually preserves the state of whether or not the destination is tied; I had been confused about that above.) So going back to your previous version of the check and simply only checking for MemFold pseudos should do what we want here.

I agree this makes the most sense, and the patch is updated.

Thanks for running the benchmark, I guess I'm OK with the current implementation then.

Just one minor inline comment, otherwise this now looks good.

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1179	Does it still make sense to move the VRM check up here? I guess this could now cancel the transformation in some cases where it isn't necessary because we don't even need VRM below ...
1225	The one thing I'm wondering about: are there any cases where we don't have a MemFold pseudo, but it still would be beneficial to commute the operands? I guess not, given that "real" memory operations are never three-operand ...

jonpa marked 2 inline comments as done. Mar 31 2020, 4:37 AM

jonpa added inline comments.

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1179	I do believe that it is theoretically better to have the VRM check further down as before, but it clutters the code. It never happens (at least not on a SPEC 2017 build) that this is reached with a null VRM, but of course that may change in the future.
1225	I checked this by building SPEC 2017, and it turns out that there are actually some cases of LOCGR/LOCRMux that show up here as commutable without a MemFold pseudo (select instructions are mapped to the LOC MemFold pseudos, but LOC instructions map directly to the target memory instruction). However, when they show up here (during regalloc), op0 and op1 are tied and have the same virtual register, so since we are only handling cases of exactly one operand, I don't think these instructions could have anything but op2 to fold.

uweigand added inline comments. Mar 31 2020, 5:38 AM

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
1179	Given that the check is already in place, I don't see a reason to move it up -- just leave the code as it was ...
1225	OK, that looks fine then. Thanks for checking!

Patch updated per review.

LGTM, thanks!

This revision is now accepted and ready to land. Mar 31 2020, 7:05 AM

Closed by commit rGf481d4889378: [SystemZ] Improve foldMemoryOperandImpl(). (authored by jonpa). Mar 31 2020, 8:50 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Mar 31 2020, 8:50 AM

Herald added a subscriber: llvm-commits.

Diff 252560

llvm/lib/Target/SystemZ/SystemZInstrFormats.td

 // A pseudo that is used during register allocation when folding a memory
 // operand. The 3-address register instruction with a spilled source cannot
 // be converted directly to a target 2-address reg/mem instruction.
 // Mapping: <INSN>R -> MemFoldPseudo -> <INSN>
 class MemFoldPseudo<string mnemonic, RegisterOperand cls, bits<5> bytes,
                     AddressingMode mode>
   : Pseudo<(outs cls:$R1), (ins cls:$R2, mode:$XBD2), []> {
-  let OpKey = mnemonic#"rk"#cls;
+  let OpKey = !subst("mscrk", "msrkc",
+              !subst("msgcrk", "msgrkc",
+              mnemonic#"rk"#cls));
   let OpType = "mem";
   let MemKey = mnemonic#cls;
   let MemType = "pseudo";
   let mayLoad = 1;
   let AccessBytes = bytes;
   let HasIndex = 1;
   let hasNoSchedulingInfo = 1;
 }
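The new `OpKey` expression can be checked by hand with a small string model. The sketch below mimics TableGen's `!subst(target, repl, value)` with a hypothetical helper `substAll`, and `memFoldOpKey` is likewise a made-up name. It assumes that `cls` pastes as the register operand's name (for example `GR32`). The substitutions appear to be needed because the register forms are named MSRKC and MSGRKC, which do not follow the plain `<mnemonic>rk` pattern.

```cpp
#include <cassert>
#include <string>

// Replace every occurrence of From in Value with To, mirroring what
// TableGen's !subst does for strings.
inline std::string substAll(const std::string &From, const std::string &To,
                            std::string Value) {
  if (From.empty())
    return Value;
  std::size_t Pos = 0;
  while ((Pos = Value.find(From, Pos)) != std::string::npos) {
    Value.replace(Pos, From.size(), To);
    Pos += To.size(); // continue after the inserted text
  }
  return Value;
}

// Model of the patched OpKey computation in MemFoldPseudo.
// Assumption: Cls is the pasted name of the register operand, e.g. "GR32".
inline std::string memFoldOpKey(const std::string &Mnemonic,
                                const std::string &Cls) {
  return substAll("mscrk", "msrkc",
                  substAll("msgcrk", "msgrkc", Mnemonic + "rk" + Cls));
}
```

With these definitions, `memFoldOpKey("msc", "GR32")` yields `msrkcGR32` and `memFoldOpKey("msgc", "GR64")` yields `msgrkcGR64`, while an ordinary mnemonic such as `a` is left as `arkGR32`.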

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp

     if (MMO->getSize() == Size && !MMO->isVolatile() && !MMO->isAtomic()) {
       }
     }
   }

   // If the spilled operand is the final one or the instruction is
   // commutable, try to change <INSN>R into <INSN>.
   unsigned NumOps = MI.getNumExplicitOperands();
   int MemOpcode = SystemZ::getMemOpcode(Opcode);
   if (MemOpcode == -1)
     return nullptr;

   // Try to swap compare operands if possible.
   bool NeedsCommute = false;
   if ((MI.getOpcode() == SystemZ::CR || MI.getOpcode() == SystemZ::CGR ||
        MI.getOpcode() == SystemZ::CLR || MI.getOpcode() == SystemZ::CLGR) &&
       OpNum == 0 && prepareCompareSwapOperands(MI))
     NeedsCommute = true;

   bool CCOperands = false;
   if (MI.getOpcode() == SystemZ::LOCRMux || MI.getOpcode() == SystemZ::LOCGR ||
       MI.getOpcode() == SystemZ::SELRMux || MI.getOpcode() == SystemZ::SELGR) {
     assert(MI.getNumOperands() == 6 && NumOps == 5 &&
            "LOCR/SELR instruction operands corrupt?");
     NumOps -= 2;
     CCOperands = true;
   }

   // See if this is a 3-address instruction that is convertible to 2-address
   // and suitable for folding below. Only try this with virtual registers
   // and a provided VRM (during regalloc).
-  if (SystemZ::getTwoOperandOpcode(Opcode) != -1) {
+  int MemDstTiedTo = SystemZ::getTargetMemOpcode(MemOpcode) != -1 ? 1 : -1;
+  if (MemDstTiedTo == -1)
+    MemDstTiedTo = get(MemOpcode).getOperandConstraint(0, MCOI::TIED_TO);
+  assert((MemDstTiedTo == -1 || MemDstTiedTo == 1) &&
+         "Expected operand to be either untied or tied to op1.");
+  if (MemDstTiedTo == 1 && !MI.getOperand(0).isTied()) {
     if (VRM == nullptr)
       return nullptr;
     else {
       assert(NumOps == 3 && "Expected two source registers.");
       Register DstReg = MI.getOperand(0).getReg();
       Register DstPhys =
           (Register::isVirtualRegister(DstReg) ? VRM->getPhys(DstReg) : DstReg);
       Register SrcReg = (OpNum == 2 ? MI.getOperand(1).getReg()
                                     : ((OpNum == 1 && MI.isCommutable())
                                            ? MI.getOperand(2).getReg()
                                            : Register()));
       if (DstPhys && !SystemZ::GRH32BitRegClass.contains(DstPhys) && SrcReg &&
           Register::isVirtualRegister(SrcReg) &&
           DstPhys == VRM->getPhys(SrcReg))
         NeedsCommute = (OpNum == 1);
       else
         return nullptr;
     }
   }

   if ((OpNum == NumOps - 1) || NeedsCommute) {
     const MCInstrDesc &MemDesc = get(MemOpcode);
     uint64_t AccessBytes = SystemZII::getAccessSize(MemDesc.TSFlags);
     assert(AccessBytes != 0 && "Size of access should be known");
     assert(AccessBytes <= Size && "Access outside the frame index");
     uint64_t Offset = Size - AccessBytes;
     MachineInstrBuilder MIB = BuildMI(*InsertPt->getParent(), InsertPt,
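The register-match heuristic in the hunk above can be modelled outside LLVM. The following is a simplified sketch under stated assumptions: `VirtToPhys` is a toy stand-in for `VirtRegMap`, all registers are treated as virtual, and the null-VRM and `GRH32BitRegClass` checks are omitted. `FoldChoice` and `chooseFold` are hypothetical names.

```cpp
#include <cassert>
#include <unordered_map>

using Reg = unsigned;                            // 0 means "no register"
using VirtToPhys = std::unordered_map<Reg, Reg>; // toy stand-in for VirtRegMap

enum class FoldChoice { GiveUp, Fold, FoldAfterCommute };

// Decide whether to fold spilled operand OpNum of "Dst = OP Src1, Src2"
// when the memory form ties the destination to its first source.
inline FoldChoice chooseFold(Reg Dst, Reg Src1, Reg Src2, unsigned OpNum,
                             bool IsCommutable, const VirtToPhys &VRM) {
  // The source operand that stays in a register after folding.
  Reg Remaining = 0;
  if (OpNum == 2)
    Remaining = Src1;
  else if (OpNum == 1 && IsCommutable)
    Remaining = Src2;
  if (Remaining == 0)
    return FoldChoice::GiveUp;

  auto DstIt = VRM.find(Dst);
  auto SrcIt = VRM.find(Remaining);
  if (DstIt == VRM.end() || SrcIt == VRM.end())
    return FoldChoice::GiveUp;

  // Fold only if no COPY would be needed later, i.e. the destination and
  // the remaining source are already assigned the same physical register.
  if (DstIt->second != SrcIt->second)
    return FoldChoice::GiveUp;
  return OpNum == 1 ? FoldChoice::FoldAfterCommute : FoldChoice::Fold;
}
```

Giving up here is the performance-tuning choice discussed in the review: folding with mismatched registers would still be correct, but it would require a later COPY.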

llvm/lib/Target/SystemZ/SystemZInstrInfo.td

 defm MS : BinaryRXPair<"ms", 0x71, 0xE351, mul, GR32, load, 4>;
 def MGH : BinaryRXY<"mgh", 0xE33C, mul, GR64, asextloadi16, 2>,
           Requires<[FeatureMiscellaneousExtensions2]>;
 def MSGF : BinaryRXY<"msgf", 0xE31C, mul, GR64, asextloadi32, 4>;
 def MSG : BinaryRXY<"msg", 0xE30C, mul, GR64, load, 8>;

 // Multiplication of memory, setting the condition code.
 let Predicates = [FeatureMiscellaneousExtensions2], Defs = [CC] in {
-  def MSC : BinaryRXY<"msc", 0xE353, null_frag, GR32, load, 4>;
-  def MSGC : BinaryRXY<"msgc", 0xE383, null_frag, GR64, load, 8>;
+  defm MSC : BinaryRXYAndPseudo<"msc", 0xE353, null_frag, GR32, load, 4>;
+  defm MSGC : BinaryRXYAndPseudo<"msgc", 0xE383, null_frag, GR64, load, 8>;
 }

 // Multiplication of a register, producing two results.
 def MR : BinaryRR <"mr", 0x1C, null_frag, GR128, GR32>;
 def MGRK : BinaryRRFa<"mgrk", 0xB9EC, null_frag, GR128, GR64, GR64>,
            Requires<[FeatureMiscellaneousExtensions2]>;
 def MLR : BinaryRRE<"mlr", 0xB996, null_frag, GR128, GR32>;
 def MLGR : BinaryRRE<"mlgr", 0xB986, null_frag, GR128, GR64>;

llvm/test/CodeGen/SystemZ/foldmemop-msc.mir

This file was added.

				# RUN: llc -mtriple=s390x-linux-gnu -mcpu=z14 -start-before=greedy %s -o - \
				# RUN:   | FileCheck %s
				#
				# Test folding of a memory operand into logical compare with an immediate.

				--- |
				define i32 @fun0(i32* %src, i32 %arg) { ret i32 0 }
				define i64 @fun1(i64* %src, i64 %arg) { ret i64 0 }
				define i32 @fun2(i32* %src, i32 %arg) { ret i32 0 }
				define i64 @fun3(i64* %src, i64 %arg) { ret i64 0 }
				...


				# CHECK-LABEL: fun0:
				# CHECK-LABEL: .LBB0_2:
				# CHECK: msc %r0, 164(%r15) # 4-byte Folded Reload
				---
				name: fun0
				alignment: 16
				tracksRegLiveness: true
				registers:
				- { id: 0, class: grx32bit }
				- { id: 1, class: grx32bit }
				- { id: 2, class: addr64bit }
				- { id: 3, class: gr32bit }
				- { id: 4, class: grx32bit }
				- { id: 5, class: grx32bit }
				- { id: 6, class: gr32bit }
				- { id: 7, class: gr32bit }
				- { id: 8, class: gr32bit }
				liveins:
				- { reg: '$r2d', virtual-reg: '%2' }
				- { reg: '$r3l', virtual-reg: '%3' }
				frameInfo:
				maxAlignment: 1
				hasOpaqueSPAdjustment: true
				machineFunctionInfo: {}
				body: |
				bb.0:
				successors: %bb.1(0x30000000), %bb.2(0x50000000)
				liveins: $r2d, $r3l

				%3:gr32bit = COPY $r3l
				%2:addr64bit = COPY $r2d
				%6:gr32bit = LHIMux 0
				CHIMux %3, 0, implicit-def $cc
				%8:gr32bit = LHIMux 0
				BRC 14, 6, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				%8:gr32bit = LMux %2, 0, $noreg :: (load 4 from %ir.src)
				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r0d, 12, implicit-def dead early-clobber $r1d, 12, implicit-def dead early-clobber $r2d, 12, implicit-def dead early-clobber $r3d, 12, implicit-def dead early-clobber $r4d, 12, implicit-def dead early-clobber $r5d, 12, implicit-def dead early-clobber $r6d, 12, implicit-def dead early-clobber $r7d, 12, implicit-def dead early-clobber $r8d, 12, implicit-def dead early-clobber $r9d, 12, implicit-def dead early-clobber $r10d, 12, implicit-def dead early-clobber $r11d, 12, implicit-def dead early-clobber $r12d, 12, implicit-def dead early-clobber $r13d, 12, implicit-def dead early-clobber $r14d, 12, implicit-def early-clobber $r15d

				bb.2:
				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r0d, 12, implicit-def dead early-clobber $r1d, 12, implicit-def dead early-clobber $r2d, 12, implicit-def dead early-clobber $r3d, 12, implicit-def dead early-clobber $r4d, 12, implicit-def dead early-clobber $r5d, 12, implicit-def dead early-clobber $r6d, 12, implicit-def dead early-clobber $r7d, 12, implicit-def dead early-clobber $r8d, 12, implicit-def dead early-clobber $r9d, 12, implicit-def dead early-clobber $r10d, 12, implicit-def dead early-clobber $r11d, 12, implicit-def dead early-clobber $r12d, 12, implicit-def dead early-clobber $r13d, 12, implicit-def dead early-clobber $r14d, 12, implicit-def early-clobber $r15d
				%6:gr32bit = MSRKC %8, %6, implicit-def $cc
				%6:gr32bit = LOCHIMux %6, 1, 14, 6, implicit killed $cc
				%7:gr32bit = NRK %3, %6, implicit-def dead $cc
				$r2l = COPY %7
				Return implicit $r2l

				...


				# CHECK-LABEL: fun1:
				# CHECK-LABEL: .LBB1_2:
				# CHECK: msc %r0, 164(%r15) # 4-byte Folded Reload
				---
				name: fun1
				alignment: 16
				tracksRegLiveness: true
				registers:
				- { id: 0, class: grx32bit }
				- { id: 1, class: grx32bit }
				- { id: 2, class: addr64bit }
				- { id: 3, class: gr32bit }
				- { id: 4, class: grx32bit }
				- { id: 5, class: grx32bit }
				- { id: 6, class: gr32bit }
				- { id: 7, class: gr32bit }
				- { id: 8, class: gr32bit }
				liveins:
				- { reg: '$r2d', virtual-reg: '%2' }
				- { reg: '$r3l', virtual-reg: '%3' }
				frameInfo:
				maxAlignment: 1
				hasOpaqueSPAdjustment: true
				machineFunctionInfo: {}
				body: |
				bb.0:
				successors: %bb.1(0x30000000), %bb.2(0x50000000)
				liveins: $r2d, $r3l

				%3:gr32bit = COPY $r3l
				%2:addr64bit = COPY $r2d
				%6:gr32bit = LHIMux 0
				CHIMux %3, 0, implicit-def $cc
				%8:gr32bit = LHIMux 0
				BRC 14, 6, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				%8:gr32bit = LMux %2, 0, $noreg :: (load 4 from %ir.src)
				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r0d, 12, implicit-def dead early-clobber $r1d, 12, implicit-def dead early-clobber $r2d, 12, implicit-def dead early-clobber $r3d, 12, implicit-def dead early-clobber $r4d, 12, implicit-def dead early-clobber $r5d, 12, implicit-def dead early-clobber $r6d, 12, implicit-def dead early-clobber $r7d, 12, implicit-def dead early-clobber $r8d, 12, implicit-def dead early-clobber $r9d, 12, implicit-def dead early-clobber $r10d, 12, implicit-def dead early-clobber $r11d, 12, implicit-def dead early-clobber $r12d, 12, implicit-def dead early-clobber $r13d, 12, implicit-def dead early-clobber $r14d, 12, implicit-def early-clobber $r15d

				bb.2:
				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r0d, 12, implicit-def dead early-clobber $r1d, 12, implicit-def dead early-clobber $r2d, 12, implicit-def dead early-clobber $r3d, 12, implicit-def dead early-clobber $r4d, 12, implicit-def dead early-clobber $r5d, 12, implicit-def dead early-clobber $r6d, 12, implicit-def dead early-clobber $r7d, 12, implicit-def dead early-clobber $r8d, 12, implicit-def dead early-clobber $r9d, 12, implicit-def dead early-clobber $r10d, 12, implicit-def dead early-clobber $r11d, 12, implicit-def dead early-clobber $r12d, 12, implicit-def dead early-clobber $r13d, 12, implicit-def dead early-clobber $r14d, 12, implicit-def early-clobber $r15d
				%6:gr32bit = MSRKC %6, %8, implicit-def $cc
				%6:gr32bit = LOCHIMux %6, 1, 14, 6, implicit killed $cc
				%7:gr32bit = NRK %3, %6, implicit-def dead $cc
				$r2l = COPY %7
				Return implicit $r2l

				...


				# CHECK-LABEL: fun2:
				# CHECK-LABEL: .LBB2_2:
				# CHECK: msgc %r0, 168(%r15) # 8-byte Folded Reload
				---
				name: fun2
				alignment: 16
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr64bit }
				- { id: 1, class: gr64bit }
				- { id: 2, class: addr64bit }
				- { id: 3, class: gr64bit }
				- { id: 4, class: gr64bit }
				- { id: 5, class: gr64bit }
				- { id: 6, class: gr64bit }
				- { id: 7, class: gr64bit }
				- { id: 8, class: gr64bit }
				liveins:
				- { reg: '$r2d', virtual-reg: '%2' }
				- { reg: '$r3d', virtual-reg: '%3' }
				frameInfo:
				maxAlignment: 1
				hasOpaqueSPAdjustment: true
				machineFunctionInfo: {}
				body: |
				bb.0:
				successors: %bb.1(0x30000000), %bb.2(0x50000000)
				liveins: $r2d, $r3d

				%3:gr64bit = COPY $r3d
				%2:addr64bit = COPY $r2d
				%6:gr64bit = LGHI 0
				CGHI %3, 0, implicit-def $cc
				%8:gr64bit = LGHI 0
				BRC 14, 6, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				%8:gr64bit = LG %2, 0, $noreg :: (load 8 from %ir.src)
				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r0d, 12, implicit-def dead early-clobber $r1d, 12, implicit-def dead early-clobber $r2d, 12, implicit-def dead early-clobber $r3d, 12, implicit-def dead early-clobber $r4d, 12, implicit-def dead early-clobber $r5d, 12, implicit-def dead early-clobber $r6d, 12, implicit-def dead early-clobber $r7d, 12, implicit-def dead early-clobber $r8d, 12, implicit-def dead early-clobber $r9d, 12, implicit-def dead early-clobber $r10d, 12, implicit-def dead early-clobber $r11d, 12, implicit-def dead early-clobber $r12d, 12, implicit-def dead early-clobber $r13d, 12, implicit-def dead early-clobber $r14d, 12, implicit-def early-clobber $r15d

				bb.2:
				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r0d, 12, implicit-def dead early-clobber $r1d, 12, implicit-def dead early-clobber $r2d, 12, implicit-def dead early-clobber $r3d, 12, implicit-def dead early-clobber $r4d, 12, implicit-def dead early-clobber $r5d, 12, implicit-def dead early-clobber $r6d, 12, implicit-def dead early-clobber $r7d, 12, implicit-def dead early-clobber $r8d, 12, implicit-def dead early-clobber $r9d, 12, implicit-def dead early-clobber $r10d, 12, implicit-def dead early-clobber $r11d, 12, implicit-def dead early-clobber $r12d, 12, implicit-def dead early-clobber $r13d, 12, implicit-def dead early-clobber $r14d, 12, implicit-def early-clobber $r15d
				%6:gr64bit = MSGRKC %8, %6, implicit-def $cc
				%6:gr64bit = LOCGHI %6, 1, 14, 6, implicit killed $cc
				%7:gr64bit = NGRK %3, %6, implicit-def dead $cc
				$r2d = COPY %7
				Return implicit $r2d

				...


				# CHECK-LABEL: fun3:
				# CHECK-LABEL: .LBB3_2:
				# CHECK: msgc %r0, 168(%r15) # 8-byte Folded Reload
				---
				name: fun3
				alignment: 16
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr64bit }
				- { id: 1, class: gr64bit }
				- { id: 2, class: addr64bit }
				- { id: 3, class: gr64bit }
				- { id: 4, class: gr64bit }
				- { id: 5, class: gr64bit }
				- { id: 6, class: gr64bit }
				- { id: 7, class: gr64bit }
				- { id: 8, class: gr64bit }
				liveins:
				- { reg: '$r2d', virtual-reg: '%2' }
				- { reg: '$r3d', virtual-reg: '%3' }
				frameInfo:
				maxAlignment: 1
				hasOpaqueSPAdjustment: true
				machineFunctionInfo: {}
				body: |
				bb.0:
				successors: %bb.1(0x30000000), %bb.2(0x50000000)
				liveins: $r2d, $r3d

				%3:gr64bit = COPY $r3d
				%2:addr64bit = COPY $r2d
				%6:gr64bit = LGHI 0
				CGHI %3, 0, implicit-def $cc
				%8:gr64bit = LGHI 0
				BRC 14, 6, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				%8:gr64bit = LG %2, 0, $noreg :: (load 8 from %ir.src)
				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r0d, 12, implicit-def dead early-clobber $r1d, 12, implicit-def dead early-clobber $r2d, 12, implicit-def dead early-clobber $r3d, 12, implicit-def dead early-clobber $r4d, 12, implicit-def dead early-clobber $r5d, 12, implicit-def dead early-clobber $r6d, 12, implicit-def dead early-clobber $r7d, 12, implicit-def dead early-clobber $r8d, 12, implicit-def dead early-clobber $r9d, 12, implicit-def dead early-clobber $r10d, 12, implicit-def dead early-clobber $r11d, 12, implicit-def dead early-clobber $r12d, 12, implicit-def dead early-clobber $r13d, 12, implicit-def dead early-clobber $r14d, 12, implicit-def early-clobber $r15d

				bb.2:
				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r0d, 12, implicit-def dead early-clobber $r1d, 12, implicit-def dead early-clobber $r2d, 12, implicit-def dead early-clobber $r3d, 12, implicit-def dead early-clobber $r4d, 12, implicit-def dead early-clobber $r5d, 12, implicit-def dead early-clobber $r6d, 12, implicit-def dead early-clobber $r7d, 12, implicit-def dead early-clobber $r8d, 12, implicit-def dead early-clobber $r9d, 12, implicit-def dead early-clobber $r10d, 12, implicit-def dead early-clobber $r11d, 12, implicit-def dead early-clobber $r12d, 12, implicit-def dead early-clobber $r13d, 12, implicit-def dead early-clobber $r14d, 12, implicit-def early-clobber $r15d
				%6:gr64bit = MSGRKC %6, %8, implicit-def $cc
				%6:gr64bit = LOCGHI %6, 1, 14, 6, implicit killed $cc
				%7:gr64bit = NGRK %3, %6, implicit-def dead $cc
				$r2d = COPY %7
				Return implicit $r2d

				...

This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Improve foldMemoryOperandImpl: MS(G)RKC -> MS(G)C
Closed, Public
