This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ / TII] Peephole optimization of zero-extension of i1.
Abandoned · Public

Authored by jonpa on Apr 10 2021, 3:12 AM.

Summary

This is yet another attempt to eliminate unnecessary loads of immediates in cases where the value is already known from the preceding comparison (https://reviews.llvm.org/D98905, https://reviews.llvm.org/D100039).
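The following plain C++ sketch shows why the immediate load is redundant. It is an illustration only, not SystemZ code, and the function names are made up for this example. After `chi x, 0` with an NE condition, the register is left unchanged only when x == 0, so starting from x instead of a freshly loaded 0 gives the same result. The EQ 1 case works the same way once the condition is inverted.

```cpp
#include <cstdint>

// zext(icmp ne x, 0) as emitted today: CHI x,0 ; LHI r,0 ; LOCHILH r,1.
constexpr int32_t ne0WithImmLoad(int32_t x) { return x != 0 ? 1 : 0; }
// Same value without the LHI: the "false" path is only taken when x == 0,
// so the compared register itself can serve as the 0.
constexpr int32_t ne0ReusingX(int32_t x) { return x != 0 ? 1 : x; }

// zext(icmp eq x, 1): invert the condition (load 0 when x != 1); on the
// remaining path x is known to be 1.
constexpr int32_t eq1WithImmLoad(int32_t x) { return x == 1 ? 1 : 0; }
constexpr int32_t eq1ReusingX(int32_t x) { return x != 1 ? 0 : x; }

// Compares both forms for every x in [Lo, Hi] (C++14 constexpr loop).
// Hi must be below INT32_MAX so that ++x cannot overflow.
constexpr bool reuseIsEquivalent(int32_t Lo, int32_t Hi) {
  for (int32_t x = Lo; x <= Hi; ++x)
    if (ne0WithImmLoad(x) != ne0ReusingX(x) ||
        eq1WithImmLoad(x) != eq1ReusingX(x))
      return false;
  return true;
}
```

For example, `reuseIsEquivalent(-300, 300)` holds at compile time, which is the identity the patch relies on when it drops the LHIMux/LGHI.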

SystemZ:

Added isSelect flag on LOCHIMux and LOCGHI.
Implemented analyzeSelect() and optimizeSelect() for them.

TargetInstrInfo - analyzeSelect() and optimizeSelect():

Changed the handling of optimizeSelect() so that the target can return a modified instruction, in which case it is *not* deleted.

If (as it appears to me) PeepholeOptimizer.cpp is the only user of these hooks (and no downstream out-of-tree targets have asked for this), maybe we could merge the two hooks? It seems this could more or less be just one optimizeSelect() method, as there appears to be no use for the arguments to analyzeSelect(). Or am I missing something?

If the arguments to analyzeSelect() do need to be filled in, the current patch makes sense, as it does a careful analysis in that method. Otherwise that work is wasted, since it has to be redone in optimizeSelect() (it would probably be better to just recognize the interesting opcodes in analyzeSelect() and then do the work in optimizeSelect()).
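To make the hook contract concrete, here is a toy model of the caller side. It uses hypothetical `ToyInstr`/`ToyBlock` types rather than the real `MachineInstr`/`MachineBasicBlock` API. It assumes, as in PeepholeOptimizer.cpp, that `analyzeSelect()` returns `true` when the select is not understood and that a non-null result from `optimizeSelect()` normally causes the original instruction to be erased. The `NewMI != &*MI` check models the change proposed above, so that an instruction modified in place is kept.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-ins, NOT the LLVM MachineInstr / MachineBasicBlock API.
struct ToyInstr {
  std::string Text;
  bool IsOptimizableSelect; // what analyzeSelect() would report
};
using ToyBlock = std::list<ToyInstr>;

// analyzeSelect()-style hook: returns true if the select is NOT understood
// (LLVM convention), otherwise sets Optimizable.
bool toyAnalyzeSelect(const ToyInstr &MI, bool &Optimizable) {
  Optimizable = MI.IsOptimizableSelect;
  return false;
}

// Caller side. Optimize plays the role of the target's optimizeSelect():
// it returns nullptr (no change), a new instruction that replaces MI, or MI
// itself after modifying it in place. Only a different instruction causes
// MI to be erased.
template <typename OptimizeHook>
bool toyOptimizeSelect(ToyBlock &MBB, ToyBlock::iterator MI,
                       OptimizeHook Optimize) {
  assert(MI != MBB.end() && "MI must be an instruction in MBB");
  bool Optimizable = false;
  if (toyAnalyzeSelect(*MI, Optimizable) || !Optimizable)
    return false;
  ToyInstr *NewMI = Optimize(MBB, MI);
  if (!NewMI)
    return false;
  if (NewMI != &*MI)
    MBB.erase(MI); // replaced: delete the old select
  return true;
}
```

The later revision of this patch avoids the common-code change by always building a new instruction, which corresponds to the "replaced" path in this model.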

Benchmarks:

I tried four combinations of two options: "single use of compare operand" and "find LHIMux/LGHI via MRI if not found locally" (experimental options in the patch):

master <> "multiple users" + "only cases with local LHIMux/LGHI"
lhi            :               225040               222044    -2996
lghi           :               445603               444910     -693
lr             :                61869                62276     +407
lgr            :               853946               854211     +265
...

master <> "single user" + "only cases with local LHIMux/LGHI"
lhi            :               225040               222702    -2338
lghi           :               445603               445263     -340
lgr            :               853946               854083     +137
lr             :                61869                61928      +59
...

master <> "multiple users" + "use MRI to find LHIMux/LGHI"
lhi            :               225040               220319    -4721
lghi           :               445603               443104    -2499
lr             :                61869                62808     +939
lgr            :               853946               854436     +490
...

master <> "single user" + "use MRI to find LHIMux/LGHI"
lhi            :               225040               221788    -3252
lghi           :               445603               443556    -2047
lgr            :               853946               854352     +406
lr             :                61869                61942      +73
...

Initial measurements do not show any significant performance changes either way.

Event Timeline

jonpa created this revision. Apr 10 2021, 3:12 AM

Herald added a subscriber: hiraditya. Apr 10 2021, 3:12 AM

jonpa requested review of this revision. Apr 10 2021, 3:12 AM

Herald added a project: Restricted Project. Apr 10 2021, 3:12 AM

jonpa edited the summary of this revision. Apr 10 2021, 3:12 AM

Harbormaster completed remote builds in B98096: Diff 336600. Apr 10 2021, 3:57 AM

jonpa mentioned this in D100039: [SystemZ] Isel cleanup pass: Reuse known zeros/ones after zero-extension of i1. Apr 13 2021, 2:51 AM

jonpa mentioned this in D98905: [SystemZ] Reuse known zeros/ones after zero-extension of i1.

Only check for opcodes in analyzeSelect() and avoid common-code changes by returning a new instruction from optimizeSelect().

It seemed best to use MRI to find the LHIMux / LGHI (as opposed to looking for it in the MBB). Even if the load-immediate has several users, the LOC is 2-address, so the transformation is still beneficial.

I tried a simplistic search to handle cases where the compare operand has multiple users but the LOC use should be the kill ("last user"). The kill flags did not really help much, so this is not trivial to handle. The search gave only a few additional eliminations (~5300 -> ~6000), so it is probably acceptable to just check for a single use and not worry about those extra cases. Possibly some simple check could be used rather than a full CFG search.

I found that the new extra LGR/LR instructions are closely related to physregs and calls: when the immediate is loaded into a register, the LOC-imm serves as a "natural" change of registers, whereas if the compare operand is reused, a COPY is needed when, for instance, the compare register comes from a COPY of a physreg while the LOC def needs to be live across a call. This is also not trivial to detect: I had to use slow experimental functions to determine whether the vregs (and connected vregs) crossed a call. So far this has eliminated most of the extra moves, but not all.

I also tried another idea: instead of detecting the cases to avoid (per the previous point), handle all cases but return false in SystemZRegisterInfo::shouldCoalesce() for the COPY that the TwoAddress pass creates for the LOC(G)HI. In many cases regalloc can then eliminate the COPY without the help of RegisterCoalescer, and in the remaining cases SystemZInstrInfo::copyPhysReg() could lower the COPY with an L(G)HI instead of an L(G)R.

However, this didn't turn out as well as I had hoped:

- With this approach there are many CGHI+LGHI sequences that previously became an LTGR (not all of those COPYs are coalesced on trunk to begin with, so many of them get an LGR, which SystemZElimCompare then turns into an LTGR).
- So far this hasn't really eliminated the extra COPYs (LGRs/LRs), but it may be possible to investigate further in shouldCoalesce(), using the LiveIntervals available there, and fine-tune this even more.

With the patch as it is, we trade ~5000 immediate loads for ~500 register moves. That looks good in terms of instruction count, but perhaps not if a register move is potentially more costly than an immediate load?
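The ~5000 vs. ~500 trade-off can be checked against the "single user" + "use MRI to find LHIMux/LGHI" table earlier on this page. The small compile-time calculation below copies its row values from that table and includes only the four rows shown there.

```cpp
// Counts from the "single user" + "use MRI to find LHIMux/LGHI" table.
struct Row {
  long Master;
  long Patch;
};

constexpr long delta(Row R) { return R.Patch - R.Master; }

constexpr Row Lhi{225040, 221788};
constexpr Row Lghi{445603, 443556};
constexpr Row Lgr{853946, 854352};
constexpr Row Lr{61869, 61942};

// Negative: fewer immediate loads. Positive: more register moves.
constexpr long ImmLoadChange = delta(Lhi) + delta(Lghi);
constexpr long RegMoveChange = delta(Lgr) + delta(Lr);
```

This gives 5299 fewer immediate loads against 479 extra register moves, roughly 11 to 1.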

In summary:

- Searching for the interesting multiple-use cases only handles relatively few extra cases.
- It is relatively hard to get rid of the extra L(G)Rs, as this depends on the presence of calls in the function (a global search).

jonpa removed a reviewer: qcolombet. Apr 16 2021, 5:38 AM

Harbormaster completed remote builds in B99147: Diff 338071. Apr 16 2021, 6:25 AM

I did some further experiments with a 3-address pseudo, relying on the existing regalloc hints toward 2-address opcodes and then handling the 3-address pseudos during post-RA pseudo expansion.
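The post-RA handling of the 3-address pseudo can be modelled on a toy register file. The sketch below is plain C++ with hypothetical names, not the LLVM code. It follows the shape of `expandLOCImmPseudo()` in the diff: if the allocator assigned the same register to the destination and the compare operand, the 2-address LOC(G)HI is used as-is; otherwise the known value is first materialized in the destination with an immediate load. It assumes, as the patch does, that the selected immediate is 0 or 1 and that the compare operand holds the complementary value whenever the condition is false.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file; indices stand in for physical registers.
using Regs = std::array<int64_t, 4>;

// Semantics of the 3-address pseudo: Dst = CC ? Imm : Src.
void pseudo3(Regs &R, unsigned Dst, unsigned Src, int64_t Imm, bool CC) {
  R[Dst] = CC ? Imm : R[Src];
}

// Expansion in the style of expandLOCImmPseudo(): if Dst != Src, first load
// the value Src is known to hold on the false path (1 - Imm) into Dst, then
// apply the 2-address load-on-condition to Dst. If Dst == Src, only the
// load-on-condition remains (no immediate load is needed).
void expandedPseudo3(Regs &R, unsigned Dst, unsigned Src, int64_t Imm, bool CC) {
  assert((Imm == 0 || Imm == 1) && "only zero-extended i1 selects are rewritten");
  if (Dst != Src)
    R[Dst] = Imm == 0 ? 1 : 0; // L(G)HI / LHIMux of the known value
  if (CC)
    R[Dst] = Imm; // LOC(G)HI with Dst tied to its source
}
```

When the register allocator follows the 2-address hint (Dst == Src), the expansion costs nothing extra; otherwise it degrades to the original immediate load plus LOC, which is why this variant should not be worse than trunk.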

I am not sure which variant is really the best; see the tables below. The "single user only / 3-address pseudo" variant is nice: it is less aggressive but seems to have only positive effects. On the other hand, preliminary benchmark runs suggest that the "rewrite register (no pseudo)" versions may be slightly preferable. Any real benefit would be nice, but I suspect the differences are so small that we should expect no measurable improvement and instead judge mainly by the patch and opcode differences.

Opcode diffs, SPEC 2017:

trunk <> patch, multiple users

lhi            :               226044               221311    -4733   // ~7000: highest number of eliminated immediate-loads
lghi           :               445650               443171    -2479
lr             :                61854                62741     +887   // Increase in register moves
lgr            :               853624               854133     +509
ltr            :                 6173                 6387     +214
lochilh        :                 9162                 9361     +199
cih            :                 8103                 7935     -168
ltgr           :                 9394                 9548     +154
chi            :                53571                53420     -151
lochie         :                13917                13794     -123
cghi           :                14071                13954     -117
iihf           :                 4263                 4163     -100
l              :               177406               177487      +81
...

trunk <> patch, single user

lhi            :               226044               222775    -3269  // ~5000
lghi           :               445650               443624    -2026
lgr            :               853624               854050     +426  // some new register moves near calls/argument copies.
cih            :                 8103                 7974     -129
lochilh        :                 9162                 9265     +103
stg            :               375242               375320      +78
...

trunk <> patch, multiple users, emit a 3-address pseudo

lhi            :               226044               222916    -3128  // ~4000
lg             :               986749               985733    -1016  // - Mostly due to a single file where many reloads
lghi           :               445650               444656     -994  //   turned into (slightly fewer) copies instead.
cghsi          :                32839                32395     -444
cih            :                 8103                 7721     -382
cgijle         :                 7698                 8059     +361
jle            :                36186                35849     -337
chsi           :                57211                57445     +234
lochilh        :                 9162                 9317     +155
jlh            :               164726               164574     -152
mvghi          :                57787                57638     -149
l              :               177406               177554     +148
lochie         :                13917                13779     -138
cijlh          :                78622                78745     +123
cgije          :               118679               118798     +119
je             :               336281               336165     -116
ltg            :               157803               157693     -110
lgr            :               853624               853724     +100
risbhg         :                 6313                 6413     +100
...

trunk <> patch, single user, emit a 3-address pseudo

lhi            :               226044               223291    -2753  // ~4000
lghi           :               445650               444558    -1092
lg             :               986749               986531     -218
lgr            :               853624               853467     -157
ltg            :               157803               157663     -140
cgije          :               118679               118792     +113
je             :               336281               336172     -109
...
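The approximate totals annotated on the lhi rows above (~7000, ~5000, ~4000, ~4000) are the combined lhi + lghi reductions of each variant. A compile-time check of that arithmetic, with the deltas copied from the four tables:

```cpp
// Per-variant deltas (patch - trunk) for lhi and lghi, from the tables above.
struct Variant {
  const char *Name;
  long LhiDelta;
  long LghiDelta;
};

constexpr Variant Variants[] = {
    {"multiple users", 221311 - 226044, 443171 - 445650},
    {"single user", 222775 - 226044, 443624 - 445650},
    {"multiple users, 3-address pseudo", 222916 - 226044, 444656 - 445650},
    {"single user, 3-address pseudo", 223291 - 226044, 444558 - 445650},
};

// Number of immediate loads (lhi + lghi) removed by a variant.
constexpr long immLoadsRemoved(const Variant &V) {
  return -(V.LhiDelta + V.LghiDelta);
}
```

This yields 7212, 5295, 4122 and 3845 removed immediate loads respectively, consistent with the rounded annotations.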

Benchmark measurements (quick runs):

These show only slight variations in performance. It's hard to say which one is really best, if any. I previously made full runs with just B and C, where both seemed slightly profitable and B was possibly the better one.

z14:

Overall results (integral over benchmarks):
Build:                                                                    Result       Improvements Regressions
2017_C_PeepLOCI_peep_multiple_users_false                                 0.986        0.960        1.026
2017_B_PeepLOCI                                                           0.989        0.958        1.031
2017_D_PeepLOCI_peep_pseudo                                               0.996        0.962        1.034
2017_E_PeepLOCI_peep_pseudo_peep_multiple_users_false                     1.007        0.985        1.022

Overall results (by average over benchmarks):
Build:                                                                    Average result
2017_C_PeepLOCI_peep_multiple_users_false                                 99.927 %
2017_B_PeepLOCI                                                           99.942 %
2017_D_PeepLOCI_peep_pseudo                                               99.980 %
2017_E_PeepLOCI_peep_pseudo_peep_multiple_users_false                     100.038 %

z15:

Overall results (integral over benchmarks):
Build:                                                                    Result       Improvements Regressions
2017_D_PeepLOCI_peep_pseudo                                               0.989        0.967        1.021
2017_B_PeepLOCI                                                           0.998        0.966        1.032
2017_E_PeepLOCI_peep_pseudo_peep_multiple_users_false                     0.999        0.980        1.019
2017_C_PeepLOCI_peep_multiple_users_false                                 1.007        0.979        1.028

Overall results (by average over benchmarks):
Build:                                                                    Average result
2017_D_PeepLOCI_peep_pseudo                                               99.941 %
2017_B_PeepLOCI                                                           99.990 %
2017_E_PeepLOCI_peep_pseudo_peep_multiple_users_false                     99.997 %
2017_C_PeepLOCI_peep_multiple_users_false                                 100.039 %

Harbormaster completed remote builds in B100525: Diff 339965. Apr 23 2021, 4:57 AM

Results not so promising.

Revision Contents

Path                                              Size
llvm/lib/Target/SystemZ/SystemZInstrFormats.td    2 lines
llvm/lib/Target/SystemZ/SystemZInstrInfo.h        9 lines
llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp      116 lines
llvm/lib/Target/SystemZ/SystemZInstrInfo.td       17 lines
llvm/test/CodeGen/SystemZ/setcc-05.ll             130 lines

Diff 339965

llvm/lib/Target/SystemZ/SystemZInstrFormats.td

@@ … 3,430 unchanged lines … @@ class FixedCondBinaryRIE<CondVariant V, string mnemonic, bits<16> opcode,
   let DisableEncoding = "$R1src";
   let isAsmParserOnly = V.alternate;
   let AsmVariantName = V.asmvariant;
   let M3 = V.ccmask;
 }

 multiclass CondBinaryRIEPair<string mnemonic, bits<16> opcode,
                              RegisterOperand cls, ImmOpWithPattern imm> {
-  let isCodeGenOnly = 1 in
+  let isCodeGenOnly = 1, NumOpsKey = mnemonic, NumOpsValue = "2" in
   def "" : CondBinaryRIE<mnemonic, opcode, cls, imm>;
   def Asm : AsmCondBinaryRIE<mnemonic, opcode, cls, imm>;
 }

 class BinaryRIL<string mnemonic, bits<12> opcode, SDPatternOperator operator,
                 RegisterOperand cls, ImmOpWithPattern imm>
   : InstRILa<opcode, (outs cls:$R1), (ins cls:$R1src, imm:$I2),
              mnemonic#"\t$R1, $I2",
@@ … 1,854 unchanged lines … @@

llvm/lib/Target/SystemZ/SystemZInstrInfo.h

@@ … 179 unchanged lines … @@ class SystemZInstrInfo : public SystemZGenInstrInfo {
   void expandRIPseudo(MachineInstr &MI, unsigned LowOpcode, unsigned HighOpcode,
                       bool ConvertHigh) const;
   void expandRIEPseudo(MachineInstr &MI, unsigned LowOpcode,
                        unsigned LowOpcodeK, unsigned HighOpcode) const;
   void expandRXYPseudo(MachineInstr &MI, unsigned LowOpcode,
                        unsigned HighOpcode) const;
   void expandLOCPseudo(MachineInstr &MI, unsigned LowOpcode,
                        unsigned HighOpcode) const;
+  void expandLOCImmPseudo(MachineInstr &MI, unsigned Opcode,
+                          unsigned ImmLoadOpcode) const;
   void expandZExtPseudo(MachineInstr &MI, unsigned LowOpcode,
                         unsigned Size) const;
   void expandLoadStackGuard(MachineInstr *MI) const;

   MachineInstrBuilder
   emitGRX32Move(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
                 const DebugLoc &DL, unsigned DestReg, unsigned SrcReg,
                 unsigned LowLowOpcode, unsigned Size, bool KillSrc,
@@ … 56 unchanged lines … @@ bool isProfitableToIfCvt(MachineBasicBlock &TMBB,
                            unsigned NumCyclesT, unsigned ExtraPredCyclesT,
                            MachineBasicBlock &FMBB,
                            unsigned NumCyclesF, unsigned ExtraPredCyclesF,
                            BranchProbability Probability) const override;
   bool isProfitableToDupForIfCvt(MachineBasicBlock &MBB, unsigned NumCycles,
                                  BranchProbability Probability) const override;
   bool PredicateInstruction(MachineInstr &MI,
                             ArrayRef<MachineOperand> Pred) const override;
+  bool analyzeSelect(const MachineInstr &MI,
+                     SmallVectorImpl<MachineOperand> &Cond, unsigned &TrueOp,
+                     unsigned &FalseOp, bool &Optimizable) const override;
+  MachineInstr *optimizeSelect(MachineInstr &MI,
+                               SmallPtrSetImpl<MachineInstr *> &SeenMIs,
+                               bool) const override;
   void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
                    const DebugLoc &DL, MCRegister DestReg, MCRegister SrcReg,
                    bool KillSrc) const override;
   void storeRegToStackSlot(MachineBasicBlock &MBB,
                            MachineBasicBlock::iterator MBBI,
                            Register SrcReg, bool isKill, int FrameIndex,
                            const TargetRegisterClass *RC,
                            const TargetRegisterInfo *TRI) const override;
@@ … 95 unchanged lines … @@

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp

@@ … 185 unchanged lines … @@
 // register is a low GR32 and HighOpcode if the register is a high GR32.
 void SystemZInstrInfo::expandLOCPseudo(MachineInstr &MI, unsigned LowOpcode,
                                        unsigned HighOpcode) const {
   Register Reg = MI.getOperand(0).getReg();
   unsigned Opcode = SystemZ::isHighReg(Reg) ? HighOpcode : LowOpcode;
   MI.setDesc(get(Opcode));
 }

+void SystemZInstrInfo::expandLOCImmPseudo(MachineInstr &MI, unsigned Opcode,
+                                          unsigned ImmLoadOpc) const {
+  Register DstReg = MI.getOperand(0).getReg();
+  Register SrcReg = MI.getOperand(1).getReg();
+  if (DstReg != SrcReg) {
+    MachineInstr *BuiltMI =
+        BuildMI(*MI.getParent(), MI, MI.getDebugLoc(), get(ImmLoadOpc), DstReg)
+            .addImm(MI.getOperand(2).getImm() == 0 ? 1 : 0);
+    if (BuiltMI->isPseudo())
+      expandPostRAPseudo(*BuiltMI);
+    MI.getOperand(1).setReg(DstReg);
+  }
+  MI.setDesc(get(Opcode));
+  MI.tieOperands(0, 1);
+  if (MI.isPseudo())
+    expandPostRAPseudo(MI);
+}
+
 // MI is an RR-style pseudo instruction that zero-extends the low Size bits
 // of one GRX32 into another.  Replace it with LowOpcode if both operands
 // are low registers, otherwise use RISB[LH]G.
 void SystemZInstrInfo::expandZExtPseudo(MachineInstr &MI, unsigned LowOpcode,
                                         unsigned Size) const {
   MachineInstrBuilder MIB =
       emitGRX32Move(*MI.getParent(), MI, MI.getDebugLoc(),
                     MI.getOperand(0).getReg(), MI.getOperand(1).getReg(), LowOpcode,
@@ … 560 unchanged lines … @@ MachineInstrBuilder(*MI.getParent()->getParent(), MI)
         .add(Target)
         .addRegMask(RegMask)
         .addReg(SystemZ::CC, RegState::Implicit);
     return true;
   }
   return false;
 }

+// EXPERIMENTAL
+#include "llvm/Support/CommandLine.h"
+static cl::opt<bool> MultipleCmpOpUsers("peep-multiple-users", cl::init(true));
+static cl::opt<bool> LOCIPseudos("peep-pseudo", cl::init(false));
+
+bool SystemZInstrInfo::analyzeSelect(const MachineInstr &MI,
+                                     SmallVectorImpl<MachineOperand> &Cond,
+                                     unsigned &TrueOp, unsigned &FalseOp,
+                                     bool &Optimizable) const {
+  assert(MI.getDesc().isSelect() && "MI must be a select instruction");
+  unsigned Opc = MI.getOpcode();
+  if (Opc == SystemZ::LOCHIMux || Opc == SystemZ::LOCGHI) {
+    Optimizable = true;
+    return false;
+  }
+  return true;
+}
+
+MachineInstr *SystemZInstrInfo::optimizeSelect(
+    MachineInstr &MI, SmallPtrSetImpl<MachineInstr *> &SeenMIs, bool) const {
+  MachineBasicBlock *MBB = MI.getParent();
+  const MachineRegisterInfo *MRI = &MBB->getParent()->getRegInfo();
+  unsigned Opc = MI.getOpcode();
+  assert(MI.getDesc().isSelect() && "MI must be a select instruction");
+  assert((Opc == SystemZ::LOCHIMux || Opc == SystemZ::LOCGHI) &&
+         "Unexpected opcode");
+
+  // Check that the conditionally loaded value is 1.
+  if (MI.getOperand(2).getImm() != 1)
+    return nullptr;
+
+  // Check that the incoming source value is a loaded immediate zero.
+  MachineInstr *SrcMI = MRI->getVRegDef(MI.getOperand(1).getReg());
+  unsigned SrcOpc = SrcMI->getOpcode();
+  if ((SrcOpc != SystemZ::LHIMux && SrcOpc != SystemZ::LGHI) ||
+      SrcMI->getOperand(1).getImm() != 0)
+    return nullptr;
+
+  // Scan backwards in MBB and find the CC definition.
+  MachineInstr *CmpMI = nullptr;
+  for (MachineBasicBlock::iterator II = MI.getIterator(); II != MBB->begin();)
+    if ((--II)->definesRegister(SystemZ::CC)) {
+      CmpMI = &*II;
+      break;
+    }
+  if (CmpMI == nullptr)
+    return nullptr;
+  unsigned CmpOpcode = CmpMI->getOpcode();
+  if (CmpOpcode != SystemZ::CGHI &&
+      (CmpOpcode != SystemZ::CHIMux || Opc != SystemZ::LOCHIMux))
+    return nullptr;
+
+  // Check for a reusable known 0 or 1.
+  int64_t CmpImm = CmpMI->getOperand(1).getImm();
+  int64_t CCMask = MI.getOperand(4).getImm();
+  bool NE0Case = CCMask == SystemZ::CCMASK_CMP_NE && CmpImm == 0;
+  bool EQ1Case = CCMask == SystemZ::CCMASK_CMP_EQ && CmpImm == 1;
+  if (!NE0Case && !EQ1Case)
+    return nullptr;
+
+  MachineOperand &CmpSrcMO = CmpMI->getOperand(0);
+  if (!MRI->hasOneNonDBGUse(CmpSrcMO.getReg()) && !MultipleCmpOpUsers)
+    return nullptr;
+
+  unsigned PseudoOpc = MI.getOpcode() == SystemZ::LOCGHI
+                           ? SystemZ::LOCGHI_Pseudo_3
+                           : SystemZ::LOCHIMux_Pseudo_3;
+  if (!LOCIPseudos)
+    PseudoOpc = MI.getOpcode();
+  MachineInstrBuilder MIB =
+      BuildMI(*MI.getParent(), MI, MI.getDebugLoc(), get(PseudoOpc))
+          .add(MI.getOperand(0))
+          .add(CmpSrcMO)
+          .add(MI.getOperand(2))
+          .add(MI.getOperand(3))
+          .add(MI.getOperand(4));
+  CmpSrcMO.setIsKill(false);
+  if (CmpOpcode == SystemZ::CGHI && MI.getOpcode() == SystemZ::LOCHIMux)
+    MIB->getOperand(1).setSubReg(SystemZ::subreg_l32);
+  if (EQ1Case) {
+    MIB->getOperand(2).setImm(0);
+    MIB->getOperand(4).setImm(SystemZ::CCMASK_CMP_NE);
+  }
+
+  return MIB;
+}
+
 void SystemZInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
                                    MachineBasicBlock::iterator MBBI,
                                    const DebugLoc &DL, MCRegister DestReg,
                                    MCRegister SrcReg, bool KillSrc) const {
   // Split 128-bit GPR moves into two 64-bit moves. Add implicit uses of the
   // super register in case one of the subregs is undefined.
   // This handles ADDR128 too.
   if (SystemZ::GR128BitRegClass.contains(DestReg, SrcReg)) {
@@ … 601 unchanged lines … @@ bool SystemZInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
   case SystemZ::LOCMux:
     expandLOCPseudo(MI, SystemZ::LOC, SystemZ::LOCFH);
     return true;

   case SystemZ::LOCHIMux:
     expandLOCPseudo(MI, SystemZ::LOCHI, SystemZ::LOCHHI);
     return true;

+  case SystemZ::LOCGHI_Pseudo_3:
+    expandLOCImmPseudo(MI, SystemZ::LOCGHI, SystemZ::LGHI);
+    return true;
+
+  case SystemZ::LOCHIMux_Pseudo_3:
+    expandLOCImmPseudo(MI, SystemZ::LOCHIMux, SystemZ::LHIMux);
+    return true;
+
   case SystemZ::STCMux:
     expandRXYPseudo(MI, SystemZ::STC, SystemZ::STCH);
     return true;

   case SystemZ::STHMux:
     expandRXYPseudo(MI, SystemZ::STH, SystemZ::STHH);
     return true;

@@ … 602 unchanged lines … @@
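The matching logic of optimizeSelect() above can be summarized as a pure function. This is a simplified sketch with hypothetical enums in place of the SystemZ opcodes and CC masks. It leaves out the backwards CC scan, the MRI lookup and the single-use option, and only encodes which compare/select combinations get rewritten and into what.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the opcodes and CC masks used in the patch.
enum class CmpOp { CHIMux, CGHI, Other };
enum class CCMask { EQ, NE };

// Result of the match: whether to rewrite, and the new immediate and mask
// of the LOC(G)HI that then uses the compare operand as its source.
struct SelectRewrite {
  bool Matched;
  int64_t NewImm;
  CCMask NewMask;
};

// Is64BitSelect: true for LOCGHI, false for LOCHIMux.
// SelImm: the conditionally loaded immediate; FalseImm: the immediate of the
// LHIMux/LGHI that defines the select's source operand.
constexpr SelectRewrite matchSelect(CmpOp Cmp, int64_t CmpImm, CCMask Mask,
                                    bool Is64BitSelect, int64_t SelImm,
                                    int64_t FalseImm) {
  const SelectRewrite NoMatch{false, 0, CCMask::NE};
  // The select must load 1 on top of a loaded 0 (a zero-extended i1).
  if (SelImm != 1 || FalseImm != 0)
    return NoMatch;
  // CGHI works with either select; a 32-bit CHIMux only with LOCHIMux.
  if (Cmp != CmpOp::CGHI && (Cmp != CmpOp::CHIMux || Is64BitSelect))
    return NoMatch;
  // "x != 0 ? 1 : 0": the false path already has x == 0, keep the select.
  if (Mask == CCMask::NE && CmpImm == 0)
    return SelectRewrite{true, 1, CCMask::NE};
  // "x == 1 ? 1 : 0": invert to "x != 1 ? 0 : x", reusing the known 1.
  if (Mask == CCMask::EQ && CmpImm == 1)
    return SelectRewrite{true, 0, CCMask::NE};
  return NoMatch;
}
```

Mapped onto setcc-05.ll, `fun0` (CHI against 0, NE) and `fun6` (CHI against 1, EQ, rewritten to `lochilh %r2, 0`) match, while `fun2` (EQ 0) and `fun7` (NE 1) keep their immediate load.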

llvm/lib/Target/SystemZ/SystemZInstrInfo.td

@@ … 545 unchanged lines … @@ let Predicates = [FeatureMiscellaneousExtensions3], Uses = [CC] in {
   }
 }

 let Predicates = [FeatureLoadStoreOnCond2], Uses = [CC] in {
   // Load immediate on condition. Matched via DAG pattern and created
   // by the PeepholeOptimizer via FoldImmediate.

   // Expands to LOCHI or LOCHHI, depending on the choice of register.
+  let isSelect = 1, NumOpsKey = "lochimux", NumOpsValue = "2" in
   def LOCHIMux : CondBinaryRIEPseudo<GRX32, imm32sx16>;
   defm LOCHHI : CondBinaryRIEPair<"lochhi", 0xEC4E, GRH32, imm32sx16>;
   defm LOCHI : CondBinaryRIEPair<"lochi", 0xEC42, GR32, imm32sx16>;
+  let isSelect = 1 in
   defm LOCGHI : CondBinaryRIEPair<"locghi", 0xEC46, GR64, imm64sx16>;

+  // 3-address pseudos inserted by optimizeSelect() in certain cases.
+  // TODO: Merge these definitions with the class for LOCHIMux...
+  let CCMaskLast = 1, NumOpsValue = "3" in {
+    let NumOpsKey = "locghi" in
+    def LOCGHI_Pseudo_3 : Pseudo<(outs GR64:$R1),
+                          (ins GR64:$R2, imm64sx16:$I3, cond4:$valid, cond4:$M4), []>;
+    let NumOpsKey = "lochimux" in
+    def LOCHIMux_Pseudo_3 : Pseudo<(outs GR32:$R1),
+                            (ins GR32:$R2, imm64sx16:$I3, cond4:$valid, cond4:$M4), []>;
+  }
+
   // Move register on condition. Matched via DAG pattern and
   // created by early if-conversion.
   let isCommutable = 1 in {
     // Expands to LOCR or LOCFHR or a branch-and-move sequence,
     // depending on the choice of registers.
     def LOCRMux : CondBinaryRRFPseudo<"MUXlocr", GRX32, GRX32>;
     defm LOCFHR : CondBinaryRRFPair<"locfhr", 0xB9E0, GRH32, GRH32>;
   }
@@ … 1,821 unchanged lines … @@

llvm/test/CodeGen/SystemZ/setcc-05.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; Test SETCC for an integer comparison against 0. The 0 does not need to be
; loaded if the condition is NE.
;
; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s

; ICMP NE 0: no need to load 0.
define i32 @fun0(i8 zeroext %b) {
; CHECK-LABEL: fun0:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: chi %r2, 0
; CHECK-NEXT: lochilh %r2, 1
; CHECK-NEXT: # kill: def $r2l killed $r2l killed $r2d
; CHECK-NEXT: br %r14
entry:
  %cc = icmp ne i8 %b, 0
  %conv = zext i1 %cc to i32
  ret i32 %conv
}

; ICMP EQ 0: need to load 0.
define i32 @fun2(i8 zeroext %b) {
; CHECK-LABEL: fun2:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: chi %r2, 0
; CHECK-NEXT: lhi %r2, 0
; CHECK-NEXT: lochie %r2, 1
; CHECK-NEXT: br %r14
entry:
  %cc = icmp eq i8 %b, 0
  %conv = zext i1 %cc to i32
  ret i32 %conv
}

; ICMP NE 0: The whole register is not checked, so need to load 0.
define i32 @fun3(i32 %b) {
; CHECK-LABEL: fun3:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: tmll %r2, 255
; CHECK-NEXT: lhi %r2, 0
; CHECK-NEXT: lochine %r2, 1
; CHECK-NEXT: br %r14
entry:
  %t = trunc i32 %b to i8
  %cc = icmp ne i8 %t, 0
  %conv = zext i1 %cc to i32
  ret i32 %conv
}

; ICMP NE 0: i64 with i32 use
define i32 @fun4(i64 %b) {
; CHECK-LABEL: fun4:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: cghi %r2, 0
; CHECK-NEXT: lochilh %r2, 1
; CHECK-NEXT: # kill: def $r2l killed $r2l killed $r2d
; CHECK-NEXT: br %r14
entry:
  %cc = icmp ne i64 %b, 0
  %conv = zext i1 %cc to i32
  ret i32 %conv
}

; ICMP NE 0: i64 with i64 use.
define i64 @fun5(i64 %b) {
; CHECK-LABEL: fun5:
; CHECK: # %bb.0: # %bb
; CHECK-NEXT: cghi %r2, 0
; CHECK-NEXT: locghilh %r2, 1
; CHECK-NEXT: br %r14
bb:
  %cc = icmp ne i64 %b, 0
  %conv = zext i1 %cc to i64
  ret i64 %conv
}

; ICMP EQ 1: no need to load 1.
define i32 @fun6(i8 zeroext %b) {
; CHECK-LABEL: fun6:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: chi %r2, 1
; CHECK-NEXT: lochilh %r2, 0
; CHECK-NEXT: # kill: def $r2l killed $r2l killed $r2d
; CHECK-NEXT: br %r14
entry:
  %cc = icmp eq i8 %b, 1
  %conv = zext i1 %cc to i32
  ret i32 %conv
}

; ICMP NE 1: need to load 1.
define i32 @fun7(i8 zeroext %b) {
; CHECK-LABEL: fun7:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: chi %r2, 1
; CHECK-NEXT: lhi %r2, 0
; CHECK-NEXT: lochilh %r2, 1
; CHECK-NEXT: br %r14
entry:
  %cc = icmp ne i8 %b, 1
  %conv = zext i1 %cc to i32
  ret i32 %conv
}

; ICMP EQ 1: i64 with i32 use
define i32 @fun8(i64 %b) {
; CHECK-LABEL: fun8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: cghi %r2, 1
; CHECK-NEXT: lochilh %r2, 0
; CHECK-NEXT: # kill: def $r2l killed $r2l killed $r2d
; CHECK-NEXT: br %r14
entry:
  %cc = icmp eq i64 %b, 1
  %conv = zext i1 %cc to i32
  ret i32 %conv
}

; ICMP EQ 1: i64 with i64 use
define i64 @fun9(i64 %b) {
; CHECK-LABEL: fun9:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: cghi %r2, 1
; CHECK-NEXT: locghilh %r2, 0
; CHECK-NEXT: br %r14
entry:
  %cc = icmp eq i64 %b, 1
  %conv = zext i1 %cc to i64
  ret i64 %conv
}