This is an archive of the discontinued LLVM Phabricator instance.

Differential D18838

[AArch64][CodeGen] Fix of incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr
ClosedPublic

Authored by eastig on Apr 6 2016, 11:38 AM.

Details

Reviewers

t.p.northover
jmolloy

Commits

rGfd89fe0dd363: [AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in…
rL266969: [AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in…

Summary

AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code. Details can be found here: https://llvm.org/bugs/show_bug.cgi?id=27158

Fix:

The patch first finds the condition code used after CmpInstr and before the next modification of NZCV. If different condition codes are used there, the optimization is not applied: it might be difficult to find a substitution candidate that satisfies all of them, and I think the case of multiple used condition codes does not happen often.
Then 'canInstrSubstituteCmpInstr' checks whether the instruction that defines the register used by CmpInstr, or its S variant, can produce the needed condition code. If it can, CmpInstr is removed.

A regression test is added.
A new test to check that SUBS is replaced by SUB is added.

Event Timeline

eastig updated this revision to Diff 52827. Apr 6 2016, 11:38 AM

eastig retitled this revision from to [AArch64][CodeGen] Fix of incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr.

eastig updated this object.

eastig added a reviewer: jmolloy.

eastig added a subscriber: llvm-commits.

Herald added subscribers: rengolin, aemerson. Apr 6 2016, 11:38 AM

eastig updated this object. Apr 6 2016, 11:39 AM

t.p.northover added a subscriber: t.p.northover. Apr 6 2016, 1:01 PM

t.p.northover added inline comments.

lib/Target/AArch64/AArch64InstrInfo.cpp
975–977	I think the documentation should call out the fact that it really does check whether all uses are equivalent.
981	Why only NE? At the very least NE/EQ are pretty equivalent.
985–988	I don't follow this logic. A pure ADD should never produce any CondCode, and I'm not sure why an FI operand is involved at all.
1007–1033	I think this is probably a bit too tricksy, and both less efficient and less clear than the naive loop. I also think that looking at this from the position of CondCodes is probably the wrong level. The real distinction is which bits of NZCV are used (in particular, ADDS sets C & V differently from SUBS, as I recall, so mixing tests that only use Z & N is fine). With both of those observations, I think this function reduces to something like:
	for (MI = CmpInstr; MI != instr_end; ++MI) {
	  if (MI->readsRegister(NZCV))
	    NZCVUsed |= findNZCVUsedByInstr(MI);
	  if (MI->modifiesRegister(NZCV)) {
	    // Possibly return 0b1111 if MI also reads.
	    return NZCVUsed;
	  }
	}
1034	I think we should also check (or assert, if also checked before) that NZCV doesn't escape the basic block.
1043	We definitely shouldn't be relying on specific ordering like this, even if it is alphabetical.
1068–1071	These are unconditional (i.e. work on all possible AArch64CC) aren't they? Anyway, I think (with the repurposing of findUsedCondCodeAfterCmp suggested above) that the NZCV part of this function is really:
	if (sameSForm(..., isADDS) || sameSForm(..., isSUBS))
	  return true;
	auto NZCVUsed = findUsedNZCVAfterCMP(...);
	return !NZCVUsed.C && !NZCVUsed.V;
1099–1100	This completely short-circuits the other walks you're adding doesn't it? (E.g. findUsedCondCodeAfterCmp will never find more than one use).
test/CodeGen/AArch64/arm64-regress-opt-cmp.ll
1 ↗	(On Diff #52827)	I think these tests are rather weak given how much code is changing and the complexity of the result. I'd probably suggest a MIR test for this one; they can be a bit temperamental to get started, but allow you to exercise many more edge cases quickly.
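
The loop suggested in the 1007–1033 comment can be sketched outside LLVM roughly as follows. This is a minimal model under stated assumptions, not the code from the patch: `Instr`, `findUsedNZCVAfter` and `onlyNZUsed` are hypothetical stand-ins for MachineInstr and the real helpers, and a basic block is modelled as a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which NZCV bits are read. The name mirrors the struct in the diff,
// but this is a standalone model, not LLVM code.
struct UsedNZCV {
  bool N, Z, C, V;
  UsedNZCV &operator|=(const UsedNZCV &Other) {
    N |= Other.N;
    Z |= Other.Z;
    C |= Other.C;
    V |= Other.V;
    return *this;
  }
};

// Hypothetical stand-in for a MachineInstr.
struct Instr {
  UsedNZCV Reads;  // flags this instruction reads (all false if none)
  bool WritesNZCV; // true if this instruction redefines NZCV
};

// Accumulate the flags read from index Start (the first instruction after
// the compare) up to and including the next instruction that redefines NZCV.
UsedNZCV findUsedNZCVAfter(const std::vector<Instr> &Block, std::size_t Start) {
  UsedNZCV Used{}; // value-initialized: all flags false
  for (std::size_t I = Start; I < Block.size(); ++I) {
    Used |= Block[I].Reads; // a read sees the old flags even if Instr also writes
    if (Block[I].WritesNZCV)
      break;
  }
  return Used;
}

// The substitution is only considered when neither C nor V is read.
bool onlyNZUsed(const UsedNZCV &Used) { return !Used.C && !Used.V; }
```

Reads that occur after the next redefinition of NZCV are ignored, since they observe flags produced by a different instruction.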

Hi Tim,

Thank you for the comments. They are very useful.
See my answers.

lib/Target/AArch64/AArch64InstrInfo.cpp
975–977	It seems this function name is too general and does not correspond to what the function does. It was based on tests:
	CodeGen/AArch64/arm64-arm64-dead-def-elimination-flag.ll
	CodeGen/AArch64/arm64-dead-def-frame-index.ll
	The tests compare the result of an 'alloca' with null and expect the compare operation to be removed. The instruction sequence is ADDX+SUBSX+CSINCW. I overcomplicated things.
981	You are right.
985–988	The logic is wrong.
1007–1033	Yes, you are right.
1034	Do you mean to check if flags are alive in successors of the basic block? If yes, this is checked in substituteCmpInstr.
1043	It's a pity. Maybe such functions already exist?
1068–1071	You are right.
1099–1100	It checks accesses (read, write) before Cmp and after MI. It was in the original code. I think this was done in a stricter way to keep the code simpler, because such a situation is rare, if it ever occurs.
test/CodeGen/AArch64/arm64-regress-opt-cmp.ll
1 ↗	(On Diff #52827)	I fully agree with you. I tried to write a simpler test but I failed to write it in IR. How can I write in MIR?

t.p.northover added inline comments. Apr 7 2016, 4:19 PM

lib/Target/AArch64/AArch64InstrInfo.cpp
1034	Yep, that's what I meant (I noticed it later). I think an assertion is probably still a good idea somewhere in the function (as documentation that anyone reading it shouldn't bother worrying, basically).
1043	Not as far as I'm aware, but I expect an explicit "Opcode == A || Opcode == B || ..." to be just as efficient when compiled. More verbose, but less worrying.
1099–1100	Yep, but to me it looks like your code already handles this properly, and would permit the optimization in more cases (albeit rare ones, as you say).
test/CodeGen/AArch64/arm64-regress-opt-cmp.ll
1 ↗	(On Diff #52827)	The hardest part last time I did it was making sure the pass was registered properly for it (hint: make sure initializeXYZPass gets called). Fortunately for you, it looks like this is already done properly for the peephole optimizer so you'd use "llc -run-pass=peephole-opts /path/to/file.mir" in the RUN line. Other than that, to get a basic MIR file to test you could either run "llc -stop-after=some-pass" (useful if you don't know quite how to write some construct) or just copy an existing one and modify it as needed. The format is less obvious and documented than LLVM IR so writing one from fresh is not a good idea (yet, at least).

Updated the patch according to Tim's comments.

Ping.

t.p.northover added inline comments. Apr 18 2016, 12:13 PM

lib/Target/AArch64/AArch64InstrInfo.cpp
993	Can you explicitly annotate this fall-through?
1005–1006	This shouldn't be a fallthrough if I'm reading the manual correctly (definition of `ConditionHolds` on page J1-5267 for example).
1044–1045	Do we ever check that they're comparing against the same value as MI? (Always 0 I believe).
1053–1055	What's allowed between MI and CmpInstr depends on what MI is:
	* If it's an ADDS/SUBS already then any use of flags is permitted and only definitions are wrong.
	* If it's an ADD/SUB then neither uses nor defs are allowed (of any flags). Uses imply NZCV is already live and we're going to clobber it; defs imply it would be clobbered before it reached CmpInstr.
1083–1084	I think this function should either return an NZCVUsed (to be checked by the caller) or check the forms of MI and CmpInstr itself. If they're SUBS/CMP or ADDS/CMN then the substitution is valid regardless of flags used.
test/CodeGen/AArch64/arm64-regress-opt-cmp.mir
1	This test has lots of extra cruft and only ends up checking one instance anyway. The point of MIR tests is that we can exercise more aspects of the logic than from IR alone.
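
For reference alongside the 993 and 1005–1006 comments, here is a standalone sketch of which flags each condition code reads, following the architectural condition definitions. `Cond`, `Flags` and `flagsReadBy` are hypothetical names chosen for this illustration and are not taken from the patch. Every group ends with its own `break`, so the sketch does not rely on fall-through at all.

```cpp
#include <cassert>

// Hypothetical mirror of the AArch64 condition codes (AL/NV omitted).
enum class Cond { EQ, NE, HS, LO, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE };

struct Flags {
  bool N, Z, C, V;
};

// Return the NZCV bits that evaluating the condition needs to read.
Flags flagsReadBy(Cond CC) {
  Flags F{false, false, false, false};
  switch (CC) {
  case Cond::EQ: // Z set
  case Cond::NE: // Z clear
    F.Z = true;
    break;
  case Cond::HS: // C set
  case Cond::LO: // C clear
    F.C = true;
    break;
  case Cond::MI: // N set
  case Cond::PL: // N clear
    F.N = true;
    break;
  case Cond::VS: // V set
  case Cond::VC: // V clear
    F.V = true;
    break;
  case Cond::HI: // C set and Z clear
  case Cond::LS: // C clear or Z set
    F.C = true;
    F.Z = true;
    break;
  case Cond::GE: // N equals V
  case Cond::LT: // N differs from V
    F.N = true;
    F.V = true;
    break;
  case Cond::GT: // Z clear and N equals V
  case Cond::LE: // Z set or N differs from V
    F.Z = true;
    F.N = true;
    F.V = true;
    break;
  }
  return F;
}
```

Spelling out each group is more verbose than chaining cases, but it makes the V-only behaviour of VS/VC visible at a glance.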

Hi Tim,

Thank you for the comments. See my answers.
I am updating the patch. Need some time. Quite busy on armcc.

lib/Target/AArch64/AArch64InstrInfo.cpp
993	I will add comments to each case to show which flags are used.
1005–1006	Good catch. Thank you. It's a bug. 'break' is missed.
1044–1045	I will rename 'substituteCmpInstr' to 'substituteCmpToZero' to make clear which case it handles. I don't think we need to check that MI and CmpInstr use the same value. I added these checks because 'optimizeCompareInstr' is called for instructions which are supported by 'analyzeCompare'. Currently they are ADDS, SUBS and ANDS. We cannot substitute ANDS because 'ANDS vreg, 0' always produces 0. We can substitute 'ANDS vreg, -1' but it's not a comparison with zero. Is this case worth supporting?
1053–1055	You are right. I would rewrite this as follows:
	- If MI opcode is the S form there must be no defs of flags.
	- If MI opcode is not the S form there must be neither defs of flags nor uses of flags.
1083–1084	We can only be sure for N and Z flags. SUBS/CMP or ADDS/CMN can produce different C and V flags, e.g.
	%vr2 = SUBS %vr1, 1 ; sets C to 0 when %vr1 == 0
	%cmp = SUBS %vr2, 0 ; sets C to 1
test/CodeGen/AArch64/arm64-regress-opt-cmp.mir
1	Yes, it looks cumbersome. I am reducing it as much as possible.
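
The SUBS example in the 1083–1084 reply can be checked with a small model of 32-bit subtraction flags. This is only an illustrative sketch: `NZCV` and `subsFlags` are hypothetical names, and the flag formulas assume the usual A64 convention that C is set when the subtraction does not borrow.

```cpp
#include <cassert>
#include <cstdint>

struct NZCV {
  bool N, Z, C, V;
};

// Model the flags of a 32-bit "SUBS Result, A, B".
NZCV subsFlags(std::uint32_t A, std::uint32_t B, std::uint32_t &Result) {
  Result = A - B; // wraps modulo 2^32
  NZCV F;
  F.N = (Result >> 31) != 0; // sign bit of the result
  F.Z = Result == 0;
  F.C = A >= B; // carry set means no borrow occurred
  // Signed overflow: operands have different signs and the result's sign
  // differs from the sign of A.
  F.V = (((A ^ B) & (A ^ Result)) >> 31) != 0;
  return F;
}
```

With %vr1 == 0, `subsFlags(0, 1, R)` yields R == 0xFFFFFFFF with C clear, while the following compare `subsFlags(R, 0, R2)` yields C set; N and Z are the same in both cases. Similarly, with %vr1 == 0x80000000 the first SUBS sets V while the compare against zero clears it, which is why only N and Z can be trusted after the substitution.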

t.p.northover added inline comments. Apr 19 2016, 1:10 PM

lib/Target/AArch64/AArch64InstrInfo.cpp
1044–1045	Sorry about that, I was talking nonsense. I forgot that SrcReg was the destination of the candidate instruction so comparison against 0 was automatic if flags were set. No need to support anything else.
1083–1084	Oh, of course. Same flawed reasoning from me as above I think.

Updated according to Tim's comments

Thanks for the updates. This looks good to me now.

Tim.

lib/Target/AArch64/AArch64InstrInfo.cpp
1058–1059	Nice implementation!

This revision is now accepted and ready to land. Apr 20 2016, 1:21 PM

Closed by commit rL266969: [AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in… (authored by eastig). Apr 21 2016, 1:59 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/AArch64/AArch64InstrInfo.cpp	231 lines
test/CodeGen/AArch64/arm64-regress-opt-cmp.mir	113 lines
test/CodeGen/AArch64/subs-to-sub-opt.ll	23 lines
Diff 53407

lib/Target/AArch64/AArch64InstrInfo.cpp

	/// Check if AArch64::NZCV should be alive in successors of MBB.			/// Check if AArch64::NZCV should be alive in successors of MBB.
	static bool areCFlagsAliveInSuccessors(MachineBasicBlock *MBB) {			static bool areCFlagsAliveInSuccessors(MachineBasicBlock *MBB) {
	for (auto *BB : MBB->successors())			for (auto *BB : MBB->successors())
	if (BB->isLiveIn(AArch64::NZCV))			if (BB->isLiveIn(AArch64::NZCV))
	return true;			return true;
	return false;			return false;
	}			}

	/// Substitute CmpInstr with another instruction which produces a needed			struct UsedNZCV {
	/// condition code.			bool N;
	/// Return true on success.			bool Z;
	bool AArch64InstrInfo::substituteCmpInstr(MachineInstr *CmpInstr,			bool C;
	unsigned SrcReg, const MachineRegisterInfo *MRI) const {			bool V;
	// Get the unique definition of SrcReg.			UsedNZCV(): N(false), Z(false), C(false), V(false) {}
	MachineInstr *MI = MRI->getUniqueVRegDef(SrcReg);			UsedNZCV& operator |=(const UsedNZCV& UsedFlags) {
	if (!MI)			this->N |= UsedFlags.N;
	return false;			this->Z |= UsedFlags.Z;
				this->C |= UsedFlags.C;
	const TargetRegisterInfo *TRI = &getRegisterInfo();			this->V |= UsedFlags.V;
	if (areCFlagsAccessedBetweenInstrs(MI, CmpInstr, TRI))			return *this;
	return false;			}
				};

	unsigned NewOpc = sForm(*MI);			/// Find a condition code used by the instruction.
	if (NewOpc == AArch64::INSTRUCTION_LIST_END)			/// Returns AArch64CC::Invalid if either the instruction does not use condition
	return false;			/// codes or we don't optimize CmpInstr in the presence of such instructions.
				static AArch64CC::CondCode findCondCodeUsedByInstr(const MachineInstr &Instr) {
				switch (Instr.getOpcode()) {
				default:
				return AArch64CC::Invalid;

	// Scan forward for the use of NZCV.			case AArch64::Bcc: {
	// When checking against MI: if it's a conditional code requires			int Idx = Instr.findRegisterUseOperandIdx(AArch64::NZCV);
	// checking of V bit, then this is not safe to do.			assert(Idx >= 2);
	// It is safe to remove CmpInstr if NZCV is redefined or killed.			return static_cast<AArch64CC::CondCode>(Instr.getOperand(Idx - 2).getImm());
	// If we are done with the basic block, we need to check whether NZCV is
	// live-out.
	bool IsSafe = false;
	for (MachineBasicBlock::iterator I = CmpInstr,
	E = CmpInstr->getParent()->end();
	!IsSafe && ++I != E;) {
	const MachineInstr &Instr = *I;
	for (unsigned IO = 0, EO = Instr.getNumOperands(); !IsSafe && IO != EO;
	++IO) {
	const MachineOperand &MO = Instr.getOperand(IO);
	if (MO.isRegMask() && MO.clobbersPhysReg(AArch64::NZCV)) {
	IsSafe = true;
	break;
	}
	if (!MO.isReg() || MO.getReg() != AArch64::NZCV)
	continue;
	if (MO.isDef()) {
	IsSafe = true;
	break;
	}			}

	// Decode the condition code.
	unsigned Opc = Instr.getOpcode();
	AArch64CC::CondCode CC;
	switch (Opc) {
	default:
	return false;
	case AArch64::Bcc:
	CC = (AArch64CC::CondCode)Instr.getOperand(IO - 2).getImm();
	break;
	case AArch64::CSINVWr:			case AArch64::CSINVWr:
	case AArch64::CSINVXr:			case AArch64::CSINVXr:
	case AArch64::CSINCWr:			case AArch64::CSINCWr:
	case AArch64::CSINCXr:			case AArch64::CSINCXr:
	case AArch64::CSELWr:			case AArch64::CSELWr:
	case AArch64::CSELXr:			case AArch64::CSELXr:
	case AArch64::CSNEGWr:			case AArch64::CSNEGWr:
	case AArch64::CSNEGXr:			case AArch64::CSNEGXr:
	case AArch64::FCSELSrrr:			case AArch64::FCSELSrrr:
	case AArch64::FCSELDrrr:			case AArch64::FCSELDrrr: {
	CC = (AArch64CC::CondCode)Instr.getOperand(IO - 1).getImm();			int Idx = Instr.findRegisterUseOperandIdx(AArch64::NZCV);
	break;			assert(Idx >= 1);
				return static_cast<AArch64CC::CondCode>(Instr.getOperand(Idx - 1).getImm());
				}
				}
	}			}

	// It is not safe to remove Compare instruction if Overflow(V) is used.			static UsedNZCV getUsedNZCV(AArch64CC::CondCode CC) {
				assert(CC != AArch64CC::Invalid);
				UsedNZCV NZCV;
	switch (CC) {			switch (CC) {
	default:			default:
	// NZCV can be used multiple times, we should continue.
	break;			break;

				case AArch64CC::EQ:
				case AArch64CC::NE:
				NZCV.Z = true;
				break;

				case AArch64CC::HI:
				case AArch64CC::LS:
				NZCV.Z = true;
				case AArch64CC::HS:
				case AArch64CC::LO:
				NZCV.C = true;
				break;

				case AArch64CC::MI:
				case AArch64CC::PL:
				NZCV.N = true;
				break;

	case AArch64CC::VS:			case AArch64CC::VS:
	case AArch64CC::VC:			case AArch64CC::VC:
	case AArch64CC::GE:			NZCV.V = true;
	case AArch64CC::LT:
	case AArch64CC::GT:			case AArch64CC::GT:
	case AArch64CC::LE:			case AArch64CC::LE:
				NZCV.Z = true;
				case AArch64CC::GE:
				case AArch64CC::LT:
				NZCV.N = true;
				NZCV.V = true;
				break;
				}
				return NZCV;
				}

				static bool isADDSRegImm(unsigned Opcode) {
				return Opcode == AArch64::ADDSWri || Opcode == AArch64::ADDSXri;
				}

				static bool isSUBSRegImm(unsigned Opcode) {
				return Opcode == AArch64::SUBSWri || Opcode == AArch64::SUBSXri;
				}

				// Check if CmpInstr can be substituted by MI.
				//
				//
				// CmpInstr can be substituted:
				// - CmpInstr is either ADDS or SUBS
				// - and, MI and CmpInstr are from the same MachineBB
				// - and, condition flags are not alive in successors of the CmpInstr parent
				// - and, no writes to NZCV between MI and CmpInstr
				// - and, C/V flags are not used between MI and CmpInstr
				// - and C/V flags are not used after CmpInstr
				static bool canInstrSubstituteCmpInstr(MachineInstr *MI, MachineInstr *CmpInstr,
				const TargetRegisterInfo *TRI) {
				assert(MI);
				assert(sForm(*MI) != AArch64::INSTRUCTION_LIST_END);
				assert(CmpInstr);

				const unsigned CmpOpcode = CmpInstr->getOpcode();
				if (!isADDSRegImm(CmpOpcode) && !isSUBSRegImm(CmpOpcode))
				return false;

				if (MI->getParent() != CmpInstr->getParent())
				return false;

				if (areCFlagsAliveInSuccessors(CmpInstr->getParent()))
				return false;

				UsedNZCV NZCVUsedBetweenMIAndCmp;
				for (auto I = std::next(MI->getIterator()), E = CmpInstr->getIterator();
				I != E; ++I) {
				const MachineInstr &Instr = *I;
				if (Instr.modifiesRegister(AArch64::NZCV, TRI))
				return false;
				if (!Instr.readsRegister(AArch64::NZCV, TRI))
				continue;

				AArch64CC::CondCode CC = findCondCodeUsedByInstr(Instr);
				if (CC == AArch64CC::Invalid) // Unsupported conditional instruction
				return false;

				NZCVUsedBetweenMIAndCmp |= getUsedNZCV(CC);
				}
				UsedNZCV NZCVUsedAfterCmp;
				for (auto I = std::next(CmpInstr->getIterator()), E = CmpInstr->getParent()->instr_end();
				I != E; ++I) {
				const MachineInstr &Instr = *I;
				if (Instr.readsRegister(AArch64::NZCV, TRI)) {
				AArch64CC::CondCode CC = findCondCodeUsedByInstr(Instr);
				if (CC == AArch64CC::Invalid) // Unsupported conditional instruction
	return false;			return false;
				NZCVUsedAfterCmp |= getUsedNZCV(CC);
	}			}

				if (Instr.modifiesRegister(AArch64::NZCV, TRI))
				break;
	}			}

				return !NZCVUsedBetweenMIAndCmp.C && !NZCVUsedBetweenMIAndCmp.V
				&& !NZCVUsedAfterCmp.C && !NZCVUsedAfterCmp.V;
	}			}

	// If NZCV is not killed nor re-defined, we should check whether it is			/// Substitute CmpInstr with another instruction which produces needed
	// live-out. If it is live-out, do not optimize.			/// condition flags.
	if (!IsSafe && areCFlagsAliveInSuccessors(CmpInstr->getParent()))			/// Return true on success.
				bool AArch64InstrInfo::substituteCmpInstr(MachineInstr *CmpInstr,
				unsigned SrcReg, const MachineRegisterInfo *MRI) const {
				assert(CmpInstr);
				assert(MRI);
				// Get the unique definition of SrcReg.
				MachineInstr *MI = MRI->getUniqueVRegDef(SrcReg);
				if (!MI)
				return false;

				const TargetRegisterInfo *TRI = &getRegisterInfo();

				unsigned NewOpc = sForm(*MI);
				if (NewOpc == AArch64::INSTRUCTION_LIST_END)
				return false;

				if (!canInstrSubstituteCmpInstr(MI, CmpInstr, TRI))
	return false;			return false;

	// Update the instruction to set NZCV.			// Update the instruction to set NZCV.
	MI->setDesc(get(NewOpc));			MI->setDesc(get(NewOpc));
	CmpInstr->eraseFromParent();			CmpInstr->eraseFromParent();
	bool succeeded = UpdateOperandRegClass(MI);			bool succeeded = UpdateOperandRegClass(MI);
	(void)succeeded;			(void)succeeded;
	assert(succeeded && "Some operands reg class are incompatible!");			assert(succeeded && "Some operands reg class are incompatible!");

test/CodeGen/AArch64/arm64-regress-opt-cmp.mir

This file was added.

				# RUN: llc -mtriple=aarch64-linux-gnu -run-pass peephole-opts %s 2>&1 | FileCheck %s
				# CHECK: %8 = LSLVWr {{.*}}
				# CHECK-NEXT: %9 = ANDWri {{.*}}
				# CHECK-NEXT: %10 = SUBSWri {{.*}}
				--- |
				; ModuleID = 'arm64-regress-opt-cmp.ll'
				source_filename = "arm64-regress-opt-cmp.ll"
				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64--linux-gnu"

				@d = internal global [4 x i8] c"\01\00\00\00", align 1
				@c = internal global i8 2, align 1

				declare void @a(i32)

				; Function Attrs: nounwind
				define i32 @test01() #0 {
				entry:
				%0 = load i8, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @d, i64 0, i64 3), align 1
				store i8 %0, i8* @c, align 1
				%1 = load i8, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @d, i64 0, i64 1), align 1
				%conv = zext i8 %1 to i32
				%2 = load i8, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @d, i64 0, i64 2), align 1
				%conv1 = zext i8 %2 to i32
				%shl = shl i32 %conv, %conv1
				%conv3 = and i32 %shl, 65535
				%cmp = icmp ult i32 %conv3, zext (i1 icmp eq (i8* getelementptr inbounds ([4 x i8], [4 x i8]* @d, i64 0, i64 3), i8* @c) to i32)
				br i1 %cmp, label %if.end, label %if.then

				if.then: ; preds = %entry
				call void @a(i32 0)
				br label %if.end

				if.end: ; preds = %if.then, %entry
				ret i32 0
				}

				; Function Attrs: nounwind
				declare void @llvm.stackprotector(i8*, i8**) #0

				attributes #0 = { nounwind }

				...
				---
				name: test01
				alignment: 2
				exposesReturnsTwice: false
				hasInlineAsm: false
				allVRegsAllocated: false
				isSSA: true
				tracksRegLiveness: true
				tracksSubRegLiveness: false
				registers:
				- { id: 0, class: gpr64common }
				- { id: 1, class: gpr64common }
				- { id: 2, class: gpr32 }
				- { id: 3, class: gpr64common }
				- { id: 4, class: gpr32 }
				- { id: 5, class: gpr32 }
				- { id: 6, class: gpr64all }
				- { id: 7, class: gpr32 }
				- { id: 8, class: gpr32 }
				- { id: 9, class: gpr32common }
				- { id: 10, class: gpr32 }
				- { id: 11, class: gpr32all }
				- { id: 12, class: gpr32all }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 0
				adjustsStack: false
				hasCalls: true
				maxCallFrameSize: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				body: |
				bb.0.entry:
				successors: %bb.2.if.end, %bb.1.if.then

				%0 = MOVaddr target-flags(aarch64-page) @d, target-flags(aarch64-pageoff, aarch64-nc) @d
				early-clobber %1, %2 = LDRBBpre %0, 3
				%3 = MOVaddr target-flags(aarch64-page) @c, target-flags(aarch64-pageoff, aarch64-nc) @c
				STRBBui killed %2, %3, 0 :: (store 1 into @c)
				%4 = LDURBBi %1, -2 :: (load 1 from `i8* getelementptr inbounds ([4 x i8], [4 x i8]* @d, i64 0, i64 1)`)
				%5 = LDURBBi %1, -1 :: (load 1 from `i8* getelementptr inbounds ([4 x i8], [4 x i8]* @d, i64 0, i64 2)`)
				%6 = SUBREG_TO_REG 0, killed %5, 15
				%7 = COPY %6:sub_32
				%8 = LSLVWr killed %4, killed %7
				%9 = ANDWri killed %8, 15
				%10 = SUBSWri killed %9, 0, 0, implicit-def %nzcv
				Bcc 3, %bb.2.if.end, implicit %nzcv
				B %bb.1.if.then

				bb.1.if.then:
				successors: %bb.2.if.end

				ADJCALLSTACKDOWN 0, implicit-def dead %sp, implicit %sp
				%11 = COPY %wzr
				%w0 = COPY %11
				BL @a, csr_aarch64_aapcs, implicit-def dead %lr, implicit %sp, implicit %w0, implicit-def %sp
				ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp

				bb.2.if.end:
				%12 = COPY %wzr
				%w0 = COPY %12
				RET_ReallyLR implicit %w0

				...

test/CodeGen/AArch64/subs-to-sub-opt.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -O3 -o - %s | FileCheck %s

				@a = external global i8, align 1
				@b = external global i8, align 1

				; Test that SUBS is replaced by SUB if condition flags are not used.
				define i32 @test01() nounwind {
				; CHECK: ldrb {{.*}}
				; CHECK-NEXT: ldrb {{.*}}
				; CHECK-NEXT: sub {{.*}}
				; CHECK-NEXT: cmn {{.*}}
				entry:
				%0 = load i8, i8* @a, align 1
				%conv = zext i8 %0 to i32
				%1 = load i8, i8* @b, align 1
				%conv1 = zext i8 %1 to i32
				%s = sub nsw i32 %conv1, %conv
				%cmp0 = icmp eq i32 %s, -1
				%cmp1 = sext i1 %cmp0 to i8
				store i8 %cmp1, i8* @a
				ret i32 0
				}