This is an archive of the discontinued LLVM Phabricator instance.

Differential D38378

[ARM] Optimize {s,u}{add,sub}.with.overflow.
ClosedPublic

Authored by jgalenson on Sep 28 2017, 12:39 PM.


Details

Reviewers

efriedma
t.p.northover
kristof.beyls
rovka
rengolin
rogfer01

Commits

rGfe7fa40869b5: [ARM] Optimize {s,u}{add,sub}.with.overflow.
rL322737: [ARM] Optimize {s,u}{add,sub}.with.overflow.

Summary

The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This patch allows it to replace a "cmp" with an "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).
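
To make the flag reasoning concrete, here is a minimal sketch (not code from the patch) that models the C and V flags on 8-bit values. It checks exhaustively that an ADDS producing r = a + b leaves the same V flag as CMP r, a and the inverted C flag, which is consistent with the HS/LO swap and the unchanged VS/VC in getCmpToAddCondition. The `Flags` struct and all helper names are hypothetical, and the model assumes the usual ARM convention that C after a subtraction means "no borrow".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two ARM flags that matter here.
struct Flags {
  bool C; // carry (for a subtraction: "no borrow")
  bool V; // signed overflow
};

// Flags as an ADDS would set them for result = a + b.
Flags flagsOfAdds(uint8_t a, uint8_t b, uint8_t &result) {
  unsigned wide = unsigned(a) + unsigned(b);
  int swide = int(int8_t(a)) + int(int8_t(b));
  result = uint8_t(wide);
  return {wide > 0xFF, swide < -128 || swide > 127};
}

// Flags as a CMP x, y would set them (it computes x - y).
Flags flagsOfCmp(uint8_t x, uint8_t y) {
  int sdiff = int(int8_t(x)) - int(int8_t(y));
  return {x >= y, sdiff < -128 || sdiff > 127};
}

// Returns true if, for every 8-bit a and b, ADDS(a, b) yields the inverted
// C flag and the same V flag as CMP(a + b, a).
bool addsCanReplaceCmp() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t sum = 0;
      Flags add = flagsOfAdds(uint8_t(a), uint8_t(b), sum);
      Flags cmp = flagsOfCmp(sum, uint8_t(a));
      if (add.C == cmp.C) // C must be inverted: HS <-> LO
        return false;
      if (add.V != cmp.V) // V must be identical: VS/VC unchanged
        return false;
    }
  }
  return true;
}
```

Conditions that depend on N or Z (EQ, NE, GE, and so on) are not covered by this model, which matches the patch returning ARMCC::AL for them and giving up on the transformation.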

Diff Detail

Repository: rL LLVM

Event Timeline

jgalenson created this revision. Sep 28 2017, 12:39 PM

Herald added subscribers: kristof.beyls, javed.absar, aemerson. Sep 28 2017, 12:39 PM

jgalenson added a parent revision: D35635: [ARM] Optimize {s,u}{add,sub}.with.overflow. Sep 28 2017, 12:40 PM

Hi Joel,

Is my impression correct that this should improve code size and potentially performance in general, not just for the overflow intrinsics?
If so, did you manage to collect any numbers of the impact of this patch on code size and/or performance on some benchmark?

One other high-level thought is that it isn't just sub and add instructions that can set flags and potentially remove the need for a compare instruction.
I'm afraid I don't have a feel for how often a compare can be eliminated by using subs/adds or other flag-setting instructions.
I wonder if the changes in this patch could be generalized to enable flag-setting instructions other than adds/subs to remove compare instructions too?
Not that I think that necessarily needs to be done in this patch - just wondering if this patch is an incremental step towards enabling even more removal of compare instructions.

Thanks!

Kristof

Hi Kristof,

Yes, this in theory could improve performance and code size in general, not just with the overflow intrinsics. I hadn't thought about that, though, so I hadn't looked into it before. I just ran a couple smallish benchmarks and saw no change (I did, however, find a bug in my patch, so it was useful!), so I doubt it makes a large difference, but further testing would be good. Unfortunately, I'm busy with something else at the moment, and I won't have too much time on this for at least a few days.

As for generalizing this patch, I believe you're right, although I don't actually know ARM very well. But, for example, I would think we could use an ANDS to replace a CMP EQ. My guess is that these would be relatively uncommon, but I could certainly be wrong, so it probably would be useful to look into this more.

javed.absar added inline comments. Sep 30 2017, 12:31 AM

lib/Target/ARM/ARMBaseInstrInfo.cpp
2569 ↗	(On Diff #117225)	Did you mean t2ADDrr in the second case? You have two conditions checking for ARM::ADDrr. Same with ARM::ADDri.

jgalenson added inline comments. Sep 30 2017, 9:06 AM

lib/Target/ARM/ARMBaseInstrInfo.cpp
2569 ↗	(On Diff #117225)	I'm not sure what you mean. This seems reasonable to me. I'm using the result and first operand of the add to match the two operands of the compare, so I don't care what the last operand of the add is. I thus don't care whether it's a register or an immediate, and it can also be either ARM or Thumb. So don't all four of these cases work?

Ping.

jgalenson retitled this revision from Optimize {s,u}{add,sub}.with.overflow on ARM. to [ARM] Optimize {s,u}{add,sub}.with.overflow.. Nov 9 2017, 9:31 AM

javed.absar added inline comments. Nov 9 2017, 12:39 PM

lib/Target/ARM/ARMBaseInstrInfo.cpp
2569 ↗	(On Diff #117225)	OK, my bad. I missed out the "t2" on my screen. All good.

javed.absar added reviewers: rovka, rengolin. Nov 9 2017, 12:39 PM

rogfer01 added a subscriber: rogfer01. Nov 9 2017, 11:49 PM

The change itself looks good, I only have one question to clarify a part of the code, but otherwise seems fine.

I agree with Kristof that this could span across more than just overflow, but it can be done in later patches.

The benefits are clear, even from the few tests that changed, but it would be good to have a simple test-suite run in benchmark mode, just to make sure we're not creating any new pathological case on random programs.

cheers,
--renato

lib/Target/ARM/ARMBaseInstrInfo.cpp
2731 ↗	(On Diff #117225)	Can you elaborate on this change?

The benefits are clear, even from the few tests that changed, but it would be good to have a simple test-suite run in benchmark mode, just to make sure we're not creating any new pathological case on random programs.

Are there instructions for how to run lnt remotely? I can run it locally, but since I'm on an x86 machine that won't help test this.

lib/Target/ARM/ARMBaseInstrInfo.cpp
2731 ↗	(On Diff #117225)	Sure. In at least one of my test cases (the first one in su-addsub-overflow.ll), I have a basic block containing the following:
    %vreg0 = ADDrr %vreg2, %vreg3
    CMPrr %vreg0, %vreg2
Here, the compare is CmpInstr and the ADDrr is MI. But MI gets set to nullptr a little above this because SrcReg2 != 0. Without this line here, MI and SubAdd would both be nullptr, and so we'd do nothing. But I want to allow what was MI to be SubAdd, since if it's a sub or add it can potentially replace the compare. E was initialized to be MI, and the loop just below looks for SubAdd by walking backwards up to (but not including) E, so I decrement E to allow it to consider MI. Does that clear it up? Should I add some of that to this comment?
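
The effect of the `--E;` can be illustrated with a small model. This is only a sketch under simplifying assumptions: a basic block is a vector of opcode names, positions are plain indices rather than MachineBasicBlock iterators, and `findAddBackwards` is a hypothetical helper, not a function in the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block: only opcode names are kept.
using Block = std::vector<std::string>;

// Walk backwards from the instruction before the compare at index `cmp`,
// stopping before index `stop` (exclusive bound, like E in the patch).
// Returns the index of the first "ADDrr" seen, or -1 if none is reached.
int findAddBackwards(const Block &bb, int cmp, int stop) {
  for (int i = cmp - 1; i != stop; --i) {
    if (bb[i] == "ADDrr")
      return i;
    if (i == 0) // reached the start of the block
      break;
  }
  return -1;
}
```

With the block {"LDR", "ADDrr", "CMPrr"}, the compare is at index 2 and the original MI (the ADDrr) is at index 1. Using stop = 1 models the code before the change: the loop body never runs and the add is not found. Using stop = 0 models the decremented E, so the add at index 1 is examined and returned.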

Just rebasing this on top of the latest changes.

rogfer01 added inline comments. Nov 14 2017, 12:08 AM

lib/Target/ARM/ARMBaseInstrInfo.cpp
2741–2745 ↗	(On Diff #122742)	It is not clear to me why you had to swap this check with the one below? Is it related to the `--E;` in line 2731?
2677–2679 ↗	(On Diff #117225)	Maybe I'm reading it wrong: should this comment say something like `the other is a SUB or ADD instruction`?

jgalenson added inline comments. Nov 14 2017, 9:35 AM

lib/Target/ARM/ARMBaseInstrInfo.cpp
2741–2745 ↗	(On Diff #122742)	Not really. I believe the reason was that if the instruction I modifies CPSR, nothing before it can replace the compare, but it still can itself. Swapping these two checks allowed that case. However, when I undo this part of the change locally, my tests still pass. I'm pretty sure I needed this before, but I guess something changed and I don't need it any longer. So I can revert this piece if you'd like, although I still think the new way is better in theory.
2677–2679 ↗	(On Diff #117225)	No, I'd missed that. Thanks for pointing it out.

Update comment that I'd missed.

Are there any other comments?

rogfer01 added inline comments. Nov 29 2017, 10:14 AM

lib/Target/ARM/ARMBaseInstrInfo.cpp
2900 ↗	(On Diff #122860)	Sorry, I fail to see where you are using this new function.
2741–2745 ↗	(On Diff #122742)	Thanks for the clarification. I'd leave it as it was unless it is easy to add a test that proves this is certainly better in some cases. Maybe there was some interaction with my change in D35192 (while it was in trunk).

jgalenson added inline comments. Nov 29 2017, 10:19 AM

lib/Target/ARM/ARMBaseInstrInfo.cpp
2900 ↗	(On Diff #122860)	I'm not calling it anywhere, but it's overriding a virtual method in TargetInstrInfo.h that controls whether or not MachineSink should sink the given instruction.
2741–2745 ↗	(On Diff #122742)	Sure, I'll revert this part of the change.

I've now reverted the part of the change mentioned in the last couple comments.

jgalenson added a child revision: D40922: [ARM] Optimize {s|u}mul.with.overflow.. Dec 6 2017, 2:54 PM

This looks sensible to me. Please extend the comment on the --E; above, at line 2732. If @efriedma or @rengolin do not have further comments, I believe this is ready to land.

lib/Target/ARM/ARMBaseInstrInfo.cpp
2900 ↗	(On Diff #122860)	Ah! Right. I hadn't noticed the `override` keyword above. Sorry for the fuss.
test/CodeGen/ARM/su-addsub-overflow.ll
124 ↗	(On Diff #124784)	I understand this is a "sanity check"-like test and your change does not include new code to avoid this problem (all the machinery was already there). Is this correct?

Thanks for the comments. I'll update the patch in a minute.

test/CodeGen/ARM/su-addsub-overflow.ll
124 ↗	(On Diff #124784)	Yes, mostly. I added this test in my second version of this patch after seeing a bug in the first version. I had initially made an incorrect change, and I reverted that part of it and added this test. (You can diff my first two versions of this patch against each other to see the details.) So this patch doesn't contain new code to avoid this; it just makes sure I don't make that mistake again.

Update the comment above the --E.

Now that D35192 has re-landed I need to update this patch.

Note that I needed to bring back the swapped conditions that I had before to get the tests to pass. It looks like that was in fact related to D35192.

Are there any more comments, or does this look good?

lib/Target/ARM/ARMBaseInstrInfo.cpp
2914 ↗	(On Diff #126605)	I initially decided to look only at the preceding instruction to avoid the potential quadratic behavior of scanning the whole block. I just now gathered some data by compiling a few random files with this and a version that walks as far back as it can to look for a compare. In a few cases that allows us to optimize slightly more compares, but not many, and in most cases there's no difference. Thus given the extra compile time overhead and the chance that it will prevent us from sinking instructions we want to sink, keeping this heuristic for now seems good to me. Does that seem reasonable?
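
The trade-off described in this comment can be sketched with a toy model. Everything here is hypothetical and simplified: `Inst`, `addMakesCmpRedundant`, and `shouldSinkModel` are illustrative names, registers are plain integers, and only the ADD case of the redundancy check is modelled. The point is that the query inspects a single neighbouring instruction, so it costs constant time per call instead of a scan over the block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction: an opcode plus its registers.
enum class Op { Add, Cmp, Other };
struct Inst {
  Op op;
  int def; // register written (-1 if none)
  int lhs; // first source register
  int rhs; // second source register
};

// Toy version of the check: ADD def, lhs, X makes CMP def, lhs redundant.
bool addMakesCmpRedundant(const Inst &add, const Inst &cmp) {
  return add.op == Op::Add && cmp.op == Op::Cmp &&
         cmp.lhs == add.def && cmp.rhs == add.lhs;
}

// Returns false (do not sink) only when the instruction immediately after
// bb[idx] is a compare that bb[idx] could replace.
bool shouldSinkModel(const std::vector<Inst> &bb, std::size_t idx) {
  std::size_t next = idx + 1;
  if (next < bb.size() && addMakesCmpRedundant(bb[idx], bb[next]))
    return false;
  return true;
}
```

In this model an add directly followed by a matching compare is kept in place, while the same pair separated by an unrelated instruction is still allowed to sink. That is the missed case the comment accepts in exchange for lower compile-time cost.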

Friendly new year's ping. Are there any other comments?

LGTM. Wait a couple of days before committing just in case @efriedma has further comments.

Thanks!

This revision is now accepted and ready to land. Jan 12 2018, 3:50 AM

Closed by commit rL322737: [ARM] Optimize {s,u}{add,sub}.with.overflow. (authored by jgalenson). Jan 17 2018, 11:21 AM

This revision was automatically updated to reflect the committed changes.

Hi,

The patch caused regressions in LNT benchmarks on Cortex-A9:

SingleSource/Benchmarks/McGill/chomp: 12.79%
MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow: 9.00%

There are also ~10% regressions in our benchmarks on Cortex-M4 and Cortex-M33 due to the patch.

I'm analysing them.

Thanks,
Evgeny Astigeevich

In SingleSource/Benchmarks/McGill/chomp the patch causes generation of subs+cmp_with_0 instead of only a subs.

Thanks for the report. I think I can reproduce it on McGill. I'm looking into it now.

Thank you, Joel. The function to look at is make_list.

The example I found is in equal_data, which I assume is being inlined in make_list.

I have a (one-line) fix for that, and I'm trying to run the test suite to see what else it improves. However, I'm not seeing any changes in fourinarow, and I don't immediately see any obvious bad code in it. Do you happen to know where the regression in it was?

In D38378#980789, @jgalenson wrote:

The example I found is in equal_data, which I assume is being inlined in make_list.

I have a (one-line) fix for that, and I'm trying to run the test suite to see what else it improves. However, I'm not seeing any changes in fourinarow, and I don't immediately see any obvious bad code in it. Do you happen to know where the regression in it was?

In function 'value', in the following blocks:

Before

it eq
orrseq.w r1, r3, r4
bne.n b936 <value+0x1c66>

After

itt eq
orreq.w r1, r3, r4
cmpeq r1, #0
bne.n b946 <value+0x1c76>

Before

ittt eq
ldreq.w r0, [sp, #1712] ; 0x6b0
ldreq.w r1, [sp, #1708] ; 0x6ac
orrseq.w r1, r1, r0
bne.n c7fa <value+0x2b2a>

After

ittt eq
ldreq.w r0, [sp, #1700] ; 0x6a4
orreq r0, r6
cmpeq r0, #0
bne.n c810 <value+0x2b40>

What command-line arguments are you using?

In D38378#980879, @jgalenson wrote:

What command-line arguments are you using?

-O3 -DNDEBUG -mcpu=cortex-a9 -mthumb -fomit-frame-pointer

In D38378#980891, @eastig wrote:

In D38378#980879, @jgalenson wrote:

What command-line arguments are you using?

-O3 -DNDEBUG -mcpu=cortex-a9 -mthumb -fomit-frame-pointer

Full command:

/home/llvm-test/aarch32-rL322735-rC322729-rX322703/bin/clang  -DNDEBUG -I/work/llvm-test-suite/MultiSource/Benchmarks/FreeBench/fourinarow -I/home/llvm-test/SANDBOX/test-2018-01-18_21-02-41/MultiSource/Benchmarks/FreeBench/fourinarow  -O3 -DNDEBUG -mcpu=cortex-a9 -fomit-frame-pointer -mthumb   -O3 -DNDEBUG   -w -Werror=date-time -DVERSION=\"1.00\" -DCOMPDATE=\"today\" -DCFLAGS=\"\" -DHOSTNAME=\"thishost\" -o CMakeFiles/fourinarow.dir/fourinarow.c.o   -c /work/llvm-test-suite/MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.c

Thanks!

I believe D42263 should fix that as well.

Revision Contents

Path	Size
llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.h	2 lines
llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp	97 lines
llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll	15 lines
llvm/trunk/test/CodeGen/ARM/su-addsub-overflow.ll	63 lines

Diff 130229

llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.h

@@ (209 lines hidden) @@ public:
   void loadRegFromStackSlot(MachineBasicBlock &MBB,
                             MachineBasicBlock::iterator MBBI,
                             unsigned DestReg, int FrameIndex,
                             const TargetRegisterClass *RC,
                             const TargetRegisterInfo *TRI) const override;

   bool expandPostRAPseudo(MachineInstr &MI) const override;

+  bool shouldSink(const MachineInstr &MI) const override;
+
   void reMaterialize(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
                      unsigned DestReg, unsigned SubIdx,
                      const MachineInstr &Orig,
                      const TargetRegisterInfo &TRI) const override;

   MachineInstr &
   duplicate(MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertBefore,
             const MachineInstr &Orig) const override;
@@ (333 lines hidden) @@

llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp

@@ (2,528 lines hidden) @@ inline static ARMCC::CondCodes getSwappedCondition(ARMCC::CondCodes CC) {
   case ARMCC::LS: return ARMCC::HS;
   case ARMCC::GE: return ARMCC::LE;
   case ARMCC::LT: return ARMCC::GT;
   case ARMCC::GT: return ARMCC::LT;
   case ARMCC::LE: return ARMCC::GE;
   }
 }

+/// getCmpToAddCondition - assume the flags are set by CMP(a,b), return
+/// the condition code if we modify the instructions such that flags are
+/// set by ADD(a,b,X).
+inline static ARMCC::CondCodes getCmpToAddCondition(ARMCC::CondCodes CC) {
+  switch (CC) {
+  default: return ARMCC::AL;
+  case ARMCC::HS: return ARMCC::LO;
+  case ARMCC::LO: return ARMCC::HS;
+  case ARMCC::VS: return ARMCC::VS;
+  case ARMCC::VC: return ARMCC::VC;
+  }
+}
+
 /// isRedundantFlagInstr - check whether the first instruction, whose only
 /// purpose is to update flags, can be made redundant.
 /// CMPrr can be made redundant by SUBrr if the operands are the same.
 /// CMPri can be made redundant by SUBri if the operands are the same.
+/// CMPrr(r0, r1) can be made redundant by ADDr[ri](r0, r1, X).
 /// This function can be extended later on.
-inline static bool isRedundantFlagInstr(MachineInstr *CmpI, unsigned SrcReg,
-                                        unsigned SrcReg2, int ImmValue,
-                                        MachineInstr *OI) {
+inline static bool isRedundantFlagInstr(const MachineInstr *CmpI,
+                                        unsigned SrcReg, unsigned SrcReg2,
+                                        int ImmValue, const MachineInstr *OI) {
   if ((CmpI->getOpcode() == ARM::CMPrr ||
        CmpI->getOpcode() == ARM::t2CMPrr) &&
       (OI->getOpcode() == ARM::SUBrr ||
        OI->getOpcode() == ARM::t2SUBrr) &&
       ((OI->getOperand(1).getReg() == SrcReg &&
         OI->getOperand(2).getReg() == SrcReg2) ||
        (OI->getOperand(1).getReg() == SrcReg2 &&
         OI->getOperand(2).getReg() == SrcReg)))
     return true;

   if ((CmpI->getOpcode() == ARM::CMPri ||
        CmpI->getOpcode() == ARM::t2CMPri) &&
       (OI->getOpcode() == ARM::SUBri ||
        OI->getOpcode() == ARM::t2SUBri) &&
       OI->getOperand(1).getReg() == SrcReg &&
       OI->getOperand(2).getImm() == ImmValue)
     return true;

+  if ((CmpI->getOpcode() == ARM::CMPrr || CmpI->getOpcode() == ARM::t2CMPrr) &&
+      (OI->getOpcode() == ARM::ADDrr || OI->getOpcode() == ARM::t2ADDrr ||
+       OI->getOpcode() == ARM::ADDri || OI->getOpcode() == ARM::t2ADDri) &&
+      OI->getOperand(0).isReg() && OI->getOperand(1).isReg() &&
+      OI->getOperand(0).getReg() == SrcReg &&
+      OI->getOperand(1).getReg() == SrcReg2)
+    return true;
   return false;
 }

 static bool isOptimizeCompareCandidate(MachineInstr *MI, bool &IsThumb1) {
   switch (MI->getOpcode()) {
   default: return false;
   case ARM::tLSLri:
   case ARM::tLSRri:
@@ (86 lines hidden) @@ bool ARMBaseInstrInfo::optimizeCompareInstr(
   // Get ready to iterate backward from CmpInstr.
   MachineBasicBlock::iterator I = CmpInstr, E = MI,
                               B = CmpInstr.getParent()->begin();

   // Early exit if CmpInstr is at the beginning of the BB.
   if (I == B) return false;

   // There are two possible candidates which can be changed to set CPSR:
-  // One is MI, the other is a SUB instruction.
-  // For CMPrr(r1,r2), we are looking for SUB(r1,r2) or SUB(r2,r1).
+  // One is MI, the other is a SUB or ADD instruction.
+  // For CMPrr(r1,r2), we are looking for SUB(r1,r2), SUB(r2,r1), or
+  // ADDr[ri](r1, r2, X).
   // For CMPri(r1, CmpValue), we are looking for SUBri(r1, CmpValue).
-  MachineInstr *Sub = nullptr;
+  MachineInstr *SubAdd = nullptr;
   if (SrcReg2 != 0)
     // MI is not a candidate for CMPrr.
     MI = nullptr;
   else if (MI->getParent() != CmpInstr.getParent() || CmpValue != 0) {
     // Conservatively refuse to convert an instruction which isn't in the same
     // BB as the comparison.
-    // For CMPri w/ CmpValue != 0, a Sub may still be a candidate.
+    // For CMPri w/ CmpValue != 0, a SubAdd may still be a candidate.
     // Thus we cannot return here.
     if (CmpInstr.getOpcode() == ARM::CMPri ||
         CmpInstr.getOpcode() == ARM::t2CMPri)
       MI = nullptr;
     else
       return false;
   }

@@ (25 lines hidden) @@ if (MI && IsThumb1) {
     }
     if (HasStmts && CanReorder) {
       MI = MI->removeFromParent();
       E = CmpInstr;
       CmpInstr.getParent()->insert(E, MI);
     }
     I = CmpInstr;
     E = MI;
+  } else {
+    // Allow the loop below to search E (which was initially MI). Since MI and
+    // SubAdd have different tests, even if that instruction could not be MI, it
+    // could still potentially be SubAdd.
+    --E;
   }

   // Check that CPSR isn't set between the comparison instruction and the one we
-  // want to change. At the same time, search for Sub.
+  // want to change. At the same time, search for SubAdd.
   const TargetRegisterInfo *TRI = &getRegisterInfo();
   --I;
   for (; I != E; --I) {
     const MachineInstr &Instr = *I;

+    // Check whether CmpInstr can be made redundant by the current instruction.
+    if (isRedundantFlagInstr(&CmpInstr, SrcReg, SrcReg2, CmpValue, &*I)) {
+      SubAdd = &*I;
+      break;
+    }
+
     if (Instr.modifiesRegister(ARM::CPSR, TRI) ||
         Instr.readsRegister(ARM::CPSR, TRI))
       // This instruction modifies or uses CPSR after the one we want to
       // change. We can't do this transformation.
       return false;

-    // Check whether CmpInstr can be made redundant by the current instruction.
-    if (isRedundantFlagInstr(&CmpInstr, SrcReg, SrcReg2, CmpValue, &*I)) {
-      Sub = &*I;
-      break;
-    }
-
     if (I == B)
       // The 'and' is below the comparison instruction.
       return false;
   }

   // Return false if no candidates exist.
-  if (!MI && !Sub)
+  if (!MI && !SubAdd)
     return false;

   // The single candidate is called MI.
-  if (!MI) MI = Sub;
+  if (!MI) MI = SubAdd;

   // We can't use a predicated instruction - it doesn't always write the flags.
   if (isPredicated(*MI))
     return false;

   // Scan forward for the use of CPSR
   // When checking against MI: if it's a conditional code that requires
   // checking of the V bit or C bit, then this is not safe to do.
@@ (41 lines hidden) @@ for (unsigned IO = 0, EO = Instr.getNumOperands();
           CC = ARMCC::GE;
           break;
         case ARM::VSELVSS:
         case ARM::VSELVSD:
           CC = ARMCC::VS;
           break;
         }

-      if (Sub) {
-        ARMCC::CondCodes NewCC = getSwappedCondition(CC);
-        if (NewCC == ARMCC::AL)
-          return false;
+      if (SubAdd) {
         // If we have SUB(r1, r2) and CMP(r2, r1), the condition code based
         // on CMP needs to be updated to be based on SUB.
+        // If we have ADD(r1, r2, X) and CMP(r1, r2), the condition code also
+        // needs to be modified.
         // Push the condition code operands to OperandsToUpdate.
         // If it is safe to remove CmpInstr, the condition code of these
         // operands will be modified.
-        if (SrcReg2 != 0 && Sub->getOperand(1).getReg() == SrcReg2 &&
-            Sub->getOperand(2).getReg() == SrcReg) {
+        unsigned Opc = SubAdd->getOpcode();
+        bool IsSub = Opc == ARM::SUBrr || Opc == ARM::t2SUBrr ||
+                     Opc == ARM::SUBri || Opc == ARM::t2SUBri;
+        if (!IsSub || (SrcReg2 != 0 && SubAdd->getOperand(1).getReg() == SrcReg2 &&
+                       SubAdd->getOperand(2).getReg() == SrcReg)) {
           // VSel doesn't support condition code update.
           if (IsInstrVSel)
             return false;
+          // Ensure we can swap the condition.
+          ARMCC::CondCodes NewCC = (IsSub ? getSwappedCondition(CC) : getCmpToAddCondition(CC));
+          if (NewCC == ARMCC::AL)
+            return false;
           OperandsToUpdate.push_back(
               std::make_pair(&((*I).getOperand(IO - 1)), NewCC));
         }
       } else {
-        // No Sub, so this is x = <op> y, z; cmp x, 0.
+        // No SubAdd, so this is x = <op> y, z; cmp x, 0.
         switch (CC) {
         case ARMCC::EQ: // Z
         case ARMCC::NE: // Z
         case ARMCC::MI: // N
         case ARMCC::PL: // N
         case ARMCC::AL: // none
           // CPSR can be used multiple times, we should continue.
           break;
@@ (37 lines hidden) @@ bool ARMBaseInstrInfo::optimizeCompareInstr(
   // Since we have SUB(r1, r2) and CMP(r2, r1), the condition code needs to
   // be changed from r2 > r1 to r1 < r2, from r2 < r1 to r1 > r2, etc.
   for (unsigned i = 0, e = OperandsToUpdate.size(); i < e; i++)
     OperandsToUpdate[i].first->setImm(OperandsToUpdate[i].second);

   return true;
 }

+bool ARMBaseInstrInfo::shouldSink(const MachineInstr &MI) const {
+  // Do not sink MI if it might be used to optimize a redundant compare.
+  // We heuristically only look at the instruction immediately following MI to
+  // avoid potentially searching the entire basic block.
+  if (isPredicated(MI))
+    return true;
+  MachineBasicBlock::const_iterator Next = &MI;
+  ++Next;
+  unsigned SrcReg, SrcReg2;
+  int CmpMask, CmpValue;
+  if (Next != MI.getParent()->end() &&
+      analyzeCompare(*Next, SrcReg, SrcReg2, CmpMask, CmpValue) &&
+      isRedundantFlagInstr(&*Next, SrcReg, SrcReg2, CmpValue, &MI))
+    return false;
+  return true;
+}
+
 bool ARMBaseInstrInfo::FoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
                                      unsigned Reg,
                                      MachineRegisterInfo *MRI) const {
   // Fold large immediates into add, sub, or, xor.
   unsigned DefOpc = DefMI.getOpcode();
   if (DefOpc != ARM::t2MOVi32imm && DefOpc != ARM::MOVi32imm)
     return false;
   if (!DefMI.getOperand(1).isImm())
@@ (2,042 lines hidden) @@

llvm/trunk/test/CodeGen/ARM/intrinsics-overflow.ll

@@ (27 lines hidden) @@
 define i32 @sadd_overflow(i32 %a, i32 %b) #0 {
   %sadd = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
   %1 = extractvalue { i32, i1 } %sadd, 1
   %2 = zext i1 %1 to i32
   ret i32 %2

   ; CHECK-LABEL: sadd_overflow:

-  ; ARM: add r[[R2:[0-9]+]], r[[R0:[0-9]+]], r[[R1:[0-9]+]]
-  ; ARM: mov r[[R1]], #1
-  ; ARM: cmp r[[R2]], r[[R0]]
-  ; ARM: movvc r[[R1]], #0
+  ; ARM: adds r[[R2:[0-9]+]], r[[R0:[0-9]+]], r[[R1:[0-9]+]]
+  ; ARM: mov r[[R0]], #1
+  ; ARM: movvc r[[R0]], #0
+  ; ARM: mov pc, lr

   ; THUMBV6: mov r[[R2:[0-9]+]], r[[R0:[0-9]+]]
   ; THUMBV6: adds r[[R3:[0-9]+]], r[[R2]], r[[R1:[0-9]+]]
   ; THUMBV6: movs r[[R0]], #0
   ; THUMBV6: movs r[[R1]], #1
   ; THUMBV6: cmp r[[R3]], r[[R2]]
   ; THUMBV6: bvc .L[[LABEL:.*]]
   ; THUMBV6: mov r[[R0]], r[[R1]]
   ; THUMBV6: .L[[LABEL]]:

-  ; THUMBV7: movs r[[R1]], #1
-  ; THUMBV7: cmp r[[R2]], r[[R0]]
+  ; THUMBV7: adds r[[R2:[0-9]+]], r[[R0]], r[[R1:[0-9]+]]
+  ; THUMBV7: mov.w r[[R0:[0-9]+]], #1
   ; THUMBV7: it vc
-  ; THUMBV7: movvc r[[R1]], #0
-  ; THUMBV7: mov r[[R0]], r[[R1]]
+  ; THUMBV7: movvc r[[R0]], #0
 }

 define i32 @usub_overflow(i32 %a, i32 %b) #0 {
   %sadd = tail call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
   %1 = extractvalue { i32, i1 } %sadd, 1
   %2 = zext i1 %1 to i32
   ret i32 %2

@@ (53 lines hidden) @@
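
For readers who want to see where such IR typically comes from, here is a small source-level sketch. It assumes a GCC- or Clang-compatible compiler, since it relies on the `__builtin_sadd_overflow` and `__builtin_uadd_overflow` builtins; to my knowledge Clang lowers these to the `llvm.sadd.with.overflow.i32` and `llvm.uadd.with.overflow.i32` intrinsics exercised by the tests. The wrapper names `checkedAdd` and `checkedAddUnsigned` are hypothetical.

```cpp
#include <cassert>
#include <climits>

// Returns true and stores a + b in *out when the signed add does not
// overflow; returns false on overflow (*out then holds the wrapped value).
bool checkedAdd(int a, int b, int *out) {
  return !__builtin_sadd_overflow(a, b, out);
}

// Same contract for unsigned addition.
bool checkedAddUnsigned(unsigned a, unsigned b, unsigned *out) {
  return !__builtin_uadd_overflow(a, b, out);
}
```

A caller would branch on the returned flag, which corresponds to the `extractvalue ... , 1` result in the IR above.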

llvm/trunk/test/CodeGen/ARM/su-addsub-overflow.ll

 ; RUN: llc < %s -mtriple=arm-eabi -mcpu=generic | FileCheck %s

 define i32 @sadd(i32 %a, i32 %b) local_unnamed_addr #0 {
 ; CHECK-LABEL: sadd:
-; CHECK: mov r[[R0:[0-9]+]], r0
-; CHECK-NEXT: add r[[R1:[0-9]+]], r[[R0]], r1
-; CHECK-NEXT: cmp r[[R1]], r[[R0]]
+; CHECK: adds r0, r0, r1
 ; CHECK-NEXT: movvc pc, lr
 entry:
   %0 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
   %1 = extractvalue { i32, i1 } %0, 1
   br i1 %1, label %trap, label %cont

 trap:
   tail call void @llvm.trap() #2
   unreachable

 cont:
   %2 = extractvalue { i32, i1 } %0, 0
   ret i32 %2

 }

 define i32 @uadd(i32 %a, i32 %b) local_unnamed_addr #0 {
 ; CHECK-LABEL: uadd:
-; CHECK: mov r[[R0:[0-9]+]], r0
-; CHECK-NEXT: adds r[[R1:[0-9]+]], r[[R0]], r1
-; CHECK-NEXT: cmp r[[R1]], r[[R0]]
-; CHECK-NEXT: movhs pc, lr
+; CHECK: adds r0, r0, r1
+; CHECK-NEXT: movlo pc, lr
 entry:
   %0 = tail call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
   %1 = extractvalue { i32, i1 } %0, 1
   br i1 %1, label %trap, label %cont

 trap:
   tail call void @llvm.trap() #2
   unreachable

 cont:
   %2 = extractvalue { i32, i1 } %0, 0
   ret i32 %2

 }

 define i32 @ssub(i32 %a, i32 %b) local_unnamed_addr #0 {
 ; CHECK-LABEL: ssub:
-; CHECK: cmp r0, r1
-; CHECK-NEXT: subvc r0, r0, r1
+; CHECK: subs r0, r0, r1
 ; CHECK-NEXT: movvc pc, lr
 entry:
   %0 = tail call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
   %1 = extractvalue { i32, i1 } %0, 1
   br i1 %1, label %trap, label %cont

 trap:
   tail call void @llvm.trap() #2
   unreachable

 cont:
   %2 = extractvalue { i32, i1 } %0, 0
   ret i32 %2

 }

 define i32 @usub(i32 %a, i32 %b) local_unnamed_addr #0 {
 ; CHECK-LABEL: usub:
-; CHECK: mov r[[R0:[0-9]+]], r0
-; CHECK-NEXT: subs r[[R1:[0-9]+]], r[[R0]], r1
-; CHECK-NEXT: cmp r[[R0]], r1
+; CHECK: subs r0, r0, r1
 ; CHECK-NEXT: movhs pc, lr
 entry:
   %0 = tail call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
   %1 = extractvalue { i32, i1 } %0, 1
   br i1 %1, label %trap, label %cont

 trap:
   tail call void @llvm.trap() #2
   unreachable

 cont:
   %2 = extractvalue { i32, i1 } %0, 0
   ret i32 %2

 }

	define void @sum(i32* %a, i32* %b, i32 %n) local_unnamed_addr #0 {			define void @sum(i32* %a, i32* %b, i32 %n) local_unnamed_addr #0 {
	; CHECK-LABEL: sum:			; CHECK-LABEL: sum:
	; CHECK: ldr [[R0:r[0-9]+]],			; CHECK: ldr [[R0:r[0-9]+]],
	; CHECK-NEXT: ldr [[R1:r[0-9]+|lr]],			; CHECK-NEXT: ldr [[R1:r[0-9]+|lr]],
	; CHECK-NEXT: add [[R2:r[0-9]+]], [[R1]], [[R0]]			; CHECK-NEXT: adds [[R2:r[0-9]+]], [[R1]], [[R0]]
	; CHECK-NEXT: cmp [[R2]], [[R1]]
	; CHECK-NEXT: strvc [[R2]],			; CHECK-NEXT: strvc [[R2]],
	; CHECK-NEXT: addvc			; CHECK-NEXT: addsvc
	; CHECK-NEXT: cmpvc
	; CHECK-NEXT: bvs			; CHECK-NEXT: bvs
	entry:			entry:
	%cmp7 = icmp eq i32 %n, 0			%cmp7 = icmp eq i32 %n, 0
	br i1 %cmp7, label %for.cond.cleanup, label %for.body			br i1 %cmp7, label %for.cond.cleanup, label %for.body

	for.cond.cleanup:			for.cond.cleanup:
	ret void			ret void

	Show All 20 Lines

	cont2:			cont2:
	%7 = extractvalue { i32, i1 } %5, 0			%7 = extractvalue { i32, i1 } %5, 0
	%cmp = icmp eq i32 %7, %n			%cmp = icmp eq i32 %7, %n
	br i1 %cmp, label %for.cond.cleanup, label %for.body			br i1 %cmp, label %for.cond.cleanup, label %for.body

	}			}

				define void @extern_loop(i32 %n) local_unnamed_addr #0 {
				; Do not replace the compare around the clobbering call.
				; CHECK: add {{r[0-9]+}}, {{r[0-9]+}}, #1
				; CHECK-NEXT: bl external_fn
				; CHECK: cmp
				entry:
				%0 = tail call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %n, i32 1)
				%1 = extractvalue { i32, i1 } %0, 1
				br i1 %1, label %trap, label %cont.lr.ph

				cont.lr.ph:
				%2 = extractvalue { i32, i1 } %0, 0
				%cmp5 = icmp sgt i32 %2, 0
				br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				br label %for.body

				trap:
				tail call void @llvm.trap() #2
				unreachable

				for.cond.cleanup:
				ret void

				for.body:
				%i.046 = phi i32 [ %5, %cont1 ], [ 0, %for.body.preheader ]
				tail call void bitcast (void (...)* @external_fn to void ()*)() #4
				%3 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %i.046, i32 1)
				%4 = extractvalue { i32, i1 } %3, 1
				br i1 %4, label %trap, label %cont1

				cont1:
				%5 = extractvalue { i32, i1 } %3, 0
				%cmp = icmp slt i32 %5, %2
				br i1 %cmp, label %for.body, label %for.cond.cleanup
				}

				declare void @external_fn(...) local_unnamed_addr #0

	declare void @llvm.trap() #2			declare void @llvm.trap() #2
	declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32) #1			declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32) #1
	declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) #1			declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) #1
	declare { i32, i1 } @llvm.ssub.with.overflow.i32(i32, i32) #1			declare { i32, i1 } @llvm.ssub.with.overflow.i32(i32, i32) #1
	declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32) #1			declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32) #1
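
Note for readers of this archive: the CHECK changes above depend on `adds`/`subs` already setting the flags that the removed `cmp` used to compute. The following is a minimal sketch of that reasoning in C++, not code from the patch. The names `Flags`, `Result`, `adds`, `subs`, `condVC`, `condHS`, `condLO` and `selfCheck` are hypothetical helpers introduced only for illustration, and only the C (carry) and V (overflow) flags are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of two ARM condition flags (illustration only, not LLVM code).
struct Flags {
  bool C; // carry; after a subtraction it is set when there was NO borrow
  bool V; // signed overflow
};

struct Result {
  uint32_t Value;
  Flags F;
};

// Models `adds rd, a, b`: 32-bit add that also sets the flags.
inline Result adds(uint32_t A, uint32_t B) {
  uint32_t R = A + B;
  bool C = R < A;                      // unsigned wrap-around occurred
  bool V = (~(A ^ B) & (A ^ R)) >> 31; // same-sign operands, different-sign result
  return {R, {C, V}};
}

// Models `subs rd, a, b`; `cmp a, b` sets the same flags and discards Value.
inline Result subs(uint32_t A, uint32_t B) {
  uint32_t R = A - B;
  bool C = A >= B;                    // no borrow
  bool V = ((A ^ B) & (A ^ R)) >> 31; // different-sign operands, result sign != A
  return {R, {C, V}};
}

// Condition codes used by the CHECK lines.
inline bool condVC(Flags F) { return !F.V; } // "vc": no signed overflow
inline bool condHS(Flags F) { return F.C; }  // "hs": carry set
inline bool condLO(Flags F) { return !F.C; } // "lo": carry clear

// Small usage example: the old uadd sequence (adds; cmp result, a; movhs)
// and the new one (adds; movlo) take the return path for the same inputs.
inline void selfCheck() {
  const uint32_t A = 0xFFFFFFFFu, B = 1u;
  Result Sum = adds(A, B);
  bool OldReturns = condHS(subs(Sum.Value, A).F); // flags from the extra cmp
  bool NewReturns = condLO(Sum.F);                // flags from adds itself
  assert(OldReturns == NewReturns);
}
```

In this model the unsigned-add case flips its condition from `hs` to `lo` because the flags now come from the addition rather than from comparing the result against an operand, which matches the `uadd` CHECK lines. The signed cases keep `vc`, and `usub` keeps `hs`, since `subs` and `cmp` on the same operands produce identical flags.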

Revision Contents

Diff 130229
