This is an archive of the discontinued LLVM Phabricator instance.

Allow rematerialization of ARM Thumb MOVi8 instruction in some contexts
Needs Review · Public

Authored by philipginsbach on Jun 6 2017, 5:13 AM.

Details

Reviewers

jmolloy
wmi
qcolombet
MatzeB
javed.absar
aadg
john.brawn

Summary

Constants are crucial for code size in the ARM Thumb-1 instruction set.
The 16-bit instruction size often does not offer enough space for immediate arguments.
This means that additional instructions are frequently used to load constants into registers.
Since constants are hoisted, this can lead to significant register spillage if they are used multiple times in a single function.
This can be avoided by rematerialization, i.e. recomputing a constant instead of reloading it from the stack.
This patch extends LLVM's rematerialization support so that it can rematerialize the ARM Thumb MOVSi8 instruction even though it sets the CPSR flags.
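
To make the CPSR condition concrete, here is a minimal sketch (not the patch's code) of the question that has to be answered before a flag-setting move is re-inserted: are the flags dead at that point? The names `FlagsEffect`, `flagsLivenessAt`, `canInsertFlagSettingMove` and `kExampleBlock` are hypothetical, and the forward scan is only loosely modelled on a block-local liveness query; instructions that both read and write the flags are not modelled.

```cpp
#include <cstddef>

// Hypothetical toy model: how one instruction touches the flags register
// (CPSR on ARM).
enum class FlagsEffect { None, Reads, Writes };

enum class Liveness { Dead, Live, Unknown };

// Scan forward from `pos`. A read seen before any write means the current
// flags value is still needed (Live); a write seen first means the value is
// overwritten without being read (Dead). Reaching the end of the block
// without either gives the conservative answer Unknown.
constexpr Liveness flagsLivenessAt(const FlagsEffect *block, std::size_t size,
                                   std::size_t pos) {
  for (std::size_t i = pos; i < size; ++i) {
    if (block[i] == FlagsEffect::Reads)
      return Liveness::Live;
    if (block[i] == FlagsEffect::Writes)
      return Liveness::Dead;
  }
  return Liveness::Unknown;
}

// A flag-setting constant move may only be inserted before `pos` if the
// flags are provably dead there; Unknown is treated as unsafe.
constexpr bool canInsertFlagSettingMove(const FlagsEffect *block,
                                        std::size_t size, std::size_t pos) {
  return flagsLivenessAt(block, size, pos) == Liveness::Dead;
}

// Example block: compare (writes flags), plain move, conditional branch
// (reads flags), plain move.
constexpr FlagsEffect kExampleBlock[] = {FlagsEffect::Writes, FlagsEffect::None,
                                         FlagsEffect::Reads, FlagsEffect::None};

static_assert(canInsertFlagSettingMove(kExampleBlock, 4, 0),
              "before the compare the old flags are dead");
static_assert(!canInsertFlagSettingMove(kExampleBlock, 4, 1),
              "between compare and branch the flags are live");
static_assert(!canInsertFlagSettingMove(kExampleBlock, 4, 3),
              "unknown liveness is treated as unsafe");
```

The checks run at compile time, so the sketch needs no `main`. Treating `Unknown` as unsafe mirrors the conservative stance taken in the review discussion.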

Diff Detail

Event Timeline

philipginsbach created this revision. · Jun 6 2017, 5:13 AM

Herald added subscribers: kristof.beyls, javed.absar, qcolombet and 2 others. · Jun 6 2017, 5:13 AM

philipginsbach added reviewers: jmolloy, wmi. · Jun 8 2017, 2:11 AM

philipginsbach added a reviewer: samparker. · Jun 21 2017, 5:09 AM

philipginsbach updated this revision to Diff 105830. · Jul 10 2017, 2:55 AM

Hi Philip,

This needs some tests.

cheers,
sam

samparker mentioned this in D33936: Allow rematerialization of ARM Thumb literal pool loads. · Jul 13 2017, 3:57 AM

philipginsbach updated this revision to Diff 106393. · Jul 13 2017, 4:06 AM

philipginsbach updated this revision to Diff 106397. · Jul 13 2017, 4:18 AM

philipginsbach updated this revision to Diff 106426. · Jul 13 2017, 7:43 AM

samparker added inline comments. · Jul 14 2017, 2:46 AM

lib/CodeGen/LiveRangeEdit.cpp
162	For speed, I think you can iterate over just the defs instead of all the operands, they'll be fewer and it will negate the need to do the Reg and Def checks.
177	Allocatable isn't about liveness, it's just whether the register can be used by the allocator generally. I'm guessing that this works here because it can't allocate the CPSR.
181	Can you not just query IsDead on the operand instead?

philipginsbach added inline comments. · Jul 14 2017, 3:01 AM

lib/CodeGen/LiveRangeEdit.cpp
162	Agreed, I will fix this.
177	Yes, I'm aware of that. I discussed that part with James Molloy in quite some depth. From what I understand (and as I say in the comment there might be gaps), I can't rely on liveness computation here, because further iterations of the register allocator might change the liveness. Obviously this problem goes away if a register can't be used freely by the register allocator. Long story short: I'm quite certain the condition I check here is sufficient but it is probably way too strong.
181	I tried several different options here. If I recall correctly, what you propose only works for virtual registers.

samparker added inline comments. · Jul 14 2017, 3:05 AM

lib/Target/ARM/ARMInstrThumb.td
1136	Can this also be used for the movs register variant?

philipginsbach added inline comments. · Jul 14 2017, 3:06 AM

lib/CodeGen/LiveRangeEdit.cpp
177	The important thing is that the patch is targeted at CPSR, as you point out, which is not allocatable.

samparker added inline comments. · Jul 14 2017, 3:11 AM

lib/CodeGen/LiveRangeEdit.cpp
177	Ok, definitely better to be conservative then.

philipginsbach added inline comments. · Jul 14 2017, 3:14 AM

lib/Target/ARM/ARMInstrThumb.td
1136	The infrastructure that the patch sets up could be used for much more generic instructions, yes! Potentially, add, sub etc. could be rematerialized the same way as well. Unfortunately it's pointless though. The problem is that the used registers would have to be preserved by chance from the first invocation to the second. The register allocator is not incentivized to make that happen, so it's extremely unlikely to occur often, especially since the whole thing becomes relevant only in a high register pressure situation.

philipginsbach updated this revision to Diff 106606. · Jul 14 2017, 3:27 AM

philipginsbach marked 2 inline comments as done. · Jul 14 2017, 5:52 AM

Hi Philip,

Thanks for looking into extending the rematerialization support.

I believe the interference checking is too weak to actually work in all cases (see inline comments). Moreover, I think we cannot say this is still trivially rematerializable given that the callers of that method now need to do interference checks. Thus I would suggest using a new/different hook. In particular, I am not convinced the register coalescer will do the right thing if it rematerializes an instruction with more than one def.

Cheers,
-Quentin

include/llvm/Target/TargetInstrInfo.h
111 ↗	(On Diff #106606)	No else after return per LLVM coding standard.
lib/CodeGen/LiveRangeEdit.cpp
162	This is wrong. This misses all the implicit-def.
	> /// Returns a range over all explicit operands that are register definitions.
	> /// Implicit definition are not included!
166	That doesn't seem right either. Indeed where is the check for virtregs? E.g., consider
	vreg0,vreg1 = loadimm
	...
	vreg1 = something else
	/// <=== if you remat here you're going to break the next use of vreg1
	vreg0 vreg1
test/CodeGen/Thumb/movi8remat.ll
1 ↗	(On Diff #106606)	Please use a .mir test. You can generate this with -stop-before greedy -simplify-mir

This revision now requires changes to proceed. · Jul 14 2017, 2:31 PM

Hi Quentin,

Thanks a lot for the comments, that's really valuable feedback.
I will take some time to dig through those issues and get back to you.

Best,
Philip

Hi Quentin,

Let me follow up on your comments.

I renamed the member functions to isPotentiallyTriviallyReMaterializable and isReallyPotentiallyTriviallyReMaterializableGeneric.
This is somewhat grotesque but does bring across the functionality more clearly I think. Obviously I am open to more aesthetic suggestions.

The problem with implicit defs should be removed now and was not present in a previous version of the patch.
I didn't pay attention and was misled by the misleading documentation for MachineInstr::getNumOperands.

I think your concern about checking virtual register defs stems from isPotentiallyTriviallyReMaterializable not being very clear in its purpose.
I hope the change to the return enum value improves this. Do you agree that this is sufficient or am I overlooking something crucial here?

Best,
Philip

include/llvm/Target/TargetInstrInfo.h
111 ↗	(On Diff #106606)	I will fix this.
lib/CodeGen/LiveRangeEdit.cpp
162	I was too eager to change this, I think the previous version should have been correct. There appears to be a mistake in the documentation for MachineInstr::getNumOperands, it says "Access to explicit operands of the instruction." However it includes the implicit operands.
166	I did not weaken the check for additional virtual registers in TargetInstrInfo::isReallyTriviallyReMaterializableGeneric. What you describe would still be filtered out by that function (i.e. return Rematerializability::NO). The only way in which it is more lenient than before is that it allows additional explicit physical register defs. Arguably the behavior that you imply (allowing additional virtreg defs) would be useful and more obvious, but at the core this patch is concerned with the movi8 instruction and the current change is enough to rematerialize that. I will change the enum item YES_BUT_EXTRA_DEFS to YES_BUT_EXTRA_PHYSREG_DEFS to make the behavior of the function more obvious.
test/CodeGen/Thumb/movi8remat.ll
1 ↗	(On Diff #106606)	Ok, I will change it to a .mir test, thanks for providing the options.
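
As a reading aid for the three-valued result discussed in these comments, the following is a minimal sketch of how the values could be interpreted at the different call sites. The enum mirrors the one in the patch; the helper names `isUnconditionallyRematerializable`, `isCandidate` and `canRematerializeHere` are hypothetical, and the `ExtraPhysRegDefsDead` flag stands in for the liveness query that the patch performs in `LiveRangeEdit::canRematerializeAt`.

```cpp
// Mirrors the enum introduced by the patch.
enum class Rematerializability {
  NO = 0,
  YES_BUT_EXTRA_PHYSREG_DEFS = 1,
  YES = 2
};

// Strict, context-free query: only an unconditional YES counts. This is the
// behaviour the patch keeps for isTriviallyReMaterializable.
constexpr bool isUnconditionallyRematerializable(Rematerializability R) {
  return R == Rematerializability::YES;
}

// Hypothetical helper: context-free candidate filter. Anything except NO is
// worth remembering, because the context may later turn out to be benign.
constexpr bool isCandidate(Rematerializability R) {
  return R != Rematerializability::NO;
}

// Hypothetical helper: context-dependent decision at a concrete use point.
// ExtraPhysRegDefsDead is an assumed input saying whether every additionally
// defined physical register (e.g. CPSR) is dead at that point.
constexpr bool canRematerializeHere(Rematerializability R,
                                    bool ExtraPhysRegDefsDead) {
  return R == Rematerializability::YES ||
         (R == Rematerializability::YES_BUT_EXTRA_PHYSREG_DEFS &&
          ExtraPhysRegDefsDead);
}

static_assert(!isUnconditionallyRematerializable(
                  Rematerializability::YES_BUT_EXTRA_PHYSREG_DEFS),
              "extra physreg defs are not unconditionally safe");
static_assert(isCandidate(Rematerializability::YES_BUT_EXTRA_PHYSREG_DEFS),
              "but the instruction stays a candidate");
static_assert(!canRematerializeHere(
                  Rematerializability::YES_BUT_EXTRA_PHYSREG_DEFS, false),
              "live extra defs block rematerialization");
```

The split matters because the candidate scan runs before the use point is known, while the final decision can only be made once it is.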

philipginsbach updated this revision to Diff 106854. · Jul 17 2017, 5:31 AM

philipginsbach edited edge metadata.

samparker removed a reviewer: samparker. · Jul 11 2018, 7:12 AM

Herald added a reviewer: javed.absar. · Jul 11 2018, 7:12 AM

Herald added a subscriber: chrib.

chill added a subscriber: chill. · Jul 25 2018, 6:02 AM

I have rebased onto the master branch at 244c796c894f50fb53f9dbe7627702661dfe69c2.

Quentin, you gave many useful suggestions before, what do you think is the way forward with this?

philipginsbach edited reviewers, added: MatzeB; removed: javed.absar. · Oct 9 2018, 10:09 AM

Herald added a reviewer: javed.absar. · Oct 9 2018, 10:09 AM

samparker added a subscriber: samparker. · Oct 10 2018, 8:18 AM

dmgreen added a subscriber: dmgreen. · Oct 10 2018, 8:27 AM

I ran some benchmarks for thumb1, they look great.

zzheng added inline comments. · Oct 10 2018, 4:09 PM

lib/CodeGen/LiveRangeEdit.cpp
165	shouldn't we use auto type and range-based loop now?

Uses a range-based loop now.

philipginsbach marked an inline comment as done. · Oct 12 2018, 5:58 AM

philipginsbach added inline comments.

lib/CodeGen/LiveRangeEdit.cpp
165	Thanks, I fixed it.

philipginsbach updated this revision to Diff 169382. · Oct 12 2018, 5:59 AM

philipginsbach marked an inline comment as done.

philipginsbach added a reviewer: aadg. · Oct 12 2018, 6:01 AM

I think I have addressed all your previous concerns, @qcolombet, is there anything else you'd like to see clarified?
Sam Parker suggested I add you as a reviewer, @MatzeB, do you have any suggestions?

Sorry to be pushy, this patch has been lying around for a while and it would be quite satisfying to get it through the process.

Hi,

Sorry for the delay, I haven’t had time to review the patch but I have one high-level comment.
I am not sure the enum approach is desirable as it basically hard-codes the kinds of constraints we can report and will grow very quickly if we want to extend it.
For instance, let's say that on top of physreg defs we want to report virtual reg defs, now we would need “yes”, “no”, “yes_phys”, “yes_virt”, “yes_phys_virt” and the list grows with the cross product of everything we may want/need to report.

Anyhow, I’ll look closer at the patch to give more concrete feedback.

Cheers,
Q

> I am not sure the enum approach is desirable as it basically hard-codes the kinds of constraints we can report and will grow very quickly if we want to extend it.
> For instance, let's say that on top of physreg defs we want to report virtual reg defs, now we would need “yes”, “no”, “yes_phys”, “yes_virt”, “yes_phys_virt” and the list grows with the cross product of everything we may want/need to report.

Hi Quentin,

Thanks a lot for looking at this again.

I understand your concern. The easiest way of tackling it would probably be some kind of bitfield with flags for the different kinds of (potential) extensions of the unrestricted rematerializability property.

However, I would vote for that to be introduced when the problem actually arises. It isn't so obvious to me that many more generalizations of rematerializability will be necessary and the path from enum to bitfield is easy.

What do you think?

Cheers,
Philip
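
For illustration only, the bitfield idea raised in this exchange could look roughly like the sketch below. None of these names (`RematConstraint`, `RematInfo`, `isUnconditional`, `needs`) exist in LLVM or in the patch; they are assumptions made up to show how independent flags avoid the cross product of enum values.

```cpp
#include <cstdint>

// Hypothetical constraint bits: each bit names one condition that the
// caller must verify in context before rematerializing.
enum RematConstraint : std::uint8_t {
  RC_None = 0,
  RC_ExtraPhysRegDefs = 1u << 0, // extra physical register defs must be dead
  RC_ExtraVirtRegDefs = 1u << 1, // extra virtual register defs must be dead
};

// Hypothetical result type: whether rematerialization is possible at all,
// plus the set of context checks it depends on.
struct RematInfo {
  bool Possible;
  std::uint8_t Constraints;
};

// True if the instruction can be rematerialized without any context checks.
constexpr bool isUnconditional(RematInfo I) {
  return I.Possible && I.Constraints == RC_None;
}

// True if rematerialization is possible but depends on constraint C.
constexpr bool needs(RematInfo I, RematConstraint C) {
  return I.Possible && (I.Constraints & C) != 0;
}

// Two constraints combine with a bitwise OR instead of a new enum value.
constexpr RematInfo kBothConstraints{
    true, RC_ExtraPhysRegDefs | RC_ExtraVirtRegDefs};

static_assert(!isUnconditional(kBothConstraints),
              "constraints make the result conditional");
static_assert(needs(kBothConstraints, RC_ExtraPhysRegDefs) &&
                  needs(kBothConstraints, RC_ExtraVirtRegDefs),
              "both bits can be queried independently");
```

With n kinds of constraint this needs n bits rather than up to 2^n enumerators, which is the growth problem described above.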

> However, I would vote for that to be introduced when the problem actually arises. It isn't so obvious to me that many more generalizations of rematerializability will be necessary and the path from enum to bitfield is easy.
>
> What do you think?

I agree, moving with incremental steps is the right thing to do.
The question is more whether this step goes in the direction we want. (I am not saying it doesn't, I just haven't looked closely enough to have an opinion :))

Now, I just wanted to point out that there are more opportunities to generalize the rematerialization. The obvious one to me is rematerializing everything (like pulling chains of computations, full instructions (with both definitions and arguments)) and in that sense, only the instruction or chain of instructions carries the right level of information. Right now we often introduce pseudo instructions to work around this limitation and that's what I would like to solve.

Anyway, I'll take a closer look at the patch.

> Now, I just wanted to point out that there are more opportunities to generalize the rematerialization. The obvious one to me is rematerializing everything (like pulling chains of computations, full instructions (with both definitions and arguments)) and in that sense, only the instruction or chain of instructions carries the right level of information. Right now we often introduce pseudo instructions to work around this limitation and that's what I would like to solve.

It's true, rematerializing a whole chain of instructions would require a more thorough approach.
I think it's important to be clear about motivations beforehand though: this patch was done to optimize code size for ARM Thumb. Since spilling and reloading takes only two instructions, rematerializing even chains of two instructions is pointless from that perspective. Obviously, there are other metrics to consider, but I think it would be important nonetheless to establish what exactly there is to gain.
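
The code-size argument can be checked with a small back-of-the-envelope sketch. It assumes, purely for illustration, that every instruction involved has a 16-bit Thumb-1 encoding, that a spill costs one store plus one load per reload, and that rematerialization repeats the whole chain at every reload; the function names are hypothetical.

```cpp
// Assumption: all instructions involved use 16-bit Thumb-1 encodings.
constexpr unsigned kThumb1InstrBytes = 2;

// Extra code size of spilling: one store, plus one load per reload.
constexpr unsigned spillBytes(unsigned Reloads) {
  return kThumb1InstrBytes * (1 + Reloads);
}

// Extra code size of rematerializing: the whole chain is repeated per reload.
constexpr unsigned rematBytes(unsigned ChainLength, unsigned Reloads) {
  return kThumb1InstrBytes * ChainLength * Reloads;
}

// A single-instruction chain beats spill plus one reload (2 vs. 4 bytes).
static_assert(rematBytes(1, 1) < spillBytes(1), "one instruction wins");
// A two-instruction chain only breaks even (4 vs. 4 bytes).
static_assert(rematBytes(2, 1) == spillBytes(1), "two instructions break even");
// With more reloads, a two-instruction chain is strictly worse (8 vs. 6 bytes).
static_assert(rematBytes(2, 2) > spillBytes(2), "two instructions lose");
```

Under these assumptions only single-instruction rematerialization saves code size, which matches the reasoning in the comment above.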

dmgreen mentioned this in D53453: [ARM] Make InstrEmitter mark CPSR defs dead for Thumb1. · Oct 22 2018, 3:08 AM

philipginsbach updated this revision to Diff 172970. · Nov 7 2018, 9:19 AM

john.brawn resigned from this revision. · May 12 2020, 6:46 AM

Herald added subscribers: danielkiss, asbirlea. · May 12 2020, 6:46 AM

Revision Contents

Path	Size
include/llvm/CodeGen/TargetInstrInfo.h	36 lines
lib/CodeGen/CalcSpillWeights.cpp	3 lines
lib/CodeGen/LiveRangeEdit.cpp	36 lines
lib/CodeGen/MachineLICM.cpp	6 lines
lib/CodeGen/TargetInstrInfo.cpp	33 lines
lib/Target/ARM/ARMInstrThumb.td	2 lines
test/CodeGen/Thumb/movi8remat.mir	117 lines

Diff 172970

include/llvm/CodeGen/TargetInstrInfo.h

@@ 79 lines not shown @@ public:
   }

   /// Given a machine instruction descriptor, returns the register
   /// class constraint for OpNum, or NULL.
   const TargetRegisterClass *getRegClass(const MCInstrDesc &MCID, unsigned OpNum,
                                          const TargetRegisterInfo *TRI,
                                          const MachineFunction &MF) const;

+  /// An instruction that pollutes additional registers might still be
+  /// rematerializable under the assumption that those registers aren't live.
+  /// This is the purpose of YES_BUT_EXTRA_PHYSREG_DEFS.
+  enum class Rematerializability
+  {
+    NO = 0,
+    YES_BUT_EXTRA_PHYSREG_DEFS = 1,
+    YES = 2
+  };
+
   /// Return true if the instruction is trivially rematerializable, meaning it
   /// has no side effects and requires no operands that aren't always available.
   /// This means the only allowed uses are constants and unallocatable physical
   /// registers so that the instructions result is independent of the place
   /// in the function.
   bool isTriviallyReMaterializable(const MachineInstr &MI,
                                    AliasAnalysis *AA = nullptr) const {
-    return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF ||
-           (MI.getDesc().isRematerializable() &&
-            (isReallyTriviallyReMaterializable(MI, AA) ||
-             isReallyTriviallyReMaterializableGeneric(MI, AA)));
+    return (isPotentiallyTriviallyReMaterializable(MI, AA)
+            == Rematerializability::YES);
+  }
+
+  /// More generic version of isTriviallyReMaterializable, report instructions
+  /// with extra physreg defs that are rematerializable if the corresponding
+  /// registers are dead.
+  Rematerializability
+  isPotentiallyTriviallyReMaterializable(const MachineInstr &MI,
+                                         AliasAnalysis *AA = nullptr) const {
+    if(MI.getOpcode() == TargetOpcode::IMPLICIT_DEF)
+      return Rematerializability::YES;
+    if(!MI.getDesc().isRematerializable())
+      return Rematerializability::NO;
+    if(isReallyTriviallyReMaterializable(MI, AA))
+      return Rematerializability::YES;
+    return isReallyPotentiallyTriviallyReMaterializableGeneric(MI, AA);
   }

 protected:
   /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
   /// set, this hook lets the target specify whether the instruction is actually
   /// trivially rematerializable, taking into consideration its operands. This
   /// predicate must return false if the instruction has any side effects other
   /// than producing a value, or if it requres any address registers that are
@@ 37 lines not shown @@ static bool fixCommutedOpIndices(unsigned &ResultIdx1, unsigned &ResultIdx2,
                                    unsigned CommutableOpIdx1,
                                    unsigned CommutableOpIdx2);

 private:
   /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
   /// set and the target hook isReallyTriviallyReMaterializable returns false,
   /// this function does target-independent tests to determine if the
   /// instruction is really trivially rematerializable.
-  bool isReallyTriviallyReMaterializableGeneric(const MachineInstr &MI,
+  Rematerializability
+  isReallyPotentiallyTriviallyReMaterializableGeneric(const MachineInstr &MI,
                                                 AliasAnalysis *AA) const;

 public:
   /// These methods return the opcode of the frame setup/destroy instructions
   /// if they exist (-1 otherwise). Some targets use pseudo instructions in
   /// order to abstract away the difference between operating with a frame
   /// pointer and operating without, through the use of these two instructions.
   ///
   unsigned getCallFrameSetupOpcode() const { return CallFrameSetupOpcode; }
@@ 1,552 lines not shown @@

lib/CodeGen/CalcSpillWeights.cpp

@@ 123 lines not shown @@ if (VRM) {
         assert(VNI && "Copy from non-existing value");
         if (VNI->isPHIDef())
           return false;
         MI = LIS.getInstructionFromIndex(VNI->def);
         assert(MI && "Dead valno in interval");
       }
     }

-    if (!TII.isTriviallyReMaterializable(*MI, LIS.getAliasAnalysis()))
+    if (TII.isPotentiallyTriviallyReMaterializable(*MI, LIS.getAliasAnalysis())
+        == TargetInstrInfo::Rematerializability::NO)
       return false;
   }
   return true;
 }

 void VirtRegAuxInfo::calculateSpillWeightAndHint(LiveInterval &li) {
   float weight = weightCalcHelper(li);
   // Check if unspillable.
@@ 163 lines not shown @@

lib/CodeGen/LiveRangeEdit.cpp

@@ 67 lines not shown @@ unsigned LiveRangeEdit::createFrom(unsigned OldReg) {
   return VReg;
 }

 bool LiveRangeEdit::checkRematerializable(VNInfo *VNI,
                                           const MachineInstr *DefMI,
                                           AliasAnalysis *aa) {
   assert(DefMI && "Missing instruction");
   ScannedRemattable = true;
-  if (!TII.isTriviallyReMaterializable(*DefMI, aa))
+  if (TII.isPotentiallyTriviallyReMaterializable(*DefMI, aa)
+      == TargetInstrInfo::Rematerializability::NO)
     return false;
   Remattable.insert(VNI);
   return true;
 }

 void LiveRangeEdit::scanRemattable(AliasAnalysis *aa) {
   for (VNInfo *VNI : getParent().valnos) {
     if (VNI->isUnused())
@@ 65 lines not shown @@ bool LiveRangeEdit::canRematerializeAt(Remat &RM, VNInfo *OrigVNI,
   SlotIndex DefIdx;
   assert(RM.OrigMI && "No defining instruction for remattable value");
   DefIdx = LIS.getInstructionIndex(*RM.OrigMI);

   // If only cheap remats were requested, bail out early.
   if (cheapAsAMove && !TII.isAsCheapAsAMove(*RM.OrigMI))
     return false;

+  // The instruction passed the checkRematerializable criterions. Now
+  // that we know the context, we need to make sure the instruction does
+  // not def any additional live registers.
+  if (TII.isPotentiallyTriviallyReMaterializable(*RM.OrigMI, nullptr)
+      == TargetInstrInfo::Rematerializability::YES_BUT_EXTRA_PHYSREG_DEFS)
+  {
+    for(auto MO : RM.OrigMI->operands()) {
+      if(!MO.isReg() || !MO.isDef()) continue;
+      unsigned Reg = MO.getReg();
+
+      // Check for a well-behaved physical register.
+      if (TargetRegisterInfo::isPhysicalRegister(Reg)) {
+        // A physreg def. We need to make sure the register is dead.
+        SlotIndexes *Indexes;
+        MachineInstr *Instruction;
+        MachineBasicBlock *BasicBlock;
+
+        // The interaction with the register allocator isn't entirely clear
+        // to me, so to be on the safe side, never assume registers to be
+        // dead if they are allocatable.
+        if (MRI.isAllocatable(Reg) ||
+            !(Indexes = LIS.getSlotIndexes()) ||
+            !(Instruction = Indexes->getInstructionFromIndex(UseIdx)) ||
+            !(BasicBlock = Instruction->getParent()) ||
+            (BasicBlock->computeRegisterLiveness(
+                 MRI.getTargetRegisterInfo(), Reg, Instruction)
+             != MachineBasicBlock::LivenessQueryResult::LQR_Dead)) {
+          return false;
+        }
+      }
+    }
+  }
+
   // Verify that all used registers are available with the same values.
   if (!allUsesAvailableAt(RM.OrigMI, DefIdx, UseIdx))
     return false;

   return true;
 }

 SlotIndex LiveRangeEdit::rematerializeAt(MachineBasicBlock &MBB,
@@ 310 lines not shown @@

lib/CodeGen/MachineLICM.cpp

@@ 1,207 lines not shown @@ bool MachineLICMBase::IsProfitableToHoist(MachineInstr &MI) {
   // Don't hoist a cheap instruction if it would create a copy in the loop.
   if (CheapInstr && CreatesCopy) {
     LLVM_DEBUG(dbgs() << "Won't hoist cheap instr with loop PHI use: " << MI);
     return false;
   }

   // Rematerializable instructions should always be hoisted since the register
   // allocator can just pull them down again when needed.
-  if (TII->isTriviallyReMaterializable(MI, AA))
+  if (TII->isPotentiallyTriviallyReMaterializable(MI, AA)
+      != TargetInstrInfo::Rematerializability::NO)
     return true;

   // FIXME: If there are long latency loop-invariant instructions inside the
   // loop at this point, why didn't the optimizer's LICM hoist them?
   for (unsigned i = 0, e = MI.getDesc().getNumOperands(); i != e; ++i) {
     const MachineOperand &MO = MI.getOperand(i);
     if (!MO.isReg() || MO.isImplicit())
       continue;
@@ 36 lines not shown @@ bool MachineLICMBase::IsProfitableToHoist(MachineInstr &MI) {
   if (AvoidSpeculation &&
       (!IsGuaranteedToExecute(MI.getParent()) && !MayCSE(&MI))) {
     LLVM_DEBUG(dbgs() << "Won't speculate: " << MI);
     return false;
   }

   // High register pressure situation, only hoist if the instruction is going
   // to be remat'ed.
-  if (!TII->isTriviallyReMaterializable(MI, AA) &&
+  if (TII->isPotentiallyTriviallyReMaterializable(MI, AA)
+      == TargetInstrInfo::Rematerializability::NO &&
       !MI.isDereferenceableInvariantLoad(AA)) {
     LLVM_DEBUG(dbgs() << "Can't remat / high reg-pressure: " << MI);
     return false;
   }

   return true;
 }

@@ 249 lines not shown @@

lib/CodeGen/TargetInstrInfo.cpp

Show First 20 Lines • Show All 865 Lines • ▼ Show 20 Lines	default:
break;		break;
}		}

assert(Prev && "Unknown pattern for machine combiner");		assert(Prev && "Unknown pattern for machine combiner");

reassociateOps(Root, *Prev, Pattern, InsInstrs, DelInstrs, InstIdxForVirtReg);		reassociateOps(Root, *Prev, Pattern, InsInstrs, DelInstrs, InstIdxForVirtReg);
}		}

bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(		TargetInstrInfo::Rematerializability
		TargetInstrInfo::isReallyPotentiallyTriviallyReMaterializableGeneric(
const MachineInstr &MI, AliasAnalysis *AA) const {		const MachineInstr &MI, AliasAnalysis *AA) const {
const MachineFunction &MF = *MI.getMF();		const MachineFunction &MF = *MI.getMF();
const MachineRegisterInfo &MRI = MF.getRegInfo();		const MachineRegisterInfo &MRI = MF.getRegInfo();

// Remat clients assume operand 0 is the defined register.		// Remat clients assume operand 0 is the defined register.
if (!MI.getNumOperands() \|\| !MI.getOperand(0).isReg())		if (!MI.getNumOperands() \|\| !MI.getOperand(0).isReg())
return false;		return Rematerializability::NO;
unsigned DefReg = MI.getOperand(0).getReg();		unsigned DefReg = MI.getOperand(0).getReg();

// A sub-register definition can only be rematerialized if the instruction		// A sub-register definition can only be rematerialized if the instruction
// doesn't read the other parts of the register. Otherwise it is really a		// doesn't read the other parts of the register. Otherwise it is really a
// read-modify-write operation on the full virtual register which cannot be		// read-modify-write operation on the full virtual register which cannot be
// moved safely.		// moved safely.
if (TargetRegisterInfo::isVirtualRegister(DefReg) &&		if (TargetRegisterInfo::isVirtualRegister(DefReg) &&
MI.getOperand(0).getSubReg() && MI.readsVirtualRegister(DefReg))		MI.getOperand(0).getSubReg() && MI.readsVirtualRegister(DefReg))
return false;		return Rematerializability::NO;

// A load from a fixed stack slot can be rematerialized. This may be		// A load from a fixed stack slot can be rematerialized. This may be
// redundant with subsequent checks, but it's target-independent,		// redundant with subsequent checks, but it's target-independent,
// simple, and a common case.		// simple, and a common case.
int FrameIdx = 0;		int FrameIdx = 0;
if (isLoadFromStackSlot(MI, FrameIdx) &&		if (isLoadFromStackSlot(MI, FrameIdx) &&
MF.getFrameInfo().isImmutableObjectIndex(FrameIdx))		MF.getFrameInfo().isImmutableObjectIndex(FrameIdx))
return true;		return Rematerializability::YES;

// Avoid instructions obviously unsafe for remat.		// Avoid instructions obviously unsafe for remat.
if (MI.isNotDuplicable() || MI.mayStore() || MI.hasUnmodeledSideEffects())		if (MI.isNotDuplicable() || MI.mayStore() || MI.hasUnmodeledSideEffects())
return false;		return Rematerializability::NO;

// Don't remat inline asm. We have no idea how expensive it is		// Don't remat inline asm. We have no idea how expensive it is
// even if it's side effect free.		// even if it's side effect free.
if (MI.isInlineAsm())		if (MI.isInlineAsm())
return false;		return Rematerializability::NO;

// Avoid instructions which load from potentially varying memory.		// Avoid instructions which load from potentially varying memory.
if (MI.mayLoad() && !MI.isDereferenceableInvariantLoad(AA))		if (MI.mayLoad() && !MI.isDereferenceableInvariantLoad(AA))
return false;		return Rematerializability::NO;

		// Track whether the instruction pollutes any additional registers.
		// if they are dead at the rematerialization location, it's still ok.
		bool AdditionalDefs = false;

// If any of the registers accessed are non-constant, conservatively assume		// If any of the registers accessed are non-constant, conservatively assume
// the instruction is not rematerializable.		// the instruction is not rematerializable.
for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {		for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
const MachineOperand &MO = MI.getOperand(i);		const MachineOperand &MO = MI.getOperand(i);
if (!MO.isReg()) continue;		if (!MO.isReg()) continue;
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (Reg == 0)		if (Reg == 0)
continue;		continue;

// Check for a well-behaved physical register.		// Check for a well-behaved physical register.
if (TargetRegisterInfo::isPhysicalRegister(Reg)) {		if (TargetRegisterInfo::isPhysicalRegister(Reg)) {
if (MO.isUse()) {		if (MO.isUse()) {
// If the physreg has no defs anywhere, it's just an ambient register		// If the physreg has no defs anywhere, it's just an ambient register
// and we can freely move its uses. Alternatively, if it's allocatable,		// and we can freely move its uses. Alternatively, if it's allocatable,
// it could get allocated to something with a def during allocation.		// it could get allocated to something with a def during allocation.
if (!MRI.isConstantPhysReg(Reg))		if (!MRI.isConstantPhysReg(Reg))
return false;		return Rematerializability::NO;
} else {		} else {
// A physreg def. We can't remat it.		// A physreg def. If the register is dead, we can still rematerialize.
return false;		// This will be checked in LiveRangeEdit::canRematerializeAt.
		AdditionalDefs = true;
}		}
continue;		continue;
}		}

// Only allow one virtual-register def. There may be multiple defs of the		// Only allow one virtual-register def. There may be multiple defs of the
// same virtual register, though.		// same virtual register, though.
if (MO.isDef() && Reg != DefReg)		if (MO.isDef() && Reg != DefReg)
return false;		return Rematerializability::NO;

// Don't allow any virtual-register uses. Rematting an instruction with		// Don't allow any virtual-register uses. Rematting an instruction with
// virtual register uses would length the live ranges of the uses, which		// virtual register uses would length the live ranges of the uses, which
// is not necessarily a good idea, certainly not "trivial".		// is not necessarily a good idea, certainly not "trivial".
if (MO.isUse())		if (MO.isUse())
return false;		return Rematerializability::NO;
}		}

// Everything checked out.		// Everything checked out.
return true;		return AdditionalDefs ? Rematerializability::YES_BUT_EXTRA_PHYSREG_DEFS
		: Rematerializability::YES;
}		}

int TargetInstrInfo::getSPAdjust(const MachineInstr &MI) const {		int TargetInstrInfo::getSPAdjust(const MachineInstr &MI) const {
const MachineFunction *MF = MI.getMF();		const MachineFunction *MF = MI.getMF();
const TargetFrameLowering *TFI = MF->getSubtarget().getFrameLowering();		const TargetFrameLowering *TFI = MF->getSubtarget().getFrameLowering();
bool StackGrowsDown =		bool StackGrowsDown =
TFI->getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown;		TFI->getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown;

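For readers following the TargetInstrInfo.cpp change, the tri-state result can be sketched outside of LLVM. This is a toy model, not the patch itself: `ToyOperand`, `ToyInstr`, `makeToyMovImm`, `classifyToy` and `canRematerializeAtToy` are hypothetical names made up for illustration, and only the three `Rematerializability` values are taken from the diff. The sketch assumes that the client-side check (the diff comment points at LiveRangeEdit::canRematerializeAt) boils down to "every extra physreg def must be dead at the insertion point".

```cpp
#include <cassert>
#include <set>
#include <vector>

// Enum values as they appear in the diff; everything else here is a toy.
enum class Rematerializability { NO, YES, YES_BUT_EXTRA_PHYSREG_DEFS };

// Hypothetical stand-in for a register operand (not llvm::MachineOperand).
struct ToyOperand {
  unsigned Reg;           // 0 means "no register"
  bool IsPhysical;        // physical vs. virtual register
  bool IsDef;             // def vs. use
  bool IsConstantPhysReg; // only meaningful for physical uses
};

// Hypothetical stand-in for an instruction; operand 0 is the main def.
struct ToyInstr {
  std::vector<ToyOperand> Operands;
};

// Models something like "%VReg, dead $cpsr = tMOVi8 <imm>".
ToyInstr makeToyMovImm(unsigned VReg, unsigned FlagsReg) {
  ToyInstr MI;
  MI.Operands.push_back({VReg, false, true, false});
  MI.Operands.push_back({FlagsReg, true, true, false});
  return MI;
}

// Mirrors the operand loop of the patched generic check.
Rematerializability classifyToy(const ToyInstr &MI) {
  if (MI.Operands.empty() || MI.Operands[0].Reg == 0)
    return Rematerializability::NO;
  unsigned DefReg = MI.Operands[0].Reg;
  bool AdditionalDefs = false;
  for (const ToyOperand &MO : MI.Operands) {
    if (MO.Reg == 0)
      continue;
    if (MO.IsPhysical) {
      if (!MO.IsDef) {
        // A physreg use is only fine if the register never changes.
        if (!MO.IsConstantPhysReg)
          return Rematerializability::NO;
      } else {
        // A physreg def is no longer an immediate rejection.
        AdditionalDefs = true;
      }
      continue;
    }
    // Virtual registers: a single def, and no uses at all.
    if (MO.IsDef && MO.Reg != DefReg)
      return Rematerializability::NO;
    if (!MO.IsDef)
      return Rematerializability::NO;
  }
  return AdditionalDefs ? Rematerializability::YES_BUT_EXTRA_PHYSREG_DEFS
                        : Rematerializability::YES;
}

// Assumed client-side rule: extra physreg defs must not clobber a live value.
bool canRematerializeAtToy(const ToyInstr &MI,
                           const std::set<unsigned> &LivePhysRegs) {
  switch (classifyToy(MI)) {
  case Rematerializability::NO:
    return false;
  case Rematerializability::YES:
    return true;
  case Rematerializability::YES_BUT_EXTRA_PHYSREG_DEFS:
    for (const ToyOperand &MO : MI.Operands)
      if (MO.IsPhysical && MO.IsDef && LivePhysRegs.count(MO.Reg))
        return false;
    return true;
  }
  return false;
}
```

In this model a flag-setting move immediate is classified as YES_BUT_EXTRA_PHYSREG_DEFS, and whether it can actually be rematerialized depends on whether the flags register is live at the new location.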

lib/Target/ARM/ARMInstrThumb.td

	}			}

	// LSL register			// LSL register
	def tLSLrr : // A8.6.89			def tLSLrr : // A8.6.89
	T1sItDPEncode<0b0010, (outs tGPR:$Rdn), (ins tGPR:$Rn, tGPR:$Rm),			T1sItDPEncode<0b0010, (outs tGPR:$Rdn), (ins tGPR:$Rn, tGPR:$Rm),
	IIC_iMOVsr,			IIC_iMOVsr,
	"lsl", "\t$Rdn, $Rm",			"lsl", "\t$Rdn, $Rm",
	[(set tGPR:$Rdn, (shl tGPR:$Rn, tGPR:$Rm))]>, Sched<[WriteALU]>;			[(set tGPR:$Rdn, (shl tGPR:$Rn, tGPR:$Rm))]>, Sched<[WriteALU]>;

				samparker: Can this also be used for the movs register variant?
				philipginsbach: The infrastructure that the patch sets up could be used for much more generic instructions, yes! Potentially, add, sub, etc. could be rematerialized the same way as well. Unfortunately it's pointless though. The problem is that the used registers would have to be preserved by chance from the first invocation to the second. The register allocator is not incentivized to make that happen, so it's extremely unlikely to occur often, especially since the whole thing becomes relevant only in a high register pressure situation.
	// LSR immediate			// LSR immediate
	def tLSRri : // A8.6.90			def tLSRri : // A8.6.90
	T1sIGenEncodeImm<{0,0,1,?,?}, (outs tGPR:$Rd), (ins tGPR:$Rm, imm_sr:$imm5),			T1sIGenEncodeImm<{0,0,1,?,?}, (outs tGPR:$Rd), (ins tGPR:$Rm, imm_sr:$imm5),
	IIC_iMOVsi,			IIC_iMOVsi,
	"lsr", "\t$Rd, $Rm, $imm5",			"lsr", "\t$Rd, $Rm, $imm5",
	[(set tGPR:$Rd, (srl tGPR:$Rm, (i32 imm_sr:$imm5)))]>,			[(set tGPR:$Rd, (srl tGPR:$Rm, (i32 imm_sr:$imm5)))]>,
	Sched<[WriteALU]> {			Sched<[WriteALU]> {
	bits<5> imm5;			bits<5> imm5;
	let Inst{10-6} = imm5;			let Inst{10-6} = imm5;
	}			}

	// LSR register			// LSR register
	def tLSRrr : // A8.6.91			def tLSRrr : // A8.6.91
	T1sItDPEncode<0b0011, (outs tGPR:$Rdn), (ins tGPR:$Rn, tGPR:$Rm),			T1sItDPEncode<0b0011, (outs tGPR:$Rdn), (ins tGPR:$Rn, tGPR:$Rm),
	IIC_iMOVsr,			IIC_iMOVsr,
	"lsr", "\t$Rdn, $Rm",			"lsr", "\t$Rdn, $Rm",
	[(set tGPR:$Rdn, (srl tGPR:$Rn, tGPR:$Rm))]>, Sched<[WriteALU]>;			[(set tGPR:$Rdn, (srl tGPR:$Rn, tGPR:$Rm))]>, Sched<[WriteALU]>;

	// Move register			// Move register
	let isMoveImm = 1 in			let isMoveImm = 1, isReMaterializable = 1 in
	def tMOVi8 : T1sI<(outs tGPR:$Rd), (ins imm0_255:$imm8), IIC_iMOVi,			def tMOVi8 : T1sI<(outs tGPR:$Rd), (ins imm0_255:$imm8), IIC_iMOVi,
	"mov", "\t$Rd, $imm8",			"mov", "\t$Rd, $imm8",
	[(set tGPR:$Rd, imm0_255:$imm8)]>,			[(set tGPR:$Rd, imm0_255:$imm8)]>,
	T1General<{1,0,0,?,?}>, Sched<[WriteALU]> {			T1General<{1,0,0,?,?}>, Sched<[WriteALU]> {
	// A8.6.96			// A8.6.96
	bits<3> Rd;			bits<3> Rd;
	bits<8> imm8;			bits<8> imm8;
	let Inst{10-8} = Rd;			let Inst{10-8} = Rd;

test/CodeGen/Thumb/movi8remat.mir

This file was added.

				# RUN: llc -start-before greedy %s -o - | FileCheck %s
				--- |
				; ModuleID = 'movi8remat.ll'
				source_filename = "movi8remat_test.ll"
				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "thumbv6m-apple--eabi"

				declare void @consume_value(i32)

				declare i32 @get_value(...)

				declare void @consume_five_values(i32, i32, i32, i32, i32)

				define void @this_spills_the_immediate_constant() {
				tail call void @consume_value(i32 42)
				%1 = tail call i32 (...) @get_value()
				%2 = tail call i32 (...) @get_value()
				%3 = tail call i32 (...) @get_value()
				%4 = tail call i32 (...) @get_value()
				%5 = tail call i32 (...) @get_value()
				tail call void @consume_value(i32 42)
				tail call void @consume_five_values(i32 %1, i32 %2, i32 %3, i32 %4, i32 %5)
				ret void
				}

				; Function Attrs: nounwind
				declare void @llvm.stackprotector(i8, i8*) #0

				attributes #0 = { nounwind }

				...
				---
				name: this_spills_the_immediate_constant
				alignment: 1
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				registers:
				- { id: 0, class: tgpr, preferred-register: '' }
				- { id: 1, class: tgpr, preferred-register: '' }
				- { id: 2, class: tgpr, preferred-register: '' }
				- { id: 3, class: tgpr, preferred-register: '' }
				- { id: 4, class: tgpr, preferred-register: '' }
				- { id: 5, class: tgpr, preferred-register: '' }
				- { id: 6, class: tgpr, preferred-register: '' }
				liveins:
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: true
				hasCalls: true
				stackProtector: ''
				maxCallFrameSize: 4
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack:
				stack:
				constants:
				body: |
				bb.0 (%ir-block.0):
				ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				%0:tgpr, dead $cpsr = tMOVi8 42, 14, $noreg
				$r0 = COPY %0
				tBL 14, $noreg, @consume_value, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $r0, implicit-def $sp
				ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				tBL 14, $noreg, @get_value, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp, implicit-def $r0
				ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				%1:tgpr = COPY killed $r0
				ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				tBL 14, $noreg, @get_value, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp, implicit-def $r0
				ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				%2:tgpr = COPY killed $r0
				ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				tBL 14, $noreg, @get_value, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp, implicit-def $r0
				ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				%3:tgpr = COPY killed $r0
				ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				tBL 14, $noreg, @get_value, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp, implicit-def $r0
				ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				%4:tgpr = COPY killed $r0
				ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				tBL 14, $noreg, @get_value, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp, implicit-def $r0
				ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				%5:tgpr = COPY killed $r0
				ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				$r0 = COPY %0
				tBL 14, $noreg, @consume_value, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $r0, implicit-def $sp
				ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				ADJCALLSTACKDOWN 4, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				%6:tgpr = COPY $sp
				tSTRi %5, %6, 0, 14, $noreg :: (store 4 into stack)
				$r0 = COPY %1
				$r1 = COPY %2
				$r2 = COPY %3
				$r3 = COPY %4
				tBL 14, $noreg, @consume_five_values, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $r0, implicit killed $r1, implicit killed $r2, implicit killed $r3, implicit-def $sp
				ADJCALLSTACKUP 4, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
				tBX_RET 14, $noreg

				...

				# CHECK: movs r0, #42
				# CHECK: movs r0, #42

Revision Contents

Diff 172970

include/llvm/CodeGen/TargetInstrInfo.h

lib/CodeGen/CalcSpillWeights.cpp

lib/CodeGen/LiveRangeEdit.cpp

lib/CodeGen/MachineLICM.cpp

lib/CodeGen/TargetInstrInfo.cpp

lib/Target/ARM/ARMInstrThumb.td

test/CodeGen/Thumb/movi8remat.mir
