Allow rematerialization of ARM Thumb MOVi8 instruction in some contexts
Needs ReviewPublic

Authored by philipginsbach on Jun 6 2017, 5:13 AM.

Details

Summary

Constants are crucial for code size in the ARM Thumb-1 instruction set.
The 16 bit instruction size often does not offer enough space for immediate arguments.
This means that additional instructions are frequently used to load constants into registers.
Since constants are hoisted, this can lead to significant register spillage if they are used multiple times in a single function.
This can be avoided by rematerialization, i.e. recomputing a constant instead of reloading it from the stack.
This patch extends the rematerialization functionality of llvm such that it is able to rematerialize ARM Thumb MOVSi8 instruction despite it setting the CPSR flags.

Diff Detail

philipginsbach created this revision.Jun 6 2017, 5:13 AM

Hi Philip,

This needs some tests.

cheers,
sam

samparker added inline comments.Fri, Jul 14, 2:46 AM
lib/CodeGen/LiveRangeEdit.cpp
162

For speed, I think you can iterate over just the defs instead of all the operands, they'll be fewer and it will negate the need to do the Reg and Def checks.

177

Allocatable isn't about liveliness, it's just whether the register can be used by the allocator generally, I'm guessing that this works here because it can't allocate the CPSR.

181

Can you not just query IsDead on the operand instead?

philipginsbach added inline comments.Fri, Jul 14, 3:01 AM
lib/CodeGen/LiveRangeEdit.cpp
162

Agreed, I will fix this.

177

Yes, I'm aware of that.
I discussed that part with James Molloy in quite some depth.
From what I understand (and as I say in the comment there might be gaps), I can't rely on liveliness computation here, because further iterations of the register allocator might change the liveliness.
Obviously this problem goes away if a register can't be used freely by the register allocator.

Long story short: I'm quite certain the condition I check here is sufficient but it is probably way too strong.

181

I tried several different options here.
If I recall correctly, what you propose only works for virtual registers.

samparker added inline comments.Fri, Jul 14, 3:05 AM
lib/Target/ARM/ARMInstrThumb.td
1136

Can this also be used for the movs register variant?

philipginsbach added inline comments.Fri, Jul 14, 3:06 AM
lib/CodeGen/LiveRangeEdit.cpp
177

The important thing is that the patch is targeted at CPSR, as you point out, which is not allocatable.

samparker added inline comments.Fri, Jul 14, 3:11 AM
lib/CodeGen/LiveRangeEdit.cpp
177

Ok, definitely better to be conservative then.

philipginsbach added inline comments.Fri, Jul 14, 3:14 AM
lib/Target/ARM/ARMInstrThumb.td
1136

The infrastructure that the patch sets up could be used for much more generic instructions, yes!
Potentially, add, sub etc could be rematerialized the same way as well.

Unfortunately it's pointless though.

The problem is that the used registers would have to be preserved by chance from the first invocation to the second.
The register allocator is not incentivized to make that happen, so I it's extremely unlikely to occur often, especially since the whole thing becomes relevant only in a high register pressure situation.

philipginsbach marked 2 inline comments as done.Fri, Jul 14, 5:52 AM
qcolombet requested changes to this revision.Fri, Jul 14, 2:31 PM

Hi Philip,

Thanks for looking into extending the rematerialization support.

I believe the interference checking is too weak to actually work in all cases (see inline comments). More over, I think we cannot say this is still trivially rematerializable given now the callers of that method need to do interference checks. Thus I would suggest to use a new/different hook. In particular, I am not convinced the register coalescer will do the right thing if it rematerializes an instruction with more than one def.

Cheers,
-Quentin

include/llvm/Target/TargetInstrInfo.h
113

No else after return per LLVM coding standard.

lib/CodeGen/LiveRangeEdit.cpp
162

This is wrong. This misses all the implicit-def.

>

/// Returns a range over all explicit operands that are register definitions.
/// Implicit definition are not included!
166

That doesn't seem right either. Indeed where is the check for virtregs?

E.g., consider
vreg0,vreg1 = loadimm
...
vreg1 = something else
/// <=== if you remat here you're going to break the next use of vreg1

vreg0

vreg1

test/CodeGen/Thumb/movi8remat.ll
1 ↗(On Diff #106606)

Please use a .mir test.
You can generate this with -stop-before greedy -simplify-mir

This revision now requires changes to proceed.Fri, Jul 14, 2:31 PM

Hi Quentin,

Thanks a lot for the comments, that's really valuable feedback.
I will take some time to dig through those issues and get back to you.

Best,
Philip

Hi Quentin,

Let me follow up on your comments.

I renamed the member functions to isPotentiallyTriviallyReMaterializable and isReallyPotentiallyTriviallyReMaterializableGeneric.
This is somewhat grotesque but does bring across the functionality more clearly I think. Obviously I am open to more aesthetic suggestions.

The problem with implicit defs should be removed now and was not present in a previous version of the patch.
I didn't pay attention and was mislead by misleading documentation for MachineInstr::getNumOperands.

I think your concern for checking virtual register defs stems from isPotentiallyTriviallyReMaterializable not being very clear in it's purpose.
I hope the change to the return enum value improves this. Do you agree that this is sufficient or am I overlooking something crucial here?

Best,
Philip

include/llvm/Target/TargetInstrInfo.h
113

I will fix this.

lib/CodeGen/LiveRangeEdit.cpp
162

I was too eager to change this, I think the previous version should have been correct.

There appears to be a mistake in the documentation for MachineInstr::getNumOperands, it says "Access to explicit operands of the instruction."
However it includes the implicit operands.

166

I did not weaken the check for additional virtual registers in TargetInstrInfo::isReallyTriviallyReMaterializableGeneric.
What you describe would still be filtered out by that function (i.e. return Rematerializability::NO).
The only way in that it is more lenient than before is that is allows additional explicit physical register defs.

Arguably the behavior that you imply (allowing additional virtreg defs) would be useful and more obvious but at the core this patch is concerned with the movi8 instruction and the current change is enough to rematerialize that.

I will change the enum item YES_BUT_EXTRA_DEFS to YES_BUT_EXTRA_PHYSREG_DEFS to make the behavior of the function more obvious.

test/CodeGen/Thumb/movi8remat.ll
1 ↗(On Diff #106606)

Ok, I will change it to a .mir test, thanks for providing the options.