Page MenuHomePhabricator

Allow rematerialization of ARM Thumb MOVi8 instruction in some contexts
Needs ReviewPublic

Authored by philipginsbach on Jun 6 2017, 5:13 AM.

Details

Summary

Constants are crucial for code size in the ARM Thumb-1 instruction set.
The 16 bit instruction size often does not offer enough space for immediate arguments.
This means that additional instructions are frequently used to load constants into registers.
Since constants are hoisted, this can lead to significant register spillage if they are used multiple times in a single function.
This can be avoided by rematerialization, i.e. recomputing a constant instead of reloading it from the stack.
This patch extends the rematerialization functionality of llvm such that it is able to rematerialize ARM Thumb MOVSi8 instruction despite it setting the CPSR flags.

Diff Detail

Event Timeline

philipginsbach created this revision.Jun 6 2017, 5:13 AM
samparker edited edge metadata.Jul 10 2017, 5:19 AM

Hi Philip,

This needs some tests.

cheers,
sam

samparker added inline comments.Jul 14 2017, 2:46 AM
lib/CodeGen/LiveRangeEdit.cpp
162

For speed, I think you can iterate over just the defs instead of all the operands, they'll be fewer and it will negate the need to do the Reg and Def checks.

177

Allocatable isn't about liveliness, it's just whether the register can be used by the allocator generally, I'm guessing that this works here because it can't allocate the CPSR.

181

Can you not just query IsDead on the operand instead?

philipginsbach added inline comments.Jul 14 2017, 3:01 AM
lib/CodeGen/LiveRangeEdit.cpp
162

Agreed, I will fix this.

177

Yes, I'm aware of that.
I discussed that part with James Molloy in quite some depth.
From what I understand (and as I say in the comment there might be gaps), I can't rely on liveliness computation here, because further iterations of the register allocator might change the liveliness.
Obviously this problem goes away if a register can't be used freely by the register allocator.

Long story short: I'm quite certain the condition I check here is sufficient but it is probably way too strong.

181

I tried several different options here.
If I recall correctly, what you propose only works for virtual registers.

samparker added inline comments.Jul 14 2017, 3:05 AM
lib/Target/ARM/ARMInstrThumb.td
1136

Can this also be used for the movs register variant?

philipginsbach added inline comments.Jul 14 2017, 3:06 AM
lib/CodeGen/LiveRangeEdit.cpp
177

The important thing is that the patch is targeted at CPSR, as you point out, which is not allocatable.

samparker added inline comments.Jul 14 2017, 3:11 AM
lib/CodeGen/LiveRangeEdit.cpp
177

Ok, definitely better to be conservative then.

philipginsbach added inline comments.Jul 14 2017, 3:14 AM
lib/Target/ARM/ARMInstrThumb.td
1136

The infrastructure that the patch sets up could be used for much more generic instructions, yes!
Potentially, add, sub etc could be rematerialized the same way as well.

Unfortunately it's pointless though.

The problem is that the used registers would have to be preserved by chance from the first invocation to the second.
The register allocator is not incentivized to make that happen, so I it's extremely unlikely to occur often, especially since the whole thing becomes relevant only in a high register pressure situation.

philipginsbach marked 2 inline comments as done.Jul 14 2017, 5:52 AM
qcolombet requested changes to this revision.Jul 14 2017, 2:31 PM

Hi Philip,

Thanks for looking into extending the rematerialization support.

I believe the interference checking is too weak to actually work in all cases (see inline comments). More over, I think we cannot say this is still trivially rematerializable given now the callers of that method need to do interference checks. Thus I would suggest to use a new/different hook. In particular, I am not convinced the register coalescer will do the right thing if it rematerializes an instruction with more than one def.

Cheers,
-Quentin

include/llvm/Target/TargetInstrInfo.h
111

No else after return per LLVM coding standard.

lib/CodeGen/LiveRangeEdit.cpp
162

This is wrong. This misses all the implicit-def.

>

/// Returns a range over all explicit operands that are register definitions.
/// Implicit definition are not included!
166

That doesn't seem right either. Indeed where is the check for virtregs?

E.g., consider
vreg0,vreg1 = loadimm
...
vreg1 = something else
/// <=== if you remat here you're going to break the next use of vreg1

vreg0

vreg1

test/CodeGen/Thumb/movi8remat.ll
1

Please use a .mir test.
You can generate this with -stop-before greedy -simplify-mir

This revision now requires changes to proceed.Jul 14 2017, 2:31 PM

Hi Quentin,

Thanks a lot for the comments, that's really valuable feedback.
I will take some time to dig through those issues and get back to you.

Best,
Philip

Hi Quentin,

Let me follow up on your comments.

I renamed the member functions to isPotentiallyTriviallyReMaterializable and isReallyPotentiallyTriviallyReMaterializableGeneric.
This is somewhat grotesque but does bring across the functionality more clearly I think. Obviously I am open to more aesthetic suggestions.

The problem with implicit defs should be removed now and was not present in a previous version of the patch.
I didn't pay attention and was mislead by misleading documentation for MachineInstr::getNumOperands.

I think your concern for checking virtual register defs stems from isPotentiallyTriviallyReMaterializable not being very clear in it's purpose.
I hope the change to the return enum value improves this. Do you agree that this is sufficient or am I overlooking something crucial here?

Best,
Philip

include/llvm/Target/TargetInstrInfo.h
111

I will fix this.

lib/CodeGen/LiveRangeEdit.cpp
162

I was too eager to change this, I think the previous version should have been correct.

There appears to be a mistake in the documentation for MachineInstr::getNumOperands, it says "Access to explicit operands of the instruction."
However it includes the implicit operands.

166

I did not weaken the check for additional virtual registers in TargetInstrInfo::isReallyTriviallyReMaterializableGeneric.
What you describe would still be filtered out by that function (i.e. return Rematerializability::NO).
The only way in that it is more lenient than before is that is allows additional explicit physical register defs.

Arguably the behavior that you imply (allowing additional virtreg defs) would be useful and more obvious but at the core this patch is concerned with the movi8 instruction and the current change is enough to rematerialize that.

I will change the enum item YES_BUT_EXTRA_DEFS to YES_BUT_EXTRA_PHYSREG_DEFS to make the behavior of the function more obvious.

test/CodeGen/Thumb/movi8remat.ll
1

Ok, I will change it to a .mir test, thanks for providing the options.

philipginsbach edited edge metadata.
chill added a subscriber: chill.Jul 25 2018, 6:02 AM

I have rebased onto the master branch at 244c796c894f50fb53f9dbe7627702661dfe69c2.

Quentin, you gave many useful suggestions before, what do you think is the way forward with this?

philipginsbach edited reviewers, added: MatzeB; removed: javed.absar.Oct 9 2018, 10:09 AM

I ran some benchmarks for thumb1, they look great.

zzheng added inline comments.Oct 10 2018, 4:09 PM
lib/CodeGen/LiveRangeEdit.cpp
165

shouldn't we use auto type and range-based loop now?

Uses range based loop now.

philipginsbach marked an inline comment as done.Oct 12 2018, 5:58 AM
philipginsbach added inline comments.
lib/CodeGen/LiveRangeEdit.cpp
165

Thanks, I fixed it.

philipginsbach marked an inline comment as done.

I think I have addressed all your previous concerns, @qcolombet, is there anything else you'd like to see clarified?
Sam Parker suggested I add you as a reviewer, @MatzeB, do you have any suggestions?

Sorry to be pushy, this patch has been lying around for a while and it would be quite satisfying to get it through the process.

Hi,

Sorry for the delay, I haven’t had time to review the patch but I have one high level comment.
I am not sure the enum approach is desirable as it basically hard code the kind of constraints we can report and will grow very quickly if we want to extend it.
For instance, let say that on top of physreg defs we want to report virtual reg defs, now we would need “yes”, “no”, “yes_phys”, “yes_virt”, “yes_phys_virt” and the list grows with the cross product of everything we may want/need to report.

Anyhow, I’ll look closer at the patch to give more concrete feedbacks.

Cheers,
Q

I am not sure the enum approach is desirable as it basically hard code the kind of constraints we can report and will grow very quickly if we want to extend it.
For instance, let say that on top of physreg defs we want to report virtual reg defs, now we would need “yes”, “no”, “yes_phys”, “yes_virt”, “yes_phys_virt” and the list grows with the cross product of everything we may want/need to report.

Hi Quentin,

Thanks a lot for looking at this again.

I understand your concern, the easiest way of tackling it would probably be some kind of bitfield with flags for the different kind of (potential) extensions of the unrestricted rematerializability property.

However, I would vote for that to be introduced when the problem actually arises. It isn't so obvious to me that many more generalizations of rematerializability will be necessary and the path from enum to bitfield is easy.

What do you think?

Cheers,
Philip

However, I would vote for that to be introduced when the problem actually arises. It isn't so obvious to me that many more generalizations of rematerializability will be necessary and the path from enum to bitfield is easy.

What do you think?

I agree, moving with incremental steps is the right thing to do.
The question is more does this step go in the direction we want? (I am not saying it doesn't, I just haven't looked closer enough to have an opinion :))

Now, I just wanted to point out that there are more opportunities to generalize the rematerialization. The obvious one to me is rematerializing everything (like pulling chain of computations, full instructions (with both definitions and arguments)) and in that sense, only the instruction or chain of instructions carry the right level of information. Right now we often introduce pseudo instruction to work around this limitation and that's what I would like to solve.

Anyway, I'll take a closer look at the patch.

Now, I just wanted to point out that there are more opportunities to generalize the rematerialization. The obvious one to me is rematerializing everything (like pulling chain of computations, full instructions (with both definitions and arguments)) and in that sense, only the instruction or chain of instructions carry the right level of information. Right now we often introduce pseudo instruction to work around this limitation and that's what I would like to solve.

It's true, rematerializing a whole chain of instructions would require a more thorough approach.
I think it's important to be clear about motivations beforehand though: this patch was done to optimize code size for ARM Thumb. Since spilling and reloading takes only two instructions, rematerializing even chains of two instructions is pointless from that perspective. Obviously, there are other metrics to consider, but I think it would be important nonetheless to establish what exactly is to gain.

john.brawn resigned from this revision.May 12 2020, 6:46 AM