This is an archive of the discontinued LLVM Phabricator instance.

[ARM][RegAlloc] Add t2LoopEndDec
ClosedPublic

Authored by dmgreen on Nov 12 2020, 8:04 AM.

Details

Summary

We currently have problems with the way that low overhead loops are specified, with LR sometimes being spilled between the t2LoopDec and the t2LoopEnd forcing the entire loop to be reverted late in the backend. As they will eventually become a single instruction, this patch introduces a t2LoopEndDec which is the combination of the two, combined before registry allocation to make sure this does not fail.

Unfortunately this instruction is a terminator that produces a value (and also branches - it only produces the value around the branching edge). So this needs some adjustment to phi elimination and the register allocator to make sure that we do not spill this LR def around the loop (needing to put a spill after the terminator). We treat the loop very carefully, making sure that there is nothing else like calls that would break it's ability to use LR. For that, this adds a isUnspillableTerminator to opt in the new behaviour.

This might obviously be a little contentious, so I would like to get opinions from people who know what they are talking about. There is a chance that this could cause problems, and so I have added an escape option incase. But I have not seen any problems in the testing that I've tried, and not reverting Low overhead loops is important for our performance. If this does work then we can hopefully do the same for t2WhileLoopStart and t2DoLoopStart instructions.

This patch also contains the code needed to convert or revert the t2LoopEndDec in the backend (which just needs a subs; bne) and the code pre-ra to create them.

Diff Detail

Event Timeline

dmgreen created this revision.Nov 12 2020, 8:04 AM

This might obviously be a little contentious, so I would like to get opinions from people who know what they are talking about.

The ARM changes look very reasonable to me. To be a bit more precise, yes, this is definitely something we want, so agreed on the motivation and the (ARM) implementation, showing nice codegen changes.

But value producing terminators in regalloc is not my forte, so indeed best if someone approves that part.

Find inline some/mostly nits.

llvm/include/llvm/CodeGen/TargetInstrInfo.h
351

typo: trerminator

llvm/lib/CodeGen/CalcSpillWeights.cpp
227

Nit: guess you don't really need to check isTerminator as that will be checked in isUnspillableTerminator?

llvm/lib/Target/ARM/ARMInstrThumb2.td
5451

bikeshedding names: how about t2LoopDecEnd to more keep the semantics this is a decrement + end, in that order?

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1302–1308

It wasn't immediately clear to me why we need this now.

1365

nit: perhaps an assert that MI is a loop end?

1632

nit: how about RevertLoopEnd(LoLoop.End, RevertLoopDec(LoLoop.Dec)); to get rid of the curly brackets around the if-else here?

llvm/lib/Target/ARM/MVETPAndVPTOptimisationsPass.cpp
39

nit: arm-enable-mergeenddec is a bit difficult to read. Since it is relative long already, how about making it even a bit longer, something like: arm-enable-merge-loopenddec?

40

typo? mergeing?

243–249

nit:

Replace the loop dec into the loop end as a single instruction

->

Replace the loop dec and loop end as a single instruction

or something similar...

dmgreen updated this revision to Diff 307273.Nov 24 2020, 1:12 AM
dmgreen marked 6 inline comments as done.

Thanks for taking a look.

llvm/lib/CodeGen/CalcSpillWeights.cpp
227

I was trying to make it explicit, that it needed to be a terminator that defined the value and this shouldn't be used for anything else.

I can change that if you think it's better to remove it. It is currently just trying to be conservative.

llvm/lib/Target/ARM/ARMInstrThumb2.td
5451

Yeah, It does both together. I have no strong opinion on the order in the name. I guess I chose t2LoopEndDec as I felt that the major thing it did was behave as a loop end (like an LE instruction). The Dec somewhat of a side effect of that.

Can change it if you think it's better the other way. Let me know.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1302–1308

I updated the comment.

llvm/lib/Target/ARM/MVETPAndVPTOptimisationsPass.cpp
39

Thanks. This did feel like a particularly ugly option name.

Adding @mtrofin, @bjope, @arsenm for the unspillable terminator change.

llvm/lib/CodeGen/CalcSpillWeights.cpp
227

Not a big deal, whatever you prefer.

llvm/lib/Target/ARM/ARMInstrThumb2.td
5451

No strong opinion on this, I was bikeshedding anyway.

mtrofin added inline comments.Nov 25 2020, 9:46 AM
llvm/lib/CodeGen/CalcSpillWeights.cpp
227

from a readability perspective, isUnspillableTerminator suggests "isTerminator && isUnspillable".

How about having isUnspillableTerminator not virtual and doing this:

return isTerminator() && isUnspillableTerminatorImpl()

where isUnspillableTerminatorImpl is protected and virtual?

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1375

Nit: addReg(MCRegister::NoRegister) is probably more readable.

bjope added inline comments.Nov 25 2020, 1:05 PM
llvm/lib/CodeGen/PHIElimination.cpp
452

This is only valid if there are no other uses of SrcReg (no anywhere else, and not in the current MBB that we are eliminating PHI nodes inside, such as another PHI node in MBB).

Haven't analyzed it more closely than that. But I don't know if it can be generally guaranteed that the def in an UnspillableTerminator only has a single use. Is such a rule guaranteed here (or even feasible)?

dmgreen updated this revision to Diff 307826.Nov 26 2020, 4:14 AM
dmgreen marked 4 inline comments as done.

Thanks for the comments. Added a isUnspillableTerminatorImpl.

llvm/lib/CodeGen/CalcSpillWeights.cpp
227

Sure, sounds good to me.

llvm/lib/CodeGen/PHIElimination.cpp
452

Yeah. We are being very careful! We are only using these for hardware loop instructions that we control the insertion of and are careful to only do the transform to produce the instruction (t2LoopEndDec in this case) when it has a single remaining user. CheckUsers in MVETPAndVPTOptimisations::MergeLoopEnd does that.

That's why this is very opt-in and it's trying to be careful to say "Use with care".

bjope added inline comments.Nov 27 2020, 6:05 AM
llvm/lib/CodeGen/PHIElimination.cpp
452

My concern was that this is not inside the target specific part of the backend. So if some other target wants to use the concept of "unspillable terminators" then there must be some clear rules what to expect.

I think that having an assert that there is only one use would be appropriate here (to at least catch any violations to such a rule at this point where things would go wrong otherwise).

But maybe the MachineVerifier should help out detect problems if an unspillable terminator def has more than one use as well (to catch such problems already in a pass breaking the rule)?

And maybe we need to document what is expected related to "unspillable terminators" related to this restriction somewhere (not sure exactly where).

I guess there could be lots of generic passes, but also target specific passes, that must adhere to such a rule if it is allowed to insert an unspillable terminator already at ISel. So I still don't see how to ensure that all passes between ISel and PHI-elmination are aware about that they aren't allowed to introduce new uses of the unspillable terminator def? Or how close to PHI-elimination must a pass introducing "unspillable terminators" be?

dmgreen updated this revision to Diff 308320.Nov 30 2020, 4:19 AM

Add a machine verifier check and an extra assert.

llvm/lib/CodeGen/PHIElimination.cpp
452

I agree. I have added a check to the verifier and an assert in here. In our case we are adding these for hardware loop instructions soon before registry allocation, after all the other mir level optimization have happened. There's also some details about the instruction in isUnspillableTerminatorImpl.

bjope added inline comments.Dec 7 2020, 12:24 PM
llvm/lib/CodeGen/MachineVerifier.cpp
1553 ↗(On Diff #309942)

typ: /Checl/Check/

1559 ↗(On Diff #309942)

Use lists in MRI also contain dbg-uses. So I think we want to use use_instr_nodbg here to only count non-dbg-uses.

Also a bit unclear if zero uses should be valid or not (code is currently checking "at most one user", but code comments and the report says "should have a single user"). There is a hasOneDBGUser helper in MRI that could be used if there has to be one user.

dmgreen updated this revision to Diff 310046.Dec 7 2020, 4:07 PM

Thanks. I think hasOneDBGUse would have probably worked in our use case, but checking use_instr_nodbg is a little more general.

bjope added a comment.Dec 9 2020, 7:07 AM

@SjoerdMeijer : I've reviewed the non-target-specific changes now. And that part looks good to me now.

I don't know if the target-specific parts already has been reviewed (or if anything has changed lately). Can someone with ARM/Thumb2 knowledge can acknowledge that part and set "accept revision".

SjoerdMeijer accepted this revision.Dec 9 2020, 7:34 AM

@SjoerdMeijer : I've reviewed the non-target-specific changes now. And that part looks good to me now.

I don't know if the target-specific parts already has been reviewed (or if anything has changed lately). Can someone with ARM/Thumb2 knowledge can acknowledge that part and set "accept revision".

Thanks for helping and reviewing!

I've commented on the ARM specific part, and that has been addressed. With your comments addressed too, this then LGTM.

This revision is now accepted and ready to land.Dec 9 2020, 7:34 AM

Thanks folks.

This revision was automatically updated to reflect the committed changes.