This is an archive of the discontinued LLVM Phabricator instance.

CodeGen, AArch64, ARM, X86: Simplify SplitCSR
AbandonedPublic

Authored by MatzeB on Apr 11 2016, 9:39 PM.

Details

Summary

Simplify SplitCSR code:

  • Move generic legality checking code from SelectionDAG::runOnMachineFunction() to TargetLoweringBase::mayUseSplitCSR().
  • Avoid TargetLoweringInfo::supportSplitCSR() and initializeSplitCSR() callbacks simply by performing the lowering as part of TargetLowering::LowerFormals()
  • Avoid TargetLoweringInfo::insertCopiesSplitCSR() callback by using MachingFunction::addLiveIn() to get copies to vregs at the entry block and by creating CopyFromReg/CopyToReg nodes when lowering the return in LowerReturn. Provide a convenience function addCalleeSaveRegOps() for this.

This will be used to handle parameters passed in callee saves with the SplitCSR logic.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB updated this revision to Diff 53364.Apr 11 2016, 9:39 PM
MatzeB retitled this revision from to CodeGen, AArch64, ARM, X86: Simplify SplitCSR.
MatzeB updated this object.
MatzeB added reviewers: qcolombet, manmanren, hfinkel.
MatzeB set the repository for this revision to rL LLVM.
MatzeB added a subscriber: llvm-commits.
manmanren edited edge metadata.Apr 12 2016, 12:21 PM

Some general comments:
Is this NFC (no functionality change)? I noticed a small testing case change.

I think the original code tries to make it easy for a target to start supporting splitCSR, or for a calling convention to start using splitCSR.
With this updated approach, is it still easy to do so? I didn't really look through the patch to figure out how :]

Cheers,
Manman

include/llvm/Target/TargetLowering.h
2670

typo: kepy

lib/Target/AArch64/AArch64ISelLowering.cpp
2678

I don't quite like this getting duplicated in every target that supports SplitCSR and in the already heavy LowerFormalArguments. Can we wrap this up somehow?

If we want to support another Calling convention, we will have more conditions here.

Some general comments:
Is this NFC (no functionality change)? I noticed a small testing case change.

It changes the order in which the physregs are copied to vregs in the entry block. We are unlucky in the testcase and the register allocator chooses registers in a way that fewer stmia instructions are formed, the code gets slightly bigger and we need a wide jump.
I can make this true NFC but that will mean changing the code to add some registers in reverse order without any good explanation of why...

I think the original code tries to make it easy for a target to start supporting splitCSR, or for a calling convention to start using splitCSR.
With this updated approach, is it still easy to do so? I didn't really look through the patch to figure out how :]

The main motivation of the change is that the decision on whether to use SplitCSR had to be done before any other SelectionDAG code, this meant that there was no CCState object available and you cannot create one because ISD::InputArgs is not available. So you had to make that decisions purely on the IR function. The commit depending on this wants to use SplitCSR for all cases when a parameter is passed in a callee save register. Without CCState available you can only match on calling convention IDs and you would need to hardcode knowledge about how registers will be chosen later based on that, doing it later you can just take the information that is already computed.

To use SplitCSR with this change you basically: Decide whether you need it in LowerFormals() and if so call addLiveIn() for every register handled this way, you also alter LowerReturn to copy the register back. I found this easier to understand as that closely follows the logic that you copy the register to a vreg in the entry block and copy it back in the return block. LowerFormals() and LowerReturn() should already be familiar to a person implementing the target. The problem I had with a series of target callback is that to understand the program flow you need to read the generic SelectionDAG code to find out when those callbacks are called. The fact that the new code is 169 lines less is also
an indication that it is simpler.

Some general comments:

It changes the order in which the physregs are copied to vregs in the entry block.

Can you explain a little more on how the order is changed?

We are unlucky in the testcase and the register allocator chooses registers in a way that fewer stmia instructions are formed, the code gets slightly bigger and we need a wide jump.
I can make this true NFC but that will mean changing the code to add some registers in reverse order without any good explanation of why...

There is no need to match exactly ... But I would like to understand why, since this change may break TLS.

I think the original code tries to make it easy for a target to start supporting splitCSR, or for a calling convention to start using splitCSR.
With this updated approach, is it still easy to do so? I didn't really look through the patch to figure out how :]

The main motivation of the change is that the decision on whether to use SplitCSR had to be done before any other SelectionDAG code, this meant that there was no CCState object available and you cannot create one because ISD::InputArgs is not available. So you had to make that decisions purely on the IR function. The commit depending on this wants to use SplitCSR for all cases when a parameter is passed in a callee save register.

Makes sense, I was thinking about just handling the swiftself attribute, but I guess we can generalize it to support using CSR to pass parameters. But please make it clear that using a CSR to pass parameter means that you are opting into SplitCSR and also note that it is possible for PEI to use a CSR so CSRs that are explicitly handled should not be used by PEI.

Without CCState available you can only match on calling convention IDs and you would need to hardcode knowledge about how registers will be chosen later based on that, doing it later you can just take the information that is already computed.

To use SplitCSR with this change you basically: Decide whether you need it in LowerFormals() and if so call addLiveIn() for every register handled this way, you also alter LowerReturn to copy the register back. I found this easier to understand as that closely follows the logic that you copy the register to a vreg in the entry block and copy it back in the return block. LowerFormals() and LowerReturn() should already be familiar to a person implementing the target. The problem I had with a series of target callback is that to understand the program flow you need to read the generic SelectionDAG code to find out when those callbacks are called. The fact that the new code is 169 lines less is also
an indication that it is simpler.

Fewer lines do not necessarily mean simpler logic :] There are other ways to de-duplicate the implementations of insertCopiesSplitCSR etc.

After this patch, the only thing that mentions SplitCSR is the function mayUseSplitCSR in TargetLowering. Please add more comments there about SplitCSR.

Cheers,
Manman

include/llvm/Target/TargetLowering.h
1223

This is the only place mentioning SplitCSR, please add more comments here.

2670

I found this comment hard to read. Can you add more context on when this will be called? Instead of addCalleeSaveRegOps, how about copyCSRSFromVregsToPhys or something more concrete?

MatzeB abandoned this revision.Sep 10 2021, 10:11 AM