This is an archive of the discontinued LLVM Phabricator instance.

[RFC][CallingConv] Add CCAssignToRegWithType Calling Convention Interface
Needs Review
Public

Authored by shiva0217 on Jan 23 2018, 11:00 PM.

Details

Summary

For a target that has f64 registers but whose calling convention passes f64 values in i32 registers, the legalizer will not split the argument.

The target has to define custom calling-convention functions that assign the f64 value to two locations, each recorded in a CCValAssign structure.

LowerFormalArguments has to generate SDNodes that describe how to fetch the input parameters and store them in the InVals array. Normally, one InputArg is assigned to one location, but in the case above the target has to add custom code to generate SDNodes for two locations. LowerCall needs similar effort to deal with it.
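To illustrate the kind of work the custom code ends up doing: once an f64 arrives in two i32 locations, the halves have to be recombined into one value. The following is a minimal standalone sketch of that recombination, not LLVM code. The helper name combineHalves is hypothetical, and the sketch assumes IEEE-754 doubles with the upper 32 bits delivered in the first location.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of LLVM): rebuild an f64 from the two i32
// values that the calling convention delivered in separate locations.
// Assumption: `hi` carries the upper 32 bits (sign, exponent, top of the
// mantissa) and `lo` carries the lower 32 bits.
double combineHalves(std::uint32_t hi, std::uint32_t lo) {
  std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
  double result;
  static_assert(sizeof(result) == sizeof(bits), "expects a 64-bit double");
  std::memcpy(&result, &bits, sizeof(result)); // bit-for-bit copy, no conversion
  return result;
}
```

In a real backend this step would be expressed as SDNodes rather than plain C++, and the order of the halves depends on the target.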

If we could split the argument into i32 values while analyzing the arguments, we could eliminate this effort and handle the pieces as ordinary i32 arguments.

The idea is to add a CCAssignToRegWithType interface, which could be written as:

CCIfType<[f64], CCAssignToRegWithType<i32, [I0, I1, I2, I3, I4, I5]>>,

The semantics would be: if the first i32 part can be assigned to a register, split the argument into two i32 arguments.

The tablegen result will be:

if (LocVT == MVT::f64) {
  LocVT = MVT::i32;
  static const MCPhysReg RegList2[] = {
    SP::I0, SP::I1, SP::I2, SP::I3, SP::I4, SP::I5
  };
  if (unsigned Reg = State.AllocateReg(RegList2)) {
    State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
    State.setCCSplitType(MVT::i32);
    State.setCCSplit();
    return false;
  }
}

If State.isCCSplit() is true, the CCState::SplitInputArg and CCState::SplitOutputArg functions are called to split the argument.
The patch applies the interface to SPARC32 and passes the codegen test cases in llvm/test/CodeGen/SPARC.
Any suggestions would be helpful.
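The split rule described in this summary can be modelled outside LLVM as follows. This is only a toy sketch under stated assumptions: ToyState, Loc, and assignF64 are hypothetical stand-ins for CCState, CCValAssign, and the generated rule, and stack slots are assumed to be 4 bytes each.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a CCValAssign record: where one i32 piece ended up.
struct Loc {
  bool inReg;           // true: register, false: stack slot
  std::string reg;      // register name when inReg is true
  unsigned stackOffset; // byte offset when inReg is false
};

// Toy stand-in for CCState: hands out registers in order, then stack slots.
class ToyState {
public:
  explicit ToyState(std::vector<std::string> regs) : regs_(std::move(regs)) {}

  bool hasFreeReg() const { return next_ < regs_.size(); }

  // Assign one ordinary i32 argument: next free register, else a stack slot.
  Loc assignI32() {
    if (hasFreeReg())
      return Loc{true, regs_[next_++], 0};
    Loc l{false, "", stackOffset_};
    stackOffset_ += 4; // assumption: 4-byte stack slots
    return l;
  }

  // Models the proposed rule: if the first half can take a register, split
  // the f64 into two i32 pieces and assign each like a normal i32 argument.
  // An empty result means the rule did not match and a later rule would run.
  std::vector<Loc> assignF64() {
    if (!hasFreeReg())
      return {};
    std::vector<Loc> pieces;
    pieces.push_back(assignI32()); // first half: guaranteed a register
    pieces.push_back(assignI32()); // second half: register or stack
    return pieces;
  }

private:
  std::vector<std::string> regs_;
  std::size_t next_ = 0;
  unsigned stackOffset_ = 0;
};
```

With a single free register left, assignF64 in this model places the first half in that register and the second half on the stack, which is the situation the split logic is meant to handle without target-specific code.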

Diff Detail

Repository
rL LLVM

Event Timeline

shiva0217 created this revision. Jan 23 2018, 11:00 PM
chenwj added a subscriber: chenwj. Edited Jan 24 2018, 4:56 AM

I have one minor concern, please see the inline comment.

include/llvm/Target/TargetCallingConv.td
93

If f64 has to be passed in an even/odd register pair (i.e., r0 + r1, but not r1 + r2), can CCAssignToRegWithType handle that case?

shiva0217 added inline comments. Jan 24 2018, 7:33 AM
include/llvm/Target/TargetCallingConv.td
93

Hi Wei-Ren, the patch can't handle that case; supporting even-pair register allocation would be a useful feature. We could add a RegAlign field and write a new function, something like AllocateRegWithAlign(RegList, RegAlign), in the CCState class to handle it. Since this is a prototype implementation, I would first like to make sure the concept is practical and the split logic is robust enough. Once the concept is accepted, I'll try to extend the interface to support this case. Thanks for your input.
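A rough standalone sketch of the AllocateRegWithAlign idea mentioned above, written as a toy model rather than as a CCState method. ToyRegAlloc and its allocate function are hypothetical, registers are identified by their index in the list, and the sketch assumes that registers skipped for alignment are wasted rather than back-filled by later arguments.

```cpp
#include <cstddef>
#include <vector>

// Toy register allocator standing in for the register tracking in CCState.
class ToyRegAlloc {
public:
  explicit ToyRegAlloc(std::size_t numRegs) : used_(numRegs, false) {}

  // Hypothetical analogue of AllocateRegWithAlign(RegList, RegAlign):
  // returns the index of the first free register whose index is a multiple
  // of `align`, or -1 if none is left. Align 1 behaves like a plain
  // in-order allocation.
  int allocate(std::size_t align = 1) {
    for (std::size_t i = 0; i < used_.size(); ++i) {
      if (used_[i])
        continue;
      used_[i] = true; // taken now, or skipped and wasted (assumption)
      if (i % align == 0)
        return static_cast<int>(i);
    }
    return -1;
  }

private:
  std::vector<bool> used_;
};
```

For an even/odd pair, the first i32 half would request align 2 and the second half would take the next register with align 1, so a pair can start at r0 or r2 but never at r1.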

asb added a comment. Feb 1 2018, 5:37 AM

rL305083 by @sdardis introduced getRegisterTypeForCallingConv, getNumRegistersForCallingConv and some other related hooks. These can be used to similar effect - did you evaluate that approach instead?

Hi Alex, it seems that getRegisterTypeForCallingConv and the related hooks split the type before registers are assigned. Therefore, f64 will be split into two i32 values. If the two i32 values can't be allocated to registers, two i32 loads/stores will be generated.
Another case is that the RISC-V ilp32d ABI passes f64 in i32 registers if no f64 registers are available. In this case, we have to analyze the arguments to know that there are no f64 registers left, and only then assign to i32 registers.
And congratulations!
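The ordering problem described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not a statement of the exact ilp32d rules: ToyAbiState and Where are hypothetical, and the model only captures that the choice between an f64 register, a split into i32 pieces, and an unsplit stack slot depends on how many registers earlier arguments have already consumed.

```cpp
#include <vector>

enum class Where { FPR, GPR, Stack };

// Toy allocation state: only counts how many registers are still free.
struct ToyAbiState {
  unsigned freeFPRs;
  unsigned freeGPRs;

  // Place one f64 argument and report where each piece went.
  // One entry means the value stayed whole; two entries mean it was split
  // into i32 halves.
  std::vector<Where> assignF64() {
    if (freeFPRs > 0) { // an f64 register is still available
      --freeFPRs;
      return {Where::FPR};
    }
    if (freeGPRs == 0) // nothing left: keep the f64 whole on the stack
      return {Where::Stack};
    // Split only now that the running state says it is necessary.
    std::vector<Where> pieces;
    --freeGPRs;
    pieces.push_back(Where::GPR); // first half always gets a register here
    if (freeGPRs > 0) {
      --freeGPRs;
      pieces.push_back(Where::GPR);
    } else {
      pieces.push_back(Where::Stack); // second half spills
    }
    return pieces;
  }
};
```

A hook that decides the register type from the value type alone runs before any of these counters exist, which is why it cannot express the first two branches.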

asb added a comment. Feb 1 2018, 6:41 AM

Hi Alex, it seems that getRegisterTypeForCallingConv and the related hooks split the type before registers are assigned. Therefore, f64 will be split into two i32 values. If the two i32 values can't be allocated to registers, two i32 loads/stores will be generated.
Another case is that the RISC-V ilp32d ABI passes f64 in i32 registers if no f64 registers are available. In this case, we have to analyze the arguments to know that there are no f64 registers left, and only then assign to i32 registers.

Thanks for the explanation; the need to do this for corner cases of the ilp32d ABI is somewhat compelling. As you know from following the RISC-V patchset, I do like the idea of handling as many ABI details as practicable in the backend. Being able to handle the f64 -> 2*i32 lowering in the backend still may not free the Clang frontend from having to count registers, as that may well be necessary for handling int+int or fp+int structs correctly.

Hi Alex, yes, we have to calculate the register usage of a struct's members in Clang to determine whether a struct argument should be passed directly or indirectly, but we don't have to calculate it for every argument, right? Or are there cases where we need to calculate each argument's register usage to implement the ABI correctly?