Page MenuHomePhabricator

[AArch64][SVE] Fix AArch64_SVE_VectorCall calling convention

Authored by MattDevereau on Sep 8 2022, 5:10 AM.



This fixes the case where callees with SVE arguments outside of the z0-z7
range were incorrectly deduced as SVE calling convention functions

Diff Detail

Event Timeline

MattDevereau created this revision.Sep 8 2022, 5:10 AM
Herald added a project: Restricted Project. · View Herald Transcript
MattDevereau requested review of this revision.Sep 8 2022, 5:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2022, 5:10 AM
peterwaller-arm added inline comments.Sep 8 2022, 5:48 AM

(A && B) || A simplifies to A.

MattDevereau added inline comments.Sep 8 2022, 5:55 AM

Oops, didn't see that one. Thanks

I've simplified the pointless if(A && B || A) check. This was leftover from when I copied and simplified the code from AArch64TargetLowering::LowerFormalArguments which has seperate logic for predicates and scalable vectors like so:

if (VA.isRegLoc()) {
  // Arguments stored in registers.
  EVT RegVT = VA.getLocVT();
  const TargetRegisterClass *RC;

  if (RegVT == MVT::i32)
    RC = &AArch64::GPR32RegClass;
  else if (RegVT == MVT::i64)
    RC = &AArch64::GPR64RegClass;
  else if (RegVT == MVT::f16 || RegVT == MVT::bf16)
    RC = &AArch64::FPR16RegClass;
  else if (RegVT == MVT::f32)
    RC = &AArch64::FPR32RegClass;
  else if (RegVT == MVT::f64 || RegVT.is64BitVector())
    RC = &AArch64::FPR64RegClass;
  else if (RegVT == MVT::f128 || RegVT.is128BitVector())
    RC = &AArch64::FPR128RegClass;
  else if (RegVT.isScalableVector() &&
           RegVT.getVectorElementType() == MVT::i1) {
    RC = &AArch64::PPRRegClass;
  } else if (RegVT.isScalableVector()) {
    RC = &AArch64::ZPRRegClass;
  } else
    llvm_unreachable("RegVT not supported by FORMAL_ARGUMENTS Lowering");
Matt added a subscriber: Matt.Sep 20 2022, 1:51 PM
peterwaller-arm accepted this revision.Sep 22 2022, 2:06 AM
peterwaller-arm added inline comments.

I can't think of a way that you can get a returned scalable type which doesn't go in a register, but it doesn't seem inconceivable it may be possible in the distant future. Since we're leaving a subtlety here it seems worth a brief comment of this form:

// Check the types of the returned values. For the time being it is sufficient to check the VT as the RegVT is not known at this point.

You could also break once the CallConv is set.

This revision is now accepted and ready to land.Sep 22 2022, 2:06 AM
paulwalker-arm added inline comments.Oct 3 2022, 5:29 AM

Can we handle this in the same way as we do for arguments. The necessary work is done within LowerCallResult. It looks like every target does something special with LowerCall our only caller so I think we should hoist the necessary logic to construct RVLocs into LowerCall and pass it into LowerCallResult instead of Ins.

Doing this means we can use the same lambda as above to give us a bit of future proofness (I don't expect extra test coverage because I don't believe this is all that well defined given there's no way to write a suitable example in C).


Not sure if there's a coding standard rule against this but it's not the typically style used. Most would write:

if (EVT RegVT = ArgLocs[i].getLocVT())
  if (RegVT.isScalableVector())
    CallConv = CallingConv::AArch64_SVE_VectorCall;

That said I think relying on the ValueType is fragile. Given ArgLocs contains actual register assignments would the following also work instead of the original lambda function?

if (!Loc.isRegLoc())
  return false;

return AArch64::ZPRRegClass.contains(Loc.getLocReg()) || AArch64::PPRRegClass.contains(Loc.getLocReg());

Given the context of the patch I think you could do with a few more tests to ensure all the code paths are covered. For example:

sve_caller_non_sve_high_range_callee - this is the function you have today
sve_ret_caller_non_sve_high_range_callee - as above but with no sve arguments

It's also worth adding another aapcs# callee test whereby the callee has an out of range SVE argument but returns an SVE result.


It's better for this to be a declared function only so there's no chance of something peeking into the called function. If non_sve_callee_high_range serves no other purpose then you can just delete its definition.

MattDevereau added inline comments.Oct 4 2022, 9:08 AM

Is it possible to hoist this cleanly and eliminate all of the RVLocs assignments in LowerCallResult? The work done in LowerCallResult is dependent on CallConv which is set by the CalleeInSVE/CalleeOutSVE lambdas here. When 1:1 hoisting the RVLocs work in LowerCallResult to replace CalleeInSVE here I'm seeing test failures in arm64_32-fastisel.ll.

@paulwalker-arm In the interest of moving this ticket along I've implemented most of your feedback with the exception of hoisting the work done inside LowerFormalArguments into LowerCall. This is because LowerFormalArguments expects CallConv to be changed by CalleeInSVE/CalleeOutSVE in LowerCall. Removing the work in LowerFormalArguments causes test failures.

I don't understand what you mean by "exception of hoisting the work done inside LowerFormalArguments into LowerCall". This is not something I requested is it? I mentioned passing RVLocs directly into LowerCallResult in place of the current Ins parameter because we shouldn't need to compute it twice. I've just tried this myself with your patch and see no issues.


You don't need the Callee... local variables anymore but I guess they don't hurt.

@paulwalker-arm I'm referring to this comment, particularly the sentence "I think we should hoist the necessary logic to construct RVLocs into LowerCall and pass it into LowerCallResult instead of Ins" which I assumed was referring to this whole block at the start of LowerCallResult here

SDValue AArch64TargetLowering::LowerCallResult(
    SDValue Chain, SDValue InFlag, CallingConv::ID CallConv, bool isVarArg,
    const SmallVectorImpl<ISD::InputArg> &Ins, const SDLoc &DL,
    SelectionDAG &DAG, SmallVectorImpl<SDValue> &InVals, bool isThisReturn,
    SDValue ThisVal) const {
  CCAssignFn *RetCC = CCAssignFnForReturn(CallConv);
  // Assign locations to each value returned by this call.
  SmallVector<CCValAssign, 16> RVLocs;
  DenseMap<unsigned, SDValue> CopiedRegs;
  CCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), RVLocs,
  CCInfo.AnalyzeCallResult(Ins, RetCC);

Is it not the case that CCInfo(CallConv... here is part of the necessary logic to construct RVLocs here and it is dependent on the work done in LowerCall, which can set CallConv to CallingConv::AArch64_SVE_VectorCall and affect this block of code? I do not see all tests passing when I move this entire block. It sounds like we're not on the same page on the word hoist which I assumed meant cut and paste everything related to RVLocs in LowerCallResult to LowerCall. Did you mean to keep the lines containing CCInfo in LowerCallResult but pass RVLocs by a non-const reference?

Hoisted RVLocs work out of LowerFormalArguments
Added further test coverage for SVE/Non-SVE caller-callee situations

paulwalker-arm accepted this revision.Oct 6 2022, 10:13 AM