This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Implement 64-bit ELFv2 Calling Convention in TableGen (for integers/floats/vectors in registers)
ClosedPublic

Authored by amyk on Nov 5 2022, 4:19 PM.

Details

Summary

This patch partially implements the parameter passing rules outlined in the ELFv2 ABI
within TableGen. Specifically, it implements the parameter assignment of integers, floats, and
vectors within registers - where the GPR numbering will be "skipped" depending on the ordering
of floats and vectors that appear within a parameter list.

As we begin to adopt GlobalISel to the PowerPC backend, there is a need for a TableGen definition
that encapsulates the ELFv2 parameter passing rules. Thus, this patch also changes the default
calling convention that is returned within the ccAssignFnForCall() function used in our GlobalISel
implementation, and also adds some additional testing of the calling convention that is implemented.

Future patches that build on top of this initial TableGen definition will aim to add more of the
ABI complexities, including support for additional types and also in-memory arguments.

Diff Detail

Event Timeline

amyk created this revision.Nov 5 2022, 4:19 PM
amyk requested review of this revision.Nov 5 2022, 4:19 PM
Kai added a comment.Nov 18 2022, 1:47 PM

Some nit comments but in general looks good to me.

llvm/lib/Target/PowerPC/PPCCallingConv.cpp
1

Sorry, that is not really part of your functionality but it confused me a lot.

16–18

Since this is only used in CC_PPC64_ELF_Shadow_GPR_Regs it can be moved inside the function.
You could also think about using an ArrayRef which would eliminate the need for ELF64NumArgGPRs:

static ArrayRef<MCPhysReg> ELF64ArgGPRs = {PPC::X3, PPC::X4, PPC::X5, PPC::X6,
                                           PPC::X7, PPC::X8, PPC::X9, PPC::X10};
31

I wonder if the following conditions will be easier to understand if you add an quick exit here:

if (FirstUnallocGPR == ELF64NumArgGPRs)
  return false;
llvm/lib/Target/PowerPC/PPCCallingConv.td
134

Please check the length of the line here and the long lines below.

amyk added inline comments.Nov 19 2022, 9:19 PM
llvm/lib/Target/PowerPC/PPCCallingConv.cpp
16–18

Thanks for the suggestion, Kai!
I just wanted to understand this a bit more. Is the suggestion to instead use ArrayRef and also the size() utility that comes along with it? If that is the case, wouldn't I still require ELF64NumArgGPRs?

My apologies if I may have misunderstood the suggestion.

amyk updated this revision to Diff 476729.Nov 19 2022, 10:13 PM

Address review comments from Kai regarding early exit, moving variables into the calling convention function and TD file definition line length.

I will address the comments regarding ArrayRef comment once I receive clarification.

amyk marked 3 inline comments as done.Nov 19 2022, 10:14 PM
amyk updated this revision to Diff 477859.Nov 24 2022, 10:28 PM

Rebase patch. I also sync'd with Kai offline and the current implementation using a static const MCPhysReg array and keeping ELF64NumArgGPRs is OK to keep.

@nemanjai Whenever you get the chance, are you able to take a look at this patch?

stefanp added a subscriber: stefanp.Dec 2 2022, 8:38 AM

Silly question:
Can we now get rid of CC_PPC64_ELF_FIS completely?

llvm/lib/Target/PowerPC/PPCCallingConv.cpp
24

nit:

This function handles the shadowing the GPRs for fp and vector types.

to:

This function handles the shadowing of GPRs for fp and vector types.
43

I'm not sure you need this computation at all.
LocVT.getSizeInBits() is either 32 or 64 because we know that LocVT is either MVT::f32 or MVT::f64.
However, in that case

f32
SizeInDWord = (32 + 63) / 64 = 1
f64
SizeInDWord = (64 + 63) / 64 = 1

Both are integer divisions which just discard the decimal part. So, it looks to always be 1 in this situation.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
18258

Does this also need to be changed in PPCFastISel?

amyk added inline comments.Dec 2 2022, 8:02 PM
llvm/lib/Target/PowerPC/PPCCallingConv.cpp
24

Good catch, I will update this.

43

That's actually a good point. Thanks Stefan.

amyk updated this revision to Diff 479799.Dec 2 2022, 8:03 PM

Address review comments:

  • Update comments.
  • Clean up section involving shadowing GPRs for float/double.
amyk added a comment.Dec 2 2022, 8:04 PM

@stefanp Thanks for the review and the questions!

Can we now get rid of CC_PPC64_ELF_FIS completely?

That’s a good question! This is related to your question about updating PPCFastISel. From what I can tell, it looks like mine covers everything the PPCFastISel one covers:

// Simple calling convention for 64-bit ELF PowerPC fast isel.
// Only handle ints and floats.  All ints are promoted to i64.
// Vector types and quadword ints are not handled.
let Entry = 1 in
def CC_PPC64_ELF_FIS : CallingConv<[
  CCIfCC<“CallingConv::AnyReg”, CCDelegateTo<CC_PPC64_AnyReg>>,

  CCIfType<[i1],  CCPromoteToType<i64>>,
  CCIfType<[i8],  CCPromoteToType<i64>>,
  CCIfType<[i16], CCPromoteToType<i64>>,
  CCIfType<[i32], CCPromoteToType<i64>>,
  CCIfType<[i64], CCAssignToReg<[X3, X4, X5, X6, X7, X8, X9, X10]>>,
  CCIfType<[f32, f64], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>
]>;

Although I realize there is a separate return value one that handles i128 that this patch does not yet handle:

// Simple return-value convention for 64-bit ELF PowerPC fast isel.
// All small ints are promoted to i64.  Vector types, quadword ints,
// and multiple register returns are “supported” to avoid compile
// errors, but none are handled by the fast selector.
let Entry = 1 in
def RetCC_PPC64_ELF_FIS : CallingConv<[
  CCIfCC<“CallingConv::AnyReg”, CCDelegateTo<RetCC_PPC64_AnyReg>>,

  CCIfType<[i1],   CCPromoteToType<i64>>,
  CCIfType<[i8],   CCPromoteToType<i64>>,
  CCIfType<[i16],  CCPromoteToType<i64>>,
  CCIfType<[i32],  CCPromoteToType<i64>>,
  CCIfType<[i64],  CCAssignToReg<[X3, X4, X5, X6]>>,
  CCIfType<[i128], CCAssignToReg<[X3, X4, X5, X6]>>,
  CCIfType<[f32],  CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>,
  CCIfType<[f64],  CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>,
  CCIfType<[f128],
           CCIfSubtarget<“hasAltivec()“,
           CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>,
  CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32, v2f64],
           CCIfSubtarget<“hasAltivec()“,
           CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>
]>;

As I mentioned in one of my replies to your other comments, I did try to use this definition instead of the FastISel ones within PPCFastISel.cpp and there weren’t any issues.
It may be possible that we can get rid of CC_PPC64_ELF_FIS, although I’m not sure if we should do it within this patch. @nemanjai Do you have any thoughts on this?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
18258

Just wanted to double check, do you mean if I need to update the calling convention used within PPCFastISel from CC_PPC64_ELF_FIS -> CC_PPC64_ELF?

If so, that’s actually a good point. I see that the FastISel TableGen definition is only a super simple one that handles ints and floats, and this one is supposed to be a more full implementation.

I actually did a quick test to see if anything went wrong if I removed the old FastISel definitions and replaced them with these ones. The testing came out clean and had no issues, so perhaps it’s a possibility that I could update the PPCFastISel one, as well. Or, I can probably put up a separate NFC patch afterwards to update this?

nemanjai added inline comments.Jan 20 2023, 8:16 AM
llvm/lib/Target/PowerPC/PPCCallingConv.cpp
24

Please note the section of the ABI document that describes this allocation algorithms (and then below, which part of the section talks about the specific aspect - such as allocating even GPRs for vectors, skipping odd ones, etc.).

40

Why does it not suffice for the rest of this function to be something simple like the following:

// For single/double precision, shadow a single GPR.
if (LocVT == MVT::f32 || LocVT == MVT::f64)
  State.AllocateReg(ELF64ArgGPRs);
else if (LocVT.is128BitVector() || (LocVT == MVT::f128)) {
  // For vector and __float128, shadow two even GPRs (skipping
  // the odd one if it is next in the allocation order).
  if ((State.AllocateReg(ELF64ArgGPRs) - PPC::X3) % 2 == 0)
    State.AllocateReg(ELF64ArgGPRs);
  State.AllocateReg(ELF64ArgGPRs);
  }
}
47–48

Is it possible for this loop to iterate more than once?

51

Please add a comment as to what happens with MVT::ppcf128. Does it get broken up into two MVT::f64's before here? Do we just ignore/skip it for now? Should it be an assert until it is implemented if it isn't already implemented?

@stefanp Thanks for the review and the questions!

Can we now get rid of CC_PPC64_ELF_FIS completely?

As I mentioned in one of my replies to your other comments, I did try to use this definition instead of the FastISel ones within PPCFastISel.cpp and there weren’t any issues.
It may be possible that we can get rid of CC_PPC64_ELF_FIS, although I’m not sure if we should do it within this patch. @nemanjai Do you have any thoughts on this?

I'd rather not change anything in FastISel for now. But feel free to add a TODO to remove it in the future if it is completely equivalent to the new one (although I am not convinced it is completely equivalent nor that it will remain so as we add functionality to it).

amyk updated this revision to Diff 492747.Jan 27 2023, 7:00 AM
amyk marked 2 inline comments as done.

Address review comments from Nemanja to simplify the register allocation function and also to add documentation to parts of the code.

llvm/lib/Target/PowerPC/PPCCallingConv.cpp
40

Thank you for the suggestion Nemanja. I've looked into this and tweaked it slightly. This simplification should suffice.

47–48

I thought it was before, but realized that after I updated the patch after Stefan's comment, this should only run once. Thus, it would make sense to remove this loop.

51

I have added a comment to state that MVT::ppcf128 does indeed get broken up to two MVT::f64s here.

amyk marked 5 inline comments as done.Jan 27 2023, 11:17 AM

Overall I think this patch looks great.

I've only added one comment. It relates to the way that a test case has changed when a struct is being passed in as a parameter.
I know that this is work in progress and it is quite possible that passing structs is not yet supported. If that's the case just add a comment (or TODO) into the test to show that the results are wrong and we know they are wrong and that they will be fixed at a later date.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
18258

I'm happy if this is a separate patch that comes after or even as a TODO in the code for us to look at later.

llvm/test/CodeGen/PowerPC/GlobalISel/irtranslator-args-lowering.ll
82

This is interesting because we are changing the way that a struct is being passed in registers.
Which of these set of liveins is correct?

My understanding of the ABI (and I could be wrong here so please read and make sure) is that non-homogeneous structs are passed as a block of memory. For example the i8 and float would be in R3 at the same time.
So, it might look something like this:

R3 -> i8, float
R4 -> i32, i32
R5 -> i32

If we look at it that way we should only be using R3, R4, R5.

Either way, I think it is important to look at this and figure out why it changes and what the ABI says.

amyk added inline comments.Jan 27 2023, 2:17 PM
llvm/test/CodeGen/PowerPC/GlobalISel/irtranslator-args-lowering.ll
82

I count be misunderstanding this as well but when reading the ABI initially, I understood it the same as you described in your comment where we would be utilizing r2, r4 and r5 for this particular struct.

Essentially, I thought that both the original and updated liveins is incorrect, just because this implementation is meant to handle simple cases of integers, floats and vectors within registers and doesn't fully support structs yet. I thought I had put a comment denoting this before but it turns out that I didn't, so thank you for pointing this out, Stefan.

I can definitely add the TODO here, and we can plan to add support for structs in a follow up patch at a later time. I also just wanted to check with @nemanjai if this is a reasonable approach for this patch.

nemanjai accepted this revision.Feb 24 2023, 6:42 AM

LGTM.

llvm/test/CodeGen/PowerPC/GlobalISel/irtranslator-args-lowering.ll
82

One easy way to resolve this is to compile something with current compilers that accesses the members from Stefan's example and see which register they're expected to be in.

Also, if we happen to do the wrong thing for GlobalISel for now for passing structs, I'm perfectly fine with marking it as a TODO to fix it later.

This revision is now accepted and ready to land.Feb 24 2023, 6:42 AM
This revision was landed with ongoing or failed builds.Mar 27 2023, 6:23 AM
This revision was automatically updated to reflect the committed changes.
amyk added inline comments.Mar 27 2023, 6:33 AM
llvm/test/CodeGen/PowerPC/GlobalISel/irtranslator-args-lowering.ll
82

Sounds good. I have added a TODO when I committed the patch. The correct way appears to be what Stefan has outlined/what was discussed (R3, R4, R5).