This is an archive of the discontinued LLVM Phabricator instance.

[AIX] Allow vararg calls when all arguments reside in registers.
ClosedPublic

Authored by cebowleratibm on Dec 4 2019, 6:26 AM.

Details

Summary

This patch pushes the AIX vararg unimplemented error diagnostic later and allows vararg calls so long as all the arguments can be passed in register.

This patch extends the AIX calling convention implementation to initialize GPR(s) for vararg float arguments. On AIX, both GPR(s) and FPR are allocated for floating point arguments. The GPR(s) are only initialized for vararg calls, otherwise the callee is expected to retrieve the float argument in the FPR.

f64 in AIX PPC32 requires special handling in order to allocated and initialize 2 GPRs. This is performed with bitcast, SRL, truncation to initialize one GPR for the MSW and bitcast, truncations to initialize the other GPR for the LSW.

A future patch will follow to add support for arguments passed on the stack.

Diff Detail

Event Timeline

cebowleratibm created this revision.Dec 4 2019, 6:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2019, 6:26 AM
ZarkoCA added inline comments.Dec 4 2019, 9:06 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6881

May be better to use the LocVT variable instead of hardcoding MVT::f64 so we know we the actual type. Meant to change this in my previous patch but it doesn't look like I kept it in the diff.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
705

Can one of these be set as a float? This way it's explicitly shown that floats and doubles are handled differently.

cebowleratibm marked 2 inline comments as done.Dec 5 2019, 4:50 AM
cebowleratibm added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6881

Ah yes, I had thought we should use f64 because the register is 64-bit but now understand that a reg holding f32 is understood to be a widened f32 value. I'll make this change.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
705

We could, but there's no way to do so from C code. You can do so for typed functions, but then you don't initialize the GPR and this is outside the scope of this patch.

We could write such a test in IR with understanding nothing in C/C++ will produce such IR.

clang-format the changes please.

Clang formatted the change and updated the float reg to use LocVT.

sfertile added inline comments.Dec 10 2019, 7:45 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6886

We should likely be creating these as shadow regs in the non-vararg case, but I failed to see that in our previous review, sorry. I don't think we need to update that for this patch, but we should before the calling convention work is finished.

7104–7108

Suggestion: I'd prefer we lead with the 2 common cases, then have the special case handling at the end.

if (VA.isRegLoc() && !VA.needsCustom())) {
  RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));
} else if(VA.isMemLoc()) {
  /// Impl
} else (VA.isRegLoc()) {
  // Custom handling here.
}

I don't fell strongly about this though and think the current form is fine too.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
635

When the register is a scratch register like $r3 is in this instruction we should be using a FileCheck variable instead, to show that the choice of register isn't important.

642

IIUC if we were targeting an cpu that has the mfvsrd instruction then we wouldn't expect this store to the stack. Do we want to copy this part of the test into a standalone lit test that will ensure we verify there is no store once we enable the VSX extension for AIX?

650

Ditto on using a filecheck variable for scratch registers.

cebowleratibm marked 6 inline comments as done.Dec 12 2019, 8:53 AM

Minor tweaks and test upates to come. I'll add an assembly test to validate that we don't need to store to memory to initiailze the two GPRs for f64 in PPC32 mode on newer hardware.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6886

A shadow reg is allocated in the same way as any other reg. What makes it a shadow reg is if there's no associated RegLoc for the register. This falls out naturally because we only create the RegLoc for vararg calls where we require it be initialized. The allocated but uninitialized GPRs are already recognized as shadow regs.

7104–7108

I think it's a good suggestion to move the MemLoc fatal error. I'll move it to the top of the loop rather than here.

I prefer to use the continue to keep the nesting down rather than use an else.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
635

I agree, though I think it can be onerous to manually regex scratch regs. I'd like to look into using the utils/update_* scripts for test management in the future but I'll finish this test out as per your suggestion.

642

Perhaps a sign that it may be better to write these tests in assembly. I think it's a good idea to add the coverage you've suggested.

650

ack.

hubert.reinterpretcast added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6887

Newline before this to increase the association of this comment line with the following block.

6894

I can't fathom why there would be objections over using braces here. I believe the lack of braces seriously hurts readability in this case.

6898

A newline would help make the early exit more obvious.

7053–7054

Add const to be consistent with LinkageSize.

7054

const here as well.

7074–7083

Do the current naming conventions indicate that these should be I and E?

7075

I suggest to increment here (ArgLocs[i++]) instead of in the loop-continuation. I find increment-right-after-latching to be more symmetric for this code.

7112

Period at the end of the sentence.

7120

If changing the increment, then i == e here.

7122

If changing the increment, then ArgLocs[i++] here.

7124

What sort of situation leads to this?

7125

Period at the end of the sentence.

cebowleratibm marked 14 inline comments as done.Dec 13 2019, 7:33 AM
cebowleratibm added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6887

All 4 comments apply to the block of code that follows. I'm not sure where you want the newline or why it improves readability.

7112

For the assertion comment? I do not see this precedent in the LLVM source but I will add an exclamation because that seems to be the thing to do ;)

7124

The matching ValNo should be an assertion. I should have written:

 // If no GPRs were left then there may be no more ArgLocs remaining.
if (I + 1 == e)
  continue;
CCValAssign &GPR2 = ArgLocs[++I];
// If no GPRs were left then there may be non-reg ArgLocs remaining.
if (!GPR2.isRegLoc()))
  continue;
assert(GPR2.getValNo() == GPR1.getValNo() && GPR2.needsCustom() &&
       "A second custom GPR is expected!");
hubert.reinterpretcast marked an inline comment as done.Dec 13 2019, 8:13 AM
hubert.reinterpretcast added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6887

All four comments apply to the larger block of code that follows. The last comment line is about the smaller block of code that follows, and I was asking for a newline before that last comment line. I might also go for putting it inside the if block.

7112

There is not a consistency in precedent in the LLVM source on this, but it's an easier rule to follow to always use punctuation (except to check precedent carefully where user messages are involved).

7124

If we observed one custom reg and we are dealing with the var arg case. My understanding is that, if we were unable to allocate a second GPR for the vararg case, the loop would have hit report_fatal_error.

cebowleratibm marked 4 inline comments as done.Dec 13 2019, 8:59 PM
cebowleratibm added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7124

Yes we should have emitted an error. So the final assertion checks that contract. If we see a RegLoc that follows the "GPR1" RegLoc, then we expect that to be the second custom reg and have the same ValNo as the first.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7124

I am trying to understand why we cannot assert that there is at least one ArgLoc remaining, that said ArgLoc is a RegLoc, and that said RegLoc is the second custom reg.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
705

This case suggested by @ZarkoCA raises the question of how a 32-bit floating point value will be represented in a 64-bit GPR. What is the expected result? The equivalent case for XL generates a 64-bit GPR with zeroes in the high bits and the bits forming the 32-bit floating point value in the low bits.

cebowleratibm marked 3 inline comments as done.Dec 16 2019, 5:34 AM
cebowleratibm added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7124

You're right. Given that we've seen the first custom reg we know the call is vararg, which means the second custom reg must be present or we would have emitted an error.

I think the code goes back to this when we add memloc support, however, to keep the implementation on trunk clean I'll tidy this up.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
705

I agree there is a question of how an f32 occupies a 64-bit register, though this issue is not specific to calling convention and I had considered it outside the scope of this patch.

I'm adding an assembly test that the GPRs can be initialized without going through memory on newer hardware. I'll also add an assembly test for passing an f32 to a typed function and validate the instructions used to populate the register.

sfertile marked an inline comment as done.Dec 16 2019, 7:37 AM
sfertile added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6886

Thanks for clearing that up. I mistakenly though the CCState object had to specifically be told a register is a shadow reg, for example through AllocateReg (unsigned Reg, unsigned ShadowReg).

7104–7108

Sure, that sounds good to me.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
705

As you said, passing f32 to a typed function would not populate a 64-bit GPR. The patch does, however, admit the possibility of passing f32 in a vararg case under 64-bit mode, hence the question. Even if the patch is updated to error on any case of needing to populate a GPR from an f32, the test should probably test for the error.

In any case, it is known that the treatment of float _Complex in Clang IR codegen is not correct on AIX (the byval struct passing would populate 64 bits of GPR and no FPRs). The passing of float _Complex via varargs is a case where we populate GPRs with f32s in the manner I described.

cebowleratibm marked an inline comment as done.Dec 17 2019, 4:28 AM
cebowleratibm added inline comments.
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
705

I was confused on my last comment. We're talking about the 64-bit GPR initialization and this issue is specific to vararg calling convention. I'll make f32 vararg an error for both 32 and 64 bit modes. The 32-bit mode case probably doesn't need to be an error but I think a follow on patch can address it and the _Complex case you mention.

cebowleratibm planned changes to this revision.Dec 17 2019, 6:42 AM

Recent problems have come up generating the GPR initializations for floating point args, which may require some rework. Hold off on review until revision is posted.

I've decided to use custom regs for all gpr inits for float varargs. I discovered that f64 passing in 64-bit gpr was missing a required bitcast to int to handle the initialization. The new logic ensures the bitcast occurs for any custom handling. Custom handling is only expected for gpr inits of float args.

I've added assembly tests to ensure that we emit the correct instructions to initialize the GPRs from float regs. On pwr8 and newer there are instructions available to do this without going through memory, however, I couldn't validate this because of the AIX altivec limitation. Consequently, I've added the assembly test for pre-pwr8 and added an error test such that we can validate this when altivec is enabled for AIX.

cebowleratibm marked 5 inline comments as done.Jan 8 2020, 5:03 AM
cebowleratibm marked 8 inline comments as done.Jan 8 2020, 5:07 AM
cebowleratibm added inline comments.
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
705

In the most recent revision I decided it was just as well to make f32 vararg work. Test added as call_test_vararg4.

ZarkoCA added inline comments.Jan 8 2020, 6:33 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6834

I think you should remove the !, currently it would error on all types.

6889

nit: I instead of i?

llvm/test/CodeGen/PowerPC/aix-cc-altivec.ll
4

While I am partial to using "AltiVec", most instances in the codebase of the word as used in English has it written capitalized (as a proper noun) as "Altivec".

5

The wording is ambiguous as to whether the test should fail (period) or the test should fail to alert. Replace "to" with "in order to".

6

"AIX".

cebowleratibm marked 7 inline comments as done.Jan 9 2020, 4:18 AM
cebowleratibm added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
6834

Good catch. I had a typo.

6889

It was 'i' before but I can change it while I'm here.

cebowleratibm marked 2 inline comments as done.

Rewrote comments in aix-cc-altivec.ll test, fixed the vector diagnostic bug Zarko pointed out and addressed small nits.

ZarkoCA accepted this revision.Jan 9 2020, 10:00 AM

LGTM

This revision is now accepted and ready to land.Jan 9 2020, 10:00 AM

Some minor nits that can be fixed on the commit, but otherwise LGTM.

llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
637

Reusing REG is okay, but the 64-bit test does no do so. I think being consistent is better.

706

Same comment about reuse.

777

Same comment about reuse.

846

The scratch register should not be hardcoded.

855

Same comment about hardcoding.

864

The 32-bit tests checks are grouped together for the other tests but not here.

llvm/test/CodeGen/PowerPC/aix-cc-altivec.ll
2

I don't think it hurts to have the 32-bit RUN line here now as well.

7

I think that this should be testing the passing of floats too when the time comes.

17

I don't think it hurts to have a float parameter here now as well.

cebowleratibm marked 9 inline comments as done.

Already approved but I've addressed the nits and posted the final commit here because I'll need someone to commit it for me.

This revision was automatically updated to reflect the committed changes.