This is an archive of the discontinued LLVM Phabricator instance.

[clang][PowerPC] PPC64 VAArg use coerced integer type for direct aggregate fits in register
ClosedPublic

Authored by tingwang on Sep 5 2022, 11:45 PM.

Details

Summary

This is an attempt to fix issue: https://github.com/llvm/llvm-project/issues/55900

PPC64_SVR4_ABI handles those by-value aggregate fits in one register using coerced integer type.
https://github.com/llvm/llvm-project/blob/51d33afcbe0a81bb8508d5685f38dc9fdb2b60c9/clang/lib/CodeGen/TargetInfo.cpp#L5351

Regarding the issue, the aggregate is passed using i8 as parameter. On big-endian, after register content stored to memory, the char locates at 7th byte. However current PPC64_SVR4_ABIInfo::EmitVAArg() generates argument access using the original type, so there is type mismatch between caller and callee.

This patch tries to teach PPC64_SVR4_ABIInfo::EmitVAArg() regarding the type coerce. I'm not sure if this should be fixed in clang or backend, but I guess in the clang more likely, since there is logic taking care of argument smaller than a slot:
https://github.com/llvm/llvm-project/blob/51d33afcbe0a81bb8508d5685f38dc9fdb2b60c9/clang/lib/CodeGen/TargetInfo.cpp#L356

Please help me review and let me know if any comments. Thank you!

Diff Detail

Event Timeline

tingwang created this revision.Sep 5 2022, 11:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2022, 11:45 PM
tingwang requested review of this revision.Sep 5 2022, 11:45 PM

I think it is correct to implement this in Clang. Note that on SystemZ (another big-endian platform), we also implement this in EmitVAArg. Of course the details are different since we're not using emitVoidPtrVAArg on that platform.

However, I'm not sure if the implementation itself is quite correct, in particular if it is right to just replace the type. Note that looking into emitVoidPtrVAArg I see this:

// If the argument is smaller than a slot, and this is a big-endian
// target, the argument will be right-adjusted in its slot.
if (DirectSize < SlotSize && CGF.CGM.getDataLayout().isBigEndian() &&
    !DirectTy->isStructTy()) {
  Addr = CGF.Builder.CreateConstInBoundsByteGEP(Addr, SlotSize - DirectSize);
}

which seems to implement exactly the same logic, but *without* modifying the type as seen in the IR. It looks like the only change needed for ppc would be to remove the !DirectTy->isStructTy() check here? (I guess to avoid inadvertently change other targets, this might need to be triggered by a flag passed as argument. On the other hand, maybe there is no other big-endian platform using emitVoidPtrVAArg anyway?)

It looks like the only change needed for ppc would be to remove the !DirectTy->isStructTy() check here? (I guess to avoid inadvertently change other targets, this might need to be triggered by a flag passed as argument. On the other hand, maybe there is no other big-endian platform using emitVoidPtrVAArg anyway?)

Thank you! Looked a little bit into history: the !DirectTy->isStructTy() check is specifically added in https://reviews.llvm.org/D21611. I will update patch to add a flag for PPC64 passed as argument.

During investigate, I noticed an issue caused by transition to opaque-pointer in test case introduced by D21611: the case CHECK-NOT contains typed pointer will not match clang generated opaque-pointer in any way, so these tests are invalidated and here is risk until these test cases are updated.

tingwang updated this revision to Diff 458721.Sep 8 2022, 6:59 AM

Update according to comments:
(1) The argument type is not touched.
(2) Add flag in APIs to force right-adjust the parameter for this issue.
(3) One change on if check logic: now use CoerceTy->getIntegerBitWidth() directly compare with GPRBits, previously used llvm::alignTo(CoerceTy->getIntegerBitWidth(), 8). I think llvm::alignTo() is redundant here.
(4) Created NFC patch to pre-commit the test case.

uweigand added inline comments.Sep 8 2022, 7:10 AM
clang/lib/CodeGen/TargetInfo.cpp
5463

Are all these checks really necessary here? This seems to duplicate the checks that are already in emitVoidPtrVAArg ... Can't we simply always pass true for ForceRightAdjust on PowerPC?

tingwang added inline comments.Sep 8 2022, 7:36 AM
clang/lib/CodeGen/TargetInfo.cpp
5463

Ah, yes. All these checks are redundant, and this essentially reverts back to before the commit that broke this.

tingwang updated this revision to Diff 458735.Sep 8 2022, 7:36 AM

Update according to comments:
Remove all those checks, they are redundant.

(Inviting reviewers from D21611 to look into the change here.)

The expected pattern test8va() added in this patch was broken by D21611. Now as suggested by Ulrich, we are planning to create a flag for the original case. I have verified the functionality works with all kinds of aggregates smaller than SlotSize on PPC64. (If necessary, I can add those test case borrowed from clang/test/CodeGen/PowerPC/ppc-aggregate-abi.cpp) Let me know if you have any comments. Thank you!

This LGTM now, but maybe some of the PowerPC reviewers would like to comment as well. @nemanjai ?

I am not crazy about adding the Boolean parameter here or about the name. Seems somewhat unclear when a caller wants to pass true there.

What I think would be a more robust solution would be to use the same logic that decides whether to coerce the struct argument to an integer type. It seems that any big endian ABI that does this would want to ensure the access is on the right side.

Ultimately what I am getting at here is that we consider how the caller passes the value and how the callee accesses it separately - which is what leads to problems like this. Can we decide using the same function for the caller and the callee?

tingwang updated this revision to Diff 462113.Sep 22 2022, 1:52 AM

Add TODO comments.

I am not crazy about adding the Boolean parameter here or about the name. Seems somewhat unclear when a caller wants to pass true there.

What I think would be a more robust solution would be to use the same logic that decides whether to coerce the struct argument to an integer type. It seems that any big endian ABI that does this would want to ensure the access is on the right side.

Ultimately what I am getting at here is that we consider how the caller passes the value and how the callee accesses it separately - which is what leads to problems like this. Can we decide using the same function for the caller and the callee?

I think this requirement is nontrivial for me right now, so I added TODO comments. Can we take this as a workaround for the issue #55900?

tingwang updated this revision to Diff 466710.EditedOct 10 2022, 11:44 PM
tingwang added a reviewer: nemanjai.

Rebase test case && Gentle ping.

This seems be to a genuine target difference, right? PPC64 has this ABI rule:

An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword.

which overrides the standard rule for passing aggregates:

Fixed size aggregates and unions passed by value are mapped to as many doublewords of the parameter save area as the value uses in memory. Aggregrates and unions are aligned according to their alignment requirements. This may result in doublewords being skipped for alignment.

Other big-endian targets don't do this to non-fundamental types as far as I can tell. So I don't really get the TODO in the comment; this does seem to be something that ABIs need to pass in.

This seems be to a genuine target difference, right? PPC64 has this ABI rule:

An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword.

which overrides the standard rule for passing aggregates:

Fixed size aggregates and unions passed by value are mapped to as many doublewords of the parameter save area as the value uses in memory. Aggregrates and unions are aligned according to their alignment requirements. This may result in doublewords being skipped for alignment.

Other big-endian targets don't do this to non-fundamental types as far as I can tell. So I don't really get the TODO in the comment; this does seem to be something that ABIs need to pass in.

Thank you. I realized the TODO applies to the specific PPC64 case, not the API. I will move that.

tingwang updated this revision to Diff 466992.Oct 11 2022, 6:33 PM

Address comment: move TODO close to PPC64 specific case.

rjmccall added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
5470–5471

Please add this comment explaining the use of ForceRightAdjust:

The PPC64 ABI passes some arguments in integer registers, even to variadic functions. To allow va_list to use the simple "void*" representation, variadic calls allocate space in the argument area for the integer argument registers, and variadic functions spill their integer argument registers to this area in their prologues. When aggregates smaller than a register are passed this way, they are passed in the least significant bits of the register, which means that after spilling on big-endian targets they will be right-aligned in their argument slot. This is uncommon; for a variety of reasons, other big-endian targets don't end up right-aligning aggregate types this way, and so right-alignment only applies to fundamental types. So on PPC64, we must force the use of right-alignment even for aggregates.

I'm not sure what your TODO is hoping for. You'd like to re-use logic between the frontend's va_arg emission and the backend's variadic argument emission? That would be very tricky.

rjmccall added inline comments.Oct 11 2022, 8:08 PM
clang/lib/CodeGen/TargetInfo.cpp
327

Argh, Phabricator dropped one of my comments, and it's the one that explains why I CC'ed Tim Northover.

I'm a little worried about the existing uses of this function because this function is sensitive to the type produced by ConvertTypeForMem. ConvertTypeForMem *mostly* only generates IR struct types for C structs and unions, but there are a few places where it generates an IR struct for some fundamental type that stores multiple values. Most of those types are at least as large as an argument slot (e.g. they contain pointers), unless there's some weird target with huge slots. However, some of them are not; I think the most important example is _Complex T, which of course gets translated into a struct containing two Ts. So if T is smaller than half an argument slot, we're not going to right-align _Complex T on big-endian targets other than PPC64, and I don't know if that's right.

That would affect _Complex _Float16 on 64-bit targets; on 32-bit targets, I think you'd need something obscure like _Complex char to exercise it.

Now, if Clang generates arguments for one of these types using a single value that's also of IR struct type, and the backend considers that when deciding whether to right-align arguments, then maybe those two decisions cancel out and we've at least got call/va_arg compatibility, even if it's not necessarily what's formally specified by the appropriate psABI. But DirectTy is definitely not necessarily the type that call-argument lowering will use, so I'm a little worried.

tingwang added inline comments.Oct 12 2022, 1:04 AM
clang/lib/CodeGen/TargetInfo.cpp
327

Thank you!

I checked the _Complex char case on PPC64: complex element size smaller than argument slot is handled by complexTempStructure() (https://github.com/llvm/llvm-project/blob/51d33afcbe0a81bb8508d5685f38dc9fdb2b60c9/clang/lib/CodeGen/TargetInfo.cpp#L5451), and the right-adjustment is taken care by that logic. Both AIX and PPC64 use complexTempStructure() to produce variadic callee arguments in this case.

In case _Complex char is encapsulated inside structure, then the whole is considered as an aggregate, and is addressed by this fix. I will add a test case to illustrate.

Hope these addressed your concern.

5470–5471

Sure, I will add the comment. Thank you.

Maybe I misunderstood some previous comment. I will drop the TODO.

tingwang updated this revision to Diff 467048.Oct 12 2022, 1:12 AM

Update according to comment.

rjmccall accepted this revision.Oct 12 2022, 10:56 AM
rjmccall added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
327

This shouldn't be a problem for PPC64 because the ABI specifically makes this decision independent of what kind of type it is. I'm thinking it might be a problem for existing big-endian targets, though, especially if any 64-bit big-endian targets support _Float16, because _Complex _Float16 is a standard type that we need to make sure follows the official psABI. _Complex char is a Clang extension, so as long as the frontend and backend agree, I'm not particularly worried about psABI compatibility.

Anyway, since this doesn't affect PPC64, it doesn't need to hold up your patch.

This revision is now accepted and ready to land.Oct 12 2022, 10:56 AM
This revision was landed with ongoing or failed builds.Oct 16 2022, 7:03 PM
This revision was automatically updated to reflect the committed changes.