This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Assign the full space for promoted and split outgoing args
ClosedPublic

Authored by jonpa on Feb 25 2021, 4:53 PM.

Diff Detail

Event Timeline

jonpa created this revision.Feb 25 2021, 4:53 PM
jonpa requested review of this revision.Feb 25 2021, 4:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2021, 4:53 PM
jonpa added inline comments.Feb 25 2021, 5:23 PM
llvm/test/CodeGen/SystemZ/args-11.ll
14

Not sure why 24 extra bytes are allocated and not 20... Alignment?

alex added a subscriber: alex.Feb 25 2021, 5:51 PM

I'm not sure if "getTypeToTransformTo(*DAG.getContext(), OrigArgVT)" results in the same type that is used by common code in all cases.

If you look at code in TargetLowering::LowerCallTo and getCopyToParts, the logic seems to be different, and explicitly goes via a set of full "registers":

MVT RegisterVT = getRegisterTypeForCallingConv(CLI.RetTy->getContext(),
                                               CLI.CallConv, VT);
unsigned NumRegs = getNumRegistersForCallingConv(CLI.RetTy->getContext(),
                                                 CLI.CallConv, VT);

and then

  if (NumParts * PartBits > ValueVT.getSizeInBits()) {
[...]
      ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits);
      Val = DAG.getNode(ExtendKind, DL, ValueVT, Val);

so I'm wondering whether it wouldn't be better to follow that model here. (E.g. by using CreateStackTemporary with a size instead of a type, and computing the size as "number-of-parts * size-of-part".)

Also, I think in any case it would be good to add an assertion when emitting the Store nodes that the target memory range of the store is actually within the allocated space, so that if we do get something wrong in the size calcuation the result will be an internal compiler error rather than wrong code.

llvm/test/CodeGen/SystemZ/args-11.ll
14

Yes, the stack always needs to be 8-byte aligned, and therefore all stack allocations are rounded up to the next multiple of 8.

jonpa updated this revision to Diff 326725.Feb 26 2021, 9:59 AM

Updated per review.

I'm wondering whether it wouldn't be better to follow that model here. (E.g. by using CreateStackTemporary with a size instead of a type, and computing the size as "number-of-parts * size-of-part".)

I think you are right - reading the comment yesterday for getTypeToTransformTo() seemed to make sense to me for the promoted integer types, but I see now that the second (new) test I added got with that call an i136 -> i256 promotion (space for 4 parts) . The updated patch now only creates space for 3 parts, which is also what is expected...

The rest of the cases are expected to be "regular", and the added assert will catch them if they are not...

jonpa added inline comments.Feb 26 2021, 10:01 AM
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1594

This was easier than calling CreateStackTemporary with a TypeSize and an alignment value.

uweigand added inline comments.Mar 1 2021, 5:51 AM
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1588

I'm not sure this is the correct check, it doesn't appear to match anything that is done in common code ...

I think it would be better to explicitly check for the case of multiple parts (e.g. via if (I + 1 != E && Outs[I + 1].OrigArgIndex == ArgIndex)). If it's just a single part, I think the current approach to just use Outs[I].ArgVT is the best; if it is multiple parts, then we should do the NumParts * PartSize allocation as you do below.

1594

Ah, OK. Makes sense as well.

jonpa updated this revision to Diff 327338.Mar 1 2021, 6:35 PM

Patch updated per review.

uweigand accepted this revision.Mar 2 2021, 12:27 AM

LGTM, thanks!

This revision is now accepted and ready to land.Mar 2 2021, 12:27 AM
This revision was landed with ongoing or failed builds.Mar 2 2021, 10:57 AM
This revision was automatically updated to reflect the committed changes.

Just a heads-up about a possible pitfall of this patch, though I'm not sure it actually affects SystemZ.

In D99068 I basically ported this patch to the RISC-V target, as it was affected by the same problem that this patch fixes.
While comparing that patch with an alternative implementation (D99087) it became clear that this approach can produce stack slots with incorrect alignment.
The relevant part of the patch is SlotVT = EVT::getIntegerVT(Ctx, PartVT.getSizeInBits() * N);.
An example where it produces wrong results is when the original type fp128, which has 16-byte alignment, becomes i128, which for riscv32 has a lower alignment.
Perhaps more relevant for SystemZ, vectors can also suffer from similar (but even more dramatic) changes in alignment.

Although I had abandoned D99068 in favour of D99087, I've updated D99068 to address the alignment issues, in case it's useful.

Thanks for the heads-up, @luismarques ! However, this is not really an issue on SystemZ as all arguments in question have an alignment requirement of 8 bytes on our platform.