This is an archive of the discontinued LLVM Phabricator instance.

Differential D97514

[SystemZ] Assign the full space for promoted and split outgoing args
ClosedPublic

Authored by jonpa on Feb 25 2021, 4:53 PM.

Details

Reviewers

uweigand
cuviper

Commits

rG52bbbf4d4459: [SystemZ] Assign the full space for promoted and split outgoing args.

Summary

Attempts to fix https://bugs.llvm.org/show_bug.cgi?id=49322

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jonpa created this revision. Feb 25 2021, 4:53 PM

Herald added a subscriber: hiraditya. Feb 25 2021, 4:53 PM

jonpa requested review of this revision. Feb 25 2021, 4:53 PM

Herald added a project: Restricted Project. Feb 25 2021, 4:53 PM

jonpa added inline comments. Feb 25 2021, 5:23 PM

llvm/test/CodeGen/SystemZ/args-11.ll
14	Not sure why 24 extra bytes are allocated and not 20... Alignment?

alex added a subscriber: alex. Feb 25 2021, 5:51 PM

Harbormaster completed remote builds in B90934: Diff 326554. Feb 25 2021, 9:20 PM

I'm not sure if "getTypeToTransformTo(*DAG.getContext(), OrigArgVT)" results in the same type that is used by common code in all cases.

If you look at code in TargetLowering::LowerCallTo and getCopyToParts, the logic seems to be different, and explicitly goes via a set of full "registers":

MVT RegisterVT = getRegisterTypeForCallingConv(CLI.RetTy->getContext(),
                                               CLI.CallConv, VT);
unsigned NumRegs = getNumRegistersForCallingConv(CLI.RetTy->getContext(),
                                                 CLI.CallConv, VT);

and then

  if (NumParts * PartBits > ValueVT.getSizeInBits()) {
[...]
      ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits);
      Val = DAG.getNode(ExtendKind, DL, ValueVT, Val);

so I'm wondering whether it wouldn't be better to follow that model here. (E.g. by using CreateStackTemporary with a size instead of a type, and computing the size as "number-of-parts * size-of-part".)
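As a rough illustration of the "number-of-parts * size-of-part" model described above, the following standalone C++ sketch computes the slot size without using any LLVM types. The helpers `numParts` and `slotBits` are hypothetical names, not LLVM APIs, and a 64-bit part size is assumed.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of register-sized parts
// needed to hold a value of `valueBits` bits, rounding up.
constexpr std::uint64_t numParts(std::uint64_t valueBits,
                                 std::uint64_t partBits) {
  return (valueBits + partBits - 1) / partBits;
}

// Hypothetical helper: size in bits of a stack slot that holds all parts.
constexpr std::uint64_t slotBits(std::uint64_t valueBits,
                                 std::uint64_t partBits) {
  return numParts(valueBits, partBits) * partBits;
}

// Assuming 64-bit parts: i96 needs 2 parts (128 bits),
// i136 needs 3 parts (192 bits).
static_assert(slotBits(96, 64) == 128, "i96 -> 2 x 64 bits");
static_assert(slotBits(136, 64) == 192, "i136 -> 3 x 64 bits");
```

The point of sizing the slot this way is that it is derived from the same part count that determines how many stores are emitted, so the two cannot drift apart.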

Also, I think in any case it would be good to add an assertion when emitting the Store nodes that the target memory range of the store is actually within the allocated space, so that if we do get something wrong in the size calculation the result will be an internal compiler error rather than wrong code.
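The bounds check suggested here can be sketched in isolation as follows. `partFitsInSlot` is a hypothetical name used only for illustration; the example values assume an i96 argument whose slot was sized from the 96-bit type (12 bytes) versus from two 8-byte parts (16 bytes).

```cpp
#include <cstdint>

// Hypothetical check (illustrative name): does a store of `partStoreBytes`
// bytes at byte offset `partOffset` stay inside a slot of `slotStoreBytes`
// bytes?
constexpr bool partFitsInSlot(std::uint64_t partOffset,
                              std::uint64_t partStoreBytes,
                              std::uint64_t slotStoreBytes) {
  return partOffset + partStoreBytes <= slotStoreBytes;
}

// A 12-byte slot cannot hold a second 8-byte part at offset 8,
// but a 16-byte slot can.
static_assert(!partFitsInSlot(8, 8, 12), "second part overruns 12 bytes");
static_assert(partFitsInSlot(8, 8, 16), "second part fits in 16 bytes");
```

In a compiler, a failing check of this kind would turn a silent stack overwrite into an assertion failure at compile time.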

llvm/test/CodeGen/SystemZ/args-11.ll
14	Yes, the stack always needs to be 8-byte aligned, and therefore all stack allocations are rounded up to the next multiple of 8.
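The rounding described in this reply can be checked with a few lines of arithmetic. This is only a sketch: `roundUp` is a hypothetical helper, and the 160-byte fixed frame area is an assumption about the s390x ABI that happens to match the `aghi` values in the test.

```cpp
#include <cstdint>

// Round `bytes` up to the next multiple of `align` (align must be non-zero).
constexpr std::uint64_t roundUp(std::uint64_t bytes, std::uint64_t align) {
  return (bytes + align - 1) / align * align;
}

// fn2: 16 bytes for the i96 slot plus 4 for the i32 alloca is 20,
// which rounds up to 24.
static_assert(roundUp(16 + 4, 8) == 24, "20 bytes round up to 24");

// Assumption: a 160-byte fixed area precedes the locals, which would give
// the 184 seen in `aghi %r15, -184`.
static_assert(160 + roundUp(16 + 4, 8) == 184, "fn2 frame size");

// fn4: 24 bytes for the i136 slot plus 4 for the alloca is 28, rounded to 32.
static_assert(160 + roundUp(24 + 4, 8) == 192, "fn4 frame size");
```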

Updated per review.

I'm wondering whether it wouldn't be better to follow that model here. (E.g. by using CreateStackTemporary with a size instead of a type, and computing the size as "number-of-parts * size-of-part".)

I think you are right. The comment on getTypeToTransformTo() seemed to make sense to me yesterday for the promoted integer types, but I see now that with that call the second (new) test I added got an i136 -> i256 promotion (space for 4 parts). The updated patch now only creates space for 3 parts, which is also what is expected.

The rest of the cases are expected to be "regular", and the added assert will catch them if they are not...
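To make the 4-parts-versus-3-parts difference concrete, here is a small sketch comparing the two sizing strategies. It assumes the promotion rounds the bit width up to the next power of two, which is consistent with the i136 -> i256 observation above but is not taken from the LLVM sources; both helper names are hypothetical.

```cpp
#include <cstdint>

// Smallest power of two that is >= bits (bits must be non-zero).
constexpr std::uint64_t powerOfTwoAtLeast(std::uint64_t bits,
                                          std::uint64_t p = 1) {
  return p >= bits ? p : powerOfTwoAtLeast(bits, p * 2);
}

// Number of `partBits`-sized parts needed for `bits`, rounding up.
constexpr std::uint64_t partsFor(std::uint64_t bits, std::uint64_t partBits) {
  return (bits + partBits - 1) / partBits;
}

// Sizing from the promoted type: i136 -> i256, i.e. four 64-bit parts.
static_assert(powerOfTwoAtLeast(136) / 64 == 4, "promoted type: 4 parts");

// Sizing from the part count: three 64-bit parts are enough for i136.
static_assert(partsFor(136, 64) == 3, "part count: 3 parts");

// For i96 both approaches agree on two parts.
static_assert(powerOfTwoAtLeast(96) / 64 == partsFor(96, 64), "i96 agrees");
```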

jonpa added inline comments. Feb 26 2021, 10:01 AM

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1594	This was easier than calling CreateStackTemporary with a TypeSize and an alignment value.

Harbormaster completed remote builds in B91063: Diff 326725. Feb 26 2021, 12:34 PM

uweigand added inline comments. Mar 1 2021, 5:51 AM

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
1588	I'm not sure this is the correct check, it doesn't appear to match anything that is done in common code ... I think it would be better to explicitly check for the case of multiple parts (e.g. via `if (I + 1 != E && Outs[I + 1].OrigArgIndex == ArgIndex)`). If it's just a single part, I think the current approach to just use `Outs[I].ArgVT` is the best; if it is multiple parts, then we should do the NumParts * PartSize allocation as you do below.
1594	Ah, OK. Makes sense as well.
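The multiple-parts check proposed in the comment on line 1588 can be illustrated outside of LLVM with a simplified stand-in for the outgoing-argument records. `OutArg` and `isSplitArg` are hypothetical; only the two fields used by the check are modeled.

```cpp
#include <cstddef>

// Simplified stand-in for an outgoing-argument record; only the fields
// used by the check are modeled here.
struct OutArg {
  unsigned OrigArgIndex; // index of the original IR argument
  unsigned PartOffset;   // byte offset of this part within the argument
};

// True if the part at index I is followed by another part of the same
// original argument, i.e. the argument was split into multiple parts.
constexpr bool isSplitArg(const OutArg *Outs, std::size_t E, std::size_t I) {
  return I + 1 != E && Outs[I + 1].OrigArgIndex == Outs[I].OrigArgIndex;
}

// Argument 0 is split into two parts; argument 1 is a single part.
static constexpr OutArg ExampleOuts[] = {{0, 0}, {0, 8}, {1, 0}};
static_assert(isSplitArg(ExampleOuts, 3, 0), "argument 0 has a second part");
static_assert(!isSplitArg(ExampleOuts, 3, 2), "argument 1 is a single part");
```

Checking the neighbouring entry directly mirrors the condition of the loop that later stores the remaining parts, so the slot sizing and the stores are driven by the same test.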

Patch updated per review.

Harbormaster completed remote builds in B91487: Diff 327338. Mar 1 2021, 11:54 PM

LGTM, thanks!

This revision is now accepted and ready to land. Mar 2 2021, 12:27 AM

This revision was landed with ongoing or failed builds. Mar 2 2021, 10:57 AM

Closed by commit rG52bbbf4d4459: [SystemZ] Assign the full space for promoted and split outgoing args. (authored by jonpa).

This revision was automatically updated to reflect the committed changes.

jonpa added a commit: rG52bbbf4d4459: [SystemZ] Assign the full space for promoted and split outgoing args.

luismarques mentioned this in D99068: [RISCV][WIP][RFC] Fix stack slot for argument type sizes not a multiple of 64 bits (Bug 49500). Mar 22 2021, 4:06 AM

Just a heads-up about a possible pitfall of this patch, though I'm not sure it actually affects SystemZ.

In D99068 I basically ported this patch to the RISC-V target, as it was affected by the same problem that this patch fixes.
While comparing that patch with an alternative implementation (D99087) it became clear that this approach can produce stack slots with incorrect alignment.
The relevant part of the patch is `SlotVT = EVT::getIntegerVT(Ctx, PartVT.getSizeInBits() * N);`.
An example where it produces wrong results is when the original type fp128, which has 16-byte alignment, becomes i128, which for riscv32 has a lower alignment.
Perhaps more relevant for SystemZ, vectors can also suffer from similar (but even more dramatic) changes in alignment.

Although I had abandoned D99068 in favour of D99087, I've updated D99068 to address the alignment issues, in case it's useful.
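One possible way to avoid the alignment pitfall described above is to size the slot from the parts but keep the stricter of the two alignments. The sketch below only illustrates that idea and is not claimed to be what D99068 or D99087 do. `SlotInfo` and `makeSlot` are hypothetical, and the 8-byte alignment used for i128 in the example is an assumed value, since the comment only says it is lower than 16.

```cpp
#include <cstdint>

// Hypothetical description of a stack slot request.
struct SlotInfo {
  std::uint64_t SizeBytes;
  std::uint64_t AlignBytes;
};

// Sketch: size the slot from the parts, but keep the stricter of the
// original type's alignment and the integer slot type's alignment.
constexpr SlotInfo makeSlot(std::uint64_t partBytes, std::uint64_t numParts,
                            std::uint64_t origAlign, std::uint64_t intAlign) {
  return SlotInfo{partBytes * numParts,
                  origAlign > intAlign ? origAlign : intAlign};
}

// fp128 split into four 4-byte parts: the slot keeps the original 16-byte
// alignment even if the integer type would only ask for 8 (assumed value).
static_assert(makeSlot(4, 4, 16, 8).SizeBytes == 16, "four 4-byte parts");
static_assert(makeSlot(4, 4, 16, 8).AlignBytes == 16, "original alignment");
```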

Thanks for the heads-up, @luismarques! However, this is not really an issue on SystemZ, as all arguments in question have an alignment requirement of 8 bytes on our platform.

Revision Contents

Path                                             Size
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp  22 lines
llvm/test/CodeGen/SystemZ/args-11.ll             54 lines

Diff 327521

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

[...] SystemZTargetLowering::LowerCall(CallLoweringInfo &CLI, [...]
   SmallVectorImpl<ISD::InputArg> &Ins = CLI.Ins;
   SDValue Chain = CLI.Chain;
   SDValue Callee = CLI.Callee;
   bool &IsTailCall = CLI.IsTailCall;
   CallingConv::ID CallConv = CLI.CallConv;
   bool IsVarArg = CLI.IsVarArg;
   MachineFunction &MF = DAG.getMachineFunction();
   EVT PtrVT = getPointerTy(MF.getDataLayout());
+  LLVMContext &Ctx = *DAG.getContext();

   // Detect unsupported vector argument and return types.
   if (Subtarget.hasVector()) {
     VerifyVectorTypes(Outs);
     VerifyVectorTypes(Ins);
   }

   // Analyze the operands of the call, assigning locations to each operand.
   SmallVector<CCValAssign, 16> ArgLocs;
-  SystemZCCState ArgCCInfo(CallConv, IsVarArg, MF, ArgLocs, *DAG.getContext());
+  SystemZCCState ArgCCInfo(CallConv, IsVarArg, MF, ArgLocs, Ctx);
   ArgCCInfo.AnalyzeCallOperands(Outs, CC_SystemZ);

   // We don't support GuaranteedTailCallOpt, only automatically-detected
   // sibling calls.
   if (IsTailCall && !canUseSiblingCall(ArgCCInfo, ArgLocs, Outs))
     IsTailCall = false;

   // Get a count of how many bytes are to be pushed on the stack.
   unsigned NumBytes = ArgCCInfo.getNextStackOffset();

   // Mark the start of the call.
   if (!IsTailCall)
     Chain = DAG.getCALLSEQ_START(Chain, NumBytes, 0, DL);

   // Copy argument values to their designated locations.
   SmallVector<std::pair<unsigned, SDValue>, 9> RegsToPass;
   SmallVector<SDValue, 8> MemOpChains;
   SDValue StackPtr;
   for (unsigned I = 0, E = ArgLocs.size(); I != E; ++I) {
     CCValAssign &VA = ArgLocs[I];
     SDValue ArgValue = OutVals[I];

     if (VA.getLocInfo() == CCValAssign::Indirect) {
       // Store the argument in a stack slot and pass its address.
-      SDValue SpillSlot = DAG.CreateStackTemporary(Outs[I].ArgVT);
+      unsigned ArgIndex = Outs[I].OrigArgIndex;
+      EVT SlotVT;
+      if (I + 1 != E && Outs[I + 1].OrigArgIndex == ArgIndex) {
+        // Allocate the full stack space for a promoted (and split) argument.
+        Type *OrigArgType = CLI.Args[Outs[I].OrigArgIndex].Ty;
+        EVT OrigArgVT = getValueType(MF.getDataLayout(), OrigArgType);
+        MVT PartVT = getRegisterTypeForCallingConv(Ctx, CLI.CallConv, OrigArgVT);
+        unsigned N = getNumRegistersForCallingConv(Ctx, CLI.CallConv, OrigArgVT);
+        SlotVT = EVT::getIntegerVT(Ctx, PartVT.getSizeInBits() * N);
+      } else {
+        SlotVT = Outs[I].ArgVT;
+      }
+      SDValue SpillSlot = DAG.CreateStackTemporary(SlotVT);
       int FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
       MemOpChains.push_back(
           DAG.getStore(Chain, DL, ArgValue, SpillSlot,
                        MachinePointerInfo::getFixedStack(MF, FI)));
       // If the original argument was split (e.g. i128), we need
       // to store all parts of it here (and pass just one address).
-      unsigned ArgIndex = Outs[I].OrigArgIndex;
       assert (Outs[I].PartOffset == 0);
       while (I + 1 != E && Outs[I + 1].OrigArgIndex == ArgIndex) {
         SDValue PartValue = OutVals[I + 1];
         unsigned PartOffset = Outs[I + 1].PartOffset;
         SDValue Address = DAG.getNode(ISD::ADD, DL, PtrVT, SpillSlot,
                                       DAG.getIntPtrConstant(PartOffset, DL));
         MemOpChains.push_back(
             DAG.getStore(Chain, DL, PartValue, Address,
                          MachinePointerInfo::getFixedStack(MF, FI)));
+        assert((PartOffset + PartValue.getValueType().getStoreSize() <=
+                SlotVT.getStoreSize()) && "Not enough space for argument part!");
         ++I;
       }
       ArgValue = SpillSlot;
     } else
       ArgValue = convertValVTToLocVT(DAG, DL, VA, ArgValue);

     if (VA.isRegLoc())
       // Queue up the argument copies and emit them at the end.
[...] SystemZTargetLowering::LowerCall(CallLoweringInfo &CLI, [...]
   Chain = DAG.getCALLSEQ_END(Chain,
                              DAG.getConstant(NumBytes, DL, PtrVT, true),
                              DAG.getConstant(0, DL, PtrVT, true),
                              Glue, DL);
   Glue = Chain.getValue(1);

   // Assign locations to each value returned by this call.
   SmallVector<CCValAssign, 16> RetLocs;
-  CCState RetCCInfo(CallConv, IsVarArg, MF, RetLocs, *DAG.getContext());
+  CCState RetCCInfo(CallConv, IsVarArg, MF, RetLocs, Ctx);
   RetCCInfo.AnalyzeCallResult(Ins, RetCC_SystemZ);

   // Copy all of the result registers out of their specified physreg.
   for (unsigned I = 0, E = RetLocs.size(); I != E; ++I) {
     CCValAssign &VA = RetLocs[I];

     // Copy the value out, gluing the copy to the end of the call sequence.
     SDValue RetValue = DAG.getCopyFromReg(Chain, DL, VA.getLocReg(),
[...]

llvm/test/CodeGen/SystemZ/args-11.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; Test outgoing promoted arguments that are split (and passed by reference).
				;
				; RUN: llc < %s -mtriple=s390x-linux-gnu | FileCheck %s

				; The i96 arg is promoted to i128 and should get the full stack space.
				declare void @fn1(i96)
				define i32 @fn2() {
				; CHECK-LABEL: fn2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
				; CHECK-NEXT: .cfi_offset %r14, -48
				; CHECK-NEXT: .cfi_offset %r15, -40
				; CHECK-NEXT: aghi %r15, -184
				; CHECK-NEXT: .cfi_def_cfa_offset 344
				; CHECK-NEXT: mvhi 180(%r15), -1
				; CHECK-NEXT: mvghi 168(%r15), 0
				; CHECK-NEXT: la %r2, 160(%r15)
				; CHECK-NEXT: mvghi 160(%r15), 0
				; CHECK-NEXT: brasl %r14, fn1@PLT
				; CHECK-NEXT: l %r2, 180(%r15)
				; CHECK-NEXT: lmg %r14, %r15, 296(%r15)
				; CHECK-NEXT: br %r14
				%1 = alloca i32
				store i32 -1, i32* %1
				call void @fn1(i96 0)
				%2 = load i32, i32* %1
				ret i32 %2
				}

				declare void @fn3(i136)
				define i32 @fn4() {
				; CHECK-LABEL: fn4:
				; CHECK: # %bb.0:
				; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
				; CHECK-NEXT: .cfi_offset %r14, -48
				; CHECK-NEXT: .cfi_offset %r15, -40
				; CHECK-NEXT: aghi %r15, -192
				; CHECK-NEXT: .cfi_def_cfa_offset 352
				; CHECK-NEXT: mvhi 188(%r15), -1
				; CHECK-NEXT: mvghi 176(%r15), 0
				; CHECK-NEXT: mvghi 168(%r15), 0
				; CHECK-NEXT: la %r2, 160(%r15)
				; CHECK-NEXT: mvghi 160(%r15), 0
				; CHECK-NEXT: brasl %r14, fn3@PLT
				; CHECK-NEXT: l %r2, 188(%r15)
				; CHECK-NEXT: lmg %r14, %r15, 304(%r15)
				; CHECK-NEXT: br %r14
				%1 = alloca i32
				store i32 -1, i32* %1
				call void @fn3(i136 0)
				%2 = load i32, i32* %1
				ret i32 %2
				}