This is an archive of the discontinued LLVM Phabricator instance.

GlobalISel: Restructure argument lowering loop in handleAssignments
Closed · Public

Authored by arsenm on Jul 8 2020, 12:30 PM.

Details

Reviewers

aemerson
paquette
rovka
aditya_nandakumar
dsanders

Summary

This was structured in a way that implied every split argument is in
memory, or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to original
value needs to be handled later.
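The per-piece structure described above can be sketched with a toy model. This is only an illustration: `PieceLoc` and `lowerPieces` are hypothetical names standing in for `CCValAssign` and the handler calls, and nothing here is the LLVM API. It shows each piece being classified on its own, so one argument can land partly in registers and partly on the stack.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a CCValAssign: one piece of an argument lands
// either in a register or in a stack slot.
struct PieceLoc {
  bool InMemory;        // true: stack slot, false: register
  unsigned RegOrOffset; // register number, or stack offset in bytes
};

// Visits every piece independently and records what a handler would be
// asked to do for it. No piece's handling depends on its siblings.
std::vector<std::string> lowerPieces(const std::vector<PieceLoc> &Locs) {
  assert(!Locs.empty() && "an argument has at least one piece");
  std::vector<std::string> Actions;
  for (const PieceLoc &Loc : Locs) {
    if (Loc.InMemory)
      Actions.push_back("stack+" + std::to_string(Loc.RegOrOffset));
    else
      Actions.push_back("reg" + std::to_string(Loc.RegOrOffset));
  }
  return Actions;
}
```

Any merge back into the original wide value would be a separate step after this loop. Note that the patch itself still bails out when a split argument has a piece in memory; the sketch only shows the loop shape that makes supporting it possible.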

This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code. This was
intended to be NFC, but it does partially address a FIXME in the
memloc handling.

As a result, this does slightly change AArch64 handling of some
promoted arguments passed on the stack. The store will be emitted as
the smaller, piece type rather than a wider store of an anyext
value. I think this exposes a failure to merge stores later, as the
change in swifterror replaces a single 64-bit stp with 2 4-byte str.

I'm also not sure what the correct behavior is for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
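The clamping rule can be stated as a one-line helper. This is a minimal sketch; `clampMemSize` is a hypothetical name, and the sizes are plain byte counts rather than LLVM types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns the number of bytes to access: never more than the value register
// holds, even if the calling convention reports a wider memory location.
uint64_t clampMemSize(uint64_t RegSizeInBytes, uint64_t LocMemSize) {
  assert(RegSizeInBytes > 0 && "value register must have a size");
  return std::min(RegSizeInBytes, LocMemSize);
}
```

For the AMDGPU case mentioned above, an i8 value (1 byte) whose location was promoted to i16 (2 bytes) would give `clampMemSize(1, 2)`, a 1-byte access.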

Diff Detail

Event Timeline

arsenm created this revision. Jul 8 2020, 12:30 PM

Herald added a project: Restricted Project. Jul 8 2020, 12:30 PM

Herald added subscribers: hiraditya, kristof.beyls, tpr, wdng.

I don't think we should be changing the extending behavior for memlocs.

Just for the varargs case in Darwin, there's lots of code out there which incorrectly tries to interpret a sub-64-bit incoming varargs parameter as a 64-bit value. Although it should be technically correct to emit a smaller store, what happens in practice is that this code breaks for very hard-to-detect reasons (i.e. you no longer get a free zeroing of the upper bits of the stack slot). On arm64 we could force this to always explicitly zero-extend to 64 bits, but that incurs a penalty at the call site.
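The failure mode can be reproduced with a small simulation. This is a sketch under stated assumptions: `StackSlot` is a hypothetical 8-byte buffer pre-filled with stale bytes, and the wide store uses zero-extension purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models one 8-byte stack slot that still holds stale bytes from earlier use.
struct StackSlot {
  unsigned char Bytes[8];
  StackSlot() { std::memset(Bytes, 0xAA, sizeof(Bytes)); }
};

// Caller stores only the 4 bytes of the value; the rest of the slot is
// left untouched.
void storeNarrow(StackSlot &Slot, uint32_t V) {
  std::memcpy(Slot.Bytes, &V, sizeof(V));
}

// Caller widens the value to 64 bits first, then writes the whole slot.
void storeWide(StackSlot &Slot, uint32_t V) {
  uint64_t Wide = V; // zero-extension, an assumption of this sketch
  std::memcpy(Slot.Bytes, &Wide, sizeof(Wide));
}

// Callee that (incorrectly) reads the 32-bit parameter as a 64-bit value.
uint64_t readAs64(const StackSlot &Slot) {
  uint64_t V;
  std::memcpy(&V, Slot.Bytes, sizeof(V));
  return V;
}
```

After `storeNarrow`, `readAs64` picks up the stale bytes alongside the value, so the result differs from what was passed. After `storeWide`, the same read returns the original value, which is why existing code happens to work with the wider store.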

It's unfortunate but copying the DAG behavior here is likely to cause less pain.

Use LocVT, which this code should have been using in the first place; this avoids shrinking the store.

I also found the 2 forms of assignValueToAddress confusing, and the full form interface can't handle the case where a single stack slot covers multiple registers (although it seems unlikely this would ever be needed)
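To make the effect of sizing the access from the location type concrete, here is a minimal sketch. `ArgWidths` and both helper names are hypothetical, and the widths are plain bit counts rather than `EVT`/`MVT` values. It contrasts a size derived from the original type with one derived from the type the calling convention assigned.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store a value of the given bit width, rounded up.
uint64_t storeSizeInBytes(uint64_t SizeInBits) {
  assert(SizeInBits > 0 && "zero-sized values are not modelled");
  return (SizeInBits + 7) / 8;
}

// Hypothetical pair of widths for one stack-passed argument.
struct ArgWidths {
  uint64_t OrigBits; // width of the IR-level type
  uint64_t LocBits;  // width assigned by the calling convention
};

// Size computed from the original type (can shrink the store).
uint64_t memSizeFromOrigType(const ArgWidths &A) {
  return storeSizeInBytes(A.OrigBits);
}

// Size computed from the assigned location type (keeps the wider store).
uint64_t memSizeFromLocType(const ArgWidths &A) {
  return storeSizeInBytes(A.LocBits);
}
```

For an i8 argument promoted to a 32-bit location, the first helper gives 1 byte and the second gives 4 bytes.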

Herald added subscribers: kerbowa, nhaehnle, jvesely. Jul 11 2020, 2:56 PM

aemerson added inline comments. Jul 14 2020, 10:55 AM

llvm/test/CodeGen/AArch64/GlobalISel/call-lowering-i128-on-stack.ll
Line 11 (On Diff #277264): This test is actually falling back to SDAG. If you add -global-isel-abort=1 it crashes.

Remove no longer fixed aarch64 case

llvm/test/CodeGen/AArch64/GlobalISel/call-lowering-i128-on-stack.ll
Line 11 (On Diff #277264): I think enabling the fallback is a bad default for llc

aemerson added inline comments. Jul 14 2020, 12:45 PM

llvm/test/CodeGen/AArch64/GlobalISel/call-lowering-i128-on-stack.ll
Line 11 (On Diff #277264): Probably true.

ping

Sorry, didn’t realise you’d updated the patch.

This revision is now accepted and ready to land. Jul 21 2020, 8:49 PM

b98f902f1877c3d679f77645a267edc89ffcd5d6

Revision Contents

Path                                                     Size
llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h      1 line
llvm/lib/CodeGen/GlobalISel/CallLowering.cpp             141 lines
llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp    2 lines
llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp            8 lines

Diff 277936

llvm/include/llvm/CodeGen/GlobalISel/CallLowering.h

@@ 141 lines not shown @@ virtual void assignValueToAddress(Register ValVReg, Register Addr,
                                     uint64_t Size, MachinePointerInfo &MPO,
                                     CCValAssign &VA) = 0;
 
   /// An overload which takes an ArgInfo if additional information about
   /// the arg is needed.
   virtual void assignValueToAddress(const ArgInfo &Arg, Register Addr,
                                     uint64_t Size, MachinePointerInfo &MPO,
                                     CCValAssign &VA) {
+    assert(Arg.Regs.size() == 1);
     assignValueToAddress(Arg.Regs[0], Addr, Size, MPO, VA);
   }
 
   /// Handle custom values, which may be passed into one or more of \p VAs.
   /// \return The number of \p VAs that have been assigned after the first
   ///         one, and which should therefore be skipped from further
   ///         processing.
   virtual unsigned assignCustomValue(const ArgInfo &Arg,
@@ 196 lines not shown @@

llvm/lib/CodeGen/GlobalISel/CallLowering.cpp

Whitespace-only changes are hidden; context lines use the new indentation.

@@ 307 lines not shown @@ for (unsigned i = 0, e = Args.size(), j = 0; i != e; ++i, ++j) {
 
     // FIXME: Pack registers if we have more than one.
     Register ArgReg = Args[i].Regs[0];
 
     EVT OrigVT = EVT::getEVT(Args[i].Ty);
     EVT VAVT = VA.getValVT();
     const LLT OrigTy = getLLTForType(*Args[i].Ty, DL);
 
-    if (VA.isRegLoc()) {
-      if (Handler.isIncomingArgumentHandler() && VAVT != OrigVT) {
-        if (VAVT.getSizeInBits() < OrigVT.getSizeInBits()) {
     // Expected to be multiple regs for a single incoming arg.
+    // There should be Regs.size() ArgLocs per argument.
     unsigned NumArgRegs = Args[i].Regs.size();
-          if (NumArgRegs < 2)
-            return false;
 
     assert((j + (NumArgRegs - 1)) < ArgLocs.size() &&
            "Too many regs for number of args");
     for (unsigned Part = 0; Part < NumArgRegs; ++Part) {
       // There should be Regs.size() ArgLocs per argument.
       VA = ArgLocs[j + Part];
-            Handler.assignValueToReg(Args[i].Regs[Part], VA.getLocReg(), VA);
+      if (VA.isMemLoc()) {
+        // Don't currently support loading/storing a type that needs to be split
+        // to the stack. Should be easy, just not implemented yet.
+        if (NumArgRegs > 1) {
+          LLVM_DEBUG(
+            dbgs()
+            << "Load/store a split arg to/from the stack not implemented yet\n");
+          return false;
         }
-          j += NumArgRegs - 1;
-          // Merge the split registers into the expected larger result vreg
-          // of the original call.
-          MIRBuilder.buildMerge(Args[i].OrigRegs[0], Args[i].Regs);
+
+        // FIXME: Use correct address space for pointer size
+        EVT LocVT = VA.getValVT();
+        unsigned MemSize = LocVT == MVT::iPTR ? DL.getPointerSize()
+                                              : LocVT.getStoreSize();
+        unsigned Offset = VA.getLocMemOffset();
+        MachinePointerInfo MPO;
+        Register StackAddr = Handler.getStackAddress(MemSize, Offset, MPO);
+        Handler.assignValueToAddress(Args[i], StackAddr,
+                                     MemSize, MPO, VA);
+        continue;
+      }
+
+      assert(VA.isRegLoc() && "custom loc should have been handled already");
+
+      if (OrigVT.getSizeInBits() >= VAVT.getSizeInBits() ||
+          !Handler.isIncomingArgumentHandler()) {
+        // This is an argument that might have been split. There should be
+        // Regs.size() ArgLocs per argument.
+
+        // Insert the argument copies. If VAVT < OrigVT, we'll insert the merge
+        // to the original register after handling all of the parts.
+        Handler.assignValueToReg(Args[i].Regs[Part], VA.getLocReg(), VA);
         continue;
       }
+
+      // This ArgLoc covers multiple pieces, so we need to split it.
       const LLT VATy(VAVT.getSimpleVT());
       Register NewReg =
         MIRBuilder.getMRI()->createGenericVirtualRegister(VATy);
       Handler.assignValueToReg(NewReg, VA.getLocReg(), VA);
       // If it's a vector type, we either need to truncate the elements
       // or do an unmerge to get the lower block of elements.
       if (VATy.isVector() &&
           VATy.getNumElements() > OrigVT.getVectorNumElements()) {
         // Just handle the case where the VA type is 2 * original type.
         if (VATy.getNumElements() != OrigVT.getVectorNumElements() * 2) {
           LLVM_DEBUG(dbgs()
                      << "Incoming promoted vector arg has too many elts");
           return false;
         }
         auto Unmerge = MIRBuilder.buildUnmerge({OrigTy, OrigTy}, {NewReg});
         MIRBuilder.buildCopy(ArgReg, Unmerge.getReg(0));
       } else {
         MIRBuilder.buildTrunc(ArgReg, {NewReg}).getReg(0);
       }
-      } else if (!Handler.isIncomingArgumentHandler()) {
-        assert((j + (Args[i].Regs.size() - 1)) < ArgLocs.size() &&
-               "Too many regs for number of args");
-        // This is an outgoing argument that might have been split.
-        for (unsigned Part = 0; Part < Args[i].Regs.size(); ++Part) {
-          // There should be Regs.size() ArgLocs per argument.
-          VA = ArgLocs[j + Part];
-          Handler.assignValueToReg(Args[i].Regs[Part], VA.getLocReg(), VA);
-        }
-        j += Args[i].Regs.size() - 1;
-      } else {
-        Handler.assignValueToReg(ArgReg, VA.getLocReg(), VA);
     }
-    } else if (VA.isMemLoc()) {
-      // Don't currently support loading/storing a type that needs to be split
-      // to the stack. Should be easy, just not implemented yet.
-      if (Args[i].Regs.size() > 1) {
-        LLVM_DEBUG(
-            dbgs()
-            << "Load/store a split arg to/from the stack not implemented yet");
-        return false;
+
+    // Now that all pieces have been handled, re-pack any arguments into any
+    // wider, original registers.
+    if (Handler.isIncomingArgumentHandler()) {
+      if (VAVT.getSizeInBits() < OrigVT.getSizeInBits()) {
+        assert(NumArgRegs >= 2);
+
+        // Merge the split registers into the expected larger result vreg
+        // of the original call.
+        MIRBuilder.buildMerge(Args[i].OrigRegs[0], Args[i].Regs);
       }
-      MVT VT = MVT::getVT(Args[i].Ty);
-      unsigned Size = VT == MVT::iPTR ? DL.getPointerSize()
-                                      : alignTo(VT.getSizeInBits(), 8) / 8;
-      unsigned Offset = VA.getLocMemOffset();
-      MachinePointerInfo MPO;
-      Register StackAddr = Handler.getStackAddress(Size, Offset, MPO);
-      Handler.assignValueToAddress(Args[i], StackAddr, Size, MPO, VA);
-    } else {
-      // FIXME: Support byvals and other weirdness
-      return false;
     }
+
+    j += NumArgRegs - 1;
   }
 
   return true;
 }
 
 bool CallLowering::analyzeArgInfo(CCState &CCState,
                                   SmallVectorImpl<ArgInfo> &Args,
                                   CCAssignFn &AssignFnFixed,
                                   CCAssignFn &AssignFnVarArg) const {
   for (unsigned i = 0, e = Args.size(); i < e; ++i) {
@@ 110 lines not shown @@

llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp

@@ 180 lines not shown @@ void assignValueToAddress(const CallLowering::ArgInfo &Arg, Register Addr,
                             uint64_t Size, MachinePointerInfo &MPO,
                             CCValAssign &VA) override {
     unsigned MaxSize = Size * 8;
     // For varargs, we always want to extend them to 8 bytes, in which case
     // we disable setting a max.
     if (!Arg.IsFixed)
       MaxSize = 0;
 
+    assert(Arg.Regs.size() == 1);
+
     Register ValVReg = VA.getLocInfo() != CCValAssign::LocInfo::FPExt
                            ? extendRegister(Arg.Regs[0], VA, MaxSize)
                            : Arg.Regs[0];
 
     // If we extended we might need to adjust the MMO's Size.
     const LLT RegTy = MRI.getType(ValVReg);
     if (RegTy.getSizeInBytes() > Size)
       Size = RegTy.getSizeInBytes();
@@ 853 lines not shown @@

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp

@@ 138 lines not shown @@ case CCValAssign::LocInfo::AExt: {
       break;
     }
     default:
       MIRBuilder.buildCopy(ValVReg, PhysReg);
       break;
     }
   }
 
-  void assignValueToAddress(Register ValVReg, Register Addr, uint64_t Size,
+  void assignValueToAddress(Register ValVReg, Register Addr, uint64_t MemSize,
                             MachinePointerInfo &MPO, CCValAssign &VA) override {
     MachineFunction &MF = MIRBuilder.getMF();
 
+    // The reported memory location may be wider than the value.
+    const LLT RegTy = MRI.getType(ValVReg);
+    MemSize = std::min(static_cast<uint64_t>(RegTy.getSizeInBytes()), MemSize);
+
     // FIXME: Get alignment
     auto MMO = MF.getMachineMemOperand(
-        MPO, MachineMemOperand::MOLoad | MachineMemOperand::MOInvariant, Size,
+        MPO, MachineMemOperand::MOLoad | MachineMemOperand::MOInvariant, MemSize,
         inferAlignFromPtrInfo(MF, MPO));
     MIRBuilder.buildLoad(ValVReg, Addr, *MMO);
   }
 
   /// How the physical register gets marked varies between formal
   /// parameters (it's a basic-block live-in), and a call instruction
   /// (it's an implicit-def of the BL).
   virtual void markPhysRegUsed(unsigned PhysReg) = 0;
@@ 1,108 lines not shown @@