This is an archive of the discontinued LLVM Phabricator instance.

Differential D70496

[AArch64] Fix issues with large arrays on stack
ClosedPublic

Authored by kiranchandramohan on Nov 20 2019, 7:57 AM.

Details

Reviewers

sdesmalen
efriedma
fhahn
aemerson

Commits

rG965ed1e974e8: [AArch64] Fix issues with large arrays on stack

Summary

This patch fixes a few issues that arise when large arrays are allocated on the stack. Currently, clang behaves inconsistently: in debug builds there is an assertion failure when the array size on the stack is around 2GB, but no assertion when it is around 8GB. In release builds there is no assertion; compilation succeeds but generates incorrect code. The incorrect code is caused by using int/unsigned int instead of their 64-bit counterparts. This patch:

Removes the assertion in frame legality check.
Converts int/unsigned int in some places to the 64-bit variants. This helps in generating correct code and removes the inconsistent behaviour.
Adds a test which runs without optimisations.
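
As a rough illustration of why 32-bit types can misbehave here (a sketch, not code from the patch): truncating a 64-bit byte count to 32 bits keeps only the low 32 bits, so a size of exactly 8GiB wraps to 0, while 2GiB lands on the sign bit and reads as negative once treated as a signed int. The helper names below are hypothetical, and whether this is exactly what makes the assertion fire at 2GB but not at 8GB is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper: the low 32 bits of a 64-bit byte count, i.e. what
// survives when the value is stored in a 32-bit unsigned integer.
constexpr uint32_t truncateTo32(uint64_t Bytes) {
  return static_cast<uint32_t>(Bytes);
}

// Hypothetical helper: true if the truncated value has its sign bit set and
// would therefore read as negative when reinterpreted as a 32-bit int.
constexpr bool looksNegativeAsInt(uint64_t Bytes) {
  return (truncateTo32(Bytes) & 0x80000000u) != 0;
}

constexpr uint64_t GiB = uint64_t{1} << 30;

static_assert(truncateTo32(8 * GiB) == 0, "8GiB wraps to zero in 32 bits");
static_assert(looksNegativeAsInt(2 * GiB), "2GiB sets the sign bit");
static_assert(!looksNegativeAsInt(8 * GiB), "8GiB does not look negative");
```

Keeping the value in a 64-bit type end to end avoids both effects.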

Diff Detail

Event Timeline

kiranchandramohan created this revision. Nov 20 2019, 7:57 AM

Herald added a project: Restricted Project. Nov 20 2019, 7:57 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls.

kiranchandramohan added reviewers: sdesmalen, efriedma, fhahn, aemerson. Nov 20 2019, 9:27 AM

fpetrogalli added a subscriber: fpetrogalli. Nov 20 2019, 9:54 AM

fpetrogalli added inline comments.

llvm/test/CodeGen/AArch64/large-stack.ll
2	Why can't you use the same prefix?
25	Do you need attribute #0 and #1?

The general approach here seems fine, I guess. At least, we shouldn't crash or miscompile, and generating code to allocate the full requested stack is probably the easiest way to do that without worrying about weird edge cases. If we really cared about this, we might want to consider some heuristics to "split" the stack so spill slots and small variables are cheap to access, but I doubt that's actually important in practice.

Realistically, I can't imagine any good result from code allocating an 8GB array on the stack; no operating system is going to allocate enough stack to make this work, at least by default.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1394–1395	Indentation here needs to be fixed.
llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
43	Is there a potential problem here as well?
llvm/test/CodeGen/AArch64/large-stack.ll
31	What sequence are we generating to adjust the stack pointer? For a large array that fits in 32 bits, I see a long sequence of "sub sp, sp, #4095, lsl #12"; are we doing the same thing here?

Thanks @fpetrogalli, @eli.friedman for the review.

I have responded to the review comments. I have a few questions and will update with a new patch after your response to these questions.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1394–1395	Will fix, thanks.
llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
43	I don't know whether I can answer that. Typically the ABI prohibits passing large objects by value. For example, the AArch64 ABI disallows passing composite types larger than 16 bytes by value: "If the argument type is a Composite Type that is larger than 16 bytes, then the argument is copied to memory allocated by the caller and the argument is replaced by a pointer to the copy." If this refers to arguments only, then it is probably not needed. Do you feel there are cases that can cause this variable to hold a value that requires 64 bits?
llvm/test/CodeGen/AArch64/large-stack.ll
2	So that I can progressively construct the test: first check the spills, then the stack, and then accessing the value. Are you recommending combining them?
25	Maybe not. But these attributes are generated by clang by default, so the test matches the IR that clang generates. Also, the test will need updating, since the attribute set includes some attributes ("no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf") that can change the generated assembly. What is the standard for these tests? Do you omit attributes?
31	Yes, we are doing the same thing here. That sequence is 128 long here.
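
To make the length of such a sequence easier to reason about, here is a simplified model (an assumption-laden sketch, not LLVM's actual emitFrameOffset logic). An AArch64 add/sub immediate is 12 bits wide with an optional left shift by 12, so a single "sub sp, sp, #4095, lsl #12" moves SP by at most 4095 * 4096 bytes. The struct and function names are hypothetical, and the sketch does not try to reproduce the exact 128-instruction figure reported for the test.

```cpp
#include <cstdint>

// Largest adjustment one shifted add/sub immediate can encode:
// a 12-bit immediate (4095) shifted left by 12.
constexpr uint64_t MaxShiftedImm = uint64_t{0xfff} << 12;

// Hypothetical summary of a stack-pointer adjustment under this model.
struct SpAdjustPlan {
  uint64_t FullChunks; // number of "sub sp, sp, #4095, lsl #12" steps
  uint64_t Remainder;  // bytes left for the final, smaller adjustment(s)
};

// Split a byte count into maximal shifted-immediate steps plus a remainder.
constexpr SpAdjustPlan planSpAdjust(uint64_t Bytes) {
  return {Bytes / MaxShiftedImm, Bytes % MaxShiftedImm};
}
```

Under this model a 2GiB frame needs 128 maximal steps plus a 512KiB remainder, which gives a feel for why the emitted sequence grows linearly with the frame size.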

fpetrogalli added inline comments. Nov 22 2019, 8:09 AM

llvm/test/CodeGen/AArch64/large-stack.ll
2	I can see value in having multiple prefixes when building the test. But multiple prefixes are designed to be used when executing multiple RUN lines on the same source. If you remove them, the tests will be clearer. If you need to highlight what you are checking, I'd rather add plain comments inline where the CHECKs are:
; Checking spill
; CHECK: stp x[[SPILL_REG1:[0-9]+]], x[[SPILL_REG2:[0-9]+]], [sp, #-[[SPILL_OFFSET1:[0-9]+]]]
; CHECK-NEXT: str x[[SPILL_REG3:[0-9]+]], [sp, #[[SPILL_OFFSET2:[0-9]+]]]
; Checking frame
; CHECK: mov x[[FRAME:[0-9]+]], sp
; Checking setstack count 128 bit
; CHECK: sub sp, sp, #[[STACK1:[0-9]+]], lsl #[[SHIFT:[0-9]+]]
; ...
25	If you need attributes, you should reduce them to the minimal set needed for the test. If `"no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"` is all you need, remove the rest (and merge the two attributes into a single attribute). Less is always better! :)

efriedma added inline comments. Nov 22 2019, 1:17 PM

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h
43	I guess not in practice; functions can't have that many arguments.

Addressed review comments by @fpetrogalli and @eli.friedman,

Fixed formatting in llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
Updated test to work with minimal attributes and removed prefixes for checks

kiranchandramohan marked 2 inline comments as done. Nov 25 2019, 4:17 AM

kiranchandramohan added inline comments.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1394–1395	Done.
llvm/test/CodeGen/AArch64/large-stack.ll
25	Have now updated the test to have only the necessary attributes and also removed prefixes.

Does this patch need additional reviews or changes?

Please give reviewers at least a few days before you start pinging a patch. (I'll try to get to this today.)

Apologies, will wait for the reviews.

I'm a little concerned we've missed some conversion somewhere, but I don't have any good suggestion for that; we currently don't use integer conversion warnings in LLVM, and that would be a giant project to change.

There's some potential here that the arithmetic could overflow, even in 64 bits, but I'm not sure how we can handle that cleanly; I think it's okay to put off solving that issue.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
920	Indentation.
1037	This should also be uint64_t?
llvm/test/CodeGen/AArch64/large-stack.ll
29	Using FileCheck variables for values that will never change isn't helpful. For example, "lsl #[[SHIFT:[0-9]+]]" is actually always "lsl #12", because that's the only legal value. Is there a reason some of these CHECKs aren't CHECK-NEXT?
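
On the remark above that the arithmetic could overflow even in 64 bits: one way to detect this for a size computed as element size times element count is to check the multiplication before performing it. This is only a sketch of the general technique, not something the patch does; the function name is hypothetical and std::optional requires C++17.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: multiply an element size by an element count and
// report overflow instead of silently wrapping modulo 2^64.
inline std::optional<uint64_t> checkedAllocaSize(uint64_t ElementSize,
                                                 uint64_t NumElements) {
  if (ElementSize != 0 &&
      NumElements > std::numeric_limits<uint64_t>::max() / ElementSize)
    return std::nullopt; // the product does not fit in 64 bits
  return ElementSize * NumElements;
}
```

A caller would then have to decide what to do with the empty result, for example emitting a diagnostic, which is the part the review leaves open.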

kiranchandramohan updated this revision to Diff 231526. Nov 29 2019, 6:12 AM

In general, I also worry that I might have missed some checks. I was hoping to get some pointers on how to run some tests so that we can minimise this. At the same time, I also feel that this change should not cause regressions.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
1037	Initially missed out since this portion of the code seems to error out. But you are correct that it has to be int64_t to prevent overflows.
llvm/test/CodeGen/AArch64/large-stack.ll
29	I have removed the variables from here. A few of these are CHECKs and not CHECK-NEXT.
Skips over some cfi attributes
; CHECK: sub x[[INDEX:[0-9]+]], x[[FRAME]], #8
Skips over some set up for calling the print function
; CHECK: bl printf
The CHECK-COUNTs did not seem to have a way of combining with NEXT.
; CHECK-COUNT-128: sub sp, sp, #[[STACK1:[0-9]+]], lsl #12
; CHECK-COUNT-128: add sp, sp, #[[STACK1]], lsl #12
Please let me know if this is not OK.

LGTM

This revision is now accepted and ready to land. Dec 3 2019, 5:39 PM

Thanks @eli.friedman.
@fpetrogalli are you OK with the changes I made to your suggestions? Might need some handholding to land this patch.

In D70496#1770521, @kiranchandramohan wrote:

@fpetrogalli are you OK with the changes I made to your suggestions? Might need some handholding to land this patch.

I am happy, with a nit: I think that CHECK-COUNT-128 is ignored? Or is COUNT-128 a specific token for the CHECK prefix?

In D70496#1770739, @fpetrogalli wrote:

In D70496#1770521, @kiranchandramohan wrote:

@fpetrogalli are you OK with the changes I made to your suggestions? Might need some handholding to land this patch.

I am happy, with a nit: I think that CHECK-COUNT-128 is ignored? Or is COUNT-128 a specific token for the CHECK prefix?

Oh, I didn't know about this: https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-count-directive

The patch LGTM. Thanks!

Closed by commit rG965ed1e974e8: [AArch64] Fix issues with large arrays on stack (authored by kiranchandramohan). Dec 10 2019, 3:54 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/MachineFrameInfo.h	2 lines
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp	6 lines
llvm/lib/CodeGen/MachineFrameInfo.cpp	8 lines
llvm/lib/Target/AArch64/AArch64FrameLowering.h	4 lines
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp	38 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp	2 lines
llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h	6 lines
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp	1 line
llvm/test/CodeGen/AArch64/large-stack.ll	49 lines

Diff 230874

llvm/include/llvm/CodeGen/MachineFrameInfo.h

[547 lines not shown]	public:
/// all of the fixed size frame objects. This is only valid after		/// all of the fixed size frame objects. This is only valid after
/// Prolog/Epilog code insertion has finalized the stack frame layout.		/// Prolog/Epilog code insertion has finalized the stack frame layout.
uint64_t getStackSize() const { return StackSize; }		uint64_t getStackSize() const { return StackSize; }

/// Set the size of the stack.		/// Set the size of the stack.
void setStackSize(uint64_t Size) { StackSize = Size; }		void setStackSize(uint64_t Size) { StackSize = Size; }

/// Estimate and return the size of the stack frame.		/// Estimate and return the size of the stack frame.
unsigned estimateStackSize(const MachineFunction &MF) const;		uint64_t estimateStackSize(const MachineFunction &MF) const;

/// Return the correction for frame offsets.		/// Return the correction for frame offsets.
int getOffsetAdjustment() const { return OffsetAdjustment; }		int getOffsetAdjustment() const { return OffsetAdjustment; }

/// Set the correction for frame offsets.		/// Set the correction for frame offsets.
void setOffsetAdjustment(int Adj) { OffsetAdjustment = Adj; }		void setOffsetAdjustment(int Adj) { OffsetAdjustment = Adj; }

/// Return the alignment in bytes that this function must be aligned to,		/// Return the alignment in bytes that this function must be aligned to,
[250 lines not shown]

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp

[219 lines not shown]	ArrayRef<Register> IRTranslator::getOrCreateVRegs(const Value &Val) {

return *VRegs;		return *VRegs;
}		}

int IRTranslator::getOrCreateFrameIndex(const AllocaInst &AI) {		int IRTranslator::getOrCreateFrameIndex(const AllocaInst &AI) {
if (FrameIndices.find(&AI) != FrameIndices.end())		if (FrameIndices.find(&AI) != FrameIndices.end())
return FrameIndices[&AI];		return FrameIndices[&AI];

unsigned ElementSize = DL->getTypeAllocSize(AI.getAllocatedType());		uint64_t ElementSize = DL->getTypeAllocSize(AI.getAllocatedType());
unsigned Size =		uint64_t Size =
ElementSize * cast<ConstantInt>(AI.getArraySize())->getZExtValue();		ElementSize * cast<ConstantInt>(AI.getArraySize())->getZExtValue();

// Always allocate at least one byte.		// Always allocate at least one byte.
Size = std::max(Size, 1u);		Size = std::max<uint64_t>(Size, 1u);

unsigned Alignment = AI.getAlignment();		unsigned Alignment = AI.getAlignment();
if (!Alignment)		if (!Alignment)
Alignment = DL->getABITypeAlignment(AI.getAllocatedType());		Alignment = DL->getABITypeAlignment(AI.getAllocatedType());

int &FI = FrameIndices[&AI];		int &FI = FrameIndices[&AI];
FI = MF->getFrameInfo().CreateStackObject(Size, Alignment, false, &AI);		FI = MF->getFrameInfo().CreateStackObject(Size, Alignment, false, &AI);
return FI;		return FI;
[2,179 lines not shown]
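
A note on the `std::max<uint64_t>(Size, 1u)` change in the hunk above: once Size becomes a uint64_t, the two arguments no longer share a type, so the template argument has to be spelled out. The sketch below shows this in isolation; the function name is hypothetical and is not part of IRTranslator.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper mirroring the "always allocate at least one byte" step.
inline uint64_t atLeastOneByte(uint64_t Size) {
  // std::max(Size, 1u) would not compile here: template argument deduction
  // sees uint64_t for the first argument and unsigned int for the second.
  // Naming the type converts both arguments to uint64_t before comparing.
  return std::max<uint64_t>(Size, 1u);
}
```

An alternative with the same effect would be to write the literal as uint64_t(1), so that both arguments already have the same type.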

llvm/lib/CodeGen/MachineFrameInfo.cpp

[127 lines not shown]	BitVector MachineFrameInfo::getPristineRegs(const MachineFunction &MF) const {
// Saved CSRs are not pristine.		// Saved CSRs are not pristine.
for (auto &I : getCalleeSavedInfo())		for (auto &I : getCalleeSavedInfo())
for (MCSubRegIterator S(I.getReg(), TRI, true); S.isValid(); ++S)		for (MCSubRegIterator S(I.getReg(), TRI, true); S.isValid(); ++S)
BV.reset(*S);		BV.reset(*S);

return BV;		return BV;
}		}

unsigned MachineFrameInfo::estimateStackSize(const MachineFunction &MF) const {		uint64_t MachineFrameInfo::estimateStackSize(const MachineFunction &MF) const {
const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();		const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();		const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
unsigned MaxAlign = getMaxAlignment();		unsigned MaxAlign = getMaxAlignment();
int Offset = 0;		int64_t Offset = 0;

// This code is very, very similar to PEI::calculateFrameObjectOffsets().		// This code is very, very similar to PEI::calculateFrameObjectOffsets().
// It really should be refactored to share code. Until then, changes		// It really should be refactored to share code. Until then, changes
// should keep in mind that there's tight coupling between the two.		// should keep in mind that there's tight coupling between the two.

for (int i = getObjectIndexBegin(); i != 0; ++i) {		for (int i = getObjectIndexBegin(); i != 0; ++i) {
// Only estimate stack size of default stack.		// Only estimate stack size of default stack.
if (getStackID(i) != TargetStackID::Default)		if (getStackID(i) != TargetStackID::Default)
continue;		continue;
int FixedOff = -getObjectOffset(i);		int64_t FixedOff = -getObjectOffset(i);
if (FixedOff > Offset) Offset = FixedOff;		if (FixedOff > Offset) Offset = FixedOff;
}		}
for (unsigned i = 0, e = getObjectIndexEnd(); i != e; ++i) {		for (unsigned i = 0, e = getObjectIndexEnd(); i != e; ++i) {
// Only estimate stack size of live objects on default stack.		// Only estimate stack size of live objects on default stack.
if (isDeadObjectIndex(i) || getStackID(i) != TargetStackID::Default)		if (isDeadObjectIndex(i) || getStackID(i) != TargetStackID::Default)
continue;		continue;
Offset += getObjectSize(i);		Offset += getObjectSize(i);
unsigned Align = getObjectAlignment(i);		unsigned Align = getObjectAlignment(i);
[19 lines not shown]	else
StackAlign = TFI->getTransientStackAlignment();		StackAlign = TFI->getTransientStackAlignment();

// If the frame pointer is eliminated, all frame offsets will be relative to		// If the frame pointer is eliminated, all frame offsets will be relative to
// SP not FP. Align to MaxAlign so this works.		// SP not FP. Align to MaxAlign so this works.
StackAlign = std::max(StackAlign, MaxAlign);		StackAlign = std::max(StackAlign, MaxAlign);
unsigned AlignMask = StackAlign - 1;		unsigned AlignMask = StackAlign - 1;
Offset = (Offset + AlignMask) & ~uint64_t(AlignMask);		Offset = (Offset + AlignMask) & ~uint64_t(AlignMask);

return (unsigned)Offset;		return (uint64_t)Offset;
}		}

void MachineFrameInfo::computeMaxCallFrameSize(const MachineFunction &MF) {		void MachineFrameInfo::computeMaxCallFrameSize(const MachineFunction &MF) {
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
unsigned FrameSetupOpcode = TII.getCallFrameSetupOpcode();		unsigned FrameSetupOpcode = TII.getCallFrameSetupOpcode();
unsigned FrameDestroyOpcode = TII.getCallFrameDestroyOpcode();		unsigned FrameDestroyOpcode = TII.getCallFrameDestroyOpcode();
assert(FrameSetupOpcode != ~0u && FrameDestroyOpcode != ~0u &&		assert(FrameSetupOpcode != ~0u && FrameDestroyOpcode != ~0u &&
"Can only compute MaxCallFrameSize if Setup/Destroy opcode are known");		"Can only compute MaxCallFrameSize if Setup/Destroy opcode are known");
[64 lines not shown]
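
The unchanged line `Offset = (Offset + AlignMask) & ~uint64_t(AlignMask);` in estimateStackSize above only pays off once Offset itself is 64 bits wide, and the cast before the inversion matters. The following sketch contrasts the two forms under the assumption that unsigned is 32 bits; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round Offset up to a multiple of Align (assumed to be a power of two),
// widening the mask to 64 bits before inverting it.
inline uint64_t alignUp64(uint64_t Offset, unsigned Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  unsigned AlignMask = Align - 1;
  return (Offset + AlignMask) & ~uint64_t(AlignMask);
}

// Deliberately wrong variant for comparison: ~AlignMask is computed in 32
// bits and then zero-extended, so bits 32..63 of the result are cleared.
inline uint64_t alignUpNarrowMask(uint64_t Offset, unsigned Align) {
  unsigned AlignMask = Align - 1;
  return (Offset + AlignMask) & ~AlignMask;
}
```

For an offset of 8GiB + 5 and a 16-byte alignment, the first form yields 8GiB + 16, while the second drops the upper bits and yields 16.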

llvm/lib/Target/AArch64/AArch64FrameLowering.h

[38 lines not shown]	public:
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override;		bool canUseAsPrologue(const MachineBasicBlock &MBB) const override;

int getFrameIndexReference(const MachineFunction &MF, int FI,		int getFrameIndexReference(const MachineFunction &MF, int FI,
unsigned &FrameReg) const override;		unsigned &FrameReg) const override;
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI,		StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI,
unsigned &FrameReg, bool PreferFP,		unsigned &FrameReg, bool PreferFP,
bool ForSimm) const;		bool ForSimm) const;
StackOffset resolveFrameOffsetReference(const MachineFunction &MF,		StackOffset resolveFrameOffsetReference(const MachineFunction &MF,
int ObjectOffset, bool isFixed,		int64_t ObjectOffset, bool isFixed,
bool isSVE, unsigned &FrameReg,		bool isSVE, unsigned &FrameReg,
bool PreferFP, bool ForSimm) const;		bool PreferFP, bool ForSimm) const;
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,		bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;

bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,		bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
[40 lines not shown]	bool isSupportedStackID(TargetStackID::Value ID) const override {
case TargetStackID::SVEVector:		case TargetStackID::SVEVector:
case TargetStackID::NoAlloc:		case TargetStackID::NoAlloc:
return true;		return true;
}		}
}		}

private:		private:
bool shouldCombineCSRLocalStackBump(MachineFunction &MF,		bool shouldCombineCSRLocalStackBump(MachineFunction &MF,
unsigned StackBumpBytes) const;		uint64_t StackBumpBytes) const;

int64_t estimateSVEStackObjectOffsets(MachineFrameInfo &MF) const;		int64_t estimateSVEStackObjectOffsets(MachineFrameInfo &MF) const;
int64_t assignSVEStackObjectOffsets(MachineFrameInfo &MF,		int64_t assignSVEStackObjectOffsets(MachineFrameInfo &MF,
int &MinCSFrameIndex,		int &MinCSFrameIndex,
int &MaxCSFrameIndex) const;		int &MaxCSFrameIndex) const;
};		};

} // End llvm namespace		} // End llvm namespace

#endif		#endif

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp

[221 lines not shown]	if (!EnableRedZone)
return false;		return false;
// Don't use the red zone if the function explicitly asks us not to.		// Don't use the red zone if the function explicitly asks us not to.
// This is typically used for kernel code.		// This is typically used for kernel code.
if (MF.getFunction().hasFnAttribute(Attribute::NoRedZone))		if (MF.getFunction().hasFnAttribute(Attribute::NoRedZone))
return false;		return false;

const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
unsigned NumBytes = AFI->getLocalStackSize();		uint64_t NumBytes = AFI->getLocalStackSize();

return !(MFI.hasCalls() || hasFP(MF) || NumBytes > 128 ||		return !(MFI.hasCalls() || hasFP(MF) || NumBytes > 128 ||
getSVEStackSize(MF));		getSVEStackSize(MF));
}		}

/// hasFP - Return true if the specified function should have a dedicated frame		/// hasFP - Return true if the specified function should have a dedicated frame
/// pointer register.		/// pointer register.
bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {		bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {
[185 lines not shown]	bool AArch64FrameLowering::canUseAsPrologue(
if (!RegInfo->needsStackRealignment(*MF))		if (!RegInfo->needsStackRealignment(*MF))
return true;		return true;
// Otherwise, we can use any block as long as it has a scratch register		// Otherwise, we can use any block as long as it has a scratch register
// available.		// available.
return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;		return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
}		}

static bool windowsRequiresStackProbe(MachineFunction &MF,		static bool windowsRequiresStackProbe(MachineFunction &MF,
unsigned StackSizeInBytes) {		uint64_t StackSizeInBytes) {
const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();		const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
if (!Subtarget.isTargetWindows())		if (!Subtarget.isTargetWindows())
return false;		return false;
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();
// TODO: When implementing stack protectors, take that into account		// TODO: When implementing stack protectors, take that into account
// for the probe threshold.		// for the probe threshold.
unsigned StackProbeSize = 4096;		unsigned StackProbeSize = 4096;
if (F.hasFnAttribute("stack-probe-size"))		if (F.hasFnAttribute("stack-probe-size"))
F.getFnAttribute("stack-probe-size")		F.getFnAttribute("stack-probe-size")
.getValueAsString()		.getValueAsString()
.getAsInteger(0, StackProbeSize);		.getAsInteger(0, StackProbeSize);
return (StackSizeInBytes >= StackProbeSize) &&		return (StackSizeInBytes >= StackProbeSize) &&
!F.hasFnAttribute("no-stack-arg-probe");		!F.hasFnAttribute("no-stack-arg-probe");
}		}

bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(		bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
MachineFunction &MF, unsigned StackBumpBytes) const {		MachineFunction &MF, uint64_t StackBumpBytes) const {
AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();		const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();		const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();

if (MF.getFunction().hasOptSize())		if (MF.getFunction().hasOptSize())
return false;		return false;

[265 lines not shown]	static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
}		}

return std::prev(MBB.erase(MBBI));		return std::prev(MBB.erase(MBBI));
}		}

// Fixup callee-save register save/restore instructions to take into account		// Fixup callee-save register save/restore instructions to take into account
// combined SP bump by adding the local stack size to the stack offsets.		// combined SP bump by adding the local stack size to the stack offsets.
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,		static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
unsigned LocalStackSize,		uint64_t LocalStackSize,
bool NeedsWinCFI,		bool NeedsWinCFI,
bool *HasWinCFI) {		bool *HasWinCFI) {
if (AArch64InstrInfo::isSEHInstruction(MI))		if (AArch64InstrInfo::isSEHInstruction(MI))
return;		return;

unsigned Opc = MI.getOpcode();		unsigned Opc = MI.getOpcode();

// Ignore instructions that do not operate on SP, i.e. shadow call stack		// Ignore instructions that do not operate on SP, i.e. shadow call stack
[171 lines not shown]	void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
const StackOffset &SVEStackSize = getSVEStackSize(MF);		const StackOffset &SVEStackSize = getSVEStackSize(MF);

// getStackSize() includes all the locals in its size calculation. We don't		// getStackSize() includes all the locals in its size calculation. We don't
// include these locals when computing the stack size of a funclet, as they		// include these locals when computing the stack size of a funclet, as they
// are allocated in the parent's stack frame and accessed via the frame		// are allocated in the parent's stack frame and accessed via the frame
// pointer from the funclet. We only save the callee saved registers in the		// pointer from the funclet. We only save the callee saved registers in the
// funclet, which are really the callee saved registers of the parent		// funclet, which are really the callee saved registers of the parent
// function, including the funclet.		// function, including the funclet.
int NumBytes = IsFunclet ? (int)getWinEHFuncletFrameSize(MF)		int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
: (int)MFI.getStackSize();		: MFI.getStackSize();
		efriedma: Indentation.
if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {		if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
assert(!HasFP && "unexpected function without stack frame but with FP");		assert(!HasFP && "unexpected function without stack frame but with FP");
assert(!SVEStackSize &&		assert(!SVEStackSize &&
"unexpected function without stack frame but with SVE objects");		"unexpected function without stack frame but with SVE objects");
// All of the stack allocation is for locals.		// All of the stack allocation is for locals.
AFI->setLocalStackSize(NumBytes);		AFI->setLocalStackSize(NumBytes);
if (!NumBytes)		if (!NumBytes)
return;		return;
[85 lines not shown]	if (F.hasPersonalityFn()) {
}		}
}		}

return;		return;
}		}

if (HasFP) {		if (HasFP) {
// Only set up FP if we actually need to.		// Only set up FP if we actually need to.
int FPOffset = isTargetDarwin(MF) ? (AFI->getCalleeSavedStackSize() - 16) : 0;		int64_t FPOffset = isTargetDarwin(MF) ? (AFI->getCalleeSavedStackSize() - 16) : 0;

if (CombineSPBump)		if (CombineSPBump)
FPOffset += AFI->getLocalStackSize();		FPOffset += AFI->getLocalStackSize();

// Issue sub fp, sp, FPOffset or		// Issue sub fp, sp, FPOffset or
// mov fp,sp when FPOffset is zero.		// mov fp,sp when FPOffset is zero.
// Note: All stores of callee-saved registers are marked as "FrameSetup".		// Note: All stores of callee-saved registers are marked as "FrameSetup".
// This code marks the instruction(s) that set the FP also.		// This code marks the instruction(s) that set the FP also.
emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,		emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
{FPOffset, MVT::i8}, TII, MachineInstr::FrameSetup, false,		{FPOffset, MVT::i8}, TII, MachineInstr::FrameSetup, false,
NeedsWinCFI, &HasWinCFI);		NeedsWinCFI, &HasWinCFI);
}		}

if (windowsRequiresStackProbe(MF, NumBytes)) {		if (windowsRequiresStackProbe(MF, NumBytes)) {
uint32_t NumWords = NumBytes >> 4;		uint32_t NumWords = NumBytes >> 4;
		efriedma: This should also be uint64_t?
		kiranchandramohan: Initially missed out since this portion of the code seems to error out. But you are correct that it has to be int64_t to prevent overflows.
if (NeedsWinCFI) {		if (NeedsWinCFI) {
HasWinCFI = true;		HasWinCFI = true;
// alloc_l can hold at most 256MB, so assume that NumBytes doesn't		// alloc_l can hold at most 256MB, so assume that NumBytes doesn't
// exceed this amount. We need to move at most 2^24 - 1 into x15.		// exceed this amount. We need to move at most 2^24 - 1 into x15.
// This is at most two instructions, MOVZ follwed by MOVK.		// This is at most two instructions, MOVZ follwed by MOVK.
// TODO: Fix to use multiple stack alloc unwind codes for stacks		// TODO: Fix to use multiple stack alloc unwind codes for stacks
// exceeding 256MB in size.		// exceeding 256MB in size.
if (NumBytes >= (1 << 28))		if (NumBytes >= (1 << 28))
[340 lines not shown]	if (MBB.end() != MBBI) {
DL = MBBI->getDebugLoc();		DL = MBBI->getDebugLoc();
unsigned RetOpcode = MBBI->getOpcode();		unsigned RetOpcode = MBBI->getOpcode();
IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi ||		IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi ||
RetOpcode == AArch64::TCRETURNri ||		RetOpcode == AArch64::TCRETURNri ||
RetOpcode == AArch64::TCRETURNriBTI;		RetOpcode == AArch64::TCRETURNriBTI;
IsFunclet = isFuncletReturnInstr(*MBBI);		IsFunclet = isFuncletReturnInstr(*MBBI);
}		}

int NumBytes = IsFunclet ? (int)getWinEHFuncletFrameSize(MF)		int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
: MFI.getStackSize();		: MFI.getStackSize();
		efriedma: Indentation here needs to be fixed.
		kiranchandramohan: Will fix, thanks.
		kiranchandramohan: Done.
AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();

// All calls are tail calls in GHC calling conv, and functions have no		// All calls are tail calls in GHC calling conv, and functions have no
// prologue/epilogue.		// prologue/epilogue.
if (MF.getFunction().getCallingConv() == CallingConv::GHC)		if (MF.getFunction().getCallingConv() == CallingConv::GHC)
return;		return;

// Initial and residual are named for consistency with the prologue. Note that		// Initial and residual are named for consistency with the prologue. Note that
[172 lines not shown]	void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
if (!hasFP(MF)) {		if (!hasFP(MF)) {
bool RedZone = canUseRedZone(MF);		bool RedZone = canUseRedZone(MF);
// If this was a redzone leaf function, we don't need to restore the		// If this was a redzone leaf function, we don't need to restore the
// stack pointer (but we may need to pop stack args for fastcc).		// stack pointer (but we may need to pop stack args for fastcc).
if (RedZone && AfterCSRPopSize == 0)		if (RedZone && AfterCSRPopSize == 0)
return;		return;

bool NoCalleeSaveRestore = PrologueSaveSize == 0;		bool NoCalleeSaveRestore = PrologueSaveSize == 0;
int StackRestoreBytes = RedZone ? 0 : NumBytes;		int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
if (NoCalleeSaveRestore)		if (NoCalleeSaveRestore)
StackRestoreBytes += AfterCSRPopSize;		StackRestoreBytes += AfterCSRPopSize;

// If we were able to combine the local stack pop with the argument pop,		// If we were able to combine the local stack pop with the argument pop,
// then we're done.		// then we're done.
bool Done = NoCalleeSaveRestore || AfterCSRPopSize == 0;		bool Done = NoCalleeSaveRestore || AfterCSRPopSize == 0;

// If we're done after this, make sure to help the load store optimizer.		// If we're done after this, make sure to help the load store optimizer.
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	return resolveFrameIndexReference(
.getBytes();		.getBytes();
}		}

int AArch64FrameLowering::getNonLocalFrameIndexReference(		int AArch64FrameLowering::getNonLocalFrameIndexReference(
const MachineFunction &MF, int FI) const {		const MachineFunction &MF, int FI) const {
return getSEHFrameIndexOffset(MF, FI);		return getSEHFrameIndexOffset(MF, FI);
}		}

static StackOffset getFPOffset(const MachineFunction &MF, int ObjectOffset) {		static StackOffset getFPOffset(const MachineFunction &MF, int64_t ObjectOffset) {
const auto *AFI = MF.getInfo<AArch64FunctionInfo>();		const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();		const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
bool IsWin64 =		bool IsWin64 =
Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());		Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
unsigned FixedObject = IsWin64 ? alignTo(AFI->getVarArgsGPRSize(), 16) : 0;		unsigned FixedObject = IsWin64 ? alignTo(AFI->getVarArgsGPRSize(), 16) : 0;
unsigned FPAdjust = isTargetDarwin(MF)		unsigned FPAdjust = isTargetDarwin(MF)
? 16 : AFI->getCalleeSavedStackSize(MF.getFrameInfo());		? 16 : AFI->getCalleeSavedStackSize(MF.getFrameInfo());
return {ObjectOffset + FixedObject + FPAdjust, MVT::i8};		return {ObjectOffset + FixedObject + FPAdjust, MVT::i8};
}		}

static StackOffset getStackOffset(const MachineFunction &MF, int ObjectOffset) {		static StackOffset getStackOffset(const MachineFunction &MF, int64_t ObjectOffset) {
const auto &MFI = MF.getFrameInfo();		const auto &MFI = MF.getFrameInfo();
return {ObjectOffset + (int)MFI.getStackSize(), MVT::i8};		return {ObjectOffset + (int64_t)MFI.getStackSize(), MVT::i8};
}		}

int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,		int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
int FI) const {		int FI) const {
const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(		const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());		MF.getSubtarget().getRegisterInfo());
int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);		int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
return RegInfo->getLocalAddressRegister(MF) == AArch64::FP		return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
? getFPOffset(MF, ObjectOffset).getBytes()		? getFPOffset(MF, ObjectOffset).getBytes()
: getStackOffset(MF, ObjectOffset).getBytes();		: getStackOffset(MF, ObjectOffset).getBytes();
}		}

StackOffset AArch64FrameLowering::resolveFrameIndexReference(		StackOffset AArch64FrameLowering::resolveFrameIndexReference(
const MachineFunction &MF, int FI, unsigned &FrameReg, bool PreferFP,		const MachineFunction &MF, int FI, unsigned &FrameReg, bool PreferFP,
bool ForSimm) const {		bool ForSimm) const {
const auto &MFI = MF.getFrameInfo();		const auto &MFI = MF.getFrameInfo();
int ObjectOffset = MFI.getObjectOffset(FI);		int64_t ObjectOffset = MFI.getObjectOffset(FI);
bool isFixed = MFI.isFixedObjectIndex(FI);		bool isFixed = MFI.isFixedObjectIndex(FI);
bool isSVE = MFI.getStackID(FI) == TargetStackID::SVEVector;		bool isSVE = MFI.getStackID(FI) == TargetStackID::SVEVector;
return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,		return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
PreferFP, ForSimm);		PreferFP, ForSimm);
}		}

StackOffset AArch64FrameLowering::resolveFrameOffsetReference(		StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
const MachineFunction &MF, int ObjectOffset, bool isFixed, bool isSVE,		const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
unsigned &FrameReg, bool PreferFP, bool ForSimm) const {		unsigned &FrameReg, bool PreferFP, bool ForSimm) const {
const auto &MFI = MF.getFrameInfo();		const auto &MFI = MF.getFrameInfo();
const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(		const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
MF.getSubtarget().getRegisterInfo());		MF.getSubtarget().getRegisterInfo());
const auto *AFI = MF.getInfo<AArch64FunctionInfo>();		const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();		const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();

int FPOffset = getFPOffset(MF, ObjectOffset).getBytes();		int64_t FPOffset = getFPOffset(MF, ObjectOffset).getBytes();
int Offset = getStackOffset(MF, ObjectOffset).getBytes();		int64_t Offset = getStackOffset(MF, ObjectOffset).getBytes();
bool isCSR =		bool isCSR =
!isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));		!isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));

const StackOffset &SVEStackSize = getSVEStackSize(MF);		const StackOffset &SVEStackSize = getSVEStackSize(MF);

// Use frame pointer to reference fixed objects. Use it for locals if		// Use frame pointer to reference fixed objects. Use it for locals if
// there are VLAs or a dynamically realigned SP (and thus the SP isn't		// there are VLAs or a dynamically realigned SP (and thus the SP isn't
// reliable as a base). Make sure useFPForScavengingIndex() does the		// reliable as a base). Make sure useFPForScavengingIndex() does the
▲ Show 20 Lines • Show All 668 Lines • ▼ Show 20 Lines	for (unsigned Reg : SavedRegs.set_bits()) {
else		else
CSStackSize += RegSize;		CSStackSize += RegSize;
}		}

// Save number of saved regs, so we can easily update CSStackSize later.		// Save number of saved regs, so we can easily update CSStackSize later.
unsigned NumSavedRegs = SavedRegs.count();		unsigned NumSavedRegs = SavedRegs.count();

// The frame record needs to be created by saving the appropriate registers		// The frame record needs to be created by saving the appropriate registers
unsigned EstimatedStackSize = MFI.estimateStackSize(MF);		uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
if (hasFP(MF) ||		if (hasFP(MF) ||
windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {		windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
SavedRegs.set(AArch64::FP);		SavedRegs.set(AArch64::FP);
SavedRegs.set(AArch64::LR);		SavedRegs.set(AArch64::LR);
}		}

LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";		LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
for (unsigned Reg		for (unsigned Reg
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
RS->addScavengingFrameIndex(FI);		RS->addScavengingFrameIndex(FI);
LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI		LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
<< " as the emergency spill slot.\n");		<< " as the emergency spill slot.\n");
}		}
}		}

// Adding the size of additional 64bit GPR saves.		// Adding the size of additional 64bit GPR saves.
CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);		CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
unsigned AlignedCSStackSize = alignTo(CSStackSize, 16);		uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
LLVM_DEBUG(dbgs() << "Estimated stack frame size: "		LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
<< EstimatedStackSize + AlignedCSStackSize		<< EstimatedStackSize + AlignedCSStackSize
<< " bytes.\n");		<< " bytes.\n");

assert((!MFI.isCalleeSavedInfoValid() ||		assert((!MFI.isCalleeSavedInfoValid() ||
AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&		AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
"Should not invalidate callee saved info");		"Should not invalidate callee saved info");

▲ Show 20 Lines • Show All 194 Lines • Show Last 20 Lines
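
The `int` to `int64_t` changes in this file matter because a stack size above 4 GiB loses its upper bits when it is narrowed before the addition. The following is a minimal sketch of that effect, not the LLVM code itself: `stackOffsetNarrow` and `stackOffsetWide` are hypothetical helpers that model the old and new form of the expression in `getStackOffset`, and the stack size used in the usage note is an illustrative value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the old expression `ObjectOffset + (int)StackSize`:
// the stack size is cut down to 32 bits before it is added.
int64_t stackOffsetNarrow(int64_t ObjectOffset, uint64_t StackSize) {
  // Truncate through uint32_t so the loss of the upper bits is well defined.
  // (Converting a low half above INT32_MAX to int32_t is implementation
  // defined before C++20; the example value below avoids that case.)
  uint32_t Low = static_cast<uint32_t>(StackSize);
  return ObjectOffset + static_cast<int32_t>(Low);
}

// Hypothetical model of the new expression
// `ObjectOffset + (int64_t)StackSize`: no bits are dropped.
int64_t stackOffsetWide(int64_t ObjectOffset, uint64_t StackSize) {
  return ObjectOffset + static_cast<int64_t>(StackSize);
}
```

With a stack size of 4 GiB plus 16 bytes and an object offset of -8, the narrow variant yields 8 while the wide variant yields 4294967304, so the narrow result would address the wrong slot.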

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

Show First 20 Lines • Show All 3,102 Lines • ▼ Show 20 Lines	static void emitFrameOffsetAdj(MachineBasicBlock &MBB,
// register can be loaded with offset%8 and the add/sub can use an extending		// register can be loaded with offset%8 and the add/sub can use an extending
// instruction with LSL#3.		// instruction with LSL#3.
// Currently the function handles any offsets but generates a poor sequence		// Currently the function handles any offsets but generates a poor sequence
// of code.		// of code.
// assert(Offset < (1 << 24) && "unimplemented reg plus immediate");		// assert(Offset < (1 << 24) && "unimplemented reg plus immediate");

const unsigned MaxEncodableValue = MaxEncoding << ShiftSize;		const unsigned MaxEncodableValue = MaxEncoding << ShiftSize;
do {		do {
unsigned ThisVal = std::min<unsigned>(Offset, MaxEncodableValue);		uint64_t ThisVal = std::min<uint64_t>(Offset, MaxEncodableValue);
unsigned LocalShiftSize = 0;		unsigned LocalShiftSize = 0;
if (ThisVal > MaxEncoding) {		if (ThisVal > MaxEncoding) {
ThisVal = ThisVal >> ShiftSize;		ThisVal = ThisVal >> ShiftSize;
LocalShiftSize = ShiftSize;		LocalShiftSize = ShiftSize;
}		}
assert((ThisVal >> ShiftSize) <= MaxEncoding &&		assert((ThisVal >> ShiftSize) <= MaxEncoding &&
"Encoding cannot handle value that big");		"Encoding cannot handle value that big");
auto MBI = BuildMI(MBB, MBBI, DL, TII->get(Opc), DestReg)		auto MBI = BuildMI(MBB, MBBI, DL, TII->get(Opc), DestReg)
▲ Show 20 Lines • Show All 2,675 Lines • Show Last 20 Lines
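
The loop in `emitFrameOffsetAdj` splits one large adjustment into several add/sub immediates. The sketch below models only that splitting, under stated assumptions: `splitOffset` and `Chunk` are hypothetical names, `MaxEncoding = 0xfff` and `ShiftSize = 12` are the values assumed for the AArch64 add/sub immediate form, and the subtraction at the end of each iteration is assumed because that part of the loop is hidden in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One add/sub immediate: Imm is the 12-bit payload, Shift is 0 or 12.
struct Chunk {
  uint64_t Imm;
  unsigned Shift;
};

// Hypothetical stand-in for the loop in emitFrameOffsetAdj: split a
// non-negative byte offset into immediates that each fit one instruction.
std::vector<Chunk> splitOffset(uint64_t Offset) {
  const unsigned MaxEncoding = 0xfff; // assumed: 12-bit unsigned immediate
  const unsigned ShiftSize = 12;      // assumed: optional "lsl #12"
  const uint64_t MaxEncodableValue = uint64_t(MaxEncoding) << ShiftSize;

  std::vector<Chunk> Chunks;
  do {
    // 64-bit min, as in the patched line; a 32-bit min would drop the
    // upper half of Offset before the comparison.
    uint64_t ThisVal = std::min<uint64_t>(Offset, MaxEncodableValue);
    unsigned LocalShiftSize = 0;
    if (ThisVal > MaxEncoding) {
      ThisVal >>= ShiftSize;
      LocalShiftSize = ShiftSize;
    }
    Chunks.push_back({ThisVal, LocalShiftSize});
    // Assumed loop step: subtract what this instruction covered.
    Offset -= ThisVal << LocalShiftSize;
  } while (Offset != 0);
  return Chunks;
}
```

For an offset of 2 GiB plus 16 bytes (the 16 bytes of other locals is an assumption), this produces 128 chunks of `#4095, lsl #12`, one `#128, lsl #12`, and one unshifted `#16`. That has the same shape as the `CHECK-COUNT-128` line followed by one shifted and one unshifted `sub` in large-stack.ll.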

llvm/lib/Target/AArch64/AArch64MachineFunctionInfo.h

Show All 34 Lines	class AArch64FunctionInfo final : public MachineFunctionInfo {
/// is expected to restore the argument stack this should be a multiple of 16,		/// is expected to restore the argument stack this should be a multiple of 16,
/// all usable during a tail call.		/// all usable during a tail call.
///		///
/// The alternative would forbid tail call optimisation in some cases: if we		/// The alternative would forbid tail call optimisation in some cases: if we
/// want to transfer control from a function with 8-bytes of stack-argument		/// want to transfer control from a function with 8-bytes of stack-argument
/// space to a function with 16-bytes then misalignment of this value would		/// space to a function with 16-bytes then misalignment of this value would
/// make a stack adjustment necessary, which could not be undone by the		/// make a stack adjustment necessary, which could not be undone by the
/// callee.		/// callee.
unsigned BytesInStackArgArea = 0;		unsigned BytesInStackArgArea = 0;
		efriedma: Is there a potential problem here as well?
		kiranchandramohan: I don't know whether I can answer that. Typically the ABI prohibits passing large objects by value. For example, the AArch64 ABI disallows passing composite types larger than 16 bytes by value: "If the argument type is a Composite Type that is larger than 16 bytes, then the argument is copied to memory allocated by the caller and the argument is replaced by a pointer to the copy." If this field refers to arguments only, then the change is probably not needed. Do you feel there are cases that can give this variable a value large enough to require 64 bits?
		efriedma: I guess not in practice; functions can't have that many arguments.

/// The number of bytes to restore to deallocate space for incoming		/// The number of bytes to restore to deallocate space for incoming
/// arguments. Canonically 0 in the C calling convention, but non-zero when		/// arguments. Canonically 0 in the C calling convention, but non-zero when
/// callee is expected to pop the args.		/// callee is expected to pop the args.
unsigned ArgumentStackToRestore = 0;		unsigned ArgumentStackToRestore = 0;

/// HasStackFrame - True if this function has a stack frame. Set by		/// HasStackFrame - True if this function has a stack frame. Set by
/// determineCalleeSaves().		/// determineCalleeSaves().
bool HasStackFrame = false;		bool HasStackFrame = false;

/// Amount of stack frame size, not including callee-saved registers.		/// Amount of stack frame size, not including callee-saved registers.
unsigned LocalStackSize = 0;		uint64_t LocalStackSize = 0;

/// The start and end frame indices for the SVE callee saves.		/// The start and end frame indices for the SVE callee saves.
int MinSVECSFrameIndex = 0;		int MinSVECSFrameIndex = 0;
int MaxSVECSFrameIndex = 0;		int MaxSVECSFrameIndex = 0;

/// Amount of stack frame size used for saving callee-saved registers.		/// Amount of stack frame size used for saving callee-saved registers.
unsigned CalleeSavedStackSize = 0;		unsigned CalleeSavedStackSize = 0;
unsigned SVECalleeSavedStackSize = 0;		unsigned SVECalleeSavedStackSize = 0;
▲ Show 20 Lines • Show All 101 Lines • ▼ Show 20 Lines	bool hasCalleeSaveStackFreeSpace() const {
return CalleeSaveStackHasFreeSpace;		return CalleeSaveStackHasFreeSpace;
}		}
void setCalleeSaveStackHasFreeSpace(bool s) {		void setCalleeSaveStackHasFreeSpace(bool s) {
CalleeSaveStackHasFreeSpace = s;		CalleeSaveStackHasFreeSpace = s;
}		}
bool isSplitCSR() const { return IsSplitCSR; }		bool isSplitCSR() const { return IsSplitCSR; }
void setIsSplitCSR(bool s) { IsSplitCSR = s; }		void setIsSplitCSR(bool s) { IsSplitCSR = s; }

void setLocalStackSize(unsigned Size) { LocalStackSize = Size; }		void setLocalStackSize(uint64_t Size) { LocalStackSize = Size; }
unsigned getLocalStackSize() const { return LocalStackSize; }		uint64_t getLocalStackSize() const { return LocalStackSize; }

void setCalleeSavedStackSize(unsigned Size) {		void setCalleeSavedStackSize(unsigned Size) {
CalleeSavedStackSize = Size;		CalleeSavedStackSize = Size;
HasCalleeSavedStackSize = true;		HasCalleeSavedStackSize = true;
}		}

// When CalleeSavedStackSize has not been set (for example when		// When CalleeSavedStackSize has not been set (for example when
// some MachineIR pass is run in isolation), then recalculate		// some MachineIR pass is run in isolation), then recalculate
▲ Show 20 Lines • Show All 156 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp

Show First 20 Lines • Show All 391 Lines • ▼ Show 20 Lines	bool AArch64RegisterInfo::needsFrameBaseReg(MachineInstr *MI,

// The offset likely isn't legal; we want to allocate a virtual base register.		// The offset likely isn't legal; we want to allocate a virtual base register.
return true;		return true;
}		}

bool AArch64RegisterInfo::isFrameOffsetLegal(const MachineInstr *MI,		bool AArch64RegisterInfo::isFrameOffsetLegal(const MachineInstr *MI,
unsigned BaseReg,		unsigned BaseReg,
int64_t Offset) const {		int64_t Offset) const {
assert(Offset <= INT_MAX && "Offset too big to fit in int.");
assert(MI && "Unable to get the legal offset for nil instruction.");		assert(MI && "Unable to get the legal offset for nil instruction.");
StackOffset SaveOffset(Offset, MVT::i8);		StackOffset SaveOffset(Offset, MVT::i8);
return isAArch64FrameOffsetLegal(*MI, SaveOffset) & AArch64FrameOffsetIsLegal;		return isAArch64FrameOffsetLegal(*MI, SaveOffset) & AArch64FrameOffsetIsLegal;
}		}

/// Insert defining instruction(s) for BaseReg to be a pointer to FrameIdx		/// Insert defining instruction(s) for BaseReg to be a pointer to FrameIdx
/// at the beginning of the basic block.		/// at the beginning of the basic block.
void AArch64RegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,		void AArch64RegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,
▲ Show 20 Lines • Show All 180 Lines • Show Last 20 Lines
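
The removed assertion rejected any frame offset above `INT_MAX`. As a rough check of why the new test would be affected, the sketch below computes the size of the array in large-stack.ll and compares it with that bound. `ArrayBytes` and `fitsInInt` are hypothetical names, and whether a particular offset passed to `isFrameOffsetLegal` reaches this size is an assumption.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Size in bytes of [268435456 x i64], the array in large-stack.ll.
constexpr int64_t ArrayBytes = int64_t(268435456) * 8;

// Mirrors the condition of the removed assertion.
constexpr bool fitsInInt(int64_t Offset) { return Offset <= INT_MAX; }

// The array is exactly 2 GiB, one byte more than a 32-bit int can hold.
static_assert(ArrayBytes == (int64_t(1) << 31), "array is exactly 2 GiB");
static_assert(!fitsInInt(ArrayBytes), "2 GiB does not fit in a 32-bit int");
```

An offset that spans the whole array therefore fails the old condition, while an offset one byte smaller passes it.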

llvm/test/CodeGen/AArch64/large-stack.ll

This file was added.

				; RUN: llc < %s -mtriple=aarch64-none-linux-gnu | FileCheck %s

				fpetrogalli: Why can't you use the same prefix?
				kiranchandramohan: So that I can progressively construct the test: first check the spills, then the stack, and then the access to the value. Are you recommending combining them?
				fpetrogalli: I can see value in having multiple prefixes when building the test. But multiple prefixes are designed to be used when executing multiple RUN lines on the same source. If you remove them, the test will be clearer. If you need to highlight what you are checking, I'd rather add plain comments inline where the CHECKs are:
				; Checking spill
				; CHECK: stp x[[SPILL_REG1:[0-9]+]], x[[SPILL_REG2:[0-9]+]], [sp, #-[[SPILL_OFFSET1:[0-9]+]]]
				; CHECK-NEXT: str x[[SPILL_REG3:[0-9]+]], [sp, #[[SPILL_OFFSET2:[0-9]+]]]
				; Checking frame
				; CHECK: mov x[[FRAME:[0-9]+]], sp
				; Checking setstack count 128 bit
				; CHECK: sub sp, sp, #[[STACK1:[0-9]+]], lsl #[[SHIFT:[0-9]+]]
				; ...
				@.str = private unnamed_addr constant [11 x i8] c"val = %ld\0A\00", align 1

				; Function Attrs: noinline optnone
				define dso_local void @set_large(i64 %val) #0 {
				entry:
				%val.addr = alloca i64, align 8
				%large = alloca [268435456 x i64], align 8
				%i = alloca i32, align 4
				store i64 %val, i64* %val.addr, align 8
				%0 = load i64, i64* %val.addr, align 8
				%arrayidx = getelementptr inbounds [268435456 x i64], [268435456 x i64]* %large, i64 0, i64 %0
				store i64 1, i64* %arrayidx, align 8
				%1 = load i64, i64* %val.addr, align 8
				%arrayidx1 = getelementptr inbounds [268435456 x i64], [268435456 x i64]* %large, i64 0, i64 %1
				%2 = load i64, i64* %arrayidx1, align 8
				%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i64 %2)
				ret void
				}

				declare dso_local i32 @printf(i8*, ...)

				attributes #0 = { noinline optnone "no-frame-pointer-elim"="true" }

				fpetrogalli: Do you need attribute #0 and #1?
				kiranchandramohan: Maybe not. But these attributes are generated by clang by default, so they match the IR that clang generates. Also, the test will need updating, since some of the attributes ("no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf") can change the assembly code generated. What is the standard for these tests? Do you omit attributes?
				fpetrogalli: If you need attributes, you should reduce them to the minimal set needed for the test. If `"no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"` is all you need, remove the rest (and merge the two attributes into a single attribute). Less is always better! :)
				kiranchandramohan: Have now updated the test to have only the necessary attributes and also removed prefixes.
				; CHECK: stp x[[SPILL_REG1:[0-9]+]], x[[SPILL_REG2:[0-9]+]], [sp, #-[[SPILL_OFFSET1:[0-9]+]]]
				; CHECK-NEXT: str x[[SPILL_REG3:[0-9]+]], [sp, #[[SPILL_OFFSET2:[0-9]+]]]
				; CHECK: mov x[[FRAME:[0-9]+]], sp
				; CHECK-COUNT-128: sub sp, sp, #[[STACK1:[0-9]+]], lsl #[[SHIFT:[0-9]+]]
				efriedma: Using FileCheck variables for values that will never change isn't helpful. For example, "lsl #[[SHIFT:[0-9]+]]" is actually always "lsl #12", because that's the only legal value. Is there a reason some of these CHECKs aren't CHECK-NEXT?
				kiranchandramohan: I have removed the variables from here. A few of these are CHECKs and not CHECK-NEXT:
				1) Skips over some cfi attributes:
				; CHECK: sub x[[INDEX:[0-9]+]], x[[FRAME]], #8
				2) Skips over some setup for calling the print function:
				; CHECK: bl printf
				3) The CHECK-COUNTs did not seem to have a way of combining with NEXT:
				; CHECK-COUNT-128: sub sp, sp, #[[STACK1:[0-9]+]], lsl #12
				; CHECK-COUNT-128: add sp, sp, #[[STACK1]], lsl #12
				Please let me know if this is not OK.
				; CHECK-NEXT: sub sp, sp, #[[STACK2:[0-9]+]], lsl #[[SHIFT]]
				; CHECK-NEXT: sub sp, sp, #[[STACK3:[0-9]+]]
				efriedma: What sequence are we generating to adjust the stack pointer? For a large array that fits in 32 bits, I see a long sequence of "sub sp, sp, #4095, lsl #12"; are we doing the same thing here?
				kiranchandramohan: Yes, we are doing the same thing here. That sequence is 128 long here.
				; CHECK: sub x[[INDEX:[0-9]+]], x[[FRAME]], #8
				; CHECK-NEXT: str x0, [x[[INDEX]]]
				; CHECK-NEXT: ldr x[[VAL1:[0-9]+]], [x[[INDEX]]]
				; CHECK-NEXT: mov x[[VAL2:[0-9]+]], #8
				; CHECK-NEXT: add x[[VAL3:[0-9]+]], sp, #8
				; CHECK-NEXT: madd x[[VAL1]], x[[VAL1]], x[[VAL2]], x[[VAL3]]
				; CHECK-NEXT: mov x[[TMP1:[0-9]+]], #1
				; CHECK-NEXT: str x[[TMP1]], [x[[VAL1]]]
				; CHECK-NEXT: ldr x[[INDEX]], [x[[INDEX]]]
				; CHECK-NEXT: mov x[[VAL4:[0-9]+]], #8
				; CHECK-NEXT: madd x[[INDEX]], x[[INDEX]], x[[VAL4]], x[[VAL3]]
				; CHECK-NEXT: ldr x1, [x[[INDEX]]
				; CHECK: bl printf
				; CHECK-COUNT-128: add sp, sp, #[[STACK1]], lsl #[[SHIFT]]
				; CHECK-NEXT: add sp, sp, #[[STACK2]], lsl #[[SHIFT]]
				; CHECK-NEXT: add sp, sp, #[[STACK3]]
				; CHECK: ldr x[[SPILL_REG3]], [sp, #[[SPILL_OFFSET2]]]
				; CHECK-NEXT: ldp x[[SPILL_REG1]], x[[SPILL_REG2]], [sp], #[[SPILL_OFFSET1]]

Diff 230874
