This is an archive of the discontinued LLVM Phabricator instance.

Differential D68530

[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size
ClosedPublic

Authored by Nikolai on Oct 4 2019, 5:42 PM.

Details

Reviewers

t.p.northover
gberry
dmgreen

Commits

rL375217: [AArch64] Don't combine callee-save and local stack adjustment when optimizing…
rG651f07908a14: [AArch64] Don't combine callee-save and local stack adjustment when optimizing…

Summary

For arm64, D18619 introduced the ability to bump the stack pointer once, upfront, when it needs to be bumped for both the callee-save area and the local stack area.

That diff already remarks that "This change can cause an increase in instructions", but argues that even when that happens, it should still be a performance benefit because the number of micro-ops is reduced.

We have observed that this code-size increase can be significant in practice. This diff disables combining stack bumping for methods that are marked as optimize-for-size.

Example of a prologue with the behavior before this diff (combining stack bumping when possible):

sub        sp, sp, #0x40
stp        d9, d8, [sp, #0x10]
stp        x20, x19, [sp, #0x20]
stp        x29, x30, [sp, #0x30]
add        x29, sp, #0x30
[... compute x8 somehow ...]
stp        x0, x8, [sp]

And after this diff, if the method is marked as optimize-for-size:

stp        d9, d8, [sp, #-0x30]!
stp        x20, x19, [sp, #0x10]
stp        x29, x30, [sp, #0x20]
add        x29, sp, #0x20
[... compute x8 somehow ...]
stp        x0, x8, [sp, #-0x10]!

Note that without combining the stack bump there are two auto-decrements, nicely folded into the stp instructions, whereas otherwise there is a single sub sp, ... instruction, but not folded.
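As a rough illustration of the size difference, the two prologue listings above can be tallied in a small C++ sketch. It only counts the instructions shown in the summary and skips the elided "[... compute x8 somehow ...]" part; the function names are hypothetical and are not part of LLVM. The only outside fact it relies on is that AArch64 instructions are encoded in 4 bytes each.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// AArch64 uses fixed-width 4-byte instruction encodings.
constexpr std::size_t kInstrBytes = 4;

// Prologue from the summary with the stack bump combined (before this diff).
inline std::vector<std::string> combinedPrologue() {
  return {
      "sub sp, sp, #0x40",
      "stp d9, d8, [sp, #0x10]",
      "stp x20, x19, [sp, #0x20]",
      "stp x29, x30, [sp, #0x30]",
      "add x29, sp, #0x30",
      "stp x0, x8, [sp]",
  };
}

// Prologue from the summary with the bump split (after this diff, optsize).
inline std::vector<std::string> splitPrologue() {
  return {
      "stp d9, d8, [sp, #-0x30]!",
      "stp x20, x19, [sp, #0x10]",
      "stp x29, x30, [sp, #0x20]",
      "add x29, sp, #0x20",
      "stp x0, x8, [sp, #-0x10]!",
  };
}

// Code size in bytes of a listing, one fixed-width instruction per entry.
inline std::size_t codeBytes(const std::vector<std::string> &Instrs) {
  return Instrs.size() * kInstrBytes;
}
```

Under these assumptions codeBytes(combinedPrologue()) is 24 and codeBytes(splitPrologue()) is 20, so the split form saves one instruction in this prologue. Both forms allocate the same 0x40 bytes of stack (0x30 + 0x10 in the split case).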

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Nikolai created this revision. Oct 4 2019, 5:42 PM

Harbormaster completed remote builds in B39021: Diff 223347. Oct 4 2019, 5:42 PM

kyulee added a subscriber: kyulee. Oct 4 2019, 5:51 PM

Hello. Seems like a sensible change.

Do we need the option? Or should we just be doing this whenever we are Optsize?

We don't really need it. I mainly introduced this option for "maximal backwards compatibility", in case anyone has taken a dependency on the code generation pattern outside of the LLVM test suite.
If you prefer, I can update the diff, removing the option.

My understanding is that D18619 was an optimisation, and that optimisation increases codesize in order to decrease micro-ops and improve scheduling?

If so then I don't think that anyone should be relying on it, exactly. Under minsize it should be fine to just not do the optimisation. It sounds like a fairly limited gain to me, increasing instruction count to gain on micro-ops and scheduling, so the codesize benefit should be the bigger win.

Addressing feedback by removing option, now always changing behavior when optimizing for size.

Nikolai retitled this revision from "[AArch64] Make combining of callee-save and local stack adjustment optional" to "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size". Oct 10 2019, 8:02 PM

Nikolai edited the summary of this revision.

Harbormaster completed remote builds in B39395: Diff 224535. Oct 10 2019, 8:02 PM

LGTM, Thanks!

Do you want me to commit this, or do you have commit access already?

This revision is now accepted and ready to land. Oct 13 2019, 1:26 AM

I don't have commit access. So if you can commit, that would be great!

Closed by commit rG651f07908a14: [AArch64] Don't combine callee-save and local stack adjustment when optimizing… (authored by dmgreen). Oct 18 2019, 3:38 AM

This revision was automatically updated to reflect the committed changes.

This change appears to cause an assertion failure in clang during a Chromium build for Windows on Arm (AArch64). We suspect that it is also the cause of a mis-compilation when clang does not have assertions enabled, which causes a crash in some test cases. See https://crbug.com/1029385 for details.

Oh boo. This is presumably some knock-on bug, that this is just exposing. Only on windows you say?

Do you have a reproducer?

In D68530#1764286, @dmgreen wrote:

Oh boo. This is presumably some knock-on bug, that this is just exposing. Only on windows you say?

Do you have a reproducer?

Here's a short repro:

$ cat /tmp/a.cc
template <typename T> void foo(T t);
struct S {
  virtual bool bar();
};
void baz() { foo(&S::bar); }
$ bin/clang -cc1 -triple arm64-unknown-windows-msvc19.16.0 -emit-obj -Os /tmp/a.cc
clang: /work/llvm.monorepo/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:706: MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(llvm::MachineBasicBlock &, MachineBasicBlock::iterator, const llvm::DebugLoc &, const llvm::TargetInstrInfo *, int, bool, bool *, bool): Assertion `MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP && "Unexpected base register in callee-save save/restore instruction!"' failed.

I've reverted in c2443155a0f to unbreak the builds while this gets investigated.

hans mentioned this in rGc2443155a0fb: Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack…. Nov 30 2019, 5:37 AM

D18619 probably does not correctly handle a vcall thunk like the one below when the stack bump is not combined (in convertCalleeSaveRestoreToSPPrePostIncDec). The generated code has the second-to-last instruction, the add, removed, which leaves the stack unbalanced.

base_unittests!base::sequence_manager::TaskQueue::`vcall'{0}':
00007ff6`cc9660bc d10103ff sub         sp,sp,#0x40
00007ff6`cc9660c0 a9008be1 stp         x1,x2,[sp,#8]
00007ff6`cc9660c4 a90193e3 stp         x3,x4,[sp,#0x18]
00007ff6`cc9660c8 a9029be5 stp         x5,x6,[sp,#0x28]
00007ff6`cc9660cc f9001fe7 str         x7,[sp,#0x38]
00007ff6`cc9660d0 f9400009 ldr         x9,[x0]
00007ff6`cc9660d4 f8440529 ldr         x9,[x9],#0x40
                           add         sp, #0x40
00007ff6`cc9660d8 d61f0120 br          x9
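The imbalance described above can be checked with a small C++ sketch that sums the signed stack-pointer adjustment of each instruction in the thunk. It assumes, following the comment above, that the add line without an address is the instruction missing from the generated code. The helper names are hypothetical, and the post-indexed ldr is treated as not touching sp because its base register is x9.

```cpp
#include <cstdint>
#include <vector>

// Net change to SP across a straight-line instruction sequence. Each entry is
// the signed SP adjustment of one instruction (0 if it does not modify SP).
inline std::int64_t netSpDelta(const std::vector<std::int64_t> &Deltas) {
  std::int64_t Sum = 0;
  for (std::int64_t D : Deltas)
    Sum += D;
  return Sum;
}

// Thunk as emitted: sub sp; stp x3; str; ldr x2; br (the add is missing).
inline std::vector<std::int64_t> thunkAsEmitted() {
  return {-0x40, 0, 0, 0, 0, 0, 0, 0};
}

// Thunk as intended: the same sequence with "add sp, #0x40" before the br.
inline std::vector<std::int64_t> thunkWithAdd() {
  return {-0x40, 0, 0, 0, 0, 0, 0, +0x40, 0};
}
```

A tail call through br x9 is only safe when the net delta is zero. In this model the emitted sequence leaves sp 0x40 bytes below its entry value, while the sequence with the add restores it.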

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp	3 lines
llvm/test/CodeGen/AArch64/arm64-never-combine-csr-local-stack-bump-for-size.ll	25 lines

Diff 225588

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp

[...]

 bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
     MachineFunction &MF, unsigned StackBumpBytes) const {
   AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
   const MachineFrameInfo &MFI = MF.getFrameInfo();
   const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
   const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
 
+  if (MF.getFunction().hasOptSize())
+    return false;
+
   if (AFI->getLocalStackSize() == 0)
     return false;
 
   // 512 is the maximum immediate for stp/ldp that will be used for
   // callee-save save/restores
   if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
     return false;

[...]
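The early-exit checks visible in this excerpt can be modelled outside LLVM with a minimal C++ sketch. The FrameQuery struct and shouldCombineSketch function are hypothetical stand-ins for the state the real function reads from MachineFunction and AArch64FunctionInfo. The real function continues past the excerpt with checks that are not shown here, and the sketch simply assumes they pass.

```cpp
// Hypothetical stand-in for the values the real function queries.
struct FrameQuery {
  bool HasOptSize;             // models MF.getFunction().hasOptSize()
  unsigned LocalStackSize;     // models AFI->getLocalStackSize()
  unsigned StackBumpBytes;     // combined callee-save + local stack bump
  bool NeedsWindowsStackProbe; // models windowsRequiresStackProbe(MF, ...)
};

// Mirrors only the checks visible in the diff excerpt.
inline bool shouldCombineSketch(const FrameQuery &Q) {
  // New in this diff: never combine when optimizing for size.
  if (Q.HasOptSize)
    return false;

  // Nothing to combine if there is no local stack area.
  if (Q.LocalStackSize == 0)
    return false;

  // 512 is the maximum stp/ldp immediate used for callee-save save/restores.
  if (Q.StackBumpBytes >= 512 || Q.NeedsWindowsStackProbe)
    return false;

  // Assumption: the remaining checks in the real function (not visible in
  // the excerpt) all pass.
  return true;
}
```

With this model, a function with 16 bytes of locals and a 64-byte bump combines the adjustment unless it is marked optsize, which is the behaviour change the diff introduces.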

llvm/test/CodeGen/AArch64/arm64-never-combine-csr-local-stack-bump-for-size.ll

This file was added.

; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -disable-post-ra | FileCheck %s

; CHECK-LABEL: main:
; CHECK: stp x29, x30, [sp, #-16]!
; CHECK-NEXT: stp xzr, xzr, [sp, #-16]!
; CHECK: adrp x0, l_.str@PAGE
; CHECK: add x0, x0, l_.str@PAGEOFF
; CHECK-NEXT: bl _puts
; CHECK-NEXT: add sp, sp, #16
; CHECK-NEXT: ldp x29, x30, [sp], #16
; CHECK-NEXT: ret

@.str = private unnamed_addr constant [7 x i8] c"hello\0A\00"

define i32 @main() nounwind ssp optsize {
entry:
  %local1 = alloca i64, align 8
  %local2 = alloca i64, align 8
  store i64 0, i64* %local1
  store i64 0, i64* %local2
  %call = call i32 @puts(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i32 0, i32 0))
  ret i32 %call
}

declare i32 @puts(i8*)