This is an archive of the discontinued LLVM Phabricator instance.

[X86] Generate unaligned access for fixed slots in unaligned stack
ClosedPublic

Authored by thejh on Jan 21 2020, 10:25 AM.

Details

Summary

loadRegFromStackSlot()/storeRegToStackSlot() can generate aligned access
instructions for stack slots even if the stack is unaligned, based on the
assumption that the stack can be realigned.
However, this doesn't work for fixed slots, which are e.g. used for
spilling XMM registers in a non-leaf function with
__attribute__((preserve_all)).
When compiling such code with -mstack-alignment=8, this causes general
protection faults.

Fix it by only considering stack realignment for non-fixed slots.

Note that this changes the output of three existing tests which spill AVX
registers, since AVX requires higher alignment than the ABI provides on
stack frame entry.

Diff Detail

Event Timeline

thejh created this revision.Jan 21 2020, 10:25 AM
thejh added a comment.Jan 22 2020, 6:18 AM

Ah, and I don't have commit access. (The docs say I'm supposed to point that out.)

rnk added inline comments.Jan 24 2020, 1:40 PM
llvm/test/CodeGen/X86/avx512-intel-ocl.ll
263–264

Is there some reason we are using fixed objects for these spills? We could use unfixed objects and store them relative to the aligned RSP after this alignment here. It's a big change, but it seems like the code would be better.

thejh marked an inline comment as done.Jan 25 2020, 8:26 AM
thejh added inline comments.
llvm/test/CodeGen/X86/avx512-intel-ocl.ll
263–264

AFAICS using unfixed objects makes sense as an optimization that will work in most cases. However, it might be easier to still have this variant, where spills go into fixed objects, as a fallback method at least, in case the function prologue has to call another function before the stack frame is properly set up - for example, when compiling with -fsplit-stack.

The following case looks broken at the moment, and I think it might be easier to fix if the spills are in fixed slots. Otherwise, it would be necessary to either allocate stack memory for all the fixed slots before callq __morestack so that the non-fixed slots can be spilled into, or alternatively do something entirely different with the normally-callee-saved registers there.

$ cat test.c
void bar();
__attribute__((preserve_all)) void foo(void) {
  bar();
}
$ ../bin/clang -fsplit-stack -fno-asynchronous-unwind-tables -S -o test.s test.c
$ cat test.s
	.text
	.file	"test.c"
	.globl	foo                     # -- Begin function foo
	.p2align	4, 0x90
	.type	foo,@function
foo:                                    # @foo
# %bb.2:
	leaq	-344(%rsp), %r11
	cmpq	%fs:112, %r11
	ja	.LBB0_0
# %bb.1:
	movabsq	$344, %r10              # imm = 0x158
	movabsq	$0, %r11
	callq	__morestack
	retq
.LBB0_0:
	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%rsp
	pushq	%r10
	pushq	%r9
	pushq	%r8
	pushq	%rdi
	pushq	%rsi
	pushq	%rdx
	pushq	%rcx
	pushq	%rax
	subq	$264, %rsp              # imm = 0x108
	movaps	%xmm15, -96(%rbp)       # 16-byte Spill
	movaps	%xmm14, -112(%rbp)      # 16-byte Spill
[...]

At the moment, the spills that are necessary for the preserve_all calling convention only happen after various registers that would be caller-saved in the standard calling convention have already been clobbered - r11 and r10 are clobbered by the generated code directly, other registers are going to be clobbered by __morestack, which assumes that only argument registers need to be spilled before calling into standard C code.

(In case you're wondering: No, I don't actually want to use -fsplit-stack, it was just the only example I could think of right now that needs function calls in the prologue.)

But maybe I'm just overthinking this - I'm not all that familiar with LLVM internals.

thejh marked an inline comment as done.Jan 25 2020, 8:36 AM
thejh added inline comments.
llvm/test/CodeGen/X86/avx512-intel-ocl.ll
263–264

Anyway, my overall idea for this patch was that I wanted to fix a crash I was encountering; I realize that there is the potential for further optimization, but I did not want to invest the time necessary for that.

jyknight accepted this revision.Feb 24 2020, 9:48 AM
jyknight added a subscriber: jyknight.

Even if it would be better to avoid using a fixed slots in some case, as long as fixed slots continue to exist this patch seems like the correct change.

Anyone disagree?

This revision is now accepted and ready to land.Feb 24 2020, 9:48 AM
rnk accepted this revision.Feb 24 2020, 11:28 AM

I guess the code change would be correct even if we used unfixed objects for spill slots.

Are we sure we don't come to this codepath to load arguments passed in memory, which used fixed stack slots? I think we should assume that such a vector is aligned. Is that already tested?

kees added a subscriber: kees.Mar 4 2020, 9:26 AM

Hi! What's the state of this change? Do you need help committing this?

arsenm resigned from this revision.Apr 5 2020, 7:46 AM
mizvekov updated this revision to Diff 321589.Feb 4 2021, 3:49 PM
mizvekov added a subscriber: mizvekov.

Just an easy rebase of the original patch.

No tests had to be updated at all.

Please let me know if I have properly attributed credits to original author.

By the way, I do not have commit access and will need help with that.

This revision was landed with ongoing or failed builds.Feb 4 2021, 7:37 PM
This revision was automatically updated to reflect the committed changes.