This is an archive of the discontinued LLVM Phabricator instance.

[Codegen] Ensure stack is properly aligned for call argument initialization
ClosedPublic

Authored by jketema on Aug 25 2015, 2:39 PM.

Details

Summary

Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

The x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Diff Detail

Event Timeline

jketema updated this revision to Diff 33126.Aug 25 2015, 2:39 PM
jketema retitled this revision from to [Codegen] Ensure stack is properly aligned for call argument initialization.
jketema updated this object.
jketema added a reviewer: rnk.
jketema added a subscriber: llvm-commits.
rnk added inline comments.Aug 26 2015, 8:27 AM
include/llvm/CodeGen/CallingConvLower.h
204

I think this is really more like MaxStackArgAlign. It's the maximum alignment of arguments that ended up on the stack, right?

273

We should probably put some Doxygen here about what NextStackOffset really means.

274–278

Are you sure this is the right place to change? Isn't this method used to figure out where variadic argument packs start also? Shouldn't we *not* align StackOffset for that use case?

I think it also gets used when we have a reserved stack frame (i.e. no dynamic alloca). As written, your code will overallocate some extra padding that isn't necessary.

275

Use RoundUpToAlignment.

408

RoundUpToAlignment here would be a nice cleanup.

411

This looks like std::max(Align, MinStackAlign) to me.

Thanks for your feedback! I'll address the easy bits, and have another look at the use of getNextStackOffset. I might need some help with that though, as I'm not very familiar with the code generation parts of llvm (still learning).

include/llvm/CodeGen/CallingConvLower.h
204

It was meant to be the minimum stack alignment I'll eventually need. I like your suggestion though.

274–278

I'll have another look at the use cases.

jketema updated this revision to Diff 33261.Aug 26 2015, 3:37 PM

Addressed easy comments. getNextStackOffset concerns still to investigate (including adding Doxygen comment)

jketema added inline comments.Aug 26 2015, 3:57 PM
include/llvm/CodeGen/CallingConvLower.h
274–279

In the case of variadic arguments: can't we run into the same problems? If one of the arguments in the pack is a <4 x float> vector, then wouldn't there be cases where the X86 code generator uses and an movaps instruction to load the vector into one of the XMM registers? Or will it always used movups in that case?

And similar for reserved stack frames?

jketema updated this revision to Diff 33270.Aug 26 2015, 4:24 PM

Add Doxygen comment

jketema marked 6 inline comments as done.Sep 1 2015, 5:37 PM

Hi Reid,

Could you have another look at this? After having had a renewed look at the code, getNextStackOffset still seems the correct place for this change to me for the reasons I provided in the inline comments. However, if you think this really belongs somewhere else, would you be able to point me in the right direction?

Thanks,

Jeroen

rnk added inline comments.Sep 8 2015, 2:46 PM
include/llvm/CodeGen/CallingConvLower.h
274–279

Yes, but here's the 32-bit test case I'm imagining:

void f(__m128 v, const char *f, ...) {
  va_list ap;
  va_start(ap, f);
  int d = va_arg(ap, int);
}

Do we have padding between 'f' and the variadic int parameter? I think the answer should be know, because if they were not variadic, they would not have padding between them.

jketema added inline comments.Sep 15 2015, 1:35 PM
include/llvm/CodeGen/CallingConvLower.h
274–279

Sorry, I missed this before I bumped. Somehow the system didn't sent me a notification email.

I don't see why this is a problem, my patch only changes the alignment of the whole frame, not the padding between the individual members; the computation for that stays unchanged. Hence, there will only be additional padding after the very last of the variadic parameters supplied.

jketema added inline comments.Sep 15 2015, 1:48 PM
include/llvm/CodeGen/CallingConvLower.h
274–279

Reading back all the comments, I think I see what you're getting at: What you're saying is that getNextStackOffset is also used to compute the offset of the variadic arguments with respect to the beginning of the stack frame?

If that's the case, then the patch would indeed be wrong, and I think we would then need both the old variant and the new one I'm proposing and adapt every use accordingly. Or do you see a better solution?

rnk added inline comments.Sep 15 2015, 1:52 PM
include/llvm/CodeGen/CallingConvLower.h
274–279

Exactly, getNextStackOffset currently has more than one use. You should take the example C code above and turn it into a test case for lit to show that at least this use isn't changing.

Given that there is more than one use, the I think we should introduce a new method called getAlignedCallFrameSize, update the appropriate call sites that use this in PEI, and declare victory. :)

jketema updated this revision to Diff 35166.Sep 19 2015, 6:41 AM

Hi Reid,

I've introduced a getAlignedCallFrameSize method as suggested and changed call lowering in the X86 target appropriately. I've also introduced a test involving var args, as you suggested.

The X86 target also contains a call to getNextStackOffset in the code that determines whether tail call optimization can be performed. I'm not sure if that call should also be changed to getAlignedCallFrameSize. I'm also wondering whether getAlignedCallFrameSize should be introduced in other targets too.

rnk accepted this revision.Sep 28 2015, 11:40 AM
rnk edited edge metadata.

lgtm

test/CodeGen/X86/win32-spill-xmm.ll
9
; CHECK: calll bar
This revision is now accepted and ready to land.Sep 28 2015, 11:40 AM
This revision was automatically updated to reflect the committed changes.

Thanks for all your feedback Reid!