This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Prefer prologues with sp adjustments merged into stp/ldp for WinCFI
ClosedPublic

Authored by mstorsjo on Oct 1 2020, 1:38 PM.

Details

Summary

This makes the prologue match the windows canonical layout, for cases without a frame pointer.

Diff Detail

Event Timeline

mstorsjo created this revision.Oct 1 2020, 1:38 PM
mstorsjo requested review of this revision.Oct 1 2020, 1:38 PM

How badly do we really want to match the canonical packed prologue? Is it really worth generating less efficient instructions to reduce the size of the unwind data? (I guess it's not a lot less efficient, but still.)

How badly do we really want to match the canonical packed prologue?

Well, the space savings are quite notable, and without this in place, we seldom end up matching the canonical forms allowing use of packed, so I'd hold off pushing D88677 until this one is settled (because there's little point in bending over backwards with the register order if we don't hit the packed forms regularly).

Is it really worth generating less efficient instructions to reduce the size of the unwind data? (I guess it's not a lot less efficient, but still.)

I guess it's marginally less efficient, but in most cases, the produced number of instructions should at least be the same. (In some of the testcase updates, it may look like we're getting more instructions, but that's in cases with sparse CHECK lines without thoroughly checking all with CHECK-NEXT.)

AFAIK in most cases, this patch should amount to changing this:

sub sp, sp, #48
stp x19, x20, [sp, #16]
stp x21, x30, [sp, #32]

Into this:

stp x19, x20, [sp, #-32]!
stp x21, x30, [sp, #16]
sub sp, sp, #16

So the same number of instructions, but sp is updated twice instead of once - that's the only inefficiency I can think of.

Yes, the dependency chain of sp is one instruction longer, and there's one extra arithmetic op on some cores. Those double if you count the epilogue that has the same issue.

Not sure how significant that is in practice, but I suspect it's observable in code with a lot of small functions.

Maybe it makes sense to distinguish between -O2 vs. -Os here?

mstorsjo updated this revision to Diff 295908.Oct 2 2020, 2:19 PM

Limit the change to when optimization for size is requested.

efriedma accepted this revision.Oct 2 2020, 6:12 PM

LGTM

This revision is now accepted and ready to land.Oct 2 2020, 6:12 PM