This is an archive of the discontinued LLVM Phabricator instance.

Hanlding of aligned allocas on a target that does not align stack pointer.
ClosedPublic

Authored by jonpa on Oct 28 2015, 7:05 AM.

Details

Reviewers
hfinkel
Summary

[Stack realignment] Handling of aligned allocas.

This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.

It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.

SystemZ benefits from this, due to its stack frame layout.

New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.

Review and help from Ulrich Weigand and Hal Finkel.

Diff Detail

Event Timeline

jonpa updated this revision to Diff 38654.Oct 28 2015, 7:05 AM
jonpa retitled this revision from to Hanlding of aligned allocas on a target that does not align stack pointer..
jonpa updated this object.
jonpa added subscribers: llvm-commits, uweigand.
jonpa added a comment.Nov 10 2015, 7:57 AM

ping!

This patch has been on Phabricator for a while, and still needs review and approval.

SystemZ maintains normal SP alignment always, and instead dynamically realigns stack objects when needed.

Before we get into the details, please explain this statement. Why can you not implement dynamic stack realignment using a base pointer like other targets?

lib/CodeGen/MachineFunction.cpp
515

You've added this extra parameter and made a number of other changes just to produce a debug message. Please don't do that. Either make this a member function so it can compute the necessary condition itself, or remove this completely.

lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
117

Indentation looks odd here.

SystemZ maintains normal SP alignment always, and instead dynamically realigns stack objects when needed.

Before we get into the details, please explain this statement. Why can you not implement dynamic stack realignment using a base pointer like other targets?

Given that Jonas' patch is based on a suggestion of mine, I'll jump in here :-)

Of course, we *can* implement dynamic stack realignment using a new reserved hard register like other targets. The point is rather that we don't *need* to. On SystemZ, the only parts of the stack frame that require non-default alignment are local variables that were manually over-aligned by the programmer. It is easily possible to implement this without any target support by just doing a bigger alloca and manually aligning a pointer within that area. (In fact, you can do this even in the source code without any compiler support.)

This is different from the situation on other platforms like Intel, where some of the "special" areas of the frame may need non-default alignment, like parameter areas, register save areas, or spill slots. In those cases, it is not possible to implement the alignment requirement without special target support, and that's where the special prolog/epilog code using an extra base register comes in.

Because of this difference, GCC supports two flavors of stack realignment: for those platforms that require it, you can use the extra base register and related code (implemented by the target back-end); but for those platforms that do *not* require it (which is actually most of them), common code simply implements alignment for local variables using generic code (no back-end changes required).

This not only minimizes code changes (most back-ends require no extra code), but also results in more efficient code for targets like SystemZ, since we do not require to reserve an extra hard register.

Jonas patch is trying to implement a similar scheme for LLVM: back-ends may chose to implement realignment via extra base pointer, but for those that don't (need to), common code will still handle the local variable case via generic code. (As Jonas said in the initial submission, this generic implementation is still not quite as efficient as it could be, but that can be improved later ...)

SystemZ maintains normal SP alignment always, and instead dynamically realigns stack objects when needed.

Before we get into the details, please explain this statement. Why can you not implement dynamic stack realignment using a base pointer like other targets?

Given that Jonas' patch is based on a suggestion of mine, I'll jump in here :-)

Of course, we *can* implement dynamic stack realignment using a new reserved hard register like other targets. The point is rather that we don't *need* to. On SystemZ, the only parts of the stack frame that require non-default alignment are local variables that were manually over-aligned by the programmer. It is easily possible to implement this without any target support by just doing a bigger alloca and manually aligning a pointer within that area. (In fact, you can do this even in the source code without any compiler support.)

This is different from the situation on other platforms like Intel, where some of the "special" areas of the frame may need non-default alignment, like parameter areas, register save areas, or spill slots. In those cases, it is not possible to implement the alignment requirement without special target support, and that's where the special prolog/epilog code using an extra base register comes in.

Because of this difference, GCC supports two flavors of stack realignment: for those platforms that require it, you can use the extra base register and related code (implemented by the target back-end); but for those platforms that do *not* require it (which is actually most of them), common code simply implements alignment for local variables using generic code (no back-end changes required).

This not only minimizes code changes (most back-ends require no extra code), but also results in more efficient code for targets like SystemZ, since we do not require to reserve an extra hard register.

Jonas patch is trying to implement a similar scheme for LLVM: back-ends may chose to implement realignment via extra base pointer, but for those that don't (need to), common code will still handle the local variable case via generic code. (As Jonas said in the initial submission, this generic implementation is still not quite as efficient as it could be, but that can be improved later ...)

Uli, thanks for explaining.

but also results in more efficient code for targets like SystemZ, since we do not require to reserve an extra hard register

This seems untrue. Even if you don't reserve a register for the base pointer, by handling these as dynamic allocations, you force yourself to keep separate pointers to each overaligned stack allocation. In short, you trade one reserved register for N virtual ones (one for each over-aligned stack object). It is true that you might spill those virtual registers when they're not needed, but that's a current-infrastructure problem not a theoretical one, and even so, unlikely to be a win.

In general, over-aligned objects are a performance feature, and should be implemented in a high-performance way. The question here seems to be: Do we want to have a suboptimal, but still functionally correct, support for over-aligned stack objects? I think having this capability makes sense, so long as it comes with appropriate comments and explanations in the code about the downsides. Let's proceed.

but also results in more efficient code for targets like SystemZ, since we do not require to reserve an extra hard register

This seems untrue. Even if you don't reserve a register for the base pointer, by handling these as dynamic allocations, you force yourself to keep separate pointers to each overaligned stack allocation. In short, you trade one reserved register for N virtual ones (one for each over-aligned stack object). It is true that you might spill those virtual registers when they're not needed, but that's a current-infrastructure problem not a theoretical one, and even so, unlikely to be a win.

Just to clarify: my "more efficient" comment refered to the way this feature is implemented in GCC today, where common code already groups all overaligned variables into a "secondary stack frame" and uses only a single alloca to allocate this frame and only a single virtual register to access it. I certainly agree that the (initial) LLVM implementation in Jonas' patch is not generally more efficient, but has the drawbacks you mention. I still think Jonas' patch is a good first step:

  • It actually implements overaligned variables correctly for all platforms without requiring target-specific code.
  • It can always be improved upon in the future to be more like the GCC implementation described above.
  • It should already be more efficient for the -likely common- case of functions using just one single overaligned variable.

In general, over-aligned objects are a performance feature, and should be implemented in a high-performance way. The question here seems to be: Do we want to have a suboptimal, but still functionally correct, support for over-aligned stack objects? I think having this capability makes sense, so long as it comes with appropriate comments and explanations in the code about the downsides. Let's proceed.

Agreed that we should have comments explaining the downsides of the current implementation and suggestions for future improvements.

Thanks for the review!

jonpa updated this object.Nov 25 2015, 7:57 AM
jonpa updated this revision to Diff 41148.Nov 25 2015, 8:03 AM

I removed the messy arguments for the warning, and also fixed indentation.
Added a comment explaining the current lack of optimization in MachineFrameInfo.h

hfinkel accepted this revision.Nov 26 2015, 8:18 AM
hfinkel added a reviewer: hfinkel.

LGTM.

include/llvm/CodeGen/MachineFrameInfo.h
128
/// Targets that set this to false don't have the ability to overalign
/// their stack frame, and thus, overaligned allocas are all treated
/// as dynamic allocations and the target must handle them as part
/// of DYNAMIC_STACKALLOC lowering.
/// FIXME: There is room for improvement in this case, in terms of
/// grouping overaligned allocas into a "secondary stack frame" and
/// then only use a single alloca to allocate this frame and only a
/// single virtual register to access it. Currently, without such an
/// optimization, each such alloca gets it's own dynamic
/// realignment.
lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
118–119

Add:

// FIXME: Overaligned static allocas should be grouped into a single dynamic allocation instead of using a separate stack allocation for each one.
This revision is now accepted and ready to land.Nov 26 2015, 8:18 AM
jonpa added a comment.Nov 28 2015, 3:07 AM

Thanks for review, commited as rL254227

jonpa closed this revision.Nov 28 2015, 3:08 AM