This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Align stack objects passed to memory intrinsics
ClosedPublic

Authored by john.brawn on Feb 26 2015, 6:11 AM.

Details

Summary

Memcpy, and other memory intrinsics, typically tries to use LDM/STM if the source and target addresses are 4-byte aligned. In CodeGenPrepare look for calls to memory intrinsics and, if the object is on the stack, 4-byte align it if it's large enough that we expect that memcpy would want to use LDM/STM to copy it.

Diff Detail

Repository
rL LLVM

Event Timeline

john.brawn updated this revision to Diff 20755.Feb 26 2015, 6:11 AM
john.brawn retitled this revision from to [ARM] Align stack objects that may be memcpy'd.
john.brawn updated this object.
john.brawn edited the test plan for this revision. (Show Details)
john.brawn set the repository for this revision to rL LLVM.
john.brawn added a subscriber: Unknown Object (MLST).

Hi John,

Have you done some stack consumption analysis? I remember Arnaud trying to reduce the size of the stack and this patch seems to go against it.

I agree it's a good idea to align it at 4, but we need to be careful, especially in -Os/z.

cheers,
--renato

Zero change in stack consumption in any EEMBC benchmark (I haven't checked SPEC). Also by CodeGen time we don't know if we're compiling -Os/-Oz as we just have CodeGenOpt::Level.

Zero change in stack consumption in any EEMBC benchmark (I haven't checked SPEC). Also by CodeGen time we don't know if we're compiling -Os/-Oz as we just have CodeGenOpt::Level.

No, the size optimization attribute is independent of CodeGenOpt::Level. You can get at it from CodeGen by querying:

MF.getFunction()->hasFnAttribute(Attribute::OptimizeForSize);

Zero change in stack consumption in any EEMBC benchmark (I haven't checked SPEC). Also by CodeGen time we don't know if we're compiling -Os/-Oz as we just have CodeGenOpt::Level.

No, the size optimization attribute is independent of CodeGenOpt::Level. You can get at it from CodeGen by querying:

MF.getFunction()->hasFnAttribute(Attribute::OptimizeForSize);

That having been said, code size and stack size, in this context, might be inversely related.

It might be better to implement this in CodeGenPrep. You could look directly for allocas that are used by memcpy intrinsics or as byval function argument parameters, and increase the alignment of only those directly.

Also worth noting, is that in getMemcpyLoadsAndStores in lib/CodeGen/SelectionDAG/SelectionDAG.cpp, there already is code that attempts to reset the alignment of a destination stack object when possible. Perhaps this should just be extended to do the same for the source.

Also worth noting, is that in getMemcpyLoadsAndStores in lib/CodeGen/SelectionDAG/SelectionDAG.cpp, there already is code that attempts to reset the alignment of a destination stack object when possible. Perhaps this should just be extended to do the same for the source.

The goal is to align not just on direct calls to memcpy, but also calls to functions that then may call memcpy. It looks like doing it as a CodeGenPrep pass can do both - we first align arguments to a call, then check if the call is actually a memcpy and up its alignment if possible.

john.brawn updated this revision to Diff 21012.Mar 2 2015, 9:22 AM
john.brawn retitled this revision from [ARM] Align stack objects that may be memcpy'd to [ARM] Align stack objects passed to functions.
john.brawn updated this object.

New patch attached that does this instead as a CodeGenPrepare pass.

Can you please put this into CodeGenPrep proper (with some target hook to enable it). I likely want to use this for PowerPC too, and the actual logic is fairly simple (so far).

lib/Target/ARM/ARMAlignAllocaPass.cpp
32 ↗(On Diff #21012)

Line too long?

62 ↗(On Diff #21012)

What are you actually trying to check for here? I assume you're mostly interested in looking for GEPs here (and not, for example, ptrtoint). Also, for GEPs, do you want to check that the offet to the underlying alloca is divisible by the alignment of interest. Maybe you should use something like Val->stripAndAccumulateInBoundsConstantOffsets() -- true, you'd miss the dynamic offset case, but I don't know if you have examples where that's important.

john.brawn updated this revision to Diff 21355.Mar 6 2015, 7:33 AM
john.brawn updated this object.

New patch attached that moves this into CodeGenPrepare, enabled by shouldAlignAllocaArgs in TargetLowering.

rengolin added inline comments.Mar 6 2015, 7:37 AM
lib/CodeGen/CodeGenPrepare.cpp
1236 ↗(On Diff #21355)

I'm a bit averse of setting parameters for the computation inside a conditional and having to assert right after every call. I'd rather have (or reuse) some default size/alignment parameters from elsewhere.

john.brawn added inline comments.Mar 6 2015, 7:58 AM
lib/CodeGen/CodeGenPrepare.cpp
1236 ↗(On Diff #21355)

If an implementation of shouldAlignAllocaArgs were to not set AllocaAlign then that indicates that it's not doing what it should, so failing an assert is better than hiding that by silently using a default value.

rengolin added inline comments.Mar 6 2015, 8:26 AM
lib/CodeGen/CodeGenPrepare.cpp
1236 ↗(On Diff #21355)

I'm not against the assert, I'm against passing a reference for AllocaSize and AllocaAlign on a check call. The call is not explicit that it's setting the two values here, nor is it on TargetLowering's default implementation.

A more clear way would be to make a function whose only job is to set those variables and make it explicit on its name, like "setAlignAllocaArgs(CI, Size, Align)" and add "Size > 0" to the conditional. You can avoid the assert in that case, too.

Also, your check for TLI is redundant, since you already check for TD.

john.brawn added inline comments.Mar 6 2015, 10:08 AM
lib/CodeGen/CodeGenPrepare.cpp
1236 ↗(On Diff #21355)

Doing it this way (return a bool, set a reference/pointer argument only if returning true) seems to be the standard way these queries are done in TargetInfo - see allowsMisalignedMemoryAccesses, getStackCookieLocation, GetAddrModeArguments.

hfinkel added inline comments.Mar 6 2015, 10:11 AM
include/llvm/Target/TargetLowering.h
980 ↗(On Diff #21355)

Please name AllocaSize -> MinAllocaSize to make its purpose clearer.

lib/CodeGen/CodeGenPrepare.cpp
1235 ↗(On Diff #21355)

Don't initialize these here; that makes it clear that they must be set by the TLI call, and allows mistakes to be caught by tools that detect use of undefined values (valgrind, etc.).

1236 ↗(On Diff #21355)

I'm not against the assert, I'm against passing a reference for AllocaSize and AllocaAlign on a check call. The call is not explicit that it's setting the two values here, nor is it on TargetLowering's default implementation.

A more clear way would be to make a function whose only job is to set those variables and make it explicit on its name, like "setAlignAllocaArgs(CI, Size, Align)" and add "Size > 0" to the conditional. You can avoid the assert in that case, too.

I disagree. So long as the values are not initialized above the call, it is clear that the call must be setting them (and returning true if it has done so). This pattern is used by a number of other callbacks, even some in this file (getStackCookieLocation, getPreIndexedAddressParts, getPostIndexedAddressParts, ShrinkDemandedConstant, etc.), and I think it is appropriate here.

I may be reworking this patch to align not just allocas, but also global variables defined in this translation unit. The ultimate thing I'm trying to optimize is cases like

void fn() {
  char arr[31];
  strcpy(arr, "some string");
  // do somthing with arr
}

This patch aligns arr, but the variable generated for the string needs to be aligned as well. The plan was to do that in clang, but I'm thinking that maybe doing it here would be better.

Or maybe it would be better to get this checked in then do that, but the name shouldAlignAllocaArgs would have to be changed (maybe shouldAlignPointerArgs).

john.brawn updated this revision to Diff 21488.Mar 9 2015, 7:51 AM

New patch that addresses review comments. It also renames things slightly in expectation that we'll be also aligning GlobalVariables. I'll be doing that in an upcoming patch.

You're currently doing this only for pointers that are captured as call arguments. Do you care about pointers captured in any other way (by having their address stored somewhere, or returned, for example)?

lib/CodeGen/CodeGenPrepare.cpp
1256 ↗(On Diff #21488)

may have improved -> may be able to improve

You're currently doing this only for pointers that are captured as call arguments. Do you care about pointers captured in any other way (by having their address stored somewhere, or returned, for example)?

Maybe? The original patch did that as it just aligned everything that was used in a FrameIndex instruction. I guess the question is, how much do we care if we over-align things pointlessly? When it's an argument to a memcpy call it's definitely a good idea, but in an argument to some other call or when the address is stored somewhere it may turn out to be pointless (which is possibly an argument that I should be restricting this to only memcpy/move etc. calls).

You're currently doing this only for pointers that are captured as call arguments. Do you care about pointers captured in any other way (by having their address stored somewhere, or returned, for example)?

Maybe? The original patch did that as it just aligned everything that was used in a FrameIndex instruction. I guess the question is, how much do we care if we over-align things pointlessly? When it's an argument to a memcpy call it's definitely a good idea, but in an argument to some other call or when the address is stored somewhere it may turn out to be pointless (which is possibly an argument that I should be restricting this to only memcpy/move etc. calls).

Agreed.

I don't have a strong opinion here. I suppose I'm inclined to go for other kinds of captures as well, or to stick with just memcpy/move/set. Picking only function calls, which amounts to some, but not all, captures, seems like an inconsistency likely to cause different kinds of abstractions to get different optimizations, and that's likely not good. Let's go to one extreme or the other ;)

john.brawn updated this revision to Diff 21573.Mar 10 2015, 6:50 AM
john.brawn retitled this revision from [ARM] Align stack objects passed to functions to [ARM] Align stack objects passed to memory intrinsics.
john.brawn updated this object.
john.brawn added a reviewer: hfinkel.

New patch attached that goes for the approach of just aligning memory intrinsics.

john.brawn updated this object.Mar 10 2015, 6:51 AM
hfinkel added inline comments.Mar 10 2015, 2:14 PM
lib/CodeGen/CodeGenPrepare.cpp
1252 ↗(On Diff #21573)

If this is not an inbounds GEP, then this subtraction could underflow (it could overflow otherwise too, but without inbounds it could underflow even in a case where the code does not have undefined behavior). I recommend making sure the value being subtracted is not greater than the alloc size.

Please add a test case for this.

john.brawn added inline comments.Mar 11 2015, 4:03 AM
lib/CodeGen/CodeGenPrepare.cpp
1252 ↗(On Diff #21573)

stripAndAccumulateInBoundsConstantOffsets will stop if it hits a non-inbound GEP, so we wouldn't get here in that case. Adding a test sound like a good idea though, I'll do that.

hfinkel added inline comments.Mar 11 2015, 6:52 AM
lib/CodeGen/CodeGenPrepare.cpp
1252 ↗(On Diff #21573)

Yea, but even for an inbounds gep, we should do the right thing (the code may have undefined behavior, but having that reset the alignment of some objects seems a bit silly).

Attached patch handles size<offset by doing the comparison in a different way (by checking size >= minsize+offset), and adds a bunch of tests of offset handling.

john.brawn updated this revision to Diff 22024.Mar 16 2015, 8:46 AM

While working on the next part of this (aligning global variables) I realised that we want to 8-byte align on some targets as it's faster, so the attached patch does that.

hfinkel accepted this revision.Mar 17 2015, 11:13 AM
hfinkel edited edge metadata.

Looks good as far as I'm concerned. You might want to do the global variable handling as a separate patch.

This revision is now accepted and ready to land.Mar 17 2015, 11:13 AM
This revision was automatically updated to reflect the committed changes.