This is an archive of the discontinued LLVM Phabricator instance.

[PEI, AArch64] Use empty spaces in stack area for local stack slot allocation.
ClosedPublic

Authored by gberry on May 12 2016, 1:32 PM.

Details

Summary

If the target requests it, use emptry spaces in the fixed and
callee-save stack area to allocate local stack objects.

AArch64: Change last callee-save reg stack object alignment instead of
size to leave a gap to take advantage of above change.

Diff Detail

Event Timeline

gberry updated this revision to Diff 57096.May 12 2016, 1:32 PM
gberry retitled this revision from to [PEI, AArch64] Use empty spaces in stack area for local stack slot allocation..
gberry updated this object.
gberry added reviewers: t.p.northover, qcolombet, MatzeB.
gberry added subscribers: llvm-commits, mcrosier, rengolin.
qcolombet edited edge metadata.May 16 2016, 10:27 AM

Hi Geoff,

Couple of questions:

  • Where are those empty slots? E.g., are they padding because of alignment, something else?
  • Isn’t stack coloring doing already what you want?

Cheers,
-Quentin

Hi Quentin,

  • In the case of AArch64, yes the empty slots are there because of alignment, specifically of the callee-save stack area. The stack pointer must always be 16-byte aligned on aarch64, so when allocating a callee-save area for say, 3 64-bit registers, there will be a 64-bit hole left. This change uses that hole to allocate a spill slot/alloca when possible.
  • My understanding is that stack coloring is solving a different problem (technically two different passes solving similar problems). It tries to compact the stack space used by spill slots/allocas by re-using the same stack slot for different variables when their live ranges don't overlap. They run before PEI, so they don't even see the "holes" in the callee-save area that this patch is trying to eliminate.

Hi Geoff,

Thanks for your answer, we are on the same page then :).

I’ll have a look at the patch shortly.

Cheers,
-Quentin

qcolombet added inline comments.May 17 2016, 2:59 PM
lib/CodeGen/PrologEpilogInserter.cpp
593

resize(0) is weird when reading the code. Clear() looks more appropriate.

Also, if all the bytes are free, why do we stop?

601

use the actual type here.
It is important to know if it is a int an unsigned, etc.

766

Add a comment on what information we keep in FixedCSOffset, e.g., offset containing all the fixed sized and callee saved objects.

903

protecter -> protector

903

isn’t being used *and* the target...

906

We should gate that with the optimization level as well.

909

Allocated

910

Period.

913

Period.

922

A comment reminding that ObjOffset is negative would be nice :).

925

ObjStart

932

Could you move all that stuff in a helper function?

test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll
681

Why is the input of the test case changed?

gberry updated this revision to Diff 58127.May 23 2016, 11:48 AM
gberry edited edge metadata.

Update to address Quentin's review feedback.

Hi Quentin,

Thanks for the review. I believe I've addressed all of the issues you raised, if not just let me know.

@t.p.northover Are you okay with the AArch64 specific changes?

lib/CodeGen/PrologEpilogInserter.cpp
925

Wow, that was a particularly bad typo :)

test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll
681

After my change, %tmp was getting placed in a hole in the Fixed/CS stack area, leading to no local stack space being allocated in the prologue, which ended up hiding the problem this test was checking for without actually fixing it. I made the change smaller in this revision by simply making %tmp bigger so it wouldn't fit in the unused part of the Fixed/CS stack area any more.

Thanks Geoff, the update looks good, just one missing thing.
(See my inlined comments.)

FWIW, the AArch64 changes looks good to me as well.

Q.

lib/CodeGen/PrologEpilogInserter.cpp
599

Putting back my comment from the other day: having the actual type (instead of auto) would be nice. At first glance, I have no idea if those are unsigned, int, etc.

622

The empty check is now also checked as part of the none check.
I would suggest to get rid of the empty check.

628

I guess we can get rid of the call to clear.
Maybe that was an optimization when StackBytesFree is not empty to have none being faster?
(I am fine keeping it.)

Thanks Quentin,

I'll address the issues you raised shortly.

lib/CodeGen/PrologEpilogInserter.cpp
599

Sorry, I didn't ignore you :) I think your original comment was on a different part of the original change and I didn't notice these needed fixing too. I'll double check all of the autos just in case.

622

Yeah, this is probably overkill.

628

Yes, my thinking was that later calls to empty()/none() would complete faster.

gberry updated this revision to Diff 58812.May 27 2016, 11:05 AM

Update to address Quentin's latest comments.

gberry added inline comments.May 27 2016, 11:07 AM
lib/CodeGen/PrologEpilogInserter.cpp
587

This check and the corresponding change of ObjOffest and ObjSize below to 'int' are the only significant changes in this revision.

Gerolf added a subscriber: Gerolf.May 27 2016, 11:21 AM

Since I couldn't spot this in the code: in the scenario where there is one empty reallocate both of them to the empty slot or give up after it reallocated the first object?

Thanks
Gerolf

lib/CodeGen/PrologEpilogInserter.cpp
632

Could this be function like FreeStart = FindSlot(...) -- line 632 - 657

Hi Gerolf,

I'm not sure I understand exactly what you're asking, but this change will allocate multiple stack slots to free space in the fixed/CS area. It will assign later slots to this free area even if earlier slots were not assigned there. It will not assign multiple slots to overlapping regions of the fixed/CS area.

lib/CodeGen/PrologEpilogInserter.cpp
632

I suppose, but I don't think it would make the loop any clearer since I don't think it would simplify the control flow of the loop. Also, this function isn't really doing much else.

Thanks, Geoff. I take back my previous comment. I was worried that when A and B are allocated at stack offset x and this optimization changes x, it would only change it for A but not for B. But looking deeper into the code this is not the case.

lib/CodeGen/PrologEpilogInserter.cpp
632

I agree with your functionality argument. A function would just improve readability.

Hi Geoff,

LGTM.

Did not mark it as accepted as you asked Tim to check the AArch64 part + I am not sure what you'd agreed with Gerolf :).

Cheers,
-Quentin

lib/CodeGen/PrologEpilogInserter.cpp
599

No worries, I figured that was what happened :). Phab can be quite confusing after a few revisions.

gberry added a comment.Jun 2 2016, 8:12 AM

I'm going to go ahead and commit this based on Quentin's review. @t.p.northover if you do take a look and have any objections, just let me know and I'll resolve them.

mcrosier accepted this revision.Jun 2 2016, 8:45 AM
mcrosier added a reviewer: mcrosier.

Sounds reasonable to me.

This revision is now accepted and ready to land.Jun 2 2016, 8:45 AM
This revision was automatically updated to reflect the committed changes.