Page MenuHomePhabricator

[AArch64] Stack frame reordering.

Authored by eugenis on Jan 7 2020, 3:05 PM.



Implement stack frame reordering in the AArch64 backend.

Unlike the X86 implementation, AArch64 does not seem to benefit from
"access density" based frame reordering, mainly because it has a much
smaller variety of addressing modes, and the fact that all instructions
are 4 bytes so each frame object is either in range of an instruction
(and then the access is "free") or not (and that has a code size cost
of 4 bytes).

This change improves Memory Tagging codegen by

  • Placing an object that has been chosen as the base tagged pointer of

the function at SP + 0. This saves one instruction to setup the pointer
(IRG does not have an offset immediate), and more because that object
can now be referenced without materializing its tagged address in a
scratch register.

  • Placing objects that go out of scope simultaneously together. This

exposes opportunities for instruction merging in tryMergeAdjacentSTG.

Diff Detail

Event Timeline

eugenis created this revision.Jan 7 2020, 3:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2020, 3:05 PM

This revision and its two dependencies improve MTE code size of chromium on aarch64-linux-android by slightly more than 1%.

Unit tests: fail. 61307 tests passed, 1 failed and 736 were skipped.

failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_sharedtimedmutex_requirements/thread_sharedtimedmutex_class/try_lock.pass.cpp

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Otherwise I like the idea.


IMHO if the data structure would be vector of vectors of FrameObjects then maybe the whole logic would be a bit simpler, but maybe I missed something.

efriedma added inline comments.

It's usually more concise to write this sort of thing with std::tuple: return std::make_tuple(!A.IsValid, A.ObjectFirst, A.GroupFirst, A.GroupIndex, A.ObjectIndex) < std::make_tuple(!B.IsValid, B.ObjectFirst, B.GroupFirst, B.GroupIndex, B.ObjectIndex).


Can we skip running this code when stack tagging is disabled? It doesn't do anything in that case, I think, but it would avoid running a bunch of unnecessary code, and make the intent more clear.

eugenis marked 2 inline comments as done.Apr 27 2020, 1:15 PM
eugenis added inline comments.

Do you mean a list of groups, where each group is a list of objects?
In that case we would have to deal with the possibility that one object participates in multiple groups, and resolve that somehow when building the final frame order. In the current implementation, the object always abandons its old group when joining a new one - it's not optimal, but simple and seems to work fine.

Also, FrameObjectCompare needs a fast way to determine when two objects are in the same group, hence group id is a member of FrameObject, and not vice versa.


Sure, we can look for the sanitize_memtag attribute. Stack tagging instructions can be added some other way of course (ex. with intrinsic calls in the C source).
We could check if MTE is in the subtarget's feature bits, but that would not help much.

danielkiss added inline comments.Apr 28 2020, 11:21 AM

Do you mean a list of groups, where each group is a list of objects?

Yes, like that.

In that case we would have to deal with the possibility that one object participates in multiple groups

Right, that I missed :)

eugenis updated this revision to Diff 261559.May 1 2020, 3:25 PM

address comments

eugenis marked 3 inline comments as done and an inline comment as not done.May 1 2020, 3:36 PM
eugenis added inline comments.

I'm not sure how to do this. As settag-merge-order.ll test shows, it's pretty easy to generate code that benefits from this transformation w/o a sanitize_memtag attribute.
I could fold this check into some other place where we scan all instructions in a function, like the StackTaggingPreRA pass, and store in AArch64FunctionInfo, but that looks like a premature optimization to me.

eugenis updated this revision to Diff 296085.Oct 4 2020, 6:13 PM
eugenis marked an inline comment as not done.


eugenis added inline comments.Oct 4 2020, 6:17 PM

I've measured compilation time (pdfium_test in chromium without tagging, default config vs -aarch64-order-frame-objects=0) and did not see any change due to this patch.

Could anyone take another look at this and ?

this and ?

The two patches seem to overlap?

this and ?

The two patches seem to overlap?

Indeed. Arc is full of surprises.

I've reuploaded the patches separately.

efriedma accepted this revision.Oct 9 2020, 1:41 PM


This revision is now accepted and ready to land.Oct 9 2020, 1:41 PM
This revision was landed with ongoing or failed builds.Oct 15 2020, 1:06 PM
This revision was automatically updated to reflect the committed changes.
nickdesaulniers added inline comments.

What happened with the formatting here?

eugenis added inline comments.Oct 19 2020, 10:00 PM

clang-format happened :)
Does it change if you reformat it?

I can move it to a helper function.