This is an archive of the discontinued LLVM Phabricator instance.

AArch64: Switch to x20 as the shadow base register for outlined HWASan checks.
ClosedPublic

Authored by pcc on Oct 29 2020, 1:53 PM.

Details

Summary

From a code size perspective it turns out to be better to use a
callee-saved register to pass the shadow base. For non-leaf functions
it avoids the need to reload the shadow base into x9 after each
function call, at the cost of an additional stack slot to save the
caller's x20. But with x9 there is also a stack size cost, either
as a result of copying x9 to a callee-saved register across calls or
by spilling it to stack, so for the non-leaf functions the change to
stack usage is largely neutral.

It is also code size (and stack size) neutral for many leaf functions.
Although they now need to save/restore x20 this can typically be
combined via LDP/STP into the x30 save/restore. In the case where
the function needs callee-saved registers or stack spills we end up
needing, on average, 8 more bytes of stack and 1 more instruction
but given the improvements to other functions this seems like the
right tradeoff.

Since this is an ABI change for the outlined check functions they
have been renamed.

Unfortunately we cannot change the register for the v1 (non short
granules) check because the runtime assumes that the shadow base
register is stored in x9, so the v1 check still uses x9.

With this change code size of /system/lib64/*.so in an Android build
with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3%
decrease.

Diff Detail

Event Timeline

pcc created this revision.Oct 29 2020, 1:53 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptOct 29 2020, 1:53 PM
Herald added subscribers: Restricted Project, cfe-commits, danielkiss and 2 others. · View Herald Transcript
hctim accepted this revision.Oct 29 2020, 3:19 PM

LGTM - I'm assuming this doesn't regress performance because there's no latency of the additional store due to its locality?

This revision is now accepted and ready to land.Oct 29 2020, 3:19 PM
pcc added a comment.Oct 29 2020, 3:28 PM

For the kernel I measured a small regression in boot time (with a version of this change that uses x20 for the v1 checks as well since the kernel doesn't use short granules yet) -- from 6.65s to 6.70s or 0.8%. But that's a fraction of the size gains which were 4% for kernel and (as mentioned) 3% for userspace.

eugenis accepted this revision.Oct 29 2020, 4:22 PM

LGTM
I was confused about the ABI statement in the description at the first glance - could you reword it to make it clear that HWASan ABI is not affected?

This revision was landed with ongoing or failed builds.Oct 30 2020, 12:52 PM
This revision was automatically updated to reflect the committed changes.