This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SME] Set up a lazy-save/restore around calls.

Authored by kmclaughlin on Sep 14 2022, 2:35 PM.



Setting up a lazy-save mechanism around calls is done during SelectionDAG
because calls to intrinsics may be expanded into an actual function call
(e.g. calls to @llvm.cos()), and maintaining an allowed-list in the SMEABI
pass is not feasible.

The approach for conditionally restoring the lazy-save based on the runtime
value of TPIDR2_EL0 is similar to how we handle conditional smstart/smstop.
We create a pseudo-node which is expanded into a conditional branch around
a call to __arm_tpidr2_restore(%tpidr2_object_ptr).
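
The control flow described above can be modelled in plain C++. This is only a rough sketch, not the code the patch generates: `ModelState`, `calleeLeavesZA`, `calleeCommitsSave`, `restoreZAIfNeeded` and `callWithLazySave` are hypothetical names, the register is modelled as an ordinary variable, and the sketch assumes (from our reading of the SME ABI, not from this revision) that a zero TPIDR2_EL0 after the call means the callee committed the lazy save and that the caller clears the register afterwards.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: real code reads the TPIDR2_EL0 system register and
// calls the ABI routine __arm_tpidr2_restore; here both are simulated.
struct ModelState {
  std::uint64_t tpidr2_el0 = 0; // models the TPIDR2_EL0 register
  int restore_calls = 0;        // counts modelled __arm_tpidr2_restore calls
};

// Models a callee that never touches ZA: the lazy save stays pending.
inline void calleeLeavesZA(ModelState &) {}

// Models a callee that needs ZA: the save is committed to the buffer and
// TPIDR2_EL0 is cleared (assumed ABI behaviour).
inline void calleeCommitsSave(ModelState &s) { s.tpidr2_el0 = 0; }

// Models the expanded pseudo-node: a branch on the runtime value of
// TPIDR2_EL0 that guards the restore call.
inline void restoreZAIfNeeded(ModelState &s, std::uint64_t /*tpidr2_block*/) {
  if (s.tpidr2_el0 == 0) {
    ++s.restore_calls; // stands in for __arm_tpidr2_restore(block)
  }
  s.tpidr2_el0 = 0; // assumed: the caller drops the pending lazy save
}

// Sets up the lazy save, runs the callee, then conditionally restores.
// Returns how many times the modelled restore routine was called.
template <typename Callee>
int callWithLazySave(std::uint64_t tpidr2_block, Callee callee) {
  assert(tpidr2_block != 0 && "block address must be non-null");
  ModelState s;
  s.tpidr2_el0 = tpidr2_block; // arm the lazy save before the call
  callee(s);
  restoreZAIfNeeded(s, tpidr2_block);
  return s.restore_calls;
}
```

In this model the restore routine only runs when the callee actually committed the save, which is the point of making the restore conditional.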

The lazy-save buffer and TPIDR2 block are only allocated once at the start
of the function. For each call, the TPIDR2 block is initialised, and a
pseudo-node (RestoreZA) is planted after the call.
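
As an illustration of the "allocate once, initialise per call" split, here is a minimal host-side sketch. The 16-byte block layout (buffer pointer, 16-bit slice count, reserved bytes) follows our understanding of the SME ABI rather than anything stated in this revision, and `TPIDR2Block`, `LazySaveFrame` and `armLazySave` are hypothetical names. Which fields the patch writes at function entry and which it writes per call is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout of the TPIDR2 block (per our reading of the SME ABI).
struct TPIDR2Block {
  std::uint64_t za_save_buffer;     // address of the lazy-save buffer
  std::uint16_t num_za_save_slices; // number of ZA slices to save
  std::uint8_t reserved[6];         // must be zero
};
static_assert(sizeof(TPIDR2Block) == 16, "TPIDR2 block is 16 bytes");

// Hypothetical per-function state: allocated once at function entry.
struct LazySaveFrame {
  std::uint16_t svl_bytes;          // streaming vector length in bytes
  std::vector<std::uint8_t> buffer; // SVL.B x SVL.B bytes for ZA
  TPIDR2Block block{};              // zero-initialised block

  explicit LazySaveFrame(std::uint16_t svl)
      : svl_bytes(svl), buffer(static_cast<std::size_t>(svl) * svl) {
    assert(svl != 0 && "vector length must be non-zero");
    block.za_save_buffer = reinterpret_cast<std::uintptr_t>(buffer.data());
  }
  // The block stores the buffer address, so copying is disabled.
  LazySaveFrame(const LazySaveFrame &) = delete;
  LazySaveFrame &operator=(const LazySaveFrame &) = delete;
};

// Per-call step: initialise the block and point the modelled TPIDR2_EL0
// register at it. No allocation happens here.
inline void armLazySave(LazySaveFrame &frame, std::uint64_t &tpidr2_el0) {
  frame.block.num_za_save_slices = frame.svl_bytes;
  tpidr2_el0 = reinterpret_cast<std::uintptr_t>(&frame.block);
}
```

Calling `armLazySave` before every call reuses the same buffer and block, so a function with many calls still pays for a single allocation.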


Event Timeline

sdesmalen created this revision. Sep 14 2022, 2:35 PM
Herald added a project: Restricted Project. Sep 14 2022, 2:35 PM
sdesmalen requested review of this revision. Sep 14 2022, 2:35 PM
Matt added a subscriber: Matt. Sep 16 2022, 11:44 AM
aemerson added inline comments. Sep 20 2022, 8:22 AM

nit: coding style is to use I not i.


Document this? This looks like an expensive function.

It also has an unfortunate naming clash with SMEAttrs::requiresLazySave(); I think it could be named more descriptively to avoid confusion.


We have Register to represent registers.

kmclaughlin commandeered this revision. Sep 27 2022, 3:10 AM
kmclaughlin added a reviewer: sdesmalen.
kmclaughlin added a subscriber: kmclaughlin.

Commandeering this patch to try and address review comments while @sdesmalen is away.

kmclaughlin marked 2 inline comments as done.
  • Changed LowerCall to allocate a lazy-save buffer and TPIDR2 block if RequiresLazySave is true but the TPIDR2 object has not yet been set in AArch64FunctionInfo. This can happen when requiresLazySave finds no calls in the function, but the function contains an instruction which will be lowered to a lib call (e.g. a 128-bit floating-point add).
  • Renamed requiresLazySave to requiresBufferForLazySave.
  • Added a function for allocating the buffer & TPIDR2 object (allocateLazySaveBuffer).
  • Added a test case to sme-shared-za-interface.ll for the scenario described above.
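
The fallback described in the first bullet is an allocate-on-first-use pattern. A minimal sketch, assuming C++17 and using hypothetical stand-ins (`FunctionInfoModel`, `allocateLazySaveBufferModel`, `lowerCallModel`) rather than the real AArch64FunctionInfo or LowerCall interfaces:

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the per-function info that records the frame
// index of the TPIDR2 object once it has been allocated.
struct FunctionInfoModel {
  std::optional<int> tpidr2_obj; // empty until the block is allocated
  int next_frame_index = 0;      // models handing out stack objects
};

// Models allocating the lazy-save buffer and the TPIDR2 object together.
inline int allocateLazySaveBufferModel(FunctionInfoModel &fi) {
  assert(!fi.tpidr2_obj && "allocate only once per function");
  int buffer_obj = fi.next_frame_index++; // lazy-save buffer
  int tpidr2_obj = fi.next_frame_index++; // TPIDR2 block
  (void)buffer_obj; // unused in this model
  fi.tpidr2_obj = tpidr2_obj;
  return tpidr2_obj;
}

// Models the call-lowering step: if a lazy save is required but nothing was
// allocated up front (for example because the call only appeared when an
// instruction was lowered to a lib call), allocate on demand.
inline std::optional<int> lowerCallModel(FunctionInfoModel &fi,
                                         bool requires_lazy_save) {
  if (!requires_lazy_save)
    return std::nullopt;
  if (!fi.tpidr2_obj)
    allocateLazySaveBufferModel(fi);
  return fi.tpidr2_obj;
}
```

Repeated calls reuse the first allocation, so the on-demand path keeps the one-buffer-per-function property.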

Gentle ping :)

aemerson accepted this revision. Oct 4 2022, 6:55 AM


This revision is now accepted and ready to land. Oct 4 2022, 6:55 AM
This revision was landed with ongoing or failed builds. Oct 5 2022, 6:43 AM
This revision was automatically updated to reflect the committed changes.