We already do this for most cases, with the exception of instructions that
get expanded to function calls (e.g. for lowering operations on fp128
values), in which case we temporarily allocate a lazy-save buffer.
The code generated in this case is incorrect, however: it seems to pass
the wrong address for the TPIDR2 object to the ZA restore function. By
always allocating the lazy-save buffer once, we avoid this issue
entirely.
The cost is that we now allocate the buffer even when it is not
needed. A follow-up patch could address that by removing the lazy-save
buffer when it is unused.
What about "Conservatively assume the function requires the lazy-save mechanism."?