This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.
Details
Diff Detail
Event Timeline
lib/Target/AArch64/AArch64SpeculationHardening.cpp | ||
---|---|---|
583–590 | Why do you need to insert pseudo-instructions here, only to replace them with ANDs later on? Could this loop be moved to after the control-flow tracking has been inserted, and create the ANDs directly? |
lib/Target/AArch64/AArch64SpeculationHardening.cpp | ||
---|---|---|
583–590 | This design inserting pseudo-instructions is based on an earlier design I did (which did not get committed) to implement the intrinsics based approach that is also implemented in gcc. Obviously, we could not use the pseudo-instructions for now and only introduce them if we introduce the intrinsics-based approach too. |
lib/Target/AArch64/AArch64SpeculationHardening.cpp | ||
---|---|---|
583–590 | I've know looked into a design where the pass iterates over each basic block just once, inserting masking operations and csdb instructions in a single go. For that reason, I'd prefer to keep the design where we iterate each basic block twice: once for deciding which registers must be masked at which program locations and once to insert CSDBs. That way, we can implement the CSDB instruction insertion separately from the other transformations in this pass and it is less likely there will be bugs in that implementation. As to the use of pseudo instructions: we could not use pseudo instructions and encode the same information as the pseudo instructions in side data structures in the pass to convey information between the multiple iterations across a basic block. But that again might result in a more complex implementation. And furthermore, when we implement support for user-facing intrinsics, we'll still need those pseudos anyway. Therefore, I think the current design overall is a better trade-off. |
LGTM, thanks.
lib/Target/AArch64/AArch64SpeculationHardening.cpp | ||
---|---|---|
583–590 | My concern was that the two-phase implementation was more complex than doing it on one phase, but if it simplifies the CSDB generation then it makes sense. I agree that using pseudo-instructions is better than storing the same information in a side data structure, and as you say we'll likely need them anyway to implement user-facing intrinsics. |
Hi Kristof,
Thanks for writing this code! I had an easy time understanding what you were doing.
I was discussing this SLH implementation with Chandler and he suggested that it may be useful to write a design doc similar to the one for the x86 implementation, so other people in the community can understand and discuss the current/future design with the ARM specific details laid out. What do you think?
llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp | ||
---|---|---|
411 ↗ | (On Diff #180840) | Right now, this masks after every load in a program. Is one of the future optimizations you mention in the comment in this file to mask only after loads that are depended upon by later non-data invariant operations? |
447 ↗ | (On Diff #180840) | I'm not sure if this case is possible, but is it possible that some defs are GPR and some aren't? If that's possible would it be worthwhile to harden the GPRs and non-GPRs in different ways? It appears to me that currently if any are non-GPR, then all of them are treated as non-GPRs. |
I agree it would be good to have such a design doc. Actually I think it would be best to adapt https://llvm.org/docs/SpeculativeLoadHardening.html and separate out x86 and aarch64-specific implementation aspects from the target independent concepts there.
I am happy to try and do that, but will definitely not have time to do so this week. Maybe later this month.
I'd also be happy if e.g. you yourself would like to work on that.
llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp | ||
---|---|---|
411 ↗ | (On Diff #180840) | Rereading the FIXMEs I've written in this function: no, that is not recorded as a potential future optimization. |
447 ↗ | (On Diff #180840) | Yeah, something like |
Sounds good. I have a bit of time and can start working on updating the design doc to make it more target independent outside of the implementation section.
llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp | ||
---|---|---|
411 ↗ | (On Diff #180840) | I was trying to describe this section and the following section which is a bit different than what I said. Sorry about that. I'm not sure if the assumptions for those implementation detail sections hold for ARM. |
Why do you need to insert pseudo-instructions here, only to replace them with ANDs later on? Could this loop be moved to after the control-flow tracking has been inserted, and create the ANDs directly?