Page MenuHomePhabricator

[Spectre] Introduce a new pass to do speculative load hardening to mitigate Spectre variant #1 for x86.
ClosedPublic

Authored by chandlerc on Mar 23 2018, 3:52 AM.

Details

Summary

There is a lengthy, detailed RFC thread on llvm-dev which discusses the
high level issues. High level discussion is probably best there.

This patch is just an initial step. It isn't really ready for prime time
and is only exposed via debugging flags. It has two major limitations
currently:

  1. It only supports x86-64, and only certain ABIs. Many assumptions are currently hard-coded and need to be factored out of the code here.
  2. It doesn't include any options for more fine-grained control, either of which control flow edges are significant or which loads are important to be hardened.

However, this is enough for people to begin using. I have had numerous
requests from people to be able to experiment with this patch to
understand the trade-offs it presents and how to use it. We would also
like to encourage work to similar effect in other toolchains.

The ARM folks are actively developing a system based on this for
AArch64. We hope to merge this with their efforts when both are far
enough along. But we also don't want to block making this available on
that effort.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc created this revision.Mar 23 2018, 3:52 AM
jgorbe added a subscriber: jgorbe.Mar 23 2018, 11:39 AM
chandlerc updated this revision to Diff 139730.Mar 24 2018, 4:45 PM

Rebase and update. Relevant changes here:

  • Updated docs a bit to track changes in the live Google Doc. Added links to the MSVC LFENCE analysis and added explicit credit for one of the folks at HACS that helped come up with some of these ideas.
  • Removed a bunch of flags that simply don't work now that the pass uses post-load hardening as heavily as it does. These mostly had to do with tricks to harden against even access low 2gb of memory. These tricks aren't compatible with a mask that can be used for post-load hardening. Plus they were pretty expensive. I had already removed them from the design document, but still had the code lying around and just not working as intended.
  • Removed the flag for fixed address hardening. This doesn't actually work anyways for the same reason as above.
  • Added FIXME text for one hole in the current approach around segmented addresses where we don't secure them as much as I would like. I'll probably just document that TLS data *also* cannot directly be used for secret data and be mitigated with this approach as we'll end up accessing fixed offsets of the TLS segment.
  • Added options to disable the IP hardening technique. This is responsible for a surprising fraction of the overhead: as much as 10% in my rough measurements. Having this option makes it easy to measure such things.
emaste added a subscriber: emaste.Mar 25 2018, 9:10 AM
chandlerc updated this revision to Diff 139895.Mar 27 2018, 1:58 AM

Rebase and updates from a round of feedback from Paul Kocher.

@chandlerc, once this patch is in a testable state, I'll merge it into a feature branch in HardenedBSD's playground repo. I'll test compiling world and kernel with it. Additionally, if there are any specific benchmarks you'd like me to run, I'd be more than happy to run them.

chandlerc updated this revision to Diff 154962.Jul 11 2018, 3:54 AM

Major rebase and update.

This now relies heavily on the new EFLAGS copy lowering. It also enhances it in
significant ways to handle complex control flow patterns with copies of EFLAGS
produced by this pass.

As a consequence of relying more on EFLAGS copying (and the fact that we have
reliably and efficient lowering of EFLAGS copies now) I've completely nuked
a bunch of the options for hacking around this.

There are still BMI2-based code paths that try to use SHRX rather than
EFLAGS-clobbering OR instructions. Some of these may not make sense, but it
seemed a bit premature to remove them.

I've done some minor cleanups to the code as well.

We now have a large number of users that want to experiment with this ASAP.
While I understand that there is probably a lot of work left to do to make this
100%, I'd like to focus on getting something into at least a tolerable state to
land. It is all hidden behind a flag and so not going to break things. We can
clean things up and improve things incrementally but a lot of users I think
would prefer that even a beta version of this be available sooner rather than
later.

So, please begin actually reviewing the code. I'll be working on trivial
testing very soon and expect to at least have the basics covered shortly.

Most of the LLVM test suite passes with this enabled, but I am seeing a number
of failures so a bug snuck in here somewhere. I'll be working on that, but
hopefully doesn't block the majority of code review.

chandlerc updated this revision to Diff 154966.Jul 11 2018, 4:04 AM
chandlerc retitled this revision from [Spectre] Introduce a new pass to do speculative load hardening to mitigate Spectre variant #1. to [Spectre] Introduce a new pass to do speculative load hardening to mitigate Spectre variant #1 for x86..
chandlerc edited the summary of this revision. (Show Details)
chandlerc removed subscribers: kristof.beyls, lattera, emaste and 2 others.

Update the phabricator review to have the much more modern commit description.

Still need to sync the documentation here with the live google document.

craig.topper added inline comments.
llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
1427 ↗(On Diff #154966)

canonical*

1437 ↗(On Diff #154966)

Maybe use

OrI->addRegisterDead(X86::EFLAGS, TRI);

It should scan the operands and find EFLAGS without needing magic numbers

1760 ↗(On Diff #154966)

This should probably just be INSERT_SUBREG. SUBREG_TO_REG means you guarantee the upper bits are 0. But I'm not sure that's guaranteed here. And I don't think its really necessary.

mgrang added inline comments.Jul 11 2018, 11:03 AM
llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
594 ↗(On Diff #154966)
chandlerc updated this revision to Diff 155085.Jul 11 2018, 4:50 PM

Update with several bug fixes:

  • No longer reversing the operands to SHRX instructions in two places.
  • Correctly doing the PHI placement for the predicate state after we finish doing all the per-block rewrites so that we include the predicate state round trips through function calls.
  • Fixing a nasty bug in EFLAGS copy lowering where we broke out of the loop too soon. I'll create a separate patch to land this with its own test, but included here so that this can be tested. It's just one line.

Also two generic improvements:

  • Now that we're scanning more blocks in the EFLAGS copy lowering, improve how to detect non-dominating blocks. We can check before adding them rather than when we start visiting them which seems much more clear and simplifies the condition.
  • Sink the initial predicate state formation past any label fluff at the top of the function. This may not be necessary but seemed like a nice clarification to me.
chandlerc edited reviewers, added: craig.topper, echristo; removed: javed.absar.Jul 11 2018, 4:53 PM
chandlerc updated this revision to Diff 155140.Jul 12 2018, 2:57 AM

Rebase on top of D49220 and add some more basic testing:

  • Conditions, merges, address hardening, post-load hardening
  • Loops, nested loops
  • Invokes and landing pads

Also worked on the tests ta get reasonable generated checks. I think we want
generated checks here because this pass is all about the very detailed
instruction sequence produced for these patterns.

I can add a bit more testing, but everything will look mostly like this. There
isn't *that* much variation left. The biggest thing I can add is some testing
with more contended EFLAGS, but a lot of the logic there is now really simple
in SLH and handled (and tested) in the flags copy lowering.

Generally, I think this is ready for more detailed review. I'll update with
fixes to the few minor comments next, and then I'll work on some more test
coverage.

I have a collection of refactorings in mind. I'm happy to do those whenever
makes sense -- immediately, after a round of review, or in a follow-up commit.

chandlerc updated this revision to Diff 155144.Jul 12 2018, 3:26 AM
chandlerc marked 3 inline comments as done.

Address all of Craig's outstanding review comments.

chandlerc updated this revision to Diff 155145.Jul 12 2018, 3:28 AM
chandlerc marked an inline comment as done.

Switch a std::sort to llvm::sort spotted in review.

Thanks for comments so far. Should all be addressed now. See my big update above.

craig.topper added inline comments.Jul 12 2018, 2:08 PM
llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
413 ↗(On Diff #155145)

should we also check getReg() == X86::EFLAGS in this assert.

1442 ↗(On Diff #155145)

canonical is still mispelled

chandlerc updated this revision to Diff 155305.Jul 12 2018, 5:26 PM
chandlerc marked 2 inline comments as done.

Address review comments.

Thanks, addressed.

llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
413 ↗(On Diff #155145)

Instead of this, I rewrote this to not hard-code 1 and to find the EFLAGS def operand and then do both the assert and setting it to dead.i

1442 ↗(On Diff #155145)

Doh! When the lines shifted I couldn't find it. Fixed for real. Along with some other typos in comments.

Just FYI, I'll land the docs in a separate commit just to keep any reverts and such simple. I'm also working on getting an updated snapshot of the google doc as there have been a number of edits and enhancements there.

craig.topper added inline comments.Jul 12 2018, 9:48 PM
llvm/docs/SpeculativeLoadHardening.md
567 ↗(On Diff #155305)

chose->choose

llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
778 ↗(On Diff #155305)

Can we drop all the vector instructions here? Their AVX and AVX512 equivalents are all missing. It probably makes more sense to add them more methodically. Or add a bit to TSFlags.

929 ↗(On Diff #155305)

Further proof this list needs a methodical scrub. XORPSrm is missing, while ANDPS, ORPS, ANDNPS are all present.

965 ↗(On Diff #155305)

Oh there's the XORPSrm...

1123 ↗(On Diff #155305)

AVX versions of these are missing, but we can address as a follow up if you want. Just don't want to lose it since none of the instructions here will be used on Haswell with AVX.

1126 ↗(On Diff #155305)

May want MOV8rm_NOREX, MOVSX32rm8_NOREX, MOVZX32rm8_NOREX,

echristo accepted this revision.Jul 12 2018, 10:09 PM

This is obviously good to start and to iterate on.

I started adding some nits and some refactoring comments, but stopped - I think if we break up some of the larger functions it'll be a good start. Can be done later though.

-eric

llvm/include/llvm/CodeGen/MachineBasicBlock.h
483 ↗(On Diff #155305)

You don't currently use the normalization as far as I can tell.

llvm/lib/CodeGen/RegisterCoalescer.cpp
1212 ↗(On Diff #155305)

Can we get a comment on this assert?

llvm/lib/Target/X86/X86InstrInfo.h
327 ↗(On Diff #155305)

Not sure why?

llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
302 ↗(On Diff #155305)

Can you pull this out into a separate function? Just for readability.

313 ↗(On Diff #155305)

Might want to comment that this is largely a set of optimizations on the lfence insertion?

317 ↗(On Diff #155305)

Braces nit.

321 ↗(On Diff #155305)

Comment please?

352 ↗(On Diff #155305)

Split and continue?

452 ↗(On Diff #155305)

Go ahead and comment what you're doing here please?

1553 ↗(On Diff #155305)

At least comment why the #if 0 is if'd out?

This revision is now accepted and ready to land.Jul 12 2018, 10:09 PM
echristo added inline comments.Jul 12 2018, 10:17 PM
llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
771 ↗(On Diff #155305)

As an addendum after I saw Craig's comments - I definitely agree and think that making this part of the instruction tables is going to be important. That said, conservatively correct here is probably the best direction for now.

chandlerc marked 17 inline comments as done.Jul 13 2018, 3:59 AM

Thanks all.

I've made essentially all the suggested changes or nuked the code in question. A bunch of the extra files were just dregs. Not sure why I didn't notice and remove them earlier. They're gone now.

One thing I've left for now is the successor normalization. I explain the motivation, and I'm happy to remove it in a followup if its not convincing.

I'm going to go ahead and land this so that we can start much more actively testing and getting feedback on it. I'm definitely still interested in further comments and will keep working on cleaning some of the issues in this code up. The primary focus I'll have in follow-up patches initially is doing basic refactorings to pull out the different moving pieces here into comprehensible routines with proper comments and such.

llvm/include/llvm/CodeGen/MachineBasicBlock.h
483 ↗(On Diff #155305)

This was to remain consistent with removeSuccessor.

I can remove it in a follow-up if you want, I don't really feel strongly here.

llvm/lib/CodeGen/RegisterCoalescer.cpp
1212 ↗(On Diff #155305)

This was working around weird issues I hit using LAHF and SAHF. I'm no longer using them so I've just dropped this part of the patch.

llvm/lib/Target/X86/X86InstrInfo.h
327 ↗(On Diff #155305)

Sorry, this got copied from some other patch during development. Dropped for now.

But we should actually make some fixes here.

Because we're missing this, we override only *some* of the calls to analyzeBranch with the X86-specific logic. =[

llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
313 ↗(On Diff #155305)

Yeah, and it's all just wrong. See below.

321 ↗(On Diff #155305)

I don't know what I was thinking when I wrote this. I've nuked all this code as none of it makes any sense.

Now we just do the much simpler thing of hardening all edges where there were multiple edges coming out of a single block which has branch terminators and where the successors aren't EH pads. Way simpler. Handles way more of the real cases.

778 ↗(On Diff #155305)

Yeah, sorry. I just pasted these as I was going through to find the ones I *wanted* to mark as true. But as we aren't handling them, might as well skip for now . Makes it less messy until we can get these into the TD files.

1123 ↗(On Diff #155305)

Added a FIXME. This will be worth doing quickly as these actually come up amazingly frequently and so I think its worth handling them, and making sure we do in AVX mode as well.

1553 ↗(On Diff #155305)

Sorry, I kept meaning to just delete this and never dit.

I had this great idea of sliding around the position of the hardening instruction to minimize EFLAGS copies. But htat was when EFLAGS copies were basically death to performance. With the new lowering, I'm much less worried and suspect we'll never want to bother with this. It's not so precious of code as we couldn't write it again, and I never debugged it so it is probably full of bad.

Deleted.

This revision was automatically updated to reflect the committed changes.
chandlerc marked 8 inline comments as done.