Page MenuHomePhabricator

Add a Pass to X86 that builds a Condensed CFG for Load Value Injection (LVI) Gadgets [4/6]
ClosedPublic

Authored by sconstab on Mar 10 2020, 10:00 AM.

Details

Summary

Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.

More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).

Also adds a new target feature to X86: +lvi-load-hardening

The feature can be added via the clang CLI using -mlvi-hardening.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sconstab added inline comments.Apr 7 2020, 3:30 PM
llvm/lib/Target/X86/ImmutableGraph.h
74

G->nodes_size() will return the size without the sentinel node, so your example should actually operate on the last data node. Right?

Address review comments

craig.topper marked an inline comment as done.Apr 8 2020, 11:16 AM
craig.topper added inline comments.
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
84–86

Oops forgot to do this one.

craig.topper marked 37 inline comments as done and an inline comment as not done.Apr 8 2020, 11:25 AM
craig.topper added inline comments.
llvm/lib/Target/X86/ImmutableGraph.h
42

Removed the template argument

74

Comment added to clarify the extra node was allocated.

118

Removed the template argument

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
114

I found a way to remove the getMF method entirely.

234

Added a fatal error. Which isn't great as it will generate a crash report in clang. But it will tell the user to file a compiler bug so I guess that's something.

craig.topper marked 5 inline comments as done.

-Replace ARG_NODE and GADGET_EDGE defines with static constexpr members

-Put llvm:: on the for_each calls in this patch instead of D75937

sconstab added inline comments.Apr 9 2020, 6:45 PM
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
234

Would it be better to have

report_fatal_error("LVI load hardening is only supported on 64-bit "
                   "targets.", false);

So that the crash diagnostic is not generated?

@craig.topper can you please update/rebase this stack to remove the early LVI CFI work that ended up landed in https://reviews.llvm.org/D76812 instead?

mattdr accepted this revision.Apr 13 2020, 4:53 PM

Accepting modulo some comments to be addressed. Most of my review effort was spent on the data structure and algorithm employed as well as code style and readability.

I am least confident about my understanding of instrAccessesStackSlot and the other functions that make up instrIsFixedAccess, since each function seems to be pattern-matching on something very specific without a reference to why. I also don't know if this diff provides an exhaustive list of fixed loads, or indeed if it was intended to.

llvm/lib/Target/X86/ImmutableGraph.h
103

Worth adding a comment for this (and getEdgeIndex) that this will crash if the Node (Edge) provided is not a reference into this specific instance of ImmutableGraph.

104

Ideally I agree we'd find a way to collapse these -- but for this diff, let's content ourselves with a FIXME comment to that effect.

341

This if is unnecessary

349

Technically a "generic" graph, so we should leave out "Gadget" here

367

Two comments here would make the code significantly easier to understand:

  1. Note that we're using .size() here rather than .count(), so we're actually iterating over *all* Node indices, not just the ones to be trimmed
  2. The TrimmedNodes vector maps indices in the original NodeSet to the number of Nodes before that index that have been trimmed by that index, to allow later code to map elements to their new position in a dense array with the trimmed items removed
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
57

This is another case where references to good documentation will go a long way. Without details about what the tradeoff is and how to reason about it, it doesn't seem like anyone should use this flag.

237

Each call to hardenLoads leads to a call to buildGadgetGraph. A *lot* of the work that getGadgetGraph does seems to be common between mitigating fixed and non-fixed loads -- for example, computing register dataflow and liveness over the entire function.

And calling hardenLoads twice looks to be the common case, since NoFixedLoads is false by default.

Could we make this pass about half as expensive by default by combining these two calls to hardenLoads into one? It would do the expensive work once, then either harden _all_ loads or _only_ non-fixed loads.

323

Please add a comment explaining the semantics of the boolean return here. I think it's: true if we need to consider defs of this instruction tainted by this use (and therefore add them for analysis); false if this instruction consumes its use

330

Why is it okay to assume that a call doesn't propagate its uses to defs? Is it because we can assume the CFI transform is already inserting an LFENCE? Whatever the reason, let's state it explicitly here

342

Some more detail would be useful here: precise about what? What are the likely errors?

410

We analyze every def from every instruction in the function, but then also in AnalyzeDefUseChain analyze every def of every instruction with an interesting use. Are we doing a lot of extra work?

494

Worth a comment here that we don't need to worry about indirect branches (jmp to register) because elsewhere we prevent them from being generated

507

It seems very weird to make this a template argument rather than just, like, a regular argument.

This revision is now accepted and ready to land.Apr 13 2020, 4:53 PM

Address some of the review comments. Primarily the ones in ImmutableGraph. I did de-templatize the method in X86LoadValueInjectionLoadHardening.cpp

craig.topper marked 6 inline comments as done.Apr 14 2020, 1:25 PM
craig.topper added inline comments.
llvm/lib/Target/X86/ImmutableGraph.h
367

I've stopped using TrimmedNodes.size() and TrimEdges.size() in favor of the size methods from the graph which should make things more obvious.

I renamed TrimmedNodes to RemappedNodeIndex and stored the new index rather than the adjustment needed. I'm also changed it to walk nodes instead of indices so we don't have to translate to Node to make the contains call.

I also removed the NewNumEdges count_if and the if statement around the edge loop from the loop below. I don't think that provided any value and just complicated the code.

llvm/lib/Target/X86/X86ISelLowering.cpp
8626 ↗(On Diff #257465)

Oops this snuck in from something else I was experimenting with in my repo earlier. I'll remove.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
507

Agreed. I've remove the template argument and made it a static function instead of a method since it doesn't use anything from the class.

craig.topper marked an inline comment as done.

Fix some mistakes I made in the previous upload where I accidentally deleted the pipeline tests instead of updating them.

Merge in test case from D77431 since it logically belongs with these changes.

mattdr marked an inline comment as done.Apr 14 2020, 10:42 PM
mattdr added inline comments.
llvm/lib/Target/X86/ImmutableGraph.h
367

Many thanks! These changes make the code much more accessible.

sconstab commandeered this revision.Apr 16 2020, 12:11 PM
sconstab edited reviewers, added: craig.topper; removed: sconstab.
sconstab updated this revision to Diff 260439.Apr 27 2020, 1:15 PM

Removed the -x86-lvi-no-fixed CLI flag. This change simplifies the code flow quite a bit.

craig.topper added inline comments.Apr 28 2020, 3:26 PM
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
323

Was this comment addressed?

342

Was this answered somewhere?

410

Was this answered somewhere?

sconstab updated this revision to Diff 261968.Mon, May 4, 5:10 PM
sconstab marked 9 inline comments as done.

Addressed the previously unaddressed comments, as pointed out by @craig.topper.

sconstab added inline comments.Mon, May 4, 5:10 PM
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
323

It had not been addressed, so thank you for pointing this out. That lambda was doing too many things at once, which made it more confusing than it needed to be. So I just inlined it in the

for (auto N : Uses) {
…
}

loop, and I added some additional clarifying comments.

342

This was referring to the use of mayLoad(). At the time I wrote that comment, I wasn't sure that mayLoad() was exactly what was needed there, but I now think that it does suffice (SLH also uses MachineInstr::mayLoad()).

410

Wow, big oversight on my part. @mattdr was correct that this was doing a LOT of extra work. I added a memoization scheme that remembers the instructions that may transmit for each def. The getGadgetGraph() routine now runs about 75% faster.

mattdr added a comment.Thu, May 7, 1:57 PM

Calling special attention to the comment at line 341, since I think it affects the correctness of the pass.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
288

This comment doesn't seem to match how the map is used -- it looks like the loop assumes a def has been analyzed iff it is present in the map. This matches my expectation that, if a def is present and maps to an empty list, it would meant the def had been analyzed and found not to transmit.

296

fwiw, this code would be easier to understand if we didn't shadow Def with another variable named Def.

322–323

"current def" is a bit ambiguous here. I _believe_ it means AnalyzeDef's Def argument? At least, that's the interpretation that makes the comment make sense since UsesVisited is in AnalyzeDef's scope.

332

Copying a comment from a previous iteration:

Why is it okay to assume that a call doesn't propagate its uses to defs? Is it because we can assume the CFI transform is already inserting an LFENCE? Whatever the reason, let's state it explicitly here

342

The comment doesn't match the loop, which is traversing over Uses. More importantly, though: why are we allowed to stop traversing through Uses here? This Def won't be analyzed again, so this is our only chance to enumerate all transmitters to make sure we have all the necessary source -> sink edges in the gadget graph.

365–366

This is also the place we populate Transmitters (with a default-constructed vector) for the current def if we haven't otherwise found any transmits. That's good, and necessary for Transmitters to remember we've analyzed the current def. But we should leave a comment about this subtle load-bearing side-effect.

367–370

Should Transmitters map to an llvm::SmallSet?

sconstab updated this revision to Diff 262816.Thu, May 7, 11:32 PM
sconstab marked 9 inline comments as done.

Addressed comments by @mattdr.

Several comments in the code have been updated, but the code has not changed.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
296

Changed the outer def to SourceDef, which also seems to make the code after the lambda a lot clearer.

322–323

I am now trying to be clearer by using capital-d "Def" to refer specifically to the def that is being analyzed, and lower-case-d "def" to refer to any other defs. Do you think this is better? Good enough?

332

Added clarification to the comment.

342

@mattdr I think that the code is correct, and I added more to the comment in an attempt to clarify. Let me know if you still think that this is an issue.

367–370

In my testing, std::vector seems a bit faster than llvm::SmallSet. I also suspect that llvm::SmallSet may waste more space because many defs will have no transmitters.

mattdr accepted this revision.Fri, May 8, 12:17 AM
mattdr marked 2 inline comments as done.

The extra comments and the new variable name are all helpful. Thanks again.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
322–323

Much better. Thank you for the change!

342

I definitely misread continue as break here. Thanks for the extra clarity and sorry for the noise.

This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.Tue, May 12, 11:41 AM

This change causes a 0.8% compile-time regression for unoptimized builds. Based on the pipeline test diffs, I expect this is because the new pass requests a bunch of analyses, which it most likely (LVI load hardening disabled) will not need. Would it be possible to compute the analyses only if LVI load hardening is actually enabled?

This change causes a 0.8% compile-time regression for unoptimized builds. Based on the pipeline test diffs, I expect this is because the new pass requests a bunch of analyses, which it most likely (LVI load hardening disabled) will not need. Would it be possible to compute the analyses only if LVI load hardening is actually enabled?

@craig.topper Do you have any ideas on how this could be done?

This change causes a 0.8% compile-time regression for unoptimized builds. Based on the pipeline test diffs, I expect this is because the new pass requests a bunch of analyses, which it most likely (LVI load hardening disabled) will not need. Would it be possible to compute the analyses only if LVI load hardening is actually enabled?

@craig.topper Do you have any ideas on how this could be done?

Unfortunately due to LTO the need for LVI hardening is carried as a function attribute. The pass manager system doesn't allow for running different passes per function. So I don't have any good ideas of how to do that.

craig.topper added inline comments.Tue, May 12, 12:24 PM
llvm/test/CodeGen/X86/O3-pipeline.ll
147

I'm curious what happens if we add AU.setPreservesCFG() to getAnalysisUsage in FixupStatepointCallerSaved.cpp From a quick look through that pass it doesn't look like it changes the Machine CFG.

PostRA Machine Sink already preserves CFG. So I think that should remove the dominator tree construction after PostRA machine sink.

I'm no expert in the pass manager, but given the very targeted applicability of LVI it definitely seems like our goal should be 0% impact for the vast majority of compilations that don't concern themselves with it.

Is there a way to require the pass be enabled for the compiler invocation as well as for the subtarget? If there's a mismatch (where LVI is desired but the required analyses weren't done) we can fail the compile.

nikic added a comment.Wed, May 13, 9:04 AM

This change causes a 0.8% compile-time regression for unoptimized builds. Based on the pipeline test diffs, I expect this is because the new pass requests a bunch of analyses, which it most likely (LVI load hardening disabled) will not need. Would it be possible to compute the analyses only if LVI load hardening is actually enabled?

@craig.topper Do you have any ideas on how this could be done?

Unfortunately due to LTO the need for LVI hardening is carried as a function attribute. The pass manager system doesn't allow for running different passes per function. So I don't have any good ideas of how to do that.

Hm, I see. One possibility would be to make those analyses lazy, but that's a larger change.

Possibly a pragmatic choice would be to not support this feature at O0? It does not seem relevant for non-production binaries. The relative impact of a couple unnecessary analysis passes is much higher at O0 than it is at O3.