Page MenuHomePhabricator

Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/6]
ClosedPublic

Authored by sconstab on Mar 10 2020, 10:00 AM.

Details

Summary

After finding all such gadgets in a given function, the pass minimally inserts LFENCE instructions in such a manner that the following property is satisfied: for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at least one LFENCE instruction. The algorithm that implements this minimal insertion is influenced by an academic paper that minimally inserts memory fences for high-performance concurrent programs:

http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf

The algorithm implemented in this pass is as follows:

  1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
    • SOURCE instructions (also includes function arguments)
    • SINK instructions
    • Basic block entry points
    • Basic block terminators
    • LFENCE instructions
  2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
  3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
  4. Insert one LFENCE along each CFG edge that was cut in step 3.
  5. Go to step 2.
  6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.

By default, the heuristic used in Step 3 is a greedy heuristic that avoids inserting LFENCEs into loops unless absolutely necessary. There is also a CLI option to load a plugin that can provide even better optimization, inserting fewer fences, while still mitigating all of the LVI gadgets. The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin, and a description of the pass's behavior with the plugin can be found here: https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
craig.topper added inline comments.Apr 4 2020, 2:14 PM
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
626

I'm thinking about adding a method to Graph to get a node/edge's index so we can hide all these std::distance calls.

636

Will do.

-Add llvm:: to the for_each calls

Make use getNodeIndex/getEdgeIndex

craig.topper requested review of this revision.Apr 6 2020, 10:18 AM
craig.topper marked an inline comment as done.Apr 8 2020, 2:02 PM
craig.topper added inline comments.
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
383–427

doh this should be in the other patch.

Rebase on top of D75937

mattdr requested changes to this revision.Apr 15 2020, 1:23 AM
mattdr marked an inline comment as done.

Reviewed the algorithm, have to confess I skipped the tests.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
318

I know there was a lot of effort to make ImmutableGraph efficient, but this is an O(N) reallocation on every loop, right?

In practice, how many times do we end up looping before we hit a fixed point?

533

Could be more descriptive to call this, say, trimMitigatedEdges

541

In what circumstance will we add fences to the graph as sources or sinks? Can we just avoid that at the point of insertion, rather than running this extra culling pass on every iteration?

559

I would write something stronger here:

Start off assuming all gadgets originating at this source node have been mitigated and can be removed. Later we will add back unmitigated gadgets by erasing them from the removal set.

569

This is really, really hard to read and understand. I think that's in large part because we have this one graph that represents _both_ control-flow _and_ source/sink pairs.

Given that it's the load-bearing part of the whole stack, let me offer the best way I've come up with to explain it, then a suggestion for making it simpler to follow.

My summary based on a few readings:

First we compute GadgetSinks, the set of gadget sinks whose source is the current root. Then we do a depth-first search of the control-flow graph to find all gadget sinks that are control-flow-reachable from the given root. When we find such a sink, we look to see if it is also in GadgetSinks -- again, a sink whose source is the current root -- at which point we know we have found an unmitigated gadget sink. We iterate through the root's gadget edges to find the specific edge that points to the current DFS node -- the unmitigated gadget sink -- and remove that edge from ElimGadgets, where we had previously added it on the presumption it was mitigated.

One idea for making this easier to follow:

  1. Start off assuming no edges are mitigated. Don't pre-populate the EdgeSet with every edge.
  2. Use depth-first search to create ReachableSinks, the set of all gadget sinks (regardless of source) that are control-flow-reachable from RootN. (I don't think this adds more work than TraverseDFS already does.)
  3. Iterate over gadget edges from RootN. For each edge, ask: is the destination reachable? If so, add this to a new set, GadgetEdges.
  4. After iterating over all nodes, let TrimEdges be AllEdges - GadgetEdges. Then trim those.
571

We already capture RootN later on. We can remove the boolean conditional and make this more readable by writing if (*N != RootN) (maybe if (N != &RootN)? I've sort of lost track of the API for nodes and edges.)

Pushing further: maybe we can just remove the tiny optimization we get by special-casing the root node in the interest of simplicity?

595

It's weird to have a negated predicate like this in a conditional with an else. Let's either rewrite it or un-negate it and flip the if/else blocks.

Rewrite: !ElimEdges.empty() || !ElimNodes.empty()

606

likewise findEdgesToCut or similar

609

I'm still not convinced it's worth the extra (untestable) complexity to add this plugin point, but I defer to the committer.

611

Extra {} should be unnecessary

645

Seems like NI is a vestige of a previous iteration

647

Every edge touched by this loop will also be touched by the other leg of this loop at one point or another ("E is a CFG edge"). Can we find a way to avoid the O(E^2) inner loop?

For example: what if we made the outermost loop edge-major? Keep sets of mitigated source-nodes and sink-nodes. Create a list of all edges by weight, sort it, then go through the edges from lowest to highest weight and use them to try to mitigate a new source or sink node.

There's a good chance I'm missing something fundamental there, and I thank you in advance for your patience explaining it to me. But if I can I'd really like to avoid repeating the CheapestSoFar comparison on edge weights in two places.

654

What guarantee do we have here that all of the gadget sinks have, in fact, been added to GadgetSinks? It seems to be up to the order in which we iterate through nodes and edges.

655

NI also seems like it comes from a previous version

672

A comment for what this lambda does, and how it's intended to be used, is much appreciated for the top-to-bottom reader

Probably something like:

When we add an LFENCE before an instruction, remove any CFG edges that point to that instruction because they all now refer to a mitigated codepath

685

// Insert an LFENCE in this basic block

686

// ... at this point in the basic block

704–705

This one took me a bit. It seems like the summary is: add a fence unless it would be redundant, i.e. if we can see the next instruction and see it is itself a fence

This revision now requires changes to proceed.Apr 15 2020, 1:23 AM
sconstab commandeered this revision.Apr 16 2020, 12:10 PM
sconstab edited reviewers, added: craig.topper; removed: sconstab.
sconstab updated this revision to Diff 259718.Apr 23 2020, 2:52 PM
sconstab marked 20 inline comments as done.

Addressed feedback from @mattdr

@mattdr Thanks for the very scrupulous review!

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
318

The answers to your questions are "yes" and "until all of the gadgets have been mitigated". The intent here is to optimize for the MILP plugin:
https://github.com/intel/lvi-llvm-optimization-plugin
The plugin eliminates lots of gadgets all at once, and therefore rebuilding the graph is a relatively rare event. I'm not sure it would be worthwhile to use an entirely different data structure and algorithm for the greedy mitigation strategy.

What I have done is rewrite the Greedy heuristic to keep track of which edges have been cut and eliminated, thus never actually having to rebuild the graph.

533

Yes, done.

541

Fences are never added as sources or sinks. We need to know where the fences are so that we can eliminate gadget edges for which all CFG paths from source to sink cross a fence. We could perform this analysis at the point of insertion, i.e., during the getGadgetGraph() function, BUT this analysis would be much more expensive there because it would be performed on the actual MachineFunction, instead of on the gadget graph.

559

Well, I substantially rewrote this algorithm to build a set of mitigated edges, instead of trimming down a set of mitigated edges. I think that the new set of comments should be much easier to follow.

569

You of course are correct that this was terribly difficult to follow. I actually found an even simpler solution that what you had suggested. For each RootN that is the source of at least one gadget, I DFS to find all of the Visited nodes, and then check to see which node members of Visited are also the destinations of a gadget edge rooted at RootN.

595

Right. By the way, I decomposed elimEdges() into two functions:

  • elimEdges() just runs the edge elimination algorithm without rebuilding the gadget graph.
  • trimMitigatedEdges() wraps elimEdges() and does rebuild (i.e., "trim") the gadget graph.
647

Yes... in hindsight this whole algorithm was more than a bit sloppy on my part. I completely revamped the algorithm and I think that now it is O(N + E). Please check and make sure you agree.

654

Yikes! Can't believe I overlooked this. I fixed the issue, and it looks like the algorithm had been cutting a few more edges in some cases than necessary.

672

I simplified this by inlining lambda with a descriptive comment.

704–705

Right. I added a clarifying comment.

mattdr accepted this revision.Apr 23 2020, 11:06 PM

Still some comments to address, but I think this is substantially close to where it needs to be. Thanks so much for your work making the code more straightforward and readable!

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
163–164

In the implementation the latter two parameters are called ElimEdges and ElimNodes, which I think are good names

318

Did you mean to remove this call to GraphBuilder::trim()? As it stands right now the loop actually reallocates the graph twice per iteration -- once in trimMitigatedEdges, once in this call to trim().

533

This is a lot clearer now, thanks!

Given what this is doing I think it makes sense to call it findMitigatedEdges.

536

"Eliminate CFG edges that target a fence, as they're trivially mitigated"

540

How are we sure we've removed all edges that pointed to this same destination node?

552

To make it easier to follow, let's call this something like ReachableGadgetSinkNodes

554–555

I think this is just intended as an optimization -- it's not necessary for correctness. Assuming that's right, suggest removing it since it doesn't really make things faster but it does add some extra complexity.

558

Maybe FindCFGReachableGadgetSinksDFS?

571

Still think it makes sense to add the root node to the reachable set (as it's trivially reachable from itself)

582

Suggest putting this at the top of the loop so it's obvious from the beginning that results aren't intended to accrue from iteration to iteration.

635

I really like that the refinement loop (trying to get to a fixed point) for the greedy algorithm is now entirely within this function. I'd suggest going one step further: take the loop out of runOnMachineFunction entirely and add one around the calls to the plugin in the lines above. That way we keep the implementation detail that "this needs to be run in a loop" as close to the actual algorithm as possible.

647

Each iteration loops over all edges and removes exactly one... so this is probably still O(E^2).

Seems like we can get it down closer to O(E * lg E) if we:

  1. Compute GadgetSources and GadgetSinks once
  2. Put the edges into a vector, then sorting them by weight
  3. Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated)

The insight here is that edge weights don't change, so mitigating an edge doesn't change the ranking of other edges.

That said, I'm happy with the readability of the current implementation and would be satisfied for now if we just add //FIXME: this is O(E^2), it could probably be O(E * lg E) with some work

714–715

You can skip the double-lookup and just do:

if (MachineGadgetGraph::isCFGEdge(E)) {
  if (CutEdges.insert(E)) {  // returns true if not already present
    ++AdditionalEdgesCut;
  }
}
This revision is now accepted and ready to land.Apr 23 2020, 11:06 PM
sconstab updated this revision to Diff 260441.Apr 27 2020, 1:17 PM
sconstab marked 14 inline comments as done.

Addressed comments from @mattdr.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
533

I changed it to elimMitigatedEdgesAndNodes()

540

I think it does? This loop iterates through ALL edges. For each CFG edge that ingresses a fence, add that fence to ElimNodes, and add that edge and all egress CFG edges to ElimEdges. The loop does *not* skip over edges that have been added to ElimEdges, and therefore I think this should ensure that all CFG edges pointing to a given fence are removed.

552

But that isn't accurate.. The algorithm finds all reachable nodes, not just gadget sinks. I renamed to ReachableNodes.

554–555

Why do you think it doesn't make things faster? A majority of nodes in the graph are not gadget sinks, and this majority tends to grow rapidly as gadgets become mitigated. Each time this check passes, it saves an entire DFS traversal and one O(E) loop through the edges.

558

Similar to above, this is not accurate as it finds all reachable nodes, not just gadget sinks. I did rename to FindReachableNodes and added detail to the comment.

647

I think your description actually implies that this greedy heuristic cannot be less that O(E^2), right?

Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated)

The "if it is still relevant" is an O(E) operation, so the whole thing should be O(E^2).

Regardless, I did add a comment to this effect.

sconstab requested review of this revision.Apr 27 2020, 1:18 PM

Eliminated the NoFixedLoads feature in D75936, which simplified this patch quite a bit.

mattdr marked 2 inline comments as done.Apr 27 2020, 3:10 PM
mattdr added inline comments.
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
540

Sure enough, that makes sense.

554–555

Ack! Yes, it seems I consistently misread this function and missed the difference between isGadgetEdge in some places and IsCFGEdge in others. (As I mentioned, having both represented in the same graph is confusing.) Now I see this is a different check than is done in the DFS.

sconstab updated this revision to Diff 261969.May 4 2020, 5:11 PM

Rebase onto master.

mattdr accepted this revision.May 8 2020, 12:01 PM
This revision is now accepted and ready to land.May 8 2020, 12:01 PM
This revision was automatically updated to reflect the committed changes.