This is an archive of the discontinued LLVM Phabricator instance.

Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/6]
ClosedPublic

Authored by sconstab on Mar 10 2020, 10:00 AM.

Details

Summary

After finding all such gadgets in a given function, the pass minimally inserts LFENCE instructions in such a manner that the following property is satisfied: for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at least one LFENCE instruction. The algorithm that implements this minimal insertion is influenced by an academic paper that minimally inserts memory fences for high-performance concurrent programs:

http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf

The algorithm implemented in this pass is as follows:

  1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
    • SOURCE instructions (also includes function arguments)
    • SINK instructions
    • Basic block entry points
    • Basic block terminators
    • LFENCE instructions
  2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
  3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
  4. Insert one LFENCE along each CFG edge that was cut in step 3.
  5. Go to step 2.
  6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.

By default, the heuristic used in Step 3 is a greedy heuristic that avoids inserting LFENCEs into loops unless absolutely necessary. There is also a CLI option to load a plugin that can provide even better optimization, inserting fewer fences, while still mitigating all of the LVI gadgets. The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin, and a description of the pass's behavior with the plugin can be found here: https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.

Diff Detail

Event Timeline

sconstab created this revision.Mar 10 2020, 10:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2020, 10:00 AM
Herald added subscribers: jfb, hiraditya. · View Herald Transcript
sconstab retitled this revision from Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) to Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/5].
sconstab edited the summary of this revision. (Show Details)Mar 11 2020, 1:58 PM
sconstab retitled this revision from Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/5] to Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/6].Mar 16 2020, 9:30 AM
zbrid added a comment.Mar 18 2020, 6:08 PM

Sorry only took a quick look for now.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
304

Should this also have FencesInserted = hardenLoads?

sconstab updated this revision to Diff 251243.Mar 18 2020, 7:12 PM

One fix to correctly count the number of fences inserted.

sconstab marked an inline comment as done.Mar 18 2020, 7:13 PM
zbrid accepted this revision.Apr 2 2020, 5:09 PM

LGTM

This revision is now accepted and ready to land.Apr 2 2020, 5:09 PM
This revision was automatically updated to reflect the committed changes.
mattdr added a subscriber: mattdr.Apr 3 2020, 5:03 PM
mattdr added inline comments.
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
642–645

Why aren't these vectors?

craig.topper reopened this revision.Apr 3 2020, 8:51 PM
This revision is now accepted and ready to land.Apr 3 2020, 8:51 PM
craig.topper commandeered this revision.Apr 3 2020, 8:51 PM
craig.topper planned changes to this revision.
craig.topper updated this revision to Diff 255001.
craig.topper edited reviewers, added: sconstab; removed: craig.topper.

Rebase

This revision is now accepted and ready to land.Apr 3 2020, 8:53 PM
craig.topper planned changes to this revision.Apr 3 2020, 9:42 PM
sconstab added inline comments.Apr 3 2020, 9:50 PM
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
642–645

They're being passed to a C-compatible interface, so intuitively it made more sense to me. I also get paranoid because the array sizes will be copied into the std::vector struct, and from experience it seems I can never be 100% certain these copies will be optimized away.

sconstab added inline comments.Apr 3 2020, 9:55 PM
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
642–645

@mattdr (above)

Update for changes in D75936. Use std::vector for some arrays.

This revision is now accepted and ready to land.Apr 4 2020, 12:00 AM

-Use range-based for loops
-Remove uses of the Traits class since it doesn't support methods needed for range-based for loops.

sconstab added inline comments.Apr 4 2020, 1:24 PM
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
552

Would EdgeRef E : Graph->edges() be clearer here? Ditto for many other for loops.

(not sure if the LLVM conventions dictate something specific for these loops)

557

This caught me off guard the first time I saw it because I didn't realize that LLVM had its own implementation that takes a range. For readability, would it be better to prefix with llvm::?

583

Should this be MachineGadgetGraph::NodeRef?

636

I think that std::unique_ptr<T[]> should work fine here...

649

I think this for can lose the brackets.

689

More places where NodeRef and EdgeRef might be clearer.

craig.topper marked 5 inline comments as done.Apr 4 2020, 2:14 PM
craig.topper added inline comments.
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
552

Personally, I'd prefer not to hide the & or *. And EdgeRef/NodeRef only exist in the Traits class not the main class. It's also confusing that NodeRef is a pointer and not a reference so I'd like to not leak that outside the Traits class where its easier to see.

LLVM is pretty conservative about the use of auto, but I figured in this case it wouldn't be unreasonable for a reader to understand that edges() returns Edge objects. But I'm happing to change it to MachineGadgetGraph::Edge.

557

Agreed. I'll change.

636

Agreed, but std::vector is far more common in the codebase and since it lives on the stack the extra capacity pointer shouldn't be a big deal.

craig.topper added inline comments.Apr 4 2020, 2:14 PM
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
639

I'm thinking about adding a method to Graph to get a node/edge's index so we can hide all these std::distance calls.

649

Will do.

-Add llvm:: to the for_each calls

Make use getNodeIndex/getEdgeIndex

craig.topper requested review of this revision.Apr 6 2020, 10:18 AM
craig.topper marked an inline comment as done.Apr 8 2020, 2:02 PM
craig.topper added inline comments.
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
463

doh this should be in the other patch.

Rebase on top of D75937

mattdr requested changes to this revision.Apr 15 2020, 1:23 AM
mattdr marked an inline comment as done.

Reviewed the algorithm, have to confess I skipped the tests.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
369

I know there was a lot of effort to make ImmutableGraph efficient, but this is an O(N) reallocation on every loop, right?

In practice, how many times do we end up looping before we hit a fixed point?

546

Could be more descriptive to call this, say, trimMitigatedEdges

554

In what circumstance will we add fences to the graph as sources or sinks? Can we just avoid that at the point of insertion, rather than running this extra culling pass on every iteration?

572

I would write something stronger here:

Start off assuming all gadgets originating at this source node have been mitigated and can be removed. Later we will add back unmitigated gadgets by erasing them from the removal set.

582

This is really, really hard to read and understand. I think that's in large part because we have this one graph that represents _both_ control-flow _and_ source/sink pairs.

Given that it's the load-bearing part of the whole stack, let me offer the best way I've come up with to explain it, then a suggestion for making it simpler to follow.

My summary based on a few readings:

First we compute GadgetSinks, the set of gadget sinks whose source is the current root. Then we do a depth-first search of the control-flow graph to find all gadget sinks that are control-flow-reachable from the given root. When we find such a sink, we look to see if it is also in GadgetSinks -- again, a sink whose source is the current root -- at which point we know we have found an unmitigated gadget sink. We iterate through the root's gadget edges to find the specific edge that points to the current DFS node -- the unmitigated gadget sink -- and remove that edge from ElimGadgets, where we had previously added it on the presumption it was mitigated.

One idea for making this easier to follow:

  1. Start off assuming no edges are mitigated. Don't pre-populate the EdgeSet with every edge.
  2. Use depth-first search to create ReachableSinks, the set of all gadget sinks (regardless of source) that are control-flow-reachable from RootN. (I don't think this adds more work than TraverseDFS already does.)
  3. Iterate over gadget edges from RootN. For each edge, ask: is the destination reachable? If so, add this to a new set, GadgetEdges.
  4. After iterating over all nodes, let TrimEdges be AllEdges - GadgetEdges. Then trim those.
584

We already capture RootN later on. We can remove the boolean conditional and make this more readable by writing if (*N != RootN) (maybe if (N != &RootN)? I've sort of lost track of the API for nodes and edges.)

Pushing further: maybe we can just remove the tiny optimization we get by special-casing the root node in the interest of simplicity?

608

It's weird to have a negated predicate like this in a conditional with an else. Let's either rewrite it or un-negate it and flip the if/else blocks.

Rewrite: !ElimEdges.empty() || !ElimNodes.empty()

619

likewise findEdgesToCut or similar

622

I'm still not convinced it's worth the extra (untestable) complexity to add this plugin point, but I defer to the committer.

624

Extra {} should be unnecessary

658

Seems like NI is a vestige of a previous iteration

660

Every edge touched by this loop will also be touched by the other leg of this loop at one point or another ("E is a CFG edge"). Can we find a way to avoid the O(E^2) inner loop?

For example: what if we made the outermost loop edge-major? Keep sets of mitigated source-nodes and sink-nodes. Create a list of all edges by weight, sort it, then go through the edges from lowest to highest weight and use them to try to mitigate a new source or sink node.

There's a good chance I'm missing something fundamental there, and I thank you in advance for your patience explaining it to me. But if I can I'd really like to avoid repeating the CheapestSoFar comparison on edge weights in two places.

667

What guarantee do we have here that all of the gadget sinks have, in fact, been added to GadgetSinks? It seems to be up to the order in which we iterate through nodes and edges.

668

NI also seems like it comes from a previous version

685

A comment for what this lambda does, and how it's intended to be used, is much appreciated for the top-to-bottom reader

Probably something like:

When we add an LFENCE before an instruction, remove any CFG edges that point to that instruction because they all now refer to a mitigated codepath

698

// Insert an LFENCE in this basic block

699

// ... at this point in the basic block

717–718

This one took me a bit. It seems like the summary is: add a fence unless it would be redundant, i.e. if we can see the next instruction and see it is itself a fence

This revision now requires changes to proceed.Apr 15 2020, 1:23 AM
sconstab commandeered this revision.Apr 16 2020, 12:10 PM
sconstab edited reviewers, added: craig.topper; removed: sconstab.
sconstab updated this revision to Diff 259718.Apr 23 2020, 2:52 PM
sconstab marked 20 inline comments as done.

Addressed feedback from @mattdr

@mattdr Thanks for the very scrupulous review!

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
369

The answers to your questions are "yes" and "until all of the gadgets have been mitigated". The intent here is to optimize for the MILP plugin:
https://github.com/intel/lvi-llvm-optimization-plugin
The plugin eliminates lots of gadgets all at once, and therefore rebuilding the graph is a relatively rare event. I'm not sure it would be worthwhile to use an entirely different data structure and algorithm for the greedy mitigation strategy.

What I have done is rewrite the Greedy heuristic to keep track of which edges have been cut and eliminated, thus never actually having to rebuild the graph.

546

Yes, done.

554

Fences are never added as sources or sinks. We need to know where the fences are so that we can eliminate gadget edges for which all CFG paths from source to sink cross a fence. We could perform this analysis at the point of insertion, i.e., during the getGadgetGraph() function, BUT this analysis would be much more expensive there because it would be performed on the actual MachineFunction, instead of on the gadget graph.

572

Well, I substantially rewrote this algorithm to build a set of mitigated edges, instead of trimming down a set of mitigated edges. I think that the new set of comments should be much easier to follow.

582

You of course are correct that this was terribly difficult to follow. I actually found an even simpler solution that what you had suggested. For each RootN that is the source of at least one gadget, I DFS to find all of the Visited nodes, and then check to see which node members of Visited are also the destinations of a gadget edge rooted at RootN.

608

Right. By the way, I decomposed elimEdges() into two functions:

  • elimEdges() just runs the edge elimination algorithm without rebuilding the gadget graph.
  • trimMitigatedEdges() wraps elimEdges() and does rebuild (i.e., "trim") the gadget graph.
660

Yes... in hindsight this whole algorithm was more than a bit sloppy on my part. I completely revamped the algorithm and I think that now it is O(N + E). Please check and make sure you agree.

667

Yikes! Can't believe I overlooked this. I fixed the issue, and it looks like the algorithm had been cutting a few more edges in some cases than necessary.

685

I simplified this by inlining lambda with a descriptive comment.

717–718

Right. I added a clarifying comment.

mattdr accepted this revision.Apr 23 2020, 11:06 PM

Still some comments to address, but I think this is substantially close to where it needs to be. Thanks so much for your work making the code more straightforward and readable!

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
193–194

In the implementation the latter two parameters are called ElimEdges and ElimNodes, which I think are good names

369

Did you mean to remove this call to GraphBuilder::trim()? As it stands right now the loop actually reallocates the graph twice per iteration -- once in trimMitigatedEdges, once in this call to trim().

546

This is a lot clearer now, thanks!

Given what this is doing I think it makes sense to call it findMitigatedEdges.

549

"Eliminate CFG edges that target a fence, as they're trivially mitigated"

553

How are we sure we've removed all edges that pointed to this same destination node?

565

To make it easier to follow, let's call this something like ReachableGadgetSinkNodes

567–568

I think this is just intended as an optimization -- it's not necessary for correctness. Assuming that's right, suggest removing it since it doesn't really make things faster but it does add some extra complexity.

571

Maybe FindCFGReachableGadgetSinksDFS?

584

Still think it makes sense to add the root node to the reachable set (as it's trivially reachable from itself)

595

Suggest putting this at the top of the loop so it's obvious from the beginning that results aren't intended to accrue from iteration to iteration.

648

I really like that the refinement loop (trying to get to a fixed point) for the greedy algorithm is now entirely within this function. I'd suggest going one step further: take the loop out of runOnMachineFunction entirely and add one around the calls to the plugin in the lines above. That way we keep the implementation detail that "this needs to be run in a loop" as close to the actual algorithm as possible.

660

Each iteration loops over all edges and removes exactly one... so this is probably still O(E^2).

Seems like we can get it down closer to O(E * lg E) if we:

  1. Compute GadgetSources and GadgetSinks once
  2. Put the edges into a vector, then sorting them by weight
  3. Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated)

The insight here is that edge weights don't change, so mitigating an edge doesn't change the ranking of other edges.

That said, I'm happy with the readability of the current implementation and would be satisfied for now if we just add //FIXME: this is O(E^2), it could probably be O(E * lg E) with some work

727–728

You can skip the double-lookup and just do:

if (MachineGadgetGraph::isCFGEdge(E)) {
  if (CutEdges.insert(E)) {  // returns true if not already present
    ++AdditionalEdgesCut;
  }
}
This revision is now accepted and ready to land.Apr 23 2020, 11:06 PM
sconstab updated this revision to Diff 260441.Apr 27 2020, 1:17 PM
sconstab marked 14 inline comments as done.

Addressed comments from @mattdr.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
546

I changed it to elimMitigatedEdgesAndNodes()

553

I think it does? This loop iterates through ALL edges. For each CFG edge that ingresses a fence, add that fence to ElimNodes, and add that edge and all egress CFG edges to ElimEdges. The loop does *not* skip over edges that have been added to ElimEdges, and therefore I think this should ensure that all CFG edges pointing to a given fence are removed.

565

But that isn't accurate.. The algorithm finds all reachable nodes, not just gadget sinks. I renamed to ReachableNodes.

567–568

Why do you think it doesn't make things faster? A majority of nodes in the graph are not gadget sinks, and this majority tends to grow rapidly as gadgets become mitigated. Each time this check passes, it saves an entire DFS traversal and one O(E) loop through the edges.

571

Similar to above, this is not accurate as it finds all reachable nodes, not just gadget sinks. I did rename to FindReachableNodes and added detail to the comment.

660

I think your description actually implies that this greedy heuristic cannot be less that O(E^2), right?

Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated)

The "if it is still relevant" is an O(E) operation, so the whole thing should be O(E^2).

Regardless, I did add a comment to this effect.

sconstab requested review of this revision.Apr 27 2020, 1:18 PM

Eliminated the NoFixedLoads feature in D75936, which simplified this patch quite a bit.

mattdr marked 2 inline comments as done.Apr 27 2020, 3:10 PM
mattdr added inline comments.
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
553

Sure enough, that makes sense.

567–568

Ack! Yes, it seems I consistently misread this function and missed the difference between isGadgetEdge in some places and IsCFGEdge in others. (As I mentioned, having both represented in the same graph is confusing.) Now I see this is a different check than is done in the DFS.

sconstab updated this revision to Diff 261969.May 4 2020, 5:11 PM

Rebase onto master.

mattdr accepted this revision.May 8 2020, 12:01 PM
This revision is now accepted and ready to land.May 8 2020, 12:01 PM
This revision was automatically updated to reflect the committed changes.