This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Target/X86/
-
Target/
-
X86/
44/65
X86LoadValueInjectionLoadHardening.cpp
-
test/CodeGen/X86/
-
CodeGen/
-
X86/
-
lvi-hardening-loads.ll

Differential D75937

Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/6]
ClosedPublic

Authored by sconstab on Mar 10 2020, 10:00 AM.

Download Raw Diff

Details

Reviewers

andrew.w.kaylor
chandlerc
zbrid
george.burgess.iv
mattdr
craig.topper

Commits

rG8aa8abae349d: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)
rG2530f4e0ce44: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)
rG8ce078c7503d: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)
rG62c42e29ba43: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)

Summary

After finding all such gadgets in a given function, the pass minimally inserts LFENCE instructions in such a manner that the following property is satisfied: for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at least one LFENCE instruction. The algorithm that implements this minimal insertion is influenced by an academic paper that minimally inserts memory fences for high-performance concurrent programs:

http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf

The algorithm implemented in this pass is as follows:

Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
- SOURCE instructions (also includes function arguments)
- SINK instructions
- Basic block entry points
- Basic block terminators
- LFENCE instructions
Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
Use a heuristic or plugin to approximate minimal LFENCE insertion.
Insert one LFENCE along each CFG edge that was cut in step 3.
Go to step 2.
If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.

By default, the heuristic used in Step 3 is a greedy heuristic that avoids inserting LFENCEs into loops unless absolutely necessary. There is also a CLI option to load a plugin that can provide even better optimization, inserting fewer fences, while still mitigating all of the LVI gadgets. The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin, and a description of the pass's behavior with the plugin can be found here: https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.

Diff Detail

Event Timeline

sconstab created this revision.Mar 10 2020, 10:00 AM

Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2020, 10:00 AM

Herald added subscribers: jfb, hiraditya. · View Herald Transcript

sconstab added a reviewer: george.burgess.iv.Mar 11 2020, 9:27 AM

sconstab added a parent revision: D75932: Move RDF from Hexagon to Codegen [1/6].Mar 11 2020, 12:32 PM

sconstab added a parent revision: D75936: Add a Pass to X86 that builds a Condensed CFG for Load Value Injection (LVI) Gadgets [4/6].Mar 11 2020, 12:34 PM

sconstab retitled this revision from Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) to Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/5].

sconstab edited the summary of this revision. (Show Details)Mar 11 2020, 1:58 PM

sconstab mentioned this in D75939: [x86][seses] Introduce SESES pass for LVI.Mar 12 2020, 8:32 AM

sconstab added a child revision: D76158: Add inline assembly load hardening mitigation for Load Value Injection (LVI) on X86 [6/6].Mar 13 2020, 1:50 PM

sconstab retitled this revision from Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/5] to Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/6].Mar 16 2020, 9:30 AM

Sorry only took a quick look for now.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
271–272	Should this also have FencesInserted = hardenLoads?

One fix to correctly count the number of fences inserted.

sconstab marked an inline comment as done.Mar 18 2020, 7:13 PM

LGTM

This revision is now accepted and ready to land.Apr 2 2020, 5:09 PM

Closed by commit rG62c42e29ba43: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI) (authored by sconstab, committed by craig.topper). · Explain WhyApr 3 2020, 2:06 PM

This revision was automatically updated to reflect the committed changes.

mattdr added a subscriber: mattdr.Apr 3 2020, 5:03 PM

mattdr added inline comments.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
629–632	Why aren't these `vector`s?

craig.topper reopened this revision.Apr 3 2020, 8:51 PM

This revision is now accepted and ready to land.Apr 3 2020, 8:51 PM

Rebase

This revision is now accepted and ready to land.Apr 3 2020, 8:53 PM

craig.topper planned changes to this revision.Apr 3 2020, 9:42 PM

sconstab added inline comments.Apr 3 2020, 9:50 PM

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
629–632	They're being passed to a C-compatible interface, so intuitively it made more sense to me. I also get paranoid because the array sizes will be copied into the std::vector struct, and from experience it seems I can never be 100% certain these copies will be optimized away.

sconstab added inline comments.Apr 3 2020, 9:55 PM

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
629–632	@mattdr (above)

Update for changes in D75936. Use std::vector for some arrays.

This revision is now accepted and ready to land.Apr 4 2020, 12:00 AM

-Use range-based for loops
-Remove uses of the Traits class since it doesn't support methods needed for range-based for loops.

sconstab added inline comments.Apr 4 2020, 1:24 PM

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
539	Would `EdgeRef E : Graph->edges()` be clearer here? Ditto for many other `for` loops. (not sure if the LLVM conventions dictate something specific for these loops)
544	This caught me off guard the first time I saw it because I didn't realize that LLVM had its own implementation that takes a range. For readability, would it be better to prefix with `llvm::`?
570	Should this be `MachineGadgetGraph::NodeRef`?
623	I think that `std::unique_ptr<T[]>` should work fine here...
636	I think this `for` can lose the brackets.
676	More places where `NodeRef` and `EdgeRef` might be clearer.

craig.topper marked 5 inline comments as done.Apr 4 2020, 2:14 PM

craig.topper added inline comments.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
539	Personally, I'd prefer not to hide the & or *. And EdgeRef/NodeRef only exist in the Traits class not the main class. It's also confusing that NodeRef is a pointer and not a reference so I'd like to not leak that outside the Traits class where its easier to see. LLVM is pretty conservative about the use of auto, but I figured in this case it wouldn't be unreasonable for a reader to understand that edges() returns Edge objects. But I'm happing to change it to MachineGadgetGraph::Edge.
544	Agreed. I'll change.
623	Agreed, but std::vector is far more common in the codebase and since it lives on the stack the extra capacity pointer shouldn't be a big deal.

craig.topper added inline comments.Apr 4 2020, 2:14 PM

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
626	I'm thinking about adding a method to Graph to get a node/edge's index so we can hide all these std::distance calls.
636	Will do.

-Add llvm:: to the for_each calls

Make use getNodeIndex/getEdgeIndex

craig.topper requested review of this revision.Apr 6 2020, 10:18 AM

Rebase on D75937

-Rebase on D75937

craig.topper marked an inline comment as done.Apr 8 2020, 2:02 PM

craig.topper added inline comments.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
383–427	doh this should be in the other patch.

Rebase again

craig.topper removed a parent revision: D75932: Move RDF from Hexagon to Codegen [1/6].Apr 10 2020, 5:29 PM

craig.topper removed a child revision: D76158: Add inline assembly load hardening mitigation for Load Value Injection (LVI) on X86 [6/6].

Rebase on top of D75937

Reviewed the algorithm, have to confess I skipped the tests.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
318	I know there was a lot of effort to make `ImmutableGraph` efficient, but this is an `O(N)` reallocation on every loop, right? In practice, how many times do we end up looping before we hit a fixed point?
533	Could be more descriptive to call this, say, `trimMitigatedEdges`
541	In what circumstance will we add fences to the graph as sources or sinks? Can we just avoid that at the point of insertion, rather than running this extra culling pass on every iteration?
559	I would write something stronger here: Start off assuming all gadgets originating at this source node have been mitigated and can be removed. Later we will add back unmitigated gadgets by erasing them from the removal set.
569	This is really, really hard to read and understand. I think that's in large part because we have this one graph that represents _both_ control-flow _and_ source/sink pairs. Given that it's the load-bearing part of the whole stack, let me offer the best way I've come up with to explain it, then a suggestion for making it simpler to follow. My summary based on a few readings: First we compute `GadgetSinks`, the set of gadget sinks whose source is the current root. Then we do a depth-first search of the control-flow graph to find all gadget sinks that are control-flow-reachable from the given root. When we find such a sink, we look to see if it is also in `GadgetSinks` -- again, a sink whose source is the current root -- at which point we know we have found an unmitigated gadget sink. We iterate through the root's gadget edges to find the specific edge that points to the current DFS node -- the unmitigated gadget sink -- and remove that edge from `ElimGadgets`, where we had previously added it on the presumption it was mitigated. One idea for making this easier to follow: Start off assuming no edges are mitigated. Don't pre-populate the `EdgeSet` with every edge. Use depth-first search to create `ReachableSinks`, the set of all gadget sinks (regardless of source) that are control-flow-reachable from `RootN`. (I don't think this adds more work than `TraverseDFS` already does.) Iterate over gadget edges from `RootN`. For each edge, ask: is the destination reachable? If so, add this to a new set, `GadgetEdges`. After iterating over all nodes, let `TrimEdges` be `AllEdges - GadgetEdges`. Then trim those.
571	We already capture `RootN` later on. We can remove the boolean conditional and make this more readable by writing `if (*N != RootN)` (maybe `if (N != &RootN)`? I've sort of lost track of the API for nodes and edges.) Pushing further: maybe we can just remove the tiny optimization we get by special-casing the root node in the interest of simplicity?
595	It's weird to have a negated predicate like this in a conditional with an `else`. Let's either rewrite it or un-negate it and flip the `if`/`else` blocks. Rewrite: `!ElimEdges.empty() \|\| !ElimNodes.empty()`
606	likewise `findEdgesToCut` or similar
609	I'm still not convinced it's worth the extra (untestable) complexity to add this plugin point, but I defer to the committer.
611	Extra `{}` should be unnecessary
645	Seems like `NI` is a vestige of a previous iteration
647	Every edge touched by this loop will also be touched by the other leg of this loop at one point or another ("E is a CFG edge"). Can we find a way to avoid the `O(E^2)` inner loop? For example: what if we made the outermost loop edge-major? Keep sets of mitigated source-nodes and sink-nodes. Create a list of all edges by weight, sort it, then go through the edges from lowest to highest weight and use them to try to mitigate a new source or sink node. There's a good chance I'm missing something fundamental there, and I thank you in advance for your patience explaining it to me. But if I can I'd really like to avoid repeating the `CheapestSoFar` comparison on edge weights in two places.
654	What guarantee do we have here that all of the gadget sinks have, in fact, been added to `GadgetSinks`? It seems to be up to the order in which we iterate through nodes and edges.
655	`NI` also seems like it comes from a previous version
672	A comment for what this lambda does, and how it's intended to be used, is much appreciated for the top-to-bottom reader Probably something like: When we add an `LFENCE` before an instruction, remove any CFG edges that point to that instruction because they all now refer to a mitigated codepath
685	`// Insert an LFENCE in this basic block`
686	`// ... at this point in the basic block`
704–705	This one took me a bit. It seems like the summary is: add a fence unless it would be redundant, i.e. if we can see the next instruction and see it is itself a fence

This revision now requires changes to proceed.Apr 15 2020, 1:23 AM

sconstab commandeered this revision.Apr 16 2020, 12:10 PM

sconstab edited reviewers, added: craig.topper; removed: sconstab.

annita.zhang added a subscriber: annita.zhang.Apr 20 2020, 7:13 PM

Addressed feedback from @mattdr

@mattdr Thanks for the very scrupulous review!

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
318	The answers to your questions are "yes" and "until all of the gadgets have been mitigated". The intent here is to optimize for the MILP plugin: https://github.com/intel/lvi-llvm-optimization-plugin The plugin eliminates lots of gadgets all at once, and therefore rebuilding the graph is a relatively rare event. I'm not sure it would be worthwhile to use an entirely different data structure and algorithm for the greedy mitigation strategy. What I have done is rewrite the Greedy heuristic to keep track of which edges have been cut and eliminated, thus never actually having to rebuild the graph.
533	Yes, done.
541	Fences are never added as sources or sinks. We need to know where the fences are so that we can eliminate gadget edges for which all CFG paths from source to sink cross a fence. We could perform this analysis at the point of insertion, i.e., during the `getGadgetGraph()` function, BUT this analysis would be much more expensive there because it would be performed on the actual `MachineFunction`, instead of on the gadget graph.
559	Well, I substantially rewrote this algorithm to build a set of mitigated edges, instead of trimming down a set of mitigated edges. I think that the new set of comments should be much easier to follow.
569	You of course are correct that this was terribly difficult to follow. I actually found an even simpler solution that what you had suggested. For each `RootN` that is the source of at least one gadget, I DFS to find all of the `Visited` nodes, and then check to see which node members of `Visited` are also the destinations of a gadget edge rooted at `RootN`.
595	Right. By the way, I decomposed `elimEdges()` into two functions: `elimEdges()` just runs the edge elimination algorithm without rebuilding the gadget graph. `trimMitigatedEdges()` wraps `elimEdges()` and does rebuild (i.e., "trim") the gadget graph.
647	Yes... in hindsight this whole algorithm was more than a bit sloppy on my part. I completely revamped the algorithm and I think that now it is O(N + E). Please check and make sure you agree.
654	Yikes! Can't believe I overlooked this. I fixed the issue, and it looks like the algorithm had been cutting a few more edges in some cases than necessary.
672	I simplified this by inlining lambda with a descriptive comment.
704–705	Right. I added a clarifying comment.

Still some comments to address, but I think this is substantially close to where it needs to be. Thanks so much for your work making the code more straightforward and readable!

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
163–164	In the implementation the latter two parameters are called `ElimEdges` and `ElimNodes`, which I think are good names
318	Did you mean to remove this call to `GraphBuilder::trim()`? As it stands right now the loop actually reallocates the graph twice per iteration -- once in `trimMitigatedEdges`, once in this call to `trim()`.
533	This is a lot clearer now, thanks! Given what this is doing I think it makes sense to call it `findMitigatedEdges`.
536	"Eliminate CFG edges that target a fence, as they're trivially mitigated"
540	How are we sure we've removed all edges that pointed to this same destination node?
552	To make it easier to follow, let's call this something like `ReachableGadgetSinkNodes`
554–555	I think this is just intended as an optimization -- it's not necessary for correctness. Assuming that's right, suggest removing it since it doesn't really make things faster but it does add some extra complexity.
558	Maybe `FindCFGReachableGadgetSinksDFS`?
571	Still think it makes sense to add the root node to the reachable set (as it's trivially reachable from itself)
582	Suggest putting this at the top of the loop so it's obvious from the beginning that results aren't intended to accrue from iteration to iteration.
635	I really like that the refinement loop (trying to get to a fixed point) for the greedy algorithm is now entirely within this function. I'd suggest going one step further: take the loop out of `runOnMachineFunction` entirely and add one around the calls to the plugin in the lines above. That way we keep the implementation detail that "this needs to be run in a loop" as close to the actual algorithm as possible.
647	Each iteration loops over all edges and removes exactly one... so this is probably still `O(E^2)`. Seems like we can get it down closer to `O(E * lg E)` if we: Compute `GadgetSources` and `GadgetSinks` once Put the edges into a vector, then sorting them by weight Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated) The insight here is that edge weights don't change, so mitigating an edge doesn't change the ranking of other edges. That said, I'm happy with the readability of the current implementation and would be satisfied for now if we just add `//FIXME: this is O(E^2), it could probably be O(E * lg E) with some work`
714–715	You can skip the double-lookup and just do: if (MachineGadgetGraph::isCFGEdge(E)) { if (CutEdges.insert(E)) { // returns true if not already present ++AdditionalEdgesCut; } }

This revision is now accepted and ready to land.Apr 23 2020, 11:06 PM

Addressed comments from @mattdr.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
533	I changed it to `elimMitigatedEdgesAndNodes()`
540	I think it does? This loop iterates through ALL edges. For each CFG edge that ingresses a fence, add that fence to `ElimNodes`, and add that edge and all egress CFG edges to `ElimEdges`. The loop does not skip over edges that have been added to `ElimEdges`, and therefore I think this should ensure that all CFG edges pointing to a given fence are removed.
552	But that isn't accurate.. The algorithm finds all reachable nodes, not just gadget sinks. I renamed to `ReachableNodes`.
554–555	Why do you think it doesn't make things faster? A majority of nodes in the graph are not gadget sinks, and this majority tends to grow rapidly as gadgets become mitigated. Each time this check passes, it saves an entire DFS traversal and one `O(E)` loop through the edges.
558	Similar to above, this is not accurate as it finds all reachable nodes, not just gadget sinks. I did rename to `FindReachableNodes` and added detail to the comment.
647	I think your description actually implies that this greedy heuristic cannot be less that `O(E^2)`, right? Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated) The "if it is still relevant" is an `O(E)` operation, so the whole thing should be `O(E^2)`. Regardless, I did add a comment to this effect.

Eliminated the NoFixedLoads feature in D75936, which simplified this patch quite a bit.

mattdr marked 2 inline comments as done.Apr 27 2020, 3:10 PM

mattdr added inline comments.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
540	Sure enough, that makes sense.
554–555	Ack! Yes, it seems I consistently misread this function and missed the difference between `isGadgetEdge` in some places and `IsCFGEdge` in others. (As I mentioned, having both represented in the same graph is confusing.) Now I see this is a different check than is done in the DFS.

Rebase onto master.

mattdr accepted this revision.May 8 2020, 12:01 PM

This revision is now accepted and ready to land.May 8 2020, 12:01 PM

Closed by commit rG8ce078c7503d: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI) (authored by sconstab, committed by craig.topper). · Explain WhyMay 11 2020, 1:30 PM

This revision was automatically updated to reflect the committed changes.

craig.topper mentioned this in D79813: [Statepoint] Mark FixupStatepointCallerSaved as preserving the CFG.May 12 2020, 3:19 PM

craig.topper mentioned this in rGde92dc2850c1: [Statepoint] Mark FixupStatepointCallerSaved as preserving the CFG.May 13 2020, 11:25 AM

Revision Contents

Path

Size

llvm/

lib/

Target/

X86/

X86LoadValueInjectionLoadHardening.cpp

309 lines

test/

CodeGen/

X86/

lvi-hardening-loads.ll

98 lines

Diff 261969

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp

//==-- X86LoadValueInjectionLoadHardening.cpp - LVI load hardening for x86 --=//		//==-- X86LoadValueInjectionLoadHardening.cpp - LVI load hardening for x86 --=//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
///		///
/// Description: This pass finds Load Value Injection (LVI) gadgets consisting		/// Description: This pass finds Load Value Injection (LVI) gadgets consisting
/// of a load from memory (i.e., SOURCE), and any operation that may transmit		/// of a load from memory (i.e., SOURCE), and any operation that may transmit
/// the value loaded from memory over a covert channel, or use the value loaded		/// the value loaded from memory over a covert channel, or use the value loaded
/// from memory to determine a branch/call target (i.e., SINK).		/// from memory to determine a branch/call target (i.e., SINK). After finding
		/// all such gadgets in a given function, the pass minimally inserts LFENCE
		/// instructions in such a manner that the following property is satisfied: for
		/// all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
		/// least one LFENCE instruction. The algorithm that implements this minimal
		/// insertion is influenced by an academic paper that minimally inserts memory
		/// fences for high-performance concurrent programs:
		/// http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf
		/// The algorithm implemented in this pass is as follows:
		/// 1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the
		/// following components:
		/// - SOURCE instructions (also includes function arguments)
		/// - SINK instructions
		/// - Basic block entry points
		/// - Basic block terminators
		/// - LFENCE instructions
		/// 2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e.,
		/// gadgets) are already mitigated by existing LFENCEs. If all gadgets have been
		/// mitigated, go to step 6.
		/// 3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
		/// 4. Insert one LFENCE along each CFG edge that was cut in step 3.
		/// 5. Go to step 2.
		/// 6. If any LFENCEs were inserted, return `true` from runOnMachineFunction()
		/// to tell LLVM that the function was modified.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "ImmutableGraph.h"		#include "ImmutableGraph.h"
#include "X86.h"		#include "X86.h"
#include "X86Subtarget.h"		#include "X86Subtarget.h"
#include "X86TargetMachine.h"		#include "X86TargetMachine.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
Show All 11 Lines
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/RDFGraph.h"		#include "llvm/CodeGen/RDFGraph.h"
#include "llvm/CodeGen/RDFLiveness.h"		#include "llvm/CodeGen/RDFLiveness.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/DOTGraphTraits.h"		#include "llvm/Support/DOTGraphTraits.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
		#include "llvm/Support/DynamicLibrary.h"
#include "llvm/Support/GraphWriter.h"		#include "llvm/Support/GraphWriter.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace llvm;		using namespace llvm;

#define PASS_KEY "x86-lvi-load"		#define PASS_KEY "x86-lvi-load"
#define DEBUG_TYPE PASS_KEY		#define DEBUG_TYPE PASS_KEY

		STATISTIC(NumFences, "Number of LFENCEs inserted for LVI mitigation");
STATISTIC(NumFunctionsConsidered, "Number of functions analyzed");		STATISTIC(NumFunctionsConsidered, "Number of functions analyzed");
STATISTIC(NumFunctionsMitigated, "Number of functions for which mitigations "		STATISTIC(NumFunctionsMitigated, "Number of functions for which mitigations "
"were deployed");		"were deployed");
STATISTIC(NumGadgets, "Number of LVI gadgets detected during analysis");		STATISTIC(NumGadgets, "Number of LVI gadgets detected during analysis");

		static cl::opt<std::string> OptimizePluginPath(
		PASS_KEY "-opt-plugin",
		cl::desc("Specify a plugin to optimize LFENCE insertion"), cl::Hidden);

static cl::opt<bool> NoConditionalBranches(		static cl::opt<bool> NoConditionalBranches(
PASS_KEY "-no-cbranch",		PASS_KEY "-no-cbranch",
cl::desc("Don't treat conditional branches as disclosure gadgets. This "		cl::desc("Don't treat conditional branches as disclosure gadgets. This "
"may improve performance, at the cost of security."),		"may improve performance, at the cost of security."),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

static cl::opt<bool> EmitDot(		static cl::opt<bool> EmitDot(
PASS_KEY "-dot",		PASS_KEY "-dot",
cl::desc(		cl::desc(
"For each function, emit a dot graph depicting potential LVI gadgets"),		"For each function, emit a dot graph depicting potential LVI gadgets"),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

static cl::opt<bool> EmitDotOnly(		static cl::opt<bool> EmitDotOnly(
PASS_KEY "-dot-only",		PASS_KEY "-dot-only",
cl::desc("For each function, emit a dot graph depicting potential LVI "		cl::desc("For each function, emit a dot graph depicting potential LVI "
"gadgets, and do not insert any fences"),		"gadgets, and do not insert any fences"),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

static cl::opt<bool> EmitDotVerify(		static cl::opt<bool> EmitDotVerify(
PASS_KEY "-dot-verify",		PASS_KEY "-dot-verify",
cl::desc("For each function, emit a dot graph to stdout depicting "		cl::desc("For each function, emit a dot graph to stdout depicting "
"potential LVI gadgets, used for testing purposes only"),		"potential LVI gadgets, used for testing purposes only"),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

		static llvm::sys::DynamicLibrary OptimizeDL;
		typedef int (OptimizeCutT)(unsigned int nodes, unsigned int nodes_size,
		unsigned int edges, int edge_values,
		int cut_edges / out */, unsigned int edges_size);
		static OptimizeCutT OptimizeCut = nullptr;

namespace {		namespace {

struct MachineGadgetGraph : ImmutableGraph<MachineInstr *, int> {		struct MachineGadgetGraph : ImmutableGraph<MachineInstr *, int> {
static constexpr int GadgetEdgeSentinel = -1;		static constexpr int GadgetEdgeSentinel = -1;
static constexpr MachineInstr *const ArgNodeSentinel = nullptr;		static constexpr MachineInstr *const ArgNodeSentinel = nullptr;

using GraphT = ImmutableGraph<MachineInstr *, int>;		using GraphT = ImmutableGraph<MachineInstr *, int>;
using Node = typename GraphT::Node;		using Node = typename GraphT::Node;
Show All 35 Lines	private:
const X86Subtarget *STI;		const X86Subtarget *STI;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;

std::unique_ptr<MachineGadgetGraph>		std::unique_ptr<MachineGadgetGraph>
getGadgetGraph(MachineFunction &MF, const MachineLoopInfo &MLI,		getGadgetGraph(MachineFunction &MF, const MachineLoopInfo &MLI,
const MachineDominatorTree &MDT,		const MachineDominatorTree &MDT,
const MachineDominanceFrontier &MDF) const;		const MachineDominanceFrontier &MDF) const;
		int hardenLoadsWithPlugin(MachineFunction &MF,
		std::unique_ptr<MachineGadgetGraph> Graph) const;
		mattdrUnsubmitted Done Reply Inline Actions In the implementation the latter two parameters are called `ElimEdges` and `ElimNodes`, which I think are good names mattdr: In the implementation the latter two parameters are called `ElimEdges` and `ElimNodes`, which I…
		int hardenLoadsWithGreedyHeuristic(
		MachineFunction &MF, std::unique_ptr<MachineGadgetGraph> Graph) const;
		int elimMitigatedEdgesAndNodes(MachineGadgetGraph &G,
		EdgeSet &ElimEdges /* in, out */,
		NodeSet &ElimNodes /* in, out */) const;
		std::unique_ptr<MachineGadgetGraph>
		trimMitigatedEdges(std::unique_ptr<MachineGadgetGraph> Graph) const;
		void findAndCutEdges(MachineGadgetGraph &G,
		EdgeSet &CutEdges /* out */) const;
		int insertFences(MachineFunction &MF, MachineGadgetGraph &G,
		EdgeSet &CutEdges /* in, out */) const;
bool instrUsesRegToAccessMemory(const MachineInstr &I, unsigned Reg) const;		bool instrUsesRegToAccessMemory(const MachineInstr &I, unsigned Reg) const;
bool instrUsesRegToBranch(const MachineInstr &I, unsigned Reg) const;		bool instrUsesRegToBranch(const MachineInstr &I, unsigned Reg) const;
inline bool isFence(const MachineInstr *MI) const {		inline bool isFence(const MachineInstr *MI) const {
return MI && (MI->getOpcode() == X86::LFENCE \|\|		return MI && (MI->getOpcode() == X86::LFENCE \|\|
(STI->useLVIControlFlowIntegrity() && MI->isCall()));		(STI->useLVIControlFlowIntegrity() && MI->isCall()));
}		}
};		};

▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	bool X86LoadValueInjectionLoadHardeningPass::runOnMachineFunction(

// Don't skip functions with the "optnone" attr but participate in opt-bisect.		// Don't skip functions with the "optnone" attr but participate in opt-bisect.
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();
if (!F.hasOptNone() && skipFunction(F))		if (!F.hasOptNone() && skipFunction(F))
return false;		return false;

++NumFunctionsConsidered;		++NumFunctionsConsidered;
TII = STI->getInstrInfo();		TII = STI->getInstrInfo();
TRI = STI->getRegisterInfo();		TRI = STI->getRegisterInfo();
LLVM_DEBUG(dbgs() << "Building gadget graph...\n");		LLVM_DEBUG(dbgs() << "Building gadget graph...\n");
		zbridUnsubmitted Done Reply Inline Actions Should this also have FencesInserted = hardenLoads? zbrid: Should this also have FencesInserted = hardenLoads?
const auto &MLI = getAnalysis<MachineLoopInfo>();		const auto &MLI = getAnalysis<MachineLoopInfo>();
const auto &MDT = getAnalysis<MachineDominatorTree>();		const auto &MDT = getAnalysis<MachineDominatorTree>();
const auto &MDF = getAnalysis<MachineDominanceFrontier>();		const auto &MDF = getAnalysis<MachineDominanceFrontier>();
std::unique_ptr<MachineGadgetGraph> Graph = getGadgetGraph(MF, MLI, MDT, MDF);		std::unique_ptr<MachineGadgetGraph> Graph = getGadgetGraph(MF, MLI, MDT, MDF);
LLVM_DEBUG(dbgs() << "Building gadget graph... Done\n");		LLVM_DEBUG(dbgs() << "Building gadget graph... Done\n");
if (Graph == nullptr)		if (Graph == nullptr)
return false; // didn't find any gadgets		return false; // didn't find any gadgets

Show All 13 Lines	if (FileError)
errs() << FileError.message();		errs() << FileError.message();
WriteGadgetGraph(FileOut, MF, Graph.get());		WriteGadgetGraph(FileOut, MF, Graph.get());
FileOut.close();		FileOut.close();
LLVM_DEBUG(dbgs() << "Emitting gadget graph... Done\n");		LLVM_DEBUG(dbgs() << "Emitting gadget graph... Done\n");
if (EmitDotOnly)		if (EmitDotOnly)
return false;		return false;
}		}

return 0;		int FencesInserted;
		if (!OptimizePluginPath.empty()) {
		if (!OptimizeDL.isValid()) {
		std::string ErrorMsg;
		OptimizeDL = llvm::sys::DynamicLibrary::getPermanentLibrary(
		OptimizePluginPath.c_str(), &ErrorMsg);
		if (!ErrorMsg.empty())
		report_fatal_error("Failed to load opt plugin: \"" + ErrorMsg + '\"');
		OptimizeCut = (OptimizeCutT)OptimizeDL.getAddressOfSymbol("optimize_cut");
		if (!OptimizeCut)
		report_fatal_error("Invalid optimization plugin");
		}
		FencesInserted = hardenLoadsWithPlugin(MF, std::move(Graph));
		} else { // Use the default greedy heuristic
		FencesInserted = hardenLoadsWithGreedyHeuristic(MF, std::move(Graph));
		}

		mattdrUnsubmitted Not Done Reply Inline Actions I know there was a lot of effort to make `ImmutableGraph` efficient, but this is an `O(N)` reallocation on every loop, right? In practice, how many times do we end up looping before we hit a fixed point? mattdr: I know there was a lot of effort to make `ImmutableGraph` efficient, but this is an `O(N)`…
		sconstabAuthorUnsubmitted Done Reply Inline Actions The answers to your questions are "yes" and "until all of the gadgets have been mitigated". The intent here is to optimize for the MILP plugin: https://github.com/intel/lvi-llvm-optimization-plugin The plugin eliminates lots of gadgets all at once, and therefore rebuilding the graph is a relatively rare event. I'm not sure it would be worthwhile to use an entirely different data structure and algorithm for the greedy mitigation strategy. What I have done is rewrite the Greedy heuristic to keep track of which edges have been cut and eliminated, thus never actually having to rebuild the graph. sconstab: The answers to your questions are "yes" and "until all of the gadgets have been mitigated". The…
		mattdrUnsubmitted Not Done Reply Inline Actions Did you mean to remove this call to `GraphBuilder::trim()`? As it stands right now the loop actually reallocates the graph twice per iteration -- once in `trimMitigatedEdges`, once in this call to `trim()`. mattdr: Did you mean to remove this call to `GraphBuilder::trim()`? As it stands right now the loop…
		if (FencesInserted > 0)
		++NumFunctionsMitigated;
		NumFences += FencesInserted;
		return (FencesInserted > 0);
}		}

std::unique_ptr<MachineGadgetGraph>		std::unique_ptr<MachineGadgetGraph>
X86LoadValueInjectionLoadHardeningPass::getGadgetGraph(		X86LoadValueInjectionLoadHardeningPass::getGadgetGraph(
MachineFunction &MF, const MachineLoopInfo &MLI,		MachineFunction &MF, const MachineLoopInfo &MLI,
const MachineDominatorTree &MDT,		const MachineDominatorTree &MDT,
const MachineDominanceFrontier &MDF) const {		const MachineDominanceFrontier &MDF) const {
using namespace rdf;		using namespace rdf;
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	std::function<void(NodeAddr<DefNode *>)> AnalyzeDefUseChain =
for (auto UA : I.second)		for (auto UA : I.second)
Uses.emplace(UA.first);		Uses.emplace(UA.first);
}		}
}		}
} else { // not a phi node		} else { // not a phi node
Uses.emplace(UseID);		Uses.emplace(UseID);
}		}
}		}

// For each use of the current def, we want to know whether:		// For each use of the current def, we want to know whether:
// (1) The use can leak the def'ed value,		// (1) The use can leak the def'ed value,
// (2) The use can further propagate the def'ed value to more defs		// (2) The use can further propagate the def'ed value to more defs
for (auto UseID : Uses) {		for (auto UseID : Uses) {
if (!UsesVisited.insert(UseID).second)		if (!UsesVisited.insert(UseID).second)
continue; // Already visited this use of the current def		continue; // Already visited this use of the current def

auto Use = DFG.addr<UseNode *>(UseID);		auto Use = DFG.addr<UseNode *>(UseID);
assert(!(Use.Addr->getFlags() & NodeAttrs::PhiRef));		assert(!(Use.Addr->getFlags() & NodeAttrs::PhiRef));
MachineOperand &UseMO = Use.Addr->getOp();		MachineOperand &UseMO = Use.Addr->getOp();
MachineInstr &UseMI = *UseMO.getParent();		MachineInstr &UseMI = *UseMO.getParent();
assert(UseMO.isReg());		assert(UseMO.isReg());

// We naively assume that an instruction propagates any loaded		// We naively assume that an instruction propagates any loaded
// Uses to all Defs, unless the instruction is a call.		// Uses to all Defs, unless the instruction is a call.
if (UseMI.isCall())		if (UseMI.isCall())
continue;		continue;

// Check whether this use can transmit (leak) its value.		// Check whether this use can transmit (leak) its value.
if (instrUsesRegToAccessMemory(UseMI, UseMO.getReg()) \|\|		if (instrUsesRegToAccessMemory(UseMI, UseMO.getReg()) \|\|
(!NoConditionalBranches &&		(!NoConditionalBranches &&
instrUsesRegToBranch(UseMI, UseMO.getReg()))) {		instrUsesRegToBranch(UseMI, UseMO.getReg()))) {
Transmitters[Def.Id].push_back(Use.Addr->getOwner(DFG).Id);		Transmitters[Def.Id].push_back(Use.Addr->getOwner(DFG).Id);
if (UseMI.mayLoad())		if (UseMI.mayLoad())
continue; // Found a transmitting load, stop traversing defs		continue; // Found a transmitting load, stop traversing defs
}		}

// Check whether the use propagates to more defs.		// Check whether the use propagates to more defs.
NodeAddr<InstrNode *> Owner{Use.Addr->getOwner(DFG)};		NodeAddr<InstrNode *> Owner{Use.Addr->getOwner(DFG)};
rdf::NodeList AnalyzedChildDefs;		rdf::NodeList AnalyzedChildDefs;
for (auto &ChildDef :		for (auto &ChildDef :
Owner.Addr->members_if(DataFlowGraph::IsDef, DFG)) {		Owner.Addr->members_if(DataFlowGraph::IsDef, DFG)) {
if (!DefsVisited.insert(ChildDef.Id).second)		if (!DefsVisited.insert(ChildDef.Id).second)
continue; // Already visited this def		continue; // Already visited this def
if (Def.Addr->getAttrs() & NodeAttrs::Dead)		if (Def.Addr->getAttrs() & NodeAttrs::Dead)
continue;		continue;
if (Def.Id == ChildDef.Id)		if (Def.Id == ChildDef.Id)
continue; // The def uses itself (e.g., increment loop counter)		continue; // The def uses itself (e.g., increment loop counter)

AnalyzeDefUseChain(ChildDef);		AnalyzeDefUseChain(ChildDef);

// Each def inherits all of its child defs' transmitters.		// Each def inherits all of its child defs' transmitters.
for (auto TransmitterId : Transmitters[ChildDef.Id])		for (auto TransmitterId : Transmitters[ChildDef.Id])
Transmitters[Def.Id].push_back(TransmitterId);		Transmitters[Def.Id].push_back(TransmitterId);
		craig.topperUnsubmitted Done Reply Inline Actions doh this should be in the other patch. craig.topper: doh this should be in the other patch.
}		}
}		}

// Remove duplicate transmitters		// Remove duplicate transmitters
auto &DefTransmitters = Transmitters[Def.Id];		auto &DefTransmitters = Transmitters[Def.Id];
llvm::sort(DefTransmitters);		llvm::sort(DefTransmitters);
DefTransmitters.erase(		DefTransmitters.erase(
std::unique(DefTransmitters.begin(), DefTransmitters.end()),		std::unique(DefTransmitters.begin(), DefTransmitters.end()),
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	X86LoadValueInjectionLoadHardeningPass::getGadgetGraph(
// GadgetGraph		// GadgetGraph
GraphIter ArgNode = MaybeAddNode(MachineGadgetGraph::ArgNodeSentinel).first;		GraphIter ArgNode = MaybeAddNode(MachineGadgetGraph::ArgNodeSentinel).first;
TraverseCFG(&MF.front(), ArgNode, 0);		TraverseCFG(&MF.front(), ArgNode, 0);
std::unique_ptr<MachineGadgetGraph> G{Builder.get(FenceCount, GadgetCount)};		std::unique_ptr<MachineGadgetGraph> G{Builder.get(FenceCount, GadgetCount)};
LLVM_DEBUG(dbgs() << "Found " << G->nodes_size() << " nodes\n");		LLVM_DEBUG(dbgs() << "Found " << G->nodes_size() << " nodes\n");
return G;		return G;
}		}

		// Returns the number of remaining gadget edges that could not be eliminated
		int X86LoadValueInjectionLoadHardeningPass::elimMitigatedEdgesAndNodes(
		mattdrUnsubmitted Done Reply Inline Actions Could be more descriptive to call this, say, `trimMitigatedEdges` mattdr: Could be more descriptive to call this, say, `trimMitigatedEdges`
		sconstabAuthorUnsubmitted Done Reply Inline Actions Yes, done. sconstab: Yes, done.
		mattdrUnsubmitted Done Reply Inline Actions This is a lot clearer now, thanks! Given what this is doing I think it makes sense to call it `findMitigatedEdges`. mattdr: This is a lot clearer now, thanks! Given what this is doing I think it makes sense to call it…
		sconstabAuthorUnsubmitted Done Reply Inline Actions I changed it to `elimMitigatedEdgesAndNodes()` sconstab: I changed it to `elimMitigatedEdgesAndNodes()`
		MachineGadgetGraph &G, MachineGadgetGraph::EdgeSet &ElimEdges /* in, out */,
		MachineGadgetGraph::NodeSet &ElimNodes /* in, out */) const {
		if (G.NumFences > 0) {
		mattdrUnsubmitted Done Reply Inline Actions "Eliminate CFG edges that target a fence, as they're trivially mitigated" mattdr: "Eliminate CFG edges that target a fence, as they're trivially mitigated"
		// Eliminate fences and CFG edges that ingress and egress the fence, as
		// they are trivially mitigated.
		for (const auto &E : G.edges()) {
		sconstabAuthorUnsubmitted Not Done Reply Inline Actions Would `EdgeRef E : Graph->edges()` be clearer here? Ditto for many other `for` loops. (not sure if the LLVM conventions dictate something specific for these loops) sconstab: Would `EdgeRef E : Graph->edges()` be clearer here? Ditto for many other `for` loops. (not…
		craig.topperUnsubmitted Done Reply Inline Actions Personally, I'd prefer not to hide the & or . And EdgeRef/NodeRef only exist in the Traits class not the main class. It's also confusing that NodeRef is a pointer and not a reference so I'd like to not leak that outside the Traits class where its easier to see. LLVM is pretty conservative about the use of auto, but I figured in this case it wouldn't be unreasonable for a reader to understand that edges() returns Edge objects. But I'm happing to change it to MachineGadgetGraph::Edge. craig.topper:* Personally, I'd prefer not to hide the & or *. And EdgeRef/NodeRef only exist in the Traits…
		const MachineGadgetGraph::Node *Dest = E.getDest();
		mattdrUnsubmitted Not Done Reply Inline Actions How are we sure we've removed all edges that pointed to this same destination node? mattdr: How are we sure we've removed all edges that pointed to this same destination node?
		sconstabAuthorUnsubmitted Done Reply Inline Actions I think it does? This loop iterates through ALL edges. For each CFG edge that ingresses a fence, add that fence to `ElimNodes`, and add that edge and all egress CFG edges to `ElimEdges`. The loop does not skip over edges that have been added to `ElimEdges`, and therefore I think this should ensure that all CFG edges pointing to a given fence are removed. sconstab: I think it does? This loop iterates through ALL edges. For each CFG edge that ingresses a fence…
		mattdrUnsubmitted Done Reply Inline Actions Sure enough, that makes sense. mattdr: Sure enough, that makes sense.
		if (isFence(Dest->getValue())) {
		mattdrUnsubmitted Not Done Reply Inline Actions In what circumstance will we add fences to the graph as sources or sinks? Can we just avoid that at the point of insertion, rather than running this extra culling pass on every iteration? mattdr: In what circumstance will we add fences to the graph as sources or sinks? Can we just avoid…
		sconstabAuthorUnsubmitted Done Reply Inline Actions Fences are never added as sources or sinks. We need to know where the fences are so that we can eliminate gadget edges for which all CFG paths from source to sink cross a fence. We could perform this analysis at the point of insertion, i.e., during the `getGadgetGraph()` function, BUT this analysis would be much more expensive there because it would be performed on the actual `MachineFunction`, instead of on the gadget graph. sconstab: Fences are never added as sources or sinks. We need to know where the fences are so that we can…
		ElimNodes.insert(*Dest);
		ElimEdges.insert(E);
		for (const auto &DE : Dest->edges())
		sconstabAuthorUnsubmitted Not Done Reply Inline Actions This caught me off guard the first time I saw it because I didn't realize that LLVM had its own implementation that takes a range. For readability, would it be better to prefix with `llvm::`? sconstab: This caught me off guard the first time I saw it because I didn't realize that LLVM had its own…
		craig.topperUnsubmitted Done Reply Inline Actions Agreed. I'll change. craig.topper: Agreed. I'll change.
		ElimEdges.insert(DE);
		}
		}
		}

		// Find and eliminate gadget edges that have been mitigated.
		int MitigatedGadgets = 0, RemainingGadgets = 0;
		MachineGadgetGraph::NodeSet ReachableNodes{G};
		mattdrUnsubmitted Done Reply Inline Actions To make it easier to follow, let's call this something like `ReachableGadgetSinkNodes` mattdr: To make it easier to follow, let's call this something like `ReachableGadgetSinkNodes`
		sconstabAuthorUnsubmitted Done Reply Inline Actions But that isn't accurate.. The algorithm finds all reachable nodes, not just gadget sinks. I renamed to `ReachableNodes`. sconstab: But that isn't accurate.. The algorithm finds all reachable nodes, not just gadget sinks. I…
		for (const auto &RootN : G.nodes()) {
		if (llvm::none_of(RootN.edges(), MachineGadgetGraph::isGadgetEdge))
		continue; // skip this node if it isn't a gadget source
		mattdrUnsubmitted Not Done Reply Inline Actions I think this is just intended as an optimization -- it's not necessary for correctness. Assuming that's right, suggest removing it since it doesn't really make things faster but it does add some extra complexity. mattdr: I think this is just intended as an optimization -- it's not necessary for correctness.
		sconstabAuthorUnsubmitted Done Reply Inline Actions Why do you think it doesn't make things faster? A majority of nodes in the graph are not gadget sinks, and this majority tends to grow rapidly as gadgets become mitigated. Each time this check passes, it saves an entire DFS traversal and one `O(E)` loop through the edges. sconstab: Why do you think it doesn't make things faster? A majority of nodes in the graph are not gadget…
		mattdrUnsubmitted Done Reply Inline Actions Ack! Yes, it seems I consistently misread this function and missed the difference between `isGadgetEdge` in some places and `IsCFGEdge` in others. (As I mentioned, having both represented in the same graph is confusing.) Now I see this is a different check than is done in the DFS. mattdr: Ack! Yes, it seems I consistently misread this function and missed the difference between…

		// Find all of the nodes that are CFG-reachable from RootN using DFS
		ReachableNodes.clear();
		mattdrUnsubmitted Done Reply Inline Actions Maybe `FindCFGReachableGadgetSinksDFS`? mattdr: Maybe `FindCFGReachableGadgetSinksDFS`?
		sconstabAuthorUnsubmitted Done Reply Inline Actions Similar to above, this is not accurate as it finds all reachable nodes, not just gadget sinks. I did rename to `FindReachableNodes` and added detail to the comment. sconstab: Similar to above, this is not accurate as it finds all reachable nodes, not just gadget sinks.
		std::function<void(const MachineGadgetGraph::Node *, bool)>
		mattdrUnsubmitted Not Done Reply Inline Actions I would write something stronger here: Start off assuming all gadgets originating at this source node have been mitigated and can be removed. Later we will add back unmitigated gadgets by erasing them from the removal set. mattdr: I would write something stronger here: Start off assuming all gadgets originating at this…
		sconstabAuthorUnsubmitted Done Reply Inline Actions Well, I substantially rewrote this algorithm to build a set of mitigated edges, instead of trimming down a set of mitigated edges. I think that the new set of comments should be much easier to follow. sconstab: Well, I substantially rewrote this algorithm to build a set of mitigated edges, instead of…
		FindReachableNodes =
		[&](const MachineGadgetGraph::Node *N, bool FirstNode) {
		if (!FirstNode)
		ReachableNodes.insert(*N);
		for (const auto &E : N->edges()) {
		const MachineGadgetGraph::Node *Dest = E.getDest();
		if (MachineGadgetGraph::isCFGEdge(E) &&
		!ElimEdges.contains(E) && !ReachableNodes.contains(*Dest))
		FindReachableNodes(Dest, false);
		}
		mattdrUnsubmitted Not Done Reply Inline Actions This is really, really hard to read and understand. I think that's in large part because we have this one graph that represents _both_ control-flow _and_ source/sink pairs. Given that it's the load-bearing part of the whole stack, let me offer the best way I've come up with to explain it, then a suggestion for making it simpler to follow. My summary based on a few readings: First we compute `GadgetSinks`, the set of gadget sinks whose source is the current root. Then we do a depth-first search of the control-flow graph to find all gadget sinks that are control-flow-reachable from the given root. When we find such a sink, we look to see if it is also in `GadgetSinks` -- again, a sink whose source is the current root -- at which point we know we have found an unmitigated gadget sink. We iterate through the root's gadget edges to find the specific edge that points to the current DFS node -- the unmitigated gadget sink -- and remove that edge from `ElimGadgets`, where we had previously added it on the presumption it was mitigated. One idea for making this easier to follow: Start off assuming no edges are mitigated. Don't pre-populate the `EdgeSet` with every edge. Use depth-first search to create `ReachableSinks`, the set of all gadget sinks (regardless of source) that are control-flow-reachable from `RootN`. (I don't think this adds more work than `TraverseDFS` already does.) Iterate over gadget edges from `RootN`. For each edge, ask: is the destination reachable? If so, add this to a new set, `GadgetEdges`. After iterating over all nodes, let `TrimEdges` be `AllEdges - GadgetEdges`. Then trim those. mattdr: This is really, really hard to read and understand. I think that's in large part because we…
		sconstabAuthorUnsubmitted Done Reply Inline Actions You of course are correct that this was terribly difficult to follow. I actually found an even simpler solution that what you had suggested. For each `RootN` that is the source of at least one gadget, I DFS to find all of the `Visited` nodes, and then check to see which node members of `Visited` are also the destinations of a gadget edge rooted at `RootN`. sconstab: You of course are correct that this was terribly difficult to follow. I actually found an even…
		};
		sconstabAuthorUnsubmitted Not Done Reply Inline Actions Should this be `MachineGadgetGraph::NodeRef`? sconstab: Should this be `MachineGadgetGraph::NodeRef`?
		FindReachableNodes(&RootN, true);
		mattdrUnsubmitted Not Done Reply Inline Actions We already capture `RootN` later on. We can remove the boolean conditional and make this more readable by writing `if (N != RootN)` (maybe `if (N != &RootN)`? I've sort of lost track of the API for nodes and edges.) Pushing further: maybe we can just remove the tiny optimization we get by special-casing the root node in the interest of simplicity? mattdr:* We already capture `RootN` later on. We can remove the boolean conditional and make this more…
		mattdrUnsubmitted Not Done Reply Inline Actions Still think it makes sense to add the root node to the reachable set (as it's trivially reachable from itself) mattdr: Still think it makes sense to add the root node to the reachable set (as it's trivially…

		// Any gadget whose sink is unreachable has been mitigated
		for (const auto &E : RootN.edges()) {
		if (MachineGadgetGraph::isGadgetEdge(E)) {
		if (ReachableNodes.contains(*E.getDest())) {
		// This gadget's sink is reachable
		++RemainingGadgets;
		} else { // This gadget's sink is unreachable, and therefore mitigated
		++MitigatedGadgets;
		ElimEdges.insert(E);
		}
		mattdrUnsubmitted Done Reply Inline Actions Suggest putting this at the top of the loop so it's obvious from the beginning that results aren't intended to accrue from iteration to iteration. mattdr: Suggest putting this at the top of the loop so it's obvious from the beginning that results…
		}
		}
		}
		return RemainingGadgets;
		}

		std::unique_ptr<MachineGadgetGraph>
		X86LoadValueInjectionLoadHardeningPass::trimMitigatedEdges(
		std::unique_ptr<MachineGadgetGraph> Graph) const {
		MachineGadgetGraph::NodeSet ElimNodes{*Graph};
		MachineGadgetGraph::EdgeSet ElimEdges{*Graph};
		int RemainingGadgets =
		elimMitigatedEdgesAndNodes(*Graph, ElimEdges, ElimNodes);
		mattdrUnsubmitted Not Done Reply Inline Actions It's weird to have a negated predicate like this in a conditional with an `else`. Let's either rewrite it or un-negate it and flip the `if`/`else` blocks. Rewrite: `!ElimEdges.empty() \|\| !ElimNodes.empty()` mattdr: It's weird to have a negated predicate like this in a conditional with an `else`. Let's either…
		sconstabAuthorUnsubmitted Done Reply Inline Actions Right. By the way, I decomposed `elimEdges()` into two functions: `elimEdges()` just runs the edge elimination algorithm without rebuilding the gadget graph. `trimMitigatedEdges()` wraps `elimEdges()` and does rebuild (i.e., "trim") the gadget graph. sconstab: Right. By the way, I decomposed `elimEdges()` into two functions: - `elimEdges()` just runs the…
		if (ElimEdges.empty() && ElimNodes.empty()) {
		Graph->NumFences = 0;
		Graph->NumGadgets = RemainingGadgets;
		} else {
		Graph = GraphBuilder::trim(Graph, ElimNodes, ElimEdges, 0 / NumFences */,
		RemainingGadgets);
		}
		return Graph;
		}

		int X86LoadValueInjectionLoadHardeningPass::hardenLoadsWithPlugin(
		mattdrUnsubmitted Done Reply Inline Actions likewise `findEdgesToCut` or similar mattdr: likewise `findEdgesToCut` or similar
		MachineFunction &MF, std::unique_ptr<MachineGadgetGraph> Graph) const {
		int FencesInserted = 0;

		mattdrUnsubmitted Done Reply Inline Actions I'm still not convinced it's worth the extra (untestable) complexity to add this plugin point, but I defer to the committer. mattdr: I'm still not convinced it's worth the extra (untestable) complexity to add this plugin point…
		do {
		LLVM_DEBUG(dbgs() << "Eliminating mitigated paths...\n");
		mattdrUnsubmitted Done Reply Inline Actions Extra `{}` should be unnecessary mattdr: Extra `{}` should be unnecessary
		Graph = trimMitigatedEdges(std::move(Graph));
		LLVM_DEBUG(dbgs() << "Eliminating mitigated paths... Done\n");
		if (Graph->NumGadgets == 0)
		break;

		LLVM_DEBUG(dbgs() << "Cutting edges...\n");
		EdgeSet CutEdges{*Graph};
		auto Nodes = std::make_unique<unsigned int[]>(Graph->nodes_size() +
		1 /* terminator node */);
		auto Edges = std::make_unique<unsigned int[]>(Graph->edges_size());
		auto EdgeCuts = std::make_unique<int[]>(Graph->edges_size());
		auto EdgeValues = std::make_unique<int[]>(Graph->edges_size());
		sconstabAuthorUnsubmitted Not Done Reply Inline Actions I think that `std::unique_ptr<T[]>` should work fine here... sconstab: I think that `std::unique_ptr<T[]>` should work fine here...
		craig.topperUnsubmitted Done Reply Inline Actions Agreed, but std::vector is far more common in the codebase and since it lives on the stack the extra capacity pointer shouldn't be a big deal. craig.topper: Agreed, but std::vector is far more common in the codebase and since it lives on the stack the…
		for (const auto &N : Graph->nodes()) {
		Nodes[Graph->getNodeIndex(N)] = Graph->getEdgeIndex(*N.edges_begin());
		}
		craig.topperUnsubmitted Done Reply Inline Actions I'm thinking about adding a method to Graph to get a node/edge's index so we can hide all these std::distance calls. craig.topper: I'm thinking about adding a method to Graph to get a node/edge's index so we can hide all these…
		Nodes[Graph->nodes_size()] = Graph->edges_size(); // terminator node
		for (const auto &E : Graph->edges()) {
		Edges[Graph->getEdgeIndex(E)] = Graph->getNodeIndex(*E.getDest());
		EdgeValues[Graph->getEdgeIndex(E)] = E.getValue();
		}
		OptimizeCut(Nodes.get(), Graph->nodes_size(), Edges.get(), EdgeValues.get(),
		mattdrUnsubmitted Not Done Reply Inline Actions Why aren't these `vector`s? mattdr: Why aren't these `vector`s?
		sconstabAuthorUnsubmitted Not Done Reply Inline Actions They're being passed to a C-compatible interface, so intuitively it made more sense to me. I also get paranoid because the array sizes will be copied into the std::vector struct, and from experience it seems I can never be 100% certain these copies will be optimized away. sconstab: They're being passed to a C-compatible interface, so intuitively it made more sense to me. I…
		sconstabAuthorUnsubmitted Not Done Reply Inline Actions @mattdr (above) sconstab: @mattdr (above)
		EdgeCuts.get(), Graph->edges_size());
		for (int I = 0; I < Graph->edges_size(); ++I)
		if (EdgeCuts[I])
		mattdrUnsubmitted Done Reply Inline Actions I really like that the refinement loop (trying to get to a fixed point) for the greedy algorithm is now entirely within this function. I'd suggest going one step further: take the loop out of `runOnMachineFunction` entirely and add one around the calls to the plugin in the lines above. That way we keep the implementation detail that "this needs to be run in a loop" as close to the actual algorithm as possible. mattdr: I really like that the refinement loop (trying to get to a fixed point) for the greedy…
		CutEdges.set(I);
		sconstabAuthorUnsubmitted Not Done Reply Inline Actions I think this `for` can lose the brackets. sconstab: I think this `for` can lose the brackets.
		craig.topperUnsubmitted Done Reply Inline Actions Will do. craig.topper: Will do.
		LLVM_DEBUG(dbgs() << "Cutting edges... Done\n");
		LLVM_DEBUG(dbgs() << "Cut " << CutEdges.count() << " edges\n");

		LLVM_DEBUG(dbgs() << "Inserting LFENCEs...\n");
		FencesInserted += insertFences(MF, *Graph, CutEdges);
		LLVM_DEBUG(dbgs() << "Inserting LFENCEs... Done\n");
		LLVM_DEBUG(dbgs() << "Inserted " << FencesInserted << " fences\n");

		Graph = GraphBuilder::trim(Graph, MachineGadgetGraph::NodeSet{Graph},
		mattdrUnsubmitted Done Reply Inline Actions Seems like `NI` is a vestige of a previous iteration mattdr: Seems like `NI` is a vestige of a previous iteration
		CutEdges);
		} while (true);
		mattdrUnsubmitted Not Done Reply Inline Actions Every edge touched by this loop will also be touched by the other leg of this loop at one point or another ("E is a CFG edge"). Can we find a way to avoid the `O(E^2)` inner loop? For example: what if we made the outermost loop edge-major? Keep sets of mitigated source-nodes and sink-nodes. Create a list of all edges by weight, sort it, then go through the edges from lowest to highest weight and use them to try to mitigate a new source or sink node. There's a good chance I'm missing something fundamental there, and I thank you in advance for your patience explaining it to me. But if I can I'd really like to avoid repeating the `CheapestSoFar` comparison on edge weights in two places. mattdr: Every edge touched by this loop will //also// be touched by the other leg of this loop at one…
		sconstabAuthorUnsubmitted Done Reply Inline Actions Yes... in hindsight this whole algorithm was more than a bit sloppy on my part. I completely revamped the algorithm and I think that now it is O(N + E). Please check and make sure you agree. sconstab: Yes... in hindsight this whole algorithm was more than a bit sloppy on my part. I completely…
		mattdrUnsubmitted Not Done Reply Inline Actions Each iteration loops over all edges and removes exactly one... so this is probably still `O(E^2)`. Seems like we can get it down closer to `O(E * lg E)` if we: Compute `GadgetSources` and `GadgetSinks` once Put the edges into a vector, then sorting them by weight Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated) The insight here is that edge weights don't change, so mitigating an edge doesn't change the ranking of other edges. That said, I'm happy with the readability of the current implementation and would be satisfied for now if we just add `//FIXME: this is O(E^2), it could probably be O(E * lg E) with some work` mattdr: Each iteration loops over all edges and removes exactly one... so this is probably still `O…
		sconstabAuthorUnsubmitted Done Reply Inline Actions I think your description actually implies that this greedy heuristic cannot be less that `O(E^2)`, right? Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated) The "if it is still relevant" is an `O(E)` operation, so the whole thing should be `O(E^2)`. Regardless, I did add a comment to this effect. sconstab: I think your description actually implies that this greedy heuristic cannot be less that `O…

		return FencesInserted;
		}

		int X86LoadValueInjectionLoadHardeningPass::hardenLoadsWithGreedyHeuristic(
		MachineFunction &MF, std::unique_ptr<MachineGadgetGraph> Graph) const {
		LLVM_DEBUG(dbgs() << "Eliminating mitigated paths...\n");
		mattdrUnsubmitted Done Reply Inline Actions What guarantee do we have here that all of the gadget sinks have, in fact, been added to `GadgetSinks`? It seems to be up to the order in which we iterate through nodes and edges. mattdr: What guarantee do we have here that all of the gadget sinks have, in fact, been added to…
		sconstabAuthorUnsubmitted Done Reply Inline Actions Yikes! Can't believe I overlooked this. I fixed the issue, and it looks like the algorithm had been cutting a few more edges in some cases than necessary. sconstab: Yikes! Can't believe I overlooked this. I fixed the issue, and it looks like the algorithm had…
		Graph = trimMitigatedEdges(std::move(Graph));
		mattdrUnsubmitted Done Reply Inline Actions `NI` also seems like it comes from a previous version mattdr: `NI` also seems like it comes from a previous version
		LLVM_DEBUG(dbgs() << "Eliminating mitigated paths... Done\n");
		if (Graph->NumGadgets == 0)
		return 0;

		LLVM_DEBUG(dbgs() << "Cutting edges...\n");
		MachineGadgetGraph::NodeSet ElimNodes{Graph}, GadgetSinks{Graph};
		MachineGadgetGraph::EdgeSet ElimEdges{Graph}, CutEdges{Graph};
		auto IsCFGEdge = [&ElimEdges, &CutEdges](const MachineGadgetGraph::Edge &E) {
		return !ElimEdges.contains(E) && !CutEdges.contains(E) &&
		MachineGadgetGraph::isCFGEdge(E);
		};
		auto IsGadgetEdge = [&ElimEdges,
		&CutEdges](const MachineGadgetGraph::Edge &E) {
		return !ElimEdges.contains(E) && !CutEdges.contains(E) &&
		MachineGadgetGraph::isGadgetEdge(E);
		};

		mattdrUnsubmitted Done Reply Inline Actions A comment for what this lambda does, and how it's intended to be used, is much appreciated for the top-to-bottom reader Probably something like: When we add an `LFENCE` before an instruction, remove any CFG edges that point to that instruction because they all now refer to a mitigated codepath mattdr: A comment for what this lambda does, and how it's intended to be used, is much appreciated for…
		sconstabAuthorUnsubmitted Done Reply Inline Actions I simplified this by inlining lambda with a descriptive comment. sconstab: I simplified this by inlining lambda with a descriptive comment.
		// FIXME: this is O(E^2), we could probably do better.
		do {
		// Find the cheapest CFG edge that will eliminate a gadget (by being
		// egress from a SOURCE node or ingress to a SINK node), and cut it.
		sconstabAuthorUnsubmitted Not Done Reply Inline Actions More places where `NodeRef` and `EdgeRef` might be clearer. sconstab: More places where `NodeRef` and `EdgeRef` might be clearer.
		const MachineGadgetGraph::Edge *CheapestSoFar = nullptr;

		// First, collect all gadget source and sink nodes.
		MachineGadgetGraph::NodeSet GadgetSources{Graph}, GadgetSinks{Graph};
		for (const auto &N : Graph->nodes()) {
		if (ElimNodes.contains(N))
		continue;
		for (const auto &E : N.edges()) {
		if (IsGadgetEdge(E)) {
		mattdrUnsubmitted Done Reply Inline Actions `// Insert an LFENCE in this basic block` mattdr: `// Insert an LFENCE in this basic block`
		GadgetSources.insert(N);
		mattdrUnsubmitted Done Reply Inline Actions `// ... at this point in the basic block` mattdr: `// ... at this point in the basic block`
		GadgetSinks.insert(*E.getDest());
		}
		}
		}

		// Next, look for the cheapest CFG edge which, when cut, is guaranteed to
		// mitigate at least one gadget by either:
		// (a) being egress from a gadget source, or
		// (b) being ingress to a gadget sink.
		for (const auto &N : Graph->nodes()) {
		if (ElimNodes.contains(N))
		continue;
		for (const auto &E : N.edges()) {
		if (IsCFGEdge(E)) {
		if (GadgetSources.contains(N) \|\| GadgetSinks.contains(*E.getDest())) {
		if (!CheapestSoFar \|\| E.getValue() < CheapestSoFar->getValue())
		CheapestSoFar = &E;
		}
		}
		mattdrUnsubmitted Done Reply Inline Actions This one took me a bit. It seems like the summary is: add a fence unless it would be redundant, i.e. if we can see the next instruction and see it is itself a fence mattdr: This one took me a bit. It seems like the summary is: add a fence unless it would be redundant…
		sconstabAuthorUnsubmitted Done Reply Inline Actions Right. I added a clarifying comment. sconstab: Right. I added a clarifying comment.
		}
		}

		assert(CheapestSoFar && "Failed to cut an edge");
		CutEdges.insert(*CheapestSoFar);
		ElimEdges.insert(*CheapestSoFar);
		} while (elimMitigatedEdgesAndNodes(*Graph, ElimEdges, ElimNodes));
		LLVM_DEBUG(dbgs() << "Cutting edges... Done\n");
		LLVM_DEBUG(dbgs() << "Cut " << CutEdges.count() << " edges\n");

		mattdrUnsubmitted Done Reply Inline Actions You can skip the double-lookup and just do: if (MachineGadgetGraph::isCFGEdge(E)) { if (CutEdges.insert(E)) { // returns true if not already present ++AdditionalEdgesCut; } } mattdr: You can skip the double-lookup and just do: ``` if (MachineGadgetGraph::isCFGEdge(E)) { if…
		LLVM_DEBUG(dbgs() << "Inserting LFENCEs...\n");
		int FencesInserted = insertFences(MF, *Graph, CutEdges);
		LLVM_DEBUG(dbgs() << "Inserting LFENCEs... Done\n");
		LLVM_DEBUG(dbgs() << "Inserted " << FencesInserted << " fences\n");

		return FencesInserted;
		}

		int X86LoadValueInjectionLoadHardeningPass::insertFences(
		MachineFunction &MF, MachineGadgetGraph &G,
		EdgeSet &CutEdges /* in, out */) const {
		int FencesInserted = 0;
		for (const auto &N : G.nodes()) {
		for (const auto &E : N.edges()) {
		if (CutEdges.contains(E)) {
		MachineInstr MI = N.getValue(), Prev;
		MachineBasicBlock *MBB; // Insert an LFENCE in this MBB
		MachineBasicBlock::iterator InsertionPt; // ...at this point
		if (MI == MachineGadgetGraph::ArgNodeSentinel) {
		// insert LFENCE at beginning of entry block
		MBB = &MF.front();
		InsertionPt = MBB->begin();
		Prev = nullptr;
		} else if (MI->isBranch()) { // insert the LFENCE before the branch
		MBB = MI->getParent();
		InsertionPt = MI;
		Prev = MI->getPrevNode();
		// Remove all egress CFG edges from this branch because the inserted
		// LFENCE prevents gadgets from crossing the branch.
		for (const auto &E : N.edges()) {
		if (MachineGadgetGraph::isCFGEdge(E))
		CutEdges.insert(E);
		}
		} else { // insert the LFENCE after the instruction
		MBB = MI->getParent();
		InsertionPt = MI->getNextNode() ? MI->getNextNode() : MBB->end();
		Prev = InsertionPt == MBB->end()
		? (MBB->empty() ? nullptr : &MBB->back())
		: InsertionPt->getPrevNode();
		}
		// Ensure this insertion is not redundant (two LFENCEs in sequence).
		if ((InsertionPt == MBB->end() \|\| !isFence(&*InsertionPt)) &&
		(!Prev \|\| !isFence(Prev))) {
		BuildMI(*MBB, InsertionPt, DebugLoc(), TII->get(X86::LFENCE));
		++FencesInserted;
		}
		}
		}
		}
		return FencesInserted;
		}

bool X86LoadValueInjectionLoadHardeningPass::instrUsesRegToAccessMemory(		bool X86LoadValueInjectionLoadHardeningPass::instrUsesRegToAccessMemory(
const MachineInstr &MI, unsigned Reg) const {		const MachineInstr &MI, unsigned Reg) const {
if (!MI.mayLoadOrStore() \|\| MI.getOpcode() == X86::MFENCE \|\|		if (!MI.mayLoadOrStore() \|\| MI.getOpcode() == X86::MFENCE \|\|
MI.getOpcode() == X86::SFENCE \|\| MI.getOpcode() == X86::LFENCE)		MI.getOpcode() == X86::SFENCE \|\| MI.getOpcode() == X86::LFENCE)
return false;		return false;

// FIXME: This does not handle pseudo loading instruction like TCRETURN*		// FIXME: This does not handle pseudo loading instruction like TCRETURN*
const MCInstrDesc &Desc = MI.getDesc();		const MCInstrDesc &Desc = MI.getDesc();
Show All 40 Lines

llvm/test/CodeGen/X86/lvi-hardening-loads.ll

This file was added.

				; RUN: llc -verify-machineinstrs -mtriple=x86_64-unknown < %s \| FileCheck %s --check-prefix=X64 --check-prefix=X64-ALL
				; RUN: llc -verify-machineinstrs -mtriple=x86_64-unknown --x86-lvi-load-no-cbranch < %s \| FileCheck %s --check-prefix=X64

				; Function Attrs: noinline nounwind optnone uwtable
				define dso_local i32 @test(i32** %secret, i32 %secret_size) #0 {
				; X64-LABEL: test:
				entry:
				%secret.addr = alloca i32**, align 8
				%secret_size.addr = alloca i32, align 4
				%ret_val = alloca i32, align 4
				%i = alloca i32, align 4
				store i32 %secret, i32* %secret.addr, align 8
				store i32 %secret_size, i32* %secret_size.addr, align 4
				store i32 0, i32* %ret_val, align 4
				call void @llvm.x86.sse2.lfence()
				store i32 0, i32* %i, align 4
				br label %for.cond

				; X64: # %bb.0: # %entry
				; X64-NEXT: movq %rdi, -{{[0-9]+}}(%rsp)
				; X64-NEXT: movl %esi, -{{[0-9]+}}(%rsp)
				; X64-NEXT: movl $0, -{{[0-9]+}}(%rsp)
				; X64-NEXT: lfence
				; X64-NEXT: movl $0, -{{[0-9]+}}(%rsp)
				; X64-NEXT: jmp .LBB0_1

				for.cond: ; preds = %for.inc, %entry
				%0 = load i32, i32* %i, align 4
				%1 = load i32, i32* %secret_size.addr, align 4
				%cmp = icmp slt i32 %0, %1
				br i1 %cmp, label %for.body, label %for.end

				; X64: .LBB0_1: # %for.cond
				; X64-NEXT: # =>This Inner Loop Header: Depth=1
				; X64-NEXT: movl -{{[0-9]+}}(%rsp), %eax
				; X64-ALL-NEXT: lfence
				; X64-NEXT: cmpl -{{[0-9]+}}(%rsp), %eax
				; X64-ALL-NEXT: lfence
				; X64-NEXT: jge .LBB0_5

				for.body: ; preds = %for.cond
				%2 = load i32, i32* %i, align 4
				%rem = srem i32 %2, 2
				%cmp1 = icmp eq i32 %rem, 0
				br i1 %cmp1, label %if.then, label %if.end

				; X64: # %bb.2: # %for.body
				; X64-NEXT: # in Loop: Header=BB0_1 Depth=1
				; X64-NEXT: movl -{{[0-9]+}}(%rsp), %eax
				; X64-ALL-NEXT: lfence
				; X64-NEXT: movl %eax, %ecx
				; X64-NEXT: shrl $31, %ecx
				; X64-NEXT: addl %eax, %ecx
				; X64-NEXT: andl $-2, %ecx
				; X64-NEXT: cmpl %ecx, %eax
				; X64-NEXT: jne .LBB0_4

				if.then: ; preds = %for.body
				%3 = load i32, i32* %secret.addr, align 8
				%4 = load i32, i32* %ret_val, align 4
				%idxprom = sext i32 %4 to i64
				%arrayidx = getelementptr inbounds i32, i32* %3, i64 %idxprom
				%5 = load i32, i32* %arrayidx, align 8
				%6 = load i32, i32* %5, align 4
				store i32 %6, i32* %ret_val, align 4
				br label %if.end

				; X64: # %bb.3: # %if.then
				; X64-NEXT: # in Loop: Header=BB0_1 Depth=1
				; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rax
				; X64-NEXT: lfence
				; X64-NEXT: movslq -{{[0-9]+}}(%rsp), %rcx
				; X64-NEXT: lfence
				; X64-NEXT: movq (%rax,%rcx,8), %rax
				; X64-NEXT: lfence
				; X64-NEXT: movl (%rax), %eax
				; X64-NEXT: movl %eax, -{{[0-9]+}}(%rsp)
				; X64-NEXT: jmp .LBB0_4

				if.end: ; preds = %if.then, %for.body
				br label %for.inc

				for.inc: ; preds = %if.end
				%7 = load i32, i32* %i, align 4
				%inc = add nsw i32 %7, 1
				store i32 %inc, i32* %i, align 4
				br label %for.cond

				for.end: ; preds = %for.cond
				%8 = load i32, i32* %ret_val, align 4
				ret i32 %8
				}

				; Function Attrs: nounwind
				declare void @llvm.x86.sse2.lfence() #1

				attributes #0 = { "target-features"="+lvi-load-hardening" }
				attributes #1 = { nounwind }