This is an archive of the discontinued LLVM Phabricator instance.

Differential D75937

Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/6]
Closed · Public

Authored by sconstab on Mar 10 2020, 10:00 AM.

Details

Reviewers

andrew.w.kaylor
chandlerc
zbrid
george.burgess.iv
mattdr
craig.topper

Commits

rG8aa8abae349d: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)
rG2530f4e0ce44: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)
rG8ce078c7503d: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)
rG62c42e29ba43: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)

Summary

After finding all such gadgets in a given function, the pass minimally inserts LFENCE instructions in such a manner that the following property is satisfied: for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at least one LFENCE instruction. The algorithm that implements this minimal insertion is influenced by an academic paper that minimally inserts memory fences for high-performance concurrent programs:

http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf

The algorithm implemented in this pass is as follows:

1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components:
   - SOURCE instructions (also includes function arguments)
   - SINK instructions
   - Basic block entry points
   - Basic block terminators
   - LFENCE instructions
2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6.
3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
4. Insert one LFENCE along each CFG edge that was cut in step 3.
5. Go to step 2.
6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified.
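The control flow of steps 2 through 6 can be sketched as a small fixed-point driver. This is only an illustration under stated assumptions, not the code in this patch: `hardenToFixedPoint`, `CountOpenGadgets`, and `CutAndFence` are hypothetical names, and the two callbacks stand in for the GadgetGraph analysis and for the heuristic or plugin.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the pass's real analyses (names are assumptions):
//  - CountOpenGadgets: step 2, the number of SOURCE+SINK pairs not yet mitigated.
//  - CutAndFence: steps 3 and 4, chooses edges to cut, inserts one LFENCE per
//    cut edge, and returns how many fences it inserted.
using CountOpenGadgets = std::function<int()>;
using CutAndFence = std::function<int()>;

// Repeats steps 2-5 until no unmitigated gadget remains, then returns the
// total number of fences inserted. A caller would report "function modified"
// exactly when the result is greater than zero (step 6).
int hardenToFixedPoint(const CountOpenGadgets &countOpen,
                       const CutAndFence &cutAndFence) {
  int fencesInserted = 0;
  while (countOpen() > 0) {     // step 2: is anything left to mitigate?
    int added = cutAndFence();  // steps 3 and 4
    assert(added > 0 && "each round must make progress");
    fencesInserted += added;
  }                             // step 5: loop back to step 2
  return fencesInserted;
}
```

Passing the analysis and the cut strategy as callbacks mirrors the "heuristic or plugin" choice in step 3: the driver does not need to know which one is in use.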

By default, the heuristic used in Step 3 is a greedy heuristic that avoids inserting LFENCEs into loops unless absolutely necessary. There is also a CLI option to load a plugin that can provide even better optimization, inserting fewer fences, while still mitigating all of the LVI gadgets. The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin, and a description of the pass's behavior with the plugin can be found here: https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection.

Event Timeline

sconstab created this revision. Mar 10 2020, 10:00 AM

Herald added a project: Restricted Project. Mar 10 2020, 10:00 AM

Herald added subscribers: jfb, hiraditya.

sconstab added a reviewer: george.burgess.iv. Mar 11 2020, 9:27 AM

sconstab added a parent revision: D75932: Move RDF from Hexagon to Codegen [1/6]. Mar 11 2020, 12:32 PM

sconstab added a parent revision: D75936: Add a Pass to X86 that builds a Condensed CFG for Load Value Injection (LVI) Gadgets [4/6]. Mar 11 2020, 12:34 PM

sconstab retitled this revision from Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) to Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/5].

sconstab edited the summary of this revision. Mar 11 2020, 1:58 PM

sconstab mentioned this in D75939: [x86][seses] Introduce SESES pass for LVI. Mar 12 2020, 8:32 AM

sconstab added a child revision: D76158: Add inline assembly load hardening mitigation for Load Value Injection (LVI) on X86 [6/6]. Mar 13 2020, 1:50 PM

sconstab retitled this revision from Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/5] to Add Support to X86 for Load Hardening to Mitigate Load Value Injection (LVI) [5/6]. Mar 16 2020, 9:30 AM

Sorry only took a quick look for now.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
288	Should this also have FencesInserted = hardenLoads?

One fix to correctly count the number of fences inserted.

sconstab marked an inline comment as done. Mar 18 2020, 7:13 PM

LGTM

This revision is now accepted and ready to land. Apr 2 2020, 5:09 PM

Closed by commit rG62c42e29ba43: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI) (authored by sconstab, committed by craig.topper). Apr 3 2020, 2:06 PM

This revision was automatically updated to reflect the committed changes.

mattdr added a subscriber: mattdr. Apr 3 2020, 5:03 PM

mattdr added inline comments.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
626–629	Why aren't these `vector`s?

craig.topper reopened this revision. Apr 3 2020, 8:51 PM

This revision is now accepted and ready to land. Apr 3 2020, 8:51 PM

Rebase

This revision is now accepted and ready to land. Apr 3 2020, 8:53 PM

craig.topper planned changes to this revision. Apr 3 2020, 9:42 PM

sconstab added inline comments. Apr 3 2020, 9:50 PM

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
626–629	They're being passed to a C-compatible interface, so intuitively it made more sense to me. I also get paranoid because the array sizes will be copied into the std::vector struct, and from experience it seems I can never be 100% certain these copies will be optimized away.

sconstab added inline comments. Apr 3 2020, 9:55 PM

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
626–629	@mattdr (above)

Update for changes in D75936. Use std::vector for some arrays.

This revision is now accepted and ready to land. Apr 4 2020, 12:00 AM

-Use range-based for loops
-Remove uses of the Traits class since it doesn't support methods needed for range-based for loops.

sconstab added inline comments. Apr 4 2020, 1:24 PM

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
536	Would `EdgeRef E : Graph->edges()` be clearer here? Ditto for many other `for` loops. (not sure if the LLVM conventions dictate something specific for these loops)
541	This caught me off guard the first time I saw it because I didn't realize that LLVM had its own implementation that takes a range. For readability, would it be better to prefix with `llvm::`?
567	Should this be `MachineGadgetGraph::NodeRef`?
620	I think that `std::unique_ptr<T[]>` should work fine here...
633	I think this `for` can lose the brackets.
673	More places where `NodeRef` and `EdgeRef` might be clearer.

craig.topper marked 5 inline comments as done. Apr 4 2020, 2:14 PM

craig.topper added inline comments.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
536	Personally, I'd prefer not to hide the & or *. And EdgeRef/NodeRef only exist in the Traits class, not the main class. It's also confusing that NodeRef is a pointer and not a reference, so I'd like to not leak that outside the Traits class, where it's easier to see. LLVM is pretty conservative about the use of auto, but I figured in this case it wouldn't be unreasonable for a reader to understand that edges() returns Edge objects. But I'm happy to change it to MachineGadgetGraph::Edge.
541	Agreed. I'll change.
620	Agreed, but std::vector is far more common in the codebase and since it lives on the stack the extra capacity pointer shouldn't be a big deal.
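The trade-off discussed for lines 620 and 626–629 (owning the arrays in a `std::vector` while still handing raw pointers to a C-compatible interface) can be sketched as follows. The function-pointer shape follows the `OptimizeCutT` typedef in this patch, but everything else is an assumption made for illustration: `CutFnT`, `cutCheapestEdge`, and `runCut` are hypothetical names, the meaning of the return value is invented, and the real plugin's array layout is not specified here.

```cpp
#include <cassert>
#include <vector>

// Same parameter shape as the OptimizeCutT typedef: raw arrays plus sizes.
typedef int (*CutFnT)(unsigned int *nodes, unsigned int nodes_size,
                      unsigned int *edges, int *edge_values,
                      int *cut_edges /* out */, unsigned int edges_size);

// Hypothetical callee: marks the cheapest non-gadget edge as cut and returns
// the number of edges it marked. A value of -1 is treated as a gadget edge,
// matching the GADGET_EDGE macro in the patch.
int cutCheapestEdge(unsigned int *nodes, unsigned int nodes_size,
                    unsigned int *edges, int *edge_values,
                    int *cut_edges /* out */, unsigned int edges_size) {
  (void)nodes; (void)nodes_size; (void)edges;  // unused in this toy callee
  int best = -1;
  for (unsigned int i = 0; i < edges_size; ++i) {
    cut_edges[i] = 0;
    if (edge_values[i] < 0)
      continue;  // skip gadget edges
    if (best < 0 || edge_values[i] < edge_values[best])
      best = static_cast<int>(i);
  }
  if (best < 0)
    return 0;
  cut_edges[best] = 1;
  return 1;
}

// Caller side: std::vector owns the storage and .data() exposes it to the
// C-style function, so no manual new[]/delete[] is needed.
std::vector<int> runCut(CutFnT fn, std::vector<unsigned int> nodes,
                        std::vector<unsigned int> edges,
                        std::vector<int> edgeValues) {
  assert(edges.size() == edgeValues.size());
  std::vector<int> cutEdges(edges.size(), 0);
  fn(nodes.data(), static_cast<unsigned int>(nodes.size()), edges.data(),
     edgeValues.data(), cutEdges.data(),
     static_cast<unsigned int>(edges.size()));
  return cutEdges;
}
```

For example, `runCut(cutCheapestEdge, {0, 1, 2}, {1, 2, 2}, {3, 0, -1})` marks only the middle edge, because it has the lowest non-negative value.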

craig.topper added inline comments. Apr 4 2020, 2:14 PM

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
623	I'm thinking about adding a method to Graph to get a node/edge's index so we can hide all these std::distance calls.
633	Will do.

-Add llvm:: to the for_each calls

Make use of getNodeIndex/getEdgeIndex

craig.topper requested review of this revision. Apr 6 2020, 10:18 AM

Rebase on D75937

-Rebase on D75937

craig.topper marked an inline comment as done. Apr 8 2020, 2:02 PM

craig.topper added inline comments.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
447	doh this should be in the other patch.

Rebase again

craig.topper removed a parent revision: D75932: Move RDF from Hexagon to Codegen [1/6]. Apr 10 2020, 5:29 PM

craig.topper removed a child revision: D76158: Add inline assembly load hardening mitigation for Load Value Injection (LVI) on X86 [6/6].

Rebase on top of D75937

Reviewed the algorithm, have to confess I skipped the tests.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
353	I know there was a lot of effort to make `ImmutableGraph` efficient, but this is an `O(N)` reallocation on every loop, right? In practice, how many times do we end up looping before we hit a fixed point?
530	Could be more descriptive to call this, say, `trimMitigatedEdges`
538	In what circumstance will we add fences to the graph as sources or sinks? Can we just avoid that at the point of insertion, rather than running this extra culling pass on every iteration?
556	I would write something stronger here: Start off assuming all gadgets originating at this source node have been mitigated and can be removed. Later we will add back unmitigated gadgets by erasing them from the removal set.
566	This is really, really hard to read and understand. I think that's in large part because we have this one graph that represents _both_ control-flow _and_ source/sink pairs. Given that it's the load-bearing part of the whole stack, let me offer the best way I've come up with to explain it, then a suggestion for making it simpler to follow.
My summary based on a few readings:
- First we compute `GadgetSinks`, the set of gadget sinks whose source is the current root.
- Then we do a depth-first search of the control-flow graph to find all gadget sinks that are control-flow-reachable from the given root.
- When we find such a sink, we look to see if it is also in `GadgetSinks` -- again, a sink whose source is the current root -- at which point we know we have found an unmitigated gadget sink.
- We iterate through the root's gadget edges to find the specific edge that points to the current DFS node -- the unmitigated gadget sink -- and remove that edge from `ElimGadgets`, where we had previously added it on the presumption it was mitigated.
One idea for making this easier to follow:
- Start off assuming no edges are mitigated. Don't pre-populate the `EdgeSet` with every edge.
- Use depth-first search to create `ReachableSinks`, the set of all gadget sinks (regardless of source) that are control-flow-reachable from `RootN`. (I don't think this adds more work than `TraverseDFS` already does.)
- Iterate over gadget edges from `RootN`. For each edge, ask: is the destination reachable? If so, add this to a new set, `GadgetEdges`.
- After iterating over all nodes, let `TrimEdges` be `AllEdges - GadgetEdges`. Then trim those.
568	We already capture `RootN` later on. We can remove the boolean conditional and make this more readable by writing `if (*N != RootN)` (maybe `if (N != &RootN)`? I've sort of lost track of the API for nodes and edges.) Pushing further: maybe we can just remove the tiny optimization we get by special-casing the root node in the interest of simplicity?
592	It's weird to have a negated predicate like this in a conditional with an `else`. Let's either rewrite it or un-negate it and flip the `if`/`else` blocks. Rewrite: `!ElimEdges.empty() || !ElimNodes.empty()`
603	likewise `findEdgesToCut` or similar
606	I'm still not convinced it's worth the extra (untestable) complexity to add this plugin point, but I defer to the committer.
608	Extra `{}` should be unnecessary
642	Seems like `NI` is a vestige of a previous iteration
644	Every edge touched by this loop will also be touched by the other leg of this loop at one point or another ("E is a CFG edge"). Can we find a way to avoid the `O(E^2)` inner loop? For example: what if we made the outermost loop edge-major? Keep sets of mitigated source-nodes and sink-nodes. Create a list of all edges by weight, sort it, then go through the edges from lowest to highest weight and use them to try to mitigate a new source or sink node. There's a good chance I'm missing something fundamental there, and I thank you in advance for your patience explaining it to me. But if I can I'd really like to avoid repeating the `CheapestSoFar` comparison on edge weights in two places.
651	What guarantee do we have here that all of the gadget sinks have, in fact, been added to `GadgetSinks`? It seems to be up to the order in which we iterate through nodes and edges.
652	`NI` also seems like it comes from a previous version
669	A comment for what this lambda does, and how it's intended to be used, is much appreciated for the top-to-bottom reader. Probably something like: "When we add an `LFENCE` before an instruction, remove any CFG edges that point to that instruction because they all now refer to a mitigated codepath."
682	`// Insert an LFENCE in this basic block`
683	`// ... at this point in the basic block`
701–702	This one took me a bit. It seems like the summary is: add a fence unless it would be redundant, i.e. if we can see the next instruction and see it is itself a fence
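The redundancy check summarized for lines 701–702 can be illustrated on a toy basic block. This is a sketch under assumptions: `ToyBlock` and `insertFenceUnlessRedundant` are hypothetical names, and opcode strings stand in for the `MachineInstr` objects that the pass actually manipulates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a basic block: a sequence of opcode names.
using ToyBlock = std::vector<std::string>;

// Inserts "LFENCE" before position `pos` (pos == block.size() means "at the
// end") unless the instruction already at `pos` is itself an LFENCE.
// Returns true if a fence was actually inserted.
bool insertFenceUnlessRedundant(ToyBlock &block, std::size_t pos) {
  assert(pos <= block.size());
  if (pos < block.size() && block[pos] == "LFENCE")
    return false;  // the next instruction is already a fence
  block.insert(block.begin() + static_cast<std::ptrdiff_t>(pos),
               std::string("LFENCE"));
  return true;
}
```

Returning a flag lets the caller count only the fences that were really added, which is what a statistic such as `NumFences` would need.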

This revision now requires changes to proceed. Apr 15 2020, 1:23 AM

sconstab commandeered this revision. Apr 16 2020, 12:10 PM

sconstab edited reviewers, added: craig.topper; removed: sconstab.

annita.zhang added a subscriber: annita.zhang. Apr 20 2020, 7:13 PM

Addressed feedback from @mattdr

@mattdr Thanks for the very scrupulous review!

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
353	The answers to your questions are "yes" and "until all of the gadgets have been mitigated". The intent here is to optimize for the MILP plugin: https://github.com/intel/lvi-llvm-optimization-plugin. The plugin eliminates lots of gadgets all at once, and therefore rebuilding the graph is a relatively rare event. I'm not sure it would be worthwhile to use an entirely different data structure and algorithm for the greedy mitigation strategy. What I have done is rewrite the Greedy heuristic to keep track of which edges have been cut and eliminated, thus never actually having to rebuild the graph.
530	Yes, done.
538	Fences are never added as sources or sinks. We need to know where the fences are so that we can eliminate gadget edges for which all CFG paths from source to sink cross a fence. We could perform this analysis at the point of insertion, i.e., during the `getGadgetGraph()` function, BUT this analysis would be much more expensive there because it would be performed on the actual `MachineFunction`, instead of on the gadget graph.
556	Well, I substantially rewrote this algorithm to build a set of mitigated edges, instead of trimming down a set of mitigated edges. I think that the new set of comments should be much easier to follow.
566	You of course are correct that this was terribly difficult to follow. I actually found an even simpler solution than what you had suggested. For each `RootN` that is the source of at least one gadget, I DFS to find all of the `Visited` nodes, and then check to see which node members of `Visited` are also the destinations of a gadget edge rooted at `RootN`.
592	Right. By the way, I decomposed `elimEdges()` into two functions: `elimEdges()` just runs the edge elimination algorithm without rebuilding the gadget graph. `trimMitigatedEdges()` wraps `elimEdges()` and does rebuild (i.e., "trim") the gadget graph.
644	Yes... in hindsight this whole algorithm was more than a bit sloppy on my part. I completely revamped the algorithm and I think that now it is O(N + E). Please check and make sure you agree.
651	Yikes! Can't believe I overlooked this. I fixed the issue, and it looks like the algorithm had been cutting a few more edges in some cases than necessary.
669	I simplified this by inlining the lambda with a descriptive comment.
701–702	Right. I added a clarifying comment.
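The reachability check described in the reply for line 566 (DFS from a gadget source, then test whether each gadget sink was visited) can be sketched on a toy graph. This is not the pass's `ImmutableGraph` code: `ToyGadgetGraph`, `reachableWithoutFence`, and `unmitigatedGadgets` are hypothetical names, and the sketch keeps CFG edges and gadget pairs in separate containers for readability.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Toy graph (an assumption for illustration): nodes are small integers.
struct ToyGadgetGraph {
  std::vector<std::vector<int>> cfgSucc;     // cfgSucc[n] = CFG successors of n
  std::vector<bool> isFence;                 // isFence[n] = n is an LFENCE
  std::vector<std::pair<int, int>> gadgets;  // (source, sink) pairs
};

// Iterative DFS over CFG edges from `root`. Edges that enter a fence are not
// followed, so nothing is reported as reachable through a fence.
std::set<int> reachableWithoutFence(const ToyGadgetGraph &g, int root) {
  std::set<int> visited;
  std::vector<int> stack{root};
  while (!stack.empty()) {
    int n = stack.back();
    stack.pop_back();
    if (!visited.insert(n).second)
      continue;  // already explored
    for (int s : g.cfgSucc[n])
      if (!g.isFence[s])
        stack.push_back(s);
  }
  return visited;
}

// A gadget is unmitigated if its sink is reachable from its source without
// crossing a fence; every other gadget is already mitigated.
std::vector<std::pair<int, int>> unmitigatedGadgets(const ToyGadgetGraph &g) {
  std::vector<std::pair<int, int>> open;
  for (const auto &gadget : g.gadgets) {
    std::set<int> visited = reachableWithoutFence(g, gadget.first);
    if (visited.count(gadget.second))
      open.push_back(gadget);
  }
  return open;
}
```

For brevity the sketch repeats the DFS for every gadget. Running it once per distinct source node, as the reply describes, would avoid the repeated traversals.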

Still some comments to address, but I think this is substantially close to where it needs to be. Thanks so much for your work making the code more straightforward and readable!

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
177–178	In the implementation the latter two parameters are called `ElimEdges` and `ElimNodes`, which I think are good names
353	Did you mean to remove this call to `GraphBuilder::trim()`? As it stands right now the loop actually reallocates the graph twice per iteration -- once in `trimMitigatedEdges`, once in this call to `trim()`.
530	This is a lot clearer now, thanks! Given what this is doing I think it makes sense to call it `findMitigatedEdges`.
533	"Eliminate CFG edges that target a fence, as they're trivially mitigated"
537	How are we sure we've removed all edges that pointed to this same destination node?
549	To make it easier to follow, let's call this something like `ReachableGadgetSinkNodes`
551–552	I think this is just intended as an optimization -- it's not necessary for correctness. Assuming that's right, suggest removing it since it doesn't really make things faster but it does add some extra complexity.
555	Maybe `FindCFGReachableGadgetSinksDFS`?
568	Still think it makes sense to add the root node to the reachable set (as it's trivially reachable from itself)
579	Suggest putting this at the top of the loop so it's obvious from the beginning that results aren't intended to accrue from iteration to iteration.
632	I really like that the refinement loop (trying to get to a fixed point) for the greedy algorithm is now entirely within this function. I'd suggest going one step further: take the loop out of `runOnMachineFunction` entirely and add one around the calls to the plugin in the lines above. That way we keep the implementation detail that "this needs to be run in a loop" as close to the actual algorithm as possible.
644	Each iteration loops over all edges and removes exactly one... so this is probably still `O(E^2)`. Seems like we can get it down closer to `O(E * lg E)` if we:
- Compute `GadgetSources` and `GadgetSinks` once
- Put the edges into a vector, then sort them by weight
- Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated)
The insight here is that edge weights don't change, so mitigating an edge doesn't change the ranking of other edges. That said, I'm happy with the readability of the current implementation and would be satisfied for now if we just add `//FIXME: this is O(E^2), it could probably be O(E * lg E) with some work`
711–712	You can skip the double-lookup and just do:
    if (MachineGadgetGraph::isCFGEdge(E)) {
      if (CutEdges.insert(E)) { // returns true if not already present
        ++AdditionalEdgesCut;
      }
    }
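The same "insert and test in one step" idiom exists in the standard library, which may make the suggestion for lines 711–712 easier to picture. In this sketch `std::set<int>` is only a stand-in for the patch's `EdgeSet`, and `addNewCuts` is a hypothetical helper name.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Counts how many of `candidates` were newly added to `cutEdges`, using the
// bool half of std::set::insert's result instead of a separate lookup.
int addNewCuts(std::set<int> &cutEdges, const std::vector<int> &candidates) {
  int additional = 0;
  for (int edge : candidates)
    if (cutEdges.insert(edge).second)  // true only if not already present
      ++additional;
  return additional;
}
```

Starting from the set {1, 2}, adding the candidates 2, 3, 3, 4 counts two new cuts: 2 is already present and the second 3 is a repeat.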

This revision is now accepted and ready to land. Apr 23 2020, 11:06 PM

Addressed comments from @mattdr.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
530	I changed it to `elimMitigatedEdgesAndNodes()`
537	I think it does? This loop iterates through ALL edges. For each CFG edge that ingresses a fence, add that fence to `ElimNodes`, and add that edge and all egress CFG edges to `ElimEdges`. The loop does not skip over edges that have been added to `ElimEdges`, and therefore I think this should ensure that all CFG edges pointing to a given fence are removed.
549	But that isn't accurate. The algorithm finds all reachable nodes, not just gadget sinks. I renamed to `ReachableNodes`.
551–552	Why do you think it doesn't make things faster? A majority of nodes in the graph are not gadget sinks, and this majority tends to grow rapidly as gadgets become mitigated. Each time this check passes, it saves an entire DFS traversal and one `O(E)` loop through the edges.
555	Similar to above, this is not accurate as it finds all reachable nodes, not just gadget sinks. I did rename to `FindReachableNodes` and added detail to the comment.
644	I think your description actually implies that this greedy heuristic cannot be less than `O(E^2)`, right? "Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated)." The "if it is still relevant" is an `O(E)` operation, so the whole thing should be `O(E^2)`. Regardless, I did add a comment to this effect.
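The sorted-edge variant debated for line 644 can be sketched as follows, to make the cost of the "still relevant" test concrete. This is an illustration of the reviewer's suggestion under assumptions, not the heuristic that was committed: `ToyEdge`, `reaches`, and `greedyCut` are hypothetical names, and "relevant" is taken to mean that the edge lies on an uncut path from some gadget source to its sink.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

struct ToyEdge { int from, to, weight; };

// True if `to` can be reached from `from` using only edges not yet cut.
bool reaches(std::size_t numNodes, const std::vector<ToyEdge> &edges,
             const std::vector<bool> &cut, int from, int to) {
  std::vector<bool> seen(numNodes, false);
  std::vector<int> stack{from};
  while (!stack.empty()) {
    int n = stack.back();
    stack.pop_back();
    if (n == to)
      return true;
    if (seen[n])
      continue;
    seen[n] = true;
    for (std::size_t i = 0; i < edges.size(); ++i)
      if (!cut[i] && edges[i].from == n)
        stack.push_back(edges[i].to);
  }
  return false;
}

// Visits edges from cheapest to most expensive and cuts an edge only if it
// still lies on an uncut path from some gadget source to its sink.
std::vector<bool> greedyCut(std::size_t numNodes,
                            const std::vector<ToyEdge> &edges,
                            const std::vector<std::pair<int, int>> &gadgets) {
  std::vector<std::size_t> order(edges.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return edges[a].weight < edges[b].weight;
                   });
  std::vector<bool> cut(edges.size(), false);
  for (std::size_t e : order) {
    for (const auto &g : gadgets) {
      if (reaches(numNodes, edges, cut, g.first, edges[e].from) &&
          reaches(numNodes, edges, cut, edges[e].to, g.second)) {
        cut[e] = true;  // edge e is still on an unmitigated path
        break;
      }
    }
  }
  return cut;
}
```

The sort is cheap, but each relevance test walks the graph, which is the point made in the reply: sorting alone does not bring the total cost down to `O(E * lg E)`.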

Eliminated the NoFixedLoads feature in D75936, which simplified this patch quite a bit.

mattdr marked 2 inline comments as done. Apr 27 2020, 3:10 PM

mattdr added inline comments.

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp
537	Sure enough, that makes sense.
551–552	Ack! Yes, it seems I consistently misread this function and missed the difference between `isGadgetEdge` in some places and `IsCFGEdge` in others. (As I mentioned, having both represented in the same graph is confusing.) Now I see this is a different check than is done in the DFS.

Rebase onto master.

mattdr accepted this revision. May 8 2020, 12:01 PM

This revision is now accepted and ready to land. May 8 2020, 12:01 PM

Closed by commit rG8ce078c7503d: [X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI) (authored by sconstab, committed by craig.topper). May 11 2020, 1:30 PM

This revision was automatically updated to reflect the committed changes.

craig.topper mentioned this in D79813: [Statepoint] Mark FixupStatepointCallerSaved as preserving the CFG. May 12 2020, 3:19 PM

craig.topper mentioned this in rGde92dc2850c1: [Statepoint] Mark FixupStatepointCallerSaved as preserving the CFG. May 13 2020, 11:25 AM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp	267 lines
llvm/test/CodeGen/X86/lvi-hardening-loads.ll	102 lines

Diff 255085

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp

//==-- X86LoadValueInjectionLoadHardening.cpp - LVI load hardening for x86 --=//		//==-- X86LoadValueInjectionLoadHardening.cpp - LVI load hardening for x86 --=//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
///		///
/// Description: This pass finds Load Value Injection (LVI) gadgets consisting		/// Description: This pass finds Load Value Injection (LVI) gadgets consisting
/// of a load from memory (i.e., SOURCE), and any operation that may transmit		/// of a load from memory (i.e., SOURCE), and any operation that may transmit
/// the value loaded from memory over a covert channel, or use the value loaded		/// the value loaded from memory over a covert channel, or use the value loaded
/// from memory to determine a branch/call target (i.e., SINK).		/// from memory to determine a branch/call target (i.e., SINK). After finding
		/// all such gadgets in a given function, the pass minimally inserts LFENCE
		/// instructions in such a manner that the following property is satisfied: for
		/// all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at
		/// least one LFENCE instruction. The algorithm that implements this minimal
		/// insertion is influenced by an academic paper that minimally inserts memory
		/// fences for high-performance concurrent programs:
		/// http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf
		/// The algorithm implemented in this pass is as follows:
		/// 1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the
		/// following components:
		/// - SOURCE instructions (also includes function arguments)
		/// - SINK instructions
		/// - Basic block entry points
		/// - Basic block terminators
		/// - LFENCE instructions
		/// 2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e.,
		/// gadgets) are already mitigated by existing LFENCEs. If all gadgets have been
		/// mitigated, go to step 6.
		/// 3. Use a heuristic or plugin to approximate minimal LFENCE insertion.
		/// 4. Insert one LFENCE along each CFG edge that was cut in step 3.
		/// 5. Go to step 2.
		/// 6. If any LFENCEs were inserted, return `true` from runOnFunction() to tell
		/// LLVM that the function was modified.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "ImmutableGraph.h"		#include "ImmutableGraph.h"
#include "X86.h"		#include "X86.h"
#include "X86Subtarget.h"		#include "X86Subtarget.h"
#include "X86TargetMachine.h"		#include "X86TargetMachine.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
Show All 11 Lines
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/RDFGraph.h"		#include "llvm/CodeGen/RDFGraph.h"
#include "llvm/CodeGen/RDFLiveness.h"		#include "llvm/CodeGen/RDFLiveness.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/DOTGraphTraits.h"		#include "llvm/Support/DOTGraphTraits.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
		#include "llvm/Support/DynamicLibrary.h"
#include "llvm/Support/GraphWriter.h"		#include "llvm/Support/GraphWriter.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace llvm;		using namespace llvm;

#define PASS_KEY "x86-lvi-load"		#define PASS_KEY "x86-lvi-load"
#define DEBUG_TYPE PASS_KEY		#define DEBUG_TYPE PASS_KEY

		STATISTIC(NumFences, "Number of LFENCEs inserted for LVI mitigation");
STATISTIC(NumFunctionsConsidered, "Number of functions analyzed");		STATISTIC(NumFunctionsConsidered, "Number of functions analyzed");
STATISTIC(NumFunctionsMitigated, "Number of functions for which mitigations "		STATISTIC(NumFunctionsMitigated, "Number of functions for which mitigations "
"were deployed");		"were deployed");
STATISTIC(NumGadgets, "Number of LVI gadgets detected during analysis");		STATISTIC(NumGadgets, "Number of LVI gadgets detected during analysis");

		static cl::opt<std::string> OptimizePluginPath(
		PASS_KEY "-opt-plugin",
		cl::desc("Specify a plugin to optimize LFENCE insertion"), cl::Hidden);

static cl::opt<bool> NoConditionalBranches(		static cl::opt<bool> NoConditionalBranches(
PASS_KEY "-no-cbranch",		PASS_KEY "-no-cbranch",
cl::desc("Don't treat conditional branches as disclosure gadgets. This "		cl::desc("Don't treat conditional branches as disclosure gadgets. This "
"may improve performance, at the cost of security."),		"may improve performance, at the cost of security."),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

static cl::opt<bool> EmitDot(		static cl::opt<bool> EmitDot(
PASS_KEY "-dot",		PASS_KEY "-dot",
Show All 14 Lines	static cl::opt<bool> EmitDotVerify(
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

static cl::opt<bool> NoFixedLoads(		static cl::opt<bool> NoFixedLoads(
PASS_KEY "-no-fixed",		PASS_KEY "-no-fixed",
cl::desc("Don't mitigate RIP-relative or RSP-relative loads. This "		cl::desc("Don't mitigate RIP-relative or RSP-relative loads. This "
"may improve performance, at the cost of security."),		"may improve performance, at the cost of security."),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

		static llvm::sys::DynamicLibrary OptimizeDL{};
		typedef int (*OptimizeCutT)(unsigned int *nodes, unsigned int nodes_size,
		unsigned int *edges, int *edge_values,
		int *cut_edges /* out */, unsigned int edges_size);
		static OptimizeCutT OptimizeCut = nullptr;

#define ARG_NODE nullptr		#define ARG_NODE nullptr
#define GADGET_EDGE ((int)(-1))		#define GADGET_EDGE ((int)(-1))
#define WEIGHT(EdgeValue) ((double)(2 * (EdgeValue) + 1))		#define WEIGHT(EdgeValue) ((double)(2 * (EdgeValue) + 1))

namespace {		namespace {

class X86LoadValueInjectionLoadHardeningPass : public MachineFunctionPass {		class X86LoadValueInjectionLoadHardeningPass : public MachineFunctionPass {
public:		public:
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	private:
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;

int hardenLoads(MachineFunction &MF, bool Fixed) const;		int hardenLoads(MachineFunction &MF, bool Fixed) const;
std::unique_ptr<MachineGadgetGraph>		std::unique_ptr<MachineGadgetGraph>
getGadgetGraph(MachineFunction &MF, const MachineLoopInfo &MLI,		getGadgetGraph(MachineFunction &MF, const MachineLoopInfo &MLI,
const MachineDominatorTree &MDT,		const MachineDominatorTree &MDT,
const MachineDominanceFrontier &MDF, bool FixedLoads) const;		const MachineDominanceFrontier &MDF, bool FixedLoads) const;
		std::unique_ptr<MachineGadgetGraph>
		elimEdges(std::unique_ptr<MachineGadgetGraph> Graph) const;
		void cutEdges(MachineGadgetGraph &G, EdgeSet &CutEdges /* out */) const;
		int insertFences(MachineGadgetGraph &G,
		EdgeSet &CutEdges /* in, out */) const;

bool instrUsesRegToAccessMemory(const MachineInstr &I, unsigned Reg) const;		bool instrUsesRegToAccessMemory(const MachineInstr &I, unsigned Reg) const;
bool instrUsesRegToBranch(const MachineInstr &I, unsigned Reg) const;		bool instrUsesRegToBranch(const MachineInstr &I, unsigned Reg) const;
template <unsigned K> bool hasLoadFrom(const MachineInstr &MI) const;		template <unsigned K> bool hasLoadFrom(const MachineInstr &MI) const;
bool instrAccessesStackSlot(const MachineInstr &MI) const;		bool instrAccessesStackSlot(const MachineInstr &MI) const;
bool instrAccessesConstantPool(const MachineInstr &MI) const;		bool instrAccessesConstantPool(const MachineInstr &MI) const;
bool instrAccessesGOT(const MachineInstr &MI) const;		bool instrAccessesGOT(const MachineInstr &MI) const;
inline bool instrIsFixedAccess(const MachineInstr &MI) const {		inline bool instrIsFixedAccess(const MachineInstr &MI) const {
▲ Show 20 Lines • Show All 86 Lines • ▼ Show 20 Lines	bool X86LoadValueInjectionLoadHardeningPass::runOnMachineFunction(
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();
if (!F.hasOptNone() && skipFunction(F))		if (!F.hasOptNone() && skipFunction(F))
return false;		return false;

++NumFunctionsConsidered;		++NumFunctionsConsidered;
TII = STI->getInstrInfo();		TII = STI->getInstrInfo();
TRI = STI->getRegisterInfo();		TRI = STI->getRegisterInfo();
LLVM_DEBUG(dbgs() << "Hardening data-dependent loads...\n");		LLVM_DEBUG(dbgs() << "Hardening data-dependent loads...\n");
hardenLoads(MF, false);		int FencesInserted = hardenLoads(MF, false);
LLVM_DEBUG(dbgs() << "Hardening data-dependent loads... Done\n");		LLVM_DEBUG(dbgs() << "Hardening data-dependent loads... Done\n");
if (!NoFixedLoads) {		if (!NoFixedLoads) {
LLVM_DEBUG(dbgs() << "Hardening fixed loads...\n");		LLVM_DEBUG(dbgs() << "Hardening fixed loads...\n");
hardenLoads(MF, true);		FencesInserted += hardenLoads(MF, true);
LLVM_DEBUG(dbgs() << "Hardening fixed loads... Done\n");		LLVM_DEBUG(dbgs() << "Hardening fixed loads... Done\n");
}		}
return false;		if (FencesInserted > 0)
		++NumFunctionsMitigated;
		NumFences += FencesInserted;
		return (FencesInserted > 0);
}		}

// Apply the mitigation to `MF`, return the number of fences inserted.		// Apply the mitigation to `MF`, return the number of fences inserted.
// If `FixedLoads` is `true`, then the mitigation will be applied to fixed		// If `FixedLoads` is `true`, then the mitigation will be applied to fixed
// loads; otherwise, mitigation will be applied to non-fixed loads.		// loads; otherwise, mitigation will be applied to non-fixed loads.
int X86LoadValueInjectionLoadHardeningPass::hardenLoads(MachineFunction &MF,		int X86LoadValueInjectionLoadHardeningPass::hardenLoads(MachineFunction &MF,
bool FixedLoads) const {		bool FixedLoads) const {
		int FencesInserted = 0;

LLVM_DEBUG(dbgs() << "Building gadget graph...\n");		LLVM_DEBUG(dbgs() << "Building gadget graph...\n");
const auto &MLI = getAnalysis<MachineLoopInfo>();		const auto &MLI = getAnalysis<MachineLoopInfo>();
const auto &MDT = getAnalysis<MachineDominatorTree>();		const auto &MDT = getAnalysis<MachineDominatorTree>();
const auto &MDF = getAnalysis<MachineDominanceFrontier>();		const auto &MDF = getAnalysis<MachineDominanceFrontier>();
std::unique_ptr<MachineGadgetGraph> Graph =		std::unique_ptr<MachineGadgetGraph> Graph =
getGadgetGraph(MF, MLI, MDT, MDF, FixedLoads);		getGadgetGraph(MF, MLI, MDT, MDF, FixedLoads);
LLVM_DEBUG(dbgs() << "Building gadget graph... Done\n");		LLVM_DEBUG(dbgs() << "Building gadget graph... Done\n");
if (Graph == nullptr)		if (Graph == nullptr)
[... 17 lines hidden ...]	if (FileError)
errs() << FileError.message();		errs() << FileError.message();
WriteGraph(FileOut, Graph.get());		WriteGraph(FileOut, Graph.get());
FileOut.close();		FileOut.close();
LLVM_DEBUG(dbgs() << "Emitting gadget graph... Done\n");		LLVM_DEBUG(dbgs() << "Emitting gadget graph... Done\n");
if (EmitDotOnly)		if (EmitDotOnly)
return 0;		return 0;
}		}

return 0;		do {
		LLVM_DEBUG(dbgs() << "Eliminating mitigated paths...\n");
		std::unique_ptr<MachineGadgetGraph> ElimGraph = elimEdges(std::move(Graph));
		LLVM_DEBUG(dbgs() << "Eliminating mitigated paths... Done\n");
		if (ElimGraph->NumGadgets == 0)
		break;

		EdgeSet CutEdges{*ElimGraph};
		LLVM_DEBUG(dbgs() << "Cutting edges...\n");
		cutEdges(*ElimGraph, CutEdges);
		LLVM_DEBUG(dbgs() << "Cutting edges... Done\n");

		LLVM_DEBUG(dbgs() << "Inserting LFENCEs...\n");
		FencesInserted += insertFences(*ElimGraph, CutEdges);
		LLVM_DEBUG(dbgs() << "Inserting LFENCEs... Done\n");

		Graph = GraphBuilder::trim(
		mattdr (Not Done): I know there was a lot of effort to make `ImmutableGraph` efficient, but this is an `O(N)` reallocation on every loop, right? In practice, how many times do we end up looping before we hit a fixed point?
		sconstab (Done): The answers to your questions are "yes" and "until all of the gadgets have been mitigated". The intent here is to optimize for the MILP plugin: https://github.com/intel/lvi-llvm-optimization-plugin The plugin eliminates lots of gadgets all at once, and therefore rebuilding the graph is a relatively rare event. I'm not sure it would be worthwhile to use an entirely different data structure and algorithm for the greedy mitigation strategy. What I have done is rewrite the Greedy heuristic to keep track of which edges have been cut and eliminated, thus never actually having to rebuild the graph.
		mattdr (Not Done): Did you mean to remove this call to `GraphBuilder::trim()`? As it stands right now the loop actually reallocates the graph twice per iteration -- once in `trimMitigatedEdges`, once in this call to `trim()`.
		*ElimGraph, MachineGadgetGraph::NodeSet{*ElimGraph}, CutEdges);
		} while (true);

		return FencesInserted;
}		}

std::unique_ptr<X86LoadValueInjectionLoadHardeningPass::MachineGadgetGraph>		std::unique_ptr<X86LoadValueInjectionLoadHardeningPass::MachineGadgetGraph>
X86LoadValueInjectionLoadHardeningPass::getGadgetGraph(		X86LoadValueInjectionLoadHardeningPass::getGadgetGraph(
MachineFunction &MF, const MachineLoopInfo &MLI,		MachineFunction &MF, const MachineLoopInfo &MLI,
const MachineDominatorTree &MDT, const MachineDominanceFrontier &MDF,		const MachineDominatorTree &MDT, const MachineDominanceFrontier &MDF,
bool FixedLoads) const {		bool FixedLoads) const {
using namespace rdf;		using namespace rdf;
[... 73 lines hidden ...]	std::function<void(NodeAddr<DefNode *>)> AnalyzeDefUseChain =
Uses.push_back(Use);		Uses.push_back(Use);
}		}
}		}
for (auto N : Uses) {		for (auto N : Uses) {
NodeAddr<UseNode *> Use{N};		NodeAddr<UseNode *> Use{N};
if (NodesVisited.insert(Use.Id).second && AnalyzeUse(Use)) {		if (NodesVisited.insert(Use.Id).second && AnalyzeUse(Use)) {
NodeAddr<InstrNode *> Owner{Use.Addr->getOwner(DFG)};		NodeAddr<InstrNode *> Owner{Use.Addr->getOwner(DFG)};
NodeList Defs = Owner.Addr->members_if(DataFlowGraph::IsDef, DFG);		NodeList Defs = Owner.Addr->members_if(DataFlowGraph::IsDef, DFG);
for_each(Defs, AnalyzeDefUseChain);		llvm::for_each(Defs, AnalyzeDefUseChain);
		craig.topper (Done): doh this should be in the other patch.
}		}
}		}
};		};
AnalyzeDefUseChain(Def);		AnalyzeDefUseChain(Def);
};		};

LLVM_DEBUG(dbgs() << "Analyzing def-use chains to find gadgets\n");		LLVM_DEBUG(dbgs() << "Analyzing def-use chains to find gadgets\n");
// Analyze function arguments		// Analyze function arguments
if (!FixedLoads) { // only need to analyze function args once		if (!FixedLoads) { // only need to analyze function args once
NodeAddr<BlockNode *> EntryBlock = DFG.getFunc().Addr->getEntryBlock(DFG);		NodeAddr<BlockNode *> EntryBlock = DFG.getFunc().Addr->getEntryBlock(DFG);
for (NodeAddr<PhiNode *> ArgPhi :		for (NodeAddr<PhiNode *> ArgPhi :
EntryBlock.Addr->members_if(DataFlowGraph::IsPhi, DFG)) {		EntryBlock.Addr->members_if(DataFlowGraph::IsPhi, DFG)) {
NodeList Defs = ArgPhi.Addr->members_if(DataFlowGraph::IsDef, DFG);		NodeList Defs = ArgPhi.Addr->members_if(DataFlowGraph::IsDef, DFG);
for_each(Defs, AnalyzeDef);		llvm::for_each(Defs, AnalyzeDef);
}		}
}		}
// Analyze every instruction in MF		// Analyze every instruction in MF
for (NodeAddr<BlockNode *> BA : DFG.getFunc().Addr->members(DFG)) {		for (NodeAddr<BlockNode *> BA : DFG.getFunc().Addr->members(DFG)) {
for (NodeAddr<StmtNode *> SA :		for (NodeAddr<StmtNode *> SA :
BA.Addr->members_if(DataFlowGraph::IsCode<NodeAttrs::Stmt>, DFG)) {		BA.Addr->members_if(DataFlowGraph::IsCode<NodeAttrs::Stmt>, DFG)) {
MachineInstr *MI = SA.Addr->getCode();		MachineInstr *MI = SA.Addr->getCode();
if (isFence(MI)) {		if (isFence(MI)) {
MaybeAddNode(MI);		MaybeAddNode(MI);
++FenceCount;		++FenceCount;
} else if (MI->mayLoad() && ((FixedLoads && instrIsFixedAccess(*MI)) ||		} else if (MI->mayLoad() && ((FixedLoads && instrIsFixedAccess(*MI)) ||
(!FixedLoads && !instrIsFixedAccess(*MI)))) {		(!FixedLoads && !instrIsFixedAccess(*MI)))) {
NodeList Defs = SA.Addr->members_if(DataFlowGraph::IsDef, DFG);		NodeList Defs = SA.Addr->members_if(DataFlowGraph::IsDef, DFG);
for_each(Defs, AnalyzeDef);		llvm::for_each(Defs, AnalyzeDef);
}		}
}		}
}		}
int GadgetCount = static_cast<int>(GadgetEdgeSet.size());		int GadgetCount = static_cast<int>(GadgetEdgeSet.size());
LLVM_DEBUG(dbgs() << "Found " << FenceCount << " fences\n");		LLVM_DEBUG(dbgs() << "Found " << FenceCount << " fences\n");
LLVM_DEBUG(dbgs() << "Found " << GadgetCount << " gadgets\n");		LLVM_DEBUG(dbgs() << "Found " << GadgetCount << " gadgets\n");
if (GadgetCount == 0)		if (GadgetCount == 0)
return nullptr;		return nullptr;
[... 37 lines hidden ...]	X86LoadValueInjectionLoadHardeningPass::getGadgetGraph(
// ARG_NODE is a pseudo-instruction that represents MF args in the GadgetGraph		// ARG_NODE is a pseudo-instruction that represents MF args in the GadgetGraph
GraphIter ArgNode = MaybeAddNode(ARG_NODE).first;		GraphIter ArgNode = MaybeAddNode(ARG_NODE).first;
TraverseCFG(&MF.front(), ArgNode, 0);		TraverseCFG(&MF.front(), ArgNode, 0);
std::unique_ptr<MachineGadgetGraph> G{Builder.get(FenceCount, GadgetCount)};		std::unique_ptr<MachineGadgetGraph> G{Builder.get(FenceCount, GadgetCount)};
LLVM_DEBUG(dbgs() << "Found " << G->nodes_size() << " nodes\n");		LLVM_DEBUG(dbgs() << "Found " << G->nodes_size() << " nodes\n");
return G;		return G;
}		}

		std::unique_ptr<X86LoadValueInjectionLoadHardeningPass::MachineGadgetGraph>
		X86LoadValueInjectionLoadHardeningPass::elimEdges(
		mattdr (Done): Could be more descriptive to call this, say, `trimMitigatedEdges`
		sconstab (Done): Yes, done.
		mattdr (Done): This is a lot clearer now, thanks! Given what this is doing I think it makes sense to call it `findMitigatedEdges`.
		sconstab (Done): I changed it to `elimMitigatedEdgesAndNodes()`
		std::unique_ptr<MachineGadgetGraph> Graph) const {
		MachineGadgetGraph::NodeSet ElimNodes{*Graph};
		MachineGadgetGraph::EdgeSet ElimEdges{*Graph};
		mattdr (Done): "Eliminate CFG edges that target a fence, as they're trivially mitigated"

		if (Graph->NumFences > 0) { // eliminate fences
		for (const auto &E : Graph->edges()) {
		sconstab (Not Done): Would `EdgeRef E : Graph->edges()` be clearer here? Ditto for many other `for` loops. (Not sure if the LLVM conventions dictate something specific for these loops.)
		craig.topper (Done): Personally, I'd prefer not to hide the & or *. And EdgeRef/NodeRef only exist in the Traits class, not the main class. It's also confusing that NodeRef is a pointer and not a reference, so I'd like to not leak that outside the Traits class where it's easier to see. LLVM is pretty conservative about the use of auto, but I figured in this case it wouldn't be unreasonable for a reader to understand that edges() returns Edge objects. But I'm happy to change it to MachineGadgetGraph::Edge.
		const MachineGadgetGraph::Node *Dest = E.getDest();
		mattdr (Not Done): How are we sure we've removed all edges that pointed to this same destination node?
		sconstab (Done): I think it does? This loop iterates through ALL edges. For each CFG edge that ingresses a fence, add that fence to `ElimNodes`, and add that edge and all egress CFG edges to `ElimEdges`. The loop does not skip over edges that have been added to `ElimEdges`, and therefore I think this should ensure that all CFG edges pointing to a given fence are removed.
		mattdr (Done): Sure enough, that makes sense.
		if (isFence(Dest->getValue())) {
		mattdr (Not Done): In what circumstance will we add fences to the graph as sources or sinks? Can we just avoid that at the point of insertion, rather than running this extra culling pass on every iteration?
		sconstab (Done): Fences are never added as sources or sinks. We need to know where the fences are so that we can eliminate gadget edges for which all CFG paths from source to sink cross a fence. We could perform this analysis at the point of insertion, i.e., during the `getGadgetGraph()` function, BUT this analysis would be much more expensive there because it would be performed on the actual `MachineFunction`, instead of on the gadget graph.
		ElimNodes.insert(Dest);
		ElimEdges.insert(&E);
		llvm::for_each(Dest->edges(),
		sconstab (Not Done): This caught me off guard the first time I saw it because I didn't realize that LLVM had its own implementation that takes a range. For readability, would it be better to prefix with `llvm::`?
		craig.topper (Done): Agreed. I'll change.
		[&ElimEdges](const MachineGadgetGraph::Edge &E) {
		ElimEdges.insert(&E);
		});
		}
		}
		LLVM_DEBUG(dbgs() << "Eliminated " << ElimNodes.count()
		<< " fence nodes\n");
		}
		mattdr (Done): To make it easier to follow, let's call this something like `ReachableGadgetSinkNodes`
		sconstab (Done): But that isn't accurate. The algorithm finds all reachable nodes, not just gadget sinks. I renamed to `ReachableNodes`.

		// Eliminate gadget edges that are mitigated.
		int NumGadgets = 0;
		mattdr (Not Done): I think this is just intended as an optimization -- it's not necessary for correctness. Assuming that's right, suggest removing it since it doesn't really make things faster but it does add some extra complexity.
		sconstab (Done): Why do you think it doesn't make things faster? A majority of nodes in the graph are not gadget sinks, and this majority tends to grow rapidly as gadgets become mitigated. Each time this check passes, it saves an entire DFS traversal and one `O(E)` loop through the edges.
		mattdr (Done): Ack! Yes, it seems I consistently misread this function and missed the difference between `isGadgetEdge` in some places and `IsCFGEdge` in others. (As I mentioned, having both represented in the same graph is confusing.) Now I see this is a different check than is done in the DFS.
		MachineGadgetGraph::NodeSet Visited{*Graph}, GadgetSinks{*Graph};
		MachineGadgetGraph::EdgeSet ElimGadgets{*Graph};
		for (const auto &RootN : Graph->nodes()) {
		mattdr (Done): Maybe `FindCFGReachableGadgetSinksDFS`?
		sconstab (Done): Similar to above, this is not accurate as it finds all reachable nodes, not just gadget sinks. I did rename to `FindReachableNodes` and added detail to the comment.
		// collect the gadgets for this node
		mattdr (Not Done): I would write something stronger here: Start off assuming all gadgets originating at this source node have been mitigated and can be removed. Later we will add back unmitigated gadgets by erasing them from the removal set.
		sconstab (Done): Well, I substantially rewrote this algorithm to build a set of mitigated edges, instead of trimming down a set of mitigated edges. I think that the new set of comments should be much easier to follow.
		for (const auto &E : RootN.edges()) {
		if (MachineGadgetGraph::isGadgetEdge(E)) {
		++NumGadgets;
		ElimGadgets.insert(&E);
		GadgetSinks.insert(E.getDest());
		}
		}
		if (GadgetSinks.empty())
		continue;
		std::function<void(const MachineGadgetGraph::Node *, bool)> TraverseDFS =
		mattdr (Not Done): This is really, really hard to read and understand. I think that's in large part because we have this one graph that represents _both_ control-flow _and_ source/sink pairs. Given that it's the load-bearing part of the whole stack, let me offer the best way I've come up with to explain it, then a suggestion for making it simpler to follow.
		My summary based on a few readings:
		- First we compute `GadgetSinks`, the set of gadget sinks whose source is the current root.
		- Then we do a depth-first search of the control-flow graph to find all gadget sinks that are control-flow-reachable from the given root.
		- When we find such a sink, we look to see if it is also in `GadgetSinks` -- again, a sink whose source is the current root -- at which point we know we have found an unmitigated gadget sink.
		- We iterate through the root's gadget edges to find the specific edge that points to the current DFS node -- the unmitigated gadget sink -- and remove that edge from `ElimGadgets`, where we had previously added it on the presumption it was mitigated.
		One idea for making this easier to follow:
		- Start off assuming no edges are mitigated. Don't pre-populate the `EdgeSet` with every edge.
		- Use depth-first search to create `ReachableSinks`, the set of all gadget sinks (regardless of source) that are control-flow-reachable from `RootN`. (I don't think this adds more work than `TraverseDFS` already does.)
		- Iterate over gadget edges from `RootN`. For each edge, ask: is the destination reachable? If so, add this to a new set, `GadgetEdges`.
		- After iterating over all nodes, let `TrimEdges` be `AllEdges - GadgetEdges`. Then trim those.
		sconstab (Done): You of course are correct that this was terribly difficult to follow. I actually found an even simpler solution than what you had suggested. For each `RootN` that is the source of at least one gadget, I DFS to find all of the `Visited` nodes, and then check to see which node members of `Visited` are also the destinations of a gadget edge rooted at `RootN`.
		[&](const MachineGadgetGraph::Node *N, bool FirstNode) {
		sconstab (Not Done): Should this be `MachineGadgetGraph::NodeRef`?
		if (!FirstNode) {
		mattdr (Not Done): We already capture `RootN` later on. We can remove the boolean conditional and make this more readable by writing `if (N != RootN)` (maybe `if (N != &RootN)`? I've sort of lost track of the API for nodes and edges.) Pushing further: maybe we can just remove the tiny optimization we get by special-casing the root node in the interest of simplicity?
		mattdr (Not Done): Still think it makes sense to add the root node to the reachable set (as it's trivially reachable from itself).
		Visited.insert(N);
		if (GadgetSinks.contains(N)) {
		for (const auto &E : RootN.edges()) {
		if (MachineGadgetGraph::isGadgetEdge(E) && E.getDest() == N)
		ElimGadgets.erase(&E);
		}
		}
		}
		for (const auto &E : N->edges()) {
		const MachineGadgetGraph::Node *Dest = E.getDest();
		if (MachineGadgetGraph::isCFGEdge(E) && !Visited.contains(Dest) &&
		mattdr (Done): Suggest putting this at the top of the loop so it's obvious from the beginning that results aren't intended to accrue from iteration to iteration.
		!ElimEdges.contains(&E))
		TraverseDFS(Dest, false);
		}
		};
		TraverseDFS(&RootN, true);
		Visited.clear();
		GadgetSinks.clear();
		}
		LLVM_DEBUG(dbgs() << "Eliminated " << ElimGadgets.count()
		<< " gadget edges\n");
		ElimEdges |= ElimGadgets;

		if (!(ElimEdges.empty() && ElimNodes.empty())) {
		mattdr (Not Done): It's weird to have a negated predicate like this in a conditional with an `else`. Let's either rewrite it or un-negate it and flip the `if`/`else` blocks. Rewrite: `!ElimEdges.empty() || !ElimNodes.empty()`
		sconstab (Done): Right. By the way, I decomposed `elimEdges()` into two functions:
		- `elimEdges()` just runs the edge elimination algorithm without rebuilding the gadget graph.
		- `trimMitigatedEdges()` wraps `elimEdges()` and does rebuild (i.e., "trim") the gadget graph.
		int NumRemainingGadgets = NumGadgets - ElimGadgets.count();
		Graph = GraphBuilder::trim(*Graph, ElimNodes, ElimEdges, 0 /* NumFences */,
		NumRemainingGadgets);
		} else {
		Graph->NumFences = 0;
		Graph->NumGadgets = NumGadgets;
		}
		return Graph;
		}
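The elimination step discussed in the thread above can be sketched outside of LLVM. The following is a minimal illustration, not the pass's actual code: `ToyGadgetGraph` and `findUnmitigatedGadgets` are hypothetical names, nodes are plain indices, and CFG and gadget edges are kept in separate lists rather than in one `ImmutableGraph`. It follows the approach sconstab describes: for each node that sources at least one gadget, run a DFS over CFG edges (ignoring edges into a fence) and keep only the gadgets whose sink was reached.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the gadget graph (assumption: not the LLVM type).
struct ToyGadgetGraph {
  int NumNodes = 0;
  std::vector<std::pair<int, int>> CFGEdges;    // (from, to)
  std::vector<std::pair<int, int>> GadgetEdges; // (source, sink)
  std::vector<bool> IsFence;                    // indexed by node
};

// Returns the gadget edges that remain unmitigated, i.e. whose sink is
// reachable from its source along CFG edges without entering a fence.
std::vector<std::pair<int, int>>
findUnmitigatedGadgets(const ToyGadgetGraph &G) {
  // Build successor lists, dropping CFG edges that target a fence because
  // those edges are trivially mitigated.
  std::vector<std::vector<int>> Succs(G.NumNodes);
  for (const auto &E : G.CFGEdges)
    if (!G.IsFence[E.second])
      Succs[E.first].push_back(E.second);

  std::vector<std::pair<int, int>> Unmitigated;
  for (int Root = 0; Root < G.NumNodes; ++Root) {
    bool IsSource = false;
    for (const auto &GE : G.GadgetEdges)
      IsSource |= (GE.first == Root);
    if (!IsSource)
      continue; // no gadgets start here, so skip the DFS entirely

    // Iterative DFS over CFG edges only. The root itself is marked
    // reachable only if a cycle leads back to it.
    std::vector<bool> Reachable(G.NumNodes, false);
    std::vector<int> Stack{Root};
    while (!Stack.empty()) {
      int N = Stack.back();
      Stack.pop_back();
      for (int S : Succs[N])
        if (!Reachable[S]) {
          Reachable[S] = true;
          Stack.push_back(S);
        }
    }
    for (const auto &GE : G.GadgetEdges)
      if (GE.first == Root && Reachable[GE.second])
        Unmitigated.push_back(GE);
  }
  return Unmitigated;
}
```

For a straight-line CFG 0 -> 1 -> 2 -> 3 with a fence at node 2 and gadgets (0, 1) and (0, 3), only (0, 1) is reported, since every path from 0 to 3 crosses the fence.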

		void X86LoadValueInjectionLoadHardeningPass::cutEdges(
		mattdr (Done): likewise `findEdgesToCut` or similar
		MachineGadgetGraph &G,
		MachineGadgetGraph::EdgeSet &CutEdges /* out */) const {
		if (!OptimizePluginPath.empty()) {
		mattdr (Done): I'm still not convinced it's worth the extra (untestable) complexity to add this plugin point, but I defer to the committer.
		if (!OptimizeDL.isValid()) {
		std::string ErrorMsg{};
		mattdr (Done): Extra `{}` should be unnecessary
		OptimizeDL = llvm::sys::DynamicLibrary::getPermanentLibrary(
		OptimizePluginPath.c_str(), &ErrorMsg);
		if (!ErrorMsg.empty())
		report_fatal_error("Failed to load opt plugin: \"" + ErrorMsg + '\"');
		OptimizeCut = (OptimizeCutT)OptimizeDL.getAddressOfSymbol("optimize_cut");
		if (!OptimizeCut)
		report_fatal_error("Invalid optimization plugin");
		}
		std::vector<unsigned int> Nodes(G.nodes_size() + 1 /* terminator node */);
		std::vector<unsigned int> Edges(G.edges_size());
		std::vector<int> EdgeCuts(G.edges_size());
		std::vector<int> EdgeValues(G.edges_size());
		sconstab (Not Done): I think that `std::unique_ptr<T[]>` should work fine here...
		craig.topper (Done): Agreed, but std::vector is far more common in the codebase and since it lives on the stack the extra capacity pointer shouldn't be a big deal.
		for (const auto &N : G.nodes()) {
		Nodes[std::distance(G.nodes_begin(), &N)] =
		std::distance(G.edges_begin(), N.edges_begin());
		craig.topper (Done): I'm thinking about adding a method to Graph to get a node/edge's index so we can hide all these std::distance calls.
		}
		Nodes[G.nodes_size()] = G.edges_size(); // terminator node
		for (const auto &E : G.edges()) {
		Edges[std::distance(G.edges_begin(), &E)] =
		std::distance(G.nodes_begin(), E.getDest());
		EdgeValues[std::distance(G.edges_begin(), &E)] = E.getValue();
		mattdr (Not Done): Why aren't these `vector`s?
		sconstab (Not Done): They're being passed to a C-compatible interface, so intuitively it made more sense to me. I also get paranoid because the array sizes will be copied into the std::vector struct, and from experience it seems I can never be 100% certain these copies will be optimized away.
		sconstab (Not Done): @mattdr (above)
		}
		OptimizeCut(Nodes.data(), G.nodes_size(), Edges.data(), EdgeValues.data(),
		EdgeCuts.data(), G.edges_size());
		mattdr (Done): I really like that the refinement loop (trying to get to a fixed point) for the greedy algorithm is now entirely within this function. I'd suggest going one step further: take the loop out of `runOnMachineFunction` entirely and add one around the calls to the plugin in the lines above. That way we keep the implementation detail that "this needs to be run in a loop" as close to the actual algorithm as possible.
		for (int I = 0; I < G.edges_size(); ++I)
		sconstab (Not Done): I think this `for` can lose the brackets.
		craig.topper (Done): Will do.
		if (EdgeCuts[I])
		CutEdges.set(I);
		} else { // Use the default greedy heuristic
		// Find the cheapest CFG edge that will eliminate a gadget (by being egress
		// from a SOURCE node or ingress to a SINK node), and cut it.
		MachineGadgetGraph::NodeSet GadgetSinks{G};
		const MachineGadgetGraph::Edge *CheapestSoFar = nullptr;
		for (const auto &N : G.nodes()) {
		for (const auto &E : N.edges()) {
		mattdr (Done): Seems like `NI` is a vestige of a previous iteration
		if (MachineGadgetGraph::isGadgetEdge(E)) {
		// NI is a SOURCE node. Look for a cheap egress edge
		mattdr (Not Done): Every edge touched by this loop will also be touched by the other leg of this loop at one point or another ("E is a CFG edge"). Can we find a way to avoid the `O(E^2)` inner loop? For example: what if we made the outermost loop edge-major? Keep sets of mitigated source-nodes and sink-nodes. Create a list of all edges by weight, sort it, then go through the edges from lowest to highest weight and use them to try to mitigate a new source or sink node. There's a good chance I'm missing something fundamental there, and I thank you in advance for your patience explaining it to me. But if I can I'd really like to avoid repeating the `CheapestSoFar` comparison on edge weights in two places.
		sconstab (Done): Yes... in hindsight this whole algorithm was more than a bit sloppy on my part. I completely revamped the algorithm and I think that now it is O(N + E). Please check and make sure you agree.
		mattdr (Not Done): Each iteration loops over all edges and removes exactly one... so this is probably still `O(E^2)`. Seems like we can get it down closer to `O(E * lg E)` if we:
		- Compute `GadgetSources` and `GadgetSinks` once
		- Put the edges into a vector, then sort them by weight
		- Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated)
		The insight here is that edge weights don't change, so mitigating an edge doesn't change the ranking of other edges. That said, I'm happy with the readability of the current implementation and would be satisfied for now if we just add `//FIXME: this is O(E^2), it could probably be O(E * lg E) with some work`
		sconstab (Done): I think your description actually implies that this greedy heuristic cannot be less than `O(E^2)`, right?
		> Iterate through that vector and, for each edge, add it as an edge to cut if it is still relevant (i.e. not yet otherwise mitigated)
		The "if it is still relevant" is an `O(E)` operation, so the whole thing should be `O(E^2)`. Regardless, I did add a comment to this effect.
		for (const auto &EE : N.edges()) {
		if (MachineGadgetGraph::isCFGEdge(EE)) {
		if (!CheapestSoFar || EE.getValue() < CheapestSoFar->getValue())
		CheapestSoFar = &EE;
		}
		}
		GadgetSinks.insert(E.getDest());
		mattdr (Done): What guarantee do we have here that all of the gadget sinks have, in fact, been added to `GadgetSinks`? It seems to be up to the order in which we iterate through nodes and edges.
		sconstab (Done): Yikes! Can't believe I overlooked this. I fixed the issue, and it looks like the algorithm had been cutting a few more edges in some cases than necessary.
		} else { // E is a CFG edge
		mattdr (Done): `NI` also seems like it comes from a previous version
		if (GadgetSinks.contains(E.getDest())) {
		// The dest is a SINK node. Hence EI is an ingress edge
		if (!CheapestSoFar || E.getValue() < CheapestSoFar->getValue())
		CheapestSoFar = &E;
		}
		}
		}
		}
		assert(CheapestSoFar && "Failed to cut an edge");
		CutEdges.insert(CheapestSoFar);
		}
		LLVM_DEBUG(dbgs() << "Cut " << CutEdges.count() << " edges\n");
		}
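The ordering problem raised in the review (a CFG edge can be examined before its destination has been recorded in `GadgetSinks`) goes away if sources and sinks are collected in a separate pass. The sketch below illustrates that idea only; it is not the revised code from the patch, and `ToyEdge` and `pickCheapestCut` are hypothetical names. It selects a single cheapest CFG edge that is either egress from a SOURCE or ingress to a SINK, which is one step of the greedy heuristic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical edge record (assumption: not the LLVM type).
struct ToyEdge {
  int From, To;
  int Weight;    // cost of cutting; only meaningful for CFG edges
  bool IsGadget; // true: SOURCE -> SINK gadget edge; false: CFG edge
};

// Returns the index of the cheapest CFG edge that leaves a gadget SOURCE or
// enters a gadget SINK, or -1 if no such edge exists.
int pickCheapestCut(int NumNodes, const std::vector<ToyEdge> &Edges) {
  // Pass 1: record every source and sink up front, so the outcome does not
  // depend on the order in which edges are visited.
  std::vector<bool> IsSource(NumNodes, false), IsSink(NumNodes, false);
  for (const ToyEdge &E : Edges)
    if (E.IsGadget) {
      IsSource[E.From] = true;
      IsSink[E.To] = true;
    }

  // Pass 2: scan the CFG edges once and keep the cheapest relevant one.
  int Cheapest = -1;
  for (int I = 0; I < static_cast<int>(Edges.size()); ++I) {
    const ToyEdge &E = Edges[I];
    if (E.IsGadget || !(IsSource[E.From] || IsSink[E.To]))
      continue;
    if (Cheapest < 0 || E.Weight < Edges[Cheapest].Weight)
      Cheapest = I;
  }
  return Cheapest;
}
```

Each call is linear in the number of edges. As noted in the thread, repeating the selection until every gadget is mitigated is still quadratic in the worst case.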

		int X86LoadValueInjectionLoadHardeningPass::insertFences(
		MachineGadgetGraph &G, EdgeSet &CutEdges /* in, out */) const {
		int FencesInserted = 0, AdditionalEdgesCut = 0;
		mattdr (Done): A comment for what this lambda does, and how it's intended to be used, is much appreciated for the top-to-bottom reader. Probably something like: "When we add an `LFENCE` before an instruction, remove any CFG edges that point to that instruction because they all now refer to a mitigated codepath."
		sconstab (Done): I simplified this by inlining the lambda with a descriptive comment.
		auto CutAllCFGEdges =
		[&CutEdges, &AdditionalEdgesCut](const MachineGadgetGraph::Node *N) {
		for (const auto &E : N->edges()) {
		if (MachineGadgetGraph::isCFGEdge(E) && !CutEdges.contains(&E)) {
		sconstab (Not Done): More places where `NodeRef` and `EdgeRef` might be clearer.
		CutEdges.insert(&E);
		++AdditionalEdgesCut;
		}
		}
		};
		for (const auto &N : G.nodes()) {
		for (const auto &E : N.edges()) {
		if (CutEdges.contains(&E)) {
		MachineInstr *MI = N.getValue(), *Prev;
		mattdr (Done): `// Insert an LFENCE in this basic block`
		MachineBasicBlock *MBB;
		mattdr (Done): `// ... at this point in the basic block`
		MachineBasicBlock::iterator InsertionPt;
		if (MI == ARG_NODE) { // insert LFENCE at beginning of entry block
		MBB = &G.getMF().front();
		InsertionPt = MBB->begin();
		Prev = nullptr;
		} else if (MI->isBranch()) { // insert the LFENCE before the branch
		MBB = MI->getParent();
		InsertionPt = MI;
		Prev = MI->getPrevNode();
		CutAllCFGEdges(&N);
		} else { // insert the LFENCE after the instruction
		MBB = MI->getParent();
		InsertionPt = MI->getNextNode() ? MI->getNextNode() : MBB->end();
		Prev = InsertionPt == MBB->end()
		? (MBB->empty() ? nullptr : &MBB->back())
		: InsertionPt->getPrevNode();
		}
		if ((InsertionPt == MBB->end() || !isFence(&*InsertionPt)) &&
		(!Prev || !isFence(Prev))) {
		mattdr (Done): This one took me a bit. It seems like the summary is: add a fence unless it would be redundant, i.e. if we can see the next instruction and see it is itself a fence.
		sconstab (Author, Done): Right. I added a clarifying comment.
		BuildMI(*MBB, InsertionPt, DebugLoc(), TII->get(X86::LFENCE));
		++FencesInserted;
		}
		}
		}
		}
		LLVM_DEBUG(dbgs() << "Inserted " << FencesInserted << " fences\n");
		LLVM_DEBUG(dbgs() << "Cut an additional " << AdditionalEdgesCut
		<< " edges during fence insertion\n");
		return FencesInserted;
		mattdr (Done): You can skip the double-lookup and just do:
		    if (MachineGadgetGraph::isCFGEdge(E)) {
		      if (CutEdges.insert(E)) { // returns true if not already present
		        ++AdditionalEdgesCut;
		      }
		    }
		}
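
The two review points on `insertFences` can be sketched in isolation. This is a simplified illustration, not the pass's code: the `Op` enum, the vector-of-opcodes "basic block", and the function names `insertFenceUnlessRedundant` and `cutAllEdges` are hypothetical, and `std::unordered_set` stands in for the pass's `EdgeSet`. The first function inserts a fence only when neither neighbor of the insertion point is already a fence. The second counts newly cut edges with a single set operation, in the spirit of mattdr's suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical opcode enum standing in for real machine instructions.
enum class Op { Load, Branch, Fence, Other };

// Inserts a Fence at position Pos (0 <= Pos <= Block.size()) unless the
// instruction at Pos or the one just before Pos is already a Fence.
// Returns true if a fence was inserted.
bool insertFenceUnlessRedundant(std::vector<Op> &Block, std::size_t Pos) {
  bool NextIsFence = Pos < Block.size() && Block[Pos] == Op::Fence;
  bool PrevIsFence = Pos > 0 && Block[Pos - 1] == Op::Fence;
  if (NextIsFence || PrevIsFence)
    return false; // an adjacent fence already covers this point
  Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(Pos), Op::Fence);
  return true;
}

// Marks every edge ID in NodeEdges as cut and returns how many were newly
// cut. insert() reports through .second whether the element was new, so no
// separate contains() lookup is needed.
int cutAllEdges(std::unordered_set<int> &CutEdges,
                const std::vector<int> &NodeEdges) {
  int AdditionalEdgesCut = 0;
  for (int E : NodeEdges)
    if (CutEdges.insert(E).second)
      ++AdditionalEdgesCut;
  return AdditionalEdgesCut;
}
```

With a block of `Load, Other, Branch`, inserting at position 1 adds a fence; a second request at position 1 or 2 is then a no-op because a fence is already adjacent.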

		bool X86LoadValueInjectionLoadHardeningPass::instrUsesRegToAccessMemory(
		const MachineInstr &MI, unsigned Reg) const {
		if (!MI.mayLoadOrStore() || MI.getOpcode() == X86::MFENCE ||
		MI.getOpcode() == X86::SFENCE || MI.getOpcode() == X86::LFENCE)
		return false;

		// FIXME: This does not handle pseudo loading instruction like TCRETURN*
		const MCInstrDesc &Desc = MI.getDesc();

llvm/test/CodeGen/X86/lvi-hardening-loads.ll

This file was added.

				; RUN: llc -verify-machineinstrs -mtriple=x86_64-unknown < %s | FileCheck %s --check-prefix=X64 --check-prefix=X64-CBFX
				; RUN: llc -verify-machineinstrs -mtriple=x86_64-unknown --x86-lvi-load-no-fixed < %s | FileCheck %s --check-prefix=X64 --check-prefix=X64-CB
				; RUN: llc -verify-machineinstrs -mtriple=x86_64-unknown --x86-lvi-load-no-cbranch < %s | FileCheck %s --check-prefix=X64 --check-prefix=X64-FX
				; RUN: llc -verify-machineinstrs -mtriple=x86_64-unknown --x86-lvi-load-no-fixed --x86-lvi-load-no-cbranch < %s | FileCheck %s --check-prefix=X64 --check-prefix=X64-BASE

				; Function Attrs: noinline nounwind optnone uwtable
				define dso_local i32 @test(i32** %secret, i32 %secret_size) #0 {
				; X64-LABEL: test:
				entry:
				%secret.addr = alloca i32**, align 8
				%secret_size.addr = alloca i32, align 4
				%ret_val = alloca i32, align 4
				%i = alloca i32, align 4
				store i32** %secret, i32*** %secret.addr, align 8
				store i32 %secret_size, i32* %secret_size.addr, align 4
				store i32 0, i32* %ret_val, align 4
				call void @llvm.x86.sse2.lfence()
				store i32 0, i32* %i, align 4
				br label %for.cond

				; X64: # %bb.0: # %entry
				; X64-NEXT: movq %rdi, -{{[0-9]+}}(%rsp)
				; X64-NEXT: movl %esi, -{{[0-9]+}}(%rsp)
				; X64-NEXT: movl $0, -{{[0-9]+}}(%rsp)
				; X64-NEXT: lfence
				; X64-NEXT: movl $0, -{{[0-9]+}}(%rsp)
				; X64-NEXT: jmp .LBB0_1

				for.cond: ; preds = %for.inc, %entry
				%0 = load i32, i32* %i, align 4
				%1 = load i32, i32* %secret_size.addr, align 4
				%cmp = icmp slt i32 %0, %1
				br i1 %cmp, label %for.body, label %for.end

				; X64: .LBB0_1: # %for.cond
				; X64-NEXT: # =>This Inner Loop Header: Depth=1
				; X64-NEXT: movl -{{[0-9]+}}(%rsp), %eax
				; X64-CBFX-NEXT: lfence
				; X64-NEXT: cmpl -{{[0-9]+}}(%rsp), %eax
				; X64-CBFX-NEXT: lfence
				; X64-NEXT: jge .LBB0_5

				for.body: ; preds = %for.cond
				%2 = load i32, i32* %i, align 4
				%rem = srem i32 %2, 2
				%cmp1 = icmp eq i32 %rem, 0
				br i1 %cmp1, label %if.then, label %if.end

				; X64: # %bb.2: # %for.body
				; X64-NEXT: # in Loop: Header=BB0_1 Depth=1
				; X64-NEXT: movl -{{[0-9]+}}(%rsp), %eax
				; X64-CBFX-NEXT: lfence
				; X64-NEXT: movl %eax, %ecx
				; X64-NEXT: shrl $31, %ecx
				; X64-NEXT: addl %eax, %ecx
				; X64-NEXT: andl $-2, %ecx
				; X64-NEXT: cmpl %ecx, %eax
				; X64-NEXT: jne .LBB0_4

				if.then: ; preds = %for.body
				%3 = load i32**, i32*** %secret.addr, align 8
				%4 = load i32, i32* %ret_val, align 4
				%idxprom = sext i32 %4 to i64
				%arrayidx = getelementptr inbounds i32*, i32** %3, i64 %idxprom
				%5 = load i32*, i32** %arrayidx, align 8
				%6 = load i32, i32* %5, align 4
				store i32 %6, i32* %ret_val, align 4
				br label %if.end

				; X64: # %bb.3: # %if.then
				; X64-NEXT: # in Loop: Header=BB0_1 Depth=1
				; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rax
				; X64-CBFX-NEXT: lfence
				; X64-FX-NEXT: lfence
				; X64-NEXT: movslq -{{[0-9]+}}(%rsp), %rcx
				; X64-CBFX-NEXT: lfence
				; X64-FX-NEXT: lfence
				; X64-NEXT: movq (%rax,%rcx,8), %rax
				; X64-NEXT: lfence
				; X64-NEXT: movl (%rax), %eax
				; X64-NEXT: movl %eax, -{{[0-9]+}}(%rsp)
				; X64-NEXT: jmp .LBB0_4

				if.end: ; preds = %if.then, %for.body
				br label %for.inc

				for.inc: ; preds = %if.end
				%7 = load i32, i32* %i, align 4
				%inc = add nsw i32 %7, 1
				store i32 %inc, i32* %i, align 4
				br label %for.cond

				for.end: ; preds = %for.cond
				%8 = load i32, i32* %ret_val, align 4
				ret i32 %8
				}

				; Function Attrs: nounwind
				declare void @llvm.x86.sse2.lfence() #1

				attributes #0 = { "target-features"="+lvi-load-hardening" }
				attributes #1 = { nounwind }