This is an archive of the discontinued LLVM Phabricator instance.

Differential D80964

[X86] Add an Unoptimized Load Value Injection (LVI) Load Hardening Pass
ClosedPublic

Authored by sconstab on Jun 1 2020, 5:05 PM.

Details

Reviewers

nikic
craig.topper
mattdr

Commits

rG72bff7855d8c: [X86] Add an Unoptimized Load Value Injection (LVI) Load Hardening Pass
rG7e06cf0011a8: [X86] Add an Unoptimized Load Value Injection (LVI) Load Hardening Pass

Summary

@nikic raised an issue on D75936 that the added complexity to the O0 pipeline was causing noticeable slowdowns for -O0 builds. This patch addresses the issue by adding a pass with equal security properties, but without any optimizations (and more importantly, without the need for expensive analysis dependencies).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sconstab created this revision. Jun 1 2020, 5:05 PM

Herald added a project: Restricted Project. Jun 1 2020, 5:06 PM

Herald added subscribers: jfb, hiraditya.

I should add that I am submitting this patch as an alternative to D80064. That revision invisibly disables the mitigation at -O0, which may not be secure for some users.

nikic mentioned this in D80064: [X86] Disable LVI load hardening pass at O0. Jun 5 2020, 1:18 PM

This approach looks reasonable to me.

Rather than creating a separate pass, we could also pass OptLevel into it and pick the optimized/unoptimized approach based on that. No particular preference though, as it seems like apart from a couple initial checks, there is no code to share.

Isn't this pass basically SESES? https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp

Perhaps there's an opportunity to unify the two.

In D80964#2077614, @mattdr wrote:

Isn't this pass basically SESES? https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/X86/X86SpeculativeExecutionSideEffectSuppression.cpp

Perhaps there's an opportunity to unify the two.

My understanding of SESES is that it inserts an LFENCE before each transmitter, whereas this pass inserts an LFENCE after each load. They are slightly different threat models; SESES is more strict. For example, SESES would mitigate the following vulnerability, whereas the LVI hardening will not:

uint64_t maybe_secret = *ptr;                           // architectural load
_mm_lfence();
if (is_secret) {                                        // suppose `is_secret == true` but branch mispredicts
    // do something constant-time with `maybe_secret`
} else {
    return byte_array[maybe_secret * 4096];             // speculatively leak the secret
}

I'm not opposed to merging the two approaches. But I am also not sure how to justify it.
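
The difference in fence placement described above can be sketched with a toy model. This is not LLVM code: `Kind`, `Inst`, `exampleGadget`, `fenceAfterLoads`, `fenceBeforeTransmitters`, and `fenceSeparates` are hypothetical names, and treating every load and branch as a transmitter is a simplifying assumption, not the actual SESES rule. The sketch only shows that fencing after loads leaves no fence between the mispredicted branch and the leaking access, whereas fencing before transmitters does.

```cpp
#include <string>
#include <vector>

// Toy instruction kinds for illustration only; these are not LLVM opcodes.
enum class Kind { Load, Branch, Other, Fence };

struct Inst {
  Kind kind;
  std::string text;
};

// The gadget from the comment above, reduced to three toy instructions.
std::vector<Inst> exampleGadget() {
  return {{Kind::Load, "load maybe_secret"},
          {Kind::Branch, "branch on is_secret"},
          {Kind::Load, "load byte_array[maybe_secret * 4096]"}};
}

// LVI-style placement: a fence immediately after every load.
std::vector<Inst> fenceAfterLoads(const std::vector<Inst> &in) {
  std::vector<Inst> out;
  for (const Inst &i : in) {
    out.push_back(i);
    if (i.kind == Kind::Load)
      out.push_back({Kind::Fence, "lfence"});
  }
  return out;
}

// Assumption for this sketch: loads and branches count as transmitters.
bool isTransmitter(Kind k) { return k == Kind::Load || k == Kind::Branch; }

// SESES-style placement (simplified): a fence immediately before every
// instruction that the sketch treats as a transmitter.
std::vector<Inst> fenceBeforeTransmitters(const std::vector<Inst> &in) {
  std::vector<Inst> out;
  for (const Inst &i : in) {
    if (isTransmitter(i.kind))
      out.push_back({Kind::Fence, "lfence"});
    out.push_back(i);
  }
  return out;
}

// Returns true if a fence appears after the instruction named `first` and
// before the next instruction named `second`.
bool fenceSeparates(const std::vector<Inst> &prog, const std::string &first,
                    const std::string &second) {
  bool seenFirst = false;
  bool seenFence = false;
  for (const Inst &i : prog) {
    if (i.text == first) {
      seenFirst = true;
      seenFence = false;
    } else if (seenFirst && i.kind == Kind::Fence) {
      seenFence = true;
    } else if (seenFirst && i.text == second) {
      return seenFence;
    }
  }
  return false;
}
```

In this model, `fenceSeparates` applied to the branch and the final load is false for `fenceAfterLoads(exampleGadget())` and true for `fenceBeforeTransmitters(exampleGadget())`, which mirrors the claim that SESES would mitigate the gadget while the LVI hardening would not.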

LGTM

This revision is now accepted and ready to land. Jun 8 2020, 3:05 PM

Thank you for the analysis and for the example, they were really helpful. Agree the approaches are different. They feel similar enough I wish we could find one that satisfied all relevant requirements, but I can't say with any certainty that e.g. SESES' performance is on par with the mitigation proposed here.

jethrogb added a subscriber: jethrogb. Jun 9 2020, 12:34 PM

Closed by commit rG7e06cf0011a8: [X86] Add an Unoptimized Load Value Injection (LVI) Load Hardening Pass (authored by sconstab, committed by craig.topper). Jun 10 2020, 3:36 PM

This revision was automatically updated to reflect the committed changes.

zbrid added a subscriber: zbrid. Jun 16 2020, 5:23 PM

zbrid added inline comments.

llvm/lib/Target/X86/X86TargetMachine.cpp
Line 503

In D80964#2077811, @sconstab wrote:

My understanding of SESES is that it inserts an LFENCE before each transmitter, whereas this pass inserts an LFENCE after each load. They are slightly different threat models; SESES is more strict. For example, SESES would mitigate the following vulnerability, whereas the LVI hardening will not:

uint64_t maybe_secret = *ptr;                           // architectural load
_mm_lfence();
if (is_secret) {                                        // suppose `is_secret == true` but branch mispredicts
    // do something constant-time with `maybe_secret`
} else {
    return byte_array[maybe_secret * 4096];             // speculatively leak the secret
}

I'm not opposed to merging the two approaches. But I am also not sure how to justify it.

Hi Scott, I'd like to push back on this submitted change. From what I can tell, using the SESES pass here would result in less code to maintain while fulfilling the goals of this change. That seems to be sufficient justification in a large codebase such as LLVM, though I'm open to arguments to the contrary.

Is there a reason to prefer this specific, newly implemented approach over the SESES model in this situation? I can't immediately see any benefit to adding another pass here that does largely the same thing (if in a less strict manner). I may be missing something, so please let me know if I am.

Also to be a bit clearer, I don't think it's necessary to unify the approaches. It seems like deleting the new approach and dropping in SESES would be sufficient. Let me know if that's not the case.

@zbrid From a practical perspective I think you are correct. SESES mitigates a superset of gadgets that this pass mitigates, and therefore for code reuse/maintainability reasons it would make sense to replace this pass with SESES.

From a security perspective, I think that this could become problematic. It would mean that at -O0 I would get more security than I would at -O[1-3]. IMO optimization levels should not work like that.

It may also be worth noting that this new unoptimized pass is equivalent to the behavior of the mitigation implemented for gcc through binutils. Given that, I wonder if it makes sense to use this pass at O1 or O2 and save the more costly analysis for O3.

In D80964#2097227, @craig.topper wrote:

It may also be worth noting that this new unoptimized pass is equivalent to the behavior of the mitigation implemented for gcc through binutils. Given that, I wonder if it makes sense to use this pass at O1 or O2 and save the more costly analysis for O3.

I have seen some build systems use -O2 for release builds. So if we go this route, maybe we could have unoptimized for -O0 and -O1, and optimized for -O2 and -O3.
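
The two selection policies under discussion can be compared in a small sketch. The names `OptLevel`, `LviPass`, `selectAsLanded`, and `selectAlternative` are hypothetical and are not part of the LLVM API; the first function mirrors the `-O0` check added in this revision, and the second mirrors the split suggested above.

```cpp
// Hypothetical stand-ins for illustration; not LLVM's CodeGenOpt levels.
enum class OptLevel { O0, O1, O2, O3 };
enum class LviPass { Unoptimized, Optimized };

// Policy in this revision: only -O0 uses the unoptimized pass.
LviPass selectAsLanded(OptLevel level) {
  return level == OptLevel::O0 ? LviPass::Unoptimized : LviPass::Optimized;
}

// Alternative floated in the thread: unoptimized at -O0 and -O1,
// optimized at -O2 and -O3.
LviPass selectAlternative(OptLevel level) {
  switch (level) {
  case OptLevel::O0:
  case OptLevel::O1:
    return LviPass::Unoptimized;
  case OptLevel::O2:
  case OptLevel::O3:
    return LviPass::Optimized;
  }
  return LviPass::Optimized; // not reached for valid enum values
}
```

The two policies differ only at `-O1`, so a build system that uses `-O2` for release builds would get the optimized pass under either one.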

In D80964#2097211, @sconstab wrote:

From a security perspective, I think that this could become problematic. It would mean that at -O0 I would get more security than I would at -O[1-3]. IMO optimization levels should not work like that.

I don't really think this is a concern. As long as the pass provides at least the same level of security as what users need from the LVI pass we can use SESES. The particular implementation used is hidden behind the abstraction of the compiler flag.

A related note is that it seems like the unoptimized LVI pass does not provide the exact same level of security as the graph LVI pass, so this concern is already an issue whether we use SESES or the unoptimized LVI pass. (EDITED TO ADD: Oh wait, I may have misunderstood your point. It sounds like you're saying the two LVI strategies are essentially the same, with the unoptimized one having extra LFENCEs that are redundant. That doesn't change my mind though, because one could say the same wrt optimized LVI and SESES and just add that SESES has more redundant LFENCEs than unoptimized LVI.)

In D80964#2097227, @craig.topper wrote:

It may also be worth noting that this new unoptimized pass is equivalent to the behavior of the mitigation implemented for gcc through binutils. Given that, I wonder if it makes sense to use this pass at O1 or O2 and save the more costly analysis for O3.

Ah that's an interesting point, but I'm not too sure similarity to gcc should be prioritized here. Is there a reason to value the similarities with the gcc approach? Up until now we've accepted the differences and I don't think we have new evidence suggesting the similarity should be prioritized.

I created a patch for the suggested change. Perhaps we should continue the conversation there? https://reviews.llvm.org/D82037

That doesn't change my mind though, because one could say the same wrt optimized LVI and SESES and just add that SESES has more redundant LFENCEs than unoptimized LVI.

I don't think that this is accurate. The relationships are:

Security(optimized LVI)  == Security(unoptimized LVI)
Security(optimized LVI)   < Security(SESES)
Security(unoptimized LVI) < Security(SESES)

SESES is targeting a broader threat model that encompasses non-universal Spectre v1 gadgets. Users who care about those gadgets should be using SESES outright and not the LVI passes.

Revision Contents

Path                                                          Size
llvm/lib/Target/X86/X86.h                                     2 lines
llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp    76 lines
llvm/lib/Target/X86/X86TargetMachine.cpp                      5 lines
llvm/test/CodeGen/X86/O0-pipeline.ll                          5 lines
llvm/test/CodeGen/X86/lvi-hardening-loads.ll                  46 lines

Diff 269983

llvm/lib/Target/X86/X86.h

[135 lines not shown]
 /// ways.
 FunctionPass *createX86PartialReductionPass();

 InstructionSelector *createX86InstructionSelector(const X86TargetMachine &TM,
                                                   X86Subtarget &,
                                                   X86RegisterBankInfo &);

 FunctionPass *createX86LoadValueInjectionLoadHardeningPass();
+FunctionPass *createX86LoadValueInjectionLoadHardeningUnoptimizedPass();
 FunctionPass *createX86LoadValueInjectionRetHardeningPass();
 FunctionPass *createX86SpeculativeLoadHardeningPass();
 FunctionPass *createX86SpeculativeExecutionSideEffectSuppression();

 void initializeEvexToVexInstPassPass(PassRegistry &);
 void initializeFixupBWInstPassPass(PassRegistry &);
 void initializeFixupLEAPassPass(PassRegistry &);
 void initializeFPSPass(PassRegistry &);
 void initializeWinEHStatePassPass(PassRegistry &);
 void initializeX86AvoidSFBPassPass(PassRegistry &);
 void initializeX86AvoidTrailingCallPassPass(PassRegistry &);
 void initializeX86CallFrameOptimizationPass(PassRegistry &);
 void initializeX86CmovConverterPassPass(PassRegistry &);
 void initializeX86CondBrFoldingPassPass(PassRegistry &);
 void initializeX86DomainReassignmentPass(PassRegistry &);
 void initializeX86ExecutionDomainFixPass(PassRegistry &);
 void initializeX86ExpandPseudoPass(PassRegistry &);
 void initializeX86FixupSetCCPassPass(PassRegistry &);
 void initializeX86FlagsCopyLoweringPassPass(PassRegistry &);
+void initializeX86LoadValueInjectionLoadHardeningUnoptimizedPassPass(PassRegistry &);
 void initializeX86LoadValueInjectionLoadHardeningPassPass(PassRegistry &);
 void initializeX86LoadValueInjectionRetHardeningPassPass(PassRegistry &);
 void initializeX86OptimizeLEAPassPass(PassRegistry &);
 void initializeX86PartialReductionPass(PassRegistry &);
 void initializeX86SpeculativeLoadHardeningPassPass(PassRegistry &);
 void initializeX86SpeculativeExecutionSideEffectSuppressionPass(PassRegistry &);

 namespace X86AS {
[13 lines not shown]

llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp

[816 lines not shown]
 INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
 INITIALIZE_PASS_DEPENDENCY(MachineDominanceFrontier)
 INITIALIZE_PASS_END(X86LoadValueInjectionLoadHardeningPass, PASS_KEY,
                     "X86 LVI load hardening", false, false)

 FunctionPass *llvm::createX86LoadValueInjectionLoadHardeningPass() {
   return new X86LoadValueInjectionLoadHardeningPass();
 }

+namespace {

+/// The `X86LoadValueInjectionLoadHardeningPass` above depends on expensive
+/// analysis passes that add complexity to the pipeline. This complexity
+/// can cause noticeable overhead when no optimizations are enabled, i.e., -O0.
+/// The purpose of `X86LoadValueInjectionLoadHardeningUnoptimizedPass` is to
+/// provide the same security as the optimized pass, but without adding
+/// unnecessary complexity to the LLVM pipeline.
+///
+/// The behavior of this pass is simply to insert an LFENCE after every load
+/// instruction.
+class X86LoadValueInjectionLoadHardeningUnoptimizedPass
+    : public MachineFunctionPass {
+public:
+  X86LoadValueInjectionLoadHardeningUnoptimizedPass()
+      : MachineFunctionPass(ID) {}

+  StringRef getPassName() const override {
+    return "X86 Load Value Injection (LVI) Load Hardening (Unoptimized)";
+  }
+  bool runOnMachineFunction(MachineFunction &MF) override;
+  static char ID;
+};

+} // end anonymous namespace

+char X86LoadValueInjectionLoadHardeningUnoptimizedPass::ID = 0;

+bool X86LoadValueInjectionLoadHardeningUnoptimizedPass::runOnMachineFunction(
+    MachineFunction &MF) {
+  LLVM_DEBUG(dbgs() << "***** " << getPassName() << " : " << MF.getName()
+                    << " *****\n");
+  const X86Subtarget *STI = &MF.getSubtarget<X86Subtarget>();
+  if (!STI->useLVILoadHardening())
+    return false;

+  // FIXME: support 32-bit
+  if (!STI->is64Bit())
+    report_fatal_error("LVI load hardening is only supported on 64-bit", false);

+  // Don't skip functions with the "optnone" attr but participate in opt-bisect.
+  const Function &F = MF.getFunction();
+  if (!F.hasOptNone() && skipFunction(F))
+    return false;

+  bool Modified = false;
+  ++NumFunctionsConsidered;

+  const TargetInstrInfo *TII = STI->getInstrInfo();
+  for (auto &MBB : MF) {
+    for (auto &MI : MBB) {
+      if (!MI.mayLoad() || MI.getOpcode() == X86::LFENCE ||
+          MI.getOpcode() == X86::MFENCE)
+        continue;

+      MachineBasicBlock::iterator InsertionPt =
+          MI.getNextNode() ? MI.getNextNode() : MBB.end();
+      BuildMI(MBB, InsertionPt, DebugLoc(), TII->get(X86::LFENCE));
+      ++NumFences;
+      Modified = true;
+    }
+  }

+  if (Modified)
+    ++NumFunctionsMitigated;

+  return Modified;
+}

+INITIALIZE_PASS(X86LoadValueInjectionLoadHardeningUnoptimizedPass, PASS_KEY,
+                "X86 LVI load hardening", false, false)

+FunctionPass *llvm::createX86LoadValueInjectionLoadHardeningUnoptimizedPass() {
+  return new X86LoadValueInjectionLoadHardeningUnoptimizedPass();
+}

llvm/lib/Target/X86/X86TargetMachine.cpp

[491 lines not shown]
 }
 void X86PassConfig::addMachineSSAOptimization() {
   addPass(createX86DomainReassignmentPass());
   TargetPassConfig::addMachineSSAOptimization();
 }

 void X86PassConfig::addPostRegAlloc() {
   addPass(createX86FloatingPointStackifierPass());
-  addPass(createX86LoadValueInjectionLoadHardeningPass());
+  if (getOptLevel() != CodeGenOpt::None)
+    addPass(createX86LoadValueInjectionLoadHardeningPass());
+  else
+    addPass(createX86LoadValueInjectionLoadHardeningUnoptimizedPass());
 }

 void X86PassConfig::addPreSched2() { addPass(createX86ExpandPseudoPass()); }

 void X86PassConfig::addPreEmitPass() {
   if (getOptLevel() != CodeGenOpt::None) {
     addPass(new X86ExecutionDomainFix());
     addPass(createBreakFalseDeps());
[55 lines not shown]

llvm/test/CodeGen/X86/O0-pipeline.ll

[40 lines not shown]
 ; CHECK-NEXT: MachineDominator Tree Construction
 ; CHECK-NEXT: X86 EFLAGS copy lowering
 ; CHECK-NEXT: X86 WinAlloca Expander
 ; CHECK-NEXT: Eliminate PHI nodes for register allocation
 ; CHECK-NEXT: Two-Address instruction pass
 ; CHECK-NEXT: Fast Register Allocator
 ; CHECK-NEXT: Bundle Machine CFG Edges
 ; CHECK-NEXT: X86 FP Stackifier
-; CHECK-NEXT: MachineDominator Tree Construction
-; CHECK-NEXT: Machine Natural Loop Construction
-; CHECK-NEXT: Machine Dominance Frontier Construction
-; CHECK-NEXT: X86 Load Value Injection (LVI) Load Hardening
+; CHECK-NEXT: X86 Load Value Injection (LVI) Load Hardening (Unoptimized)
 ; CHECK-NEXT: Fixup Statepoint Caller Saved
 ; CHECK-NEXT: Lazy Machine Block Frequency Analysis
 ; CHECK-NEXT: Machine Optimization Remark Emitter
 ; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
 ; CHECK-NEXT: Post-RA pseudo instruction expansion pass
 ; CHECK-NEXT: X86 pseudo instruction expansion pass
 ; CHECK-NEXT: Analyze Machine Code For Garbage Collection
 ; CHECK-NEXT: Insert fentry calls
[22 lines not shown]

llvm/test/CodeGen/X86/lvi-hardening-loads.ll

 ; RUN: llc -verify-machineinstrs -mtriple=x86_64-unknown < %s | FileCheck %s --check-prefix=X64 --check-prefix=X64-ALL
 ; RUN: llc -verify-machineinstrs -mtriple=x86_64-unknown --x86-lvi-load-no-cbranch < %s | FileCheck %s --check-prefix=X64
+; RUN: llc -O0 -verify-machineinstrs -mtriple=x86_64-unknown < %s | FileCheck %s --check-prefix=X64-NOOPT

 ; Function Attrs: noinline nounwind optnone uwtable
 define dso_local i32 @test(i32** %secret, i32 %secret_size) #0 {
 ; X64-LABEL: test:
 entry:
   %secret.addr = alloca i32**, align 8
   %secret_size.addr = alloca i32, align 4
   %ret_val = alloca i32, align 4
   %i = alloca i32, align 4
   store i32** %secret, i32*** %secret.addr, align 8
   store i32 %secret_size, i32* %secret_size.addr, align 4
   store i32 0, i32* %ret_val, align 4
   call void @llvm.x86.sse2.lfence()
   store i32 0, i32* %i, align 4
   br label %for.cond

 ; X64: # %bb.0: # %entry
 ; X64-NEXT: movq %rdi, -{{[0-9]+}}(%rsp)
 ; X64-NEXT: movl %esi, -{{[0-9]+}}(%rsp)
 ; X64-NEXT: movl $0, -{{[0-9]+}}(%rsp)
 ; X64-NEXT: lfence
 ; X64-NEXT: movl $0, -{{[0-9]+}}(%rsp)
 ; X64-NEXT: jmp .LBB0_1

+; X64-NOOPT: # %bb.0: # %entry
+; X64-NOOPT-NEXT: movq %rdi, -{{[0-9]+}}(%rsp)
+; X64-NOOPT-NEXT: movl %esi, -{{[0-9]+}}(%rsp)
+; X64-NOOPT-NEXT: movl $0, -{{[0-9]+}}(%rsp)
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: movl $0, -{{[0-9]+}}(%rsp)

 for.cond: ; preds = %for.inc, %entry
   %0 = load i32, i32* %i, align 4
   %1 = load i32, i32* %secret_size.addr, align 4
   %cmp = icmp slt i32 %0, %1
   br i1 %cmp, label %for.body, label %for.end

 ; X64: .LBB0_1: # %for.cond
 ; X64-NEXT: # =>This Inner Loop Header: Depth=1
 ; X64-NEXT: movl -{{[0-9]+}}(%rsp), %eax
 ; X64-ALL-NEXT: lfence
 ; X64-NEXT: cmpl -{{[0-9]+}}(%rsp), %eax
 ; X64-ALL-NEXT: lfence
 ; X64-NEXT: jge .LBB0_5

+; X64-NOOPT: .LBB0_1: # %for.cond
+; X64-NOOPT-NEXT: # =>This Inner Loop Header: Depth=1
+; X64-NOOPT-NEXT: movl -{{[0-9]+}}(%rsp), %eax
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: cmpl -{{[0-9]+}}(%rsp), %eax
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: jge .LBB0_6

 for.body: ; preds = %for.cond
   %2 = load i32, i32* %i, align 4
   %rem = srem i32 %2, 2
   %cmp1 = icmp eq i32 %rem, 0
   br i1 %cmp1, label %if.then, label %if.end

 ; X64: # %bb.2: # %for.body
 ; X64-NEXT: # in Loop: Header=BB0_1 Depth=1
 ; X64-NEXT: movl -{{[0-9]+}}(%rsp), %eax
 ; X64-ALL-NEXT: lfence
 ; X64-NEXT: movl %eax, %ecx
 ; X64-NEXT: shrl $31, %ecx
 ; X64-NEXT: addl %eax, %ecx
 ; X64-NEXT: andl $-2, %ecx
 ; X64-NEXT: cmpl %ecx, %eax
 ; X64-NEXT: jne .LBB0_4

+; X64-NOOPT: # %bb.2: # %for.body
+; X64-NOOPT-NEXT: # in Loop: Header=BB0_1 Depth=1
+; X64-NOOPT-NEXT: movl -{{[0-9]+}}(%rsp), %eax
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: cltd
+; X64-NOOPT-NEXT: movl $2, %ecx
+; X64-NOOPT-NEXT: idivl %ecx
+; X64-NOOPT-NEXT: cmpl $0, %edx
+; X64-NOOPT-NEXT: jne .LBB0_4

 if.then: ; preds = %for.body
   %3 = load i32**, i32*** %secret.addr, align 8
   %4 = load i32, i32* %ret_val, align 4
   %idxprom = sext i32 %4 to i64
   %arrayidx = getelementptr inbounds i32*, i32** %3, i64 %idxprom
   %5 = load i32*, i32** %arrayidx, align 8
   %6 = load i32, i32* %5, align 4
   store i32 %6, i32* %ret_val, align 4
   br label %if.end

 ; X64: # %bb.3: # %if.then
 ; X64-NEXT: # in Loop: Header=BB0_1 Depth=1
 ; X64-NEXT: movq -{{[0-9]+}}(%rsp), %rax
 ; X64-NEXT: lfence
 ; X64-NEXT: movslq -{{[0-9]+}}(%rsp), %rcx
 ; X64-NEXT: lfence
 ; X64-NEXT: movq (%rax,%rcx,8), %rax
 ; X64-NEXT: lfence
 ; X64-NEXT: movl (%rax), %eax
 ; X64-NEXT: movl %eax, -{{[0-9]+}}(%rsp)
 ; X64-NEXT: jmp .LBB0_4

+; X64-NOOPT: # %bb.3: # %if.then
+; X64-NOOPT-NEXT: # in Loop: Header=BB0_1 Depth=1
+; X64-NOOPT-NEXT: movq -{{[0-9]+}}(%rsp), %rax
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: movslq -{{[0-9]+}}(%rsp), %rcx
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: movq (%rax,%rcx,8), %rax
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: movl (%rax), %eax
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: movl %eax, -{{[0-9]+}}(%rsp)

 if.end: ; preds = %if.then, %for.body
   br label %for.inc

 for.inc: ; preds = %if.end
   %7 = load i32, i32* %i, align 4
   %inc = add nsw i32 %7, 1
   store i32 %inc, i32* %i, align 4
   br label %for.cond

+; X64-NOOPT: .LBB0_5: # %for.inc
+; X64-NOOPT-NEXT: # in Loop: Header=BB0_1 Depth=1
+; X64-NOOPT-NEXT: movl -{{[0-9]+}}(%rsp), %eax
+; X64-NOOPT-NEXT: lfence
+; X64-NOOPT-NEXT: addl $1, %eax
+; X64-NOOPT-NEXT: movl %eax, -{{[0-9]+}}(%rsp)
+; X64-NOOPT-NEXT: jmp .LBB0_1

 for.end: ; preds = %for.cond
   %8 = load i32, i32* %ret_val, align 4
   ret i32 %8
 }

 ; Function Attrs: nounwind
 declare void @llvm.x86.sse2.lfence() #1

 attributes #0 = { "target-features"="+lvi-load-hardening" }
 attributes #1 = { nounwind }