This is an archive of the discontinued LLVM Phabricator instance.

Introduce control flow speculation tracking pass for AArch64.
ClosedPublic

Authored by kristof.beyls on Nov 26 2018, 6:24 AM.


Details

Reviewers

Commits

rGe66bc1f756a8: Introduce control flow speculation tracking pass for AArch64
rL349456: Introduce control flow speculation tracking pass for AArch64

Summary

The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:

- speculative load hardening to automatically mask off data loaded in registers.
- using intrinsics to mask off data in registers as indicated by the programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices have been made in this patch.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.

- The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because:
  - The AArch64 ABI doesn't guarantee X16 to be retained across any call.
  - There currently isn't even a user interface to reserve X16.

  If needed, the choice of register to reserve could be made flexible on a per function basis.
- It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring that the instruction selectors do not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0.
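The taint update and the masking described in the list above can be modeled in a few lines of plain C++. This is only a sketch of the data flow, not code from the patch: the names `trackEdge`, `maskWithTaint` and `demoTaintTracking` are hypothetical, and it assumes the inserted instruction has the form `CSEL X16, X16, XZR, <cond>`, i.e. it keeps the taint when the branch condition really holds and selects the zero register otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model values for the taint register (X16 in the patch).
constexpr std::uint64_t kNoMisspeculation = ~std::uint64_t{0}; // all-ones
constexpr std::uint64_t kMisspeculation = 0;                   // all-zeros

// Models a CSEL placed at the start of a branch target (assumed form:
// CSEL X16, X16, XZR, <cond>). `condHolds` stands for the flags check that
// the conditional branch into this block was predicted on.
constexpr std::uint64_t trackEdge(std::uint64_t taint, bool condHolds) {
  return condHolds ? taint : kMisspeculation;
}

// Models "AND Xn, Xn, X16": the value survives when the taint is all-ones
// and is cleared when the taint is all-zeros. No flags are involved.
constexpr std::uint64_t maskWithTaint(std::uint64_t value,
                                      std::uint64_t taint) {
  return value & taint;
}

// Walks one correctly predicted edge and one miss-speculated edge.
inline bool demoTaintTracking() {
  const std::uint64_t secret = 0x1234;

  // Correct prediction: the condition guarding the edge holds.
  std::uint64_t taint = trackEdge(kNoMisspeculation, /*condHolds=*/true);
  assert(maskWithTaint(secret, taint) == secret && "value must pass through");

  // Miss-speculation: the block runs although the condition does not hold.
  taint = trackEdge(kNoMisspeculation, /*condHolds=*/false);
  assert(maskWithTaint(secret, taint) == 0 && "value must be masked off");

  // The zero taint is sticky: later correctly evaluated edges keep it at zero.
  taint = trackEdge(taint, /*condHolds=*/true);
  return taint == kMisspeculation;
}
```

The all-ones/all-zeros encoding is what makes a single AND sufficient for masking. The model leaves out the CSDB that is needed before the taint is actually used.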

Future extensions/improvements could be:

- Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking.
- No indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables.

I'm still working on a follow on patch that builds on top of this to introduce
speculative load hardening. Nonetheless, I wanted to get this out already to
start collecting feedback (and split up the functionality into smaller parts to
make it easier to review).
I'm more busy than usual in the coming 2 weeks, so may be a bit slow to react
to review feedback, I'm afraid.

Diff Detail

Repository: rL LLVM

Event Timeline

kristof.beyls created this revision. Nov 26 2018, 6:24 AM

Herald added subscribers: jfb, javed.absar, mgorny. Nov 26 2018, 6:24 AM

There currently isn't even a user interface to reserve X16.

X16 can be reserved by the user using inline assembly, either with a clobber or named register variable. I can think of a few cases where this might happen in real code:

D51432 uses it in libunwind to implement unwinding through functions using return-address signing.
Inline assembly which contains a function call will need to clobber X16 and X17.

If moving this pass to before register allocation would add a lot of extra complexity, maybe this could also be solved by copying the taint into SP before each inline asm block, and back out afterwards, like we currently do for calls?

lib/Target/AArch64/AArch64SpeculationHardening.cpp
116 (On Diff #175239): What does VR stand for here? I'd assume Virtual Register, but these are only ever physical registers.
117 (On Diff #175239): Unused variable.
test/CodeGen/AArch64/speculation-hardening.ll
11 (On Diff #175239): Should we also check that these instructions are not emitted when SLH is disabled?
test/CodeGen/AArch64/speculation-hardening.mir
11 (On Diff #175239): I think it would also be worth testing blocks ending in indirect branches, to make sure we ignore them for now.

Addressing most of Oliver's comments.

There currently isn't even a user interface to reserve X16.

X16 can be reserved by the user using inline assembly, either with a clobber or named register variable. I can think of a few cases where this might happen in real code:

D51432 uses it in libunwind to implement unwinding through functions using return-address signing.
Inline assembly which contains a function call will need to clobber X16 and X17.
If moving this pass to before register allocation would add a lot of extra complexity, maybe this could also be solved by copying the taint into SP before each inline asm block, and back out afterwards, like we currently do for calls?

Thanks for pointing out that X16 and X17 can be forced into use by inline assembly code after all.
As we've discussed, we expect very little code to actually go and use X16.

I think there are at least 3 possible solutions to this, each with their set of pros and cons.

1. Change the implementation so that it runs pre register allocation.
- Cons:
  - Control flow and memory accesses that may be introduced by later passes (e.g. spill/fill code) will not be protected.
  - The implementation will be a lot more complex.
- Pros:
  - Implementations of SLH on some other architectures may have no choice but to implement this pre register allocation. If it were implemented pre register allocation for all architectures, there might be slightly more consistency in the protection given between architectures. Maybe.

2. Go for the suggestion you've made on storing speculation state in the stack pointer before the inline assembly block and restoring it after.
- Cons:
  - This would need a code sequence to change the value in SP based on the speculation state in X16. The code sequences used at function entry/exit for this cannot be used here, because those clobber X17 and the flags, which may be live when not at a function boundary. On AArch64, there are only a few instructions that can access the stack pointer, which is why we end up using temporary register X17 in the function boundary sequences. I have come up with a code sequence to encode miss-speculation in the SP that doesn't clobber any other register, nor the flags - but only if the encoding of miss-speculation in the stack pointer is allowed to be different: the least significant bit being set to 1. Obviously, this will not work if code uses a byte-aligned stack pointer. I'm afraid I don't have a good insight into whether code would ever use a byte-aligned stack pointer.
  - The X16 register may be live across multiple basic blocks/control flow. Applying this technique would remove the miss-speculation tracking from the edges on which the X16 register is live before SLH is applied, resulting in silently dropping some protection the user may expect.
- Pros:
  - This may be a relatively simple way to still protect the very few functions that absolutely must use X16.

3. Do not apply SLH hardening to a function making hard-coded use of X16, and provide a warning (or error) when a user requests SLH on such a function.
- Cons:
  - A function will not be protected by SLH, even if it was requested. The warning/error reported will inform the user to either adapt the code in the function to not use X16 or to accept whatever risk there may be in leaving this function unprotected.
- Pros:
  - A relatively simple implementation, while still keeping the advantages of doing speculation hardening very late (after register allocation).
  - No silent non-protected code.
  - The warning is expected to trigger for only an extremely small amount of code.

Overall, option 3 seems to have the best tradeoffs to me so I'll explore implementing that option further.

This updated diff addresses Oliver's final comment, namely about what to do when the program uses X16, e.g. because inline assembly specifically requests the use of X16.
Besides all the options I came up with previously, there is a 4th option: prevent speculation by inserting DSB SYS/ISB instruction pairs. Conceptually this is not unlike using lfence in X86SpeculativeLoadHardening.
I believe the pros/cons of this approach are (compared to the 3 options I listed previously):
- Pros:
  - A relatively simple implementation.
  - Still keeping the advantages of doing speculation hardening very late.
  - No silent non-protected code.
  - This alternative protection mechanism is expected to only be needed for a very small amount of code.
- Cons:
  - This likely has higher overhead than if we could still use a data flow mechanism to track when control flow miss-speculation happens.

Overall, this method seems to offer the best trade-off out of all options listed.

LGTM.

This revision is now accepted and ready to land. Dec 17 2018, 8:47 AM

Closed by commit rL349456: Introduce control flow speculation tracking pass for AArch64 (authored by kbeyls). Dec 18 2018, 12:53 AM

This revision was automatically updated to reflect the committed changes.

Hi Kristof,
Sorry about this late review. I wanted to make sure that I understood your implementation of SLH for ARM, so I took a look and made a few comments with some questions on this first patch. Let me know your thoughts. I'm planning to take a look at the other commit you made a few days ago related to SLH too. Hopefully I didn't make too many comments that were already addressed in the follow up commit.

llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp
100	Why is this a macro instead of a const variable?
164–166	Just curious in general about this, are there cases where this is expected to happen in practice? It seems useless to have a branch that goes to the same place for both true and false.
172	How does this assert work? I don't understand how you're ANDing with the string "unknown Cond array format"
191	From the comments at the beginning of the file, I expected a CSDB instruction right after this instruction. Is that not needed or added somewhere else (like the follow up commit that adds the masking)?
192	What is a live in and why does this one get added?
222	Is this getting the instruction before the branch instruction? Then that instruction is used as the debug location for the FBB and TBB we got in lines 214/215? Did I understand this correctly?
233	Why is this code in a separate code block?
262	Why doesn't this function return bool to indicate whether instructions were modified?
269–270	Why do these instructions have an immediate operand? It looks like you're passing the default for ISB, but it's not clear to me what that number means for DSB. I checked this manual, but it doesn't say what the immediate operand is used for, only that it's limited in the value it can have: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/a64_general_alpha.html
274	Is having the zero register in ARM as the destination register equivalent to discarding the output or is the zero register affected? If it's the former, why use CMP instead of SUBS if this instruction discards the result of the instruction in the same way that CMP does?
280	Why use CSINV instead of CSETM here?
297	Why use ADD instead of MOV?
298–314	Why not the following instead of the three above instructions? AND SP, SP, TaintReg
338–339	Why do you have to get this data every time the function is called? Could these be static since we wouldn't expect them to be different between the functions in the same compilation? Or are there cases where this would be different between two functions?

efriedma added a subscriber: efriedma.Jan 11 2019, 4:31 PM

efriedma added inline comments.

llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp
269–270	The complete architecture reference manual is publicly available at https://developer.arm.com/docs/ddi0487/latest . This should allow you to quickly answer all your questions about the instruction choices.
274	CMP is an alias.
280	CSINV is an alias.
297	MOV is an alias.
298–314	There is no such encoding.
338–339	These can change between functions (with LTO, or certain function attributes).

In D54896#1354951, @zbrid wrote:

Hi Kristof,
Sorry about this late review. I wanted to make sure that I understood your implementation of SLH for ARM, so I took a look and made a few comments with some questions on this first patch. Let me know your thoughts. I'm planning to take a look at the other commit you made a few days ago related to SLH too. Hopefully I didn't make too many comments that were already addressed in the follow up commit.

Hi Zola,

I am very happy with you taking a look and asking those questions! Thank you!
I have tried to give clear answers to your questions. Please don't hold back in asking further if anything remains unclear.

llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp
100	I just followed existing practice in the other LLVM passes. Maybe this could be a const variable - not sure. If so, maybe best to make the change in a separate patch with a focus on making this change for all existing passes?
164–166	This pass has seen quite a few iterations over the past couple of months. I'm sure I have seen examples of TBB == FBB in some cases while running the pass across a range of test codes, but I'm afraid I no longer remember where I've seen the example...
172	This is also an idiom used in LLVM. Anding with the string results in some documentation to be printed (the string) when the assert fires. The string in the assert is often helpful as the actual condition check may be a bit cryptic if you hit an assert you haven't written yourself recently. For more details, see https://llvm.org/docs/CodingStandards.html#assert-liberally.
191	The CSDB instruction is needed before the register containing the taint is actually used. In this patch, no uses of the register containing the taint are introduced - that is done in the follow-on patch. So, indeed, you'll see CSDB instructions being inserted as part of the follow-on patch - where uses of the taint register also get introduced.
192	The LiveIns record which registers are live at the start of the basic block. Here, the CSEL instruction inserted at the start of the basic block uses (implicitly) the flags AArch64::NZCV. Those flags were set by the previous basic block, where the conditional jump lives that jumps to the current basic block. Therefore, here we have to explicitly mark that the flags are live at the start of the basic block.
222	instr_end() gets the "one past the last instruction" in the basic block, so "--MBB.instr_end()" gets the last instruction in the basic block MBB. So, the instruction that is used as the debug location is the last instruction. Presumably that last instruction will be the conditional branch instruction, so the tracking code (CSEL) will get the same debug info as the conditional branch instruction for which it's tracking miss-speculation.
233	There isn't really a need for it - it's just a personal style where I've used the block to indicate the conceptual extent of the "perform correct code generation around function calls and before returns". I'll remove the nesting level in an update - there is indeed no need for it and it deviates from existing practice.
262	This function always modifies MBB - so there is no need to signal back to the caller whether instructions were modified.
269–270	As Eli pointed to - you should be able to find the information in the location he pointed to. For example, in section "C6.2.75 DSB", the details state "...SY ... Encoded as CRm = 0b1111."
274	The xzr register exists in AArch64 ("64-bit Arm"), but not in ARM ("32-bit Arm"). Your understanding is otherwise correct. Let me quote the ArmARM ("Arm Architecture Reference Manual"): "The name XZR represents the zero register for 64-bit operands where an encoding of the value 31 in the corresponding register field is interpreted as returning zero when read or discarding the result when written." A little bit of background on instruction aliases: quite a few "instructions" you write in assembly, like CMP, are actually encoded as more general instructions - in this case SUBS. So, the "CMP" instruction only really exists at the assembly level (for convenience) - but once encoded it is just a SUBS instruction. To quote the ArmARM again on the CMP instruction: "CMP <Xn|SP>, #<imm>{, <shift>} is equivalent to SUBS XZR, <Xn|SP>, #<imm> {, <shift>} and is always the preferred disassembly." In MachineInstrs, these aliases are not available, only the "real" encodable instructions. So when creating an instruction, you need to use SUBS rather than CMP. In the comments, I have tried to put in the mapping from the alias to the encodable instruction, as I personally find it handy. I hope that makes sense?
298–314	As Eli said. More generally, only the following instructions can write to the SP register in AArch64: ADD, SUB, NEG, NEGS, ADDS, SUBS. Therefore, to be able to AND the SP with another register and write that value to SP, we first need to move the SP to another register.
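The function-boundary sequences discussed in the comments on lines 274 to 314 can be sketched as a small register model. This is an illustration under assumptions rather than the pass itself: the `Regs` struct and both function names are hypothetical, and the instruction sequences in the comments (MOV/AND/MOV through X17 before a call or return, CMP plus CSETM afterwards) are reconstructed from the aliases named in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register file holding only the registers the sketch needs.
struct Regs {
  std::uint64_t sp;  // stack pointer
  std::uint64_t x16; // taint register: all-ones or all-zeros
  std::uint64_t x17; // temporary, needed because AND cannot write SP
};

// Before a call or return: fold the taint into SP. With an all-zeros taint
// SP becomes 0, which is how miss-speculation is encoded across the boundary.
inline void encodeTaintInSP(Regs &r) {
  assert((r.x16 == 0 || r.x16 == ~std::uint64_t{0}) &&
         "taint must be all-ones or all-zeros");
  r.x17 = r.sp;   // MOV X17, SP   (an alias of ADD X17, SP, #0)
  r.x17 &= r.x16; // AND X17, X17, X16
  r.sp = r.x17;   // MOV SP, X17   (an alias of ADD SP, X17, #0)
}

// At function entry or after a call returns: rebuild the taint from SP.
inline void recoverTaintFromSP(Regs &r) {
  // CMP SP, #0     (an alias of SUBS XZR, SP, #0)
  // CSETM X16, NE  (an alias of CSINV X16, XZR, XZR, EQ)
  r.x16 = (r.sp != 0) ? ~std::uint64_t{0} : std::uint64_t{0};
}
```

With a non-zero SP and an all-ones taint, `encodeTaintInSP` leaves SP unchanged and `recoverTaintFromSP` yields all-ones again; with an all-zeros taint SP becomes 0 and the recovered taint stays all-zeros. The sanity check in `encodeTaintInSP` uses the `assert(cond && "message")` idiom explained in the comment on line 172.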

Revision Contents

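Path                                                              Size
llvm/trunk/lib/Target/AArch64/AArch64.h                           2 lines
llvm/trunk/lib/Target/AArch64/AArch64FastISel.cpp                 7 lines
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp             18 lines
llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp                7 lines
llvm/trunk/lib/Target/AArch64/AArch64InstructionSelector.cpp      34 lines
llvm/trunk/lib/Target/AArch64/AArch64RegisterInfo.cpp             4 lines
llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp     368 lines
llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp            11 lines
llvm/trunk/lib/Target/AArch64/CMakeLists.txt                      1 line
llvm/trunk/test/CodeGen/AArch64/O0-pipeline.ll                    1 line
llvm/trunk/test/CodeGen/AArch64/O3-pipeline.ll                    1 line
llvm/trunk/test/CodeGen/AArch64/speculation-hardening-dagisel.ll  71 lines
llvm/trunk/test/CodeGen/AArch64/speculation-hardening.ll          156 lines
llvm/trunk/test/CodeGen/AArch64/speculation-hardening.mir         117 lines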

Diff 178615

llvm/trunk/lib/Target/AArch64/AArch64.h

[33 lines not shown]
 FunctionPass *createAArch64CondBrTuning();
 FunctionPass *createAArch64CompressJumpTablesPass();
 FunctionPass *createAArch64ConditionalCompares();
 FunctionPass *createAArch64AdvSIMDScalar();
 FunctionPass *createAArch64ISelDag(AArch64TargetMachine &TM,
                                    CodeGenOpt::Level OptLevel);
 FunctionPass *createAArch64StorePairSuppressPass();
 FunctionPass *createAArch64ExpandPseudoPass();
+FunctionPass *createAArch64SpeculationHardeningPass();
 FunctionPass *createAArch64LoadStoreOptimizationPass();
 FunctionPass *createAArch64SIMDInstrOptPass();
 ModulePass *createAArch64PromoteConstantPass();
 FunctionPass *createAArch64ConditionOptimizerPass();
 FunctionPass *createAArch64A57FPLoadBalancing();
 FunctionPass *createAArch64A53Fix835769();
 FunctionPass *createFalkorHWPFFixPass();
 FunctionPass *createFalkorMarkStridedAccessesPass();
[13 lines not shown]
 void initializeAArch64BranchTargetsPass(PassRegistry&);
 void initializeAArch64CollectLOHPass(PassRegistry&);
 void initializeAArch64CondBrTuningPass(PassRegistry &);
 void initializeAArch64CompressJumpTablesPass(PassRegistry&);
 void initializeAArch64ConditionalComparesPass(PassRegistry&);
 void initializeAArch64ConditionOptimizerPass(PassRegistry&);
 void initializeAArch64DeadRegisterDefinitionsPass(PassRegistry&);
 void initializeAArch64ExpandPseudoPass(PassRegistry&);
+void initializeAArch64SpeculationHardeningPass(PassRegistry&);
 void initializeAArch64LoadStoreOptPass(PassRegistry&);
 void initializeAArch64SIMDInstrOptPass(PassRegistry&);
 void initializeAArch64PreLegalizerCombinerPass(PassRegistry&);
 void initializeAArch64PromoteConstantPass(PassRegistry&);
 void initializeAArch64RedundantCopyEliminationPass(PassRegistry&);
 void initializeAArch64StorePairSuppressPass(PassRegistry&);
 void initializeFalkorHWPFFixPass(PassRegistry&);
 void initializeFalkorMarkStridedAccessesLegacyPass(PassRegistry&);
 void initializeLDTLSCleanupPass(PassRegistry&);
 } // end namespace llvm

 #endif

llvm/trunk/lib/Target/AArch64/AArch64FastISel.cpp

[2,252 lines not shown; context: case CmpInst::ICMP_UGE:]
     return AArch64CC::HS;
   case CmpInst::ICMP_ULT:
     return AArch64CC::LO;
   }
 }

 /// Try to emit a combined compare-and-branch instruction.
 bool AArch64FastISel::emitCompareAndBranch(const BranchInst *BI) {
+  // Speculation tracking/SLH assumes that optimized TB(N)Z/CB(N)Z instructions
+  // will not be produced, as they are conditional branch instructions that do
+  // not set flags.
+  if (FuncInfo.MF->getFunction().hasFnAttribute(
+          Attribute::SpeculativeLoadHardening))
+    return false;
+
   assert(isa<CmpInst>(BI->getCondition()) && "Expected cmp instruction");
   const CmpInst *CI = cast<CmpInst>(BI->getCondition());
   CmpInst::Predicate Predicate = optimizeCmpPredicate(CI);

   const Value *LHS = CI->getOperand(0);
   const Value *RHS = CI->getOperand(1);

   MVT VT;
[2,915 lines not shown]

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

[4,337 lines not shown]
 SDValue AArch64TargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {
   SDValue Chain = Op.getOperand(0);
   ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(1))->get();
   SDValue LHS = Op.getOperand(2);
   SDValue RHS = Op.getOperand(3);
   SDValue Dest = Op.getOperand(4);
   SDLoc dl(Op);

+  MachineFunction &MF = DAG.getMachineFunction();
+  // Speculation tracking/SLH assumes that optimized TB(N)Z/CB(N)Z instructions
+  // will not be produced, as they are conditional branch instructions that do
+  // not set flags.
+  bool ProduceNonFlagSettingCondBr =
+      !MF.getFunction().hasFnAttribute(Attribute::SpeculativeLoadHardening);
+
   // Handle f128 first, since lowering it will result in comparing the return
   // value of a libcall against zero, which is just what the rest of LowerBR_CC
   // is expecting to deal with.
   if (LHS.getValueType() == MVT::f128) {
     softenSetCCOperands(DAG, MVT::f128, LHS, RHS, CC, dl);

     // If softenSetCCOperands returned a scalar, we need to compare the result
     // against zero to select between true and false values.
[26 lines not shown; context: SDValue AArch64TargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const {]

   if (LHS.getValueType().isInteger()) {
     assert((LHS.getValueType() == RHS.getValueType()) &&
            (LHS.getValueType() == MVT::i32 || LHS.getValueType() == MVT::i64));

     // If the RHS of the comparison is zero, we can potentially fold this
     // to a specialized branch.
     const ConstantSDNode *RHSC = dyn_cast<ConstantSDNode>(RHS);
-    if (RHSC && RHSC->getZExtValue() == 0) {
+    if (RHSC && RHSC->getZExtValue() == 0 && ProduceNonFlagSettingCondBr) {
       if (CC == ISD::SETEQ) {
         // See if we can use a TBZ to fold in an AND as well.
         // TBZ has a smaller branch displacement than CBZ. If the offset is
         // out of bounds, a late MI-layer pass rewrites branches.
         // 403.gcc is an example that hits this case.
         if (LHS.getOpcode() == ISD::AND &&
             isa<ConstantSDNode>(LHS.getOperand(1)) &&
             isPowerOf2_64(LHS.getConstantOperandVal(1))) {
[26 lines not shown; context: if (RHSC && RHSC->getZExtValue() == 0 && ProduceNonFlagSettingCondBr) {]
         // (a.k.a. TST) and the test in the test bit and branch instruction
         // becomes redundant. This would also increase register pressure.
         uint64_t Mask = LHS.getValueSizeInBits() - 1;
         return DAG.getNode(AArch64ISD::TBNZ, dl, MVT::Other, Chain, LHS,
                            DAG.getConstant(Mask, dl, MVT::i64), Dest);
       }
     }
     if (RHSC && RHSC->getSExtValue() == -1 && CC == ISD::SETGT &&
-        LHS.getOpcode() != ISD::AND) {
+        LHS.getOpcode() != ISD::AND && ProduceNonFlagSettingCondBr) {
       // Don't combine AND since emitComparison converts the AND to an ANDS
       // (a.k.a. TST) and the test in the test bit and branch instruction
       // becomes redundant. This would also increase register pressure.
       uint64_t Mask = LHS.getValueSizeInBits() - 1;
       return DAG.getNode(AArch64ISD::TBZ, dl, MVT::Other, Chain, LHS,
                          DAG.getConstant(Mask, dl, MVT::i64), Dest);
     }

[6,362 lines not shown; context: SDValue performCONDCombine(SDNode *N,]

   return SDValue(N, 0);
 }

 // Optimize compare with zero and branch.
 static SDValue performBRCONDCombine(SDNode *N,
                                     TargetLowering::DAGCombinerInfo &DCI,
                                     SelectionDAG &DAG) {
+  MachineFunction &MF = DAG.getMachineFunction();
+  // Speculation tracking/SLH assumes that optimized TB(N)Z/CB(N)Z instructions
+  // will not be produced, as they are conditional branch instructions that do
+  // not set flags.
+  if (MF.getFunction().hasFnAttribute(Attribute::SpeculativeLoadHardening))
+    return SDValue();
+
   if (SDValue NV = performCONDCombine(N, DCI, DAG, 2, 3))
     N = NV.getNode();
   SDValue Chain = N->getOperand(0);
   SDValue Dest = N->getOperand(1);
   SDValue CCVal = N->getOperand(2);
   SDValue Cmp = N->getOperand(3);

   assert(isa<ConstantSDNode>(CCVal) && "Expected a ConstantSDNode here!");
[994 lines not shown]

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp

[958 lines not shown; context: bool AArch64InstrInfo::areMemAccessesTriviallyDisjoint(]
   return false;
 }

 bool AArch64InstrInfo::isSchedulingBoundary(const MachineInstr &MI,
                                             const MachineBasicBlock *MBB,
                                             const MachineFunction &MF) const {
   if (TargetInstrInfo::isSchedulingBoundary(MI, MBB, MF))
     return true;
+  switch (MI.getOpcode()) {
+  case AArch64::DSB:
+  case AArch64::ISB:
+    // DSB and ISB also are scheduling barriers.
+    return true;
+  default:;
+  }
   return isSEHInstruction(MI);
 }

 /// analyzeCompare - For a comparison instruction, return the source registers
 /// in SrcReg and SrcReg2, and the value it compares against in CmpValue.
 /// Return true if the comparison instruction can be analyzed.
 bool AArch64InstrInfo::analyzeCompare(const MachineInstr &MI, unsigned &SrcReg,
                                       unsigned &SrcReg2, int &CmpMask,
[4,554 lines not shown]

llvm/trunk/lib/Target/AArch64/AArch64InstructionSelector.cpp

[...] if (Ty.getSizeInBits() > 32) {
       LLVM_DEBUG(dbgs() << "G_BRCOND has type: " << Ty
                         << ", expected at most 32-bits");
       return false;
     }

     const unsigned CondReg = I.getOperand(0).getReg();
     MachineBasicBlock *DestMBB = I.getOperand(1).getMBB();

-    if (selectCompareBranch(I, MF, MRI))
+    // Speculation tracking/SLH assumes that optimized TB(N)Z/CB(N)Z
+    // instructions will not be produced, as they are conditional branch
+    // instructions that do not set flags.
+    bool ProduceNonFlagSettingCondBr =
+        !MF.getFunction().hasFnAttribute(Attribute::SpeculativeLoadHardening);
+    if (ProduceNonFlagSettingCondBr && selectCompareBranch(I, MF, MRI))
       return true;

+    if (ProduceNonFlagSettingCondBr) {
       auto MIB = BuildMI(MBB, I, I.getDebugLoc(), TII.get(AArch64::TBNZW))
                      .addUse(CondReg)
                      .addImm(/*bit offset=*/0)
                      .addMBB(DestMBB);

       I.eraseFromParent();
       return constrainSelectedInstRegOperands(*MIB.getInstr(), TII, TRI, RBI);
+    } else {
+      auto CMP = BuildMI(MBB, I, I.getDebugLoc(), TII.get(AArch64::ANDSWri))
+                     .addDef(AArch64::WZR)
+                     .addUse(CondReg)
+                     .addImm(1);
+      constrainSelectedInstRegOperands(*CMP.getInstr(), TII, TRI, RBI);
+      auto Bcc =
+          BuildMI(MBB, I, I.getDebugLoc(), TII.get(AArch64::Bcc))
+              .addImm(AArch64CC::EQ)
+              .addMBB(DestMBB);
+
+      I.eraseFromParent();
+      return constrainSelectedInstRegOperands(*Bcc.getInstr(), TII, TRI, RBI);
+    }
   }

   case TargetOpcode::G_BRINDIRECT: {
     I.setDesc(TII.get(AArch64::BR));
     return constrainSelectedInstRegOperands(I, TII, TRI, RBI);
   }

   case TargetOpcode::G_FCONSTANT:
[...]

llvm/trunk/lib/Target/AArch64/AArch64RegisterInfo.cpp

[...] AArch64RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   for (size_t i = 0; i < AArch64::GPR32commonRegClass.getNumRegs(); ++i) {
     if (MF.getSubtarget<AArch64Subtarget>().isXRegisterReserved(i))
       markSuperRegs(Reserved, AArch64::GPR32commonRegClass.getRegister(i));
   }

   if (hasBasePointer(MF))
     markSuperRegs(Reserved, AArch64::W19);

+  // SLH uses register W16/X16 as the taint register.
+  if (MF.getFunction().hasFnAttribute(Attribute::SpeculativeLoadHardening))
+    markSuperRegs(Reserved, AArch64::W16);
+
   assert(checkAllSuperRegsMarked(Reserved));
   return Reserved;
 }

 bool AArch64RegisterInfo::isReservedReg(const MachineFunction &MF,
                                         unsigned Reg) const {
   return getReservedRegs(MF)[Reg];
 }
[...]

llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp

				//===- AArch64SpeculationHardening.cpp - Harden Against Misspeculation ---===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains a pass to insert code to mitigate against side channel
				// vulnerabilities that may happen under control flow miss-speculation.
				//
				// The pass implements tracking of control flow miss-speculation into a "taint"
				// register. That taint register can then be used to mask off registers with
				// sensitive data when executing under miss-speculation, a.k.a. "transient
				// execution".
				// This pass is aimed at mitigating against SpectreV1-style vulnerabilities.
				//
				// At the moment, it implements the tracking of miss-speculation of control
				// flow into a taint register, but doesn't implement a mechanism yet to then
				// use that taint register to mask off vulnerable data in registers (something
				// for a follow-on improvement). Possible strategies to mask out vulnerable
				// data that can be implemented on top of this are:
				// - speculative load hardening to automatically mask off data loaded
				// in registers.
				// - using intrinsics to mask off data in registers as indicated by the
				// programmer (see https://lwn.net/Articles/759423/).
				//
				// For AArch64, the following implementation choices are made below.
				// Some of these are different than the implementation choices made in
				// the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
				// the instruction set characteristics result in different trade-offs.
				// - The speculation hardening is done after register allocation. With a
				// relative abundance of registers, one register is reserved (X16) to be
				// the taint register. X16 is expected to not clash with other register
				// reservation mechanisms with very high probability because:
				// . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
				// . The only way to request X16 to be used as a programmer is through
				// inline assembly. In the rare case a function explicitly demands to
				// use X16/W16, this pass falls back to hardening against speculation
				// by inserting a DSB SYS/ISB barrier pair which will prevent control
				// flow speculation.
				// - It is easy to insert mask operations at this late stage as we have
				// mask operations available that don't set flags.
				// - The taint variable contains all-ones when no miss-speculation is detected,
				// and contains all-zeros when miss-speculation is detected. Therefore, when
				// masking, an AND instruction (which only changes the register to be masked,
				// no other side effects) can easily be inserted anywhere that's needed.
				// - The tracking of miss-speculation is done by using a data-flow conditional
				// select instruction (CSEL) to evaluate the flags that were also used to
				// make conditional branch direction decisions. Speculation of the CSEL
				// instruction can be limited with a CSDB instruction - so the combination of
				// CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
				// aren't speculated. When conditional branch direction gets miss-speculated,
				// the semantics of the inserted CSEL instruction is such that the taint
				// register will contain all zero bits.
				// One key requirement for this to work is that the conditional branch is
				// followed by an execution of the CSEL instruction, where the CSEL
				// instruction needs to use the same flags status as the conditional branch.
				// This means that the conditional branches must not be implemented as one
				// of the AArch64 conditional branches that do not use the flags as input
				// (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
				// selectors to not produce these instructions when speculation hardening
				// is enabled. This pass will assert if it does encounter such an instruction.
				// - On function call boundaries, the miss-speculation state is transferred from
				// the taint register X16 to be encoded in the SP register as value 0.
				//
				// Future extensions/improvements could be:
				// - Implement this functionality using full speculation barriers, akin to the
				// x86-slh-lfence option. This may be more useful for the intrinsics-based
				// approach than for the SLH approach to masking.
				// Note that this pass already inserts the full speculation barriers if the
				// function for some niche reason makes use of X16/W16.
				// - no indirect branch misprediction gets protected/instrumented; but this
				// could be done for some indirect branches, such as switch jump tables.
				//===----------------------------------------------------------------------===//

				#include "AArch64InstrInfo.h"
				#include "AArch64Subtarget.h"
				#include "Utils/AArch64BaseInfo.h"
				#include "llvm/ADT/BitVector.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineOperand.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/IR/DebugLoc.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/CodeGen.h"
				#include "llvm/Target/TargetMachine.h"
				#include <cassert>

				using namespace llvm;

				#define DEBUG_TYPE "aarch64-speculation-hardening"

				#define AARCH64_SPECULATION_HARDENING_NAME "AArch64 speculation hardening pass"
				zbrid: Why is this a macro instead of a const variable?
				kristof.beyls (author): I just followed existing practice in the other LLVM passes. Maybe this could be a const variable - not sure. If so, maybe best to make the change in a separate patch with a focus on making this change for all existing passes?

				namespace {

				class AArch64SpeculationHardening : public MachineFunctionPass {
				public:
				const TargetInstrInfo *TII;
				const TargetRegisterInfo *TRI;

				static char ID;

				AArch64SpeculationHardening() : MachineFunctionPass(ID) {
				initializeAArch64SpeculationHardeningPass(*PassRegistry::getPassRegistry());
				}

				bool runOnMachineFunction(MachineFunction &Fn) override;

				StringRef getPassName() const override {
				return AARCH64_SPECULATION_HARDENING_NAME;
				}

				private:
				unsigned MisspeculatingTaintReg;
				bool UseControlFlowSpeculationBarrier;

				bool functionUsesHardeningRegister(MachineFunction &MF) const;
				bool instrumentControlFlow(MachineBasicBlock &MBB);
				bool endsWithCondControlFlow(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
				MachineBasicBlock *&FBB,
				AArch64CC::CondCode &CondCode) const;
				void insertTrackingCode(MachineBasicBlock &SplitEdgeBB,
				AArch64CC::CondCode &CondCode, DebugLoc DL) const;
				void insertSPToRegTaintPropagation(MachineBasicBlock *MBB,
				MachineBasicBlock::iterator MBBI) const;
				void insertRegToSPTaintPropagation(MachineBasicBlock *MBB,
				MachineBasicBlock::iterator MBBI,
				unsigned TmpReg) const;
				};

				} // end anonymous namespace

				char AArch64SpeculationHardening::ID = 0;

				INITIALIZE_PASS(AArch64SpeculationHardening, "aarch64-speculation-hardening",
				AARCH64_SPECULATION_HARDENING_NAME, false, false)

				bool AArch64SpeculationHardening::endsWithCondControlFlow(
				MachineBasicBlock &MBB, MachineBasicBlock *&TBB, MachineBasicBlock *&FBB,
				AArch64CC::CondCode &CondCode) const {
				SmallVector<MachineOperand, 1> analyzeBranchCondCode;
				if (TII->analyzeBranch(MBB, TBB, FBB, analyzeBranchCondCode, false))
				return false;

				// Ignore if the BB ends in an unconditional branch/fall-through.
				if (analyzeBranchCondCode.empty())
				return false;

				// If the BB ends with a single conditional branch, FBB will be set to
				// nullptr (see API docs for TII->analyzeBranch). For the rest of the
				// analysis we want the FBB block to be set always.
				assert(TBB != nullptr);
				if (FBB == nullptr)
				FBB = MBB.getFallThrough();

				// If both the true and the false condition jump to the same basic block,
				// there isn't need for any protection - whether the branch is speculated
				// correctly or not, we end up executing the architecturally correct code.
				zbrid: Just curious in general about this, are there cases where this is expected to happen in practice? It seems useless to have a branch that goes to the same place for both true and false.
				kristof.beyls (author): This pass has seen quite a few iterations over the past couple of months. I'm sure I have seen examples of TBB == FBB in some cases while running the pass across a range of test codes, but I'm afraid I no longer remember where I've seen the example...
				if (TBB == FBB)
				return false;

				assert(MBB.succ_size() == 2);
				// translate analyzeBranchCondCode to CondCode.
				assert(analyzeBranchCondCode.size() == 1 && "unknown Cond array format");
				zbrid: How does this assert work? I don't understand how you're ANDing with the string "unknown Cond array format"
				kristof.beyls (author): This is also an idiom used in LLVM. ANDing with the string results in some documentation to be printed (the string) when the assert fires. The string in the assert is often helpful as the actual condition check may be a bit cryptic if you hit an assert you haven't written yourself recently. For more details, see https://llvm.org/docs/CodingStandards.html#assert-liberally.
				CondCode = AArch64CC::CondCode(analyzeBranchCondCode[0].getImm());
				return true;
				}
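
The `assert(cond && "message")` idiom discussed in the inline comments above can be tried in isolation. The following is a minimal standalone sketch, not code from the patch: `singleCondCode` is a hypothetical helper, and `std::vector<int>` is a stand-in for the `SmallVector<MachineOperand, 1>` that `analyzeBranch` fills in.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the patch): returns the only element of
// Cond, asserting first that the vector has the expected shape.
int singleCondCode(const std::vector<int> &Cond) {
  // A string literal converts to a non-null pointer, which is always true in
  // a boolean context. The `&&` therefore never changes the result of the
  // size check; the text is only there to be shown when the assert fires.
  assert(Cond.size() == 1 && "unknown Cond array format");
  return Cond[0];
}
```

If the size check fails in a build with assertions enabled, the printed diagnostic includes the whole asserted expression, and therefore the message string.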

				void AArch64SpeculationHardening::insertTrackingCode(
				MachineBasicBlock &SplitEdgeBB, AArch64CC::CondCode &CondCode,
				DebugLoc DL) const {
				if (UseControlFlowSpeculationBarrier) {
				// insert full control flow speculation barrier (DSB SYS + ISB)
				BuildMI(SplitEdgeBB, SplitEdgeBB.begin(), DL, TII->get(AArch64::ISB))
				.addImm(0xf);
				BuildMI(SplitEdgeBB, SplitEdgeBB.begin(), DL, TII->get(AArch64::DSB))
				.addImm(0xf);
				} else {
				BuildMI(SplitEdgeBB, SplitEdgeBB.begin(), DL, TII->get(AArch64::CSELXr))
				.addDef(MisspeculatingTaintReg)
				.addUse(MisspeculatingTaintReg)
				.addUse(AArch64::XZR)
				.addImm(CondCode);
				zbrid: From the comments at the beginning of the file, I expected a CSDB instruction right after this instruction. Is that not needed or added somewhere else (like the follow up commit that adds the masking)?
				kristof.beyls (author): The CSDB instruction is needed before the register containing the taint is actually used. In this patch, no uses of the register containing the taint are introduced - that is done in the follow-on patch. So, indeed, you'll see CSDB instructions being inserted as part of the follow-on patch - where uses of the taint register also get introduced.
				SplitEdgeBB.addLiveIn(AArch64::NZCV);
				zbrid: What is a live in and why does this one get added?
				kristof.beyls (author): The LiveIns record which registers are live at the start of the basic block. Here, the CSEL instruction inserted at the start of the basic block uses (implicitly) the flags AArch64::NZCV. Those flags were set by the previous basic block, where the conditional jump lives that jumps to the current basic block. Therefore, here we have to explicitly mark that the flags are live at the start of the basic block.
				}
				}
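
To make the effect of the inserted CSEL concrete, here is a small standalone C++ model of the taint update on one split edge. It is a sketch under assumptions, not code from the patch: `csel` and `trackEdge` are hypothetical names, registers are modelled as `uint64_t`, and the `FlagsSatisfyEdgeCond` parameter stands for whether the NZCV flags really satisfy the condition code attached to that edge.

```cpp
#include <cassert>
#include <cstdint>

// Models "CSEL Xd, Xn, Xm, cond": Xd = cond ? Xn : Xm.
uint64_t csel(uint64_t Xn, uint64_t Xm, bool CondHolds) {
  return CondHolds ? Xn : Xm;
}

// Models the tracking code placed at the start of a split edge block:
//   CSEL X16, X16, XZR, <condition of this edge>
// Taint is all-ones while no misspeculation has been detected.
uint64_t trackEdge(uint64_t Taint, bool FlagsSatisfyEdgeCond) {
  return csel(Taint, /*XZR=*/0, FlagsSatisfyEdgeCond);
}
```

On the architecturally correct edge the flags satisfy the edge's condition and the taint register keeps its value. On a misspeculated edge they do not, so the taint becomes all-zeros and stays zero across later edges.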

				bool AArch64SpeculationHardening::instrumentControlFlow(
				MachineBasicBlock &MBB) {
				LLVM_DEBUG(dbgs() << "Instrument control flow tracking on MBB: " << MBB);

				bool Modified = false;
				MachineBasicBlock *TBB = nullptr;
				MachineBasicBlock *FBB = nullptr;
				AArch64CC::CondCode CondCode;

				if (!endsWithCondControlFlow(MBB, TBB, FBB, CondCode)) {
				LLVM_DEBUG(dbgs() << "... doesn't end with CondControlFlow\n");
				} else {
				// Now insert:
				// "CSEL MisSpeculatingR, MisSpeculatingR, XZR, cond" on the True edge and
				// "CSEL MisSpeculatingR, MisSpeculatingR, XZR, Invertcond" on the False
				// edge.
				AArch64CC::CondCode InvCondCode = AArch64CC::getInvertedCondCode(CondCode);

				MachineBasicBlock *SplitEdgeTBB = MBB.SplitCriticalEdge(TBB, *this);
				MachineBasicBlock *SplitEdgeFBB = MBB.SplitCriticalEdge(FBB, *this);

				assert(SplitEdgeTBB != nullptr);
				assert(SplitEdgeFBB != nullptr);

				DebugLoc DL;
				if (MBB.instr_end() != MBB.instr_begin())
				DL = (--MBB.instr_end())->getDebugLoc();
				zbrid: Is this getting the instruction before the branch instruction? Then that instruction is used as the debug location for the FBB and TBB we got in lines 214/215? Did I understand this correctly?
				kristof.beyls (author): instr_end() gets the "one past the last instruction" in the basic block, so "--MBB.instr_end()" gets the last instruction in the basic block MBB. So, the instruction that is used as the debug location is the last instruction. Presumably that last instruction will be the conditional branch instruction, so the tracking code (CSEL) will get the same debug info as the conditional branch instruction for which it's tracking miss-speculation.

				insertTrackingCode(*SplitEdgeTBB, CondCode, DL);
				insertTrackingCode(*SplitEdgeFBB, InvCondCode, DL);

				LLVM_DEBUG(dbgs() << "SplitEdgeTBB: " << *SplitEdgeTBB << "\n");
				LLVM_DEBUG(dbgs() << "SplitEdgeFBB: " << *SplitEdgeFBB << "\n");
				Modified = true;
				}

				// Perform correct code generation around function calls and before returns.
				{
				zbrid: Why is this code in a separate code block?
				kristof.beyls (author): There isn't really a need for it - it's just a personal style where I've used the block to indicate the conceptual extent of the "perform correct code generation around function calls and before returns". I'll remove the nesting level in an update - there is indeed no need for it and it deviates from existing practice.
				SmallVector<MachineInstr *, 4> ReturnInstructions;
				SmallVector<MachineInstr *, 4> CallInstructions;

				for (MachineInstr &MI : MBB) {
				if (MI.isReturn())
				ReturnInstructions.push_back(&MI);
				else if (MI.isCall())
				CallInstructions.push_back(&MI);
				}

				Modified |=
				(ReturnInstructions.size() > 0) || (CallInstructions.size() > 0);

				for (MachineInstr *Return : ReturnInstructions)
				insertRegToSPTaintPropagation(Return->getParent(), Return, AArch64::X17);
				for (MachineInstr *Call : CallInstructions) {
				// Just after the call:
				MachineBasicBlock::iterator i = Call;
				i++;
				insertSPToRegTaintPropagation(Call->getParent(), i);
				// Just before the call:
				insertRegToSPTaintPropagation(Call->getParent(), Call, AArch64::X17);
				}
				}

				return Modified;
				}

				void AArch64SpeculationHardening::insertSPToRegTaintPropagation(
				zbrid: Why doesn't this function return bool to indicate whether instructions were modified?
				kristof.beyls (author): This function always modifies MBB - so there is no need to signal back to the caller whether instructions were modified.
				MachineBasicBlock *MBB, MachineBasicBlock::iterator MBBI) const {
				// If full control flow speculation barriers are used, emit a control flow
				// barrier to block potential miss-speculation in flight coming in to this
				// function.
				if (UseControlFlowSpeculationBarrier) {
				// insert full control flow speculation barrier (DSB SYS + ISB)
				BuildMI(*MBB, MBBI, DebugLoc(), TII->get(AArch64::DSB)).addImm(0xf);
				BuildMI(*MBB, MBBI, DebugLoc(), TII->get(AArch64::ISB)).addImm(0xf);
				zbrid: Why do these instructions have an immediate operand? It looks like you're passing the default for ISB, but it's not clear to me what that number means for DSB. I checked this manual, but it doesn't say what the immediate operand is used for, only that it's limited in the value it can have: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/a64_general_alpha.html
				efriedma: The complete architecture reference manual is publicly available at https://developer.arm.com/docs/ddi0487/latest . This should allow you to quickly answer all your questions about the instruction choices.
				kristof.beyls (author): As Eli pointed to - you should be able to find the information in the location he pointed to. For example, in section "C6.2.75 DSB", the details state "...SY ... Encoded as CRm = 0b1111."
				return;
				}

				// CMP SP, #0 === SUBS xzr, SP, #0
				zbrid: Is having the zero register in ARM as the destination register equivalent to discarding the output or is the zero register affected? If it's the former, why use CMP instead of SUBS if this instruction discards the result of the instruction in the same way that CMP does?
				efriedma: CMP is an alias.
				kristof.beyls (author): The xzr register exists in AArch64 ("64-bit Arm"), but not in ARM ("32-bit Arm"). Your understanding is otherwise correct. Let me quote the ArmARM ("Arm Architecture Reference Manual"): "The name XZR represents the zero register for 64-bit operands where an encoding of the value 31 in the corresponding register field is interpreted as returning zero when read or discarding the result when written." A little bit of background on instruction aliases: quite a few "instructions" you write in assembly, like CMP, are actually encoded as more general instructions - in this case SUBS. So, the "CMP" instruction only really exists at the assembly level (for convenience) - but once encoded it is just a SUBS instruction. To quote the ArmARM again on the CMP instruction: "CMP <Xn|SP>, #<imm>{, <shift>} is equivalent to SUBS XZR, <Xn|SP>, #<imm> {, <shift>} and is always the preferred disassembly." In MachineInstrs, these aliases are not available, only the "real" encodable instructions. So when creating an instruction, you need to use SUBS rather than CMP. In the comments, I have tried to put in the mapping from the alias to the encodable instruction, as I personally find it handy. I hope that makes sense?
				BuildMI(*MBB, MBBI, DebugLoc(), TII->get(AArch64::SUBSXri))
				.addDef(AArch64::XZR)
				.addUse(AArch64::SP)
				.addImm(0)
				.addImm(0); // no shift
				// CSETM x16, NE === CSINV x16, xzr, xzr, EQ
				zbrid: Why use CSINV instead of CSETM here?
				efriedma: CSINV is an alias.
				BuildMI(*MBB, MBBI, DebugLoc(), TII->get(AArch64::CSINVXr))
				.addDef(MisspeculatingTaintReg)
				.addUse(AArch64::XZR)
				.addUse(AArch64::XZR)
				.addImm(AArch64CC::EQ);
				}

				void AArch64SpeculationHardening::insertRegToSPTaintPropagation(
				MachineBasicBlock *MBB, MachineBasicBlock::iterator MBBI,
				unsigned TmpReg) const {
				// If full control flow speculation barriers are used, there will not be
				// miss-speculation when returning from this function, and therefore, also
				// no need to encode potential miss-speculation into the stack pointer.
				if (UseControlFlowSpeculationBarrier)
				return;

				// mov Xtmp, SP === ADD Xtmp, SP, #0
				zbrid: Why use ADD instead of MOV?
				efriedma: MOV is an alias.
				BuildMI(*MBB, MBBI, DebugLoc(), TII->get(AArch64::ADDXri))
				.addDef(TmpReg)
				.addUse(AArch64::SP)
				.addImm(0)
				.addImm(0); // no shift
				// and Xtmp, Xtmp, TaintReg === AND Xtmp, Xtmp, TaintReg, #0
				BuildMI(*MBB, MBBI, DebugLoc(), TII->get(AArch64::ANDXrs))
				.addDef(TmpReg, RegState::Renamable)
				.addUse(TmpReg, RegState::Kill | RegState::Renamable)
				.addUse(MisspeculatingTaintReg, RegState::Kill)
				.addImm(0);
				// mov SP, Xtmp === ADD SP, Xtmp, #0
				BuildMI(*MBB, MBBI, DebugLoc(), TII->get(AArch64::ADDXri))
				.addDef(AArch64::SP)
				.addUse(TmpReg, RegState::Kill)
				.addImm(0)
				.addImm(0); // no shift
				zbrid: Why not the following instead of the three above instructions? AND SP, SP, TaintReg
				efriedma: There is no such encoding.
				kristof.beyls (author): As Eli said. More generally, only the following instructions can write to the SP register in AArch64: ADD, SUB, NEG, NEGS, ADDS, SUBS. Therefore, to be able to AND the SP with another register and write that value to SP, we first need to move the SP to another register.
				}
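
The two propagation helpers above can be read as an encode/decode pair around calls and returns. The sketch below models them on plain integers. It is illustrative only: `encodeTaintInSP` and `decodeTaintFromSP` are hypothetical names, and it assumes that a valid stack pointer is never 0 on a correctly speculated path.

```cpp
#include <cassert>
#include <cstdint>

// Models insertRegToSPTaintPropagation (before a call or return):
//   mov Xtmp, SP ; and Xtmp, Xtmp, X16 ; mov SP, Xtmp
// With an all-ones taint SP is unchanged; with an all-zeros taint SP becomes 0.
uint64_t encodeTaintInSP(uint64_t SP, uint64_t Taint) {
  uint64_t Tmp = SP;
  Tmp &= Taint;
  return Tmp;
}

// Models insertSPToRegTaintPropagation (on function entry or after a call):
//   cmp SP, #0 ; csetm X16, ne
// A non-zero SP yields an all-ones taint; a zero SP yields an all-zeros taint.
uint64_t decodeTaintFromSP(uint64_t SP) {
  return SP != 0 ? ~uint64_t{0} : uint64_t{0};
}
```

The temporary register is needed because, as noted in the review thread, there is no encoding that ANDs directly into SP.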

				bool AArch64SpeculationHardening::functionUsesHardeningRegister(
				MachineFunction &MF) const {
				for (MachineBasicBlock &MBB : MF) {
				for (MachineInstr &MI : MBB) {
				// treat function calls specially, as the hardening register does not
				// need to remain live across function calls.
				if (MI.isCall())
				continue;
				if (MI.readsRegister(MisspeculatingTaintReg, TRI) ||
				MI.modifiesRegister(MisspeculatingTaintReg, TRI))
				return true;
				}
				}
				return false;
				}

				bool AArch64SpeculationHardening::runOnMachineFunction(MachineFunction &MF) {
				if (!MF.getFunction().hasFnAttribute(Attribute::SpeculativeLoadHardening))
				return false;

				MisspeculatingTaintReg = AArch64::X16;
				TII = MF.getSubtarget().getInstrInfo();
				TRI = MF.getSubtarget().getRegisterInfo();
				zbrid: Why do you have to get this data every time the function is called? Could these be static since we wouldn't expect them to be different between the functions in the same compilation? Or are there cases where this would be different between two functions?
				efriedma: These can change between functions (with LTO, or certain function attributes).
				bool Modified = false;

				UseControlFlowSpeculationBarrier = functionUsesHardeningRegister(MF);

				// Instrument control flow speculation tracking, if requested.
				LLVM_DEBUG(
				dbgs()
				<< "*** AArch64SpeculationHardening - track control flow ***\n");

				// 1. Add instrumentation code to function entry and exits.
				SmallVector<MachineBasicBlock *, 2> EntryBlocks;
				EntryBlocks.push_back(&MF.front());
				for (const LandingPadInfo &LPI : MF.getLandingPads())
				EntryBlocks.push_back(LPI.LandingPadBlock);
				for (auto Entry : EntryBlocks)
				insertSPToRegTaintPropagation(
				Entry, Entry->SkipPHIsLabelsAndDebug(Entry->begin()));

				// 2. Add instrumentation code to every basic block.
				for (auto &MBB : MF)
				Modified |= instrumentControlFlow(MBB);

				return Modified;
				}

				/// \brief Returns an instance of the AArch64 speculation hardening pass.
				FunctionPass *llvm::createAArch64SpeculationHardeningPass() {
				return new AArch64SpeculationHardening();
				}

llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp

[...] extern "C" void LLVMInitializeAArch64Target() {
   initializeAArch64SIMDInstrOptPass(*PR);
   initializeAArch64PreLegalizerCombinerPass(*PR);
   initializeAArch64PromoteConstantPass(*PR);
   initializeAArch64RedundantCopyEliminationPass(*PR);
   initializeAArch64StorePairSuppressPass(*PR);
   initializeFalkorHWPFFixPass(*PR);
   initializeFalkorMarkStridedAccessesLegacyPass(*PR);
   initializeLDTLSCleanupPass(*PR);
+  initializeAArch64SpeculationHardeningPass(*PR);
 }

 //===----------------------------------------------------------------------===//
 // AArch64 Lowering public interface.
 //===----------------------------------------------------------------------===//
 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
   if (TT.isOSBinFormatMachO())
     return llvm::make_unique<AArch64_MachoTargetObjectFile>();
[...]

void AArch64PassConfig::addPreSched2() {		void AArch64PassConfig::addPreSched2() {
// Expand some pseudo instructions to allow proper scheduling.		// Expand some pseudo instructions to allow proper scheduling.
addPass(createAArch64ExpandPseudoPass());		addPass(createAArch64ExpandPseudoPass());
// Use load/store pair instructions when possible.		// Use load/store pair instructions when possible.
if (TM->getOptLevel() != CodeGenOpt::None) {		if (TM->getOptLevel() != CodeGenOpt::None) {
if (EnableLoadStoreOpt)		if (EnableLoadStoreOpt)
addPass(createAArch64LoadStoreOptimizationPass());		addPass(createAArch64LoadStoreOptimizationPass());
		}

		// The AArch64SpeculationHardeningPass destroys dominator tree and natural
		// loop info, which is needed for the FalkorHWPFFixPass and also later on.
		// Therefore, run the AArch64SpeculationHardeningPass before the
		// FalkorHWPFFixPass to avoid recomputing dominator tree and natural loop
		// info.
		addPass(createAArch64SpeculationHardeningPass());

		if (TM->getOptLevel() != CodeGenOpt::None) {
if (EnableFalkorHWPFFix)		if (EnableFalkorHWPFFix)
addPass(createFalkorHWPFFixPass());		addPass(createFalkorHWPFFixPass());
}		}
}		}

void AArch64PassConfig::addPreEmitPass() {		void AArch64PassConfig::addPreEmitPass() {
// Machine Block Placement might have created new opportunities when run		// Machine Block Placement might have created new opportunities when run
// at O3, where the Tail Duplication Threshold is set to 4 instructions.		// at O3, where the Tail Duplication Threshold is set to 4 instructions.
Show All 21 Lines

llvm/trunk/lib/Target/AArch64/CMakeLists.txt

[... 46 lines not shown; enclosing context: add_llvm_target(AArch64CodeGen ...]
    AArch64MacroFusion.cpp
    AArch64MCInstLower.cpp
    AArch64PreLegalizerCombiner.cpp
    AArch64PromoteConstant.cpp
    AArch64PBQPRegAlloc.cpp
    AArch64RegisterBankInfo.cpp
    AArch64RegisterInfo.cpp
    AArch64SelectionDAGInfo.cpp
+   AArch64SpeculationHardening.cpp
    AArch64StorePairSuppress.cpp
    AArch64Subtarget.cpp
    AArch64TargetMachine.cpp
    AArch64TargetObjectFile.cpp
    AArch64TargetTransformInfo.cpp
    AArch64SIMDInstrOpt.cpp

    DEPENDS
[... 9 lines not shown ...]

llvm/trunk/test/CodeGen/AArch64/O0-pipeline.ll

[... 44 lines not shown ...]
  ; CHECK-NEXT: Eliminate PHI nodes for register allocation
  ; CHECK-NEXT: Two-Address instruction pass
  ; CHECK-NEXT: Fast Register Allocator
  ; CHECK-NEXT: Lazy Machine Block Frequency Analysis
  ; CHECK-NEXT: Machine Optimization Remark Emitter
  ; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
  ; CHECK-NEXT: Post-RA pseudo instruction expansion pass
  ; CHECK-NEXT: AArch64 pseudo instruction expansion pass
+ ; CHECK-NEXT: AArch64 speculation hardening pass
  ; CHECK-NEXT: Analyze Machine Code For Garbage Collection
  ; CHECK-NEXT: Branch relaxation pass
  ; CHECK-NEXT: AArch64 Branch Targets
  ; CHECK-NEXT: Contiguously Lay Out Funclets
  ; CHECK-NEXT: StackMap Liveness Analysis
  ; CHECK-NEXT: Live DEBUG_VALUE analysis
  ; CHECK-NEXT: Insert fentry calls
  ; CHECK-NEXT: Insert XRay ops
[... 9 lines not shown ...]

llvm/trunk/test/CodeGen/AArch64/O3-pipeline.ll

[... 140 lines not shown ...]
  ; CHECK-NEXT: Shrink Wrapping analysis
  ; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
  ; CHECK-NEXT: Control Flow Optimizer
  ; CHECK-NEXT: Tail Duplication
  ; CHECK-NEXT: Machine Copy Propagation Pass
  ; CHECK-NEXT: Post-RA pseudo instruction expansion pass
  ; CHECK-NEXT: AArch64 pseudo instruction expansion pass
  ; CHECK-NEXT: AArch64 load / store optimization pass
+ ; CHECK-NEXT: AArch64 speculation hardening pass
  ; CHECK-NEXT: MachineDominator Tree Construction
  ; CHECK-NEXT: Machine Natural Loop Construction
  ; CHECK-NEXT: Falkor HW Prefetch Fix Late Phase
  ; CHECK-NEXT: PostRA Machine Instruction Scheduler
  ; CHECK-NEXT: Analyze Machine Code For Garbage Collection
  ; CHECK-NEXT: Machine Block Frequency Analysis
  ; CHECK-NEXT: MachinePostDominator Tree Construction
  ; CHECK-NEXT: Branch Probability Basic Block Placement
[... 23 lines not shown ...]

llvm/trunk/test/CodeGen/AArch64/speculation-hardening-dagisel.ll

				; RUN: sed -e 's/SLHATTR/speculative_load_hardening/' %s | llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu | FileCheck %s --check-prefixes=CHECK,SLH --dump-input-on-failure
				; RUN: sed -e 's/SLHATTR//' %s | llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu | FileCheck %s --check-prefixes=CHECK,NOSLH --dump-input-on-failure

				declare i64 @g(i64, i64) local_unnamed_addr
				define i64 @f_using_reserved_reg_x16(i64 %a, i64 %b) local_unnamed_addr SLHATTR {
				; CHECK-LABEL: f_using_reserved_reg_x16
				; SLH: dsb sy
				; SLH: isb
				; NOSLH-NOT: dsb sy
				; NOSLH-NOT: isb
				entry:
				%cmp = icmp ugt i64 %a, %b
				br i1 %cmp, label %if.then, label %cleanup

				; CHECK: b.ls
				; SLH: dsb sy
				; SLH: isb
				; NOSLH-NOT: dsb sy
				; NOSLH-NOT: isb
				if.then:
				%0 = tail call i64 asm "autia1716", "={x17},{x16},0"(i64 %b, i64 %a)
				; CHECK: bl g
				; SLH: dsb sy
				; SLH: isb
				; NOSLH-NOT: dsb sy
				; NOSLH-NOT: isb
				; CHECK: ret
				%call = tail call i64 @g(i64 %a, i64 %b) #3
				%add = add i64 %call, %0
				br label %cleanup

				cleanup:
				; SLH: dsb sy
				; SLH: isb
				; NOSLH-NOT: dsb sy
				; NOSLH-NOT: isb
				; SLH: ret
				%retval.0 = phi i64 [ %add, %if.then ], [ %b, %entry ]
				ret i64 %retval.0
				}

				define i32 @f_clobbered_reg_w16(i32 %a, i32 %b) local_unnamed_addr SLHATTR {
				; CHECK-LABEL: f_clobbered_reg_w16
				entry:
				; SLH: dsb sy
				; SLH: isb
				; NOSLH-NOT: dsb sy
				; NOSLH-NOT: isb
				%cmp = icmp sgt i32 %a, %b
				br i1 %cmp, label %if.then, label %if.end
				; CHECK: b.le

				if.then:
				; SLH: dsb sy
				; SLH: isb
				; NOSLH-NOT: dsb sy
				; NOSLH-NOT: isb
				; CHECK: mov w16, w0
				tail call void asm sideeffect "mov w16, ${0:w}", "r,~{w16}"(i32 %a)
				br label %if.end
				; SLH: ret

				if.end:
				%add = add nsw i32 %b, %a
				ret i32 %add
				; SLH: dsb sy
				; SLH: isb
				; NOSLH-NOT: dsb sy
				; NOSLH-NOT: isb
				; SLH: ret
				}
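
Note on the test above: both functions touch X16/W16 (via inline asm), and the SLH checks expect `dsb sy` / `isb` barriers rather than `csel x16, ...` tracking code. This matches the `UseControlFlowSpeculationBarrier = functionUsesHardeningRegister(MF)` line in the pass. A minimal, self-contained sketch of that decision is given here; the `Inst` type, the register-name strings, and `chooseStrategy` are hypothetical stand-ins for illustration, not the pass's actual data structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Two ways to harden a function, as exercised by the tests:
// track mis-speculation in a taint register, or fall back to barriers.
enum class Strategy { TaintTracking, SpeculationBarrier };

// Hypothetical stand-in for a machine instruction: only the names of the
// registers it reads or writes are modeled.
struct Inst {
  std::vector<std::string> Regs;
};

// Returns true if any instruction mentions the taint register (X16) or its
// 32-bit alias (W16). This is an assumption about what "uses the hardening
// register" means, based on the two test functions.
bool usesHardeningRegister(const std::vector<Inst> &Fn) {
  for (const Inst &I : Fn)
    for (const std::string &R : I.Regs)
      if (R == "x16" || R == "w16")
        return true;
  return false;
}

// If the function already needs X16 for something else, the taint cannot
// live there, so barriers are used instead.
Strategy chooseStrategy(const std::vector<Inst> &Fn) {
  return usesHardeningRegister(Fn) ? Strategy::SpeculationBarrier
                                   : Strategy::TaintTracking;
}
```

Under this model, a function whose instructions only mention `x0`/`x1` would get taint tracking, while one containing `mov w16, w0` would get barriers.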

llvm/trunk/test/CodeGen/AArch64/speculation-hardening.ll

				; RUN: sed -e 's/SLHATTR/speculative_load_hardening/' %s | llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu | FileCheck %s --check-prefixes=CHECK,SLH --dump-input-on-failure
				; RUN: sed -e 's/SLHATTR//' %s | llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu | FileCheck %s --check-prefixes=CHECK,NOSLH --dump-input-on-failure
				; RUN: sed -e 's/SLHATTR/speculative_load_hardening/' %s | llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -global-isel | FileCheck %s --check-prefixes=CHECK,SLH --dump-input-on-failure
				; RUN sed -e 's/SLHATTR//' %s | llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -global-isel | FileCheck %s --check-prefixes=CHECK,NOSLH --dump-input-on-failure
				; RUN: sed -e 's/SLHATTR/speculative_load_hardening/' %s | llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -fast-isel | FileCheck %s --check-prefixes=CHECK,SLH --dump-input-on-failure
				; RUN: sed -e 's/SLHATTR//' %s | llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -fast-isel | FileCheck %s --check-prefixes=CHECK,NOSLH --dump-input-on-failure

				define i32 @f(i8* nocapture readonly %p, i32 %i, i32 %N) local_unnamed_addr SLHATTR {
				; CHECK-LABEL: f
				entry:
				; SLH: cmp sp, #0
				; SLH: csetm x16, ne
				; NOSLH-NOT: cmp sp, #0
				; NOSLH-NOT: csetm x16, ne

				; SLH: mov x17, sp
				; SLH: and x17, x17, x16
				; SLH: mov sp, x17
				; NOSLH-NOT: mov x17, sp
				; NOSLH-NOT: and x17, x17, x16
				; NOSLH-NOT: mov sp, x17
				%call = tail call i32 @tail_callee(i32 %i)
				; SLH: cmp sp, #0
				; SLH: csetm x16, ne
				; NOSLH-NOT: cmp sp, #0
				; NOSLH-NOT: csetm x16, ne
				%cmp = icmp slt i32 %call, %N
				br i1 %cmp, label %if.then, label %return
				; GlobalISel lowers the branch to a b.ne sometimes instead of b.ge as expected..
				; CHECK: b.[[COND:(ge)|(lt)|(ne)]]

				if.then: ; preds = %entry
				; NOSLH-NOT: csel x16, x16, xzr, {{(lt)|(ge)|(eq)}}
				; SLH-DAG: csel x16, x16, xzr, {{(lt)|(ge)|(eq)}}
				%idxprom = sext i32 %i to i64
				%arrayidx = getelementptr inbounds i8, i8* %p, i64 %idxprom
				%0 = load i8, i8* %arrayidx, align 1
				; CHECK-DAG: ldrb [[LOADED:w[0-9]+]],
				%conv = zext i8 %0 to i32
				br label %return

				; SLH-DAG: csel x16, x16, xzr, [[COND]]
				; NOSLH-NOT: csel x16, x16, xzr, [[COND]]
				return: ; preds = %entry, %if.then
				%retval.0 = phi i32 [ %conv, %if.then ], [ 0, %entry ]
				; SLH: mov x17, sp
				; SLH: and x17, x17, x16
				; SLH: mov sp, x17
				; NOSLH-NOT: mov x17, sp
				; NOSLH-NOT: and x17, x17, x16
				; NOSLH-NOT: mov sp, x17
				ret i32 %retval.0
				}

				; Make sure that for a tail call, taint doesn't get put into SP twice.
				define i32 @tail_caller(i32 %a) local_unnamed_addr SLHATTR {
				; CHECK-LABEL: tail_caller:
				; SLH: mov x17, sp
				; SLH: and x17, x17, x16
				; SLH: mov sp, x17
				; NOSLH-NOT: mov x17, sp
				; NOSLH-NOT: and x17, x17, x16
				; NOSLH-NOT: mov sp, x17
				; GlobalISel doesn't optimize tail calls (yet?), so only check that
				; cross-call taint register setup code is missing if a tail call was
				; actually produced.
				; SLH: {{(bl tail_callee[[:space:]] cmp sp, #0)|(b tail_callee)}}
				; SLH-NOT: cmp sp, #0
				%call = tail call i32 @tail_callee(i32 %a)
				ret i32 %call
				}

				declare i32 @tail_callee(i32) local_unnamed_addr

				; Verify that no cb(n)z/tb(n)z instructions are produced when implementing
				; SLH
				define i32 @compare_branch_zero(i32, i32) SLHATTR {
				; CHECK-LABEL: compare_branch_zero
				%3 = icmp eq i32 %0, 0
				br i1 %3, label %then, label %else
				;SLH-NOT: cb{{n?}}z
				;NOSLH: cb{{n?}}z
				then:
				%4 = sdiv i32 5, %1
				ret i32 %4
				else:
				%5 = sdiv i32 %1, %0
				ret i32 %5
				}

				define i32 @test_branch_zero(i32, i32) SLHATTR {
				; CHECK-LABEL: test_branch_zero
				%3 = and i32 %0, 16
				%4 = icmp eq i32 %3, 0
				br i1 %4, label %then, label %else
				;SLH-NOT: tb{{n?}}z
				;NOSLH: tb{{n?}}z
				then:
				%5 = sdiv i32 5, %1
				ret i32 %5
				else:
				%6 = sdiv i32 %1, %0
				ret i32 %6
				}

				define i32 @landingpad(i32 %l0, i32 %l1) SLHATTR personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				; CHECK-LABEL: landingpad
				entry:
				; SLH: cmp sp, #0
				; SLH: csetm x16, ne
				; NOSLH-NOT: cmp sp, #0
				; NOSLH-NOT: csetm x16, ne
				; CHECK: bl _Z10throwing_fv
				invoke void @_Z10throwing_fv()
				to label %exit unwind label %lpad
				; SLH: cmp sp, #0
				; SLH: csetm x16, ne

				lpad:
				%l4 = landingpad { i8*, i32 }
				catch i8* null
				; SLH: cmp sp, #0
				; SLH: csetm x16, ne
				; NOSLH-NOT: cmp sp, #0
				; NOSLH-NOT: csetm x16, ne
				%l5 = extractvalue { i8*, i32 } %l4, 0
				%l6 = tail call i8* @__cxa_begin_catch(i8* %l5)
				%l7 = icmp sgt i32 %l0, %l1
				br i1 %l7, label %then, label %else
				; GlobalISel lowers the branch to a b.ne sometimes instead of b.ge as expected..
				; CHECK: b.[[COND:(le)|(gt)|(ne)]]

				then:
				; SLH-DAG: csel x16, x16, xzr, [[COND]]
				%l9 = sdiv i32 %l0, %l1
				br label %postif

				else:
				; SLH-DAG: csel x16, x16, xzr, {{(gt)|(le)|(eq)}}
				%l11 = sdiv i32 %l1, %l0
				br label %postif

				postif:
				%l13 = phi i32 [ %l9, %then ], [ %l11, %else ]
				tail call void @__cxa_end_catch()
				br label %exit

				exit:
				%l15 = phi i32 [ %l13, %postif ], [ 0, %entry ]
				ret i32 %l15
				}

				declare i32 @__gxx_personality_v0(...)
				declare void @_Z10throwing_fv() local_unnamed_addr
				declare i8* @__cxa_begin_catch(i8*) local_unnamed_addr
				declare void @__cxa_end_catch() local_unnamed_addr
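
Note on the checks above: the three instruction sequences they look for can be read as a small state machine over the taint register. The sketch below models them in plain C++ under the assumption, taken from the patch summary and the checked sequences, that X16 holds all-ones on the architecturally correct path and zero under mis-speculation. `TaintModel` and its member names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the taint register (X16 in the patch).
// Assumed encoding: all-ones = not mis-speculating, zero = mis-speculating.
struct TaintModel {
  std::uint64_t Taint = ~0ULL;

  // Models "cmp sp, #0; csetm x16, ne" at function entry, after a call
  // returns, and at landing pads: the taint is recovered from SP.
  void recoverFromSP(std::uint64_t SP) { Taint = (SP != 0) ? ~0ULL : 0; }

  // Models "csel x16, x16, xzr, <cond>" at the start of a branch successor.
  // CondHolds is whether the condition that leads to this block is true.
  // If it is false, the block is only reached by mis-speculation.
  void onEdge(bool CondHolds) { Taint = CondHolds ? Taint : 0; }

  // Models "mov x17, sp; and x17, x17, x16; mov sp, x17" before a call or
  // return: SP becomes zero when the taint is zero.
  std::uint64_t encodeIntoSP(std::uint64_t SP) const { return SP & Taint; }
};
```

In this model a correctly predicted path leaves SP unchanged across the encode/recover pair, while a mis-predicted edge zeroes the taint, which in turn zeroes SP and is picked up again by the next `recoverFromSP`.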

llvm/trunk/test/CodeGen/AArch64/speculation-hardening.mir

				# RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu \
				# RUN: -start-before aarch64-speculation-hardening -o - %s \
				# RUN: | FileCheck %s --dump-input-on-failure

				# Check that the speculation hardening pass generates code as expected for
				# basic blocks ending with a variety of branch patterns:
				# - (1) no branches (fallthrough)
				# - (2) one unconditional branch
				# - (3) one conditional branch + fall-through
				# - (4) one conditional branch + one unconditional branch
				# - other direct branches don't seem to be generated by the AArch64 codegen
				--- |
				define void @nobranch_fallthrough(i32 %a, i32 %b) speculative_load_hardening {
				ret void
				}
				define void @uncondbranch(i32 %a, i32 %b) speculative_load_hardening {
				ret void
				}
				define void @condbranch_fallthrough(i32 %a, i32 %b) speculative_load_hardening {
				ret void
				}
				define void @condbranch_uncondbranch(i32 %a, i32 %b) speculative_load_hardening {
				ret void
				}
				define void @indirectbranch(i32 %a, i32 %b) speculative_load_hardening {
				ret void
				}
				...
				---
				name: nobranch_fallthrough
				tracksRegLiveness: true
				body: |
				; CHECK-LABEL: nobranch_fallthrough
				bb.0:
				successors: %bb.1
				liveins: $w0, $w1
				; CHECK-NOT: csel
				bb.1:
				liveins: $w0
				RET undef $lr, implicit $w0
				...
				---
				name: uncondbranch
				tracksRegLiveness: true
				body: |
				; CHECK-LABEL: uncondbranch
				bb.0:
				successors: %bb.1
				liveins: $w0, $w1
				B %bb.1
				; CHECK-NOT: csel
				bb.1:
				liveins: $w0
				RET undef $lr, implicit $w0
				...
				---
				name: condbranch_fallthrough
				tracksRegLiveness: true
				body: |
				; CHECK-LABEL: condbranch_fallthrough
				bb.0:
				successors: %bb.1, %bb.2
				liveins: $w0, $w1
				$wzr = SUBSWrs renamable $w0, renamable $w1, 0, implicit-def $nzcv, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				; CHECK: b.lt [[BB_LT_T:\.LBB[0-9_]+]]

				bb.1:
				liveins: $nzcv, $w0
				; CHECK: csel x16, x16, xzr, ge
				RET undef $lr, implicit $w0
				bb.2:
				liveins: $nzcv, $w0
				; CHECK: csel x16, x16, xzr, lt
				RET undef $lr, implicit $w0
				...
				---
				name: condbranch_uncondbranch
				tracksRegLiveness: true
				body: |
				; CHECK-LABEL: condbranch_uncondbranch
				bb.0:
				successors: %bb.1, %bb.2
				liveins: $w0, $w1
				$wzr = SUBSWrs renamable $w0, renamable $w1, 0, implicit-def $nzcv, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1, implicit $nzcv
				; CHECK: b.lt [[BB_LT_T:\.LBB[0-9_]+]]

				bb.1:
				liveins: $nzcv, $w0
				; CHECK: csel x16, x16, xzr, ge
				RET undef $lr, implicit $w0
				bb.2:
				liveins: $nzcv, $w0
				; CHECK: csel x16, x16, xzr, lt
				RET undef $lr, implicit $w0
				...
				---
				name: indirectbranch
				tracksRegLiveness: true
				body: |
				; Check that no instrumentation is done on indirect branches (for now).
				; CHECK-LABEL: indirectbranch
				bb.0:
				successors: %bb.1, %bb.2
				liveins: $x0
				BR $x0
				bb.1:
				liveins: $x0
				; CHECK-NOT: csel
				RET undef $lr, implicit $x0
				bb.2:
				liveins: $x0
				; CHECK-NOT: csel
				RET undef $lr, implicit $x0
				...
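
Note on the MIR checks above: `Bcc 11` is printed as `b.lt`, the taken successor gets `csel x16, x16, xzr, lt`, and the fall-through successor gets the opposite condition, `ge`. In the AArch64 4-bit condition encoding, opposite conditions differ only in the lowest bit, so the fall-through condition can be derived by flipping that bit. The sketch below illustrates this; `invertCond` and `condName` are hypothetical helper names, and the always-true encodings (14 and 15) are deliberately left out because inverting them is not meaningful.

```cpp
#include <cassert>
#include <cstring>

// AArch64 condition codes with their 4-bit encodings.
enum CondCode : unsigned {
  EQ = 0, NE = 1, HS = 2, LO = 3, MI = 4, PL = 5, VS = 6,
  VC = 7, HI = 8, LS = 9, GE = 10, LT = 11, GT = 12, LE = 13
};

// Opposite conditions are paired (EQ/NE, GE/LT, GT/LE, ...), so flipping
// the lowest bit yields the inverse condition.
inline CondCode invertCond(CondCode CC) {
  return static_cast<CondCode>(static_cast<unsigned>(CC) ^ 1u);
}

// Assembly mnemonic suffix for a condition code.
inline const char *condName(CondCode CC) {
  static const char *const Names[] = {"eq", "ne", "hs", "lo", "mi",
                                      "pl", "vs", "vc", "hi", "ls",
                                      "ge", "lt", "gt", "le"};
  return Names[CC];
}
```

For the `condbranch_fallthrough` case, `condName(static_cast<CondCode>(11))` gives the condition for the taken block and `condName(invertCond(static_cast<CondCode>(11)))` gives the one for the fall-through block.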

Revision Contents

Diff 178615

llvm/trunk/lib/Target/AArch64/AArch64.h

llvm/trunk/lib/Target/AArch64/AArch64FastISel.cpp

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp

llvm/trunk/lib/Target/AArch64/AArch64InstructionSelector.cpp

llvm/trunk/lib/Target/AArch64/AArch64RegisterInfo.cpp

llvm/trunk/lib/Target/AArch64/AArch64SpeculationHardening.cpp

llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp

llvm/trunk/lib/Target/AArch64/CMakeLists.txt

llvm/trunk/test/CodeGen/AArch64/O0-pipeline.ll

llvm/trunk/test/CodeGen/AArch64/O3-pipeline.ll

llvm/trunk/test/CodeGen/AArch64/speculation-hardening-dagisel.ll

llvm/trunk/test/CodeGen/AArch64/speculation-hardening.ll

llvm/trunk/test/CodeGen/AArch64/speculation-hardening.mir
