This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add workaround for Cortex-A53 erratum (835769)
ClosedPublic

Authored by bsmith on Oct 10 2014, 2:05 AM.

Details

Reviewers

t.p.northover
jmolloy

Summary

Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result. The details are quite complex and hard to
determine statically, since branches in the code may exist in some
circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.

The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.

This patch implements that work-around. It is only enabled when specifying
the clang command line option -mfix-cortex-a53-835769 or the llvm backend
option -aarch64-fix-cortex-a53-835769.

The work-around code generation is not enabled by default.
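
As a rough illustration of the work-around described above, here is a minimal sketch on a toy instruction stream. The `Kind` enum and `insertNops` function are hypothetical names invented for this example; they are not LLVM types and this is not the code in the patch, which works on machine instructions and also looks across fall-through block boundaries.

```cpp
#include <vector>

// Hypothetical, simplified instruction classes; these are not LLVM types.
enum class Kind { Memory, MulAcc64, Nop, Other };

// Returns a copy of `code` with a Nop inserted between every memory
// instruction (load, store or prefetch) and a 64-bit multiply-accumulate
// that follows it immediately.
std::vector<Kind> insertNops(const std::vector<Kind> &code) {
  std::vector<Kind> out;
  out.reserve(code.size());
  for (Kind k : code) {
    // `out.back()` is the previously emitted instruction, so a Nop that was
    // just inserted can never itself trigger another insertion.
    if (k == Kind::MulAcc64 && !out.empty() && out.back() == Kind::Memory)
      out.push_back(Kind::Nop);
    out.push_back(k);
  }
  return out;
}
```

In this model a multiply-accumulate that follows anything other than a memory instruction is left alone, which mirrors the intent of only padding the sequences that may trigger the erratum.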

Diff Detail

Event Timeline

bsmith updated this revision to Diff 14706. Oct 10 2014, 2:05 AM

bsmith retitled this revision to [AArch64] Add workaround for Cortex-A53 erratum (835769).

bsmith updated this object.

bsmith edited the test plan for this revision.

bsmith set the repository for this revision to rL LLVM.

bsmith added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Oct 10 2014, 2:05 AM

The implementation seems correct and the rationale seems right. I'd rather Tim had a look at it, too, just in case.

Just a few unimportant comments inline, but overall, looks good to me.

Thanks!

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
84 ↗	(On Diff #14706)	I'm not sure this is really necessary, but keeping a comment about this is probably good.
test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
40	There are no RUN checks for these...
43	You can just have two checks: ON and OFF and control what needs to be on and off on the RUN line.
129	I think you should also add the expected CHECK line on both cases but not add the CHECK text.

Hi Bradley,

I have plenty of comments inline.

Cheers,

James

lib/Target/AArch64/AArch64.h
43	I still maintain what I said internally that the name of the pass should fit convention - there's a convention in two lines above of putting "A5*" after AArch64. Can you please change to fit that convention unless you have a compelling reason otherwise?
lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
29 ↗	(On Diff #14706)	You don't need EquivalenceClasses, BitVector, or RegisterScavenging, (or raw_ostream).
52 ↗	(On Diff #14706)	Capitalize "must".
69 ↗	(On Diff #14706)	All comments should have proper SPAG! :)
98 ↗	(On Diff #14706)	You're not using this, it should be removed.
134 ↗	(On Diff #14706)	You're not using this.
163 ↗	(On Diff #14706)	Why does this return nullptr on failure instead of asserting?
177 ↗	(On Diff #14706)	The * is owned by the variable name: MachineBasicBlock *PMBB.
200 ↗	(On Diff #14706)	GratuitouslyLongNameIsGratuitouslyLong. Wouldn't "Sequences" work, with a comment saying this is the terminating instruction in the sequence? Or something similarly shortened... We like explicit names, but I feel this one is a bit on the long side.
205 ↗	(On Diff #14706)	What if PrevInstr is nullptr here? Will you have to jump back to the previous fallthrough if one exists? or assert? It doesn't seem right to continue with PrevInstr being nullptr.
232 ↗	(On Diff #14706)	Space after MI.

This revision now requires changes to proceed. Oct 10 2014, 3:06 AM

bsmith added inline comments. Oct 10 2014, 4:10 AM

lib/Target/AArch64/AArch64.h
43	The rationale behind this name is that this phase isn't an A53 specific phase, it's a phase that addresses an A53 erratum. In a big-little system you may well want this phase enabled for A57 code.
lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
205 ↗	(On Diff #14706)	I've added an assert to getLastNonPseudo, since if we have a block we've fallen through we expect to find an instruction. If getBBFallenThrough returns nullptr it means there was no fallen-through block, hence PrevInstr being nullptr is valid in this case, meaning we have no previous instruction yet (the loop will then just go to the next iteration, where it does have one).
test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
40	These are a holdover from some previous code I had; I'll remove them. (They test the same thing as CHECK-NOWORKAROUND.)

Address review comments

rengolin added inline comments. Oct 10 2014, 6:34 AM

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
155 ↗	(On Diff #14711)	shouldn't this be an llvm_unreachable instead?
test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
37	If BASIC-PASS-DISABLED == NOWORKAROUND, why have two?

bsmith added inline comments. Oct 10 2014, 7:16 AM

test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
37	All of the below tests are very sensitive to scheduling, and indeed if you run all of the NOWORKAROUND tests for a57/generic/cyclone some of them will fail for this very reason. Instead only a single test is done for BASIC-PASS-DISABLED just to check the pass doesn't run.

Use llvm_unreachable instead of directly asserting.

Hi Bradley,

> The rationale behind this name is that this phase isn't an A53 specific phase, it's a phase that addresses an A53 erratum. In a big-little system you may well want this phase enabled for A57 code.

Sorry, I'm not convinced by that rationale. By the same reasoning, the FP Load balancing for A57 might be applied to code with -mcpu=cortex-a53 because it may run on a big.LITTLE system and has more impact for the big core than the little.

> It doesn't seem right to continue with PrevInstr being nullptr.

> I've added an assert to getLastNonPseudo, since if we have a block we've fallen through we expect to find an instruction. If getBBFallenThrough returns nullptr it means there was no fallen-through block, hence PrevInstr being nullptr is valid in this case, meaning we have no previous instruction yet (the loop will then just go to the next iteration, where it does have one).

The case I was concerned about is if we had a fallthrough block, but that block contained only pseudo instructions. Do you handle that case correctly? You'd need to unwind through the fallthroughs until one contained a non-pseudo, I'd have thought.
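
The unwinding James describes could look roughly like the sketch below. It is only an illustration of the idea, not what the patch does (the patch asserts via llvm_unreachable when a fallen-through block has no non-pseudo instruction). The `ToyInstr` and `ToyBlock` types and the `lastRealInstrBefore` function are hypothetical stand-ins, not LLVM types.

```cpp
#include <vector>

// Hypothetical stand-ins for machine instructions and blocks (not LLVM types).
struct ToyInstr {
  bool isPseudo;
  int id;
};

struct ToyBlock {
  std::vector<ToyInstr> instrs;
  // Block that falls through into this one, or nullptr if there is none.
  const ToyBlock *fallThroughPred = nullptr;
};

// Walk backwards from `start` through the chain of fall-through predecessors
// and return the last non-pseudo instruction found. Blocks that contain only
// pseudo instructions (or nothing at all) are skipped. Returns nullptr if the
// chain ends without finding a real instruction.
const ToyInstr *lastRealInstrBefore(const ToyBlock &start) {
  for (const ToyBlock *b = start.fallThroughPred; b; b = b->fallThroughPred)
    for (auto it = b->instrs.rbegin(); it != b->instrs.rend(); ++it)
      if (!it->isPseudo)
        return &*it;
  return nullptr;
}
```

Returning nullptr rather than asserting keeps the "no previous instruction yet" case valid, which is the behaviour bsmith describes for blocks that were not fallen into.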

Hi Bradley,

Thanks for being open about the issue, it's really excellent to see ARM contributing to LLVM even in difficult circumstances like this.

I can't really comment on the actual work-around, since I haven't seen the erratum, but I'll trust you've looked at it closely enough. I did spot a couple of issues: one stylistic, the other a bit more problematic.

Cheers.

Tim.

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
34–37 ↗	(On Diff #14729)	I think we're trying to standardise on putting these predicates in AArch64TargetMachine.h, to decide whether the pass is even added.
143 ↗	(On Diff #14729)	This doesn't necessarily mean it was a fallthrough. Particularly at -O0 a block may happen to be before another in layout order but contain a real branch anyway.
168 ↗	(On Diff #14729)	I think this may damage expected block semantics too. You're putting an instruction without isTerminator after the real terminators (like the conditional branch). I'd hope the verifier would complain about that (but be unsurprised if it didn't). Other passes may not cope (in particular, a glance at AnalyzeBranch suggests it'll fail after this). Are you using the previous block to win back some efficiency when there's multiple branches here? If so, it may not be worth the effort, though I'll leave you to decide.
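
The block invariant Tim is worried about can be stated as "terminators form a suffix of the block". A minimal sketch of such a check, on a toy model, is shown here; `TermInstr` and `terminatorsFormSuffix` are hypothetical names for this example and do not correspond to the LLVM machine verifier's implementation.

```cpp
#include <vector>

// Hypothetical, simplified instruction record (not an LLVM type).
struct TermInstr {
  bool isTerminator;
};

// Returns true if no non-terminator appears after the first terminator,
// i.e. the terminators form a contiguous suffix of the block.
bool terminatorsFormSuffix(const std::vector<TermInstr> &block) {
  bool seenTerminator = false;
  for (const TermInstr &i : block) {
    if (i.isTerminator)
      seenTerminator = true;
    else if (seenTerminator)
      return false; // e.g. a NOP appended after a conditional branch
  }
  return true;
}
```

Appending a non-terminator NOP to a block that ends in a branch would make this check fail, which is the situation the comment above warns about.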

t.p.northover added inline comments. Oct 10 2014, 8:01 AM

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
143 ↗	(On Diff #14729)	Actually, AnalyzeBranch is probably what you want to use here, if you find some way to go ahead with the NOP at end solution.

rengolin added inline comments. Oct 10 2014, 8:10 AM

test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
37	Right, makes sense.

bsmith added inline comments. Oct 10 2014, 8:15 AM

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
143 ↗	(On Diff #14729)	In a case where we have a real branch, a nop would never get inserted since the last instruction of the block would be the branch, not a load/store/prefetch. Unless it's easy to check for this case, might it be worth leaving like this if it's harmless?
168 ↗	(On Diff #14729)	Similarly here, if there is a real terminator a nop would never get inserted, since a load/store/prefetch would never be a terminator I think? How does this work normally for fallthrough blocks with no explicit terminator?

t.p.northover added inline comments. Oct 10 2014, 8:29 AM

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
143 ↗	(On Diff #14729)	Ah, I see. OK, I think it's probably correct now, but the name is a bit misleading because the block may not fall through at all. I'd probably change to using AnalyzeBranch anyway, simply because it indicates the type of checks you're performing and can hopefully be relied on to get the fiddly details right: if (!AnalyzeBranch(..., TBB, FBB) && !TBB && !FBB) return std::prev(MBB); return nullptr; Alternatively, perhaps rename the function to indicate you don't actually care if it's not a real fallthrough.
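
The condition in Tim's snippet relies on the AnalyzeBranch contract: the hook returns true when it cannot understand the block's terminators, and leaves both TBB and FBB null when the block ends without any branch. The sketch below models just that decision; `BranchInfo` and `endsWithoutBranch` are hypothetical names for this example and are not part of LLVM.

```cpp
// Hypothetical record of what a branch analysis reported for one block.
struct BranchInfo {
  bool failed;     // true if the terminators could not be analysed
  const void *TBB; // destination when the branch is taken, or nullptr
  const void *FBB; // destination when the branch is not taken, or nullptr
};

// True only if analysis succeeded and found no branch at all, i.e. control
// can only leave the block by running off its end into the next block.
bool endsWithoutBranch(const BranchInfo &info) {
  return !info.failed && !info.TBB && !info.FBB;
}
```

A block for which analysis fails is treated conservatively as not falling through, so no NOP would be placed at its end in this model.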

Address various review comments:

Move backend option to AArch64TargetMachine
Use AnalyzeBranch in getBBFallenThrough
Rename phase to AArch64A53Fix835769

Thanks Bradley, I think this looks OK now.

Tim.

Hi Bradley

Your new test needs a "REQUIRES: asserts", as llc -stats requires asserts to be on.

Hi Bradley,

This now LGTM.

Cheers,

James

This revision is now accepted and ready to land. Oct 13 2014, 12:55 AM

Committed as 219603, thanks!

Revision Contents

Path                                                     Size
lib/Target/AArch64/AArch64.h                             1 line
lib/Target/AArch64/AArch64A53Fix835769.cpp               229 lines
lib/Target/AArch64/AArch64TargetMachine.cpp              7 lines
lib/Target/AArch64/CMakeLists.txt                        1 line
test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll    524 lines

Diff 14733

lib/Target/AArch64/AArch64.h

Context not available.
	FunctionPass *createAArch64ConditionOptimizerPass();
	FunctionPass *createAArch64AddressTypePromotionPass();
	FunctionPass *createAArch64A57FPLoadBalancing();
		FunctionPass *createAArch64A53Fix835769();
	/// \brief Creates an ARM-specific Target Transformation Info pass.
	ImmutablePass *
	createAArch64TargetTransformInfoPass(const AArch64TargetMachine *TM);
Context not available.

lib/Target/AArch64/AArch64A53Fix835769.cpp

This file was added.

				//===-- AArch64A53Fix835769.cpp -------------------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				// This pass changes code to work around Cortex-A53 erratum 835769.
				// It works around it by inserting a nop instruction in code sequences that
				// in some circumstances may trigger the erratum.
				// It inserts a nop instruction between a sequence of the following 2 classes
				// of instructions:
				// instr 1: mem-instr (including loads, stores and prefetches).
				// instr 2: non-SIMD integer multiply-accumulate writing 64-bit X registers.
				//===----------------------------------------------------------------------===//

				#include "AArch64.h"
				#include "AArch64InstrInfo.h"
				#include "AArch64Subtarget.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Debug.h"

				using namespace llvm;

				#define DEBUG_TYPE "aarch64-fix-cortex-a53-835769"

				STATISTIC(NumNopsAdded, "Number of Nops added to work around erratum 835769");

				//===----------------------------------------------------------------------===//
				// Helper functions

				// Is the instruction a match for the instruction that comes first in the
				// sequence of instructions that can trigger the erratum?
				static bool isFirstInstructionInSequence(MachineInstr *MI) {
				  // Must return true if this instruction is a load, a store or a prefetch.
				  switch (MI->getOpcode()) {
				  case AArch64::PRFMl:
				  case AArch64::PRFMroW:
				  case AArch64::PRFMroX:
				  case AArch64::PRFMui:
				  case AArch64::PRFUMi:
				    return true;
				  default:
				    return (MI->mayLoad() || MI->mayStore());
				  }
				}

				// Is the instruction a match for the instruction that comes second in the
				// sequence that can trigger the erratum?
				static bool isSecondInstructionInSequence(MachineInstr *MI) {
				  // Must return true for non-SIMD integer multiply-accumulates, writing
				  // to a 64-bit register.
				  switch (MI->getOpcode()) {
				  // Erratum cannot be triggered when the destination register is 32 bits,
				  // therefore only include the following.
				  case AArch64::MSUBXrrr:
				  case AArch64::MADDXrrr:
				  case AArch64::SMADDLrrr:
				  case AArch64::SMSUBLrrr:
				  case AArch64::UMADDLrrr:
				  case AArch64::UMSUBLrrr:
				    // Erratum can only be triggered by multiply-adds, not by regular
				    // non-accumulating multiplies, i.e. when Ra=XZR='11111'
				    return MI->getOperand(3).getReg() != AArch64::XZR;
				  default:
				    return false;
				  }
				}


				//===----------------------------------------------------------------------===//

				namespace {
				class AArch64A53Fix835769 : public MachineFunctionPass {
				  const AArch64InstrInfo *TII;

				public:
				  static char ID;
				  explicit AArch64A53Fix835769() : MachineFunctionPass(ID) {}

				  bool runOnMachineFunction(MachineFunction &F) override;

				  const char *getPassName() const override {
				    return "Workaround A53 erratum 835769 pass";
				  }

				  void getAnalysisUsage(AnalysisUsage &AU) const override {
				    AU.setPreservesCFG();
				    MachineFunctionPass::getAnalysisUsage(AU);
				  }

				private:
				  bool runOnBasicBlock(MachineBasicBlock &MBB);
				};
				char AArch64A53Fix835769::ID = 0;

				} // end anonymous namespace

				//===----------------------------------------------------------------------===//

				bool
				AArch64A53Fix835769::runOnMachineFunction(MachineFunction &F) {
				  const TargetMachine &TM = F.getTarget();

				  bool Changed = false;
				  DEBUG(dbgs() << "*** AArch64A53Fix835769 ***\n");

				  TII = TM.getSubtarget<AArch64Subtarget>().getInstrInfo();

				  for (auto &MBB : F) {
				    Changed |= runOnBasicBlock(MBB);
				  }

				  return Changed;
				}

				// Return the block that was fallen through to get to MBB, if any,
				// otherwise nullptr.
				static MachineBasicBlock *getBBFallenThrough(MachineBasicBlock &MBB,
				                                             const TargetInstrInfo *TII) {
				  // Get the previous machine basic block in the function.
				  MachineFunction::iterator MBBI = MBB;

				  // Can't go off top of function.
				  if (MBBI == MBB.getParent()->begin())
				    return nullptr;

				  MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
				  SmallVector<MachineOperand, 2> Cond;

				  MachineBasicBlock *PrevBB = std::prev(MBBI);
				  for (MachineBasicBlock *S : MBB.predecessors())
				    if (S == PrevBB && !TII->AnalyzeBranch(*PrevBB, TBB, FBB, Cond) &&
				        !TBB && !FBB)
				      return S;

				  return nullptr;
				}

				static MachineInstr *getLastNonPseudo(MachineBasicBlock *MBB) {
				  for (auto I = MBB->rbegin(), E = MBB->rend(); I != E; ++I) {
				    if (!I->isPseudo())
				      return &*I;
				  }

				  llvm_unreachable("Expected to find a non-pseudo instruction");
				}

				static void insertNopBeforeInstruction(MachineBasicBlock &MBB, MachineInstr* MI,
				                                       const TargetInstrInfo *TII) {
				  // If we are the first instruction of the block, put the NOP at the end of
				  // the previous fallthrough block
				  if (MI == &MBB.front()) {
				    MachineBasicBlock *PMBB = getBBFallenThrough(MBB, TII);
				    assert(PMBB && "Expected basic block");
				    MachineInstr *I = getLastNonPseudo(PMBB);
				    assert(I && "Expected instruction");
				    DebugLoc DL = I->getDebugLoc();
				    BuildMI(PMBB, DL, TII->get(AArch64::HINT)).addImm(0);
				  }
				  else {
				    DebugLoc DL = MI->getDebugLoc();
				    BuildMI(MBB, MI, DL, TII->get(AArch64::HINT)).addImm(0);
				  }

				  ++NumNopsAdded;
				}

				bool
				AArch64A53Fix835769::runOnBasicBlock(MachineBasicBlock &MBB) {
				  bool Changed = false;
				  DEBUG(dbgs() << "Running on MBB: " << MBB << " - scanning instructions...\n");

				  // First, scan the basic block, looking for a sequence of 2 instructions
				  // that match the conditions under which the erratum may trigger.

				  // List of terminating instructions in matching sequences
				  std::vector<MachineInstr*> Sequences;
				  unsigned Idx = 0;
				  MachineInstr *PrevInstr = nullptr;

				  if (MachineBasicBlock *PMBB = getBBFallenThrough(MBB, TII))
				    PrevInstr = getLastNonPseudo(PMBB);

				  for (auto &MI : MBB) {
				    MachineInstr *CurrInstr = &MI;
				    DEBUG(dbgs() << " Examining: " << MI);
				    if (PrevInstr) {
				      DEBUG(dbgs() << " PrevInstr: " << *PrevInstr
				                   << " CurrInstr: " << *CurrInstr
				                   << " isFirstInstructionInSequence(PrevInstr): "
				                   << isFirstInstructionInSequence(PrevInstr) << "\n"
				                   << " isSecondInstructionInSequence(CurrInstr): "
				                   << isSecondInstructionInSequence(CurrInstr) << "\n");
				      if (isFirstInstructionInSequence(PrevInstr) &&
				          isSecondInstructionInSequence(CurrInstr)) {
				        DEBUG(dbgs() << " ** pattern found at Idx " << Idx << "!\n");
				        Sequences.push_back(CurrInstr);
				      }
				    }
				    if (!CurrInstr->isPseudo())
				      PrevInstr = CurrInstr;
				    ++Idx;
				  }

				  DEBUG(dbgs() << "Scan complete, "<< Sequences.size()
				               << " occurrences of pattern found.\n");

				  // Then update the basic block, inserting nops between the detected sequences.
				  for (auto &MI : Sequences) {
				    Changed = true;
				    insertNopBeforeInstruction(MBB, MI, TII);
				  }

				  return Changed;
				}

				// Factory function used by AArch64TargetMachine to add the pass to
				// the passmanager.
				FunctionPass *llvm::createAArch64A53Fix835769() {
				  return new AArch64A53Fix835769();
				}

lib/Target/AArch64/AArch64TargetMachine.cpp

Context not available.
	cl::desc("Use PBQP register allocator (experimental)"),
	cl::init(false));

		static cl::opt<bool>
		EnableA53Fix835769("aarch64-fix-cortex-a53-835769", cl::Hidden,
		cl::desc("Work around Cortex-A53 erratum 835769"),
		cl::init(false));

	extern "C" void LLVMInitializeAArch64Target() {
	// Register the target.
	RegisterTargetMachine<AArch64leTargetMachine> X(TheAArch64leTarget);
Context not available.
	}

	bool AArch64PassConfig::addPreEmitPass() {
		if (EnableA53Fix835769)
		addPass(createAArch64A53Fix835769());
	// Relax conditional branch instructions if they're otherwise out of
	// range of their destination.
	addPass(createAArch64BranchRelaxation());
Context not available.

lib/Target/AArch64/CMakeLists.txt

Context not available.
	AArch64DeadRegisterDefinitionsPass.cpp
	AArch64ExpandPseudoInsts.cpp
	AArch64FastISel.cpp
		AArch64A53Fix835769.cpp
	AArch64FrameLowering.cpp
	AArch64ConditionOptimizer.cpp
	AArch64ISelDAGToDAG.cpp
Context not available.

test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll

This file was added.

				; The regression tests need to test for order of emitted instructions, and
				; therefore, the tests are a bit fragile/reliant on instruction scheduling. The
				; test cases have been minimized as much as possible, but still most of the test
				; cases could break if instruction scheduling heuristics for cortex-a53 change
				; RUN: llc < %s -mcpu=cortex-a53 -aarch64-fix-cortex-a53-835769=1 -stats 2>&1 \
				; RUN: | FileCheck %s --check-prefix CHECK
				; RUN: llc < %s -mcpu=cortex-a53 -aarch64-fix-cortex-a53-835769=0 -stats 2>&1 \
				; RUN: | FileCheck %s --check-prefix CHECK-NOWORKAROUND
				; The following run lines are just to verify whether or not this pass runs by
				; default for given CPUs. Given the fragility of the tests, this is only run on
				; a test case where the scheduler has no freedom at all to reschedule the
				; instructions, so the potentially massively different scheduling heuristics
				; will not break the test case.
				; RUN: llc < %s -mcpu=generic | FileCheck %s --check-prefix CHECK-BASIC-PASS-DISABLED
				; RUN: llc < %s -mcpu=cortex-a53 | FileCheck %s --check-prefix CHECK-BASIC-PASS-DISABLED
				; RUN: llc < %s -mcpu=cortex-a57 | FileCheck %s --check-prefix CHECK-BASIC-PASS-DISABLED
				; RUN: llc < %s -mcpu=cyclone | FileCheck %s --check-prefix CHECK-BASIC-PASS-DISABLED

				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64--linux-gnu"

				define i64 @f_load_madd_64(i64 %a, i64 %b, i64* nocapture readonly %c) #0 {
				entry:
				%0 = load i64* %c, align 8
				%mul = mul nsw i64 %0, %b
				%add = add nsw i64 %mul, %a
				ret i64 %add
				}
				; CHECK-LABEL: f_load_madd_64:
				; CHECK: ldr
				; CHECK-NEXT: nop
				; CHECK-NEXT: madd
				; CHECK-NOWORKAROUND-LABEL: f_load_madd_64:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: madd
				; CHECK-BASIC-PASS-DISABLED-LABEL: f_load_madd_64:
				; CHECK-BASIC-PASS-DISABLED: ldr
				; CHECK-BASIC-PASS-DISABLED-NEXT: madd


				define i32 @f_load_madd_32(i32 %a, i32 %b, i32* nocapture readonly %c) #0 {
				entry:
				%0 = load i32* %c, align 4
				%mul = mul nsw i32 %0, %b
				%add = add nsw i32 %mul, %a
				ret i32 %add
				}
				; CHECK-LABEL: f_load_madd_32:
				; CHECK: ldr
				; CHECK-NEXT: madd
				; CHECK-NOWORKAROUND-LABEL: f_load_madd_32:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: madd


				define i64 @f_load_msub_64(i64 %a, i64 %b, i64* nocapture readonly %c) #0 {
				entry:
				%0 = load i64* %c, align 8
				%mul = mul nsw i64 %0, %b
				%sub = sub nsw i64 %a, %mul
				ret i64 %sub
				}
				; CHECK-LABEL: f_load_msub_64:
				; CHECK: ldr
				; CHECK-NEXT: nop
				; CHECK-NEXT: msub
				; CHECK-NOWORKAROUND-LABEL: f_load_msub_64:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: msub


				define i32 @f_load_msub_32(i32 %a, i32 %b, i32* nocapture readonly %c) #0 {
				entry:
				%0 = load i32* %c, align 4
				%mul = mul nsw i32 %0, %b
				%sub = sub nsw i32 %a, %mul
				ret i32 %sub
				}
				; CHECK-LABEL: f_load_msub_32:
				; CHECK: ldr
				; CHECK-NEXT: msub
				; CHECK-NOWORKAROUND-LABEL: f_load_msub_32:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: msub


				define i64 @f_load_mul_64(i64 %a, i64 %b, i64* nocapture readonly %c) #0 {
				entry:
				%0 = load i64* %c, align 8
				%mul = mul nsw i64 %0, %b
				ret i64 %mul
				}
				; CHECK-LABEL: f_load_mul_64:
				; CHECK: ldr
				; CHECK-NEXT: mul
				; CHECK-NOWORKAROUND-LABEL: f_load_mul_64:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: mul


				define i32 @f_load_mul_32(i32 %a, i32 %b, i32* nocapture readonly %c) #0 {
				entry:
				%0 = load i32* %c, align 4
				%mul = mul nsw i32 %0, %b
				ret i32 %mul
				}
				; CHECK-LABEL: f_load_mul_32:
				; CHECK: ldr
				; CHECK-NEXT: mul
				; CHECK-NOWORKAROUND-LABEL: f_load_mul_32:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: mul


				define i64 @f_load_mneg_64(i64 %a, i64 %b, i64* nocapture readonly %c) #0 {
				entry:
				%0 = load i64* %c, align 8
				%mul = sub i64 0, %b
				%sub = mul i64 %0, %mul
				ret i64 %sub
				}
				; CHECK-LABEL: f_load_mneg_64:
				; CHECK-NOWORKAROUND-LABEL: f_load_mneg_64:
				; FIXME: only add further checks here once LLVM actually produces
				; neg instructions
				; FIXME-CHECK: ldr
				; FIXME-CHECK-NEXT: nop
				; FIXME-CHECK-NEXT: mneg
				; FIXME-CHECK-NOWORKAROUND: ldr
				; FIXME-CHECK-NOWORKAROUND-NEXT: mneg


				define i32 @f_load_mneg_32(i32 %a, i32 %b, i32* nocapture readonly %c) #0 {
				entry:
				%0 = load i32* %c, align 4
				%mul = sub i32 0, %b
				%sub = mul i32 %0, %mul
				ret i32 %sub
				}
				; CHECK-LABEL: f_load_mneg_32:
				; CHECK-NOWORKAROUND-LABEL: f_load_mneg_32:
				; FIXME: only add further checks here once LLVM actually produces
				; neg instructions
				; FIXME-CHECK: ldr
				; FIXME-CHECK-NEXT: mneg
				; FIXME-CHECK-NOWORKAROUND: ldr
				; FIXME-CHECK-NOWORKAROUND-NEXT: mneg


				define i64 @f_load_smaddl(i64 %a, i32 %b, i32 %c, i32* nocapture readonly %d) #0 {
				entry:
				%conv = sext i32 %b to i64
				%conv1 = sext i32 %c to i64
				%mul = mul nsw i64 %conv1, %conv
				%add = add nsw i64 %mul, %a
				%0 = load i32* %d, align 4
				%conv2 = sext i32 %0 to i64
				%add3 = add nsw i64 %add, %conv2
				ret i64 %add3
				}
				; CHECK-LABEL: f_load_smaddl:
				; CHECK: ldrsw
				; CHECK-NEXT: nop
				; CHECK-NEXT: smaddl
				; CHECK-NOWORKAROUND-LABEL: f_load_smaddl:
				; CHECK-NOWORKAROUND: ldrsw
				; CHECK-NOWORKAROUND-NEXT: smaddl


				define i64 @f_load_smsubl_64(i64 %a, i32 %b, i32 %c, i32* nocapture readonly %d) #0 {
				entry:
				%conv = sext i32 %b to i64
				%conv1 = sext i32 %c to i64
				%mul = mul nsw i64 %conv1, %conv
				%sub = sub i64 %a, %mul
				%0 = load i32* %d, align 4
				%conv2 = sext i32 %0 to i64
				%add = add nsw i64 %sub, %conv2
				ret i64 %add
				}
				; CHECK-LABEL: f_load_smsubl_64:
				; CHECK: ldrsw
				; CHECK-NEXT: nop
				; CHECK-NEXT: smsubl
				; CHECK-NOWORKAROUND-LABEL: f_load_smsubl_64:
				; CHECK-NOWORKAROUND: ldrsw
				; CHECK-NOWORKAROUND-NEXT: smsubl


				define i64 @f_load_smull(i64 %a, i32 %b, i32 %c, i32* nocapture readonly %d) #0 {
				entry:
				%conv = sext i32 %b to i64
				%conv1 = sext i32 %c to i64
				%mul = mul nsw i64 %conv1, %conv
				%0 = load i32* %d, align 4
				%conv2 = sext i32 %0 to i64
				%div = sdiv i64 %mul, %conv2
				ret i64 %div
				}
				; CHECK-LABEL: f_load_smull:
				; CHECK: ldrsw
				; CHECK-NEXT: smull
				; CHECK-NOWORKAROUND-LABEL: f_load_smull:
				; CHECK-NOWORKAROUND: ldrsw
				; CHECK-NOWORKAROUND-NEXT: smull


				define i64 @f_load_smnegl_64(i64 %a, i32 %b, i32 %c, i32* nocapture readonly %d) #0 {
				entry:
				%conv = sext i32 %b to i64
				%conv1 = sext i32 %c to i64
				%mul = sub nsw i64 0, %conv
				%sub = mul i64 %conv1, %mul
				%0 = load i32* %d, align 4
				%conv2 = sext i32 %0 to i64
				%div = sdiv i64 %sub, %conv2
				ret i64 %div
				}
				; CHECK-LABEL: f_load_smnegl_64:
				; CHECK-NOWORKAROUND-LABEL: f_load_smnegl_64:
				; FIXME: only add further checks here once LLVM actually produces
				; smnegl instructions


				define i64 @f_load_umaddl(i64 %a, i32 %b, i32 %c, i32* nocapture readonly %d) #0 {
				entry:
				%conv = zext i32 %b to i64
				%conv1 = zext i32 %c to i64
				%mul = mul i64 %conv1, %conv
				%add = add i64 %mul, %a
				%0 = load i32* %d, align 4
				%conv2 = zext i32 %0 to i64
				%add3 = add i64 %add, %conv2
				ret i64 %add3
				}
				; CHECK-LABEL: f_load_umaddl:
				; CHECK: ldr
				; CHECK-NEXT: nop
				; CHECK-NEXT: umaddl
				; CHECK-NOWORKAROUND-LABEL: f_load_umaddl:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: umaddl


				define i64 @f_load_umsubl_64(i64 %a, i32 %b, i32 %c, i32* nocapture readonly %d) #0 {
				entry:
				%conv = zext i32 %b to i64
				%conv1 = zext i32 %c to i64
				%mul = mul i64 %conv1, %conv
				%sub = sub i64 %a, %mul
				%0 = load i32* %d, align 4
				%conv2 = zext i32 %0 to i64
				%add = add i64 %sub, %conv2
				ret i64 %add
				}
				; CHECK-LABEL: f_load_umsubl_64:
				; CHECK: ldr
				; CHECK-NEXT: nop
				; CHECK-NEXT: umsubl
				; CHECK-NOWORKAROUND-LABEL: f_load_umsubl_64:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: umsubl


				define i64 @f_load_umull(i64 %a, i32 %b, i32 %c, i32* nocapture readonly %d) #0 {
				entry:
				%conv = zext i32 %b to i64
				%conv1 = zext i32 %c to i64
				%mul = mul i64 %conv1, %conv
				%0 = load i32* %d, align 4
				%conv2 = zext i32 %0 to i64
				%div = udiv i64 %mul, %conv2
				ret i64 %div
				}
				; CHECK-LABEL: f_load_umull:
				; CHECK: ldr
				; CHECK-NEXT: umull
				; CHECK-NOWORKAROUND-LABEL: f_load_umull:
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: umull


				define i64 @f_load_umnegl_64(i64 %a, i32 %b, i32 %c, i32* nocapture readonly %d) #0 {
				entry:
				%conv = zext i32 %b to i64
				%conv1 = zext i32 %c to i64
				%mul = sub nsw i64 0, %conv
				%sub = mul i64 %conv1, %mul
				%0 = load i32* %d, align 4
				%conv2 = zext i32 %0 to i64
				%div = udiv i64 %sub, %conv2
				ret i64 %div
				}
				; CHECK-LABEL: f_load_umnegl_64:
				; CHECK-NOWORKAROUND-LABEL: f_load_umnegl_64:
				; FIXME: only add further checks here once LLVM actually produces
				; umnegl instructions


				define i64 @f_store_madd_64(i64 %a, i64 %b, i64* nocapture readonly %cp, i64* nocapture %e) #1 {
				entry:
				%0 = load i64* %cp, align 8
				store i64 %a, i64* %e, align 8
				%mul = mul nsw i64 %0, %b
				%add = add nsw i64 %mul, %a
				ret i64 %add
				}
				; CHECK-LABEL: f_store_madd_64:
				; CHECK: str
				; CHECK-NEXT: nop
				; CHECK-NEXT: madd
				; CHECK-NOWORKAROUND-LABEL: f_store_madd_64:
				; CHECK-NOWORKAROUND: str
				; CHECK-NOWORKAROUND-NEXT: madd


				define i32 @f_store_madd_32(i32 %a, i32 %b, i32* nocapture readonly %cp, i32* nocapture %e) #1 {
				entry:
				%0 = load i32* %cp, align 4
				store i32 %a, i32* %e, align 4
				%mul = mul nsw i32 %0, %b
				%add = add nsw i32 %mul, %a
				ret i32 %add
				}
				; CHECK-LABEL: f_store_madd_32:
				; CHECK: str
				; CHECK-NEXT: madd
				; CHECK-NOWORKAROUND-LABEL: f_store_madd_32:
				; CHECK-NOWORKAROUND: str
				; CHECK-NOWORKAROUND-NEXT: madd


				define i64 @f_store_msub_64(i64 %a, i64 %b, i64* nocapture readonly %cp, i64* nocapture %e) #1 {
				entry:
				%0 = load i64* %cp, align 8
				store i64 %a, i64* %e, align 8
				%mul = mul nsw i64 %0, %b
				%sub = sub nsw i64 %a, %mul
				ret i64 %sub
				}
				; CHECK-LABEL: f_store_msub_64:
				; CHECK: str
				; CHECK-NEXT: nop
				; CHECK-NEXT: msub
				; CHECK-NOWORKAROUND-LABEL: f_store_msub_64:
				; CHECK-NOWORKAROUND: str
				; CHECK-NOWORKAROUND-NEXT: msub


				define i32 @f_store_msub_32(i32 %a, i32 %b, i32* nocapture readonly %cp, i32* nocapture %e) #1 {
				entry:
				%0 = load i32* %cp, align 4
				store i32 %a, i32* %e, align 4
				%mul = mul nsw i32 %0, %b
				%sub = sub nsw i32 %a, %mul
				ret i32 %sub
				}
				; CHECK-LABEL: f_store_msub_32:
				; CHECK: str
				; CHECK-NEXT: msub
				; CHECK-NOWORKAROUND-LABEL: f_store_msub_32:
				; CHECK-NOWORKAROUND: str
				; CHECK-NOWORKAROUND-NEXT: msub


				define i64 @f_store_mul_64(i64 %a, i64 %b, i64* nocapture readonly %cp, i64* nocapture %e) #1 {
				entry:
				%0 = load i64* %cp, align 8
				store i64 %a, i64* %e, align 8
				%mul = mul nsw i64 %0, %b
				ret i64 %mul
				}
				; CHECK-LABEL: f_store_mul_64:
				; CHECK: str
				; CHECK-NEXT: mul
				; CHECK-NOWORKAROUND-LABEL: f_store_mul_64:
				; CHECK-NOWORKAROUND: str
				; CHECK-NOWORKAROUND-NEXT: mul


				define i32 @f_store_mul_32(i32 %a, i32 %b, i32* nocapture readonly %cp, i32* nocapture %e) #1 {
				entry:
				%0 = load i32* %cp, align 4
				store i32 %a, i32* %e, align 4
				%mul = mul nsw i32 %0, %b
				ret i32 %mul
				}
				; CHECK-LABEL: f_store_mul_32:
				; CHECK: str
				; CHECK-NEXT: mul
				; CHECK-NOWORKAROUND-LABEL: f_store_mul_32:
				; CHECK-NOWORKAROUND: str
				; CHECK-NOWORKAROUND-NEXT: mul


				define i64 @f_prefetch_madd_64(i64 %a, i64 %b, i64* nocapture readonly %cp, i64* nocapture %e) #1 {
				entry:
				%0 = load i64* %cp, align 8
				%1 = bitcast i64* %e to i8*
				tail call void @llvm.prefetch(i8* %1, i32 0, i32 0, i32 1)
				%mul = mul nsw i64 %0, %b
				%add = add nsw i64 %mul, %a
				ret i64 %add
				}
				; CHECK-LABEL: f_prefetch_madd_64:
				; CHECK: prfm
				; CHECK-NEXT: nop
				; CHECK-NEXT: madd
				; CHECK-NOWORKAROUND-LABEL: f_prefetch_madd_64:
				; CHECK-NOWORKAROUND: prfm
				; CHECK-NOWORKAROUND-NEXT: madd

				declare void @llvm.prefetch(i8* nocapture, i32, i32, i32) #2

				define i32 @f_prefetch_madd_32(i32 %a, i32 %b, i32* nocapture readonly %cp, i32* nocapture %e) #1 {
				entry:
				%0 = load i32* %cp, align 4
				%1 = bitcast i32* %e to i8*
				tail call void @llvm.prefetch(i8* %1, i32 1, i32 0, i32 1)
				%mul = mul nsw i32 %0, %b
				%add = add nsw i32 %mul, %a
				ret i32 %add
				}
				; CHECK-LABEL: f_prefetch_madd_32:
				; CHECK: prfm
				; CHECK-NEXT: madd
				; CHECK-NOWORKAROUND-LABEL: f_prefetch_madd_32:
				; CHECK-NOWORKAROUND: prfm
				; CHECK-NOWORKAROUND-NEXT: madd

				define i64 @f_prefetch_msub_64(i64 %a, i64 %b, i64* nocapture readonly %cp, i64* nocapture %e) #1 {
				entry:
				%0 = load i64* %cp, align 8
				%1 = bitcast i64* %e to i8*
				tail call void @llvm.prefetch(i8* %1, i32 0, i32 1, i32 1)
				%mul = mul nsw i64 %0, %b
				%sub = sub nsw i64 %a, %mul
				ret i64 %sub
				}
				; CHECK-LABEL: f_prefetch_msub_64:
				; CHECK: prfm
				; CHECK-NEXT: nop
				; CHECK-NEXT: msub
				; CHECK-NOWORKAROUND-LABEL: f_prefetch_msub_64:
				; CHECK-NOWORKAROUND: prfm
				; CHECK-NOWORKAROUND-NEXT: msub

				define i32 @f_prefetch_msub_32(i32 %a, i32 %b, i32* nocapture readonly %cp, i32* nocapture %e) #1 {
				entry:
				%0 = load i32* %cp, align 4
				%1 = bitcast i32* %e to i8*
				tail call void @llvm.prefetch(i8* %1, i32 1, i32 1, i32 1)
				%mul = mul nsw i32 %0, %b
				%sub = sub nsw i32 %a, %mul
				ret i32 %sub
				}
				; CHECK-LABEL: f_prefetch_msub_32:
				; CHECK: prfm
				; CHECK-NEXT: msub
				; CHECK-NOWORKAROUND-LABEL: f_prefetch_msub_32:
				; CHECK-NOWORKAROUND: prfm
				; CHECK-NOWORKAROUND-NEXT: msub

				define i64 @f_prefetch_mul_64(i64 %a, i64 %b, i64* nocapture readonly %cp, i64* nocapture %e) #1 {
				entry:
				%0 = load i64* %cp, align 8
				%1 = bitcast i64* %e to i8*
				tail call void @llvm.prefetch(i8* %1, i32 0, i32 3, i32 1)
				%mul = mul nsw i64 %0, %b
				ret i64 %mul
				}
				; CHECK-LABEL: f_prefetch_mul_64:
				; CHECK: prfm
				; CHECK-NEXT: mul
				; CHECK-NOWORKAROUND-LABEL: f_prefetch_mul_64:
				; CHECK-NOWORKAROUND: prfm
				; CHECK-NOWORKAROUND-NEXT: mul

				define i32 @f_prefetch_mul_32(i32 %a, i32 %b, i32* nocapture readonly %cp, i32* nocapture %e) #1 {
				entry:
				%0 = load i32* %cp, align 4
				%1 = bitcast i32* %e to i8*
				tail call void @llvm.prefetch(i8* %1, i32 1, i32 3, i32 1)
				%mul = mul nsw i32 %0, %b
				ret i32 %mul
				}
				; CHECK-LABEL: f_prefetch_mul_32:
				; CHECK: prfm
				; CHECK-NEXT: mul
				; CHECK-NOWORKAROUND-LABEL: f_prefetch_mul_32:
				; CHECK-NOWORKAROUND: prfm
				; CHECK-NOWORKAROUND-NEXT: mul

				define i64 @fall_through(i64 %a, i64 %b, i64* nocapture readonly %c) #0 {
				entry:
				%0 = load i64* %c, align 8
				br label %block1

				block1:
				%mul = mul nsw i64 %0, %b
				%add = add nsw i64 %mul, %a
				%tmp = ptrtoint i8* blockaddress(@fall_through, %block1) to i64
				%ret = add nsw i64 %tmp, %add
				ret i64 %ret
				}
				; CHECK-LABEL: fall_through
				; CHECK: ldr
				; CHECK-NEXT: nop
				; CHECK-NEXT: .Ltmp
				; CHECK-NEXT: BB
				; CHECK-NEXT: madd
				; CHECK-NOWORKAROUND-LABEL: fall_through
				; CHECK-NOWORKAROUND: ldr
				; CHECK-NOWORKAROUND-NEXT: .Ltmp
				; CHECK-NOWORKAROUND-NEXT: BB
				; CHECK-NOWORKAROUND-NEXT: madd

				attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { nounwind }


				; CHECK-LABEL: ... Statistics Collected ...
				; CHECK: 11 aarch64-fix-cortex-a53-835769 - Number of Nops added to work around erratum 835769
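
The pattern these checks exercise can be sketched in a few lines of C++. This is only an illustrative model inferred from the summary and the CHECK lines, not the code of the pass in this patch: `Inst`, `isMemoryOp`, `isAffectedMulAcc` and `insertNops` are hypothetical names, instructions are reduced to a mnemonic plus a 64-bit flag, and the stream is flat, so the fall-through case across basic blocks is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record; this is not LLVM's MachineInstr.
struct Inst {
  std::string mnemonic; // e.g. "ldr", "madd", "nop"
  bool is64Bit;         // true when the result is a 64-bit (X) register
};

// Memory instructions as they appear in the test: load, store, prefetch.
bool isMemoryOp(const Inst &I) {
  return I.mnemonic == "ldr" || I.mnemonic == "str" || I.mnemonic == "prfm";
}

// 64-bit multiply-accumulate forms. Plain multiplies ("mul", "umull", ...)
// are deliberately absent, mirroring the tests that expect no nop for them.
bool isAffectedMulAcc(const Inst &I) {
  static const char *const Ops[] = {"madd",   "msub",   "smaddl",
                                    "smsubl", "umaddl", "umsubl"};
  if (!I.is64Bit)
    return false;
  for (const char *Op : Ops)
    if (I.mnemonic == Op)
      return true;
  return false;
}

// Copy the stream, placing a nop between a memory instruction and an
// immediately following 64-bit multiply-accumulate. Added receives the
// number of nops inserted.
std::vector<Inst> insertNops(const std::vector<Inst> &In, std::size_t &Added) {
  std::vector<Inst> Out;
  Added = 0;
  for (const Inst &I : In) {
    if (!Out.empty() && isMemoryOp(Out.back()) && isAffectedMulAcc(I)) {
      Out.push_back({"nop", false});
      ++Added;
    }
    Out.push_back(I);
  }
  assert(Out.size() == In.size() + Added);
  return Out;
}
```

In this model, `ldr` followed by a 64-bit `madd` becomes `ldr`, `nop`, `madd`, while a 32-bit `madd` or a plain `mul` after `str` or `prfm` is left untouched, which is the same shape as the CHECK and CHECK-NOWORKAROUND pairs above.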

Revision Contents

Diff 14733

lib/Target/AArch64/AArch64.h

lib/Target/AArch64/AArch64A53Fix835769.cpp

lib/Target/AArch64/AArch64TargetMachine.cpp

lib/Target/AArch64/CMakeLists.txt

test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll