This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add workaround for Cortex-A53 erratum (835769)
ClosedPublic

Authored by bsmith on Oct 10 2014, 2:05 AM.

Details

Summary

Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result. The details are quite complex and hard to
determine statically, since branches in the code may exist in some
circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.

The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.

This patch implements such work-around. The work-around is only enabled
when specifying the clang command line option -mfix-cortex-a53-835769 or the
llvm backend option -aarch64-fix-cortex-a53-835769.

The work-around code generation is not enabled by default.

Diff Detail

Event Timeline

bsmith updated this revision to Diff 14706.Oct 10 2014, 2:05 AM
bsmith retitled this revision from to [AArch64] Add workaround for Cortex-A53 erratum (835769).
bsmith updated this object.
bsmith edited the test plan for this revision. (Show Details)
bsmith set the repository for this revision to rL LLVM.
bsmith added a subscriber: Unknown Object (MLST).

The implementation seems correct and the rationale seems right. I'd rather Tim had a look at it, too, just in case.

Just a few unimportant comments inline, but overall, looks good to me.

Thanks!

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
84 ↗(On Diff #14706)

I'm not sure this is really necessary, but keeping a comment about this is probably good.

test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
40

There is no RUN checks for these...

43

You can just have two checks: ON and OFF and control what needs to be on and off onthe RUN line.

129

I think you should also add the expected CHECK line on both cases but not add the CHECK text.

jmolloy requested changes to this revision.Oct 10 2014, 3:06 AM
jmolloy added a reviewer: jmolloy.
jmolloy added a subscriber: jmolloy.

Hi Bradley,

I have plenty of comments inline.

Cheers,

James

lib/Target/AArch64/AArch64.h
43

I still maintain what I said internally that the name of the pass should fit convention - there's a convention in two lines above of putting "A5*" after AArch64. Can you please change to fit that convention unless you have a compelling reason otherwise?

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
29 ↗(On Diff #14706)

You don't need EquivalenceClasses, BitVector, or RegisterScavenging, (or raw_ostream).

52 ↗(On Diff #14706)

Capitalize "must".

69 ↗(On Diff #14706)

All comments should have proper SPAG! :)

98 ↗(On Diff #14706)

You're not using this, it should be removed.

134 ↗(On Diff #14706)

You're not using this.

163 ↗(On Diff #14706)

Why does this return nullptr on failure instead of asserting?

177 ↗(On Diff #14706)
  • is owned by the variable name: MachineBasicBlock *PMBB.
200 ↗(On Diff #14706)

GratuitouslyLongNameIsGratuitouslyLong. Wouldn't "Sequences" work, with a comment saying this is the terminating instruction in the sequence? Or something similarly shortened...

We like explicit names, but I feel this one is a bit on the long side.

205 ↗(On Diff #14706)

What if PrevInstr is nullptr here? Will you have to jump back to the previous fallthrough if one exists? or assert?

It doesn't seem right to continue with PrevInstr being nullptr.

232 ↗(On Diff #14706)

Space after MI.

This revision now requires changes to proceed.Oct 10 2014, 3:06 AM
bsmith added inline comments.Oct 10 2014, 4:10 AM
lib/Target/AArch64/AArch64.h
43

The rationale behind this name is that this phase isn't an A53 specific phase, it's a phase that addresses an A53 erratum. In a big-little system you may well want this phase enabled for A57 code.

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
205 ↗(On Diff #14706)

I've added an assert to getLastNonPseudo since if we have a block we've fallen through we expect to find an instruction. If getBBFallenThrough returns nullptr it means there was no fallen through block, hence PrevInstr being nullptr is valid in this case, meaning we have no previous instruction yet (the loop will then just go to the next interation where it does have one).

test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
40

These are a hangup from some previous code I had, I'll remove them. (They test the samething as CHECK-NOWORKAROUND).

bsmith updated this revision to Diff 14711.Oct 10 2014, 4:11 AM
bsmith edited edge metadata.

Address review comments

rengolin added inline comments.Oct 10 2014, 6:34 AM
lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
155 ↗(On Diff #14711)

shouldn't this be an llvm_unreachable instead?

test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
37

If BASIC-PASS-DISABLED == NOWORKAROUND, why have two?

bsmith added inline comments.Oct 10 2014, 7:16 AM
test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
37

All of the below tests are very sensitive to scheduling, and indeed if you run all of the NOWORKAROUND tests for a57/generic/cyclone some of them will fail for this very reason.

Instead only a single test is done for BASIC-PASS-DISABLED just to check the pass doesn't run.

bsmith updated this revision to Diff 14729.Oct 10 2014, 7:22 AM
bsmith edited edge metadata.

Use llvm_unreachable instead of directly asserting.

Hi Bradley,

The rationale behind this name is that this phase isn't an A53 specific phase, it's a phase that addresses an A53 erratum. In a big-little system you may well want this phase enabled for A57 code.

Sorry, I'm not convinced by that rationale. By the same reasoning, the FP Load balancing for A57 might be applied to code with -mcpu=cortex-a53 because it may run on a big.LITTLE system and has more impact for the big core than the little.

It doesn't seem right to continue with PrevInstr being nullptr.

I've added an assert to getLastNonPseudo since if we have a block we've fallen through we expect to find an instruction. If getBBFallenThrough returns nullptr it means there was no fallen through block, hence PrevInstr being nullptr is valid in this case, meaning we have no previous instruction yet (the loop will then just go to the next interation where it does have one).

The case I was concerned about is if we had a fallthrough block, but that block contained only pseudo instructions. Do you handle that case correctly? You'd need to unwind through the fallthroughs until one contained a non-pseudo, I'd have thought.

Hi Bradley,

Thanks for being open about the issue, it's really excellent to see ARM contributing to LLVM even in difficult circumstances like this.

I can't really comment on the actual work around, since I haven't seen the erratum; but I'll trust you've had enough look at it. I did spot a couple of issues; one stylistic, the other a bit more problematic.

Cheers.

Tim.

lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
34–37 ↗(On Diff #14729)

I think we're trying to standardise on putting these predicates in AArch64TargetMachine.h, to decide whether the pass is even added.

143 ↗(On Diff #14729)

This doesn't necessarily mean it was a fallthrough. Particularly at -O0 a block may happen to be before another in layout order but contain a real branch anyway.

168 ↗(On Diff #14729)

I think this may damage expected block semantics too. You're putting an instruction without isTerminator after the real terminators (like the conditional branch).

I'd hope the verifier would complain about that (but be unsurprised if it didn't). Other passes may not cope (in particular, a glance at AnalyzeBranch suggests it'll fail after this).

Are you using the previous block to win back some efficiency when there's multiple branches here? If so, it may not be worth the effort, though I'll leave you to decide.

t.p.northover added inline comments.Oct 10 2014, 8:01 AM
lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
143 ↗(On Diff #14729)

Actually, AnalyzeBranch is probably what you want to use here, if you find some way to go ahead with the NOP at end solution.

rengolin added inline comments.Oct 10 2014, 8:10 AM
test/CodeGen/AArch64/aarch64-fix-cortex-a53-835769.ll
37

Right, makes sense.

bsmith added inline comments.Oct 10 2014, 8:15 AM
lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
143 ↗(On Diff #14729)

In a case where we have a real branch, a nop would never get inserted since the last instruction of the block would be the branch, not a load/store/prefetch. Unless it's easy to check for this case, might it be worth leaving like this if it's harmless?

168 ↗(On Diff #14729)

Similarly here, if there is a real terminator a nop would never get inserted, since a load/store/prefetch would never be a terminator I think? How does this work normally for fallthrough blocks with no explicit terminator?

t.p.northover added inline comments.Oct 10 2014, 8:29 AM
lib/Target/AArch64/AArch64FixCortexA53_835769.cpp
143 ↗(On Diff #14729)

Ah, I see. OK, I think it's probably correct now, but the name is a bit misleading because the block may not fall through at all.

I'd probably change to using AnalyzeBranch anyway, simply because it indicates the type of checks you're performing and can hopefully be relied on to get the fiddly details right:

if(!AnalyzeBranch(..., TBB, FBB) && !TBB & !FBB)
  return std::prev(MBB);

return nullptr;

Alternatively, perhaps rename the function to indicate you don't actually care if it's not a real fallthrough.

bsmith updated this revision to Diff 14733.Oct 10 2014, 8:52 AM

Address various review comments:

  • Move backend option to AArch64TargetMachine
  • Use AnalyzeBranch in getBBFallenThrough
  • Rename phase to AArch64A53Fix835769
t.p.northover accepted this revision.Oct 10 2014, 9:35 AM
t.p.northover added a reviewer: t.p.northover.

Thanks Bradley, I think this looks OK now.

Tim.

Hi Bradley

Your new test needs a "REQUIRES: asserts" as llc -stats require asserts to be on.

jmolloy accepted this revision.Oct 13 2014, 12:55 AM
jmolloy edited edge metadata.

Hi Bradley,

This now LGTM.

Cheers,

James

This revision is now accepted and ready to land.Oct 13 2014, 12:55 AM
bsmith closed this revision.Oct 13 2014, 3:23 AM

Committed as 219603, thanks!

This comment was removed by rengolin.
This comment was removed by rengolin.