This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF][AArch64] Implement scanner for Cortex-A53 Erratum 843419
ClosedPublic

Authored by peter.smith on Aug 15 2017, 8:15 AM.

Details

Summary

Add a new file SectionPatcher.cpp that implements the logic to scan for the Cortex-A53 Erratum 843419. This involves finding all the executable code, disassembling the instructions that might trigger the erratum and reporting a message if the sequence is detected.

This is patch 1 of 3 to fix pr33463 https://bugs.llvm.org/show_bug.cgi?id=33463 . This patch adds two new options -fix-cortex-a53-843419 (taken from gold and ld.bfd) and -print-fixes (lld specific). contains logic to find all the executable code in an AArch64 link and scan it for the erratum sequence. If we find it we can print out a message if -print-fixes is selected. The tests just check that the recognizer can detect the patch sequence, and also avoid detecting sequences that are very similar, or embedded in literal data. The code to actually fix the erratum will follow shortly in a later patch. Regardless of the method of fixing the erratum this should be independent of the code to detect it.

Details of the erratum: http://infocenter.arm.com/help/topic/com.arm.doc.epm048406/Cortex_A53_MPCore_Software_Developers_Errata_Notice.pdf
Arm Architecture Reference Manual: https://developer.arm.com/products/architecture/a-profile/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

There are a number of design decisions that I've made that are somewhat arbitrary and could easily be changed:

  • I've not tried to make the implementation target independent, as we only support one patch this would make the initial patch harder to understand. I'm quite happy to extract the target independent and target specific parts if people prefer.
  • Mapping symbols are to the best of my knowledge specific to ARM and AArch64, I've followed gold and ld.bfd by only disassembling code that is marked as code by the $x mapping symbol. This avoids fake matches of bit-patterns in inline literal data.
  • There is a lot of logic to decode the AArch64 instructions that are part of the erratum sequence, I've followed how these are written in the Arm Architecture Reference Manual, but this is still quite difficult to follow. I'm not sure how this can be avoided.
  • I've made a new file SectionPatcher.cpp as much of the code is not reusable elsewhere. There is a lot of target specific AArch64 code in there, I thought it better to leave it there rather than try to somehow hide it behind the Target interface.

This patch is dependent on D36739 to get the correct addresses in the report message.

Diff Detail

Event Timeline

peter.smith created this revision.Aug 15 2017, 8:15 AM
grimar added a subscriber: grimar.Sep 4 2017, 2:02 AM
grimar added inline comments.
ELF/Driver.cpp
681

Since this option is LLD specific, I wonder do we need it ? Can we just always report fixes
if -fix-cortex-a53-843419 is specified ?
Or may be always report them if -versbose is given.

ELF/Writer.cpp
1353

Should we just fail in driver.cpp if -fix-cortex-a53-843419 used for non AARCH64 targets ?

Thanks very much for the comments. I've got rid of the -print-fixes option in favour of -verbose, and I've also added a check (with test) to error if the -fix-cortex-a53-843419 option is used when the machine is not AArch64.

Rebased to account for D35987. I've also taken the opportunity to go re-read and update all the comments and do some minor refactoring in preparation for some larger changes in D36749.

Rebase, change lld/Core to lld/Common in include file. No further changes.

Update diff in light of recent refactoring, no other changes.

Rebased to account for recent refactorings. No other changes.

ruiu added inline comments.Nov 6 2017, 5:40 PM
ELF/SectionPatcher.cpp
1

This is pretty much AArch64-specific, so SectionPatcher is not a good name.

60

It looks like you can define member functions as non-member, file-scope functions.

371–373

I feel it is easier to read without this helper.

377

GetInst() is basically free, so you don't need to avoid calling it, do you? It is not clear to me why you had to check if Inst1 satisfies the condition. You'll always pass it to is843419ErratumSequence(), so you could check if it is ADRP in that function.

412–426

Can you assume that symbol tables are sorted?

462–497

Nesting seems too deep. Could you simplify?

ELF/SectionPatcher.h
1

Will we have something like this for non-ARM targets? If not, this class name and file name sound too generic. I'd name AArch64ErrataPatcher or something like that.

Thanks very much for the comments. I've made the following changes:

  • Changed the name of the files from SectionPatcher to AArch64ErrataFix. The technique for fixing errata isn't unique to AArch64, it has been applied to Arm before in other linkers, but I guess we can rename the file later if it becomes more generic.
  • Removed the A64 struct.
  • Got rid of GetInstr and simplified a bit.
  • Created a function to remove some of the indentation.

Unfortunately mapping symbols aren't guaranteed to be ordered. It was thought too target specific to mandate in the ABI.

Fix header guard to use the new filename. My apologies for missing the first time.

Few comments/suggestions below.

ELF/AArch64ErrataFix.cpp
32 ↗(On Diff #121893)

I think this description should be in the header of this file, see ICF.cpp for example.

288 ↗(On Diff #121893)

Function names should start from lowercase.
I would suggest to change name to something strarting from "is*" or "should*" or alike,
which looks more appropriate naming for helper returning bool.

352 ↗(On Diff #121893)

I would move this below declaration of PatchOff and declare as something like:

uint32_t *InstBuf = reinterpret_cast<uint32_t *>(Buf + Off);

That way you can get rid of multiple reinterpret_casts.

370 ↗(On Diff #121893)

I think you missing curly brackets here to conform code style:

if (is843419ErratumSequence(Instr1, Instr2, Instr3)) {
...
} else if
410 ↗(On Diff #121893)

You don't need brackets:

if (auto *Sec = dyn_cast<InputSection>(Def->Section))
  if (Sec->Flags & SHF_EXECINSTR)
    SectionMap[Sec].push_back(Def);
422 ↗(On Diff #121893)

May be early continue ?

if (MapSyms.size() <= 1)
  continue;
471 ↗(On Diff #121893)

Do you need argument ? It looks you could iterate over global OutputSections array.
(if suggestion below does not work for some reason).

476 ↗(On Diff #121893)

Seems you can just iterate global InputSections vector here ?

ELF/Config.h
119

Do you need 'FatalWarnings' ? Looks unused.

peter.smith marked 7 inline comments as done.

Thanks very much for the review comments. I've updated the diff to address them.

ELF/AArch64ErrataFix.cpp
352 ↗(On Diff #121893)

Good suggestion, thanks.

370 ↗(On Diff #121893)

I've made the change. Just for my curiosity do you have a reference? I couldn't find anything in https://llvm.org/docs/CodingStandards.html . I guess it could be local lld convention.

471 ↗(On Diff #121893)

Yes I can use the global. I seem to remember that OutputSections used to be a member of the Writer class, I guess this must have changed since I first wrote the patch.

476 ↗(On Diff #121893)

Maybe, however I think it is worth keeping it the way it is for the moment for a couple of reasons:

  • In the follow up patch that adds the patching we need to pass in the InputSectionDescription to insert the synthetic sections containing patches into.
  • I'm more confident that I'm not finding patches in InputSections that aren't in the image.
ELF/Config.h
119

I've removed it, it looks like a mistake made when rebasing.

ELF/SectionPatcher.cpp
412–426

Unfortunately not. The original Arm proprietary toolchain used to guarantee that mapping symbols were sorted before all other symbols with the first element a special mapping symbol giving the number of mapping symbols. This was considered to much of a target specific requirement for the other toolchains so the requirement didn't go into the ABI.

grimar added inline comments.Nov 14 2017, 12:07 AM
ELF/AArch64ErrataFix.cpp
370 ↗(On Diff #121893)

Hmm, I also did not find anything, that is probably local convention, I think we always write in that style (use brackets even for single line if else branch uses them), not sure where it came from initially :)

476 ↗(On Diff #121893)

I am fine with it.
(regarding the second part I believe all Live InputSections anyways should be in the image, otherwise it would be bug, also it may make sence to rename 'createA53Errata843419Fixes' to 'reportA53Errata843419Fixes' for this iteration as it is exactly what it do now as far I understand).

ELF/AArch64ErrataFix.h
20 ↗(On Diff #122665)

I meant this probably should be at the top of the .cpp file, in its header part,
just like we have in ICF.cpp, MapFile.cpp, ScriptLexer.cpp and others.

Thanks for the clarification about the header. I've renamed the create... to report... and have moved the comment to below the header in the .cpp file.

Many thanks for taking the time to review this and some of the other Arm/AArch64 specific stuff, it is much appreciated.

Posting into Phabricator to keep comments together.

+ if (Config->EMachine == EM_AARCH64 && Config->FixCortexA53Errata843419) {
+ Script->assignAddresses();
+ reportA53Errata843419Fixes();
+ }

Why do you need assignAddresses()? This is just after a loop that stops
when address don't change.

At the moment, until D37944 Add support for AArch64 Range Thunks lands the assignAddresses won't be entered. Some merge work will be needed here as Thunks and Errata fixes can impact each other.

+ This file implements Section Patching for the purpose of working around
+
errata in CPUs. The general principle is that an erratum sequence of one or
+ more instructions is detected in the instruction stream, one of the
+
instructions in the sequence is replaced with a branch to a patch sequence
+ of replacement instructions. At the end of the replacement sequence the
+
patch branches back to the instruction stream.
+
+ This technique is only suitable for fixing an erratum when:
+
- There is a set of necessary conditions required to trigger the erratum that
+ can be detected at static link time.
+
- There is a set of replacement instructions that can be used to remove at
+ least one of the necessary conditions that trigger the erratum.
+
- We can overwrite an instruction in the erratum sequence with a branch to
+ the replacement sequence.
+
- We can place the replacement sequence within range of the branch.

Given where this is called you also need to be able to place the
replacement sequence after all other sections or you will need to
reevaluate addresses for thunks, no?

Yes. At time of writing AArch64 doesn't support thunks, my preference would be to land the range thunks support for AArch64 first (D37944) and then merge this in with some test cases involving range extension thunks.

+ - The implementation here only supports one patch, the AArch64 Cortex-53
+
errata 843419 that affects r0p0, r0p1, r0p2 and r0p4 versions of the core.
+ To keep the initial version simple there is no support for multiple
+
architectures or selection of different patches.

BTW, is it public knowledge what uses these versions? I am not sure if
that is something that is present in a few early dev boards or in most
of the phones on the planet.

Unfortunately I don't have concrete information, we'd need to know for each phone what SoC it was using, and for each SoC which rev of the Cortex-A53 it was using. What I do know is that it was reproduced in real Android software on enough phones to make the NDK enable the gold/bfd equivalent option by default. My understanding is that Ubuntu for AArch64 and the Linaro aarch64-linux-gnu toolchains also enable it by default. My guess, given the low shelf life of mobiles, is the overall percentage of phones is low, but there are so many phones out there that it could still run into hundreds of thousands of phones still in use.

+ Load/store register (unsigned immediate)
+
| size (2) 11 | 1 V 01 | opc (2) | imm12 | Rn (5) | Rt (5) |
+static bool isLoadStoreRegisterUnsigned(uint32_t Instr) {
+ return (Instr & 0x3b000000) == 0x39000000;
+}

A crazy idea for another day: can all these functions be auto generated
from the td files that llvm-mc uses?

I did think a bit about that. In theory some of it, but in general the information in mc doesn't give you the information in the right form or in a stable enough form that you could interrogate for details such as does this instruction read or write this register? I suspect that quite a bit more information that wouldn't be needed for compilation/disassembly would need to be added to the .td files to make it work.

+ Note that this function refers to v8.0 only and does not include the
+
additional load and store instructions added for in later revisions of
+ the architecture such as the Atomic memory operations introduced
+
in v8.1.

Because the errata is fixed on all implementations of v8.1?

In effect yes. The errata is specific to the cortex-a53 CPU, and all cortex-a53s are v8.0.

+static void report843419Fix(uint64_t AdrpAddr) {
+ if (!Config->Verbose)
+ return;
+ message("detected cortex-a53-843419 erratum sequence starting at " +
+ utohexstr(AdrpAddr) + " in unpatched output.");
+}

This one is actually simpler without the early return.

Ok I'll fix.

+ bool OptionalAllowed = Limit - Off > 12;

Move bool after the if since it is not used in it.

Ok I'll fix.

LGTM with the above nits fixed (mostly comment updates).

Cheers,
Rafael

I think it will be best if I get the range-extension thunks in first, and then merge this one in so that the interactions can be tested.

Updated and rebased after addition of range-extension thunks for aarch64.

  • The erratum scanning is now done within the same loop as range extension thunks.
  • Added a new test case of an erratum sequence that will only be detected on the second pass. As the scanner only detects patches and doesn't try and fix them we cannot test the case where a new thunk is required due to a patch being added.

peter.smith updated this revision to Diff 125355.
peter.smith added a comment.

Updated and rebased after addition of range-extension thunks for aarch64.

  • The erratum scanning is now done within the same loop as range extension thunks.
  • Added a new test case of an erratum sequence that will only be detected on the second pass. As the scanner only detects patches and doesn't try and fix them we cannot test the case where a new thunk is required due to a patch being added.

What is the plan for when we detect sequences to patch on mulitple
iterations? Given that the address of the instructions are important, we
can end up patching sequences that in the end are in addresses that
don't actually need patching, no?

That is probably fine as avoiding it would probably require a fancier
optimization algorithm.

LGTM.

Cheers,
Rafael

The plan is to leave them in. This can result in a small amount of wasted code size (each patch instance is 8-bytes) but it isn't significant compared to the overall size of the program. There is a clever way of mitigating the problem by padding out the size of a patch such that it doesn't disturb the base address modulo 0x1000 of following sections. Although this wastes quite a lot of bytes for small programs.

Thanks very much for the review.

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.
grimar added a comment.Dec 5 2017, 8:07 AM

Looks you forgot to add AArch64ErrataFix.cpp.

grimar added inline comments.Jun 29 2018, 3:29 AM
ELF/AArch64ErrataFix.cpp
256 ↗(On Diff #125355)

Hi from 2018, Peter :)

I am sorry for the delay, but this line of the code seems uncovered by any test cases we have.
Do you think you can fix it? (If not - I'll try to do something by myself probably).

(We are somewhere in the middle of the doing the bot to check the LLD code coverage.
I am currently trying to find out and resolve the most problematic places)

I'll come up with something. Will aim to do it on Monday. The load/store exclusive instructions are architecture v8.1 only so aren't supported by the v8.0 cortex-a53; it is unlikely that anyone will run the erratum fix that only affects cortex-a53 with the option but it should be possible to come up with test case.

I'll come up with something. Will aim to do it on Monday. The load/store exclusive instructions are architecture v8.1 only so aren't supported by the v8.0 cortex-a53; it is unlikely that anyone will run the erratum fix that only affects cortex-a53 with the option but it should be possible to come up with test case.

Cool, thanks!