This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF][ARM] Implement --fix-cortex-a8 to fix erratum 657417

Authored by peter.smith on Sep 6 2019, 9:01 AM.



The --fix-cortex-a8 option implements a linker workaround for the coretex-a8 erratum 657417. A summary of the erratum conditions is:

  • A 32-bit Thumb-2 branch instruction B.w, Bcc.w, BL, BLX spans two

4KiB regions.

  • The destination of the branch is to the first 4KiB region.
  • The instruction before the branch is a 32-bit Thumb-2 non-branch


The linker fix is to redirect the branch to a patch not in the first 4KiBregion. The patch forwards the branch on to its target.

The cortex-a8, is an old CPU, with the first implementation of this workaround in ld.bfd appearing in 2009. The cortex-a8 has been used in early Android Phones and there are some critical applications that still need to run on a cortex-a8 that have the erratum. Implementing support for --fix-cortex-a8 will remove the final reason to keep in the Android build environment. The patch is applied roughly 10 times on LLD and 20 on Clang when they are built with --fix-cortex-a8 on an Arm system. The formal erratum description is avaliable in the ARM Core Cortex-A8 (AT400/AT401) Errata Notice document. This is available from Arm on request but it seems to be findable via a web search.

Implementation notes:

  • I appreciate that this will be difficult to review as it is a big lump of Arm specific code. The erratum fix itself is fairly simple, most of the complexity is in dealing with the combinations of relocations and instructions. Happy to answer any questions as best I can.
  • I have not attempted to share any code or merge this into the AArch64ErrataFix.cpp file. There are some parts that could be factored out at the expense of some customization points to do something specific to Arm, AArch64 or the patch. I chose not to do this for the initial patch to keep it as simple as possible. I'm happy to factor bits out or combine the source files if preferred.
  • The functions that are very similar to AArch64ErrataFix are:
    • init (Arm and Thumb mapping symbols)
    • insertPatches (Account for smaller Thumb branch range)
    • patchInputSectionDescription (Arm and Thumb mapping symbols)
    • createFixes (different patch name)
  • The functions that are specific to the Cortex-A8 erratum and contain most of the logic are:
    • Class Patch657417Section and its member functions
    • scanCortexA8Errata657417
    • implementPatch
  • ld.bfd has a bug in its mask for recognizing bcc.w which causes it to recognize nop.w as a conditional branch. This is only relevant if comparing the implementations.
  • If dynamic linking is used then if the page-size or alignment of the segment containing the patches were < 4KiB then a dynamic loader could undo the fixes. Given the target audience of Android the LLD default page-size of 4KiB for Arm prevents this from happening. There is scope to force Page Alignment to at least 4KiB if --fix-cortex-a8 is enabled but I haven't done it yet on the principle that people using this option will know what they are doing.

Diff Detail


Event Timeline

peter.smith created this revision.Sep 6 2019, 9:01 AM

I have read through the file but haven't gone through the logic in a debugger. Some suggestions and questions inlined.

MaskRay added inline comments.Sep 6 2019, 8:24 PM
42 ↗(On Diff #219121)

What are "region 1" and "region 2"?

63 ↗(On Diff #219121)

Omit lld::elf. This is not needed because of using namespace lld::elf; above. I'm thinking whether it is better to use

// delete use namespace lld;

namespace lld {
namespace elf {

for new files.

80 ↗(On Diff #219121)

Missing full stop.

87 ↗(On Diff #219121)

Return true if the half-word is the first half of a 32-bit instruction.

Do you mean the first half word?

88 ↗(On Diff #219121)

32-bit Thumb instruction encoding? (The heading as used in the manual)

91 ↗(On Diff #219121)

op1 != 0b00 ?

op1 == 0b00 encodes a 16-bit Thumb instruction.

156 ↗(On Diff #219121)

Use one write32le?

181 ↗(On Diff #219121)


Is "first region" the term used to refer to destAddr < sourceAddr && (destAddr & 0xfffff000) == (sourceAddr & 0xfffff000);

231 ↗(On Diff #219121)

ISAddr + Off -> isecAddr + off

248 ↗(On Diff #219121)

l6 -> 16

269 ↗(On Diff #219121)

Say off % 0x1000 = 0xffc before the assignment. The erratum sequence in the next page starts at 0xffa. This increment will skip that erratum sequence.

Is this a possible scenario?

286 ↗(On Diff #219121)

In include/llvm/Object/ELFObjectFile.h and llvm-objdump, we just use startswith("$a"). Is there a reason to check "$a."? Below, you just use return ms->getName().startswith("$t");

297 ↗(On Diff #219121)

Should there be a check to reject ELF32BE beforehand?
It can be placed in Driver.cpp:checkOptions.

314 ↗(On Diff #219121)

if (mapSyms.size() <= 1) continue can be deleted.

342 ↗(On Diff #219121)

initial Thunk placement?

344 ↗(On Diff #219121)

Missing full stop.

350 ↗(On Diff #219121)

Or use for (; patchIt != patchEnd; ++patchIt)

351 ↗(On Diff #219121)

Is it guaranteed that getBranchAddr() is monotonically increasing?

363 ↗(On Diff #219121)


370 ↗(On Diff #219121)
if (a->outSecOff != b->outSecOff)
  return a->outSecOff < b->outSecOff;
return isa<Patch657417Section>(a) && !isa<Patch657417Section>(b);
503 ↗(On Diff #219121)


27 ↗(On Diff #219121)


42 ↗(On Diff #219121)

std::map -> DenseMap

std::map may give a false impression that the key is ordered. InputSection's are allocated from a BumpPtrAllocator. While the allocator is called "bump", that just refers to the allocation strategy within a slab. When a new slab is allocated by malloc, it is not guaranteed the address will monotonically increase.

1573 ↗(On Diff #219121)


if (config->fixCortexA53Errata843419) {
  if (changed)
  changed |= a64p.createFixes();
if (config->fixCortexA8) {
  if (changed)
  changed |= a32p.createFixes();

if you think it is clearer.

4 ↗(On Diff #219121)

-start-address -> --start-address

14 ↗(On Diff #219121)

Align ld.lld

ruiu added a comment.Sep 9 2019, 5:55 AM

I'm halfway through reviewing it, and it looks like a straightforward implementation of a mitigation. Can I ask a (perhaps noob) question? If a CPU can be locked up by an instruction sequence that triggers the bug, and if the CPU is used by Android, does that mean a user application can lock up the entire system?

42 ↗(On Diff #219121)

I guess that means first page and second page?

372–375 ↗(On Diff #219121)

You can return the result of the if condition directly.

peter.smith marked 27 inline comments as done.Sep 9 2019, 9:31 AM

I'm halfway through reviewing it, and it looks like a straightforward implementation of a mitigation. Can I ask a (perhaps noob) question? If a CPU can be locked up by an instruction sequence that triggers the bug, and if the CPU is used by Android, does that mean a user application can lock up the entire system?

Yes it is possible for a user application to lock the entire system. From the erratum description: " If the deadlock condition occurs, it can only be interrupted by pulling the RESET pin on the processor." There are several more run-time conditions that make this rare, and it only occurs on models with a 32k instruction cache.

Thanks very much for the comments, I will post a new patch shortly that I hope will address the comments. I think the AArch64ErrataFix.cpp will need a patch after this as some of the refactoring cleanups could also be applied there. I'll do that when it is clear what needs to be done.

42 ↗(On Diff #219121)

Memory region is the word used in the official erratum description and the Architecture reference manual. It is closely related to a page so in this case it can be thought of as a page. I think memory region is used in the spec as memory regions can also be used in a non-paged context. I'll update the description to make it clear what region 1 and 2 are.

88 ↗(On Diff #219121)

Yes, I'll update the comment to improve the reference.

91 ↗(On Diff #219121)

Yes, I forgot to finish the sentence used in the manual. I'll fix up the comment.

181 ↗(On Diff #219121)

Yes it is, I'll update the function name to make it more explicit. I also worked out that I don't need the destAddr < sourceAddr part.

269 ↗(On Diff #219121)

Yes I think there might be a case that gets recognized as an erratum where there is something like:

0x0ffc nop.w
0x1000 b.w label

I will add a case to make sure the 0xffc is handled in a similar way to 0xffe and will add to the nopatch test case.

286 ↗(On Diff #219121)

Yes, $afoo without the . is not a mapping symbol. I'm thinking that it might be best to replace the map from InputSection -> vector<Symbol*> to InputSection -> std::pair<uint64_t offset, bool isThumb>, this would also work for AArch64.

297 ↗(On Diff #219121)

LLD doesn't support ELF32BE. I'd have thought it would give an error, but without passing an emulation, it seems like I can get LLD will accept a big-endian ELF file without error, I'll need to write a follow up patch to give an error.

As an aside:
Big Endian is strange on AArch64 and even stranger in ARM. In general it only tends to get used in networking so I've not seen any requests to implement it.

  • AArch64 BE: ELF is BE, instructions are LE, Data is BE
  • ARM BE : ELF is BE, instructions in relocatable objects are BE, Data is BE. Older ARM architectures like v4, v5 and v6 executable/dso instructions have BE instructions. All newer architectures have LE instructions and the linker needs to do the byte swaps.
351 ↗(On Diff #219121)

Yes. The getBranchAddr() is essentially the address in the InputSection that has the patch applied to it. Within an InputSectionDescriptions the InputSections->outSecOff monotonically increase, and as the patches are added to the end of the list, the getBranchAddr() monotonically increases.

Updated patch to address review comments.

Just a few nits from me.

176 ↗(On Diff #219373)

Doesn't seem you need this->?

188 ↗(On Diff #219373)

Perhaps just inline a here?

216 ↗(On Diff #219373)

You need to wrap this line into curly bracers I think, because first branch has it.
(LLD coding style feature)

281 ↗(On Diff #219373)

{} too.

300 ↗(On Diff #219373)

What is b stands for (here and below)? I'd expect to see sym or s.

peter.smith marked 5 inline comments as done.

Thanks for the comments. Updated diff to address George's comments. Changes include:

  • remove uses of this
  • inline addend
  • add braces where needed
  • change parameter name from b to s
MaskRay added inline comments.Sep 10 2019, 2:42 AM
490 ↗(On Diff #219499)

Above you use s->getName() == "$t" || s->getName().startswith("$t.");

peter.smith marked an inline comment as done.Sep 10 2019, 3:57 AM
peter.smith added inline comments.
490 ↗(On Diff #219499)

I think using ms here is defensible. Beforehand we were processing all symbols, that may or may not be mapping symbols. Here we know that the symbol is a mapping symbol as we filtered them out earlier.

I don't mind changing if you'd prefer s instead of ms?

MaskRay added inline comments.Sep 10 2019, 4:22 AM
490 ↗(On Diff #219499)

ms as the variable name is fine. Sorry, I should have been clearer. I meant why .startswith("$t") is used here. But I see the reason now:

Because the elements contain exclusively mapping symbols:

if (!isArmMapSymbol(def) && !isThumbMapSymbol(def) &&

ms->getName().startswith("$t"); should be sufficient.

MaskRay added inline comments.Sep 13 2019, 2:11 AM
202 ↗(On Diff #219499)

The find_if code sequence is also used in implementPatch:

auto relIt = llvm::find_if(isec->relocations, [=](const Relocation &r) {
    return r.offset == patcheeOffset &&
           (r.type == R_ARM_THM_JUMP19 || r.type == R_ARM_THM_JUMP24 ||
            r.type == R_ARM_THM_CALL);

Is it necessary to store the iterator in ScanResult?

227 ↗(On Diff #219499)

source + isec->getSize() + 0x100 or isec->getVA() + isec->getSize() + 0x100?

263 ↗(On Diff #219499)

l6 -> 16 or just delete l6-bit.

336 ↗(On Diff #219499)

I think these disjunctions can be simplified to:

isThumbMapSymbol(a) == isThumbMapSymbol(b)

479 ↗(On Diff #219499)

Superfluous space in the comment.

483 ↗(On Diff #219499)

MapSyms -> mapSyms

490 ↗(On Diff #219499)

In init()

    std::unique(mapSyms.begin(), mapSyms.end(), ...),

You can normalize mapSyms to start with a thumb mapping symbol (erase(begin()) if not thumb). Then you can do auto thumbSym = mapSyms.begin(); here.

36 ↗(On Diff #219499)

.local is the default for a defined symbol. Is the directive here to emphasize the symbol is local? Same question goes for other target* symbols.

123 ↗(On Diff #219499)

Add a line: CALLSITE7: 00019002 target7

202 ↗(On Diff #219499)

Delete the trailing empty line.

32 ↗(On Diff #219499)

You can just use spaces, instead of interleaving spaces and tabs before beq.w.


Unrelated to this patch, I guess +0x4 is incorrect. If so, we need an llvm-objdump fix as I mentioned in D66539.

3 ↗(On Diff #219499)

%t2 -> /dev/null because it is not used.

PS: I usually use %t/ as the executable/DSO name for the object file %t.o. The suffix (usually empty, 1 or 2) indicates their relations.

26 ↗(On Diff #219499)

expect -> expects?

41 ↗(On Diff #219499)

expect -> expects?

peter.smith marked 19 inline comments as done.Sep 13 2019, 9:28 AM

Thanks very much for the comments. I'll upload a patch shortly with the suggestions applied.

202 ↗(On Diff #219499)

I think that would help. I've made it so a pointer to the relocation is stored in ScanResult. As ImplementPatch uses all of ScanResult I've just passed through rather than splitting up the parameters.

227 ↗(On Diff #219499)

Thanks for pointing out the mistake. It should be isec->getVA() + isec->getSize() + 0x100

336 ↗(On Diff #219499)

Yes it can, that makes it a lot simpler; thanks.

490 ↗(On Diff #219499)

Thanks for the suggestion, I've implemented that.

36 ↗(On Diff #219499)

They are definitely intended to be local as the assembler can make more assumptions about resolving fixups without relocations if it is. I can remove them if they are the default as I don't think it is hugely important to emphasise.

32 ↗(On Diff #219499)

Apologies, forgot to untabify the file before posting. Neither GNU or llvm objdump do a particularly good job with Arm, Thumb branches due to the implicit PC offset.

26 ↗(On Diff #219499)

I've rewritten to "a patch will be attempted". It is difficult to say in the original, I'd say "We expect" or "The test expects".

41 ↗(On Diff #219499)

Rewritten to "a patch will be attempted"

peter.smith marked 3 inline comments as done.

Updated diff for latest suggestions. Main changes:

  • Simplify the mapping symbol calculations
  • Cache the result of searching for the relocation
  • Fix a bug when determining whether a patch could reach outside its section.
  • Update comments and fix white space in tests.
MaskRay accepted this revision.Sep 13 2019, 10:09 AM

LGTM, just a few nits and a requested test.

I will add a case to make sure the 0xffc is handled in a similar way to 0xffe and will add to the nopatch test case.

108 ↗(On Diff #220127)

The outer pair of () is unnecessary I think.

234 ↗(On Diff #220127)


246 ↗(On Diff #220127)

is at least 0xffa -> is 0xffa?

I haven't verified, but off = alignTo(isecAddr+off, 0x1000, 0xffa) - isecAddr; probably works.

373 ↗(On Diff #220127)

isecLimit can be defined here, i.e. remove the definition above.

This revision is now accepted and ready to land.Sep 13 2019, 10:09 AM
peter.smith marked 4 inline comments as done.Sep 16 2019, 2:16 AM

LGTM, just a few nits and a requested test.

I will add a case to make sure the 0xffc is handled in a similar way to 0xffe and will add to the nopatch test case.

Thanks for the review. I've made the updates and will commit shortly, if there is anything else then we should be able to fix it post-commit. There are a couple of test cases in arm-fix-cortex-a8-nopatch.s (CALLSITE6 and CALLSITE7) that check for those cases, but it probably wasn't clear enough. I've expanded the comment to explain what they are checking for.

246 ↗(On Diff #220127)

Yes, that cleans it up a lot. Thanks for the suggestion.

373 ↗(On Diff #220127)

If you mean do something like:

uint64_t isecLimit = isec->outSecOff + isec->getSize();

I don't think that will work due to the use of isecLimit outside the for loop on line 384:

for (; patchIt != patchEnd; ++patchIt)
    (*patchIt)->outSecOff = isecLimit;
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2019, 2:40 AM