This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF][AArch64][ARM] When errata patching, round thunk size to page boundary
ClosedPublic

Authored by peter.smith on Dec 10 2019, 8:54 AM.

Details

Summary

On some edge cases such as Chromium compiled with full instrumentation we have a .text section over twice the size of the maximum branch range and the instrumentation code generation containing many examples of the erratum sequence. The combination of Thunks and many erratum sequences causes finalizeAddressDependentContent() to not converge. We have:
start:

  • Thunk Creation (disturbs addresses after thunks, creating more patches)
  • Patch Creation (disturbs addresses after patches, creating more thunks)
  • goto start

In most images with few thunks and patches the mutual disturbance does not cause convergence problems. As the .text size and number of patches go up the risk increases.

A way to prevent the thunk creation from interfering with patch creation is to round up the size of the thunks to a 4KiB boundary when the erratum patch is enabled. As the erratum sequence only triggers when an instruction sequence starts at 0xff8 or 0xffc modulo (4 KiB) by making the thunks not affect addresses modulo (4 KiB) we prevent thunks from
interfering.

The patch sections themselves could be aggregated in the same way that Thunks are within ThunkSections and have the size rounded up in the same way. This would reduce the number of patches created in a .text section size > 128 MiB but would not help with convergence problems.

fixes (part of) pr44071 Thunk convergence within limit.

Diff Detail

Event Timeline

peter.smith created this revision.Dec 10 2019, 8:54 AM

Thanks for the fix. I'll find some time to understand the logic.

lld/test/ELF/aarch64-cortex-a53-843419-thunk.s
6–7

Nit: indent the continuation line.

lld/test/ELF/arm-fix-cortex-a8-thunk.s
9

With a linker script, the gaps between adjacent sections will not be dumped by llvm-objdump -d.
llvm-objdump -d may be sufficient to dump 3 pieces of disassembly.

Without a linker script, assembly like .space can create a huge number of zeros, which will slow down llvm-objdump -d significantly.

peter.smith marked 2 inline comments as done.Dec 10 2019, 11:04 AM

Thanks, I'll upload a new diff with the test changes.

To try and do a better job of explaining the logic.
The -fix-cortex-a53-843419 and -fix-cortex-a8 options are look for an instruction sequence that starts at a particular offset modulo 0x1000. For fix-cortex-a53-843419 the instruction sequence needs to start at 0xff8 (modulo 0x1000) or 0xffc (modulo 0x1000) for -fix-cortex-a8 the instruction sequence needs to start at 0xfffe (modulo 0x1000). When we add a thunk or a patch we displace the sections after them by the size of the content inserted. This can mean that some instruction sequences that were lined up over the 0xff8, 0xffc or 0xfffe are no longer lined up, and potentially there are new sequences that become lined up. Normally the patches get placed at the end of the .text section so this isn't a problem. However for large applications we need to insert the patches in between other .text sections, as well as having more range extension thunks. If we ensure that the thunk section size is rounded up to 0x1000 then the displacement of sections following the thunks does not change (modulo 0x1000), this means we don't generate more instruction sequences with the necessary offsets in the sections following the thunks.

I have a suspicion that the instrumented build, with lots of adrp, ldr, sequences possibly to counters triggers more erratum sequences than usual. I'd need to run some experiments to confirm.

Updated diff with test changes.

ruiu accepted this revision.Dec 10 2019, 9:15 PM

LGTM

Thank you for fixing the issue. This is an interesting edge case that I haven't thought about that before.

lld/ELF/SyntheticSections.cpp
3364

nit: this is perhaps my personal preference but I'd think I prefer

if (config->fixCortexA53Errata843419 || config->fixCortexA8)
  return alignTo(size, 4096);
return size;
This revision is now accepted and ready to land.Dec 10 2019, 9:15 PM

LGTM

Thank you for fixing the issue. This is an interesting edge case that I haven't thought about that before.

Thanks for the review, I've applied the style change you recommended.

This revision was automatically updated to reflect the committed changes.