This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
ELF/
-
AArch64ErrataFix.cpp
2/2
ARMErrataFix.h
44/56
ARMErrataFix.cpp
-
CMakeLists.txt
-
Config.h
-
Driver.cpp
-
Options.td
1/1
Writer.cpp
-
test/ELF/
-
ELF/
-
arm-fix-cortex-a8-blx.s
2/3
arm-fix-cortex-a8-nopatch.s
-
arm-fix-cortex-a8-plt.s
3/3
arm-fix-cortex-a8-recognize.s
1/2
arm-fix-cortex-a8-thunk.s
3/5
arm-fix-cortex-a8-toolarge.s

Differential D67284

[LLD][ELF][ARM] Implement --fix-cortex-a8 to fix erratum 657417
ClosedPublic

Authored by peter.smith on Sep 6 2019, 9:01 AM.


Details

Reviewers

ruiu
MaskRay
grimar
• espindola

Commits

rGea99ce5e9b49: [ELF][ARM] Implement --fix-cortex-a8 to fix erratum 657417
rLLD371965: [ELF][ARM] Implement --fix-cortex-a8 to fix erratum 657417
rL371965: [ELF][ARM] Implement --fix-cortex-a8 to fix erratum 657417

Summary

The --fix-cortex-a8 option implements a linker workaround for the Cortex-A8 erratum 657417. A summary of the erratum conditions is:

- A 32-bit Thumb-2 branch instruction B.w, Bcc.w, BL, BLX spans two 4KiB regions.
- The destination of the branch is to the first 4KiB region.
- The instruction before the branch is a 32-bit Thumb-2 non-branch instruction.

The linker fix is to redirect the branch to a patch not in the first 4KiB region. The patch forwards the branch on to its target.
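The erratum conditions summarized above can be sketched as a single predicate. This is a minimal illustration, not the code from the patch: the mask checks are copied from ARMErrataFix.cpp in this revision, while `meetsErratumConditions` and its parameters are hypothetical names introduced here. Each instruction is assumed to be passed with its first halfword in bits 31:16 and its second halfword in bits 15:0.

```cpp
#include <cassert>
#include <cstdint>

// Mask checks as they appear in the patch under review.
static bool is32bitInstruction(uint16_t hw) {
  return (hw & 0xe000) == 0xe000 && (hw & 0x1800) != 0x0000;
}
static bool isBcc(uint32_t instr) {
  return (instr & 0xf800d000) == 0xf0008000 &&
         (instr & 0x03800000) != 0x03800000;
}
static bool isB(uint32_t instr) { return (instr & 0xf800d000) == 0xf0009000; }
static bool isBLX(uint32_t instr) { return (instr & 0xf800d000) == 0xf000c000; }
static bool isBL(uint32_t instr) { return (instr & 0xf800d000) == 0xf000d000; }
static bool is32bitBranch(uint32_t instr) {
  return isBcc(instr) || isB(instr) || isBL(instr) || isBLX(instr);
}

// Hypothetical helper (not in the patch): true when all link-time detectable
// erratum conditions hold for a branch at branchAddr that targets destAddr.
static bool meetsErratumConditions(uint64_t branchAddr, uint32_t branch,
                                   uint32_t prev, uint64_t destAddr) {
  // A 4-byte instruction at offset 0xffe straddles two 4KiB regions.
  bool spansRegions = (branchAddr & 0xfff) == 0xffe;
  // The destination lies in the region holding the branch's first halfword.
  bool destInFirstRegion = (destAddr & ~0xfffULL) == (branchAddr & ~0xfffULL);
  // The preceding instruction is 32-bit and is not itself a branch.
  bool prevIs32bitNonBranch =
      is32bitInstruction(static_cast<uint16_t>(prev >> 16)) &&
      !is32bitBranch(prev);
  return spansRegions && is32bitBranch(branch) && prevIs32bitNonBranch &&
         destInFirstRegion;
}
```

For example, a `B.w` encoded as 0xf7fe 0xbfff at address 0xffe, preceded by a `NOP.w` (0xf3af 0x8000) and targeting address 0x0, satisfies every condition; moving the same branch to 0xffc, or pointing it at another region, does not.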

The Cortex-A8 is an old CPU, with the first implementation of this workaround in ld.bfd appearing in 2009. The Cortex-A8 has been used in early Android phones, and there are some critical applications that still need to run on a Cortex-A8 that has the erratum. Implementing support for --fix-cortex-a8 will remove the final reason to keep ld.gold in the Android build environment. The patch is applied roughly 10 times on LLD and 20 on Clang when they are built with --fix-cortex-a8 on an Arm system. The formal erratum description is available in the ARM Core Cortex-A8 (AT400/AT401) Errata Notice document. This is available from Arm on request, but it also seems to be findable via a web search.

Implementation notes:

I appreciate that this will be difficult to review as it is a big lump of Arm-specific code. The erratum fix itself is fairly simple; most of the complexity is in dealing with the combinations of relocations and instructions. Happy to answer any questions as best I can.

I have not attempted to share any code or merge this into the AArch64ErrataFix.cpp file. There are some parts that could be factored out at the expense of some customization points to do something specific to Arm, AArch64 or the patch. I chose not to do this for the initial patch to keep it as simple as possible. I'm happy to factor bits out or combine the source files if preferred.

The functions that are very similar to AArch64ErrataFix are:
- init (Arm and Thumb mapping symbols)
- insertPatches (Account for smaller Thumb branch range)
- patchInputSectionDescription (Arm and Thumb mapping symbols)
- createFixes (different patch name)

The functions that are specific to the Cortex-A8 erratum and contain most of the logic are:
- Class Patch657417Section and its member functions
- scanCortexA8Errata657417
- implementPatch

ld.bfd has a bug in its mask for recognizing bcc.w which causes it to recognize nop.w as a conditional branch. This is only relevant if comparing the implementations.

If dynamic linking is used and the page size or alignment of the segment containing the patches is less than 4KiB, a dynamic loader could undo the fixes. Given the target audience of Android, the LLD default page size of 4KiB for Arm prevents this from happening. There is scope to force page alignment to at least 4KiB if --fix-cortex-a8 is enabled, but I haven't done it yet on the principle that people using this option will know what they are doing.
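One way to read the page-size concern is as address arithmetic. The sketch below assumes the loader adds a single "load bias" to every link-time address; `regionOffsetAfterLoad` and `loadBias` are hypothetical names used only for illustration. If the bias is a multiple of 4KiB, every instruction keeps its offset within its 4KiB region, so the linker's analysis still holds. With a smaller alignment, a branch that was harmless at link time could end up at offset 0xffe.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: offset of an instruction within its 4KiB region after
// the loader has added loadBias to its link-time address.
static uint64_t regionOffsetAfterLoad(uint64_t linkAddr, uint64_t loadBias) {
  return (linkAddr + loadBias) & 0xfff;
}
```

With a 4KiB-aligned bias such as 0x20000, a branch linked at 0x10ffe stays at region offset 0xffe. With a bias of 0x400, a branch linked at 0x10bfe, which the linker would not have examined, moves to region offset 0xffe.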

Diff Detail

Event Timeline

peter.smith created this revision. Sep 6 2019, 9:01 AM

Herald added a reviewer: • espindola. Sep 6 2019, 9:01 AM

Herald added subscribers: kristof.beyls, arichardson, mgorny, emaste.

amilendra added a subscriber: amilendra. Sep 6 2019, 12:29 PM

I have read through the file but haven't gone through the logic in a debugger. Some suggestions and questions inlined.

MaskRay added inline comments. Sep 6 2019, 8:24 PM

ELF/ARMErrataFix.cpp
43	What are "region 1" and "region 2"?
64	Omit `lld::elf`. This is not needed because of `using namespace lld::elf;` above. I'm thinking whether it is better to use // delete use namespace lld; namespace lld { namespace elf { ... } } for new files.
81	Missing full stop.
88	Return true if the half-word is the first half of a 32-bit instruction. Do you mean the first half word?
89	32-bit Thumb instruction encoding? (The heading as used in the manual)
92	op1 != 0b00 ? op1 == 0b00 encodes a 16-bit Thumb instruction.
157	Use one write32le?
182	branchToFirstRegion Is "first region" the term used to refer to `destAddr < sourceAddr && (destAddr & 0xfffff000) == (sourceAddr & 0xfffff000);`
232	ISAddr + Off -> isecAddr + off
249	l6 -> 16
270	Say `off % 0x1000 = 0xffc` before the assignment. The erratum sequence in the next page starts at 0xffa. This increment will skip that erratum sequence. Is this a possible scenario?
287	In include/llvm/Object/ELFObjectFile.h and llvm-objdump, we just use `startswith("$a")`. Is there a reason to check `"$a."`? Below, you just use `return ms->getName().startswith("$t");`
298	Should there be a check to reject ELF32BE beforehand? It can be placed in Driver.cpp:checkOptions.
315	`if (mapSyms.size() <= 1) continue` can be deleted.
343	initial Thunk placement?
345	Missing full stop.
351	Or use `for (; patchIt != patchEnd; ++patchIt)`
352	Is it guaranteed that getBranchAddr() is monotonically increasing?
364	Capitalize.
371	if (a->outSecOff != b->outSecOff) return a->outSecOff < b->outSecOff; return isa<Patch657417Section>(a) && !isa<Patch657417Section>(b);
504	!initialized
ELF/ARMErrataFix.h
28	Capitalize
43	`std::map -> DenseMap` `std::map` may give a false impression that the key is ordered. InputSection's are allocated from a BumpPtrAllocator. While the allocator is called "bump", that just refers to the allocation strategy within a slab. When a new slab is allocated by malloc, it is not guaranteed the address will monotonically increase.
ELF/Writer.cpp
1549	Alternatively, if (config->fixCortexA53Errata843419) { if (changed) script->assignAddresses(); changed |= a64p.createFixes(); } if (config->fixCortexA8) { if (changed) script->assignAddresses(); changed |= a32p.createFixes(); } if you think it is clearer.
test/ELF/arm-fix-cortex-a8-recognize.s
5	`-start-address` -> `--start-address`
15	Align `ld.lld`

I'm halfway through reviewing it, and it looks like a straightforward implementation of a mitigation. Can I ask a (perhaps noob) question? If a CPU can be locked up by an instruction sequence that triggers the bug, and if the CPU is used by Android, does that mean a user application can lock up the entire system?

ELF/ARMErrataFix.cpp
43	I guess that means first page and second page?
373–376	You can return the result of the if condition directly.

In D67284#1663011, @ruiu wrote:

I'm halfway through reviewing it, and it looks like a straightforward implementation of a mitigation. Can I ask a (perhaps noob) question? If a CPU can be locked up by an instruction sequence that triggers the bug, and if the CPU is used by Android, does that mean a user application can lock up the entire system?

Yes, it is possible for a user application to lock the entire system. From the erratum description: "If the deadlock condition occurs, it can only be interrupted by pulling the RESET pin on the processor." There are several more run-time conditions that make this rare, and it only occurs on models with a 32k instruction cache.

Thanks very much for the comments, I will post a new patch shortly that I hope will address the comments. I think the AArch64ErrataFix.cpp will need a patch after this as some of the refactoring cleanups could also be applied there. I'll do that when it is clear what needs to be done.

ELF/ARMErrataFix.cpp
43	Memory region is the word used in the official erratum description and the Architecture reference manual. It is closely related to a page so in this case it can be thought of as a page. I think memory region is used in the spec as memory regions can also be used in a non-paged context. I'll update the description to make it clear what region 1 and 2 are.
89	Yes, I'll update the comment to improve the reference.
92	Yes, I forgot to finish the sentence used in the manual. I'll fix up the comment.
182	Yes it is, I'll update the function name to make it more explicit. I also worked out that I don't need the destAddr < sourceAddr part.
270	Yes I think there might be a case that gets recognized as an erratum where there is something like: 0x0ffc nop.w label: 0x1000 b.w label I will add a case to make sure the 0xffc is handled in a similar way to 0xffe and will add to the nopatch test case.
287	Yes, $afoo without the . is not a mapping symbol. I'm thinking that it might be best to replace the map from InputSection -> vector<Symbol*> to InputSection -> std::pair<uint64_t offset, bool isThumb>, this would also work for AArch64.
298	LLD doesn't support ELF32BE. I'd have thought it would give an error, but without passing an emulation it seems LLD will accept a big-endian ELF file without error; I'll need to write a follow-up patch to give an error. As an aside: Big Endian is strange on AArch64 and even stranger in ARM. In general it only tends to get used in networking so I've not seen any requests to implement it. AArch64 BE: ELF is BE, instructions are LE, data is BE. ARM BE: ELF is BE, instructions in relocatable objects are BE, data is BE. Older ARM architectures like v4, v5 and v6 executable/dso instructions have BE instructions. All newer architectures have LE instructions and the linker needs to do the byte swaps.
352	Yes. The getBranchAddr() is essentially the address in the InputSection that has the patch applied to it. Within an InputSectionDescriptions the InputSections->outSecOff monotonically increase, and as the patches are added to the end of the list, the getBranchAddr() monotonically increases.
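The monotonic ordering described above is what allows the patches to be merged into the section list in one pass. The following is a minimal sketch of that idea using the comparator suggested for line 371; `Sec`, `patchFirst` and `mergeSections` are hypothetical stand-ins for LLD's InputSection list and `isa<Patch657417Section>` check, not the code in the patch. Both inputs are assumed to be sorted by `outSecOff` already.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for an InputSection or a patch section.
struct Sec {
  uint64_t outSecOff;
  bool isPatch;
};

// Order by offset; at equal offsets a patch sorts before an ordinary section.
static bool patchFirst(const Sec &a, const Sec &b) {
  if (a.outSecOff != b.outSecOff)
    return a.outSecOff < b.outSecOff;
  return a.isPatch && !b.isPatch;
}

// Merge two lists that are each already sorted by outSecOff.
static std::vector<Sec> mergeSections(const std::vector<Sec> &sections,
                                      const std::vector<Sec> &patches) {
  std::vector<Sec> out;
  out.reserve(sections.size() + patches.size());
  std::merge(sections.begin(), sections.end(), patches.begin(), patches.end(),
             std::back_inserter(out), patchFirst);
  return out;
}
```

Because `std::merge` requires both ranges to be sorted under the comparator, the argument only works if the patch offsets never decrease, which is the property being asked about in the review.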

Updated patch to address review comments.

Just a few nits from me.

ELF/ARMErrataFix.cpp
177	Doesn't seem you need `this->`?
189	Perhaps just inline `a` here?
217	You need to wrap this line in curly braces I think, because the first branch has them. (LLD coding style feature)
282	`{}` too.
301	What does `b` stand for (here and below)? I'd expect to see `sym` or `s`.

Thanks for the comments. Updated diff to address George's comments. Changes include:

- remove uses of this
- inline addend
- add braces where needed
- change parameter name from b to s

MaskRay added inline comments. Sep 10 2019, 2:42 AM

ELF/ARMErrataFix.cpp
491	Above you use `s->getName() == "$t" || s->getName().startswith("$t.");`

peter.smith marked an inline comment as done. Sep 10 2019, 3:57 AM

peter.smith added inline comments.

ELF/ARMErrataFix.cpp
491	I think using ms here is defensible. Beforehand we were processing all symbols, that may or may not be mapping symbols. Here we know that the symbol is a mapping symbol as we filtered them out earlier. I don't mind changing if you'd prefer s instead of ms?

MaskRay added inline comments. Sep 10 2019, 4:22 AM

ELF/ARMErrataFix.cpp
491	`ms` as the variable name is fine. Sorry, I should have been clearer. I meant why `.startswith("$t")` is used here. But I see the reason now: Because the elements contain exclusively mapping symbols: if (!isArmMapSymbol(def) && !isThumbMapSymbol(def) && !isDataMapSymbol(def)) continue; `ms->getName().startswith("$t");` should be sufficient.
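The distinction discussed in this thread can be shown with plain strings. This is a minimal sketch, assuming only the naming rule stated in the review (a Thumb mapping symbol is named `$t` or starts with `$t.`); `hasPrefix` and `isThumbMapSymbolName` are hypothetical helpers, not functions from LLD.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: true if s begins with prefix p.
static bool hasPrefix(std::string_view s, std::string_view p) {
  return s.substr(0, p.size()) == p;
}

// Strict check for an arbitrary symbol name: "$t" or "$t.<anything>".
static bool isThumbMapSymbolName(std::string_view name) {
  return name == "$t" || hasPrefix(name, "$t.");
}
```

A name such as `$tfoo` passes the loose prefix test but is not a mapping symbol, so the strict check is needed when filtering arbitrary symbols. Once a list is known to hold only mapping symbols, the looser prefix test is enough to tell Thumb from the rest.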

MaskRay added inline comments. Sep 13 2019, 2:11 AM

ELF/ARMErrataFix.cpp
203	The find_if code sequence is also used in `implementPatch`: auto relIt = llvm::find_if(isec->relocations, [=](const Relocation &r) { return r.offset == patcheeOffset && (r.type == R_ARM_THM_JUMP19 || r.type == R_ARM_THM_JUMP24 || r.type == R_ARM_THM_CALL); }); Is it necessary to store the iterator in ScanResult?
228	`source + isec->getSize() + 0x100` or `isec->getVA() + isec->getSize() + 0x100`?
264	`l6 -> 16` or just delete `l6-bit`.
337	I think these disjunctions can be simplified to: `isThumbMapSymbol(a) == isThumbMapSymbol(b)`
480	Superfluous space in the comment.
484	`MapSyms` -> `mapSyms`
491	In `init()` mapSyms.erase( std::unique(mapSyms.begin(), mapSyms.end(), ...), mapSyms.end()); You can normalize `mapSyms` to start with a thumb mapping symbol (`erase(begin())` if not thumb). Then you can do `auto thumbSym = mapSyms.begin();` here.
test/ELF/arm-fix-cortex-a8-nopatch.s
37	`.local` is the default for a defined symbol. Is the directive here to emphasize the symbol is local? Same question goes for other `target*` symbols.
124	Add a line: `CALLSITE7: 00019002 target7`
test/ELF/arm-fix-cortex-a8-recognize.s
203	Delete the trailing empty line.
test/ELF/arm-fix-cortex-a8-thunk.s
33	You can just use spaces, instead of interleaving spaces and tabs before `beq.w`. <__ThumbV7PILongThunk_far_away+0x4> Unrelated to this patch, I guess `+0x4` is incorrect. If so, we need an llvm-objdump fix as I mentioned in D66539.
test/ELF/arm-fix-cortex-a8-toolarge.s
4	`%t2` -> `/dev/null` because it is not used. PS: I usually use `%t`/`%t.so` as the executable/DSO name for the object file `%t.o`. The suffix (usually empty, `1` or `2`) indicates their relations.
27	`expect` -> `expects`?
42	`expect` -> `expects`?

Thanks very much for the comments. I'll upload a patch shortly with the suggestions applied.

ELF/ARMErrataFix.cpp
203	I think that would help. I've made it so a pointer to the relocation is stored in ScanResult. As ImplementPatch uses all of ScanResult I've just passed through rather than splitting up the parameters.
228	Thanks for pointing out the mistake. It should be isec->getVA() + isec->getSize() + 0x100
337	Yes it can, that makes it a lot simpler; thanks.
491	Thanks for the suggestion, I've implemented that.
test/ELF/arm-fix-cortex-a8-nopatch.s
37	They are definitely intended to be local as the assembler can make more assumptions about resolving fixups without relocations if it is. I can remove them if they are the default as I don't think it is hugely important to emphasise.
test/ELF/arm-fix-cortex-a8-thunk.s
33	Apologies, forgot to untabify the file before posting. Neither GNU nor llvm objdump does a particularly good job with Arm, Thumb branches due to the implicit PC offset.
test/ELF/arm-fix-cortex-a8-toolarge.s
27	I've rewritten to "a patch will be attempted". It is difficult to say in the original, I'd say "We expect" or "The test expects".
42	Rewritten to "a patch will be attempted"

Updated diff for latest suggestions. Main changes:

- Simplify the mapping symbol calculations
- Cache the result of searching for the relocation
- Fix a bug when determining whether a patch could reach outside its section.
- Update comments and fix white space in tests.
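The mapping-symbol simplification suggested in the review (compare only "is Thumb" when removing redundant symbols, then drop a leading non-Thumb symbol) can be sketched as follows. `MapSym` and `normalizeMapSyms` are hypothetical names, and the symbol kinds are modelled as plain characters rather than LLD's `Defined` symbols.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a mapping symbol: kind is 't' ($t), 'a' ($a) or
// 'd' ($d), and value is its offset within the section.
struct MapSym {
  uint64_t value;
  char kind;
};

static void normalizeMapSyms(std::vector<MapSym> &mapSyms) {
  // Sort by ascending offset, keeping the original order for ties.
  std::stable_sort(mapSyms.begin(), mapSyms.end(),
                   [](const MapSym &a, const MapSym &b) {
                     return a.value < b.value;
                   });
  // Adjacent symbols are redundant when both are Thumb or both are not.
  mapSyms.erase(std::unique(mapSyms.begin(), mapSyms.end(),
                            [](const MapSym &a, const MapSym &b) {
                              return (a.kind == 't') == (b.kind == 't');
                            }),
                mapSyms.end());
  // Normalize so the list starts with a Thumb mapping symbol.
  if (!mapSyms.empty() && mapSyms.front().kind != 't')
    mapSyms.erase(mapSyms.begin());
}
```

After normalization the list alternates Thumb and non-Thumb symbols starting with Thumb, so each Thumb code range is delimited by an element and its successor.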

LGTM, just a few nits and a requested test.

I will add a case to make sure the 0xffc is handled in a similar way to 0xffe and will add to the nopatch test case.

ELF/ARMErrataFix.cpp
108	The outer pair of `()` is unnecessary I think.
234	Capitalize
246	is at least 0xffa -> is 0xffa? I haven't verified, but `off = alignTo(isecAddr+off, 0x1000, 0xffa) - isecAddr;` probably works.
373	isecLimit can be defined here, i.e. remove the definition above.
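The `alignTo` suggestion for line 246 relies on the three-argument form, which returns the smallest value that is at least the input and congruent to the skew modulo the alignment. The sketch below reimplements that arithmetic in a hypothetical `alignToSkewed` helper so it is self-contained; `nextCandidateOffset` is likewise an illustrative name. Offset 0xffa within a region is where a 32-bit instruction immediately preceding a branch at 0xffe would start.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the skewed alignTo arithmetic: smallest
// result >= value with result % align == skew % align.
static uint64_t alignToSkewed(uint64_t value, uint64_t align, uint64_t skew) {
  skew %= align;
  return (value + align - 1 - skew) / align * align + skew;
}

// Next section offset whose address has the form 0x...ffa, starting at off.
static uint64_t nextCandidateOffset(uint64_t isecAddr, uint64_t off) {
  return alignToSkewed(isecAddr + off, 0x1000, 0xffa) - isecAddr;
}
```

For a section at address 0x11000, scanning from offset 0 yields 0xffa; scanning from 0xffc skips ahead to 0x1ffa. For a section that starts mid-region at 0x10800, the first candidate is at offset 0x7fa.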

This revision is now accepted and ready to land. Sep 13 2019, 10:09 AM

In D67284#1669708, @MaskRay wrote:

LGTM, just a few nits and a requested test.

I will add a case to make sure the 0xffc is handled in a similar way to 0xffe and will add to the nopatch test case.

Thanks for the review. I've made the updates and will commit shortly, if there is anything else then we should be able to fix it post-commit. There are a couple of test cases in arm-fix-cortex-a8-nopatch.s (CALLSITE6 and CALLSITE7) that check for those cases, but it probably wasn't clear enough. I've expanded the comment to explain what they are checking for.

ELF/ARMErrataFix.cpp
246	Yes, that cleans it up a lot. Thanks for the suggestion.
373	If you mean do something like: uint64_t isecLimit = isec->outSecOff + isec->getSize(); I don't think that will work due to the use of isecLimit outside the for loop on line 384: for (; patchIt != patchEnd; ++patchIt) (*patchIt)->outSecOff = isecLimit;

Herald added a subscriber: dmgreen. Sep 16 2019, 2:16 AM

Closed by commit rL371965: [ELF][ARM] Implement --fix-cortex-a8 to fix erratum 657417 (authored by psmith). Sep 16 2019, 2:40 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Sep 16 2019, 2:40 AM

peter.smith mentioned this in D67622: [LLD][AARCH64] Small refactor of AArchErrataFix to match changes in ARMErrataFix NFC. Sep 16 2019, 7:51 AM

peter.smith mentioned this in rG43d32cdd8717: [ELF][AARCH64] Refactor AArchErrataFix to match changes in ARMErrataFix NFC. Sep 17 2019, 2:49 AM

psmith mentioned this in rL372094: [ELF][AARCH64] Refactor AArchErrataFix to match changes in ARMErrataFix NFC. Sep 17 2019, 2:49 AM

Revision Contents

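Path                                        Size
ELF/AArch64ErrataFix.cpp                    11 lines
ELF/ARMErrataFix.h                          51 lines
ELF/ARMErrataFix.cpp                        534 lines
ELF/CMakeLists.txt                          1 line
ELF/Config.h                                1 line
ELF/Driver.cpp                              4 lines
ELF/Options.td                              3 lines
ELF/Writer.cpp                              7 lines
test/ELF/arm-fix-cortex-a8-blx.s            33 lines
test/ELF/arm-fix-cortex-a8-nopatch.s        117 lines
test/ELF/arm-fix-cortex-a8-plt.s            39 lines
test/ELF/arm-fix-cortex-a8-recognize.s      201 lines
test/ELF/arm-fix-cortex-a8-thunk.s          69 lines
test/ELF/arm-fix-cortex-a8-toolarge.s       45 lines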

Diff 220127

ELF/AArch64ErrataFix.cpp

	//===- AArch64ErrataFix.cpp -----------------------------------------------===//			//===- AArch64ErrataFix.cpp -----------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// This file implements Section Patching for the purpose of working around			// This file implements Section Patching for the purpose of working around
	// errata in CPUs. The general principle is that an erratum sequence of one or			// the AArch64 Cortex-53 errata 843419 that affects r0p0, r0p1, r0p2 and r0p4
				// versions of the core.
				//
				// The general principle is that an erratum sequence of one or
	// more instructions is detected in the instruction stream, one of the			// more instructions is detected in the instruction stream, one of the
	// instructions in the sequence is replaced with a branch to a patch sequence			// instructions in the sequence is replaced with a branch to a patch sequence
	// of replacement instructions. At the end of the replacement sequence the			// of replacement instructions. At the end of the replacement sequence the
	// patch branches back to the instruction stream.			// patch branches back to the instruction stream.

	// This technique is only suitable for fixing an erratum when:			// This technique is only suitable for fixing an erratum when:
	// - There is a set of necessary conditions required to trigger the erratum that			// - There is a set of necessary conditions required to trigger the erratum that
	// can be detected at static link time.			// can be detected at static link time.
	// - There is a set of replacement instructions that can be used to remove at			// - There is a set of replacement instructions that can be used to remove at
	// least one of the necessary conditions that trigger the erratum.			// least one of the necessary conditions that trigger the erratum.
	// - We can overwrite an instruction in the erratum sequence with a branch to			// - We can overwrite an instruction in the erratum sequence with a branch to
	// the replacement sequence.			// the replacement sequence.
	// - We can place the replacement sequence within range of the branch.			// - We can place the replacement sequence within range of the branch.

	// FIXME:
	// - The implementation here only supports one patch, the AArch64 Cortex-53
	// errata 843419 that affects r0p0, r0p1, r0p2 and r0p4 versions of the core.
	// To keep the initial version simple there is no support for multiple
	// architectures or selection of different patches.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "AArch64ErrataFix.h"			#include "AArch64ErrataFix.h"
	#include "Config.h"			#include "Config.h"
	#include "LinkerScript.h"			#include "LinkerScript.h"
	#include "OutputSections.h"			#include "OutputSections.h"
	#include "Relocations.h"			#include "Relocations.h"
	#include "Symbols.h"			#include "Symbols.h"

ELF/ARMErrataFix.h

This file was added.

				//===- ARMErrataFix.h -------------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLD_ELF_ARMA8ERRATAFIX_H
				#define LLD_ELF_ARMA8ERRATAFIX_H

				#include "lld/Common/LLVM.h"
				#include "llvm/ADT/DenseMap.h"
				#include <map>
				#include <vector>

				namespace lld {
				namespace elf {

				class Defined;
				class InputSection;
				struct InputSectionDescription;
				class OutputSection;
				class Patch657417Section;

				class ARMErr657417Patcher {
				public:
				// Return true if Patches have been added to the OutputSections.
				bool createFixes();

				private:
				std::vector<Patch657417Section *>
				patchInputSectionDescription(InputSectionDescription &isd);

				void insertPatches(InputSectionDescription &isd,
				std::vector<Patch657417Section *> &patches);

				void init();

				// A cache of the mapping symbols defined by the InputSection sorted in order
				// of ascending value with redundant symbols removed. These describe
				// the ranges of code and data in an executable InputSection.
				llvm::DenseMap<InputSection *, std::vector<const Defined *>> sectionMap;

				bool initialized = false;
				};

				} // namespace elf
				} // namespace lld

				#endif

ELF/ARMErrataFix.cpp

This file was added.

				//===- ARMErrataFix.cpp ---------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				// This file implements Section Patching for the purpose of working around the
				// Cortex-a8 erratum 657417 "A 32bit branch instruction that spans 2 4K regions
				// can result in an incorrect instruction fetch or processor deadlock." The
				// erratum affects all but r1p7, r2p5, r2p6, r3p1 and r3p2 revisions of the
				// Cortex-A8. A high level description of the patching technique is given in
				// the opening comment of AArch64ErrataFix.cpp.
				//===----------------------------------------------------------------------===//

				#include "ARMErrataFix.h"

				#include "Config.h"
				#include "LinkerScript.h"
				#include "OutputSections.h"
				#include "Relocations.h"
				#include "Symbols.h"
				#include "SyntheticSections.h"
				#include "Target.h"
				#include "lld/Common/Memory.h"
				#include "lld/Common/Strings.h"
				#include "llvm/Support/Endian.h"
				#include "llvm/Support/raw_ostream.h"
				#include <algorithm>

				using namespace llvm;
				using namespace llvm::ELF;
				using namespace llvm::object;
				using namespace llvm::support;
				using namespace llvm::support::endian;

				namespace lld {
				namespace elf {

				// The documented title for Erratum 657417 is:
				// "A 32bit branch instruction that spans two 4K regions can result in an
				// incorrect instruction fetch or processor deadlock". Graphically using a
				// 32-bit B.w instruction encoded as a pair of halfwords 0xf7fe 0xbfff
				// xxxxxx000 // Memory region 1 start
				// target:
				// ...
				// xxxxxxffe f7fe // First halfword of branch to target:
				// xxxxxx000 // Memory region 2 start
				// xxxxxx002 bfff // Second halfword of branch to target:
				//
				// The specific trigger conditions that can be detected at link time are:
				// - There is a 32-bit Thumb-2 branch instruction with an address of the form
				// xxxxxxFFE. The first 2 bytes of the instruction are in 4KiB region 1, the
				// second 2 bytes are in region 2.
				// - The branch instruction is one of BLX, BL, B.w BCC.w
				// - The instruction preceding the branch is a 32-bit non-branch instruction.
				// - The target of the branch is in region 1.
				//
				// The linker mitigation for the fix is to redirect any branch that meets the
				// erratum conditions to a patch section containing a branch to the target.
				//
				// As adding patch sections may move branches onto region boundaries the patch
				// must iterate until no more patches are added.
				//
				// Example, before:
				// 00000FFA func: NOP.w // 32-bit Thumb function
				// 00000FFE B.W func // 32-bit branch spanning 2 regions, dest in 1st.
				// Example, after:
				// 00000FFA func: NOP.w // 32-bit Thumb function
				// 00000FFE B.w __CortexA8657417_00000FFE
				// 00001002 2 - bytes padding
				// 00001004 __CortexA8657417_00000FFE: B.w func

				class Patch657417Section : public SyntheticSection {
				public:
				Patch657417Section(InputSection *p, uint64_t off, uint32_t instr, bool isARM);

				void writeTo(uint8_t *buf) override;

				size_t getSize() const override { return 4; }

				// Get the virtual address of the branch instruction at patcheeOffset.
				uint64_t getBranchAddr() const;

				// The Section we are patching.
				const InputSection *patchee;
				// The offset of the instruction in the Patchee section we are patching.
				uint64_t patcheeOffset;
				// A label for the start of the Patch that we can use as a relocation target.
				Symbol *patchSym;
				// A decoding of the branch instruction at patcheeOffset.
				uint32_t instr;
				// True If the patch is to be written in ARM state, otherwise the patch will
				// be written in Thumb state.
				bool isARM;
				};

				// Return true if the half-word, when taken as the first of a pair of halfwords
				// is the first half of a 32-bit instruction.
				// Reference from ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition
				// section A6.3: 32-bit Thumb instruction encoding
				// | HW1 | HW2 |
				// | 1 1 1 | op1 (2) | op2 (7) | x (4) |op| x (15) |
				// With op1 == 0b00, a 16-bit instruction is encoded.
				//
				// We test only the first halfword, looking for op1 != 0b00.
				static bool is32bitInstruction(uint16_t hw) {
				return ((hw & 0xe000) == 0xe000 && (hw & 0x1800) != 0x0000);
				}

				// Reference from ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition
				// section A6.3.4 Branches and miscellaneous control.
				// | HW1 | HW2 |
				// | 1 1 1 | 1 0 | op (7) | x (4) | 1 | op1 (3) | op2 (4) | imm8 (8) |
				// op1 == 0x0 op != x111xxx | Conditional branch (Bcc.W)
				// op1 == 0x1 | Branch (B.W)
				// op1 == 1x0 | Branch with Link and Exchange (BLX.w)
				// op1 == 1x1 | Branch with Link (BL.W)

				static bool isBcc(uint32_t instr) {
				return (instr & 0xf800d000) == 0xf0008000 &&
				(instr & 0x03800000) != 0x03800000;
				}

				static bool isB(uint32_t instr) { return (instr & 0xf800d000) == 0xf0009000; }

				static bool isBLX(uint32_t instr) { return (instr & 0xf800d000) == 0xf000c000; }

				static bool isBL(uint32_t instr) { return (instr & 0xf800d000) == 0xf000d000; }

				static bool is32bitBranch(uint32_t instr) {
				return isBcc(instr) || isB(instr) || isBL(instr) || isBLX(instr);
				}

				Patch657417Section::Patch657417Section(InputSection *p, uint64_t off,
				uint32_t instr, bool isARM)
				: SyntheticSection(SHF_ALLOC | SHF_EXECINSTR, SHT_PROGBITS, 4,
				".text.patch"),
				patchee(p), patcheeOffset(off), instr(instr), isARM(isARM) {
				parent = p->getParent();
				patchSym = addSyntheticLocal(
				saver.save("__CortexA8657417_" + utohexstr(getBranchAddr())), STT_FUNC,
				isARM ? 0 : 1, getSize(), *this);
				addSyntheticLocal(saver.save(isARM ? "$a" : "$t"), STT_NOTYPE, 0, 0, *this);
				}

				uint64_t Patch657417Section::getBranchAddr() const {
				return patchee->getVA(patcheeOffset);
				}

				// Given a branch instruction instr at sourceAddr work out its destination
				// address. This is only used when the branch instruction has no relocation.
				static uint64_t getThumbDestAddr(uint64_t sourceAddr, uint32_t instr) {
				uint8_t buf[4];
				write16le(buf, instr >> 16);
				write16le(buf + 2, instr & 0x0000ffff);
				int64_t offset;
				MaskRay (Done): Use one write32le?
				if (isBcc(instr))
				offset = target->getImplicitAddend(buf, R_ARM_THM_JUMP19);
				else if (isB(instr))
				offset = target->getImplicitAddend(buf, R_ARM_THM_JUMP24);
				else
				offset = target->getImplicitAddend(buf, R_ARM_THM_CALL);
				return sourceAddr + offset + 4;
				}
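
To make the `sourceAddr + offset + 4` arithmetic concrete, here is a rough standalone sketch that decodes the offset of an unconditional `B.W` by hand. In the patch this work is delegated to `target->getImplicitAddend`; the `decodeThumbJump24` and `thumbDestAddr` names are hypothetical, and the sketch covers only the `B.W` (T4) immediate layout, not `Bcc.W` or `BLX`.

```cpp
#include <cstdint>

// Decode the signed byte offset of a Thumb-2 B.W (encoding T4).
// instr holds the first halfword in its upper 16 bits.
constexpr int64_t decodeThumbJump24(uint32_t instr) {
  uint32_t hw1 = instr >> 16;
  uint32_t hw2 = instr & 0xffff;
  uint32_t s = (hw1 >> 10) & 1;
  // I1 = NOT(J1 EOR S), I2 = NOT(J2 EOR S).
  uint32_t i1 = ~(((hw2 >> 13) & 1) ^ s) & 1;
  uint32_t i2 = ~(((hw2 >> 11) & 1) ^ s) & 1;
  uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) |
                 ((hw1 & 0x3ff) << 12) | ((hw2 & 0x7ff) << 1);
  // Sign-extend from 25 bits: subtract 2^25 when the sign bit S is set.
  return static_cast<int64_t>(imm) - (static_cast<int64_t>(s) << 25);
}

// The +4 accounts for the Thumb PC bias, as in getThumbDestAddr.
constexpr uint64_t thumbDestAddr(uint64_t sourceAddr, uint32_t instr) {
  return sourceAddr + decodeThumbJump24(instr) + 4;
}

// "b.w ." (branch to self) encodes an offset of -4.
static_assert(decodeThumbJump24(0xf7ffbffe) == -4, "branch to self");
static_assert(thumbDestAddr(0x1ffe, 0xf7ffbffe) == 0x1ffe, "dest == source");
```

A zero-offset `b.w` (`0xf000b800`) at `0x1ffe` therefore resolves to `0x2002`, the address of the following instruction.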

				void Patch657417Section::writeTo(uint8_t *buf) {
				// The base instruction of the patch is always a 32-bit unconditional branch.
				if (isARM)
				write32le(buf, 0xea000000);
				else
				write32le(buf, 0x9000f000);
				// If we have a relocation then apply it. For a SyntheticSection buf already
				// has outSecOff added, but relocateAlloc also adds outSecOff so we need to
				// subtract to avoid double counting.
				if (!relocations.empty()) {
				relocateAlloc(buf - outSecOff, buf - outSecOff + getSize());
				grimar (Done): Doesn't seem you need `this->`?
				return;
				}

				// If we don't have a relocation then we must calculate and write the offset
				// ourselves.
				MaskRay (Not Done): Re `branchToFirstRegion`: is "first region" the term used to refer to `destAddr < sourceAddr && (destAddr & 0xfffff000) == (sourceAddr & 0xfffff000);`?
				peter.smith (author, Done): Yes it is, I'll update the function name to make it more explicit. I also worked out that I don't need the `destAddr < sourceAddr` part.
				// Get the destination offset from the addend in the branch instruction.
				// We cannot use the instruction in the patchee section as this will have
				// been altered to point to us!
				uint64_t s = getThumbDestAddr(getBranchAddr(), instr);
				uint64_t p = getVA(4);
				target->relocateOne(buf, isARM ? R_ARM_JUMP24 : R_ARM_THM_JUMP24, s - p);
				}
				grimar (Done): Perhaps just inline `a` here?

				// Given a branch instruction spanning two 4KiB regions, at offset off from the
				// start of isec, return true if the destination of the branch is within the
				// first of the two 4KiB regions.
				static bool branchDestInFirstRegion(const InputSection *isec, uint64_t off,
				uint32_t instr, const Relocation *r) {
				uint64_t sourceAddr = isec->getVA(0) + off;
				assert((sourceAddr & 0xfff) == 0xffe);
				uint64_t destAddr = sourceAddr;
				// If there is a branch relocation at the same offset we must use this to
				// find the destination address as the branch could be indirected via a thunk
				// or the PLT.
				if (r) {
				uint64_t dst = (r->expr == R_PLT_PC) ? r->sym->getPltVA() : r->sym->getVA();
				MaskRay (Not Done): The find_if code sequence is also used in `implementPatch`: `auto relIt = llvm::find_if(isec->relocations, [=](const Relocation &r) { return r.offset == patcheeOffset && (r.type == R_ARM_THM_JUMP19 || r.type == R_ARM_THM_JUMP24 || r.type == R_ARM_THM_CALL); });` Is it necessary to store the iterator in ScanResult?
				peter.smith (author, Done): I think that would help. I've made it so a pointer to the relocation is stored in ScanResult. As implementPatch uses all of ScanResult I've just passed it through rather than splitting up the parameters.
				// Account for Thumb PC bias, usually cancelled to 0 by addend of -4.
				destAddr = dst + r->addend + 4;
				} else {
				// If there is no relocation, we must have an intra-section branch
				// We must extract the offset from the addend manually.
				destAddr = getThumbDestAddr(sourceAddr, instr);
				}

				return (destAddr & 0xfffff000) == (sourceAddr & 0xfffff000);
				}
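
The final comparison in `branchDestInFirstRegion` can be exercised on its own. The sketch below is an illustration only: `sameRegion` is a hypothetical name, and the addresses are made up to show a branch at page offset `0xffe` with targets in the first region, the second region, and an earlier page.

```cpp
#include <cstdint>

// True when both addresses fall in the same 4KiB region. The mask matches
// the listing and assumes 32-bit addresses.
constexpr bool sameRegion(uint64_t sourceAddr, uint64_t destAddr) {
  return (destAddr & 0xfffff000) == (sourceAddr & 0xfffff000);
}

// A branch at 0x1ffe starts in region [0x1000, 0x2000).
static_assert(sameRegion(0x1ffe, 0x1000), "target in first region: erratum");
static_assert(!sameRegion(0x1ffe, 0x2000), "target in second region: safe");
static_assert(!sameRegion(0x1ffe, 0x0ffc), "target in an earlier page: safe");
```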

				// Return true if a branch can reach a patch section placed after isec.
				// The Bcc.w instruction has a range of 1 MiB, all others have 16 MiB.
				static bool patchInRange(const InputSection *isec, uint64_t off,
				grimar (Done): You need to wrap this line into curly braces I think, because the first branch has them. (LLD coding style feature)
				uint32_t instr) {

				// We need the branch at source to reach a patch section placed immediately
				// after isec. As there can be more than one patch in the patch section we
				// add 0x100 as contingency to account for worst case of 1 branch every 4KiB
				// for a 1 MiB range.
				return target->inBranchRange(
				isBcc(instr) ? R_ARM_THM_JUMP19 : R_ARM_THM_JUMP24, isec->getVA(off),
				isec->getVA() + isec->getSize() + 0x100);
				}

				MaskRay (Done): `source + isec->getSize() + 0x100` or `isec->getVA() + isec->getSize() + 0x100`?
				peter.smith (author, Done): Thanks for pointing out the mistake. It should be `isec->getVA() + isec->getSize() + 0x100`.
				struct ScanResult {
				// Offset of branch within its InputSection.
				uint64_t off;
				// Cached decoding of the branch instruction.
				MaskRay (Done): `ISAddr + Off` -> `isecAddr + off`
				uint32_t instr;
				// Branch relocation at off. Will be nullptr if no relocation exists.
				MaskRay (Done): Capitalize
				Relocation *rel;
				};

				// Detect the erratum sequence, returning the offset of the branch instruction
				// and a decoding of the branch. If the erratum sequence is not found then
				// return an offset of 0 for the branch. 0 is a safe value to use for no patch
				// as there must be at least one 32-bit non-branch instruction before the
				// branch so the minimum offset for a patch is 4.
				static ScanResult scanCortexA8Errata657417(InputSection *isec, uint64_t &off,
				uint64_t limit) {
				uint64_t isecAddr = isec->getVA(0);
				// Advance off so that (isecAddr + off) modulo 0x1000 is at least 0xffa. We
				MaskRay (Not Done): "is at least 0xffa" -> "is 0xffa"? I haven't verified, but `off = alignTo(isecAddr+off, 0x1000, 0xffa) - isecAddr;` probably works.
				peter.smith (author, Done): Yes, that cleans it up a lot. Thanks for the suggestion.
				// need to check for a 32-bit instruction immediately before a 32-bit branch
				// at 0xffe modulo 0x1000.
				uint64_t initialPageOff = (isecAddr + off) & 0xfff;
				MaskRay (Done): l6 -> 16
				if (initialPageOff < 0xffa)
				off += 0xffa - initialPageOff;
				else if (initialPageOff == 0xffc)
				off += 0xffe;
				else if (initialPageOff == 0xffe)
				off += 0xffc;
				if (off >= limit || limit - off < 8) {
				// Need at least 2 4-byte sized instructions to trigger erratum.
				off = limit;
				return {0, 0};
				}

				ScanResult scanRes = {0, 0, nullptr};
				const uint8_t *buf = isec->data().begin();
				// ARMv7-A Thumb 32-bit instructions are encoded as 2 consecutive
				MaskRay (Done): `l6 -> 16` or just delete `l6-bit`.
				// little-endian halfwords.
				const ulittle16_t *instBuf = reinterpret_cast<const ulittle16_t *>(buf + off);
				uint16_t hw11 = *instBuf++;
				uint16_t hw12 = *instBuf++;
				uint16_t hw21 = *instBuf++;
				uint16_t hw22 = *instBuf++;
				MaskRay (Not Done): Say `off % 0x1000 = 0xffc` before the assignment. The erratum sequence in the next page starts at 0xffa. This increment will skip that erratum sequence. Is this a possible scenario?
				peter.smith (author, Done): Yes I think there might be a case that gets recognized as an erratum where there is something like `0x0ffc nop.w` followed by `label: 0x1000 b.w label`. I will add a case to make sure the 0xffc is handled in a similar way to 0xffe and will add to the nopatch test case.
				if (is32bitInstruction(hw11) && is32bitInstruction(hw21)) {
				uint32_t instr1 = (hw11 << 16) | hw12;
				uint32_t instr2 = (hw21 << 16) | hw22;
				if (!is32bitBranch(instr1) && is32bitBranch(instr2)) {
				// Find a relocation for the branch if it exists. This will be used
				// to determine the target.
				uint64_t branchOff = off + 4;
				auto relIt = llvm::find_if(isec->relocations, [=](const Relocation &r) {
				return r.offset == branchOff &&
				(r.type == R_ARM_THM_JUMP19 || r.type == R_ARM_THM_JUMP24 ||
				r.type == R_ARM_THM_CALL);
				});
				grimar (Done): `{}` too.
				if (relIt != isec->relocations.end())
				scanRes.rel = &(*relIt);
				if (branchDestInFirstRegion(isec, branchOff, instr2, scanRes.rel)) {
				if (patchInRange(isec, branchOff, instr2)) {
				scanRes.off = branchOff;
				MaskRay (Not Done): In include/llvm/Object/ELFObjectFile.h and llvm-objdump, we just use `startswith("$a")`. Is there a reason to check `"$a."`? Below, you just use `return ms->getName().startswith("$t");`
				peter.smith (author, Done): Yes, `$afoo` without the `.` is not a mapping symbol. I'm thinking that it might be best to replace the map from InputSection -> vector<Symbol> to InputSection -> std::pair<uint64_t offset, bool isThumb>, this would also work for AArch64.
				scanRes.instr = instr2;
				} else {
				warn(toString(isec->file) +
				": skipping cortex-a8 657417 erratum sequence, section " +
				isec->name + " is too large to patch");
				}
				}
				}
				}
				off += 0x1000;
				return scanRes;
				MaskRay (Not Done): Should there be a check to reject ELF32BE beforehand? It can be placed in Driver.cpp:checkOptions.
				peter.smith (author, Done): LLD doesn't support ELF32BE. I'd have thought it would give an error, but without passing an emulation, it seems LLD will accept a big-endian ELF file without error. I'll need to write a follow-up patch to give an error. As an aside: big endian is strange on AArch64 and even stranger on ARM. In general it only tends to get used in networking so I've not seen any requests to implement it.
				AArch64 BE: ELF is BE, instructions are LE, data is BE.
				ARM BE: ELF is BE, instructions in relocatable objects are BE, data is BE. Older ARM architectures like v4, v5 and v6 executable/dso instructions have BE instructions. All newer architectures have LE instructions and the linker needs to do the byte swaps.
				}
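
The offset adjustment at the top of `scanCortexA8Errata657417` is easier to follow with concrete numbers. The sketch below lifts that `if`/`else if` chain into a hypothetical free function, `advanceToCandidate`; it assumes, as Thumb code guarantees, that `isecAddr + off` is 2-byte aligned, so odd page offsets are not handled.

```cpp
#include <cstdint>

// Move off forward to the next position where (isecAddr + off) % 0x1000 is
// 0xffa, i.e. 4 bytes before a potential branch at page offset 0xffe.
constexpr uint64_t advanceToCandidate(uint64_t isecAddr, uint64_t off) {
  uint64_t initialPageOff = (isecAddr + off) & 0xfff;
  if (initialPageOff < 0xffa)
    off += 0xffa - initialPageOff;
  else if (initialPageOff == 0xffc)
    off += 0xffe; // 0xffc + 0xffe == 0x1ffa: candidate in the next page
  else if (initialPageOff == 0xffe)
    off += 0xffc; // 0xffe + 0xffc == 0x1ffa: candidate in the next page
  return off;
}

static_assert(advanceToCandidate(0, 0) == 0xffa, "start of page");
static_assert(advanceToCandidate(0, 0xffa) == 0xffa, "already at candidate");
static_assert(advanceToCandidate(0, 0xffc) == 0x1ffa, "skip to next page");
static_assert(advanceToCandidate(0, 0xffe) == 0x1ffa, "skip to next page");
// A section that does not start on a page boundary.
static_assert(advanceToCandidate(0x10004, 0) == 0xff6, "unaligned section");
```

In every handled case the result lands on page offset `0xffa`, which is consistent with the reviewer's suggestion that a single `alignTo` call could replace the chain.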

				void ARMErr657417Patcher::init() {
				grimar (Done): What does `b` stand for (here and below)? I'd expect to see `sym` or `s`.
				// The Arm ABI permits a mix of ARM, Thumb and Data in the same
				// InputSection. We must only scan Thumb instructions to avoid false
				// matches. We use the mapping symbols in the InputObjects to identify this
				// data, caching the results in sectionMap so we don't have to recalculate
				// it each pass.

				// The ABI Section 4.5.5 Mapping symbols; defines local symbols that describe
				// half open intervals [Symbol Value, Next Symbol Value) of code and data
				// within sections. If there is no next symbol then the half open interval is
				// [Symbol Value, End of section). The type, code or data, is determined by
				// the mapping symbol name, $a for Arm code, $t for Thumb code, $d for data.
				auto isArmMapSymbol = [](const Symbol *s) {
				return s->getName() == "$a" || s->getName().startswith("$a.");
				};
				MaskRay (Done): `if (mapSyms.size() <= 1) continue` can be deleted.
				auto isThumbMapSymbol = [](const Symbol *s) {
				return s->getName() == "$t" || s->getName().startswith("$t.");
				};
				auto isDataMapSymbol = [](const Symbol *s) {
				return s->getName() == "$d" || s->getName().startswith("$d.");
				};

				// Collect mapping symbols for every executable InputSection.
				for (InputFile *file : objectFiles) {
				auto *f = cast<ObjFile<ELF32LE>>(file);
				for (Symbol *s : f->getLocalSymbols()) {
				auto *def = dyn_cast<Defined>(s);
				if (!def)
				continue;
				if (!isArmMapSymbol(def) && !isThumbMapSymbol(def) &&
				!isDataMapSymbol(def))
				continue;
				if (auto *sec = dyn_cast_or_null<InputSection>(def->section))
				if (sec->flags & SHF_EXECINSTR)
				sectionMap[sec].push_back(def);
				}
				}
				MaskRay (Done): I think these disjunctions can be simplified to: `isThumbMapSymbol(a) == isThumbMapSymbol(b)`
				peter.smith (author, Done): Yes it can, that makes it a lot simpler; thanks.
				// For each InputSection make sure the mapping symbols are sorted in
				// ascending order and are in alternating Thumb, non-Thumb order.
				for (auto &kv : sectionMap) {
				std::vector<const Defined *> &mapSyms = kv.second;
				llvm::stable_sort(mapSyms, [](const Defined *a, const Defined *b) {
				return a->value < b->value;
				MaskRay (Done): initial Thunk placement?
				});
				mapSyms.erase(std::unique(mapSyms.begin(), mapSyms.end(),
				MaskRay (Done): Missing full stop.
				[=](const Defined *a, const Defined *b) {
				return (isThumbMapSymbol(a) ==
				isThumbMapSymbol(b));
				}),
				mapSyms.end());
				// Always start with a Thumb Mapping Symbol.
				MaskRay (Done): Or use `for (; patchIt != patchEnd; ++patchIt)`
				if (!mapSyms.empty() && !isThumbMapSymbol(mapSyms.front()))
				MaskRay (Not Done): Is it guaranteed that getBranchAddr() is monotonically increasing?
				peter.smith (author, Done): Yes. The getBranchAddr() is essentially the address in the InputSection that has the patch applied to it. Within an InputSectionDescription the InputSections' outSecOff values monotonically increase, and as the patches are added to the end of the list, the getBranchAddr() monotonically increases.
				mapSyms.erase(mapSyms.begin());
				}
				initialized = true;
				}
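
The sort, `std::unique`, and leading-erase steps in `init()` can be tried in isolation. The sketch below is a simplified stand-in, not LLD code: `MapSym` and `normalize` are hypothetical names, symbols are plain name/value pairs, and the standard library replaces the `llvm::` helpers.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a mapping symbol: name plus section offset.
struct MapSym {
  std::string name;
  uint64_t value;
};

// "$t" or "$t.<anything>" marks the start of Thumb code.
static bool isThumbMapSym(const MapSym &s) {
  return s.name == "$t" || s.name.rfind("$t.", 0) == 0;
}

// Sort by offset, collapse runs of the same kind (Thumb vs non-Thumb), and
// drop a leading non-Thumb symbol so the list alternates Thumb, non-Thumb.
static std::vector<MapSym> normalize(std::vector<MapSym> syms) {
  std::stable_sort(syms.begin(), syms.end(),
                   [](const MapSym &a, const MapSym &b) {
                     return a.value < b.value;
                   });
  syms.erase(std::unique(syms.begin(), syms.end(),
                         [](const MapSym &a, const MapSym &b) {
                           return isThumbMapSym(a) == isThumbMapSym(b);
                         }),
             syms.end());
  if (!syms.empty() && !isThumbMapSym(syms.front()))
    syms.erase(syms.begin());
  return syms;
}
```

For the input `$d@8, $t@0, $t.1@4, $a@12, $t@16` this leaves `$t@0, $d@8, $t@16`: the second Thumb symbol and the `$a` following `$d` are redundant because they do not change whether the region is Thumb.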

				void ARMErr657417Patcher::insertPatches(
				InputSectionDescription &isd, std::vector<Patch657417Section *> &patches) {
				uint64_t spacing = 0x100000 - 0x7500;
				uint64_t isecLimit;
				uint64_t prevIsecLimit = isd.sections.front()->outSecOff;
				uint64_t patchUpperBound = prevIsecLimit + spacing;
				uint64_t outSecAddr = isd.sections.front()->getParent()->addr;
				MaskRay (Done): Capitalize.

				// Set the outSecOff of patches to the place where we want to insert them.
				// We use a similar strategy to initial thunk placement, using 1 MiB as the
				// range of the Thumb-2 conditional branch with a contingency accounting for
				// thunk generation.
				auto patchIt = patches.begin();
				auto patchEnd = patches.end();
				MaskRay (Done): `if (a->outSecOff != b->outSecOff) return a->outSecOff < b->outSecOff; return isa<Patch657417Section>(a) && !isa<Patch657417Section>(b);`
				for (const InputSection *isec : isd.sections) {
				isecLimit = isec->outSecOff + isec->getSize();
				MaskRay (Not Done): isecLimit can be defined here, i.e. remove the definition above.
				peter.smith (author, Done): If you mean do something like `uint64_t isecLimit = isec->outSecOff + isec->getSize();` I don't think that will work due to the use of isecLimit outside the for loop on line 384: `for (; patchIt != patchEnd; ++patchIt) (*patchIt)->outSecOff = isecLimit;`
				if (isecLimit > patchUpperBound) {
				for (; patchIt != patchEnd; ++patchIt) {
				if ((*patchIt)->getBranchAddr() - outSecAddr >= prevIsecLimit)
				ruiu (Done): You can return the result of the if condition directly.
				break;
				(*patchIt)->outSecOff = prevIsecLimit;
				}
				patchUpperBound = prevIsecLimit + spacing;
				}
				prevIsecLimit = isecLimit;
				}
				for (; patchIt != patchEnd; ++patchIt)
				(*patchIt)->outSecOff = isecLimit;

				// Merge all patch sections. We use the outSecOff assigned above to
				// determine the insertion point. This is ok as we only merge into an
				// InputSectionDescription once per pass, and at the end of the pass
				// assignAddresses() will recalculate all the outSecOff values.
				std::vector<InputSection *> tmp;
				tmp.reserve(isd.sections.size() + patches.size());
				auto mergeCmp = [](const InputSection *a, const InputSection *b) {
				if (a->outSecOff != b->outSecOff)
				return a->outSecOff < b->outSecOff;
				return isa<Patch657417Section>(a) && !isa<Patch657417Section>(b);
				};
				std::merge(isd.sections.begin(), isd.sections.end(), patches.begin(),
				patches.end(), std::back_inserter(tmp), mergeCmp);
				isd.sections = std::move(tmp);
				}
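
The tie-break in `mergeCmp` decides where a patch lands relative to a section with the same `outSecOff`. The sketch below illustrates it with a hypothetical `Sec` struct and `mergePatches` function in place of LLD's `InputSection` and `isa<Patch657417Section>`; only the comparator logic is taken from the listing.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for an InputSection: offset plus a patch flag.
struct Sec {
  uint64_t outSecOff;
  bool isPatch;
};

// Merge two lists that are each sorted by outSecOff. On equal offsets a
// patch sorts before an ordinary section.
static std::vector<Sec> mergePatches(const std::vector<Sec> &sections,
                                     const std::vector<Sec> &patches) {
  std::vector<Sec> out;
  out.reserve(sections.size() + patches.size());
  auto mergeCmp = [](const Sec &a, const Sec &b) {
    if (a.outSecOff != b.outSecOff)
      return a.outSecOff < b.outSecOff;
    return a.isPatch && !b.isPatch;
  };
  std::merge(sections.begin(), sections.end(), patches.begin(), patches.end(),
             std::back_inserter(out), mergeCmp);
  return out;
}
```

With sections at offsets `0`, `0x100`, `0x200` and one patch assigned `0x100` (the end of the first section), the merged order is section, patch, section, section. The patch therefore sits between the first and second sections, which is the insertion point `insertPatches` intends.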

				// Given a branch instruction described by ScanRes redirect it to a patch
				// section containing an unconditional branch instruction to the target.
				// Ensure that this patch section is 4-byte aligned so that the branch cannot
				// span two 4 KiB regions. Place the patch section so that it is always after
				// isec so the branch we are patching always goes forwards.
				static void implementPatch(ScanResult sr, InputSection *isec,
				std::vector<Patch657417Section *> &patches) {

				log("detected cortex-a8-657417 erratum sequence starting at " +
				utohexstr(isec->getVA(sr.off)) + " in unpatched output.");
				Patch657417Section *psec;
				// We have two cases to deal with.
				// Case 1. There is a relocation at patcheeOffset to a symbol. The
				// unconditional branch in the patch must have a relocation so that any
				// further redirection via the PLT or a Thunk happens as normal. At
				// patcheeOffset we redirect the existing relocation to a Symbol defined at
				// the start of the patch section.
				//
				// Case 2. There is no relocation at patcheeOffset. We are unlikely to have
				// a symbol that we can use as a target for a relocation in the patch section.
				// Luckily we know that the destination cannot be indirected via the PLT or
				// a Thunk so we can just write the destination directly.
				if (sr.rel) {
				// Case 1. We have an existing relocation to redirect to patch and a
				// Symbol target.

				// Create a branch relocation for the unconditional branch in the patch.
				// This can be redirected via the PLT or Thunks.
				RelType patchRelType = R_ARM_THM_JUMP24;
				int64_t patchRelAddend = sr.rel->addend;
				bool destIsARM = false;
				if (isBL(sr.instr) || isBLX(sr.instr)) {
				// The final target of the branch may be ARM or Thumb, if the target
				// is ARM then we write the patch in ARM state to avoid a state change
				// Thunk from the patch to the target.
				uint64_t dstSymAddr = (sr.rel->expr == R_PLT_PC) ? sr.rel->sym->getPltVA()
				: sr.rel->sym->getVA();
				destIsARM = (dstSymAddr & 1) == 0;
				}
				psec = make<Patch657417Section>(isec, sr.off, sr.instr, destIsARM);
				if (destIsARM) {
				// The patch will be in ARM state. Use an ARM relocation and account for
				// the larger ARM PC-bias of 8 rather than Thumb's 4.
				patchRelType = R_ARM_JUMP24;
				patchRelAddend -= 4;
				}
				psec->relocations.push_back(
				Relocation{sr.rel->expr, patchRelType, 0, patchRelAddend, sr.rel->sym});
				// Redirect the existing branch relocation to the patch.
				sr.rel->expr = R_PC;
				sr.rel->addend = -4;
				sr.rel->sym = psec->patchSym;
				} else {
				// Case 2. We do not have a relocation to the patch. Add a relocation of the
				// appropriate type to the patch at patcheeOffset.

				// The destination is ARM if we have a BLX.
				psec = make<Patch657417Section>(isec, sr.off, sr.instr, isBLX(sr.instr));
				RelType type;
				if (isBcc(sr.instr))
				type = R_ARM_THM_JUMP19;
				else if (isB(sr.instr))
				type = R_ARM_THM_JUMP24;
				else
				type = R_ARM_THM_CALL;
				isec->relocations.push_back(
				Relocation{R_PC, type, sr.off, -4, psec->patchSym});
				}
				patches.push_back(psec);
				}

				// Scan all the instructions in InputSectionDescription, for each instance of
				// the erratum sequence create a Patch657417Section. We return the list of
				// Patch657417Sections that need to be applied to the InputSectionDescription.
				std::vector<Patch657417Section *>
				ARMErr657417Patcher::patchInputSectionDescription(
				InputSectionDescription &isd) {
				std::vector<Patch657417Section *> patches;
				MaskRay (Done): Superfluous space in the comment.
				for (InputSection *isec : isd.sections) {
				// LLD doesn't use the erratum sequence in SyntheticSections.
				if (isa<SyntheticSection>(isec))
				continue;
				MaskRay (Done): `MapSyms` -> `mapSyms`
				// Use sectionMap to make sure we only scan Thumb code and not Arm or inline
				// data. We have already sorted mapSyms in ascending order and removed
				// consecutive mapping symbols of the same type. Our range of executable
				// instructions to scan is therefore [thumbSym->value, nonThumbSym->value)
				// or [thumbSym->value, section size).
				std::vector<const Defined *> &mapSyms = sectionMap[isec];

				MaskRay (Done): Above you use `s->getName() == "$t" || s->getName().startswith("$t.");`
				peter.smith (author, Done): I think using ms here is defensible. Beforehand we were processing all symbols, that may or may not be mapping symbols. Here we know that the symbol is a mapping symbol as we filtered them out earlier. I don't mind changing if you'd prefer s instead of ms?
				MaskRay (Done): `ms` as the variable name is fine. Sorry, I should have been clearer. I meant why `.startswith("$t")` is used here. But I see the reason now: because the elements contain exclusively mapping symbols (`if (!isArmMapSymbol(def) && !isThumbMapSymbol(def) && !isDataMapSymbol(def)) continue;`), `ms->getName().startswith("$t");` should be sufficient.
				MaskRay (Done): In `init()`, after `mapSyms.erase(std::unique(mapSyms.begin(), mapSyms.end(), ...), mapSyms.end());` you can normalize `mapSyms` to start with a thumb mapping symbol (`erase(begin())` if not thumb). Then you can do `auto thumbSym = mapSyms.begin();` here.
				peter.smith (author, Done): Thanks for the suggestion, I've implemented that.
				auto thumbSym = mapSyms.begin();
				while (thumbSym != mapSyms.end()) {
				auto nonThumbSym = std::next(thumbSym);
				uint64_t off = (*thumbSym)->value;
				uint64_t limit = (nonThumbSym == mapSyms.end()) ? isec->data().size()
				: (*nonThumbSym)->value;

				while (off < limit) {
				ScanResult sr = scanCortexA8Errata657417(isec, off, limit);
				if (sr.off)
				implementPatch(sr, isec, patches);
				}
				if (nonThumbSym == mapSyms.end())
				MaskRay (Done): !initialized
				break;
				thumbSym = std::next(nonThumbSym);
				}
				}
				return patches;
				}

				bool ARMErr657417Patcher::createFixes() {
				if (!initialized)
				init();

				bool addressesChanged = false;
				for (OutputSection *os : outputSections) {
				if (!(os->flags & SHF_ALLOC) || !(os->flags & SHF_EXECINSTR))
				continue;
				for (BaseCommand *bc : os->sectionCommands)
				if (auto *isd = dyn_cast<InputSectionDescription>(bc)) {
				std::vector<Patch657417Section *> patches =
				patchInputSectionDescription(*isd);
				if (!patches.empty()) {
				insertPatches(*isd, patches);
				addressesChanged = true;
				}
				}
				}
				return addressesChanged;
				}

				} // namespace elf
				} // namespace lld

ELF/CMakeLists.txt

(16 lines not shown)
  add_lld_library(lldELF
  Arch/MipsArchTree.cpp
  Arch/MSP430.cpp
  Arch/PPC.cpp
  Arch/PPC64.cpp
  Arch/RISCV.cpp
  Arch/SPARCV9.cpp
  Arch/X86.cpp
  Arch/X86_64.cpp
+ ARMErrataFix.cpp
  CallGraphSort.cpp
  DWARF.cpp
  Driver.cpp
  DriverUtils.cpp
  EhFrame.cpp
  ICF.cpp
  InputFiles.cpp
  InputSection.cpp
(35 lines not shown)

ELF/Config.h

(139 lines not shown)
  struct Configuration {
  bool disableVerify;
  bool ehFrameHdr;
  bool emitLLVM;
  bool emitRelocs;
  bool enableNewDtags;
  bool executeOnly;
  bool exportDynamic;
  bool fixCortexA53Errata843419;
+ bool fixCortexA8;
  bool forceBTI;
  bool formatBinary = false;
  bool requireCET;
  bool gcSections;
  bool gdbIndex;
  bool gnuHash = false;
  bool gnuUnique;
  bool hasDynamicList = false;
(170 lines not shown)

ELF/Driver.cpp

Show First 20 Lines • Show All 293 Lines • ▼ Show 20 Lines	static void checkOptions() {
// The MIPS ABI as of 2016 does not support the GNU-style symbol lookup		// The MIPS ABI as of 2016 does not support the GNU-style symbol lookup
// table which is a relatively new feature.		// table which is a relatively new feature.
if (config->emachine == EM_MIPS && config->gnuHash)		if (config->emachine == EM_MIPS && config->gnuHash)
error("the .gnu.hash section is not compatible with the MIPS target");		error("the .gnu.hash section is not compatible with the MIPS target");

if (config->fixCortexA53Errata843419 && config->emachine != EM_AARCH64)		if (config->fixCortexA53Errata843419 && config->emachine != EM_AARCH64)
error("--fix-cortex-a53-843419 is only supported on AArch64 targets");		error("--fix-cortex-a53-843419 is only supported on AArch64 targets");

		if (config->fixCortexA8 && config->emachine != EM_ARM)
		error("--fix-cortex-a8 is only supported on ARM targets");

if (config->tocOptimize && config->emachine != EM_PPC64)		if (config->tocOptimize && config->emachine != EM_PPC64)
error("--toc-optimize is only supported on the PowerPC64 target");		error("--toc-optimize is only supported on the PowerPC64 target");

if (config->pie && config->shared)		if (config->pie && config->shared)
error("-shared and -pie may not be used together");		error("-shared and -pie may not be used together");

if (!config->shared && !config->filterList.empty())		if (!config->shared && !config->filterList.empty())
error("-F may not be used without -shared");		error("-F may not be used without -shared");
▲ Show 20 Lines • Show All 520 Lines • ▼ Show 20 Lines	static void readConfigs(opt::InputArgList &args) {
config->entry = args.getLastArgValue(OPT_entry);		config->entry = args.getLastArgValue(OPT_entry);
config->executeOnly =		config->executeOnly =
args.hasFlag(OPT_execute_only, OPT_no_execute_only, false);		args.hasFlag(OPT_execute_only, OPT_no_execute_only, false);
config->exportDynamic =		config->exportDynamic =
args.hasFlag(OPT_export_dynamic, OPT_no_export_dynamic, false);		args.hasFlag(OPT_export_dynamic, OPT_no_export_dynamic, false);
config->filterList = args::getStrings(args, OPT_filter);		config->filterList = args::getStrings(args, OPT_filter);
config->fini = args.getLastArgValue(OPT_fini, "_fini");		config->fini = args.getLastArgValue(OPT_fini, "_fini");
config->fixCortexA53Errata843419 = args.hasArg(OPT_fix_cortex_a53_843419);		config->fixCortexA53Errata843419 = args.hasArg(OPT_fix_cortex_a53_843419);
		config->fixCortexA8 = args.hasArg(OPT_fix_cortex_a8);
config->forceBTI = args.hasArg(OPT_force_bti);		config->forceBTI = args.hasArg(OPT_force_bti);
config->requireCET = args.hasArg(OPT_require_cet);		config->requireCET = args.hasArg(OPT_require_cet);
config->gcSections = args.hasFlag(OPT_gc_sections, OPT_no_gc_sections, false);		config->gcSections = args.hasFlag(OPT_gc_sections, OPT_no_gc_sections, false);
config->gnuUnique = args.hasFlag(OPT_gnu_unique, OPT_no_gnu_unique, true);		config->gnuUnique = args.hasFlag(OPT_gnu_unique, OPT_no_gnu_unique, true);
config->gdbIndex = args.hasFlag(OPT_gdb_index, OPT_no_gdb_index, false);		config->gdbIndex = args.hasFlag(OPT_gdb_index, OPT_no_gdb_index, false);
config->icf = getICF(args);		config->icf = getICF(args);
config->ignoreDataAddressEquality =		config->ignoreDataAddressEquality =
args.hasArg(OPT_ignore_data_address_equality);		args.hasArg(OPT_ignore_data_address_equality);

ELF/Options.td


	defm filter: Eq<"filter", "Set DT_FILTER field to the specified name">;			defm filter: Eq<"filter", "Set DT_FILTER field to the specified name">;

	defm fini: Eq<"fini", "Specify a finalizer function">, MetaVarName<"<symbol>">;			defm fini: Eq<"fini", "Specify a finalizer function">, MetaVarName<"<symbol>">;

	def fix_cortex_a53_843419: F<"fix-cortex-a53-843419">,			def fix_cortex_a53_843419: F<"fix-cortex-a53-843419">,
	HelpText<"Apply fixes for AArch64 Cortex-A53 erratum 843419">;			HelpText<"Apply fixes for AArch64 Cortex-A53 erratum 843419">;

				def fix_cortex_a8: F<"fix-cortex-a8">,
				HelpText<"Apply fixes for ARM Cortex-A8 erratum 657417">;

	// This option is intentionally hidden from the user as the implementation			// This option is intentionally hidden from the user as the implementation
	// is not complete.			// is not complete.
	def require_cet: F<"require-cet">;			def require_cet: F<"require-cet">;

	def force_bti: F<"force-bti">,			def force_bti: F<"force-bti">,
	HelpText<"Force enable AArch64 BTI in PLT, warn if Input ELF file does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI property">;			HelpText<"Force enable AArch64 BTI in PLT, warn if Input ELF file does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI property">;

	defm format: Eq<"format", "Change the input format of the inputs following this option">,			defm format: Eq<"format", "Change the input format of the inputs following this option">,

ELF/Writer.cpp

	//===- Writer.cpp ---------------------------------------------------------===//			//===- Writer.cpp ---------------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "Writer.h"			#include "Writer.h"
	#include "AArch64ErrataFix.h"			#include "AArch64ErrataFix.h"
				#include "ARMErrataFix.h"
	#include "CallGraphSort.h"			#include "CallGraphSort.h"
	#include "Config.h"			#include "Config.h"
	#include "LinkerScript.h"			#include "LinkerScript.h"
	#include "MapFile.h"			#include "MapFile.h"
	#include "OutputSections.h"			#include "OutputSections.h"
	#include "Relocations.h"			#include "Relocations.h"
	#include "SymbolTable.h"			#include "SymbolTable.h"
	#include "Symbols.h"			#include "Symbols.h"

	// We need to generate and finalize the content that depends on the address of			// We need to generate and finalize the content that depends on the address of
	// InputSections. As the generation of the content may also alter InputSection			// InputSections. As the generation of the content may also alter InputSection
	// addresses we must converge to a fixed point. We do that here. See the comment			// addresses we must converge to a fixed point. We do that here. See the comment
	// in Writer<ELFT>::finalizeSections().			// in Writer<ELFT>::finalizeSections().
	template <class ELFT> void Writer<ELFT>::finalizeAddressDependentContent() {			template <class ELFT> void Writer<ELFT>::finalizeAddressDependentContent() {
	ThunkCreator tc;			ThunkCreator tc;
	AArch64Err843419Patcher a64p;			AArch64Err843419Patcher a64p;
				ARMErr657417Patcher a32p;
	script->assignAddresses();			script->assignAddresses();

	int assignPasses = 0;			int assignPasses = 0;
	for (;;) {			for (;;) {
	bool changed = target->needsThunks && tc.createThunks(outputSections);			bool changed = target->needsThunks && tc.createThunks(outputSections);

	// With Thunk Size much smaller than branch range we expect to			// With Thunk Size much smaller than branch range we expect to
	// converge quickly; if we get to 10 something has gone wrong.			// converge quickly; if we get to 10 something has gone wrong.
	if (changed && tc.pass >= 10) {			if (changed && tc.pass >= 10) {
	error("thunk creation not converged");			error("thunk creation not converged");
	break;			break;
	}			}

	if (config->fixCortexA53Errata843419) {			if (config->fixCortexA53Errata843419) {
				MaskRay (inline comment, Done): Alternatively, `if (config->fixCortexA53Errata843419) { if (changed) script->assignAddresses(); changed |= a64p.createFixes(); } if (config->fixCortexA8) { if (changed) script->assignAddresses(); changed |= a32p.createFixes(); }` if you think it is clearer.
	if (changed)			if (changed)
	script->assignAddresses();			script->assignAddresses();
	changed |= a64p.createFixes();			changed |= a64p.createFixes();
	}			}
				if (config->fixCortexA8) {
				if (changed)
				script->assignAddresses();
				changed |= a32p.createFixes();
				}

	if (in.mipsGot)			if (in.mipsGot)
	in.mipsGot->updateAllocSize();			in.mipsGot->updateAllocSize();

	for (Partition &part : partitions) {			for (Partition &part : partitions) {
	changed |= part.relaDyn->updateAllocSize();			changed |= part.relaDyn->updateAllocSize();
	if (part.relrDyn)			if (part.relrDyn)
	changed |= part.relrDyn->updateAllocSize();			changed |= part.relrDyn->updateAllocSize();
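The convergence loop in `finalizeAddressDependentContent()` can be illustrated with a small standalone model. The sketch below is not LLD code: `LayoutPass`, `runToFixedPoint`, `convergenceDemo`, and the iteration cap are hypothetical names, and each pass is reduced to a callback that reports whether it added content. It only shows the pattern of re-assigning addresses whenever an earlier pass changed the layout, and stopping once a full iteration changes nothing.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a pass such as thunk creation or erratum
// patching: returns true if it added content and so changed the layout.
using LayoutPass = std::function<bool()>;

// Runs all passes repeatedly until one full iteration makes no change.
// Returns the number of iterations used, or -1 if the cap was reached.
int runToFixedPoint(const std::vector<LayoutPass> &passes,
                    const std::function<void()> &assignAddresses,
                    int maxIterations) {
  assignAddresses();
  for (int iteration = 1; iteration <= maxIterations; ++iteration) {
    bool changed = false;
    for (const LayoutPass &pass : passes) {
      // An earlier pass may have moved sections; refresh addresses so the
      // next pass sees precise positions.
      if (changed)
        assignAddresses();
      changed |= pass();
    }
    if (!changed)
      return iteration;
    assignAddresses();
  }
  return -1;
}

// Toy scenario: two thunks and one erratum patch are needed in total.
int convergenceDemo() {
  int thunksLeft = 2, patchesLeft = 1;
  std::vector<LayoutPass> passes = {
      [&] { if (thunksLeft > 0) { --thunksLeft; return true; } return false; },
      [&] { if (patchesLeft > 0) { --patchesLeft; return true; } return false; }};
  int iterations = runToFixedPoint(passes, [] {}, 10);
  assert(thunksLeft == 0 && patchesLeft == 0);
  return iterations;
}
```

In this toy scenario the loop settles on the third iteration, the first one in which neither pass reports a change.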

test/ELF/arm-fix-cortex-a8-blx.s

This file was added.

				// REQUIRES: arm
				// RUN: llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf --arm-add-build-attributes %s -o %t.o
				// RUN: ld.lld --fix-cortex-a8 -verbose %t.o -o %t2 2>&1 | FileCheck %s
				// RUN: llvm-objdump -d --no-show-raw-insn --start-address=0x12ffa --stop-address=0x13008 %t2 | FileCheck --check-prefix=CHECK-PATCH %s

				/// Test that the patch can work on an unrelocated BLX. Neither clang nor GCC
				/// will emit these without a relocation, but they could be produced by ELF
				/// processing tools.

				// CHECK: ld.lld: detected cortex-a8-657419 erratum sequence starting at 12FFE in unpatched output.

				.syntax unified
				.text

				.type _start, %function
				.balign 4096
				.global _start
				.arm
				_start:
				bx lr
				.space 4086
				.thumb
				/// 32-bit Branch link and exchange spans 2 4KiB regions, preceded by a
				/// 32-bit non branch instruction. Expect a patch.
				nop.w
				/// Encoding for blx _start. Use .inst.n directives to avoid a relocation.
				.inst.n 0xf7ff
				.inst.n 0xe800

				// CHECK-PATCH: 12ffa: nop.w
				// CHECK-PATCH-NEXT: 12ffe: blx #4
				// CHECK-PATCH: 00013004 __CortexA8657417_12FFE:
				// CHECK-PATCH-NEXT: 13004: b #-4104
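The hand-written halfwords `0xf7ff 0xe800` in this test can be checked by decoding the Thumb-2 BL/BLX immediate. The sketch below follows the encoding as described in the Arm Architecture Reference Manual (imm32 is the sign extension of S:I1:I2:imm10:imm11:'0', with I1 = NOT(J1 EOR S) and I2 = NOT(J2 EOR S)); the function name `decodeThumbBlOffset` is a hypothetical helper, not something taken from LLD.

```cpp
#include <cstdint>

// Decodes the signed byte offset of a Thumb-2 BL or BLX (immediate) from its
// two halfwords. For BLX the lowest bit of the second halfword must be 0, so
// the same formula covers both instructions.
constexpr int32_t decodeThumbBlOffset(uint16_t hw1, uint16_t hw2) {
  uint32_t s = (hw1 >> 10) & 1;
  uint32_t j1 = (hw2 >> 13) & 1;
  uint32_t j2 = (hw2 >> 11) & 1;
  uint32_t i1 = (j1 ^ s) ^ 1; // I1 = NOT(J1 EOR S)
  uint32_t i2 = (j2 ^ s) ^ 1; // I2 = NOT(J2 EOR S)
  uint32_t imm10 = hw1 & 0x3ff;
  uint32_t imm11 = hw2 & 0x7ff;
  uint32_t imm25 = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) |
                   (imm11 << 1);
  // Sign-extend from bit 24.
  return static_cast<int32_t>(imm25 ^ 0x1000000u) - 0x1000000;
}

// The encoding used in the test above decodes to an offset of -4096.
static_assert(decodeThumbBlOffset(0xf7ff, 0xe800) == -4096, "blx offset");
```

For BLX the offset is applied to the branch address plus 4, rounded down to a multiple of 4. With the branch at 0x12ffe that base is 0x13000, so the unpatched instruction targets 0x12000, which is where `_start` sits in this layout.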

test/ELF/arm-fix-cortex-a8-nopatch.s

This file was added.

				// REQUIRES: arm
				// RUN: llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf --arm-add-build-attributes %s -o %t.o
				// RUN: ld.lld --fix-cortex-a8 -verbose %t.o -o %t2
				// RUN: llvm-objdump -d %t2 --start-address=0x12ffa --stop-address=0x13002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE1 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x13ffa --stop-address=0x14002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE2 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x14ffa --stop-address=0x15002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE3 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x15ffa --stop-address=0x16006 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE4 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x16ffe --stop-address=0x17002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE5 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x18000 --stop-address=0x18004 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE6 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x19002 --stop-address=0x19006 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE7 %s

				/// Test boundary conditions of the cortex-a8 erratum. The following cases
				/// should not trigger the erratum.
				.syntax unified
				.thumb
				.text
				.global _start
				.balign 4096
				.thumb_func
				_start:
				nop.w
				.space 4086
				.thumb_func
				target:
				/// 32-bit branch spans 2 4KiB regions, preceded by a 32-bit branch so no patch
				/// expected.
				b.w target
				b.w target

				// CALLSITE1: 00012ffa target:
				// CALLSITE1-NEXT: 12ffa: b.w #-4
				// CALLSITE1-NEXT: 12ffe: b.w #-8

				.space 4088
				.type target2, %function
				target2:
				/// 32-bit Branch and link spans 2 4KiB regions, preceded by a 16-bit
				MaskRay (inline comment, Not Done): `.local` is the default for a defined symbol. Is the directive here to emphasize the symbol is local? Same question goes for other `target` symbols.
				peter.smith (author, Done): They are definitely intended to be local, as the assembler can make more assumptions about resolving fixups without relocations if they are. I can remove them if they are the default, as I don't think it is hugely important to emphasise.
				/// instruction so no patch expected.
				nop
				nop
				bl target2

				// CALLSITE2: 00013ffa target2:
				// CALLSITE2-NEXT: 13ffa: nop
				// CALLSITE2-NEXT: 13ffc: nop
				// CALLSITE2-NEXT: 13ffe: bl #-8

				.space 4088
				.type target3, %function
				target3:
				/// 32-bit conditional branch spans 2 4KiB regions, preceded by a 32-bit
				/// non branch instruction, branch is backwards but outside 4KiB region. So
				/// expect no patch.
				nop.w
				beq.w target2

				// CALLSITE3: 00014ffa target3:
				// CALLSITE3-NEXT: 14ffa: nop.w
				// CALLSITE3-NEXT: 14ffe: beq.w #-4104

				.space 4088
				.type source4, %function
				source4:
				/// 32-bit conditional branch spans 2 4KiB regions, preceded by a 32-bit
				/// non branch instruction, branch is forwards to 2nd region so expect no patch.
				nop.w
				beq.w target4
				.thumb_func
				target4:
				nop.w

				// CALLSITE4: 00015ffa source4:
				// CALLSITE4-NEXT: 15ffa: nop.w
				// CALLSITE4-NEXT: 15ffe: beq.w #0
				// CALLSITE4: 00016002 target4:
				// CALLSITE4-NEXT: 16002: nop.w

				.space 4084
				.type target5, %function

				target5:
				/// 32-bit conditional branch spans 2 4KiB regions, preceded by the encoding of
				/// a 32-bit thumb instruction, but in ARM state (illegal instruction), we
				/// should not decode and match it as Thumb, expect no patch.
				.arm
				.inst 0x8000f3af /// nop.w encoding in Thumb
				.thumb
				.thumb_func
				source5:
				beq.w target5

				// CALLSITE5: 00016ffe source5:
				// CALLSITE5-NEXT: 16ffe: beq.w #-8

				/// Edge case where two word sequence starts at offset 0xffc, check that
				/// we don't match.
				.space 4090
				.type target6, %function
				nop.w
				/// Make sure target of branch is in the same 4KiB region as the branch.
				target6:
				bl target6

				// CALLSITE6: 00018000 target6:
				// CALLSITE6-NEXT: 18000: bl #-4

				/// Edge case where two word sequence starts at offset 0xffe, check that
				/// we don't match.
				.space 4090
				.type target7, %function
				nop.w
				/// Make sure target of branch is in the same 4KiB region as the branch.
				target7:
				bl target7

				// CALLSITE7: 00019002 target7:
				// CALLSITE7: 19002: bl #-4
				MaskRay (inline comment, Done): Add a line: `CALLSITE7: 00019002 target7`
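The boundary cases in this file follow from the three erratum conditions listed in the summary. A minimal sketch of those conditions as a predicate is shown below; it is not the matcher from ARMErrataFix.cpp, and the `BranchSite` struct and `looksLikeErratum657417` function are hypothetical names. It assumes the branch itself is already known to be a 32-bit Thumb-2 B.w, Bcc.w, BL, or BLX.

```cpp
#include <cstdint>

// Hypothetical summary of one branch and the instruction before it.
struct BranchSite {
  bool prevIs32BitThumb; // previous instruction is a 32-bit Thumb-2 encoding
  bool prevIsBranch;     // previous instruction is itself a branch
  uint32_t branchAddr;   // address of the first halfword of the branch
  uint32_t targetAddr;   // destination of the branch
};

constexpr bool looksLikeErratum657417(const BranchSite &site) {
  // A 4-byte branch spans two 4KiB regions only if it starts at offset 0xffe.
  if ((site.branchAddr & 0xfff) != 0xffe)
    return false;
  // The instruction before it must be a 32-bit Thumb-2 non-branch.
  if (!site.prevIs32BitThumb || site.prevIsBranch)
    return false;
  // The destination must lie in the first of the two 4KiB regions.
  return (site.targetAddr & ~0xfffu) == (site.branchAddr & ~0xfffu);
}

// Cases taken from the tests in this review.
static_assert(looksLikeErratum657417({true, false, 0x12ffe, 0x12ffa}),
              "nop.w then b.w back into the first region: patch");
static_assert(!looksLikeErratum657417({true, true, 0x12ffe, 0x12ffa}),
              "preceded by a 32-bit branch: no patch");
static_assert(!looksLikeErratum657417({false, false, 0x13ffe, 0x13ffa}),
              "preceded by a 16-bit instruction: no patch");
static_assert(!looksLikeErratum657417({true, false, 0x14ffe, 0x13ffa}),
              "backwards branch outside the first region: no patch");
static_assert(!looksLikeErratum657417({true, false, 0x15ffe, 0x16002}),
              "forwards branch into the second region: no patch");
```

The ARM-state case (CALLSITE5) is not modelled here; it would correspond to `prevIs32BitThumb` being false because the preceding bytes are not Thumb code.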

test/ELF/arm-fix-cortex-a8-plt.s

This file was added.

				// REQUIRES: arm
				// RUN: llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf --arm-add-build-attributes %s -o %t.o
				// RUN: echo "SECTIONS { \
				// RUN: .plt 0x2000 : { *(.plt) *(.plt.*) } \
				// RUN: .text : { *(.text) } \
				// RUN: }" > %t.script

				// RUN: ld.lld --script %t.script --fix-cortex-a8 --shared -verbose %t.o -o %t2
				// RUN: llvm-objdump -d --start-address=0x2020 --stop-address=0x202c --no-show-raw-insn %t2 | FileCheck --check-prefix=CHECK-PLT %s
				// RUN: llvm-objdump -d --start-address=0x2ffa --stop-address=0x3008 --no-show-raw-insn %t2 | FileCheck %s

				/// If we patch a branch instruction that is indirected via the PLT then we
				/// must make sure the patch goes via the PLT

				// CHECK-PLT: 2020: add r12, pc, #0, #12
				// CHECK-PLT-NEXT: 2024: add r12, r12, #4096
				// CHECK-PLT-NEXT: 2028: ldr pc, [r12, #68]!

				.syntax unified
				.thumb

				.global external
				.type external, %function

				.text
				.balign 2048

				.space 2042
				.global source
				.thumb_func
				source:
				nop.w
				bl external

				// CHECK: 00002ffa source:
				// CHECK-NEXT: 2ffa: nop.w
				// CHECK-NEXT: 2ffe: blx #4
				// CHECK: 00003004 __CortexA8657417_2FFE:
				// CHECK-NEXT: 3004: b #-4076

test/ELF/arm-fix-cortex-a8-recognize.s

This file was added.

				// REQUIRES: arm
				// RUN: llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf --arm-add-build-attributes %s -o %t.o
				// RUN: ld.lld --fix-cortex-a8 -verbose %t.o -o %t2 2>&1 | FileCheck %s
				// RUN: llvm-objdump -d %t2 --start-address=0x1a004 --stop-address=0x1a024 --no-show-raw-insn | FileCheck --check-prefix=CHECK-PATCHES %s
				// RUN: llvm-objdump -d %t2 --start-address=0x12ffa --stop-address=0x13002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE1 %s
				MaskRay (inline comment, Done): `-start-address` -> `--start-address`
				// RUN: llvm-objdump -d %t2 --start-address=0x13ffa --stop-address=0x14002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE2 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x14ffa --stop-address=0x15002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE3 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x15ff4 --stop-address=0x16002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE4 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x16ffa --stop-address=0x17002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE5 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x17ffa --stop-address=0x18002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE6 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x18ffa --stop-address=0x19002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE7 %s
				// RUN: llvm-objdump -d %t2 --start-address=0x19ff4 --stop-address=0x1a002 --no-show-raw-insn | FileCheck --check-prefix=CALLSITE8 %s

				// CHECK: ld.lld: detected cortex-a8-657419 erratum sequence starting at 12FFE in unpatched output.
				// CHECK-NEXT: ld.lld: detected cortex-a8-657419 erratum sequence starting at 13FFE in unpatched output.
				MaskRay (inline comment, Done): Align `ld.lld`
				// CHECK-NEXT: ld.lld: detected cortex-a8-657419 erratum sequence starting at 14FFE in unpatched output.
				// CHECK-NEXT: ld.lld: detected cortex-a8-657419 erratum sequence starting at 15FFE in unpatched output.
				// CHECK-NEXT: ld.lld: detected cortex-a8-657419 erratum sequence starting at 16FFE in unpatched output.
				// CHECK-NEXT: ld.lld: detected cortex-a8-657419 erratum sequence starting at 17FFE in unpatched output.
				// CHECK-NEXT: ld.lld: detected cortex-a8-657419 erratum sequence starting at 18FFE in unpatched output.

				/// Basic tests for the --fix-cortex-a8 erratum fix. The full details of the
				/// erratum and the patch are in ARMErrataFix.cpp. The test creates an
				/// instance of the erratum every 4KiB (32-bit non-branch, followed by 32-bit
				/// branch instruction, where the branch instruction spans two 4KiB regions,
				/// and the branch destination is in the first 4KiB region).
				///
				/// Test each 32-bit branch b.w, bcc.w, bl, blx. For b.w, bcc.w, and bl we
				/// check the relocated and non-relocated forms. The blx instruction
				/// always has a relocation in assembler.
				.syntax unified
				.thumb
				.text
				.global _start
				.type _start, %function
				.balign 4096
				.thumb_func
				_start:
				nop.w
				.space 4086
				.thumb_func
				.global target
				.type target, %function
				target:
				/// 32-bit Branch spans 2 4KiB regions, preceded by a 32-bit non branch
				/// instruction, expect a patch.
				nop.w
				b.w target

				// CALLSITE1: 00012ffa target:
				// CALLSITE1-NEXT: 12ffa: nop.w
				// CALLSITE1-NEXT: 12ffe: b.w #28674

				.space 4088
				.type target2, %function
				.local target2
				target2:
				/// 32-bit Branch and link spans 2 4KiB regions, preceded by a 32-bit
				/// non branch instruction, expect a patch.
				nop.w
				bl target2

				// CALLSITE2: 00013ffa target2:
				// CALLSITE2-NEXT: 13ffa: nop.w
				// CALLSITE2-NEXT: 13ffe: bl #24582

				.space 4088
				.type target3, %function
				.local target3
				target3:
				/// 32-bit conditional branch spans 2 4KiB regions, preceded by a 32-bit
				/// non branch instruction, expect a patch.
				nop.w
				beq.w target3

				// CALLSITE3: 00014ffa target3:
				// CALLSITE3-NEXT: 14ffa: nop.w
				// CALLSITE3-NEXT: 14ffe: beq.w #20490

				.space 4082
				.type target4, %function
				.local target4
				.arm
				target4:
				bx lr
				.space 2
				.thumb
				/// 32-bit Branch link and exchange spans 2 4KiB regions, preceded by a
				/// 32-bit non branch instruction, blx always goes via relocation. Expect
				/// a patch.
				nop.w
				blx target4

				/// Target = 0x1a010 __CortexA8657417_15FFE
				// CALLSITE4: 00015ff4 target4:
				// CALLSITE4-NEXT: 15ff4: bx lr
				// CALLSITE4: 15ff8: 00 00 .short 0x0000
				// CALLSITE4: 15ffa: nop.w
				// CALLSITE4-NEXT: 15ffe: blx #16400

				/// Separate sections for source and destination of branches to force
				/// a relocation.
				.section .text.0, "ax", %progbits
				.balign 2
				.global target5
				.type target5, %function
				target5:
				nop.w
				.section .text.1, "ax", %progbits
				.space 4084
				/// 32-bit branch spans 2 4KiB regions, preceded by a 32-bit non branch
				/// instruction, expect a patch. Branch to global symbol so goes via a
				/// relocation.
				nop.w
				b.w target5

				/// Target = 0x1a014 __CortexA8657417_16FFE
				// CALLSITE5: 16ffa: nop.w
				// CALLSITE5-NEXT: 16ffe: b.w #12306

				.section .text.2, "ax", %progbits
				.balign 2
				.global target6
				.type target6, %function
				target6:
				nop.w
				.section .text.3, "ax", %progbits
				.space 4084
				/// 32-bit branch and link spans 2 4KiB regions, preceded by a 32-bit
				/// non branch instruction, expect a patch. Branch to global symbol so
				/// goes via a relocation.
				nop.w
				bl target6

				/// Target = 0x1a018 __CortexA8657417_17FFE
				// CALLSITE6: 17ffa: nop.w
				// CALLSITE6-NEXT: 17ffe: bl #8214

				.section .text.4, "ax", %progbits
				.global target7
				.type target7, %function
				target7:
				nop.w
				.section .text.5, "ax", %progbits
				.space 4084
				/// 32-bit conditional branch spans 2 4KiB regions, preceded by a 32-bit
				/// non branch instruction, expect a patch. Branch to global symbol so
				/// goes via a relocation.
				nop.w
				bne.w target7

				// CALLSITE7: 18ffa: nop.w
				// CALLSITE7-NEXT: 18ffe: bne.w #4122

				.section .text.6, "ax", %progbits
				.space 4082
				.arm
				.global target8
				.type target8, %function
				target8:
				bx lr

				.section .text.7, "ax", %progbits
				.space 2
				.thumb
				/// 32-bit Branch link spans 2 4KiB regions, preceded by a 32-bit non branch
				/// instruction, expect a patch. The target of the BL is in ARM state so we
				/// expect it to be turned into a BLX. The patch must be in ARM state to
				/// avoid a state change thunk.
				nop.w
				bl target8

				// CALLSITE8: 00019ff4 target8:
				// CALLSITE8-NEXT: 19ff4: bx lr
				// CALLSITE8: 19ff8: 00 00 .short 0x0000
				// CALLSITE8: 19ffa: nop.w
				// CALLSITE8-NEXT: 19ffe: blx #32

				// CHECK-PATCHES: 0001a004 __CortexA8657417_12FFE:
				// CHECK-PATCHES-NEXT: 1a004: b.w #-28686

				// CHECK-PATCHES: 0001a008 __CortexA8657417_13FFE:
				// CHECK-PATCHES-NEXT: 1a008: b.w #-24594

				// CHECK-PATCHES: 0001a00c __CortexA8657417_14FFE:
				// CHECK-PATCHES-NEXT: 1a00c: b.w #-20502

				// CHECK-PATCHES: 0001a010 __CortexA8657417_15FFE:
				// CHECK-PATCHES-NEXT: 1a010: b #-16420

				// CHECK-PATCHES: 0001a014 __CortexA8657417_16FFE:
				// CHECK-PATCHES-NEXT: 1a014: b.w #-16406

				// CHECK-PATCHES: 0001a018 __CortexA8657417_17FFE:
				// CHECK-PATCHES-NEXT: 1a018: b.w #-12314

				// CHECK-PATCHES: 0001a01c __CortexA8657417_18FFE:
				// CHECK-PATCHES-NEXT: 1a01c: b.w #-8222

				// CHECK-PATCHES: 0001a020 __CortexA8657417_19FFE:
				// CHECK-PATCHES-NEXT: 1a020: b #-52
				MaskRay (inline comment, Done): Delete the trailing empty line.
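The CALLSITE and CHECK-PATCHES offsets above can be cross-checked by hand. The sketch below assumes, as the numbers in this test suggest, that the objdump output shows the raw encoded offset, so the implicit PC bias has to be added back: 4 bytes in Thumb state, 8 bytes in ARM state, and a word-aligned base for BLX issued from Thumb. The helper names are hypothetical.

```cpp
#include <cstdint>

// Destination of a Thumb B.w, Bcc.w or BL: PC reads as the branch address + 4.
constexpr uint32_t thumbBranchTarget(uint32_t insnAddr, int32_t offset) {
  return insnAddr + 4 + static_cast<uint32_t>(offset);
}

// Destination of a Thumb BLX: the base is (address + 4) rounded down to 4.
constexpr uint32_t thumbBlxTarget(uint32_t insnAddr, int32_t offset) {
  return ((insnAddr + 4) & ~3u) + static_cast<uint32_t>(offset);
}

// Destination of an ARM B: PC reads as the branch address + 8.
constexpr uint32_t armBranchTarget(uint32_t insnAddr, int32_t offset) {
  return insnAddr + 8 + static_cast<uint32_t>(offset);
}

// CALLSITE1: the b.w is redirected to its patch, which branches back to target.
static_assert(thumbBranchTarget(0x12ffe, 28674) == 0x1a004, "callsite 1");
static_assert(thumbBranchTarget(0x1a004, -28686) == 0x12ffa, "patch 12FFE");

// CALLSITE4: blx to an ARM-state patch, which branches to target4.
static_assert(thumbBlxTarget(0x15ffe, 16400) == 0x1a010, "callsite 4");
static_assert(armBranchTarget(0x1a010, -16420) == 0x15ff4, "patch 15FFE");

// CALLSITE8: the bl was turned into a blx to an ARM-state patch.
static_assert(thumbBlxTarget(0x19ffe, 32) == 0x1a020, "callsite 8");
static_assert(armBranchTarget(0x1a020, -52) == 0x19ff4, "patch 19FFE");
```

Each call site lands on the patch symbol named after it, and each patch lands on the original destination, which is the forwarding behaviour the summary describes.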

test/ELF/arm-fix-cortex-a8-thunk.s

This file was added.

				// REQUIRES: arm
				// RUN: llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf --arm-add-build-attributes %s -o %t.o
				// RUN: echo "SECTIONS { \
				// RUN: .text0 0x011006 : { *(.text.00) } \
				// RUN: .text1 0x110000 : { *(.text.01) *(.text.02) *(.text.03) \
				// RUN: *(.text.04) } \
				// RUN: .text2 0x210000 : { *(.text.05) } } " > %t.script
				// RUN: ld.lld --script %t.script --fix-cortex-a8 --shared -verbose %t.o -o %t2 2>&1
				// RUN: llvm-objdump -d --no-show-raw-insn --start-address=0x110000 --stop-address=0x110010 %t2 | FileCheck --check-prefix=THUNK %s
				// RUN: llvm-objdump -d --no-show-raw-insn --start-address=0x110ffa --stop-address=0x111008 %t2 | FileCheck --check-prefix=PATCH %s
				// RUN: llvm-objdump -d --no-show-raw-insn --start-address=0x111008 --stop-address=0x111010 %t2 | FileCheck --check-prefix=THUNK2 %s

				/// Test cases for Cortex-a8 Erratum 657417 that involve interactions with
				/// range extension thunks. Both erratum fixes and range extension thunks need
				/// precise information and after creation alter address information.
				.thumb

				.section .text.00, "ax", %progbits
				.thumb_func
				early:
				bx lr

				.section .text.01, "ax", %progbits
				.balign 4096
				.globl _start
				.type _start, %function
				_start:
				beq.w far_away
				/// Thunk to far_away and state change needed, size 12-bytes goes here.
				// THUNK: 00110000 _start:
				// THUNK-NEXT: 110000: beq.w #0 <__ThumbV7PILongThunk_far_away+0x4>
				// THUNK: 00110004 __ThumbV7PILongThunk_far_away:
				// THUNK-NEXT: 110004: movw r12, #65524
				MaskRay (inline comment, Not Done): You can just use spaces, instead of interleaving spaces and tabs before `beq.w`. Regarding `<__ThumbV7PILongThunk_far_away+0x4>`: unrelated to this patch, I guess `+0x4` is incorrect. If so, we need an llvm-objdump fix as I mentioned in D66539.
				peter.smith (author, Done): Apologies, forgot to untabify the file before posting. Neither GNU nor llvm objdump does a particularly good job with Arm and Thumb branches due to the implicit PC offset.
				// THUNK-NEXT: 110008: movt r12, #15
				// THUNK-NEXT: 11000c: add r12, pc
				// THUNK-NEXT: 11000e: bx r12

				.section .text.02, "ax", %progbits
				.space 4096 - 22

				.section .text.03, "ax", %progbits
				.thumb_func
				target:
				/// After thunk is added this branch will line up across 2 4 KiB regions
				/// and will trigger a patch.
				nop.w
				bl target

				/// Expect erratum patch inserted here
				// PATCH: 00110ffa target:
				// PATCH-NEXT: 110ffa: nop.w
				// PATCH-NEXT: 110ffe: bl #2
				// PATCH: 00111004 __CortexA8657417_110FFE:
				// PATCH-NEXT: 111004: b.w #-14

				// THUNK2: 00111008 __ThumbV7PILongThunk_early:
				// THUNK2-NEXT: 111008: b.w #-1048582
				.section .text.04, "ax", %progbits
				/// The erratum patch will push this branch out of range, so another
				/// range extension thunk will be needed.
				beq.w early
				// THUNK2-NEXT: 11100c: beq.w #-8
				/// Expect range extension thunk here.
				.section .text.05, "ax", %progbits
				.arm
				nop
				.type far_away, %function
				far_away:
				bx lr

test/ELF/arm-fix-cortex-a8-toolarge.s

This file was added.

				// REQUIRES: arm
				// RUN: llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf --arm-add-build-attributes %s -o %t.o
				// RUN: ld.lld --fix-cortex-a8 -verbose %t.o -o /dev/null 2>&1 | FileCheck %s
				/// Test that we warn, but don't attempt to patch when it is impossible to
				MaskRay (inline comment, Done): `%t2` -> `/dev/null` because it is not used. PS: I usually use `%t`/`%t.so` as the executable/DSO name for the object file `%t.o`. The suffix (usually empty, `1` or `2`) indicates their relations.
				/// redirect the branch as the Section is too large.

				// CHECK: skipping cortex-a8 657417 erratum sequence, section .text is too large to patch
				// CHECK: skipping cortex-a8 657417 erratum sequence, section .text.02 is too large to patch

				.syntax unified
				.thumb
				/// Case 1: 1 MiB conditional branch range without relocation.
				.text
				.global _start
				.type _start, %function
				.balign 4096
				.thumb_func
				_start:
				nop.w
				.space 4086
				.thumb_func
				.global target
				.type target, %function
				target:
				/// 32-bit Branch spans 2 4KiB regions, preceded by a 32-bit non branch
				/// instruction, a patch will be attempted. Unfortunately the branch
				/// cannot reach outside the section so we have to abort the patch.
				MaskRay (inline comment, Not Done): `expect` -> `expects`?
				peter.smith (author, Done): I've rewritten to "a patch will be attempted". It is difficult to say in the original; I'd say "We expect" or "The test expects".
				nop.w
				beq.w target
				.space 1024 * 1024

				/// Case 2: 16 MiB
				.section .text.01, "ax", %progbits
				.balign 4096
				.space 4090
				.global target2
				.thumb_func
				target2:
				.section .text.02, "ax", %progbits
				/// 32-bit Branch and link spans 2 4KiB regions, preceded by a 32-bit
				/// non branch instruction, a patch will be attempted. Unfortunately the
				/// BL cannot reach outside the section so we have to abort the patch.
				MaskRay (inline comment, Not Done): `expect` -> `expects`?
				peter.smith (author, Done): Rewritten to "a patch will be attempted"
				nop.w
				bl target2
				.space 16 * 1024 * 1024
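The two padding sizes in this test line up with the reach of the branches involved: a Thumb-2 conditional branch (Bcc.w) covers roughly ±1 MiB and B.w/BL roughly ±16 MiB. The sketch below checks both cases using section-relative addresses. It assumes the patch would be placed immediately after the end of the section containing the branch, which is an assumption made for illustration and not something this page states; the names are hypothetical.

```cpp
#include <cstdint>

enum class ThumbBranchKind { Conditional, Unconditional };

// Offset encoded in a Thumb branch: destination minus (branch address + 4).
constexpr int64_t thumbBranchOffset(uint64_t branchAddr, uint64_t destAddr) {
  return static_cast<int64_t>(destAddr) - static_cast<int64_t>(branchAddr + 4);
}

// Bcc.w encodes a 21-bit signed even offset, B.w and BL a 25-bit one.
constexpr bool offsetInRange(ThumbBranchKind kind, int64_t offset) {
  const int64_t limit = kind == ThumbBranchKind::Conditional
                            ? (int64_t{1} << 20)
                            : (int64_t{1} << 24);
  return offset >= -limit && offset <= limit - 2;
}

// Case 1: beq.w at section offset 0xffe, followed by 1 MiB of padding.
constexpr uint64_t case1Branch = 0xffe;
constexpr uint64_t case1SectionEnd = 0x1002 + 1024 * 1024;
static_assert(!offsetInRange(ThumbBranchKind::Conditional,
                             thumbBranchOffset(case1Branch, case1SectionEnd)),
              "beq.w cannot reach past the padding");

// Case 2: bl at offset 4 of .text.02, followed by 16 MiB of padding.
constexpr uint64_t case2Branch = 0x4;
constexpr uint64_t case2SectionEnd = 0x8 + 16 * 1024 * 1024;
static_assert(!offsetInRange(ThumbBranchKind::Unconditional,
                             thumbBranchOffset(case2Branch, case2SectionEnd)),
              "bl cannot reach past the padding");
```

Under that placement assumption both branches miss their range by a single encodable step, which is consistent with the "too large to patch" warnings the test expects.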