This is an archive of the discontinued LLVM Phabricator instance.


Differential D71281

[LLD][ELF][AArch64][ARM] When errata patching, round thunk size to page boundary
ClosedPublic

Authored by peter.smith on Dec 10 2019, 8:54 AM.


Details

Reviewers

ruiu
grimar
MaskRay
• espindola

Commits

rG86d24193a9eb: [LLD][ELF][AArch64][ARM] When errata patching, round thunk size to 4KiB.

Summary

In some edge cases, such as Chromium compiled with full instrumentation, the .text section is over twice the size of the maximum branch range, and the generated instrumentation code contains many instances of the erratum sequence. The combination of Thunks and many erratum sequences causes finalizeAddressDependentContent() to not converge. We have:

start:
  Thunk Creation (disturbs addresses after thunks, creating more patches)
  Patch Creation (disturbs addresses after patches, creating more thunks)
  goto start

In most images with few thunks and patches the mutual disturbance does not cause convergence problems. As the .text size and number of patches go up the risk increases.

A way to prevent thunk creation from interfering with patch creation is to round up the size of the thunks to a 4 KiB boundary when the erratum patch is enabled. The erratum sequence only triggers when an instruction sequence starts at 0xff8 or 0xffc modulo 4 KiB, so by making the thunks not affect addresses modulo 4 KiB we prevent thunks from interfering.

The patch sections themselves could be aggregated in the same way that Thunks are within ThunkSections, and have their size rounded up in the same way. This would reduce the number of patches created in a .text section larger than 128 MiB, but would not help with convergence problems.

fixes (part of) pr44071 Thunk convergence within limit.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

peter.smith created this revision. Dec 10 2019, 8:54 AM

Herald added a reviewer: • espindola. Dec 10 2019, 8:54 AM

Herald added a project: Restricted Project.

Herald added subscribers: kristof.beyls, arichardson, emaste.

Thanks for the fix. I'll find some time to understand the logic.

lld/test/ELF/aarch64-cortex-a53-843419-thunk.s
6–7	Nit: indent the continuation line.
lld/test/ELF/arm-fix-cortex-a8-thunk.s
9	With a linker script, the gaps between adjacent sections will not be dumped by `llvm-objdump -d`. llvm-objdump -d may be sufficient to dump 3 pieces of disassembly. Without a linker script, assembly like .space can create a huge number of zeros, which will slow down llvm-objdump -d significantly.

Thanks, I'll upload a new diff with the test changes.

To try to do a better job of explaining the logic:
The -fix-cortex-a53-843419 and -fix-cortex-a8 options look for an instruction sequence that starts at a particular offset modulo 0x1000. For -fix-cortex-a53-843419 the instruction sequence needs to start at 0xff8 or 0xffc (modulo 0x1000); for -fix-cortex-a8 it needs to start at 0xffe (modulo 0x1000). When we add a thunk or a patch we displace the sections after it by the size of the content inserted. This can mean that some instruction sequences that were lined up over 0xff8, 0xffc or 0xffe are no longer lined up, and potentially that new sequences become lined up. Normally the patches get placed at the end of the .text section, so this isn't a problem. However, for large applications we need to insert the patches in between other .text sections, as well as having more range extension thunks. If we ensure that the thunk section size is rounded up to a multiple of 0x1000, then the displacement of sections following the thunks does not change their addresses (modulo 0x1000), which means we don't generate more instruction sequences with the necessary offsets in the sections following the thunks.

I have a suspicion that the instrumented build, with lots of adrp, ldr sequences (possibly to access counters), triggers more erratum sequences than usual. I'd need to run some experiments to confirm.

Updated diff with test changes.

LGTM

Thank you for fixing the issue. This is an interesting edge case that I hadn't thought about before.

lld/ELF/SyntheticSections.cpp
3364	nit: this is perhaps my personal preference, but I think I'd prefer `if (config->fixCortexA53Errata843419 || config->fixCortexA8) return alignTo(size, 4096); return size;`

This revision is now accepted and ready to land. Dec 10 2019, 9:15 PM

In D71281#1778886, @ruiu wrote:

LGTM

Thank you for fixing the issue. This is an interesting edge case that I hadn't thought about before.

Thanks for the review, I've applied the style change you recommended.

Closed by commit rG86d24193a9eb: [LLD][ELF][AArch64][ARM] When errata patching, round thunk size to 4KiB. (authored by peter.smith). Dec 11 2019, 6:12 AM

This revision was automatically updated to reflect the committed changes.

peter.smith mentioned this in D72344: [LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS. Jan 7 2020, 9:20 AM

peter.smith mentioned this in rG01ad4c838466: [LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS. Jan 17 2020, 2:52 AM

hans mentioned this in rG852b37f83b2d: [LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS. Feb 4 2020, 2:11 AM

Revision Contents

Path	Size
lld/ELF/SyntheticSections.h	2 lines
lld/ELF/SyntheticSections.cpp	11 lines
lld/test/ELF/aarch64-cortex-a53-843419-thunk.s	44 lines
lld/test/ELF/arm-fix-cortex-a8-thunk.s	42 lines

Diff 233345

lld/ELF/SyntheticSections.h

public:
// ThunkSection in OS, with desired outSecOff of Off		// ThunkSection in OS, with desired outSecOff of Off
ThunkSection(OutputSection *os, uint64_t off);		ThunkSection(OutputSection *os, uint64_t off);

// Add a newly created Thunk to this container:		// Add a newly created Thunk to this container:
// Thunk is given offset from start of this InputSection		// Thunk is given offset from start of this InputSection
// Thunk defines a symbol in this InputSection that can be used as target		// Thunk defines a symbol in this InputSection that can be used as target
// of a relocation		// of a relocation
void addThunk(Thunk *t);		void addThunk(Thunk *t);
size_t getSize() const override { return size; }		size_t getSize() const override;
void writeTo(uint8_t *buf) override;		void writeTo(uint8_t *buf) override;
InputSection *getTargetInputSection() const;		InputSection *getTargetInputSection() const;
bool assignOffsets();		bool assignOffsets();

private:		private:
std::vector<Thunk *> thunks;		std::vector<Thunk *> thunks;
size_t size = 0;		size_t size = 0;
};		};

lld/ELF/SyntheticSections.cpp


	ThunkSection::ThunkSection(OutputSection *os, uint64_t off)			ThunkSection::ThunkSection(OutputSection *os, uint64_t off)
	: SyntheticSection(SHF_ALLOC | SHF_EXECINSTR, SHT_PROGBITS,			: SyntheticSection(SHF_ALLOC | SHF_EXECINSTR, SHT_PROGBITS,
	config->wordsize, ".text.thunk") {			config->wordsize, ".text.thunk") {
	this->parent = os;			this->parent = os;
	this->outSecOff = off;			this->outSecOff = off;
	}			}

				// When the errata patching is on, we round the size up to a 4 KiB
				// boundary. This limits the effect that adding Thunks has on the addresses
				// of the program modulo 4 KiB. As the errata patching is sensitive to address
				// modulo 4 KiB this can prevent further patches from being needed due to
				// Thunk insertion.
				size_t ThunkSection::getSize() const {
				if (config->fixCortexA53Errata843419 || config->fixCortexA8)
				ruiu (inline comment): nit: this is perhaps my personal preference, but I think I'd prefer if (config->fixCortexA53Errata843419 || config->fixCortexA8) return alignTo(size, 4096); return size;
				return alignTo(size, 4096);
				return size;
				}

	void ThunkSection::addThunk(Thunk *t) {			void ThunkSection::addThunk(Thunk *t) {
	thunks.push_back(t);			thunks.push_back(t);
	t->addSymbols(*this);			t->addSymbols(*this);
	}			}

	void ThunkSection::writeTo(uint8_t *buf) {			void ThunkSection::writeTo(uint8_t *buf) {
	for (Thunk *t : thunks)			for (Thunk *t : thunks)
	t->writeTo(buf + t->offset);			t->writeTo(buf + t->offset);

lld/test/ELF/aarch64-cortex-a53-843419-thunk.s

	// REQUIRES: aarch64			// REQUIRES: aarch64
	// RUN: llvm-mc -filetype=obj -triple=aarch64-none-linux %s -o %t.o			// RUN: llvm-mc -filetype=obj -triple=aarch64-none-linux %s -o %t.o
	// RUN: echo "SECTIONS { \			// RUN: echo "SECTIONS { \
	// RUN: .text1 0x10000 : { *(.text.01) *(.text.02) *(.text.03) } \			// RUN: .text1 0x10000 : { *(.text.01) *(.text.02) *(.text.03) } \
	// RUN: .text2 0x8010000 : { *(.text.04) } } " > %t.script			// RUN: .text2 0x8010000 : { *(.text.04) } } " > %t.script
	// RUN: ld.lld --script %t.script -fix-cortex-a53-843419 -verbose %t.o -o %t2 2>&1 \			// RUN: ld.lld --script %t.script -fix-cortex-a53-843419 -verbose %t.o -o %t2 \
	// RUN: | FileCheck -check-prefix=CHECK-PRINT %s			// RUN: 2>&1 | FileCheck -check-prefix=CHECK-PRINT %s
				MaskRay (inline comment, done): Nit: indent the continuation line.
	// RUN: llvm-objdump -triple=aarch64-linux-gnu -d %t2 | FileCheck %s

	// %t2 is 128 Megabytes, so delete it early.			// RUN: llvm-objdump --no-show-raw-insn -triple=aarch64-linux-gnu -d %t2 | FileCheck %s

				/// %t2 is 128 Megabytes, so delete it early.
	// RUN: rm %t2			// RUN: rm %t2

	// Test cases for Cortex-A53 Erratum 843419 that involve interactions with			/// Test cases for Cortex-A53 Erratum 843419 that involve interactions with
	// range extension thunks. Both erratum fixes and range extension thunks need			/// range extension thunks. Both erratum fixes and range extension thunks need
	// precise address information and after creation alter address information.			/// precise address information and after creation alter address information.


	.section .text.01, "ax", %progbits			.section .text.01, "ax", %progbits
	.balign 4096			.balign 4096
	.globl _start			.globl _start
	.type _start, %function			.type _start, %function
	_start:			_start:
	bl far_away			bl far_away
	// Thunk to far_away, size 16-bytes goes here.			/// Thunk to far_away, size 16-bytes goes here.
				/// Thunk Section with patch enabled has its size rounded up to 4KiB
				/// this leaves the address of following sections the same modulo 4 KiB

	.section .text.02, "ax", %progbits			.section .text.02, "ax", %progbits
	.space 4096 - 28			.space 4096 - 12

	// Erratum sequence will only line up at address 0 modulo 0xffc when			/// Erratum sequence will only line up at address 0 modulo 0xffc when
	// Thunk is inserted.			/// Thunk is inserted.
	.section .text.03, "ax", %progbits			.section .text.03, "ax", %progbits
	.globl t3_ff8_ldr			.globl t3_ff8_ldr
	.type t3_ff8_ldr, %function			.type t3_ff8_ldr, %function
	t3_ff8_ldr:			t3_ff8_ldr:
	adrp x0, dat			adrp x0, dat
	ldr x1, [x1, #0]			ldr x1, [x1, #0]
	ldr x0, [x0, :got_lo12:dat]			ldr x0, [x0, :got_lo12:dat]
	ret			ret

	// CHECK-PRINT: detected cortex-a53-843419 erratum sequence starting at 10FFC in unpatched output.			// CHECK-PRINT: detected cortex-a53-843419 erratum sequence starting at 11FFC in unpatched output.
	// CHECK: t3_ff8_ldr:			// CHECK: 0000000000011ffc t3_ff8_ldr:
	// CHECK-NEXT: 10ffc: 00 00 04 90 adrp x0, #134217728			// CHECK-NEXT: adrp x0, #134213632
	// CHECK-NEXT: 11000: 21 00 40 f9 ldr x1, [x1]			// CHECK-NEXT: ldr x1, [x1]
	// CHECK-NEXT: 11004: 02 00 00 14 b #8			// CHECK-NEXT: b #8
	// CHECK-NEXT: 11008: c0 03 5f d6 ret			// CHECK-NEXT: ret
	// CHECK: __CortexA53843419_11004:			// CHECK: 000000000001200c __CortexA53843419_12004:
	// CHECK-NEXT: 1100c: 00 04 40 f9 ldr x0, [x0, #8]			// CHECK-NEXT: ldr x0, [x0, #8]
	// CHECK-NEXT: 11010: fe ff ff 17 b #-8			// CHECK-NEXT: b #-8

	.section .text.04, "ax", %progbits			.section .text.04, "ax", %progbits
	.globl far_away			.globl far_away
	.type far_away, function			.type far_away, function
	far_away:			far_away:
	ret			ret

	.section .data			.section .data
	.globl dat			.globl dat
	dat: .quad 0			dat: .quad 0

lld/test/ELF/arm-fix-cortex-a8-thunk.s

	// REQUIRES: arm			// REQUIRES: arm
	// RUN: llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf --arm-add-build-attributes %s -o %t.o			// RUN: llvm-mc -filetype=obj -triple=armv7a-linux-gnueabihf --arm-add-build-attributes %s -o %t.o
	// RUN: echo "SECTIONS { \			// RUN: echo "SECTIONS { \
	// RUN: .text0 0x011006 : { *(.text.00) } \			// RUN: .text0 0x01200a : { *(.text.00) } \
	// RUN: .text1 0x110000 : { *(.text.01) *(.text.02) *(.text.03) \			// RUN: .text1 0x110000 : { *(.text.01) *(.text.02) *(.text.03) \
	// RUN: *(.text.04) } \			// RUN: *(.text.04) } \
	// RUN: .text2 0x210000 : { *(.text.05) } } " > %t.script			// RUN: .text2 0x210000 : { *(.text.05) } } " > %t.script
	// RUN: ld.lld --script %t.script --fix-cortex-a8 --shared -verbose %t.o -o %t2 2>&1			// RUN: ld.lld --script %t.script --fix-cortex-a8 --shared -verbose %t.o -o %t2 2>&1
	// RUN: llvm-objdump -d --no-show-raw-insn --start-address=0x110000 --stop-address=0x110010 %t2 | FileCheck --check-prefix=THUNK %s			// RUN: llvm-objdump -d --no-show-raw-insn %t2 | FileCheck %s
				MaskRay (inline comment, done): With a linker script, the gaps between adjacent sections will not be dumped by `llvm-objdump -d`. llvm-objdump -d may be sufficient to dump 3 pieces of disassembly. Without a linker script, assembly like .space can create a huge number of zeros, which will slow down llvm-objdump -d significantly.
	// RUN: llvm-objdump -d --no-show-raw-insn --start-address=0x110ffa --stop-address=0x111008 %t2 | FileCheck --check-prefix=PATCH %s
	// RUN: llvm-objdump -d --no-show-raw-insn --start-address=0x111008 --stop-address=0x111010 %t2 | FileCheck --check-prefix=THUNK2 %s

	/// Test cases for Cortex-a8 Erratum 657417 that involve interactions with			/// Test cases for Cortex-a8 Erratum 657417 that involve interactions with
	/// range extension thunks. Both erratum fixes and range extension thunks need			/// range extension thunks. Both erratum fixes and range extension thunks need
	/// precise information and after creation alter address information.			/// precise information and after creation alter address information.
	.thumb			.thumb

	.section .text.00, "ax", %progbits			.section .text.00, "ax", %progbits
	.thumb_func			.thumb_func
	early:			early:
	bx lr			bx lr

	.section .text.01, "ax", %progbits			.section .text.01, "ax", %progbits
	.balign 4096			.balign 4096
	.globl _start			.globl _start
	.type _start, %function			.type _start, %function
	_start:			_start:
	beq.w far_away			beq.w far_away
	/// Thunk to far_away and state change needed, size 12-bytes goes here.			/// Thunk to far_away and state change needed, size 12-bytes goes here.
	// THUNK: 00110000 _start:			// CHECK: 00110004 __ThumbV7PILongThunk_far_away:
	// THUNK-NEXT: 110000: beq.w #0 <__ThumbV7PILongThunk_far_away+0x4>			// CHECK-NEXT: 110004: movw r12, #65524
	// THUNK: 00110004 __ThumbV7PILongThunk_far_away:			// CHECK-NEXT: movt r12, #15
	// THUNK-NEXT: 110004: movw r12, #65524			// CHECK-NEXT: add r12, pc
	// THUNK-NEXT: 110008: movt r12, #15			// CHECK-NEXT: bx r12
	// THUNK-NEXT: 11000c: add r12, pc
	// THUNK-NEXT: 11000e: bx r12

	.section .text.02, "ax", %progbits			.section .text.02, "ax", %progbits
	.space 4096 - 22			.space 4096 - 10

	.section .text.03, "ax", %progbits			.section .text.03, "ax", %progbits
	.thumb_func			.thumb_func
	target:			target:
	/// After thunk is added this branch will line up across 2 4 KiB regions			/// After thunk is added this branch will line up across 2 4 KiB regions
	/// and will trigger a patch.			/// and will trigger a patch.
	nop.w			nop.w
	bl target			bl target

	/// Expect erratum patch inserted here			/// Expect erratum patch inserted here
	// PATCH: 00110ffa target:			// CHECK: 00111ffa target:
	// PATCH-NEXT: 110ffa: nop.w			// CHECK-NEXT: 111ffa: nop.w
	// PATCH-NEXT: 110ffe: bl #2			// CHECK-NEXT: bl #2
	// PATCH: 00111004 __CortexA8657417_110FFE:			// CHECK: 00112004 __CortexA8657417_111FFE:
	// PATCH-NEXT: 111004: b.w #-14			// CHECK-NEXT: 112004: b.w #-14

				/// Expect range extension thunk here.
				// CHECK: 00112008 __ThumbV7PILongThunk_early:
				// CHECK-NEXT: 112008: b.w #-1048578

	// THUNK2: 00111008 __ThumbV7PILongThunk_early:
	// THUNK2-NEXT: 111008: b.w #-1048582
	.section .text.04, "ax", %progbits			.section .text.04, "ax", %progbits
	/// The erratum patch will push this branch out of range, so another			/// The erratum patch will push this branch out of range, so another
	/// range extension thunk will be needed.			/// range extension thunk will be needed.
	beq.w early			beq.w early
	// THUNK2-NEXT 11100c: beq.w #-8			// CHECK: 113008: beq.w #-4100
	/// Expect range extension thunk here.
	.section .text.05, "ax", %progbits			.section .text.05, "ax", %progbits
	.arm			.arm
	nop			nop
	.type far_away, %function			.type far_away, %function
	far_away:			far_away:
	bx lr			bx lr