Details

Reviewers

grimar
peter.smith
ruiu
srhines
• espindola
psmith

Commits

rGb498d99338f8: [ELF] Start a new PT_LOAD if LMA region is different

Summary

GNU ld has a counterintuitive lang_propagate_lma_regions rule.

// .foo's LMA region is propagated to .bar because their VMA region is the same,
// and .bar does not have an explicit output section address (addr_tree).
.foo : { *(.foo) } >RAM AT> FLASH
.bar : { *(.bar) } >RAM

// An explicit output section address disables propagation.
.foo : { *(.foo) } >RAM AT> FLASH
.bar . : { *(.bar) } >RAM

In both cases, lld thinks .foo's LMA region is propagated and
places .bar in the same PT_LOAD, so lld diverges from GNU ld w.r.t. the
second case (lma-align.test).
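
The GNU ld rule described above can be modelled in a few lines. This is only a sketch of the behaviour as the summary states it, not GNU ld's actual lang_propagate_lma_regions code: the OutSec struct, its fields, and propagateLmaRegions are hypothetical names, and the "no AT> of its own" check is an assumption added for the model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an output section description. These fields are
// illustrative only; they are not GNU ld's or lld's data structures.
struct OutSec {
  std::string name;
  std::string vmaRegion;  // region named after '>'
  std::string lmaRegion;  // region named after 'AT>'; empty if absent
  bool hasExplicitAddr;   // true for ".bar . : { ... }"
};

// Propagate the previous section's LMA region when the VMA region is the
// same and the section has neither its own LMA region (assumption) nor an
// explicit output section address.
inline void propagateLmaRegions(std::vector<OutSec> &secs) {
  for (std::size_t i = 1; i < secs.size(); ++i) {
    OutSec &cur = secs[i];
    const OutSec &prev = secs[i - 1];
    if (cur.lmaRegion.empty() && !cur.hasExplicitAddr &&
        cur.vmaRegion == prev.vmaRegion)
      cur.lmaRegion = prev.lmaRegion;
  }
}
```

In this model, the first script leaves .bar with LMA region FLASH, while the explicit `.` address in the second script leaves it empty.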

This patch changes Writer<ELFT>::createPhdrs to disable propagation
(start a new PT_LOAD). A user of the first case can make linker scripts
portable by explicitly specifying AT>. By contrast, there was no
workaround for the old behavior.

This change uncovers another LMA-related bug in assignOffsets(), where
ctx->lmaOffset = 0; was omitted. It caused a spurious "load address
range overlaps" error for at2.test.
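
A minimal model of why the reset matters, using the addresses from at2.test. The Ctx struct, sectionLma, and the resetOffset switch are hypothetical and only mirror the lmaOffset lines of assignOffsets(); AT(addr) expressions and the rest of the function are left out.

```cpp
#include <cassert>
#include <cstdint>

// Round value up to a multiple of align (align must be a power of two).
inline uint64_t alignUp(uint64_t value, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Hypothetical stand-in for the linker script state carried across sections.
struct Ctx {
  uint64_t lmaOffset = 0;
};

// Returns the LMA of a section whose VMA is `dot`. A null lmaRegionPos
// models a section without AT>; resetOffset models `ctx->lmaOffset = 0;`.
inline uint64_t sectionLma(Ctx &ctx, uint64_t dot, const uint64_t *lmaRegionPos,
                           uint64_t align, bool resetOffset) {
  if (resetOffset)
    ctx.lmaOffset = 0;
  if (lmaRegionPos)
    ctx.lmaOffset = alignUp(*lmaRegionPos, align) - dot;
  return dot + ctx.lmaOffset;
}
```

With the reset, .foo2 (VMA 0x2008, no AT>) gets LMA 0x2008, matching the updated CHECK line. Without it, the model keeps .foo1's offset of 0x4000 and places .foo2 at 0x6008.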

The new PT_LOAD rule is complex. For convenience, I listed the origins of some subexpressions:

rL323449: sec->memRegion == load->firstSec->memRegion; linkerscript/at3.test
D43284: load->lastSec == Out::programHeaders (don't start a new PT_LOAD after program headers); linkerscript/at4.test
D58892: sec != relroEnd (start a new PT_LOAD after PT_GNU_RELRO)
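
To make the rule easier to experiment with, here is a standalone sketch of the condition as committed. The structs are simplified stand-ins (not lld's OutputSection or PhdrEntry), and the boolean parameters replace the pointer comparisons against relroEnd and Out::programHeaders.

```cpp
#include <cassert>

struct MemoryRegion {};

// Simplified stand-in for an output section.
struct Section {
  const MemoryRegion *memRegion = nullptr;
  const MemoryRegion *lmaRegion = nullptr;
  bool hasLmaExpr = false; // stands in for sec->lmaExpr, i.e. AT(addr)
};

// Simplified stand-in for the current PT_LOAD.
struct Load {
  const Section *firstSec = nullptr;
  bool lastSecIsProgramHeaders = false; // load->lastSec == Out::programHeaders
};

// Returns true if `sec` must start a new PT_LOAD.
inline bool needsNewLoad(const Load *load, const Section &sec, unsigned flags,
                         unsigned newFlags, bool secIsRelroEnd) {
  bool sameLMARegion = load && !sec.hasLmaExpr &&
                       sec.lmaRegion == load->firstSec->lmaRegion;
  return !(load && newFlags == flags && !secIsRelroEnd &&
           sec.memRegion == load->firstSec->memRegion &&
           (sameLMARegion || load->lastSecIsProgramHeaders));
}
```

For `.foo ... >RAM AT> FLASH` followed by `.bar ... >RAM`, .bar has a null lmaRegion while the first section of the load has FLASH, so the sketch reports that a new PT_LOAD is needed.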

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MaskRay created this revision. · Feb 9 2020, 9:07 AM

Herald added a reviewer: • espindola. · Feb 9 2020, 9:07 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, arichardson, emaste.

MaskRay added a parent revision: D74286: [ELF] Respect output section alignment for AT> (non-null lmaRegion). · Feb 9 2020, 9:08 AM

Harbormaster failed remote builds in B46047: Diff 243450! · Feb 9 2020, 9:23 AM

grimar mentioned this in D74286: [ELF] Respect output section alignment for AT> (non-null lmaRegion). · Feb 10 2020, 2:04 AM

Change looks good to me. Some small nits.

lld/ELF/Writer.cpp
Line 2118: I'm struggling to read that boolean expression. Would it be possible to move some of the bits around so that the && is at the end, or maybe calculate the more complex ones and name them? A reorder (untested):

if (!load || sec->memRegion != load->firstSec->memRegion ||
    flags != newFlags || sec == relroEnd ||
    (load->lastSec != Out::programHeaders &&
     (sec->lmaExpr || !sec->lmaRegion != !lmaRegion ||
      (sec->lmaRegion && sec->lmaRegion != lmaRegion))));

Is there any way to rewrite this in the positive form? We have quite a lot of negatives, for example `!sec->lmaRegion != !lmaRegion`. Writing it as `(sec->lmaRegion == nullptr) != (lmaRegion == nullptr)` made it easier to read.

negative forms -> positive forms

List origins of some subexpressions

Harbormaster failed remote builds in B46113: Diff 243602! · Feb 10 2020, 10:18 AM

Harbormaster failed remote builds in B46115: Diff 243604! · Feb 10 2020, 10:38 AM

Thanks for the update, looks good to me.
Some suggestions for the expression, not sure if they are better or not, so by all means keep the original if you prefer.

Suggestion 1, rename propagateLMA to sameLMARegion, my brain tended to lump the program header case and the same memory region under propagateLMA

bool sameLMARegion =
    load && !sec->lmaExpr && sec->lmaRegion == load->firstSec->lmaRegion;
if (!(load && newFlags == flags && sec != relroEnd &&
      sec->memRegion == load->firstSec->memRegion &&
      (sameLMARegion || load->lastSec == Out::programHeaders)))

Suggestion 2, add || load->lastSec == Out::programHeaders into propagateLMA

bool propagateLMA =
    (load && !sec->lmaExpr && sec->lmaRegion == load->firstSec->lmaRegion) || load->lastSec == Out::programHeaders;
if (!(load && newFlags == flags && sec != relroEnd &&
      sec->memRegion == load->firstSec->memRegion &&
      propagateLMA))

This revision is now accepted and ready to land. · Feb 11 2020, 7:01 AM

nickdesaulniers removed a reviewer: nickdesaulniers. · Feb 11 2020, 9:23 AM

nickdesaulniers added a subscriber: nickdesaulniers.

Adopt peter.smith's Suggestion 1

Thanks for the suggestions.

Adopted Suggestion 1.

For Suggestion 2: load->lastSec == Out::programHeaders is a special case from D43284. It is not the same LMA region, so I will not place the condition into sameLMARegion.
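
As a sanity check on the two suggestions, the sketch below collapses their inputs to booleans and compares them exhaustively. It assumes `load` is non-null; the function names are hypothetical. One caveat worth noting: as written, Suggestion 2 evaluates `load->lastSec` even when `load` is null, because the left operand of `||` is then false, so it would need an extra guard if `load` can be null at that point.

```cpp
#include <cassert>

// Inputs collapsed to booleans, all assuming `load` is non-null:
//   base       = newFlags == flags && sec != relroEnd &&
//                sec->memRegion == load->firstSec->memRegion
//   sameLMA    = !sec->lmaExpr && sec->lmaRegion == load->firstSec->lmaRegion
//   afterPhdrs = load->lastSec == Out::programHeaders

// Suggestion 1: the program-header case stays outside the named variable.
inline bool keepLoad1(bool base, bool sameLMA, bool afterPhdrs) {
  bool sameLMARegion = sameLMA;
  return base && (sameLMARegion || afterPhdrs);
}

// Suggestion 2: the program-header case is folded into propagateLMA.
inline bool keepLoad2(bool base, bool sameLMA, bool afterPhdrs) {
  bool propagateLMA = sameLMA || afterPhdrs;
  return base && propagateLMA;
}

// Compare both forms over all eight input combinations.
inline bool suggestionsAgree() {
  for (int m = 0; m < 8; ++m) {
    bool base = (m & 1) != 0, sameLMA = (m & 2) != 0, afterPhdrs = (m & 4) != 0;
    if (keepLoad1(base, sameLMA, afterPhdrs) !=
        keepLoad2(base, sameLMA, afterPhdrs))
      return false;
  }
  return true;
}
```

Under that assumption the two forms select the same sections, so the choice between them is about naming rather than behaviour.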

Harbormaster failed remote builds in B46271: Diff 243988! · Feb 11 2020, 2:04 PM

Closed by commit rGb498d99338f8: [ELF] Start a new PT_LOAD if LMA region is different (authored by MaskRay). · Feb 12 2020, 8:24 AM

This revision was automatically updated to reflect the committed changes.

MaskRay mentioned this in D74755: [llvm-objcopy] Attribute an empty section to a segment ending at its address. · Feb 26 2020, 10:08 AM

MaskRay mentioned this in D76995: [ELF] Propagate LMA offset to sections with neither AT() nor AT>. · Mar 28 2020, 1:35 PM

MaskRay mentioned this in rGbb4a36ea2802: [ELF] Propagate LMA offset to sections with neither AT() nor AT>. · Apr 1 2020, 8:28 AM

Diff 244191

lld/ELF/LinkerScript.cpp

 void LinkerScript::assignOffsets(OutputSection *sec) {
 [...]
   // region, we need to expand the current region to account for the space
   // between the previous section, if any, and the start of this section.
   if (ctx->memRegion && ctx->memRegion->curPos < dot)
     expandMemoryRegion(ctx->memRegion, dot - ctx->memRegion->curPos,
                        ctx->memRegion->name, sec->name);

   switchTo(sec);

+  ctx->lmaOffset = 0;
+
   if (sec->lmaExpr)
     ctx->lmaOffset = sec->lmaExpr().getValue() - dot;

   if (MemoryRegion *mr = sec->lmaRegion)
     ctx->lmaOffset = alignTo(mr->curPos, sec->alignment) - dot;

   // If neither AT nor AT> is specified for an allocatable section, the linker
   // will set the LMA such that the difference between VMA and LMA for the
   // section is the same as the preceding output section in the same region
   // https://sourceware.org/binutils/docs-2.20/ld/Output-Section-LMA.html
   // This, however, should only be done by the first "non-header" section
 [...]

lld/ELF/Writer.cpp

 [...]
   for (OutputSection *sec : outputSections) {
     // Segments are contiguous memory regions that has the same attributes
     // (e.g. executable or writable). There is one phdr for each segment.
     // Therefore, we need to create a new phdr when the next section has
     // different flags or is loaded at a discontiguous address or memory
     // region using AT or AT> linker script command, respectively. At the same
     // time, we don't want to create a separate load segment for the headers,
     // even if the first output section has an AT or AT> attribute.
     uint64_t newFlags = computeFlags(sec->getPhdrFlags());
-    if (!load ||
-        ((sec->lmaExpr ||
-          (sec->lmaRegion && (sec->lmaRegion != load->firstSec->lmaRegion))) &&
-         load->lastSec != Out::programHeaders) ||
-        sec->memRegion != load->firstSec->memRegion || flags != newFlags ||
-        sec == relroEnd) {
+    bool sameLMARegion =
+        load && !sec->lmaExpr && sec->lmaRegion == load->firstSec->lmaRegion;
+    if (!(load && newFlags == flags && sec != relroEnd &&
+          sec->memRegion == load->firstSec->memRegion &&
+          (sameLMARegion || load->lastSec == Out::programHeaders))) {
       load = addHdr(PT_LOAD, newFlags);
       flags = newFlags;
     }

     load->add(sec);
   }

   // Add a TLS segment if any.
   PhdrEntry *tlsHdr = make<PhdrEntry>(PT_TLS, PF_R);
 [...]

lld/test/ELF/linkerscript/Inputs/at2.s

 .section .foo1, "ax"
 .quad 0

 .section .foo2, "ax"
 .quad 0

 .section .bar1, "aw"
 .quad 0

 .section .bar2, "aw"
 .quad 0

 .section .bar3, "aw"
 .quad 0
+
+.section .bar4, "aw"
+.quad 0

lld/test/ELF/linkerscript/at2.test

 # REQUIRES: x86
 # RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %p/Inputs/at2.s -o %t.o
 # RUN: ld.lld -o %t.exe %t.o --script %s
 # RUN: llvm-readelf -l %t.exe | FileCheck %s
 # RUN: llvm-objdump -section-headers %t.exe | FileCheck %s --check-prefix=SECTIONS

 MEMORY {
   AX (ax) : ORIGIN = 0x2000, LENGTH = 0x100
   AW (aw) : ORIGIN = 0x3000, LENGTH = 0x100
   FLASH (ax) : ORIGIN = 0x6000, LENGTH = 0x100
   RAM (aw) : ORIGIN = 0x7000, LENGTH = 0x100
 }

 SECTIONS {
   .foo1 : { *(.foo1) } > AX AT>FLASH
+  ## In GNU ld, .foo1's LMA region is propagated to .foo2 because their VMA region
+  ## is the same and .foo2 does not set an explicit address.
+  ## lld sets .foo2's LMA region to null.
   .foo2 : { *(.foo2) } > AX
-  .bar1 : { *(.bar1) } > AW AT> RAM
+  .bar1 : { *(.bar1) } > AW
   .bar2 : { *(.bar2) } > AW AT > RAM
-  .bar3 : { *(.bar3) } > AW AT >RAM
+  .bar3 . : { *(.bar3) } > AW
+  .bar4 : { *(.bar4) } > AW AT >RAM
 }

 # CHECK: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
-# CHECK-NEXT: LOAD 0x001000 0x0000000000002000 0x0000000000006000 0x000010 0x000010 R E 0x1000
-# CHECK-NEXT: LOAD 0x002000 0x0000000000003000 0x0000000000007000 0x000018 0x000018 RW 0x1000
+# CHECK-NEXT: LOAD 0x001000 0x0000000000002000 0x0000000000006000 0x000008 0x000008 R E 0x1000
+# CHECK-NEXT: LOAD 0x001008 0x0000000000002008 0x0000000000002008 0x000008 0x000008 R E 0x1000
+# CHECK-NEXT: LOAD 0x002000 0x0000000000003000 0x0000000000003000 0x000008 0x000008 RW 0x1000
+# CHECK-NEXT: LOAD 0x002008 0x0000000000003008 0x0000000000007000 0x000008 0x000008 RW 0x1000
+# CHECK-NEXT: LOAD 0x002010 0x0000000000003010 0x0000000000003010 0x000008 0x000008 RW 0x1000
+# CHECK-NEXT: LOAD 0x002018 0x0000000000003018 0x0000000000007008 0x000008 0x000008 RW 0x1000

 # SECTIONS: Sections:
 # SECTIONS-NEXT: Idx Name Size VMA
 # SECTIONS-NEXT: 0 00000000 0000000000000000
 # SECTIONS-NEXT: 1 .foo1 00000008 0000000000002000
 # SECTIONS-NEXT: 2 .foo2 00000008 0000000000002008
 # SECTIONS-NEXT: 3 .text 00000000 0000000000002010
 # SECTIONS-NEXT: 4 .bar1 00000008 0000000000003000
 # SECTIONS-NEXT: 5 .bar2 00000008 0000000000003008
 # SECTIONS-NEXT: 6 .bar3 00000008 0000000000003010
+# SECTIONS-NEXT: 7 .bar4 00000008 0000000000003018

lld/test/ELF/linkerscript/at8.test

 # REQUIRES: x86
 # RUN: llvm-mc -filetype=obj -triple=x86_64-pc-linux %p/Inputs/at8.s -o %t.o
 # RUN: ld.lld %t.o --script %s -o %t
 # RUN: llvm-readelf -sections -program-headers %t | FileCheck %s

 MEMORY {
   FLASH : ORIGIN = 0x08000000, LENGTH = 0x100
   RAM : ORIGIN = 0x20000000, LENGTH = 0x200
 }

 SECTIONS {
   .text : { *(.text) } > FLASH
   .sec1 : { *(.sec1) } > RAM AT > FLASH
-  .sec2 : { *(.sec2) } > RAM
+  .sec2 : { *(.sec2) } > RAM AT > FLASH
   .sec3 : { *(.sec3) } > RAM AT > FLASH
 }

 # Make sure we do not issue a load-address overlap error
 # Previously, .sec3 would overwrite the LMAOffset in the
 # PT_LOAD header.

 # CHECK: Name Type Address Off
 [...]

lld/test/ELF/linkerscript/lma-align.test

 # REQUIRES: x86
 # RUN: echo 'ret; .data.rel.ro; .balign 16; .byte 0; .data; .byte 0; .bss; .byte 0' | \
 # RUN: llvm-mc -filetype=obj -triple=x86_64 - -o %t.o
 # RUN: ld.lld -T %s %t.o -o %t
 # RUN: llvm-readelf -S -l %t | FileCheck %s

 # CHECK: Name Type Address Off Size ES Flg Lk Inf Al
 # CHECK-NEXT: NULL 0000000000000000 000000 000000 00 0 0 0
 # CHECK-NEXT: .text PROGBITS 0000000000001000 001000 000001 00 AX 0 0 4
 # CHECK-NEXT: .data.rel.ro PROGBITS 0000000000011000 002000 000001 00 WA 0 0 16
 # CHECK-NEXT: .data PROGBITS 0000000000011010 002010 000001 00 WA 0 0 16
-# CHECK-NEXT: .bss NOBITS 0000000000011040 002011 000001 00 WA 0 0 64
+# CHECK-NEXT: .bss NOBITS 0000000000011040 002040 000001 00 WA 0 0 64

 # CHECK: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
 # CHECK-NEXT: LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x000001 0x000001 R E 0x1000
 # CHECK-NEXT: LOAD 0x002000 0x0000000000011000 0x0000000000001010 0x000001 0x000001 RW 0x1000
-## FIXME .data and .bss should be placed in different PT_LOAD segments
-## because their LMA regions are different.
-# CHECK-NEXT: LOAD 0x002010 0x0000000000011010 0x0000000000001020 0x000001 0x000031 RW 0x1000
+# CHECK-NEXT: LOAD 0x002010 0x0000000000011010 0x0000000000001020 0x000001 0x000001 RW 0x1000
+# CHECK-NEXT: LOAD 0x002040 0x0000000000011040 0x0000000000011040 0x000000 0x000001 RW 0x1000

 MEMORY {
   ROM : ORIGIN = 0x1000, LENGTH = 1K
   RAM : ORIGIN = 0x11000, LENGTH = 1K
 }
 SECTIONS {
   .text 0x1000 : { (.text) } >ROM
   ## Input section alignment decides output section alignment.
   .data.rel.ro 0x11000 : { *(.data.rel.ro) } >RAM AT>ROM
   ## ALIGN decides output section alignment.
   .data . : ALIGN(16) { (.data) } >RAM AT>ROM
+  ## Start a new PT_LOAD because the LMA region is different from the previous one.
   .bss . : ALIGN(64) { (.bss) } >RAM
 }
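
The PhysAddr values checked in lma-align.test can be reproduced with a small sketch of aligned placement inside the LMA region. Region and placeInRegion are hypothetical helpers, and the model assumes ROM's current position advances for every section placed in it, whether through `>ROM` or `AT>ROM`.

```cpp
#include <cassert>
#include <cstdint>

// Round value up to a multiple of align (align must be a power of two).
inline uint64_t alignUp(uint64_t value, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Hypothetical stand-in for a memory region's allocation cursor.
struct Region {
  uint64_t curPos;
};

// Place `size` bytes at the next position aligned to `align` and return
// the address that was used.
inline uint64_t placeInRegion(Region &r, uint64_t size, uint64_t align) {
  uint64_t addr = alignUp(r.curPos, align);
  r.curPos = addr + size;
  return addr;
}
```

Starting ROM at 0x1000, a 1-byte .text leaves the cursor at 0x1001, so .data.rel.ro (alignment 16) lands at 0x1010 and .data (ALIGN(16)) at 0x1020, matching the PhysAddr column. .bss has no AT>, so in the updated expectations its PhysAddr equals its VirtAddr, 0x11040.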

This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Start a new PT_LOAD if LMA region is different
ClosedPublic
