This is an archive of the discontinued LLVM Phabricator instance.

Differential D74687

[LLD][ELF] - Disambiguate "=fillexp" with a primary expression to allow =0x90 /DISCARD/
ClosedPublic

Authored by grimar on Feb 16 2020, 3:19 AM.

Details

Reviewers

MaskRay
ruiu
psmith
• espindola

Commits

rGbb7d2b178022: [LLD][ELF] - Disambiguate "=fillexp" with a primary expression to allow =0x90…

Summary

Fixes https://bugs.llvm.org/show_bug.cgi?id=44903

It is about the following case:

SECTIONS {
  .foo : { *(.foo) } =0x90909090
  /DISCARD/ : { *(.bar) }
}

Here, while parsing the fill expression, we treated the
"/" of "/DISCARD/" as an operator.

With this change, suggested by Fangrui Song, we no longer
allow expressions with operators (e.g. "0x1100 + 0x22")
unless they are wrapped in round brackets. This should not
be an issue for users, and it resolves the parsing ambiguity.
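
To make the ambiguity concrete, here is a minimal sketch of a toy recursive-descent parser. It is an illustration only, not lld's ScriptParser: the names ToyParser, parseFillGreedy and parseFillPrimary are hypothetical, the token list is assumed to be already split the way an expression lexer would split "/DISCARD/", and "+" and "/" are given the same precedence for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy parser for illustration only; it is not lld's ScriptParser.
struct ToyParser {
  std::vector<std::string> toks;
  std::size_t pos = 0;
  bool failed = false;

  // Returns the next token without consuming it, or "" at end of input.
  std::string peek() const {
    return pos < toks.size() ? toks[pos] : std::string();
  }

  // primary := number | "(" expr ")"
  std::uint64_t readPrimary() {
    if (pos >= toks.size()) {
      failed = true;
      return 0;
    }
    std::string tok = toks[pos++];
    if (tok == "(") {
      std::uint64_t v = readExpr();
      if (peek() == ")")
        ++pos;
      else
        failed = true;
      return v;
    }
    try {
      std::size_t used = 0;
      std::uint64_t v = std::stoull(tok, &used, 0); // base 0 accepts "0x..."
      if (used != tok.size())
        failed = true;
      return v;
    } catch (const std::exception &) {
      failed = true; // e.g. "DISCARD" is not a number
      return 0;
    }
  }

  // expr := primary (("+" | "/") primary)*
  // Simplification: both operators share one precedence level.
  std::uint64_t readExpr() {
    std::uint64_t lhs = readPrimary();
    while (!failed && (peek() == "+" || peek() == "/")) {
      std::string op = toks[pos++];
      std::uint64_t rhs = readPrimary();
      if (failed)
        break;
      if (op == "+")
        lhs += rhs;
      else if (rhs == 0)
        failed = true;
      else
        lhs /= rhs;
    }
    return lhs;
  }
};

struct FillResult {
  bool ok;
  std::uint64_t value;
  std::size_t consumed; // number of tokens eaten by the fill expression
};

// Old behaviour in spirit: read a full expression after "=".
FillResult parseFillGreedy(std::vector<std::string> toks) {
  assert(!toks.empty());
  ToyParser p;
  p.toks = std::move(toks);
  std::uint64_t v = p.readExpr();
  return {!p.failed, v, p.pos};
}

// New behaviour in spirit: read only a primary expression after "=".
FillResult parseFillPrimary(std::vector<std::string> toks) {
  assert(!toks.empty());
  ToyParser p;
  p.toks = std::move(toks);
  std::uint64_t v = p.readPrimary();
  return {!p.failed, v, p.pos};
}
```

With the tokens "0x90909090", "/", "DISCARD", "/", ":", the greedy reader consumes the "/" as a division operator and then fails on "DISCARD", while the primary-only reader stops after the number and leaves the rest for the next output section description. A bracketed fill such as "(", "0x1100", "+", "0x22", ")" is still a single primary, so operators remain usable inside brackets.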

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

grimar created this revision. Feb 16 2020, 3:19 AM

Herald added a reviewer: • espindola. Feb 16 2020, 3:19 AM

Herald added a project: Restricted Project.

Herald added subscribers: arichardson, emaste.

Documentation: https://sourceware.org/binutils/docs-2.33.1/ld/Output-Section-Fill.html

https://bugs.llvm.org/show_bug.cgi?id=44903 is related to D64130. The problem is a parsing ambiguity.

llvm-mc -filetype=obj linkerscript/sections-padding.s -o a.o

GNU ld's behavior is strange:

% ld.bfd -T =(printf 'SECTIONS { .mysec : { *(.mysec*) } =0x1100 }') a.o -o a
% readelf -Wx .mysec a

Hex dump of section '.mysec':
  0x00000000 66110011 00110011 00110011 00110011 f...............
  0x00000010 6690                                f.

% ld.bfd -T =(printf 'SECTIONS { .mysec : { *(.mysec*) } =0x1100+2+3 }') a.o -o a
% readelf -Wx .mysec a

Hex dump of section '.mysec':
  0x00000000 66000011 05000011 05000011 05000011 f...............
  0x00000010 6690                                f.

Do we have a more elegant fix:) ?
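
The two dumps above are consistent with the rule in the GNU ld manual: a plain hex literal is used as a byte pattern of arbitrary length, while any other expression is reduced to its four least significant bytes, big-endian. The following sketch reproduces the first 16 bytes of both dumps under that reading. All function names are hypothetical, the handling of an odd number of hex digits is an assumption, and restarting the pattern at the beginning of the gap is inferred from the dumps rather than taken from the ld sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pattern for a plain hex literal such as "0x1100": each pair of hex digits
// becomes one byte of the pattern, so "0x1100" yields {0x11, 0x00}.
std::vector<std::uint8_t> patternFromHexLiteral(const std::string &lit) {
  assert(lit.size() > 2 && lit.compare(0, 2, "0x") == 0);
  std::string digits = lit.substr(2);
  if (digits.size() % 2 != 0)
    digits.insert(digits.begin(), '0'); // assumption: pad odd-length input
  std::vector<std::uint8_t> pattern;
  for (std::size_t i = 0; i < digits.size(); i += 2)
    pattern.push_back(static_cast<std::uint8_t>(
        std::stoul(digits.substr(i, 2), nullptr, 16)));
  return pattern;
}

// Pattern for any other expression: the four least significant bytes of the
// evaluated value, big-endian.
std::vector<std::uint8_t> patternFromValue(std::uint64_t value) {
  std::uint32_t v = static_cast<std::uint32_t>(value);
  return {static_cast<std::uint8_t>(v >> 24), static_cast<std::uint8_t>(v >> 16),
          static_cast<std::uint8_t>(v >> 8), static_cast<std::uint8_t>(v)};
}

// One data byte followed by `gap` padding bytes. The pattern restarts at the
// beginning of the gap (inferred from the dumps above).
std::vector<std::uint8_t> padAfter(std::uint8_t first, std::size_t gap,
                                   const std::vector<std::uint8_t> &pattern) {
  assert(!pattern.empty());
  std::vector<std::uint8_t> out;
  out.push_back(first);
  for (std::size_t i = 0; i < gap; ++i)
    out.push_back(pattern[i % pattern.size()]);
  return out;
}

// Formats bytes as space-separated 4-byte hex words, like readelf -x.
std::string toHexWords(const std::vector<std::uint8_t> &bytes) {
  static const char hexDigits[] = "0123456789abcdef";
  std::string s;
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (i != 0 && i % 4 == 0)
      s += ' ';
    s += hexDigits[bytes[i] >> 4];
    s += hexDigits[bytes[i] & 0xf];
  }
  return s;
}
```

Calling toHexWords(padAfter(0x66, 15, patternFromHexLiteral("0x1100"))) gives the first row of the =0x1100 dump, and toHexWords(padAfter(0x66, 15, patternFromValue(0x1100 + 2 + 3))) gives the first row of the =0x1100+2+3 dump.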

lld/test/ELF/linkerscript/fill-with-discard.test
Line 3 (On Diff #244863): `-o /dev/null`. Consider merging this test into `sections-padding.s`.

In D74687#1878409, @MaskRay wrote:

Do we have a more elegant fix:) ?

Do you have something particular in mind? The only thing I can think of is to treat '\n' as an expression terminator.
I am not sure it is the right thing to do. I think that tokenizeExpr, used in this diff, is the correct place for such a fix.
It has an elegant name, FWIW.

In D74687#1880546, @grimar wrote:

In D74687#1878409, @MaskRay wrote:

Do we have a more elegant fix:) ?

Do you have something particular in mind? The only thing I can think of is to treat '\n' as an expression terminator.
I am not sure it is the right thing to do. I think that tokenizeExpr, used in this diff, is the correct place for such a fix.
It has an elegant name, FWIW.

Treating \n as an expression terminator should work. For the following example, we will accept a but not b (GNU ld accepts both):

a = (3
  +4);
b = 3
  +4;

I prefer an alternative: limit = to accept readPrimary instead of readFill:

if (peek() == "=" || peek().startswith("=")) {
  inExpr = true;
  consume("=");
  cmd->filler = readFill();
  inExpr = false;
}

As you can see, GNU ld's = syntax is weird...

In D74687#1880879, @MaskRay wrote:
I prefer an alternative: limit = to accept readPrimary instead of readFill:
if (peek() == "=" || peek().startswith("=")) {
  inExpr = true;
  consume("=");
  cmd->filler = readFill();
  inExpr = false;
}

Honestly, I just do not feel comfortable making this change.
If I understood correctly, you're suggesting something like:

if (peek() == "=" || peek().startswith("=")) {
  inExpr = true;
  consume("=");
  uint64_t value = readPrimary()().val;
  std::array<uint8_t, 4> buf;
  write32be(buf.data(), (uint32_t)value);

  cmd->filler = buf;
  inExpr = false;
}

With it we stop supporting syntax like SECTIONS { .mysec : { *(.mysec*) } =0x11223300+0x44 }.
It is a deviation from the specification which says that "fillexp is an expression" and mentions a unary +.

As you can see, GNU ld's = syntax is weird...

Yes, but in its current state LLD seems to have reasonable behavior.
I am not sure we should intentionally ignore the specification and remove this already supported feature just because of a specific issue
with "/DISCARD/".

Would it make sense to make /DISCARD/ its own token recognised by the Lexer? I believe that is what BFD does. It would prevent it from being recognised as "/" "DISCARD" "/". Apologies if this is not appropriate, I'm not too familiar with this part of the code.

In D74687#1893064, @psmith wrote:

Would it make sense to make /DISCARD/ its own token recognised by the Lexer? I believe that is what BFD does. It would prevent it from being recognised as "/" "DISCARD" "/". Apologies if this is not appropriate, I'm not too familiar with this part of the code.

It is what my patch does :) "/DISCARD/" is a single token initially, but during parsing of the FILL expression,
we call tokenizeExpr to return "3", "*" and "5" tokens for "3*5", for example.

I.e. tokenizeExpr should probably leave "/DISCARD/" alone and not split it, and that is what I do.
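
As a rough illustration of the approach described in this comment, the sketch below splits a whitespace-free token on operator characters but can leave "/DISCARD/" intact. It is a simplified stand-in, not lld's tokenizeExpr: the function name tokenizeExprSketch, the keepDiscard parameter, and the reduced operator set "+-*/" are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified splitter for illustration; not lld's tokenizeExpr.
// Splits one whitespace-free token into operands and single-character
// operators. When keepDiscard is true, "/DISCARD/" is returned unsplit.
std::vector<std::string> tokenizeExprSketch(std::string_view s,
                                            bool keepDiscard) {
  // The caller is expected to have split the input on whitespace already.
  assert(s.find(' ') == std::string_view::npos);
  constexpr std::string_view ops = "+-*/"; // reduced operator set (assumption)

  if (keepDiscard && s == "/DISCARD/")
    return {std::string(s)};

  std::vector<std::string> out;
  while (!s.empty()) {
    std::size_t e = s.find_first_of(ops);
    if (e == std::string_view::npos) {
      out.emplace_back(s); // no more operators: the rest is one operand
      break;
    }
    if (e != 0)
      out.emplace_back(s.substr(0, e)); // operand before the operator
    out.emplace_back(s.substr(e, 1));   // the operator itself
    s.remove_prefix(e + 1);
  }
  return out;
}
```

With keepDiscard set, "3*5" still becomes "3", "*", "5", while "/DISCARD/" stays a single token; without it, "/DISCARD/" is split into "/", "DISCARD", "/", which is the input that made the fill expression parser fail.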

In D74687#1893041, @grimar wrote:
In D74687#1880879, @MaskRay wrote:
I prefer an alternative: limit = to accept readPrimary instead of readFill:
if (peek() == "=" || peek().startswith("=")) {
  inExpr = true;
  consume("=");
  cmd->filler = readFill();
  inExpr = false;
}
Honestly, I just do not feel comfortable making this change.
If I understood correctly, you're suggesting something like:
if (peek() == "=" || peek().startswith("=")) {
  inExpr = true;
  consume("=");
  uint64_t value = readPrimary()().val;
  std::array<uint8_t, 4> buf;
  write32be(buf.data(), (uint32_t)value);

  cmd->filler = buf;
  inExpr = false;
}
With it we stop supporting syntax like SECTIONS { .mysec : { *(.mysec*) } =0x11223300+0x44 }.
It is a deviation from the specification which says that "fillexp is an expression" and mentions a unary +.

As you can see, GNU ld's = syntax is weird...

Yes, but in its current state LLD seems to have reasonable behavior.
I am not sure we should intentionally ignore the specification and remove this already supported feature just because of a specific issue
with "/DISCARD/".

I suspect =0x10+2+3 never works properly in GNU ld; see my example in https://reviews.llvm.org/D74687#1878409
If we disallow that in lld, a user can easily work around the limitation by using =(0x10+2+3).

In D74687#1893065, @grimar wrote:

In D74687#1893064, @psmith wrote:

Would it make sense to make /DISCARD/ its own token recognised by the Lexer? I believe that is what BFD does. It would prevent it from being recognised as "/" "DISCARD" "/". Apologies if this is not appropriate, I'm not too familiar with this part of the code.

It is what my patch does :) "/DISCARD/" is a single token initially, but during parsing of the FILL expression,
we call tokenizeExpr to return "3", "*" and "5" tokens for "3*5", for example.

I.e. tokenizeExpr should probably leave "/DISCARD/" alone and not split it, and that is what I do.

The trailing tokens of one output section description can cause other ambiguity when parsing the next output section description:

SECTIONS {
  .foo : { } =3
  /a : { }
}

I think it is just not necessary to support a syntax that will inherently cause problems, especially when there is no functionality loss.

I've revisited this patch, investigated possible use cases and reviewed our related bug history.
I think Fangrui is right, and I am going to update this diff very soon using his suggestion.

Reimplemented.

MaskRay accepted this revision. Mar 18 2020, 9:02 AM

This revision is now accepted and ready to land. Mar 18 2020, 9:02 AM

The title can probably be improved.

For "do not fail parsing when "/DISCARD/" follows the fill expression.", I've got one suggestion:

Disambiguate =fillexp with a primary expression to allow =0x90 /DISCARD/

A native speaker can suggest a better one:)

grimar retitled this revision from [LLD][ELF] - Linker script: do not fail parsing when "/DISCARD/" follows the fill expression. to [LLD][ELF] - Disambiguate "=fillexp" with a primary expression to allow =0x90 /DISCARD/. Mar 19 2020, 2:38 AM

Closed by commit rGbb7d2b178022: [LLD][ELF] - Disambiguate "=fillexp" with a primary expression to allow =0x90… (authored by grimar). Mar 19 2020, 3:11 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                            Size
lld/ELF/ScriptParser.cpp                        9 lines
lld/test/ELF/linkerscript/sections-padding.s    7 lines

Diff 251317

lld/ELF/ScriptParser.cpp

@@ (842 lines hidden) @@ while (!errorCount() && !consume("}")) {
     } else if (tok == "CONSTRUCTORS") {
       // CONSTRUCTORS is a keyword to make the linker recognize C++ ctors/dtors
       // by name. This is for very old file formats such as ECOFF/XCOFF.
       // For ELF, we should ignore.
     } else if (tok == "FILL") {
       // We handle the FILL command as an alias for =fillexp section attribute,
       // which is different from what GNU linkers do.
       // https://sourceware.org/binutils/docs/ld/Output-Section-Data.html
-      expect("(");
+      if (peek() != "(")
+        setError("( expected, but got " + peek());
       cmd->filler = readFill();
-      expect(")");
     } else if (tok == "SORT") {
       readSort();
     } else if (tok == "INCLUDE") {
       readInclude();
     } else if (peek() == "(") {
       cmd->sectionCommands.push_back(readInputSectionDescription(tok));
     } else {
       // We have a file name and no input sections description. It is not a
@@ (38 lines hidden) @@

 // Reads a `=<fillexp>` expression and returns its value as a big-endian number.
 // https://sourceware.org/binutils/docs/ld/Output-Section-Fill.html
 // We do not support using symbols in such expressions.
 //
 // When reading a hexstring, ld.bfd handles it as a blob of arbitrary
 // size, while ld.gold always handles it as a 32-bit big-endian number.
 // We are compatible with ld.gold because it's easier to implement.
+// Also, we require that expressions with operators must be wrapped into
+// round brackets. We did it to resolve the ambiguity when parsing scripts like:
+// SECTIONS { .foo : { ... } =120+3 /DISCARD/ : { ... } }
 std::array<uint8_t, 4> ScriptParser::readFill() {
-  uint64_t value = readExpr()().val;
+  uint64_t value = readPrimary()().val;
   if (value > UINT32_MAX)
     setError("filler expression result does not fit 32-bit: 0x" +
              Twine::utohexstr(value));

   std::array<uint8_t, 4> buf;
   write32be(buf.data(), (uint32_t)value);
   return buf;
 }
@@ (691 lines hidden) @@

lld/test/ELF/linkerscript/sections-padding.s

 # REQUIRES: x86
 # RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t

 ## Check that padding value works:
 # RUN: echo "SECTIONS { .mysec : { *(.mysec*) } =0x1122 }" > %t.script
 # RUN: ld.lld -o %t.out --script %t.script %t
 # RUN: llvm-objdump -s %t.out | FileCheck --check-prefix=YES %s
 # YES: 66000011 22000011 22000011 22000011

-# RUN: echo "SECTIONS { .mysec : { *(.mysec*) } =0x1100+0x22 }" > %t.script
+# RUN: echo "SECTIONS { .mysec : { *(.mysec*) } =(0x1100+0x22) }" > %t.script
 # RUN: ld.lld -o %t.out --script %t.script %t
 # RUN: llvm-objdump -s %t.out | FileCheck --check-prefix=YES2 %s
 # YES2: 66000011 22000011 22000011 22000011

 ## Confirming that address was correct:
 # RUN: echo "SECTIONS { .mysec : { *(.mysec*) } =0x99887766 }" > %t.script
 # RUN: ld.lld -o %t.out --script %t.script %t
 # RUN: llvm-objdump -s %t.out | FileCheck --check-prefix=YES3 %s
@@ (42 lines hidden) @@
 # RUN: not ld.lld -o /dev/null --script %t.script %t 2>&1 | FileCheck --check-prefix=ERR3 %s
 # ERR3: filler expression result does not fit 32-bit: 0x1100000000

 ## Check we report an error if an expression use a symbol.
 # RUN: echo "SECTIONS { foo = 0x11; .mysec : { *(.mysec*) } = foo }" > %t.script
 # RUN: not ld.lld -o /dev/null %t --script %t.script 2>&1 | FileCheck --check-prefix=ERR4 %s
 # ERR4: symbol not found: foo

+## Check we are able to parse scripts where "/DISCARD/" follows a section fill expression.
+# RUN: echo "SECTIONS { .mysec : { *(.mysec*) } =0x1122 /DISCARD/ : { *(.text) } }" > %t.script
+# RUN: ld.lld -o %t.out --script %t.script %t
+# RUN: llvm-objdump -s %t.out | FileCheck --check-prefix=YES %s
+
 .section .mysec.1,"a"
 .align 16
 .byte 0x66

 .section .mysec.2,"a"
 .align 16
 .byte 0x66

 .globl _start
 _start:
 nop