
Details

Reviewers

ruiu
• rafael

Commits

rGf22ec9ddf61a: [ELF] - Linkerscript: fix issue with SUBALIGN.
rLLD316580: [ELF] - Linkerscript: fix issue with SUBALIGN.
rL316580: [ELF] - Linkerscript: fix issue with SUBALIGN.

Summary

This is PR34886.

The SUBALIGN command currently triggers an assertion failure if the resulting
expression is zero. The patch fixes the issue by treating zero as 1, which is
consistent with other places and, it seems, with the ELF spec.

The patch also adds an "is power of 2" check for this and other expressions
that return an alignment.

Event Timeline

grimar created this revision. Oct 12 2017, 8:17 AM

Herald added a subscriber: emaste. Oct 12 2017, 8:17 AM

jhenderson added a subscriber: jhenderson. Oct 12 2017, 8:39 AM

I'd think you are overthinking. I don't think it is a good idea to add that many parameters to various functions just to check for a misuse of some rarely used feature. We should report an error instead of firing an assertion for SUBALIGN expression 0, but for const-ness, I'm not too excited about rigorous error checking.

In D38846#898203, @ruiu wrote:

I'd think you are overthinking. I don't think it is a good idea to add that many parameters to various functions just to check for a misuse of some rarely used feature.

The only function where I added a parameter was getSymbolValue. I think that should be enough to control the availability of Dot in all possible cases.
Another advantage is that after this change it no longer uses 'Ctx' but the 'CanUseDot' flag, which removes the dependency on 'Ctx' and improves readability.

We should report an error instead of firing an assertion for SUBALIGN expression 0,

Reporting an error would not only be inconsistent with ld.bfd and the rest of the LLD code (we currently change 0->1 using std::max((uint64_t)1, E().getValue()) when parsing ALIGN and DATA_SEGMENT_ALIGN), but also incorrect, because the ELF spec explicitly says:
"Some sections have address alignment constraints. <skipped> . Currently, only 0 and positive integral powers of two are allowed. Values 0 and 1 mean the section has no alignment constraints." (https://docs.oracle.com/cd/E19683-01/817-3677/chapter6-94076/index.html)
I see nothing wrong with having alignment 0 in a script expression; we should treat it as 1, which is consistent with the spec.
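To make the 0 -> 1 convention concrete, here is a minimal standalone sketch. The helper name normalizeAlignment is hypothetical and is not part of LLD; it only mirrors the std::max expression quoted above.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not an LLD function): the ELF spec treats 0 and 1
// alike as "no alignment constraint", so fold 0 into 1 before the value
// reaches any rounding code.
inline uint64_t normalizeAlignment(uint64_t Value) {
  return std::max<uint64_t>(1, Value);
}
```

With this, normalizeAlignment(0) and normalizeAlignment(1) both yield 1, and any larger value passes through unchanged.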

jhenderson added inline comments. Oct 16 2017, 9:28 AM

ELF/LinkerScript.cpp
881 ↗	(On Diff #118791)	Is Ctx always going to be available here? Was the old check redundant?

grimar added inline comments. Oct 16 2017, 9:35 AM

ELF/LinkerScript.cpp
881 ↗	(On Diff #118791)	No, it was not redundant, and in this patch `Ctx` should always be available when `CanUseDot` is set; otherwise something is wrong. For example, it is not set outside the `SECTIONS` command.

You are right that we should handle 0 as 1. But please remove the code for the rigorous error checking for const-ness. We just want to provide reasonable error reporting, and this patch seems to have gone a bit too far. There are a lot of ways you can do wrong things using linker scripts, and I don't want to focus too much on some arbitrary corner case.

Updated, added forgotten testcase.

ruiu added inline comments. Oct 17 2017, 8:19 AM

ELF/LinkerScript.cpp
405–406 ↗	(On Diff #119312)	You should report an error if Subalign is not a power of two.

grimar added inline comments. Oct 17 2017, 8:22 AM

ELF/LinkerScript.cpp
405–406 ↗	(On Diff #119312)	Why? We do not do that in other places. Isn't it the same as what you wrote earlier: "you can do wrong things using linker scripts, and I don't want to focus too much on some arbitrary corner case."? Btw, 0 is not a power of 2.

You should distinguish two completely different things. As to an expression you can use within SUBALIGN(<expr>), we are not too picky about the terms, functions or operators you can use in the expression. It is something like how C allows you to shoot yourself in the foot with i = i++. As long as <expr> in SUBALIGN(<expr>) returns some value, we don't care how it is computed. We just use it.

However, an attempt to set an alignment to a non-power-of-two value is a completely different story, because we assume everywhere in our code base that alignments are always powers of two. If you break that assumption, the entire linker's behavior becomes unpredictable. It may work, or it may crash. The point is, we don't want to fall into that situation. We need to protect ourselves from bad inputs that break our internal assumptions.

So, that's different, and you should check for an input value if it is a power of two.

(Also, speaking of 0, since 0 means 1 in this context, it is a power of two because it is actually 1.)
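As an illustration of why a non-power-of-two alignment is dangerous, consider the common bitmask rounding idiom. This is a sketch only: the function names below are hypothetical, and it is not claimed that LLD's alignTo is implemented this way. The point is just that code written under the power-of-two assumption can silently produce wrong results once that assumption is broken.

```cpp
#include <cstdint>

// Bitmask rounding idiom; correct only when Align is a power of two.
inline uint64_t alignToMask(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Division-based rounding; correct for any nonzero Align.
inline uint64_t alignToDiv(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// A value is a power of two if exactly one bit is set.
inline bool isPowerOfTwo(uint64_t Value) {
  return Value != 0 && (Value & (Value - 1)) == 0;
}
```

For Align = 8 both functions agree: rounding 10 up gives 16. For Align = 3, alignToMask(5, 3) returns 5, which is not a multiple of 3, while alignToDiv(5, 3) returns 6. No assertion fires in the first case; the result is simply wrong.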

Added "is power of 2" check.

ruiu added inline comments. Oct 18 2017, 11:57 AM

ELF/ScriptParser.cpp
645	I'd make this a member of ScriptParser to eliminate `Loc`.
649	You should return 1 instead of some erroneous value.

grimar added inline comments. Oct 19 2017, 4:32 AM

ELF/ScriptParser.cpp
645	That will not work. If I take the current `Location` in the member instead of passing it as a parameter from `Expr ScriptParser::readPrimary()`, the location will be different. That is why we pass the location to other places, for example to 'checkIfExists'.
649	It does not make sense, I think. What we should do here is to report an error and a value that is not 0, so that a possible alignTo() call will not assert. My code already does that.
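The remark about `Location` comes down to when the location is read. The toy parser below is purely illustrative (ToyParser and its members are hypothetical names, not LLD code): a closure that copies the location at creation time keeps reporting the original position, while one that asks the parser later sees wherever the parser has moved to.

```cpp
#include <functional>
#include <string>

// Toy stand-in for a script parser that tracks its current line.
struct ToyParser {
  unsigned Line = 1;

  std::string currentLocation() const {
    return "script:" + std::to_string(Line);
  }

  // Copies the location while the token is being parsed.
  std::function<std::string()> makeEager() {
    std::string Loc = currentLocation();
    return [Loc] { return Loc; };
  }

  // Reads the location only when the closure is eventually called.
  std::function<std::string()> makeLate() {
    return [this] { return currentLocation(); };
  }
};
```

If both closures are created while Line is 3 and the parser then advances to line 10, the eager closure still yields "script:3" and the late one yields "script:10". That difference is one reason to pass a location captured up front into a helper.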

grimar added inline comments. Oct 19 2017, 4:36 AM

ELF/ScriptParser.cpp
649	"report an error and a value" -> "report an error and return a value"

ruiu added inline comments. Oct 19 2017, 8:50 AM

ELF/ScriptParser.cpp
649	Well, it is actually "report an error and return a sane value". Alignment 17 is, for example, not a sane value, and your function shouldn't return such a value even in an error condition. That is an important contract of your function, and it needs to satisfy that post-condition. I believe you understand how lld handles error conditions very well, so it is a bit odd that you thought it doesn't make sense. It does make sense. error() does not call exit(). You can call error() as many times as you want to report multiple errors, and until control reaches some checkpoint, lld continues working. While it is working, we need to maintain the integrity of our internal data structures so that, for example, lld wouldn't die with an assertion failure after reporting an error. We do not expect Alignment to be a non-power-of-two value. So, you shouldn't return a non-power-of-two value from this function, even if there's an error in the inputs. If you do, you are not only reporting an error but also propagating it to the caller and breaking our internal assumption.
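The "report an error and return a sane value" contract can be sketched in a standalone form as follows. Everything here is a simplified assumption for illustration: reportError and ErrorCount are hypothetical stand-ins for the linker's diagnostics, Expr is reduced to a plain function returning an integer, and checkedAlignment is not the function from the patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical diagnostics: record the error and keep running, so that
// several errors can be reported before a later checkpoint stops the link.
static unsigned ErrorCount = 0;
static void reportError(const std::string &Msg) {
  std::cerr << "error: " << Msg << "\n";
  ++ErrorCount;
}

// Simplified expression type: a deferred computation of an integer.
using Expr = std::function<uint64_t()>;

static bool isPowerOfTwo(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Wraps an alignment expression. Post-condition: the wrapped expression
// always evaluates to a power of two, even when the input was invalid.
static Expr checkedAlignment(Expr E, std::string Loc) {
  return [=]() -> uint64_t {
    uint64_t Alignment = std::max<uint64_t>(1, E()); // 0 means 1
    if (!isPowerOfTwo(Alignment)) {
      reportError(Loc + ": alignment must be power of 2");
      return 1;
    }
    return Alignment;
  };
}
```

A caller that evaluates checkedAlignment(...) for an input of 17 gets 1 back and finds ErrorCount incremented, so downstream code never sees the value 17, and a later checkpoint can stop the link based on the error count.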

As you are adding checkAlignment for the ALIGN directive as well, you should add tests to cover that case for non-power-of-two and zero values.

test/ELF/linkerscript/subalign.s
26	Could you comment here what we expect the behaviour to be in this case (apart from no error), please. For example, is the SUBALIGN value a) undefined, so may change, or b) always zero (effectively 1)? If it's undefined, I'm not sure we need the objdump check, since we may decide to change the behaviour later to something else, if it becomes more convenient.

Addressed review comments.

ELF/ScriptParser.cpp
649	Let me clarify. My point was that it should not matter what we return here, as long as the value avoids an assert/crash. Currently the only place I know of where we can assert is a call of alignTo with zero, which this patch fixes. My code was actually heavily based on LLD's policy for handling error conditions: it did only what we had to do to be able to exit at the closest exit checkpoint after triggering an error. It seems to me that if returning 17 can break something else (so that we assert/crash) because of our internal assumptions, it is at least a sign that we probably want to look closer at that place and may want to place one more exit checkpoint earlier. But I do not know of such a place currently, and most probably nothing too scary would happen before we reach the existing exit checkpoint with alignment 17. I am OK with returning 1 here for now, just in case, to be consistent with the internal assumption and to let this patch go through.
test/ELF/linkerscript/subalign.s
26	I would say it is undefined. We do not want to support this behavior, it just works somehow now and that might change in future. Updated comment and testcase, thanks for looking !

If I understand correctly, you are saying that (1) returning an impossible value (e.g. 17 as an alignment) is fine as long as (2) doing that doesn't cause any assertion failure or something.

So, (1) is simply wrong. We assume that alignments are always powers of two, and you shouldn't break that assumption at any time. We do not want to even think about any value that is not a power of two. I've already described the reason, so I don't know how I can convince you, but this is how we handle errors in lld, and you should follow that.

Maybe viewing programs as state machines might help you understand why what I was saying makes sense. Any program can be thought of as a state machine (as long as it uses a finite amount of memory). The number of states lld can be in, for example, is really huge, but it is finite, and on each step of the code, we move from one state to another. Now you can think of a set of "sane" states, in which our internal constraints are all satisfied. You want to keep the "sane" states closed under all transitions, simply because we do not write our code for any insane state. We do not guarantee our code's behavior in any sense if a program falls into an impossible state. It could transition back to some sane state, but when that happens, that is a coincidence, and you shouldn't rely on that. You can make the set of "sane" states larger by relaxing some constraints (e.g. temporarily accepting a non-power-of-two as a valid alignment), but that makes your program unnecessarily complicated, or at least makes you think more when you write code. As a result, it makes your code hard to understand.

If you are familiar with parsing, you can also think of parser error recovery. To recover from a user error, you continue reading the text by assuming that an erroneous token was some different but valid token. You wouldn't put your parser into an impossible state before continuing to read, right? It's just like that.

And the second point doesn't really matter. Just like traffic rules, you need to maintain all constraints whether you are being watched or not. We do not sprinkle asserts in many places, but that doesn't mean that setting a variable to an impossible value is accepted.

Ok, thanks for the detailed explanation. I'll follow.

Is it ok to land this? checkAlignment returns a dummy value of 1 on error, as discussed.

Going to add tests for ALIGN as suggested in the comments; I forgot about that, sorry.

Added testcases for testing ALIGN expression.

In D38846#904885, @grimar wrote:

Added testcases for testing ALIGN expression.

Thanks, they look fine from my point of view, aside from one minor point.

test/ELF/linkerscript/align.s
84 ↗	(On Diff #120028)	Minor point: to mirror the zero value cases above, could you switch round the order of these two cases, please.

Addressed comment.

ruiu added inline comments. Oct 24 2017, 12:25 PM

ELF/ScriptParser.cpp
647	The use of std::max is not obvious. I prefer uint64_t Alignment = E().getValue(); if (Alignment == 0) return (uint64_t)1; if (!isPowerOf2_64...
650	Remove the comment because it is obvious.
test/ELF/linkerscript/subalign.s
26	This comment is wrong. Our behavior is not exactly the same as GNU linkers, but it is consistent to our way.

grimar added inline comments. Oct 25 2017, 2:30 AM

ELF/ScriptParser.cpp
647	Isn't it the same as what we already do here: https://github.com/llvm-mirror/lld/blob/master/ELF/ScriptParser.cpp#L964 and here: https://github.com/llvm-mirror/lld/blob/master/ELF/ScriptParser.cpp#L994? Should we fix those places?
650	You asked me to add such a comment for a different but similar place here: https://reviews.llvm.org/D36140?id=109103#827418 Isn't it consistent? Should we remove it from there?

Please commit.

ELF/ScriptParser.cpp
647	I'll do.
650	I'll do.

Closed by commit rL316580: [ELF] - Linkerscript: fix issue with SUBALIGN. (authored by grimar). Oct 25 2017, 7:51 AM

This revision was automatically updated to reflect the committed changes.

Diff 119826

ELF/ScriptParser.cpp

@@ 636 lines not shown @@ void ScriptParser::readSectionAddressType(OutputSection *Cmd) {

   if (consume("(")) {
     expect("NOLOAD");
     expect(")");
     Cmd->Noload = true;
   }
 }

+static Expr checkAlignment(Expr E, std::string &Loc) {
+  return [=] {
+    uint64_t Alignment = std::max((uint64_t)1, E().getValue());
+    if (!isPowerOf2_64(Alignment)) {
+      error(Loc + ": alignment must be power of 2");
+      return (uint64_t)1; // Return a dummy value.
+    }
+    return Alignment;
+  };
+}
+
 OutputSection *ScriptParser::readOutputSectionDescription(StringRef OutSec) {
   OutputSection *Cmd =
       Script->createOutputSection(OutSec, getCurrentLocation());

   if (peek() != ":")
     readSectionAddressType(Cmd);
   expect(":");

+  std::string Location = getCurrentLocation();
   if (consume("AT"))
     Cmd->LMAExpr = readParenExpr();
   if (consume("ALIGN"))
-    Cmd->AlignExpr = readParenExpr();
+    Cmd->AlignExpr = checkAlignment(readParenExpr(), Location);
   if (consume("SUBALIGN"))
-    Cmd->SubalignExpr = readParenExpr();
+    Cmd->SubalignExpr = checkAlignment(readParenExpr(), Location);

   // Parse constraints.
   if (consume("ONLY_IF_RO"))
     Cmd->Constraint = ConstraintKind::ReadOnly;
   if (consume("ONLY_IF_RW"))
     Cmd->Constraint = ConstraintKind::ReadWrite;
   expect("{");

@@ 287 lines not shown @@ if (Tok == "ADDR") {
     return [=]() -> ExprValue {
       checkIfExists(Sec, Location);
       return {Sec, false, 0, Location};
     };
   }
   if (Tok == "ALIGN") {
     expect("(");
     Expr E = readExpr();
-    if (consume(")"))
-      return [=] {
-        return alignTo(Script->getDot(), std::max((uint64_t)1, E().getValue()));
-      };
+    if (consume(")")) {
+      E = checkAlignment(E, Location);
+      return [=] { return alignTo(Script->getDot(), E().getValue()); };
+    }
     expect(",");
-    Expr E2 = readExpr();
+    Expr E2 = checkAlignment(readExpr(), Location);
     expect(")");
     return [=] {
       ExprValue V = E();
-      V.Alignment = std::max((uint64_t)1, E2().getValue());
+      V.Alignment = E2().getValue();
       return V;
     };
   }
   if (Tok == "ALIGNOF") {
     StringRef Name = readParenLiteral();
     OutputSection *Cmd = Script->getOrCreateOutputSection(Name);
     return [=] {
       checkIfExists(Cmd, Location);
@@ 337 lines not shown @@

test/ELF/linkerscript/subalign.s

@@ 16 lines not shown @@

 # RUN: echo "SECTIONS { .aaa : SUBALIGN(1) { (.aaa.) } }" > %t2.script
 # RUN: ld.lld -o %t2 --script %t2.script %t1.o
 # RUN: llvm-objdump -s %t2 | FileCheck -check-prefix=SUBALIGN %s
 # SUBALIGN: Contents of section .aaa:
 # SUBALIGN: 01000000 00000000 02000000 00000000
 # SUBALIGN: 03000000 00000000 04000000 00000000

+## Test we do not assert or crash when dot(.) is used inside SUBALIGN.
+## ld.bfd does not allow to use dot in such expressions, our behavior is inconsistent
+## here for simplicity of implementation. Value of dot is undefined.
+# RUN: echo "SECTIONS { . = 0x32; .aaa : SUBALIGN(.) { (.aaa) } }" > %t3.script
+# RUN: ld.lld %t1.o --script %t3.script -o %t3
+# RUN: llvm-objdump -s %t3 > /dev/null
+
+## Test we are able to link with zero alignment, this is consistent with bfd 2.26.1.
+# RUN: echo "SECTIONS { .aaa : SUBALIGN(0) { (.aaa) } }" > %t4.script
+# RUN: ld.lld %t1.o --script %t4.script -o %t4
+# RUN: llvm-objdump -s %t4 | FileCheck -check-prefix=SUBALIGN %s
+
+## Test we fail gracefuly when alignment value is not a power of 2.
+# RUN: echo "SECTIONS { .aaa : SUBALIGN(3) { (.aaa) } }" > %t5.script
+# RUN: not ld.lld %t1.o --script %t5.script -o %t5 2>&1 | FileCheck -check-prefix=ERR %s
+# ERR: {{.*}}.script:1: alignment must be power of 2
+
 .global _start
 _start:
 nop

 .section .aaa.1, "a"
 .align 16
 .quad 1

@@ 11 lines not shown @@

This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Linkerscript: fix issue with SUBALIGN.
ClosedPublic
