This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Linkerscript: fix issue with SUBALIGN.
ClosedPublic

Authored by grimar on Oct 12 2017, 8:17 AM.

Details

Summary

This is PR34886.

SUBALIGN command currently triggers failture if result expression
is zero. Patch fixes the issue, treating zero as 1, what is consistent with
other places and ELF spec it seems.

Patch also adds "is power of 2" check for this and other expressions
returning alignment.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Oct 12 2017, 8:17 AM
ruiu edited edge metadata.Oct 15 2017, 1:54 PM

I'd think you are overthinking. I don't think it is a good idea to add that many parameters to various functions just to check for a misuse of some rarely used feature. We should report an error instead of firing an assertion for SUBALIGN expression 0, but for const-ness, I'm not too excited about rigorous error checking.

grimar added a comment.EditedOct 16 2017, 1:49 AM
In D38846#898203, @ruiu wrote:

I'd think you are overthinking. I don't think it is a good idea to add that many parameters to various functions just to check for a misuse of some rarely used feature.

The only function where I added parameter was getSymbolValue. It should be enough to control availability of Dot for all possible cases I think.
One of pros also is that after such change it stops using 'Ctx', but uses 'CanUseDot' flag what removes dependency on 'Ctx' and improves readability.

We should report an error instead of firing an assertion for SUBALIGN expression 0,

Reporting a error would not only be inconsistent with ld.bfd and rest LLD code (we change 0->1 with use of std::max((uint64_t)1, E().getValue()) when parsing ALIGN and DATA_SEGMENT_ALIGN currently), but also incorrect, because ELF spec explicitly says:
"Some sections have address alignment constraints. <skipped> . Currently, only 0 and positive integral powers of two are allowed. Values 0 and 1 mean the section has no alignment constraints." (https://docs.oracle.com/cd/E19683-01/817-3677/chapter6-94076/index.html)
I see nothing wrong to have alignment 0 in script expression, we should treat them as 1, that is consistent with spec.

jhenderson added inline comments.Oct 16 2017, 9:28 AM
ELF/LinkerScript.cpp
881 ↗(On Diff #118791)

Is Ctx always going to be available here? Was the old check redundant?

grimar added inline comments.Oct 16 2017, 9:35 AM
ELF/LinkerScript.cpp
881 ↗(On Diff #118791)

No, it was not redundant, and in this patch Ctx is always should be available when CanUseDot is set,
otherwise something is wrong.

For example it is not set outside SECTIONS command.

ruiu added a comment.Oct 16 2017, 10:20 AM

You are right that we should handle 0 as 1. But please remove code for the rigorous error checking for const-ness. We just want to provide a reasonable error reporting, and this patch seem to have gone a bit too far. There are a lot of ways you can do wrong things using linker scripts, and I don't want to focus too much on some arbitrary corner case.

grimar updated this revision to Diff 119312.Oct 17 2017, 6:45 AM
grimar retitled this revision from [ELF] - Linkerscript: Fix issues with SUBALIGN. to [ELF] - Linkerscript: fix issue with SUBALIGN..
grimar edited the summary of this revision. (Show Details)

Updated, added forgotten testcase.

ruiu added inline comments.Oct 17 2017, 8:19 AM
ELF/LinkerScript.cpp
405–406 ↗(On Diff #119312)

You should report an error if Subalign is not a power of two.

grimar added inline comments.Oct 17 2017, 8:22 AM
ELF/LinkerScript.cpp
405–406 ↗(On Diff #119312)

Why ? We do not do that at other places. Isn't it the same as you wrote earlier
"you can do wrong things using linker scripts, and I don't want to focus too much on some arbitrary corner case." ?
Btw, 0 is not power of 2.

ruiu added a comment.Oct 17 2017, 8:56 AM

You should distinguish two completely different things. As to an expression you can use within SUBALIGN(<expr>), we are not too picky about terms, functions or operators you can use in the expression. It is something like C allows you to shoot your foot with i = i++. As long as <expr> in SUBALIGN(<expr>) returns some value, we don't care how it is computed. We just use it.

However, an attempt to set a non-power-of-two value to an alignment is a completely different story, because we assume everywhere in our code base that alignments are always power of two. If you break that assumption, the entire linker's behavior becomes unpredictable. It may work, or it may crash. The point is, we don't want to fall into that situation. We need to protect ourselves from bad inputs that breaks our internal assumptions.

So, that's different, and you should check for an input value if it is a power of two.

(Also, speaking of 0, since 0 means 1 in this context, it is a power of two because it is actually 1.)

grimar updated this revision to Diff 119448.Oct 18 2017, 2:53 AM
grimar edited the summary of this revision. (Show Details)
  • Added "is power of 2" check.
ruiu added inline comments.Oct 18 2017, 11:57 AM
ELF/ScriptParser.cpp
645 ↗(On Diff #119448)

I'd make this a member of ScriptParser to eliminate Loc.

649 ↗(On Diff #119448)

You should return 1 instead of some erroneous value.

grimar added inline comments.Oct 19 2017, 4:32 AM
ELF/ScriptParser.cpp
645 ↗(On Diff #119448)

That will not work.
If I take current Location in member instead of passing it as parameter in Expr ScriptParser::readPrimary(),
location will be different. That is why we pass location to other places, for example to 'checkIfExists'.

649 ↗(On Diff #119448)

It does not make sence I think.
What we should do here is to report an error and a value that is not 0, so that possible alignTo() call will not asset.
My code already do that.

grimar added inline comments.Oct 19 2017, 4:36 AM
ELF/ScriptParser.cpp
649 ↗(On Diff #119448)

"report an error and a value" -> "report an error and return a value"

ruiu added inline comments.Oct 19 2017, 8:50 AM
ELF/ScriptParser.cpp
649 ↗(On Diff #119448)

Well, it is actually "report an error and return a *sane* value". Alignment 17 is, for example, not a sane value, and your function shouldn't return such value even in an error condition. That is an important contract of your function, and it needs to satisfy that post-condition.

I believe you understand how lld handles error conditions very well, so it is a bit odd that you thought it doesn't make sense. It does make sense.

error() does not call exit(). You can call errors() as many times as you want to report multiple errors, and until the control reaches some checkpoint, lld continues working. While it is working, we need to maintain the integrity of our internal data structure so that, for example, lld wouldn't die with an assertion failure after reporting an error.

We do not expect Alignment to be a non-power-of-two value. So, you shouldn't return a non-power-of-two value from this function, even if there's an error in inputs. If you do, you are not only reporting an error but also propagating it to the caller and breaking our internal assumption.

As you are adding checkAlignment for the ALIGN directive as well, you should add tests to cover that case for non-power-of-two and zero values.

test/ELF/linkerscript/subalign.s
26 ↗(On Diff #119448)

Could you comment here what we expect the behaviour to be in this case (apart from no error), please. For example, is the SUBALIGN value a) undefined, so may change, or b) always zero (effectively 1)? If it's undefined, I'm not sure we need the objdump check, since we may decide to change the behaviour later to something else, if it becomes more convenient.

grimar updated this revision to Diff 119826.Oct 23 2017, 3:33 AM
  • Addressed review comments.
ELF/ScriptParser.cpp
649 ↗(On Diff #119448)

Let me clarify. My point was that it should not make sence what to return here if such value allows to avoid assert/crash.
Currently the only place where we can assert I know about is a call of alignTo with zero, what is fixed by patch.
My code actually was heavily based on LLD's policy of handling error conditions - it did only thing that we had to do
to be able to exit on closest exit checkpoint after triggering a error.

It seems to me that if returning 17 can break something else (so that we assert/crash) because of our internal assumptions, it is at least a sign that we probably want to look closer at that place and probably may want to place one more exit checkpoint earlier. But I think I do not know such place currently. And most probably nothing too scary should happen until we reach existent exit checkpoint with alignment 17.

I am ok to return 1 here just in case for now to be consistent with internal assumption and to let this patch go though.

test/ELF/linkerscript/subalign.s
26 ↗(On Diff #119448)

I would say it is undefined. We do not want to support this behavior, it just works somehow now and that might change in future. Updated comment and testcase, thanks for looking !

ruiu added a comment.Oct 23 2017, 8:50 AM

If I understand correctly, you are saying that (1) returning an impossible value (e.g. 17 as an alignment) is fine as long as (2) doing that doesn't cause any assertion failure or something.

So, (1) is simply wrong. We assume that alignments are always powers of two, and you shouldn't break that assumption at any time. We do not want to even think about any value that is not a power of two. I've already described the reason, so I don't know how I can convince you, but this is how we handle errors in lld, and you should follow that.

Maybe, viewing programs as state machines might help you understand why what I was saying makes sense. Any program can be thought as a state machine (as long as it uses finite amount of memory.) The number of states lld can be, for example, is really huge, but it is finite, and on each step of the code, we move from one state to another. Now you can think of a set of "sane" states, in which our internal constraints are all satisfied. You want to keep "sane" states closed under all transitions, simply because we do not write our code for any insane state. We do not guarantee our code's behavior in any sense if a program falls in an impossible state. It could transition back to some sane state, but when that happens, that is a coincidence, and you shouldn't rely on that. You can make "sane" states larger by relaxing some constraints (e.g. temporarily accepting a non-power-of-two as a valid alignment), but that makes your program unnecessarily complicated, or at least makes you think more when you write code. As a result, it makes your code hard to understand.

If you are familiar with parsing, you can also think of the parser error recovery. To recover from a user error, you continue reading a text by assuming that an erroneous token were some different but a valid token. You wouldn't make your parser an impossible state before continue reading, right? It's just like that.

And the second point doesn't really matter. Just like traffic rules, you need to maintain all constraints whether you are being watched or not. We do not sprinkle asserts to many places, but that doesn't mean that setting a variable to an impossible value is accepted.

In D38846#903647, @ruiu wrote:

If I understand correctly, you are saying that (1) returning an impossible value (e.g. 17 as an alignment) is fine as long as (2) doing that doesn't cause any assertion failure or something.

So, (1) is simply wrong. We assume that alignments are always powers of two, and you shouldn't break that assumption at any time. We do not want to even think about any value that is not a power of two. I've already described the reason, so I don't know how I can convince you, but this is how we handle errors in lld, and you should follow that.

Maybe, viewing programs as state machines might help you understand why what I was saying makes sense. Any program can be thought as a state machine (as long as it uses finite amount of memory.) The number of states lld can be, for example, is really huge, but it is finite, and on each step of the code, we move from one state to another. Now you can think of a set of "sane" states, in which our internal constraints are all satisfied. You want to keep "sane" states closed under all transitions, simply because we do not write our code for any insane state. We do not guarantee our code's behavior in any sense if a program falls in an impossible state. It could transition back to some sane state, but when that happens, that is a coincidence, and you shouldn't rely on that. You can make "sane" states larger by relaxing some constraints (e.g. temporarily accepting a non-power-of-two as a valid alignment), but that makes your program unnecessarily complicated, or at least makes you think more when you write code. As a result, it makes your code hard to understand.

If you are familiar with parsing, you can also think of the parser error recovery. To recover from a user error, you continue reading a text by assuming that an erroneous token were some different but a valid token. You wouldn't make your parser an impossible state before continue reading, right? It's just like that.

And the second point doesn't really matter. Just like traffic rules, you need to maintain all constraints whether you are being watched or not. We do not sprinkle asserts to many places, but that doesn't mean that setting a variable to an impossible value is accepted.

Ok, thanks for detailed explanation. I'll follow.

Is it ok to land this ? checkAlignment returns dummy 1 value when error as discussed.

grimar planned changes to this revision.Oct 24 2017, 2:46 AM

Going to add tests for ALIGN as suggested in comments, forgot about that, sorry.

grimar updated this revision to Diff 120028.Oct 24 2017, 2:58 AM
  • Added testcases for testing ALIGN expression.
  • Added testcases for testing ALIGN expression.

Thanks, they look fine from my point of view, aside from one minor point.

test/ELF/linkerscript/align.s
84 ↗(On Diff #120028)

Minor point: to mirror the zero value cases above, could you switch round the order of these two cases, please.

grimar updated this revision to Diff 120032.Oct 24 2017, 3:28 AM
  • Addressed comment.
ruiu added inline comments.Oct 24 2017, 12:25 PM
ELF/ScriptParser.cpp
647 ↗(On Diff #120032)

The use of std::max is not obvious. I prefer

uint64_t Alignment = E().getValue();
if (Alignment == 0)
  return (uint64_t)1;
if (!isPowerOf2_64...
650 ↗(On Diff #120032)

Remove the comment because it is obvious.

test/ELF/linkerscript/subalign.s
26 ↗(On Diff #120032)

This comment is wrong. Our behavior is not exactly the same as GNU linkers, but it is consistent to our way.

grimar added inline comments.Oct 25 2017, 2:30 AM
ELF/ScriptParser.cpp
647 ↗(On Diff #120032)
650 ↗(On Diff #120032)

You asked me to add such comment for different but similar place here:
https://reviews.llvm.org/D36140?id=109103#827418

Isn't it consistent ? Should we remove it from there ?

ruiu added a comment.Oct 25 2017, 7:32 AM

Please commit.

ELF/ScriptParser.cpp
647 ↗(On Diff #120032)

I'll do.

650 ↗(On Diff #120032)

I'll do.

This revision was automatically updated to reflect the committed changes.