This is an archive of the discontinued LLVM Phabricator instance.

[ELF] PR30221 - linker script expression parser does not accept '~'
ClosedPublic

Authored by atanasyan on Sep 1 2016, 5:52 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

atanasyan updated this revision to Diff 69995.Sep 1 2016, 5:52 AM
atanasyan retitled this revision from to [ELF] PR30221 - linker script expression parser does not accept '~'.
atanasyan updated this object.
atanasyan added reviewers: ruiu, grimar.
atanasyan set the repository for this revision to rL LLVM.
atanasyan added a project: lld.
atanasyan added a subscriber: llvm-commits.
grimar added inline comments.Sep 1 2016, 6:28 AM
ELF/LinkerScript.cpp
1237 ↗(On Diff #69995)

Do you need that cast ?
Looks both branches do the same ?

emaste added a subscriber: emaste.Sep 1 2016, 6:32 AM
grimar added inline comments.Sep 1 2016, 7:18 AM
ELF/LinkerScript.cpp
1237 ↗(On Diff #69995)

Ah I see, it is int, not uint.

grimar edited edge metadata.Sep 1 2016, 7:31 AM

This looks fine for me. The only note I have is that gold/ld does not
support ~ without () it seems. So the only form that works is:

~(...)

So you could write:

if (Tok == "~") {
  Expr E = readParenExpr();
  return [=](uint64_t Dot) { return ~E(Dot); };
}

I don't know what is more correct though and why gnu linkers does
not allow to write just ~XXX.

This looks fine for me. The only note I have is that gold/ld does not
support ~ without () it seems. So the only form that works is:

Let's be consistent with gold/ld.

atanasyan updated this revision to Diff 70010.Sep 1 2016, 7:49 AM
atanasyan edited edge metadata.
  • Allow "~" operator if operand is surrounded by "()" to be consistent with gold/ld.
ruiu added inline comments.Sep 1 2016, 1:52 PM
ELF/LinkerScript.cpp
1236 ↗(On Diff #70010)

I think you want to fix the tokenizer (tokenize in ScritpParser.cpp) instead of handling -<number> in this file. Currently, - is allowed to be part of a token, but it is not correct. For example, with that tokenizer, an expression 3-1 is tokenized as 3-1 instead of 3, - and 1.

I think the only token that can contain - as part of it is -=. In all other cases, - shouldn't stick with adjacent characters.

Because we do not support -= yet, we don't care about that. So why don't you remove - from tokenize()?

1248 ↗(On Diff #70010)

Even if this is what gold does (I didn't test it myself), this looks very weird. If it's true, it seems a gold's bug. I think any primary expression should be allowed after ~.

ruiu added inline comments.Sep 1 2016, 2:54 PM
ELF/LinkerScript.cpp
1236 ↗(On Diff #70010)

I was wrong. - is allowed to be in a filename as well as -l<library> notation. So - can be part of a token. I think I'm fine with this new function.

grimar added inline comments.Sep 2 2016, 1:19 AM
ELF/LinkerScript.cpp
1248 ↗(On Diff #70010)

My consern about gold/ld is that users of lld can write some script using our syntax and have a problems if they decide to switch to gold/ld for some reason. I think it is always better to have script files fully compatible if it is possible.
Though it is always more convinent to write ~XXX then ~(XXX), therefore my position here is wonky.

ruiu added inline comments.Sep 2 2016, 7:00 AM
ELF/LinkerScript.cpp
1248 ↗(On Diff #70010)

I tried

echo 'SECTIONS { foo = ~0; }' > x
echo 'int main() {}' | /ssd/clang/bin/clang -xc - -o /dev/null -fuse-ld=gold -Wl,--script=./x

and gold didn't complain the absence of () after ~. I believe you made a mistake.

~ operator is defined in gold's yyscript.y as follows.

exp:
        '(' exp ')'
          { $$ = $2; }
      | '-' exp %prec UNARY
          { $$ = script_exp_unary_minus($2); }
      | '!' exp %prec UNARY
          { $$ = script_exp_unary_logical_not($2); }
      | '~' exp %prec UNARY
          { $$ = script_exp_unary_bitwise_not($2); }
      | '+' exp %prec UNARY
          { $$ = $2; }
      | exp '*' exp
          { $$ = script_exp_binary_mult($1, $3); }
      | ...

As you can see, exp is expected after ~.

Finally, it just doesn't make sense to require ( only after ~ unary operator while any expression is allowed for other unary operators. If you design a language, you'd never do that.

grimar added inline comments.Sep 2 2016, 7:05 AM
ELF/LinkerScript.cpp
1248 ↗(On Diff #70010)

Probably I made mistake that it was gold. But ld definetely does not like it:

. = 0x10000 + ~4;

test.script:3: undefined symbol `~4' referenced in expression

grimar added a comment.Sep 2 2016, 7:07 AM

Though as I mentioned earlier, it is always more convinent to write ~XXX. And if gold is fine with that, we can choose either way I think. I just was really sure that both gold and ld rejectes that for some reason.

ruiu added inline comments.Sep 2 2016, 7:18 AM
ELF/LinkerScript.cpp
1248 ↗(On Diff #70010)

That's a problem of the ld's tokenizer. The grammar still allows any expression after ~. Insert a space after ~, then you'll find that () is not actually required after ~.

grimar added inline comments.Sep 2 2016, 7:21 AM
ELF/LinkerScript.cpp
1248 ↗(On Diff #70010)

I see. Interesting :)

atanasyan updated this revision to Diff 70238.Sep 2 2016, 2:48 PM
  • Move recognition of negative number to the readInteger routine
  • Revert support of '~' operator without '()'
ruiu accepted this revision.Sep 2 2016, 2:57 PM
ruiu edited edge metadata.

LGTM

This revision is now accepted and ready to land.Sep 2 2016, 2:57 PM
This revision was automatically updated to reflect the committed changes.