This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Do not treat colon(":") as separate token in script parser.
AbandonedPublic

Authored by grimar on Mar 1 2017, 8:23 AM.

Details

Reviewers
ruiu
rafael
Summary

This is PR32091,

it has next script which we failed to parse.

GLIBCXX_3.4 {
    extern "C++" {
...
        std::set_new_handler*;
        std::set_terminate*;
        std::set_unexpected*;
        std::type_info::__*;
    }
}

Patch changes logic to not treat ":" as separate token to fix issue.

Diff Detail

Event Timeline

grimar created this revision.Mar 1 2017, 8:23 AM
ruiu edited edge metadata.Mar 1 2017, 8:55 AM

Are you sure this is correct? I was thinking that this is about quoted strings. This allows something like extern "C++" { std::foo; } (as opposed to extern "C++" { "std::foo"; }) which seems odd.

grimar added a comment.Mar 1 2017, 9:41 AM
In D30500#689702, @ruiu wrote:

Are you sure this is correct? I was thinking that this is about quoted strings. This allows something like extern "C++" { std::foo; } (as opposed to extern "C++" { "std::foo"; }) which seems odd.

I believe its fine.
If I have symbol _ZSt3qux which is std::qux. Lets see what BFD do.

Next script:

{ global: extern "C++" { "std::q*"; };  local: *; };

Will leave it local,
11: 000000000000014e 0 NOTYPE LOCAL DEFAULT 4 _ZSt3qux

because wildcard in quotes means exact match. And removing quotes makes symbol global as expected:

{ global: extern "C++" { std::q*; };  local: *; };

17: 0000000000000176 0 NOTYPE GLOBAL DEFAULT 4 _ZSt3qux

Your sample was about

{ global: extern "C++" { "std::qux"; };  local: *; };

vs

{ global: extern "C++" { std::qux; };  local: *; };

Both are accepted by BFD/gold and symbol is GLOBAL. That is expected behavior I believe,
see nothing odd here.

ruiu added inline comments.Mar 1 2017, 10:52 AM
ELF/LinkerScript.cpp
1965–1967

Handling only "local:" is not good. You have the same issue with "global:". You want to change this to bool consumeLabel(StringRef) and use it as consumeLabel("local") or consumeLabel("global"). It also needs comment that this is the only part that makes the grammar LL(2).

grimar updated this revision to Diff 90318.Mar 2 2017, 4:33 AM
  • Addressed review comments.
ELF/LinkerScript.cpp
1965–1967

Ok. I think it I need peakLabel too, to be able handle code like:
(line 2015).

if (peek() == "}" || Error || isNextLocal())
 break;
grimar edited the summary of this revision. (Show Details)Mar 2 2017, 4:35 AM
emaste added a subscriber: emaste.Mar 2 2017, 5:16 AM

extern "C++" { std::foo; } (as opposed to extern "C++" { "std::foo"; })

Oops, my fault. I added quotes while investigating the issue and accidentally pasted the quoted version in the bug report. The script actually has:

GLIBCXX_3.4 {
    extern "C++" {
...
        std::set_new_handler*;
        std::set_terminate*;
        std::set_unexpected*;
        std::type_info::__*;
    }
}
emaste added a comment.Mar 2 2017, 6:50 AM

With this change, linking libcxxrt in FreeBSD succeeds.

(I have not yet been able to test the result because lld is now segfaulting while linking a different component later in the build; I will submit a reproducer .tar in due course.)

ruiu added inline comments.Mar 2 2017, 9:18 AM
ELF/LinkerScript.cpp
1940–1946

I think you can do that without defining peekLabel. You should remove the first two lines from readLocals().

grimar added inline comments.Mar 2 2017, 9:29 AM
ELF/LinkerScript.cpp
1940–1946

This place is not a problem to avoid use of peekLabel. But
ScriptParser::readSymbols() need it anyways I believe:

if (peek() == "}" || Error || peekLabel("local:"))
  break;

We have to know when to get out from readSymbols, one of conditions is when we meet "local"

ruiu added inline comments.Mar 2 2017, 9:33 AM
ELF/LinkerScript.cpp
1940–1946

Do you mean it is impossible? If not, please try to edit other functions.

grimar added inline comments.Mar 2 2017, 9:37 AM
ELF/LinkerScript.cpp
1940–1946

Almost nothing is impossible usually.
I'll try tomorrow though don`t think I understand why it is ok to have peek()/consume() and not ok to have peekLable + consumeLable. Isn't that consistent ?

ruiu added inline comments.Mar 2 2017, 9:40 AM
ELF/LinkerScript.cpp
1940–1946

Because I don't like the LL(2) part of the grammar.

grimar updated this revision to Diff 90497.Mar 3 2017, 8:56 AM
  • Addressed review comments.
ruiu added a comment.Mar 6 2017, 8:47 AM

Overall this seems much more complicated than the original code. Can you simplify?

grimar updated this revision to Diff 90721.Mar 6 2017, 10:04 AM
  • Addressed review comment.
ruiu added inline comments.Mar 6 2017, 10:12 AM
ELF/LinkerScript.cpp
1945

: is not a part of a label. It's a punctuation after a label. So please remove it.

2004–2009

Do you need this error check?

ELF/ScriptLexer.cpp
261

This seems complicated. You want to skip either Tok or Tok:. In code, it's something like this.

if (consume(Tok))
  expect(":");
else
  expect(Tok + ":");
269–272

Please do not introduce a new utility function. If you want to skip two tokens, call next() twice.

grimar updated this revision to Diff 90818.Mar 7 2017, 1:39 AM
  • Addressed review comment.
ELF/LinkerScript.cpp
1945

Done.

2004–2009

Otherwise we would accept scripts like:

{
global:
...
local:
...
local:
...
local:
...
}

Both bfd and gold does not accept this, I think we also should not.

Code would be really simpler if we had peekLabel() I believe. Still not understand why we can't have one. consumeLabel can me implemented via peekLabel() + next(), so at fact we would have single method relying on LL(2) in final anyways.

ELF/ScriptLexer.cpp
261

No, that is not like this. I can't consume Tok and expect(":") for example because Tok may be a "local" symbol name, and not a label. I can't expect anything in this method actually.

expect(Tok + ":"); sure simplifies things, but requires using Twine argument (what involves std::string creation for each consume call, hope that is fine).

269–272

OK.

ruiu added inline comments.Mar 7 2017, 2:11 PM
ELF/LinkerScript.cpp
2004–2009

What's wrong with that? I think we should accept that because that makes sense. Overall this part of code should be simpler than this. Let me try.

grimar abandoned this revision.Mar 10 2017, 1:32 AM

D30722 landed instead.