This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Support . and $ in symbol names in expressions
ClosedPublic

Authored by MaskRay on Mar 9 2021, 4:44 PM.

Details

Summary

GNU ld supports . and $ in symbol names while LLD doesn't support them in
readPrimary expressions. Using . can result in such an error:

https://github.com/ClangBuiltLinux/linux/issues/1318
ld.lld: error: ./arch/powerpc/kernel/vmlinux.lds:255: malformed number: .TOC.
>>>   __toc_ptr = (DEFINED (.TOC.) ? .TOC. : ADDR (.got)) + 0x8000;

Allow . (ppc64 special symbol .TOC.) and $ (RISC-V special symbol __global_pointer$).

Change diag[3-5].test to use an invalid character ^.

Note: GNU ld allows ~ in non-leading positions of a symbol name. ~
is not used in practice, conflicts with the unary operator, and can
cause some parsing difficulty, so this patch does not add it.

Diff Detail

Event Timeline

MaskRay created this revision.Mar 9 2021, 4:44 PM
MaskRay requested review of this revision.Mar 9 2021, 4:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2021, 4:44 PM
MaskRay retitled this revision from [ELF] Support . and $ in expressions to [ELF] Support . and $ in symbol names in expressions.Mar 9 2021, 4:45 PM

Noticed that ld.bfd will also accept ~ which may be worth adding as well; other than that looks reasonable to me.

lld/ELF/ScriptParser.cpp
1239

Looks like ld will also accept a '~' at least in some places, for example:

tem~p = 0x1000;
SECTIONS {
  boom tem~p : { *(.temp) }
}

Unlikely this will come up in practice but maybe worth matching it as well.

MaskRay added a comment.EditedMar 10 2021, 10:32 AM

Noticed that ld.bfd will also accept ~ which may be worth adding as well; other than that looks reasonable to me.

We could support ~ in non-leading positions in a symbol name but that would just add unneeded complexity.

~ is also a valid unary operator so supporting it requires us to tokenize ~. A simple try

--- a/lld/ELF/ScriptLexer.cpp
+++ b/lld/ELF/ScriptLexer.cpp
@@ -174,3 +174,3 @@ bool ScriptLexer::atEOF() { return errorCount() || tokens.size() == pos; }
 static std::vector<StringRef> tokenizeExpr(StringRef s) {
-  StringRef ops = "+-*/:!~=<>"; // List of operators
+  StringRef ops = "+-*/:!=<>"; // List of operators
 
diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index 4b15a71f029b..0b10d3a67754 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -1236,2 +1236,9 @@ static void checkIfExists(OutputSection *cmd, StringRef location) {
 
+static bool isValidSymbolName(StringRef s) {
+  auto valid = [](char c) {
+    return isAlnum(c) || c == '$' || c == '.' || c == '_' || c == '~';
+  };
+  return !s.empty() && !isDigit(s[0]) && s[0] != '~' && llvm::all_of(s, valid);
+}
+
 Expr ScriptParser::readPrimary() {

does not work because symbol5 = symbol - ~0xfffb; in symbol-assignexpr.s will fail to parse.
The reason is that tokenizeExpr needs to recognize that the leading ~ in ~0xfffb can still be parsed.

Put these things together, I think ~ causes some ambiguity. Since no practical symbol names use ~, I incline to not support it.

peter.smith accepted this revision.Mar 11 2021, 12:12 AM

Fair enough, I think '$' and '.' are the important ones. LGTM.

This revision is now accepted and ready to land.Mar 11 2021, 12:12 AM
MaskRay edited the summary of this revision. (Show Details)Mar 11 2021, 9:23 AM
This revision was automatically updated to reflect the committed changes.
jhenderson added inline comments.Mar 12 2021, 2:45 AM
lld/ELF/ScriptParser.cpp
1419

Idle thought I had whislt taking a quick look at this change: this error message doesn't seem particularly suitable to me. There's no guarantee that the user was trying to write a number. It could easily be they wanted a symbol name.

MaskRay added inline comments.Apr 2 2021, 12:06 PM
lld/ELF/ScriptParser.cpp
1419

I noticed the same while touching this line but thought this should be a separate change.