This is an archive of the discontinued LLVM Phabricator instance.

Apply different tokenization rules to linker script expressions.
ClosedPublic

Authored by ruiu on Feb 14 2017, 2:18 PM.

Details

Summary

The linker script lexer is context-sensitive. In the regular context,
arithmetic operator characters are ordinary characters, but in the
expression context, each one is a separate token. This affects how the
lexer tokenizes "3*4", for example: it is a single token in the regular
context, but three tokens ("3", "*" and "4") in the expression context.
(Expressions like this are real; the Linux kernel uses them.)

This patch defines a function, maybeSplitExpr, that splits the current
token into multiple expression tokens if the lexer is in the expression
context.
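A minimal C++17 sketch of the idea, under stated assumptions: the struct ScriptLexerSketch, its InExpr flag, the helper splitExpr and the operator set "+-*/" are hypothetical names chosen for illustration, not LLD's actual code. The struct holds tokens already produced by the regular tokenizer, and when the parser is reading an expression, maybeSplitExpr replaces the current token in place with its sub-tokens.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified lexer state (not LLD's real ScriptLexer):
// a token stream from the regular tokenizer, a cursor, and a context flag.
struct ScriptLexerSketch {
  std::vector<std::string> Tokens;
  std::size_t Pos = 0;
  bool InExpr = false; // true while the parser is reading an expression

  // Assumed operator set; the actual patch may recognize other characters.
  static bool isOp(char C) {
    return C == '+' || C == '-' || C == '*' || C == '/';
  }

  // Split S so that every operator character becomes its own token and
  // the runs of non-operator characters between them stay together.
  static std::vector<std::string> splitExpr(const std::string &S) {
    std::vector<std::string> Ret;
    std::string Cur;
    for (char C : S) {
      if (!isOp(C)) {
        Cur += C;
        continue;
      }
      if (!Cur.empty()) {
        Ret.push_back(Cur);
        Cur.clear();
      }
      Ret.push_back(std::string(1, C));
    }
    if (!Cur.empty())
      Ret.push_back(Cur);
    return Ret;
  }

  // In the expression context, replace the current token with its
  // expression sub-tokens; in the regular context, do nothing.
  void maybeSplitExpr() {
    if (!InExpr || Pos >= Tokens.size())
      return;
    std::vector<std::string> Parts = splitExpr(Tokens[Pos]);
    if (Parts.size() <= 1)
      return; // nothing to split
    auto It = Tokens.begin() + static_cast<std::ptrdiff_t>(Pos);
    It = Tokens.erase(It);
    Tokens.insert(It, Parts.begin(), Parts.end());
  }
};
```

For example, with the token stream ".", "=", "3*4", ";" and the cursor on "3*4", calling maybeSplitExpr leaves the stream unchanged while InExpr is false, and turns it into ".", "=", "3", "*", "4", ";" once InExpr is true. Splitting lazily, only when the parser asks for an expression token, means the regular tokenization rules do not have to change.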

Event Timeline

ruiu created this revision. Feb 14 2017, 2:18 PM
grimar edited edge metadata. Edited Feb 15 2017, 1:07 AM

:) I think this approach implements exactly what I had in mind when I
wrote in the "[llvm-dev] Linking Linux kernel with LLD" thread:

"I was thinking about entering some special parser state for
extracting sub tokens from tokens transparently when
we are inside code that evaluates the expression."

Looks good to me.

lld/test/ELF/linkerscript/operators.s, line 13

I would add another sub-case that tests all the operators this patch supports.

grimar added inline comments. Feb 15 2017, 7:54 AM
lld/ELF/ScriptLexer.cpp, line 187

Maybe just:

  // Take at least one character so that the loop always makes progress.
  E = std::max<size_t>(E, 1);
  Ret.push_back(S.substr(0, E));
  S = S.substr(E);
}
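For reference, here is a stand-alone C++17 sketch of a splitting loop built around the suggested simplification. The function name tokenizeExprSketch and the operator set "+-*/" are assumptions made for illustration; the surrounding code in lld/ELF/ScriptLexer.cpp is not shown in this review, so this is not the committed implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-alone splitter; the operator set is an assumption.
std::vector<std::string_view> tokenizeExprSketch(std::string_view S) {
  constexpr std::string_view Ops = "+-*/";
  std::vector<std::string_view> Ret;
  while (!S.empty()) {
    std::size_t E = S.find_first_of(Ops);
    if (E == std::string_view::npos) {
      Ret.push_back(S); // no operator left: the rest is one token
      break;
    }
    // E == 0 means S starts with an operator, which becomes a
    // one-character token; otherwise take everything before the operator.
    E = std::max<std::size_t>(E, 1);
    Ret.push_back(S.substr(0, E));
    S = S.substr(E);
  }
  return Ret;
}
```

The explicit template argument in std::max<std::size_t>(E, 1) matters: E is a size_t and 1 is an int, so a plain std::max(E, 1) fails template argument deduction. Clamping E to at least 1 folds the "operator at the start" case into the general one, which is what lets the suggestion drop a separate branch.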
This revision was automatically updated to reflect the committed changes.
tpimh added a subscriber: tpimh. Feb 16 2017, 2:00 AM