This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Resolve references properly when using .symver directive
ClosedPublic

Authored by grimar on May 30 2017, 8:45 AM.

Details

Summary

This is PR28414.
Previously LLD was unable to link following:
(failed with undefined symbol bar)

Version script:
SOME_VERSION { global: *; };

.global _start
.global bar
.symver _start, bar@@SOME_VERSION
_start:
  jmp bar

Manual has next description:

.symver name, name2@@nodename
In this case, the symbol name must exist and be defined within the file being assembled. It is similar to name2@nodename.
The difference is name2@@nodename will also be used to resolve references to name2 by the linker
https://sourceware.org/binutils/docs/as/Symver.html

Patch implements that. If we have name@@ver symbol and name is undefined, name@@ver is used to resolve references to name.
If name is defined then multiple definition error is emited, that is consistent with what bfd do.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.May 30 2017, 8:45 AM
grimar edited the summary of this revision. (Show Details)May 30 2017, 8:46 AM
grimar updated this revision to Diff 101173.Jun 2 2017, 1:42 AM
  • Addressed review comments.
ruiu edited edge metadata.Jun 5 2017, 5:53 AM

I don't think we want to add a new function resolve for this feature. That function seems a bit too ad-hoc. You probably should resolve any symbol whose name is "foo@@bar" as "foo" from the beginning, by stripping "@@" and any characters that follows double-atsign before inserting symbols to the symbol table.

grimar added a comment.Jun 6 2017, 3:13 AM
In D33680#772682, @ruiu wrote:

I don't think we want to add a new function resolve for this feature. That function seems a bit too ad-hoc. You probably should resolve any symbol whose name is "foo@@bar" as "foo" from the beginning, by stripping "@@" and any characters that follows double-atsign before inserting symbols to the symbol table.

I afraid I do not know how to implement your idea right now then. We add symbols at SymbolTable<ELFT>::insert(StringRef Name). There we are unable to assign symbol
versions, because we do not know if this symbol undefined or not shared may be (that means it is not a definition). At this place we have only name. And still need to parse it somehow
to assign proper version definition. This patch allows to do that in easy and clear way.

I spent not much time on this and may be missing something. Going to revisit your suggestion later.

grimar updated this revision to Diff 102014.Jun 9 2017, 5:14 AM
  • Addressed comments.
ruiu added a comment.Jun 9 2017, 2:01 PM

What this patch does is probably correct, but it doesn't look clean. The code to do one thing scattered across SymbolTable.cpp and Symbols.cpp for no reason. Let me clean this up.

Going to do an update I mentioned earlier for this one tomorrow, I am sorry for delay.

grimar updated this revision to Diff 102527.Jun 14 2017, 5:17 AM
  • Updated.

I played with few variations of this patch and found combination that looks fine for me finally.
I think this way of solving initial issue is simple and short and that is why I prefer it in general.

isDefaultVersion() looks a code duplication at first look, but it is just 3 lines and
IMO makes logic be very straightforward and allows to simplify other things, what makes this patch be very short.

grimar updated this revision to Diff 102604.Jun 14 2017, 2:12 PM
  • Fixing silly last minute mistype.
ruiu added inline comments.Jun 14 2017, 4:27 PM
ELF/SymbolTable.cpp
719–721 ↗(On Diff #102604)

You can do

return B->isInCurrentDSO() && B->getName().find("@@") != StringRef::npos;
733 ↗(On Diff #102604)

You can do

for (size_t I = 0; I < SymVector.size(); ++I)

to remove DefaultV, no?

734 ↗(On Diff #102604)

What is the cost of doing this for each symbol?

ELF/Symbols.cpp
259 ↗(On Diff #102604)

Why did you add this?

grimar updated this revision to Diff 102655.Jun 15 2017, 4:49 AM
  • Addressed review comments.
ELF/SymbolTable.cpp
719–721 ↗(On Diff #102604)

Done. FWIW at first I wanted to suggest to inline it since it is short now, but I tried to do that and found it is a bit more readable for me when it is wrapped in a method just like now.

733 ↗(On Diff #102604)

Yes, I tried something like that too when wrote it. I just found that its looks close to
mutating arguments. When you debug a loop it is probably easier to understand logic when
it's end is static. Having something like temp vector it is not ideal, but I would prefer to have temp vector than
mutate loop bound probably.

734 ↗(On Diff #102604)

I tried to link LLC binary with/wo this patch. Numbers were 5,790791071 seconds time elapsed (+- 0,23% ) vs 5,736353430 seconds time elapsed ( +- 0,32% ), so it looks like difference is zero or minimal because only computation error is visible.

ELF/Symbols.cpp
259 ↗(On Diff #102604)

Explanation is below (from my earlier answer about the same change to Rafael in llvm-mails)
Looks I should have add comment here, I did that.

Why do you needed to add "if (Config->Shared)"? The patch looks good
otherwise, but I would like to understand this part.

Cheers,
Rafael

Testcase does not specify --version-script and would fail otherwise with "has undefined version error",
after doing this change we are consistent with gnu linkers which do not require version script file for such case when
building executable.

And reason for that I believe is next. Spec (https://sourceware.org/binutils/docs/as/Symver.html) says:
"Use the .symver directive to bind symbols to specific version nodes within a source file. This is only supported on ELF platforms,
and is typically used when assembling files to be linked into a shared library. There are cases where it may make sense to use this
in objects to be bound into an application itself so as to override a versioned symbol from a shared library."
(last sentence).

Example. Imagine dso.s:

.globl foo1
foo1:
  ret
.globl foo2
foo2:
  ret

And version script ver.txt: VERSION_1.0 { global : foo1; foo2; local : *; };

Build DSO:
llvm-mc -filetype=obj -triple=x86_64-pc-linux dso.s -o dso.o
ld.bfd dso.o -shared -o dso.so --version-script ver.txt

Now we want to override one of foo() in executable. Usually version script is not provided when linking executable,
so we probably should be able just to do:

(app.s)

.global zed
.symver zed, foo1@@VERSION_1.0
zed:
  .quad 0

.global _start
_start:
 call foo1
 call foo2

llvm-mc -filetype=obj -triple=x86_64-pc-linux app.s -o app.o
ld.bfd app.o dso.so -o out

And end up with overriden foo1:

17: 00000000006003b0     0 NOTYPE  GLOBAL DEFAULT   10 foo2@@VERSION_1.0
18: 00000000006003b0     0 NOTYPE  GLOBAL DEFAULT   10 _edata
19: 00000000006003b0     0 NOTYPE  GLOBAL DEFAULT   10 _end
20: 0000000000400278     0 NOTYPE  GLOBAL DEFAULT    8 foo1@@VERSION_1.0

Without "if (Config->Shared)" we would error out here.

grimar updated this revision to Diff 102957.Jun 17 2017, 9:51 PM
  • Removed temp vector according to review suggestions. I am sorry for delay.
ruiu added inline comments.Jun 20 2017, 7:53 PM
ELF/SymbolTable.cpp
728 ↗(On Diff #102957)

Now that you can use a for-each loop, no?

ELF/Symbols.cpp
259 ↗(On Diff #102604)

Why do you want to support that behavior?

grimar updated this revision to Diff 103334.Jun 21 2017, 3:13 AM
  • Addressed review comments.
ELF/SymbolTable.cpp
728 ↗(On Diff #102957)

Ah yes. Honestly I found your comment very confusing at first. Because I used this loop assuming that SymVector may change in addRegular(). That can not happen because of find(Body->getName()) check below.

I found that even in diff where temp vector was introduced (102014), it was useless, that happened
because initially when I wrote patch I did not find() symbols, but added regulars always,
then found it is wrong (we should resolve only undefined ones) and added a check
later, before posting. I did not realize that check removes need of vector that time.

And all time after that I was still keeping in mind that code may modify SymVector in loop, but it was a mistake.

ELF/Symbols.cpp
259 ↗(On Diff #102604)

I think it is consistent with spec (please see my comment from Thu, Jun 15, 2:49 PM),
and consistent with gnu linkers, so why not to do that ?

Otherwise for testcase we will need to specify --version-script. That still works for initial PR,
but also uncommon in general for building executables I think.

ruiu added inline comments.Jun 22 2017, 3:08 PM
ELF/SymbolTable.cpp
733–734 ↗(On Diff #103334)

<name>@@<version> means the symbol is the default version. If that's the case,
the symbol is not used only to resolve <name> of version<version> but also unversioned
symbols with name <name>.

735 ↗(On Diff #103334)

Don't you want to check if a symbol returned by find is Undefined?

ELF/Symbols.cpp
259 ↗(On Diff #102604)

I don't know if this logic is correct. At least it looks odd. Shouldn't it be a check if a version script was given?

grimar updated this revision to Diff 103915.Jun 26 2017, 3:14 AM
  • Addressed review comments.
ELF/SymbolTable.cpp
735 ↗(On Diff #103334)

No, I think we should do this check.

First testcase (version-script-symver-err.s) currently fails with "error: duplicate symbol: bar".
If I would resolve only undefined symbols, it would not fail,
though this behavior is consistent with GNU linkers and looks fine for me. I do not think spec says anything about we should try to resolve only undefined symbols.

ELF/Symbols.cpp
259 ↗(On Diff #102604)

As I mentioned, looks spec allows that. I am not sure how much this case is common though.

Original PR I am trying to address here has version script in command line,
Because of that I suggest to focus on case revealed by PR. I updated testcases to have version scripts and removed this check. This fixes PR and if later we will have complains from somebody, we will be able to put this check back again.
What do you think ?

grimar updated this revision to Diff 104381.Jun 28 2017, 3:52 AM
  • Addressed review comments/suggestions.
ruiu added inline comments.Jun 28 2017, 12:29 PM
ELF/SymbolTable.cpp
742–743 ↗(On Diff #104381)

Old doesn't sound like a good name as it is not an old symbol. Since the scope of this variable is very small, just S would be enough.

745–746 ↗(On Diff #104381)

I added Symbol::copyBody, so please use that function.

grimar updated this revision to Diff 104622.Jun 29 2017, 5:45 AM
  • Addressed review comments.
ELF/SymbolTable.cpp
742–743 ↗(On Diff #104381)

Done.

745–746 ↗(On Diff #104381)

Done.

ruiu accepted this revision.Jun 29 2017, 8:45 AM

LGTM

This revision is now accepted and ready to land.Jun 29 2017, 8:45 AM
This revision was automatically updated to reflect the committed changes.
grimar updated this revision to Diff 105086.Jul 3 2017, 8:41 AM
  • Reopening reverted patch with fix for FreeBSD failture testcase provided by Ed Maste.
grimar reopened this revision.Jul 3 2017, 8:41 AM
This revision is now accepted and ready to land.Jul 3 2017, 8:41 AM
grimar requested review of this revision.Jul 3 2017, 8:41 AM
grimar edited edge metadata.
ruiu accepted this revision.Jul 3 2017, 12:52 PM

LGTM

This revision is now accepted and ready to land.Jul 3 2017, 12:52 PM
This revision was automatically updated to reflect the committed changes.
orivej added a subscriber: orivej.Jul 6 2017, 6:46 PM

lld incorrectly resolves versioned symbols to zeros when they are in a static library. Here is an example that should loop, but it crashes: https://gist.github.com/orivej/8379fbd550022e966f77d1516e31702a

grimar added a comment.Jul 7 2017, 1:23 AM

lld incorrectly resolves versioned symbols to zeros when they are in a static library. Here is an example that should loop, but it crashes: https://gist.github.com/orivej/8379fbd550022e966f77d1516e31702a

D35059 fixes that too.