This is an archive of the discontinued LLVM Phabricator instance.

[lld:elf] Weaken the requirement for a computed binding to be STB_LOCAL
ClosedPublic

Authored by lanza on Mar 16 2021, 1:41 AM.

Details

Summary

Given the following scenario:

// Cat.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };

extern "C" int puts(char const *);
void Cat::makeNoise() const { puts("Meow"); }
void doThingWithCat(Animal *a) { static_cast<Cat *>(a)->makeNoise(); }

// CatUser.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };

void doThingWithCat(Animal *a);

void useDoThingWithCat() {
  Cat *d = new Cat;
  doThingWithCat(d);
}

// cat.ver
{
  global: _Z17useDoThingWithCatv;
  local: *;
};

$ clang++ Cat.cpp CatUser.cpp -fpic -flto=thin -fwhole-program-vtables -shared -O3 -fuse-ld=lld -Wl,--lto-whole-program-visibility -Wl,--version-script,cat.ver

We cannot devirtualize Cat::makeNoise. The issue is complex:

Due to -fsplit-lto-unit and usage of type metadata, we place the Cat vtable declaration into module 0 and the Cat vtable definition with type metadata into module 1, causing duplicate entries (Undefined followed by Defined) in the lto::InputFile::symbols() output.
In BitcodeFile::parse, after processing the Undefined then the Defined, the final state is Defined.
In BitcodeCompiler::add, for the first symbol, computeBinding returns STB_LOCAL, then we reset it to Undefined because it is prevailing (versionId is preserved). For the second symbol, because the state is now Undefined, computeBinding returns STB_GLOBAL, causing ExportDynamic to be true and suppressing devirtualization.

In D77280, the computeBinding change used a stricter isDefined() condition to make weak`Lazy symbol work.
This patch relaxes the condition to weaker !isLazy() to keep it working while making the devirtualization work as well.

Diff Detail

Event Timeline

lanza created this revision.Mar 16 2021, 1:41 AM
lanza requested review of this revision.Mar 16 2021, 1:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2021, 1:41 AM

The summary says !isLocal but the code says !isLazy. I'm assuming the code is correct.

Can you add a test case?

The summary says !isLocal but the code says !isLazy. I'm assuming the code is correct.

Correct, typo.

Can you add a test case?

Yea of course, just want Fangrui's feedback before I finish minimizing my larger c++ test case.

smeenai resigned from this revision.Mar 16 2021, 10:14 AM

Sounds good. I'll let Fangrui take a look then :)

lanza edited the summary of this revision. (Show Details)Mar 16 2021, 10:32 AM
MaskRay added a comment.EditedMar 16 2021, 11:06 AM

b.cpp being part of the link causes the vtbl for _ZTV3Cat to be STB_GLOBAL since it's external.

Can you elaborate what happens? b.o has only one symbol _Z14doThingWithCatP3Cat. There is no vtable symbol...

Only canBeVersioned symbols can have a versionId value not equal to VER_NDX_GLOBAL. How does the isDefined() -> !isLazy() change affect behaviors?

The isDefined() requirement is to make version-script-weak.s work: a STB_WEAK lazy symbol, if not fetched, should be treated similar to an undefined weak symbol. An undefined symbol should not have non-VER_NDX_GLOBAL versionId values.

lanza added a comment.EditedMar 16 2021, 6:02 PM

The isDefined() requirement is to make version-script-weak.s work: a STB_WEAK lazy symbol, if not fetched, should be treated similar to an undefined weak symbol. An undefined symbol should not have non-VER_NDX_GLOBAL versionId values.

Can you elaborate what happens? b.o has only one symbol _Z14doThingWithCatP3Cat. There is no vtable symbol...

Yup, sorry, my minimized case was too minimized. Here's one that performs devirtualization with my change present and does not without it.

a.cpp

struct Animal {
  virtual void makeNoise() const = 0;
};

struct Cat : public Animal {
  void makeNoise() const override;
};

extern "C" int puts(char const *);

void Cat::makeNoise() const { puts("Meow"); }

void doThingWithCat(Animal *a) {
  if (auto d = reinterpret_cast<Cat *>(a))
    d->makeNoise();
}

`b.cpp

struct Animal {
  virtual void makeNoise() const = 0;
};

struct Cat : public Animal {
  void makeNoise() const override;
};

void doThingWithCat(Animal *a);

void useDoThingWithCat() {
  Cat *d = new Cat;
  doThingWithCat(d);
}

version.exp

{
  global:
    _Z17useDoThingWithCatv;
  local:
  *;
};

clang++-13 invocation

clang++ Cat.cpp CatUser.cpp -o libA.so -shared -std=c++20 -O3 -fuse-ld=lld \
                      -fwhole-program-vtables -flto=thin \
                      -Wl,--lto-whole-program-visibility \
                      -Wl,--version-script,version.exp \
                      -Wl,--opt-remarks-format,yaml -Wl,--opt-remarks-filename,libA.yaml -fpic

_ZTV3Cat occurs 4 times here -- twice for each file due to split LTO. Two instances have isDefined() == true and two have isDefined() == false. My change instead tests !isLazy() which they are both non-lazy.

Only canBeVersioned symbols can have a versionId value not equal to VER_NDX_GLOBAL. How does the isDefine) -> !isLazy() change affect behaviors?

I'm guessing that since the symbol is seen and defined during the initial symbol collection from Cat.cpp.o that the vtbl is isDefined() by the time the parsing of the version-script happens. Thus the version is marked as local but it is marked as UndefinedKind.

MaskRay added a comment.EditedMar 16 2021, 7:49 PM

clang++-13 invocation

clang++ Cat.cpp CatUser.cpp -o libA.so -shared -std=c++20 -O3 -fuse-ld=lld \
                      -fwhole-program-vtables -flto=thin \
                      -Wl,--lto-whole-program-visibility \
                      -Wl,--version-script,version.exp \
                      -Wl,--opt-remarks-format,yaml -Wl,--opt-remarks-filename,libA.yaml -fpic

_ZTV3Cat occurs 4 times here -- twice for each file due to split LTO. Two instances have isDefined() == true and two have isDefined() == false. My change instead tests !isLazy() which they are both non-lazy.

I see what you mean. With the change (the Cat vtable is now local) devirtualization can happen.

% fld.lld @response.txt -mllvm -pass-remarks=. --opt-remarks-format yaml --opt-remarks-filename xx.yaml
<unknown>:0:0: single-impl: devirtualized a call to _ZNK3Cat9makeNoiseEv
<unknown>:0:0: single-impl: devirtualized a call to _ZNK3Cat9makeNoiseEv
<unknown>:0:0: _ZNK3Cat9makeNoiseEv inlined into _Z14doThingWithCatP6Animal with (cost=0, threshold=375)
<unknown>:0:0: _ZNK3Cat9makeNoiseEv inlined into _Z14doThingWithCatP6Animal with (cost=0, threshold=375)
<unknown>:0:0: _Z14doThingWithCatP6Animal inlined into _Z17useDoThingWithCatv with (cost=0, threshold=375)

The problem is due to duplicate entries in the IR symbol table in -fsplit-lto-unit mode (default if -flto={thin,full} -fwhole-program-vtables)

I did notice that the IR symbol table can have duplicate entries in other cases (llvm-nm/lld will notice the issue; llvm-lto2 dump-symtab merges duplicate entries).
For example, with module level inline asm, the undefined symbols are placed in the end.
If a referenced symbol is defined in the IR, you may notice two entries.
It would not matter if IR symbol table can make sure defined symbols are always ordered before undefined ones. (That is still brittle.)

In this case, before the undefine symbol precedes the define one. The duplication is more harmful.

% fllvm-nm a.o
---------------- T _Z14doThingWithCatP6Animal
---------------- T _ZNK3Cat9makeNoiseEv
                 U _ZNK3Cat9makeNoiseEv
---------------- D _ZTI3Cat
                 U _ZTI3Cat  ## duplicate, but benign because defined precedes undefined
---------------- W _ZTI6Animal
---------------- D _ZTS3Cat
---------------- W _ZTS6Animal
                 U _ZTV3Cat
---------------- D _ZTV3Cat  ## duplicate and harmful, because undefined precedes defined
                 U _ZTVN10__cxxabiv117__class_type_infoE
                 U _ZTVN10__cxxabiv120__si_class_type_infoE
                 U puts

So I think this patch is a workaround. I will take a look at the IR symbol table issue.

Only canBeVersioned symbols can have a versionId value not equal to VER_NDX_GLOBAL. How does the isDefine) -> !isLazy() change affect behaviors?

I'm guessing that since the symbol is seen and defined during the initial symbol collection from Cat.cpp.o that the vtbl is isDefined() by the time the parsing of the version-script happens. Thus the version is marked as local but it is marked as UndefinedKind.

The issue here is due to IR symbol table's problem. If it is fixed, then we should have this invariant: between consecutive computeBinding calls on a symbol, a Defined cannot downgrade to others.

The only place where we change a Defined to an Undefined is in LTO.cpp when the symbol is prevailing, however, the Undefined will become Defined again when LTO has generated object files. computeBinding is not called in between.

MaskRay added a subscriber: tejohnson.EditedMar 16 2021, 11:47 PM

OK, I think IR symbol table probably should not be fixed. isDefined -> !isLazy on lld side looks fine to me.

llvm::lto::InputFile::symbols() may return duplicate symbols because

(1) A symbol may appear in multiple modules in a regular LTO bitcode file (llvm-cat a.bc b.bc).
(2) Global values and assembly symbols are concatenated (ModuleSymbolTable::addModule) without deduplication.
(3) Due to -fsplit-lto-unit -flto={thin,full}, a symbol may appear as an undefined in module 0 and as an defined in module 1 (e.g. a vtable symbol with the !type metadata).

LTO::addRegularLTO and lto::InputFile::create need to be synchronized on the symbol table. The symbols from multiple modules are concatenated.
While I thought deduplicating the symbol table for (2) and (3) would be nice, (1) and (2) and the current symbol table representation make deduplication inconvenient.
In addition, the current symbol resolution communication from various customers (different ports of lld and LLVMgold.so) to lib/LTO can work around duplicate symbol table entries easily (the r.Prevailing = line, its comment is a bit outdated though).
(@tejohnson )

So I think we should just live with duplicate symbols and don't bother fix (2) and (3), at least before something more clumsy is known.
For -fwhole-program-vtables, (1) and (2) are special cases. (3) is a practical usage (-shared + local: version node in a version-script) and we should optimize it if we can.
And we can indeed optimize it with such a one-line change:)

You need to add the Cat/CatUser test as a new file (named devirt_vcall*.ll).

lanza updated this revision to Diff 331429.Mar 17 2021, 5:56 PM

Add test case

lanza added a comment.Mar 17 2021, 6:03 PM

Yea, I figured that this was likely a fragile ordering issue. I was thinking about an approach but didn't know enough about lld outside of the standard LTO code path, thus figured I should ask here first. Thanks for the feedback!

MaskRay added a comment.EditedMar 17 2021, 6:56 PM

I'd completely rewrite the description. Consider this:

// Cat.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };

extern "C" int puts(char const *);
void Cat::makeNoise() const { puts("Meow"); }
void doThingWithCat(Animal *a) { static_cast<Cat *>(a)->makeNoise(); }

// CatUser.cpp
struct Animal { virtual void makeNoise() const = 0; };
struct Cat : Animal { void makeNoise() const override; };

void doThingWithCat(Animal *a);

void useDoThingWithCat() {
  Cat *d = new Cat;
  doThingWithCat(d);
}
// cat.ver
{
  global: _Z17useDoThingWithCatv;
  local: *;
};
clang++ Cat.cpp CatUser.cpp -fpic -flto=thin -fwhole-program-vtables -shared -O3 -fuse-ld=lld -Wl,--lto-whole-program-visibility -Wl,--version-script,cat.ver

We fail to devirtualize makeNoise. The issue is complex:

  1. Due to -fsplit-lto-unit and usage of type metadata, we place the Cat vtable declaration into module 0 and the Cat vtable definition with type metadata into module 1, causing duplicate entries (Undefined followed by Defined) in the lto::InputFile::symbols() output.
  2. In BitcodeFile::parse, after processing the Undefined then the Defined, the final state is Defined.
  3. In BitcodeCompiler::add, for the first symbol, computeBinding returns STB_LOCAL, then we reset it to Undefined because it is prevailing (versionId is preserved). For the second symbol, because the state is now Undefined, computeBinding returns TB_GLOBAL, causing ExportDynamic to be true and suppressing devirtualization.

In D77280, the computeBinding change used a stricter isDefined() condition to make weak Lazy symbol work.
This patch relaxes the condition to weaker !isLazy() to keep it working while making computeBinding return true for the second symbol, thus enabling devirtualization.

lld/test/ELF/lto/devirt_extern_vtable.ll
6

You can use split-file to place auxiliary files in one file. See some newer tests for example.

11

nit: | \ (the previous line instead of the continuation line)

54

Can you compile with higher optimization and remove unneeded instructions?

84

Attributes need clean-up.

89

Delete unneeded module flags and !llvm.linker.options.

MaskRay added inline comments.Mar 17 2021, 7:17 PM
lld/test/ELF/lto/devirt_extern_vtable.ll
1

devirt-local-split-unit

lanza edited the summary of this revision. (Show Details)Mar 17 2021, 7:23 PM
lanza added a comment.Mar 17 2021, 7:26 PM

I'd completely rewrite the description. Consider this:

👍

lld/test/ELF/lto/devirt_extern_vtable.ll
1

You mean for the file name?

MaskRay added inline comments.Mar 17 2021, 7:58 PM
lld/test/ELF/lto/devirt_extern_vtable.ll
1

yes, hmm may be devirt-split-unit-localize.ll

lanza marked 6 inline comments as done.Mar 17 2021, 8:34 PM
lanza updated this revision to Diff 331453.Mar 17 2021, 8:36 PM

Address Fangrui's comments

lanza updated this revision to Diff 331454.Mar 17 2021, 8:40 PM

Fix check in test

MaskRay accepted this revision.Mar 18 2021, 10:19 AM

Thanks!

This revision is now accepted and ready to land.Mar 18 2021, 10:19 AM