This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Skip repeated libraries.
Needs RevisionPublic

Authored by ikudrin on Oct 6 2020, 10:04 PM.

Details

Summary

If a library is added in the link several times, there is no need to process it more than once, because LLD is able to resolve back refs. The optimization is especially useful for the dependent libraries feature, where the same library may be referenced many times in input files.

Diff Detail

Event Timeline

ikudrin created this revision.Oct 6 2020, 10:04 PM
ikudrin requested review of this revision.Oct 6 2020, 10:04 PM
MaskRay requested changes to this revision.EditedOct 6 2020, 10:38 PM

I am skeptical about the change.

  • The duplicated libraries are usually indicator of user errors or build system brittleness.
  • The current way the state is introduced is incompatible with the direction on LLD-as-a-library (D88348) (This can be fixed).
  • The --verbose now diverges from GNU ld and gold. (This is usually a tie-breaker instead of a decisive reason.)
  • When duplicated libraries happen, the archive index can guarantee that the performance is not degraded.
This revision now requires changes to proceed.Oct 6 2020, 10:38 PM

I am skeptical about the change.

  • The duplicated libraries are usually indicator of user errors or build system brittleness.

The provided optimization is aimed to slightly improve the dependent libraries' case, where lots (hundreds) of references is very common. In a regular link, its impact is not expected to be noticeable.

  • The current way the state is introduced is incompatible with the direction on LLD-as-a-library (D88348) (This can be fixed).

Do you mean that addedLibraries is not reset on reenter? If so, the same is true for other members of LinkerDriver as well, so the patch does not make the situation worse.

  • The --verbose now diverges from GNU ld and gold. (This is usually a tie-breaker instead of a decisive reason.)

Well, we should not imitate GNU linkers in every nuance, especially in the internals, aren't we?

  • When duplicated libraries happen, the archive index can guarantee that the performance is not degraded.

The more repeated libraries referenced, the more time and memory the optimization saves. Even reading and basically parsing the archive files takes some time, and if there are hundreds of them, the improvement becomes noticeable. Of course, saving is not huge, but why not have it if the implementation is cheap?

I am skeptical about the change.

  • The duplicated libraries are usually indicator of user errors or build system brittleness.

The provided optimization is aimed to slightly improve the dependent libraries' case, where lots (hundreds) of references is very common. In a regular link, its impact is not expected to be noticeable.

  • The current way the state is introduced is incompatible with the direction on LLD-as-a-library (D88348) (This can be fixed).

Do you mean that addedLibraries is not reset on reenter? If so, the same is true for other members of LinkerDriver as well, so the patch does not make the situation worse.

  • The --verbose now diverges from GNU ld and gold. (This is usually a tie-breaker instead of a decisive reason.)

Well, we should not imitate GNU linkers in every nuance, especially in the internals, aren't we?

  • When duplicated libraries happen, the archive index can guarantee that the performance is not degraded.

The more repeated libraries referenced, the more time and memory the optimization saves. Even reading and basically parsing the archive files takes some time, and if there are hundreds of them, the improvement becomes noticeable. Of course, saving is not huge, but why not have it if the implementation is cheap?

The implementation may look cheap but "the provided optimization is aimed to slightly improve the dependent libraries' case" The benefit may not be noticeable. So far I am on the fence. However, then there are a few downsides including the incompatibility with --warn-backrefs and --verbose output. (The !config->formatBinary && !inWholeArchive condition looks subtle and can be error-prone as well)

I've made some tests and speculatively estimate the saving up to several seconds for a relatively large library (about 100000 symbols) which is referenced several hundred times. For me, that is noticeable.