This is an archive of the discontinued LLVM Phabricator instance.

[LLD] Search archives for non-tentative defintions.
ClosedPublic

Authored by sfertile on Aug 18 2020, 7:21 AM.

Details

Summary

Work in progress patch to spur the discussion. Motivated by bfds common symbol handling in archives which is needed by FORTRAN libraries which use CommonBlock/BlockData .

When an object symbol is a tentative definition, we search archives for 'real' definitions to override the common symbol.

Diff Detail

Event Timeline

sfertile created this revision.Aug 18 2020, 7:21 AM
Herald added a project: Restricted Project. · View Herald Transcript
sfertile requested review of this revision.Aug 18 2020, 7:21 AM
NeHuang added inline comments.Aug 18 2020, 9:48 AM
lld/ELF/InputFiles.cpp
1172

nit: remove the empty line.

lld/test/ELF/common-archive-lookup.s
1

The test cases are failing on linux and windows in phabricator unit test, are we missing -triple=powerpc64le or -triple=powerpc64 when running llvm-mc?

28

nit: definitions

28

nit: missing period.

80

nit: referenced

103

Can we remove the comments # in block definitions?

158

nit: initialized

160

nit: TEST1-LABEL:

166

nit: definition from the object file, and the definitions from the archive.

169

nit: TEST2-LABEL:

The general idea makes sense to me.
I have a set of mostly minor comments here and there.

lld/ELF/InputFiles.cpp
1206

You can probably make this static since you only call it from one place that I can see.
Same with the function below.

lld/ELF/Symbols.cpp
705

You probably don't need to save and restore the type here.
You know based on the guard that the symbol is isObject() so you can just use:

replace(other);
type = llvm::ELF::STT_OBJECT;
stefanp added inline comments.Aug 18 2020, 1:33 PM
lld/ELF/InputFiles.cpp
1212

Question:
Why do you use Expected<> here?
Won't getName() always return a valid StringRef?

Since you have this here I assume you have come across a situation where you need it.

1214

I think that by definition we cannot have both sym.isDefined() and sym.isCommon() to be both true at the same time.

bool isCommon() const { return symbolKind == CommonKind; }
bool isDefined() const { return symbolKind == DefinedKind; }

You might be able to just use

return sym.isDefined();
lld/test/ELF/common-archive-lookup.s
31

nit:
Do you need the ABI version?

MaskRay added a comment.EditedAug 18 2020, 10:04 PM

Alan Modra kindly shared with me the history
https://sourceware.org/pipermail/binutils/1999-December/002952.html See
elf_link_is_defined_archive_symbol in bfd/elflink.c, which checks whether
a common symbol is a regular definition in an archive member.
LLD's current semantics are compatible with gold, and GNU ld 20 years ago
(before https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a3a8c91d411abe91720a2ac92b8140e1bdb41282)
(I have to blame Sun/HP ld at that time... Ian Lance Taylor said
"It's hard to know what to do with common symbols, because they do not
have clearly defined semantics. Different people have different
expectations of them." It was so unfortunate they picked such strange expectations)

Before we jump directly into a fix, can you create a FORTRAN example where this
special common symbol handling is reasonably used? (I use uppercase FORTRAN
because I think this is FORTRAN 77 legacy...) I know that it matches GNU ld, but
this is rather bizarre. I hope the additional (very high; likely 100+ lines)
complexity in ArchiveFile and LazyObjFile's symbol resolution can be justified.

Note that regular overriding common has an analog: global overriding weak.
Suppose both libweak.a and libglobal.a define a symbol which is referenced by undef.o.
Let's consider two link orders:

  • undef.o -lweak -lglobal will pick libweak.a and ignore libglobal.a if libglobal.a does not need fetching.
  • -lglobal undef.o -lweak does not fetch libglobal.a at all

So to provide a weak definition while allowing an optional strong definition, the strong definition needs to be surrounded in --whole-archive. Can we ask the user to
do so the similar thing for the regular overriding common case? :)

There are 3 directions:

  • We don't need to address it at all. However, it seems that you may raise a reasonable FORTRAN use case. Still, hope you can consider my previous global vs weak example.
  • Add a Mach-O style -c option to llvm-ar (ranlib): don't add common symbols to the archive index. See below. (LazyObjFile will be inconsistent but your user does not likely use it I guess)
  • We patch LLD.

If we end up patching LLD, we should avoid making Symbol::resolveLazy complex.
We need to think hard doing symbol resolution in batch, in
ArchiveFile/LazyObjFile::parse. For the tests, all the code is insignificant.
My example at https://sourceware.org/pipermail/binutils/2020-August/112878.html
is probably sufficient. As you can see, bitcode LazyObjFile requires special handling.
I need to think about the interaction and I may be able to send a patch.

In ELF the last paragraph of the section on STT_COMMON only mentions that a dynamic linker may change its resolution rules to prefer non-common over a common definition. I'm guessing extending this to static linker is something that some other linkers have done and there are some programs that currently depend on that behaviour.

Symbols with type STT_COMMON label uninitialized common blocks. In relocatable objects, these symbols are not allocated and must have the special section index SHN_COMMON (see below). In shared objects and executables these symbols must be allocated to some section in the defining object.

In relocatable objects, symbols with type STT_COMMON are treated just as other symbols with index SHN_COMMON. If the link-editor allocates space for the SHN_COMMON symbol in an output section of the object it is producing, it must preserve the type of the output symbol as STT_COMMON.

When the dynamic linker encounters a reference to a symbol that resolves to a definition of type STT_COMMON, it may (but is not required to) change its symbol resolution rules as follows: instead of binding the reference to the first symbol found with the given name, the dynamic linker searches for the first symbol with that name with type other than STT_COMMON. If no such symbol is found, it looks for the STT_COMMON definition of that name that has the largest size.

Until recently -fcommon was the default for C programs in Clang and GCC as well, although not for C++. If we do change then this could have an affect wider than Fortran. I can think of a couple of things:

  • Some programs will see different library members being selected, potentially changing behaviour.
  • For every common definition in the program we are doing library searches and potentially expensive fetches of objects from libraries.

Given that BFD supports this behaviour but Gold and LLD do not I suspect that the risk of breaking existing programs is small, especially as LLD tends to be used with more recent programs where -fno-common is the default. However I think we should be cautious and perhaps enable under a command line option, especially as it looks like this is a convention rather than part of a specification. If anyone has better references, then I'm happy to retract.

I've spent some time reading through the linked thread. Thanks. There is a lot to reply to so I hope I don't miss anything.

Before we jump directly into a fix, can you create a FORTRAN example where this special common symbol handling is reasonably used?

The Engineering and Scientific Subroutine Library (ESSL) uses common block and block data. One case that shows this problem is when using DRCFTD function, which calls an internal routine that uses a table of trig values. If we link the library into a shared object we get the expected behavior: the file with the block data overrides all the common blocks and the global symbol is properly initialized in the shared object. If we package the library into an archive and link against that then we end up missing the strong definition since an earlier object provides the tentative definition and the global symbol is zero initialized, producing broken runtime behavior.

Note that regular overriding common has an analog: global overriding weak.

I agree that the special handling for commons is ugly, and I really don't like the asymmetry it produces in relation to weak symbols any more then you do. If we approached the problem strictly from an ELF semantics perspective, then the consistency of golds behavior is arguably desirable. From a usability perspective though, our archive member handling breaks a common idiom in an old but still used dialect of a rather important language. The interaction of common symbols and archive member inclusion unfortunately breaks the FORTRAN language semantics. My understanding is there are lots of modern(ish) programs (not necessarily written in fortran) that link against old FORTRAN libraries and so IMO the trade off in consistency is worth it for the usability.

I don't think using --whole-archives to pull in the entire archive is a reasonable fix. With the libraries I was using, linking in whole archives for all the FORTRAN libraries failed due to multiply defined symbols. In the cases where a library is structured in a way that we can use the whole archive option it will lead to binary bloat in any program that needs to link against an old Fortran library.

Given that BFD supports this behaviour but Gold and LLD do not I suspect that the risk of breaking existing programs is small, especially as LLD tends to be used with more recent programs where -fno-common is the default. However I think we should be cautious and perhaps enable under a command line option, especially as it looks like this is a convention rather than part of a specification. If anyone has better references, then I'm happy to retract.

I agree the new behavior needs an option to enable it and disable it. Initially I considered having the Fortran driver pass the option to LLD, but after discussing it offline with @rzurob I think it might be best to enable the behavior by default. The consumer of these libraries aren't necessarily Fortran programs, and as you mentioned its likely to be lower risk because we are adopting a behavior already implemented in ld.bfd.

If we end up patching LLD, we should avoid making Symbol::resolveLazy complex.
We need to think hard doing symbol resolution in batch, in
ArchiveFile/LazyObjFile::parse. For the tests, all the code is insignificant.
My example at https://sourceware.org/pipermail/binutils/2020-August/112878.html
is probably sufficient. As you can see, bitcode LazyObjFile requires special handling.
I need to think about the interaction and I may be able to send a patch.

That is why I posted a WIP patch early. So we could figure out what the best direction is. I'm open to any suggestions.

MaskRay added a comment.EditedAug 24 2020, 4:49 PM

Given that BFD supports this behaviour but Gold and LLD do not I suspect that the risk of breaking existing programs is small, especially as LLD tends to be used with more recent programs where -fno-common is the default. However I think we should be cautious and perhaps enable under a command line option, especially as it looks like this is a convention rather than part of a specification. If anyone has better references, then I'm happy to retract.

I agree the new behavior needs an option to enable it and disable it. Initially I considered having the Fortran driver pass the option to LLD, but after discussing it offline with @rzurob I think it might be best to enable the behavior by default. The consumer of these libraries aren't necessarily Fortran programs, and as you mentioned its likely to be lower risk because we are adopting a behavior already implemented in ld.bfd.

There are 4 combinations: (ArchiveFile,LazyObjFile) x (ELF, BitcodeFile). LazyObjFile is for --start-lib/--end-lib. Do you think whether it is worth making LazyObjFile or BitcodeFile consistent with ArchiveFile x ELF behavior? If we add an option (say, --fortran-linking, the Dec 1999 binutils thread suggested but did not adopt), we will have better justification that we don't make LazyObjFile or BitcodeFile consistent.

I do worry a bit about the time complexity of the regular-overriding-common behavior, which may be a problem for my internal users (Bazel: --start-lib; we use very few ArchiveFiles) and external users (mostly ArchiveFiles).

If we use an option --fortran-linking, we can mostly get away with the time complexity problem, because most users will not pass the option and will not observe any potential performance problems. The time complexity problem can be solved by looking up symbols in batch but the overall complexity may be larger.

If we end up patching LLD, we should avoid making Symbol::resolveLazy complex.
We need to think hard doing symbol resolution in batch, in
ArchiveFile/LazyObjFile::parse. For the tests, all the code is insignificant.
My example at https://sourceware.org/pipermail/binutils/2020-August/112878.html
is probably sufficient. As you can see, bitcode LazyObjFile requires special handling.
I need to think about the interaction and I may be able to send a patch.

That is why I posted a WIP patch early. So we could figure out what the best direction is. I'm open to any suggestions.

Thanks!

Given that BFD supports this behaviour but Gold and LLD do not I suspect that the risk of breaking existing programs is small, especially as LLD tends to be used with more recent programs where -fno-common is the default. However I think we should be cautious and perhaps enable under a command line option, especially as it looks like this is a convention rather than part of a specification. If anyone has better references, then I'm happy to retract.

I agree the new behavior needs an option to enable it and disable it. Initially I considered having the Fortran driver pass the option to LLD, but after discussing it offline with @rzurob I think it might be best to enable the behavior by default. The consumer of these libraries aren't necessarily Fortran programs, and as you mentioned its likely to be lower risk because we are adopting a behavior already implemented in ld.bfd.

There are 4 combinations: (ArchiveFile,LazyObjFile) x (ELF, BitcodeFile). LazyObjFile is for --start-lib/--end-lib. Do you think whether it is worth making LazyObjFile or BitcodeFile consistent with ArchiveFile x ELF behavior? If we add an option (say, --fortran-linking, the Dec 1999 binutils thread suggested but did not adopt), we will have better justification that we don't make LazyObjFile or BitcodeFile consistent.

IMO, it is important to make it consistent.

I do worry a bit about the time complexity of the regular-overriding-common behavior, which may be a problem for my internal users (Bazel: --start-lib; we use very few ArchiveFiles) and external users (mostly ArchiveFiles).

I think there should not be a problem with time complexity, because there is only extra work to do in uncommon situations:

  1. Only if you see a common symbol, do you need to try to find additional lazy definitions of the same symbol name. This will be very rare for C code now, due to -fno-common being the default.
  2. Only if the common symbol's name is present in another archive symbol-table do you need to load the archive member to check whether that second definition is also common or not. This will be even more rare, since it will only trigger, in C code, when you have an actually-incorrect use (e.g. "int x" in a header instead of "extern int x").

If we use an option --fortran-linking, we can mostly get away with the time complexity problem, because most users will not pass the option and will not observe any potential performance problems. The time complexity problem can be solved by looking up symbols in batch but the overall complexity may be larger.

I think it's unnecessary to have an option, due to the above. C users should not generally observe performance problems with this feature, because it won't trigger for them.

sfertile updated this revision to Diff 291913.Sep 15 2020, 7:31 AM

Extended the patch so that lazy object files are handled the same as archives, and bitcode files are handled same as native objects, both in archives and as lazy object files.

MaskRay added inline comments.Sep 20 2020, 11:00 AM
lld/ELF/InputFiles.cpp
1208

range-based for

1212

if is misaligned. equals -> ==

1220

static

1227

range based for

lld/ELF/Symbols.cpp
696

Delete isObject()

A common symbol is a subset of an object symbol. (If it doesn't, it is an error)

lld/test/ELF/Inputs/blockdata.ll
1 ↗(On Diff #291913)

Use RUN: split-file for short files.

Replace double array with a simple i32.

lld/test/ELF/Inputs/commonblock.ll
6 ↗(On Diff #291913)

Keeping one function is sufficient to demonstrate the effects.

lld/test/ELF/common-archive-lookup.s
67

The test is much more complex than it needs to be. You just need one reference. All the setup (TOC, addi, .localentry) and teardown (blr) instructions should be deleted not to harm readability.

sfertile updated this revision to Diff 293830.Sep 23 2020, 12:02 PM

Address review comments, switched test to use split-file instead of other inputs and macros, and simplified the test as much as possible.

sfertile marked 16 inline comments as done.Sep 23 2020, 12:05 PM

We usually use x86 for generic feature tests (sorry). I am fine with ppc but the test is probably worth a comment that some FORTRAN libraries on ppc require this behavior (i.e. if you no longer require this behavior and if we need some symbol table refactoring where this behavior gets in the way, we should have the right to revisit the decision. SHN_COMMON will get more and more obsoleted and hope we don't need this particular behavior in the future ;-) )

Gold behaves like LLD and internally we have much Fortran code (x86/ppc) which does not tickle this property:)

https://sourceware.org/pipermail/binutils/2020-August/112926.html

COMMON is also one of those dark corners with multiple implementations, so I'm not at all surprised to learn that some linkers handle it differently. Most code doesn't tickle those cases, so these differences generally go unnoticed.

In any case I have to do a large run internally because we still use -fcommon :)

lld/ELF/InputFiles.cpp
1210

sym.isGlobal() should be checked along with the name, because a local symbol should be ignored.

1224

Nit: parens around == are unneeded

1230

static

lld/ELF/Symbols.cpp
693

static

lld/test/ELF/common-archive-lookup.s
2

We usually use x86 for generic features (sorry). I am fine with ppc but this is probably worth a comment that some FORTRAN libraries on ppc require this behavior (i.e. if you no longer require this behavior and if we need some symbol table refactoring where this behavior gets in the way, we should have the right to revisit the decision. SHN_COMMON will get more and more obsoleted and hope we don't need this particular behavior in the future ;-) )

Gold behaves like LLD and internally we have much Fortran code (x86/ppc) which does not tickle this property:)

https://sourceware.org/pipermail/binutils/2020-August/112926.html

COMMON is also one of those dark corners with multiple implementations, so I'm not at all surprised to learn that some linkers handle it differently. Most code doesn't tickle those cases, so these differences generally go unnoticed.

In any case I have to do a large run internally because we still use -fcommon :)

61

For newer tests we use ## for non-RUN-non-CHECK comments.

95

.comm changes st_type, so .type is redundant.

135

Don't mix different indentations. 2 may be sufficient.

sfertile updated this revision to Diff 294399.Sep 25 2020, 12:24 PM
  • Add static on helper functions.
  • Use ## for comments in lit test.
  • Remove redundant parens
  • Ignore locals when looking for overriding symbol def.
sfertile added inline comments.Sep 25 2020, 12:40 PM
lld/test/ELF/common-archive-lookup.s
2

In any case I have to do a large run internally because we still use -fcommon :)

Sorry for the churn. On the bright side if this change impacts any internal projects you can use it to help convince them to stop using -fcommon :).

this is probably worth a comment that some FORTRAN libraries on ppc require this behavior.

The motivating example is ESSL on PPC, but this isn't a PPC distinct problem. From what our fortran compiler devs tell me, its not unusual for heavy fortran users to have one or two ancient libraries they depend on that were written in pre-F90, and no one touches them to port to newer language dialects.

In any case I have to do a large run internally because we still use -fcommon :)

Sorry for the churn. On the bright side if this change impacts any internal projects you can use it to help convince them to stop using -fcommon :).

Indeed I have noticed some dependent behaviors. Only two problems (potential ODR violation in an openssl; a qemu not using -fno-common) with a little noise. It is likely we can fix the issues in one week.

MaskRay added inline comments.Sep 28 2020, 2:44 PM
lld/ELF/InputFiles.h
312

name

lld/ELF/Symbols.cpp
706

dyn_cast -> cast ?

sfertile updated this revision to Diff 301308.Oct 28 2020, 9:31 AM
  • Added comment in the test describing why the behaviour change is needed.
  • Converted a dyn_cast to a cast.
  • Fixed paramater name to use lowercase.
  • Fixed .size directives in the test to refer to the correct object.

...
Given that BFD supports this behaviour but Gold and LLD do not I suspect that the risk of breaking existing programs is small, especially as LLD tends to be used with more recent programs where -fno-common is the default. However I think we should be cautious and perhaps enable under a command line option, especially as it looks like this is a convention rather than part of a specification. If anyone has better references, then I'm happy to retract.

The code change looks good to me, but I hope @psmith and @grimar can take a look. The question from Peter stands (if we adopt an option it could be --fortran-linking)

lld/ELF/InputFiles.cpp
1205

The two functions need a detailed explanation why it does so. It can quote the thread https://sourceware.org/pipermail/binutils/2020-August/112878.html

I've left a few small suggestions but overall no objections. Given that this is the default BFD behaviour and programs that have only been linked with LLD are not likely to use common; I think that if this behaviour is to be put under an option, it is on by default. This will permit users to use both GNU ld and lld with the same command line.

lld/ELF/InputFiles.cpp
1205

Uncommon could be interpreted as rare rather than not. Perhaps use Noncommon or NonCommon instead?

lld/ELF/InputFiles.h
310

strong isn't defined in ELF, although intuitively I know what you mean. Consider:
"Check if a non-common symbol should be fetched to override a common definition."

329

Same comment about strong

lld/ELF/Symbols.cpp
705

Could this be else if (auto *loSym = dyn_cast<LazyObject>(&other)) ?

sfertile updated this revision to Diff 303160.Nov 5 2020, 10:18 AM
sfertile marked an inline comment as done.
  • Add comments about why we have the special lookup behaviour for common in archives..
  • Use NonCommon as opposed to UnCommon
  • switch second if to be an else if instead.

I think that if this behaviour is to be put under an option, it is on by default. This will permit users to use both GNU ld and lld with the same command line.

Going to add this now, but I think I'll use --fortran-commons instead of --fortran-linking if there are no objections.

I think that if this behaviour is to be put under an option, it is on by default. This will permit users to use both GNU ld and lld with the same command line.

Going to add this now, but I think I'll use --fortran-commons instead of --fortran-linking if there are no objections.

Use the singular form --fortran-common?

sfertile updated this revision to Diff 303925.Nov 9 2020, 10:34 AM
  • Added a detailed comment trying to capture some of the linked discussion and distill why we want the more complicated lookup behaviour (at least by default).
  • Added an option -fortran-common to toggle the new behaviour on/off.
MaskRay added inline comments.Nov 9 2020, 1:11 PM
lld/ELF/InputFiles.cpp
1207

typo: tentative

1216

GOLD -> Gold or gold

1221

BFD-> GNU ld

lld/ELF/Options.td
68 ↗(On Diff #303925)
69 ↗(On Diff #303925)
lld/ELF/Symbols.cpp
695

Do you have a --warn-backrefs test for backwardReferences.erase(&oldSym); ?

lld/test/ELF/common-archive-lookup.s
8

21

113

sfertile updated this revision to Diff 304843.Nov 12 2020, 8:02 AM
  • Add testing for warn-backrefs interaction.
  • Spelling fixes
  • Fix comments to use ## instead of # in lit test.
sfertile marked 9 inline comments as done.Nov 12 2020, 8:04 AM
MaskRay accepted this revision.Nov 17 2020, 2:19 PM

Thanks for the update. LGTM. It'd be wise to wait for @grimar or @psmith's opinion.

lld/ELF/Options.td
69 ↗(On Diff #304843)

Delete the leading space

This revision is now accepted and ready to land.Nov 17 2020, 2:19 PM
MaskRay added a comment.EditedNov 17 2020, 2:20 PM

Please also update docs/ld.lld.1 for the new option --no-fortran-common (we usually just document the non-default option in the manpage)

sfertile updated this revision to Diff 308974.Dec 2 2020, 8:12 AM
  • Removed leading space in option description
  • Updated the manpage doc.
sfertile updated this revision to Diff 308976.Dec 2 2020, 8:23 AM

Fixed one too many hyphens in man page change.

If there is no objections, I'll go ahead and commit later this week.

This revision was landed with ongoing or failed builds.Dec 7 2020, 7:10 AM
This revision was automatically updated to reflect the committed changes.
dmajor added a subscriber: dmajor.Feb 17 2021, 9:41 AM

We're seeing errors about duplicate gcov symbols in Rust code after this change: https://bugs.llvm.org/show_bug.cgi?id=49226

Is it expected that this commit would affect non-Fortran code?

We're seeing errors about duplicate gcov symbols in Rust code after this change: https://bugs.llvm.org/show_bug.cgi?id=49226

Is it expected that this commit would affect non-Fortran code?

@dmajor Yes, any object file with common symbols (SHN_COMMON or (rare) STT_COMMON) may observed changed behaviors.
clang 11 and gcc 10 default to -fno-common, so you probably will not observe SHN_COMMON.

LLD<12, gold, macOS ld64 and GNU ld before 1999-12 have the --no-fortran-common behavior.
LLD>=12, GNU ld since 1999-12 have the --fortran-common behavior.

Folks, I realized that --fortran-common is not a good default and created D122450 to restore the previous behavior.

Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2022, 8:48 AM
Herald added a subscriber: StephenFan. · View Herald Transcript