This is an archive of the discontinued LLVM Phabricator instance.

[MC] Set sh_link to 0 if the associated symbol is undefined
ClosedPublic

Authored by MaskRay on Jan 16 2020, 11:35 PM.

Details

Summary

Part of https://bugs.llvm.org/show_bug.cgi?id=41734

LTO can drop externally available definitions. Such AssociatedSymbol is
not associated with a symbol. ELFWriter::writeSection() will assert.

Allow a SHF_LINK_ORDER section to have sh_link=0.

We need to give sh_link a syntax, a literal zero in the linked-to symbol
position, e.g. .section name,"ao",@progbits,0

Diff Detail

Event Timeline

MaskRay created this revision.Jan 16 2020, 11:35 PM

Unit tests: fail. 61935 tests passed, 2 failed and 783 were skipped.

failed: LLVM-Unit.TableGen/_/TableGenTests/Automata.BinPackerAutomatonAccepts
failed: LLVM-Unit.TableGen/_/TableGenTests/Automata.BinPackerAutomatonExplains

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

MaskRay planned changes to this revision.Jan 17 2020, 12:18 AM

Still investigating which approach is the best:

  • Allow SHF_LINK_ORDER + sh_link=0 : this will require more changes in CodeGen/MC
  • Drop SHF_LINK_ORDER when sh_link=0
MaskRay updated this revision to Diff 238705.Jan 17 2020, 12:37 AM
MaskRay retitled this revision from [MC] Drop SHF_LINK_ORDER and set sh_link to 0 if the associated symbol is undefined to [MC] Set sh_link to 0 if the associated symbol is undefined.
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a subscriber: merge_guards_bot.

Allow SHF_LINK_ORDER sh_link=0

Unit tests: fail. 61934 tests passed, 3 failed and 783 were skipped.

failed: LLVM-Unit.TableGen/_/TableGenTests/Automata.BinPackerAutomatonAccepts
failed: LLVM-Unit.TableGen/_/TableGenTests/Automata.BinPackerAutomatonExplains
failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_class/try_lock.pass.cpp

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

The implementation looks good to me. Will defer to others that know -fsanitize-coverage=inline-8bit-counters and LTO better. There was a comment in the PR about adding a dummy non-gcable section instead, I think that this would have the same effect as having the linker handle sh-link == 0, but it might be more portable to other linkers. Any thoughts on whether a dummy section would work better?

MaskRay added a comment.EditedJan 17 2020, 1:41 PM

I think we have 3 choices to represent the linked-to section.

  1. Allow SHF_LINK_ORDER with sh_link=0. The ELF spec has a requirement about the link order[1] (original semantics of SHF_LINK_ORDER). I think allowing sh_link=0 should be vaguely compatible because we can interpret sh_link=0 as referencing the first section SHN_UNDEF.
  2. Add a non-GC-root dummy section. This is a bit awkward to implement in MC because we have to pick a section name. The section may be combined with other normal sections with the same name. We need to be careful and not cause an "incompatible section flags" error (lld specific).
  3. Add a GC-root dummy section, for example, SHT_INIT_ARRAY, and name it .init_array, so that it can be comined with other .init_array sections. Note other sh_link!=0 sections can be garbage collected.

I want to avoid 3 because it causes a behavior difference. 2 can make the linker logic straightforward but the awkwardness remains in the assembler. If we ever support -r --gc-sections, note that the linked-to section can still be collected while the metadata section is alive. We will still face the problem how to represent such metadata sections whose linked-to sections are collected.

Considering all the above, I am inclined to do 1 (this patch).

[1]: SHF_LINK_ORDER - This flag adds special ordering requirements for link editors. The requirements apply if the sh_link field of this section's header references another section (the linked-to section). If this section is combined with other sections in the output file, it must appear in the same relative order with respect to those sections, as the linked-to section appears with respect to sections the linked-to section is combined with.

pcc added a comment.EditedJan 17 2020, 1:56 PM
  1. Add a non-GC-root dummy section. This is a bit awkward to implement in MC because we have to pick a section name. The section may be combined with other normal sections with the same name. We need to be careful and not cause an "incompatible section flags" error (lld specific).

For this you could use .text or something else relatively common with well known flag values. We could do this in other cases where the section is known to be GCable (i.e. not !associated). But I don't think we should do this for !associated because:

  1. Add a GC-root dummy section, for example, SHT_INIT_ARRAY, and name it .init_array, so that it can be comined with other .init_array sections. Note other sh_link!=0 sections can be garbage collected.

I want to avoid 3 because it causes a behavior difference.

We use the !associated metadata to mean that the global *may* be discarded when the referenced global is discarded, not that it *must* be discarded in that case. If the referenced global pointer is null, it could mean that an optimization pass has combined multiple globals (see e.g. GlobalMerge pass) and not updated the metadata. This is valid for a pass to do since passes are not required to update metadata. For example:

@a = global i32 1
@b = global i32 2
@c = global i32 3, section "foo", !associated @a
@d = global i32 4, section "foo", !associated @b

becomes

@ab = global {i32 1, i32 2}
@c = global i32 3, section "foo", !associated gep(@ab, 0, 0) ; could become !associated null as a result of discarding unused ConstantExprs
@d = global i32 4, section "foo", !associated gep(@ab, 0, 1) ; ditto

In this case we should keep @c and @d despite associated metadata potentially being null, because @ab may still be alive.

MaskRay updated this revision to Diff 238898.Jan 17 2020, 2:51 PM

Simplify. Add a comment. Fix a comment in the test.

Unit tests: fail. 61978 tests passed, 1 failed and 783 were skipped.

failed: LLVM.Bindings/Go/go.test

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

MaskRay added subscribers: sanjoy, hfinkel.EditedJan 23 2020, 8:24 PM

I am at a loss how to proceed now.

Adding people listed in D29104 (add !associated) and D29110 (allowed metadata dropping transforms).

I am at a loss how to proceed now.

Adding people listed in D29104 (add !associated) and D29110 (allowed metadata dropping transforms).

Apologies, not had much time to keep up this week. To check my understanding, we have:

  • A section S with a SHF_LINK_ORDER dependency with respect to a symbol that no longer has a symbol.
  • When the symbol is associated then S can be GC'ed
  • When the symbol is !associated then S can't be GC'ed
  • We need to find something that must not GC !associated symbols and would (ideally?) permit S to be GC'ed when the symbol is associated.
  • We would like this to be generic enough to not blow up, or do the wrong thing on other linkers, although I fear that might not be possible as I'm not sure SHF_LINK_ORDER behaves well outside 1 - 1 meta-data to non-metadata.

I've not got any answers right now. In LLD allowing Section S with SHF_LINK_ORDER and sh_link=0 and making the section non-GCable would work. I wouldn't be surprised if GNU ld or Gold did the wrong thing though.

Reverse-ping, thanks.

MaskRay added a subscriber: phosek.Jul 16 2020, 2:36 PM

@pcc @phosek If we want to allow sh_link=0 for SHF_LINK_ORDER sections, we should have an assembler syntax (otherwise ld.lld --lto-emit-asm may fail)
We will need ELFObjectWriter.cpp and TargetLoweringObjectFileImpl.cpp changes as made in this patch.

We probably should add the following as well:

--- a/llvm/lib/MC/MCParser/ELFAsmParser.cpp
+++ b/llvm/lib/MC/MCParser/ELFAsmParser.cpp
@@ -450,8 +450,14 @@ bool ELFAsmParser::parseLinkedToSym(MCSymbolELF *&LinkedToSym) {
   Lex();
   StringRef Name;
   SMLoc StartLoc = L.getLoc();
-  if (getParser().parseIdentifier(Name))
+  if (getParser().parseIdentifier(Name)) {
+    if (getParser().getTok().getString() == "0") {
+      getParser().Lex();
+      LinkedToSym = nullptr;
+      return false;
+    }
     return TokError("invalid linked-to symbol");
+  }
   LinkedToSym = dyn_cast_or_null<MCSymbolELF>(getContext().lookupSymbol(Name));
   if (!LinkedToSym || !LinkedToSym->isInSection())
     return Error(StartLoc, "linked-to symbol is not in a section: " + Name);

For

declare void @foo()

@a = global i32 1, !associated !0
@b = global i32 1, !associated !0

!0 = !{void ()* @foo}

do we still want a diagnostic? If yes, we should do it in a higher level, not in MC.

pcc added a comment.Jul 21 2020, 12:10 PM

Doesn't this require a change to the parser to accept 0 in place of the linked symbol?

MaskRay updated this revision to Diff 279620.EditedJul 21 2020, 1:14 PM
MaskRay edited the summary of this revision. (Show Details)
MaskRay edited reviewers, added: psmith; removed: peter.smith.

@pcc Doesn't this require a change to the parser to accept 0 in place of the linked symbol?

Fold my AsmParser changes (https://reviews.llvm.org/D72899#2157020 ) into the patch

pcc accepted this revision.Aug 3 2020, 1:23 PM

LGTM

This revision is now accepted and ready to land.Aug 3 2020, 1:23 PM
This revision was landed with ongoing or failed builds.Aug 3 2020, 1:44 PM
This revision was automatically updated to reflect the committed changes.