This is an archive of the discontinued LLVM Phabricator instance.

Don't remove COMDATs when internalizing global objects
AbandonedPublic

Authored by MaskRay on Oct 12 2018, 6:18 PM.

Details

Summary

Background:

The LLVM Language Reference states that "All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key" - e.g. it shouldn't be possible for only some, but not all, of the objects in a COMDAT to be present in the final object.

However, Internalizer currently removes non-externally-visible COMDATs when internalizing global objects. The commit which added this behavior states that passes like GlobalDCE can legally remove individual COMDAT members. However, this is no longer correct - GlobalDCE always marks other COMDAT members as live when examining a module.

This behavior can end up causing issues involving the BFD linker and MD_Associated metadata, since the removed COMDAT allows GlobalDCE to delete the target of the MD_Associated metadata. Keeping the COMDAT causes the MD_Associated-annotated object and its target to be kept or removed as a unit, which is the correct behavior.

Changes:

This patch stops Internalizer from removing COMDATs from global objects, and updates the associated test to reflect this new behavior.

Diff Detail

Event Timeline

Aaron1011 created this revision.Oct 12 2018, 6:18 PM

However, Internalizer currently removes non-externally-visible COMDATs when internalizing global objects. The commit which added this behavior https://github.com/llvm-mirror/llvm/commit/4a6d99b0a96d4eb27d89c22e33981ff0344c5737 states that passes like GlobalDCE can legally remove individual COMDAT members.

My reading of that commit (D10679), and looking at the code, is that GlobalDCE would be able to remove these members because they should *all* be removed from the comdat by internalization. The change appears to make it such that internalization cannot proceed if any member of the comdat is externally visible and therefore cannot be internalized. If all members of the comdat can be internalized (i.e. not in the ExternalComdats set), then they can all be internalized and the comdat removed from all members.

Are you seeing a case where after this internalization we are left with an incomplete comdat that still has some members but not others?

This behavior can end up causing issues involving the BFD linker and MD_Associated metadata, since the removed COMDAT allows GlobalDCE to delete the target of the MD_Associated metadata. Keeping the COMDAT causes the MD_Associated-annotated object and its target to be kept or removed as a unit, which is the correct behavior.

Sorry, I don't follow what is happening - do you have a test case showing the incorrect behavior?

My reading of that commit (D10679), and looking at the code, is that GlobalDCE would be able to remove these members because they should *all* be removed from the comdat by internalization. The change appears to make it such that internalization cannot proceed if any member of the comdat is externally visible and therefore cannot be internalized. If all members of the comdat can be internalized (i.e. not in the ExternalComdats set), then they can all be internalized and the comdat removed from all members.

The issue only occurs when a pass like GlobalDCE runs after Internalizer. Since the COMDAT has been removed, GlobalDCE might remove only one member from the (now deleted) COMDAT, leaving other members intact.
I've added a testcase demonstrating this.

My reading of that commit (D10679), and looking at the code, is that GlobalDCE would be able to remove these members because they should *all* be removed from the comdat by internalization. The change appears to make it such that internalization cannot proceed if any member of the comdat is externally visible and therefore cannot be internalized. If all members of the comdat can be internalized (i.e. not in the ExternalComdats set), then they can all be internalized and the comdat removed from all members.

The issue only occurs when a pass like GlobalDCE runs after Internalizer. Since the COMDAT has been removed, GlobalDCE might remove only one member from the (now deleted) COMDAT, leaving other members intact.
I've added a testcase demonstrating this.

Normally once they are both internalized and removed from the comdat, it shouldn't matter if only one is kept by GlobalDCE. The other is simply an internal function and no longer part of a comdat (so it's not like the comdat is incomplete). Is the issue here that there is a dangling reference or undef from the !0 metadata after GlobalDCE? I'm not familiar with the associated metadata type, but shouldn't this metadata reference be preventing GlobalDCE from removing unused_fn? Would it be correct if they were both removed from the comdat but kept by GlobalDCE?

Hopefully @pcc will comment, since this was his change. I am not sure preventing the comdat removal in LTO is correct or desired.

Normally once they are both internalized and removed from the comdat, it shouldn't matter if only one is kept by GlobalDCE. The other is simply an internal function and no longer part of a comdat (so it's not like the comdat is incomplete).

My understanding is that deleting the COMDAT is invalid. The LLVM Language Reference doesn't mention anything about removing COMDATS during LTO.
From that perspective, the dangling !associated metadata is simply a consequence of the underlying issue of the COMDAT removal.

Normally once they are both internalized and removed from the comdat, it shouldn't matter if only one is kept by GlobalDCE. The other is simply an internal function and no longer part of a comdat (so it's not like the comdat is incomplete).

My understanding is that deleting the COMDAT is invalid.

That's not my understanding (at least in general for comdats, this may be a special case, see below), but hopefully Peter will clarify, as he's more familiar with the linker aspects.

The LLVM Language Reference doesn't mention anything about removing COMDATS during LTO.

True, although that section relates more to the semantics of comdat selection for things that stay comdats. I'm not sure the language ref needs to cover allowable optimizations.

From that perspective, the dangling !associated metadata is simply a consequence of the underlying issue of the COMDAT removal.

Reading through https://llvm.org/docs/LangRef.html#associated-metadata makes me think that comdats involving associated metadata need to be handled a little differently, since the object with the metadata should not be removed unless the object it references is removed, and the language reference specifically says they should be in the same comdat.

So perhaps this code needs to handle comdat containing an object with associated metadata specially (e.g. could just include in the ExternalComdats set)?

So perhaps this code needs to handle comdat containing an object with associated metadata specially (e.g. could just include in the ExternalComdats set)?

My concern is that the current behavior would still be incorrect for other uses of COMDATS - or at the very least, quite counterintuitive.
If I specify a COMDAT for two global objects, I would expect one of the following things to happen:

  1. Both objects appear in the final object, both still in the COMDAT.
  2. Neither object appears in the final object, and the COMDAT section does not exist (both objects were optimized out).

Having only one of the two objects be present, and having it lack a COMDAT, seems very unexpected to me - especially considering that the COMDAT section
can be read using external tools (e.g. readelf).

So perhaps this code needs to handle comdat containing an object with associated metadata specially (e.g. could just include in the ExternalComdats set)?

My concern is that the current behavior would still be incorrect for other uses of COMDATS - or at the very least, quite counterintuitive.
If I specify a COMDAT for two global objects, I would expect one of the following things to happen:

  1. Both objects appear in the final object, both still in the COMDAT.
  2. Neither object appears in the final object, and the COMDAT section does not exist (both objects were optimized out).

Having only one of the two objects be present, and having it lack a COMDAT, seems very unexpected to me - especially considering that the COMDAT section
can be read using external tools (e.g. readelf).

I think that there is a little confusion here. My understanding is:

  1. The internalize pass is only used for LTO - where the compiler has a global view of the whole program.
  2. The comdat mechanism (in ELF at least) is designed to handle C++ templates and had the explicit design goal of avoiding requiring a global view of the whole program.

Intuitively, we would not expect comdats in LTO object files as they are not required. LTO can remove the comdats (similarly, so can the system linker). In fact, lld actually ignores comdat groups in the LTO object files (see, the rather misnamed, addCombinedLTOObject in lld/ELF/SymbolTable.cpp), as the decisions about comdats should have been made during scan and then honored by LTO.

Normally once they are both internalized and removed from the comdat, it shouldn't matter if only one is kept by GlobalDCE. The other is simply an internal function and no longer part of a comdat (so it's not like the comdat is incomplete).

My understanding is that deleting the COMDAT is invalid.

That's not my understanding (at least in general for comdats, this may be a special case, see below), but hopefully Peter will clarify, as he's more familiar with the linker aspects.

The LLVM Language Reference doesn't mention anything about removing COMDATS during LTO.

True, although that section relates more to the semantics of comdat selection for things that stay comdats. I'm not sure the language ref needs to cover allowable optimizations.

From that perspective, the dangling !associated metadata is simply a consequence of the underlying issue of the COMDAT removal.

Reading through https://llvm.org/docs/LangRef.html#associated-metadata makes me think that comdats involving associated metadata need to be handled a little differently, since the object with the metadata should not be removed unless the object it references is removed, and the language reference specifically says they should be in the same comdat.

So perhaps this code needs to handle comdat containing an object with associated metadata specially (e.g. could just include in the ExternalComdats set)?

I don't think that this should be necessary. My understanding is that GlobalDCE does not understand (by design) the gc semantics implied by !associated therefore globals with such metadata are required to be added to llvm.compiler.used to anchor them. Presumably, such globals will contain references to the "associated" global which will also anchor those globals. Eventually these will be gc'ed by the system linker. Given that, I can't see the problem that you are trying to solve. Can you post the output of the bfd linker, and a testcase which reproduces the failure with bfd?

Intuitively, we would not expect comdats in LTO object files as they are not required.

I'm not certain what you mean by 'not required'. My view is that LLVM should always respect COMDATs in the IR (e.g. keep them in place, and respect removal rules),
since it can't know the exact reason why they were added in the IR. Leaving the COMDAT in place is always a correct option, but removing the COMDAT might break
things that aren't expected it.

LTO can remove the comdats (similarly, so can the system linker)

Yes, but it needs to follow the rules set out in the spec:

In the ELF spec, §12 states that "these groups are included, or these groups are omitted, from the linked object as a unit".

When the linker removes COMDATS, it ensures that it either removes all of the members, or none of them.
Right now, LLVM is removing the COMDAT group, but leaving all of its members intact - something that the system linker will never do

Presumably, such globals will contain references to the "associated" global which will also anchor those globals.

This is not the case for CoverageSanitizer and AddressSanitizer. The globals with !associated metadata are simply global variables.
There is a 'usage link' from the instrumented function to the global (since the function writes/reads from the global), but the only link from the global back to the function is the !associated
metadata and the COMDAT.

Can you post the output of the bfd linker, and a testcase which reproduces the failure with bfd?

I've created a self-contained example at https://github.com/Aaron1011/llvm_lto_ir
It should fail without my patch, and succeed with it.

Hi Araon,

I am extremely busy at the moment so I'm sorry if I am slow to reply.

Intuitively, we would not expect comdats in LTO object files as they are not required.

I'm not certain what you mean by 'not required'. My view is that LLVM should always respect COMDATs in the IR (e.g. keep them in place, and respect removal rules),
since it can't know the exact reason why they were added in the IR. Leaving the COMDAT in place is always a correct option, but removing the COMDAT might break
things that aren't expected it.

My understanding is that by design LTO can do system linker "level" optimizations. The object file/s produced by LTO are not supposed to work correctly with the original input files to the link. Tools that use the produced object files must expect that transformations such as COMDAT removal have been applied.

LTO can remove the comdats (similarly, so can the system linker)

Yes, but it needs to follow the rules set out in the spec:

In the ELF spec, §12 states that "these groups are included, or these groups are omitted, from the linked object as a unit".

When the linker removes COMDATS, it ensures that it either removes all of the members, or none of them.
Right now, LLVM is removing the COMDAT group, but leaving all of its members intact - something that the system linker will never do

I believe this is a misinterpretation of the specification. System linkers decide what to include from the input files and then apply further optimizations such as gc-sections, or removing all debug sections. These further optimizations are completely separate from group resolution - so yes the system linker is perfectly legal to remove some parts of a COMDAT group from the final output. I believe that the bfd linker does not remove individual parts of comdat groups when doing gc-sections but gold does... this is a historical mistake in the bfd linker.

Presumably, such globals will contain references to the "associated" global which will also anchor those globals.

This is not the case for CoverageSanitizer and AddressSanitizer. The globals with !associated metadata are simply global variables.
There is a 'usage link' from the instrumented function to the global (since the function writes/reads from the global), but the only link from the global back to the function is the !associated
metadata and the COMDAT.

Right, but in this case, if the function is removed then the !associated metadata will be dangling and should have no effect on the output object file, no?

Can you post the output of the bfd linker, and a testcase which reproduces the failure with bfd?

I've created a self-contained example at https://github.com/Aaron1011/llvm_lto_ir
It should fail without my patch, and succeed with it.

Thanks. I will take a look when I can.

so yes the system linker is perfectly legal to remove some parts of a COMDAT group from the final output.

That would seem to defeat the entire point of section groups - their whole purpose is to force certain sections to stay together.

Right, but in this case, if the function is removed then the !associated metadata will be dangling and should have no effect on the output object file, no?

The BFD linker will refuse to link if two combined sections (e.g. with the same name) have different SHF_LINKER_ORDER flags. When the !associated metadata is removed, LLVM wont emit the SHF_LINK_ORDER flag for one of the sections.

This comment was removed by bd1976llvm.

Trying again... Phabricator didn't like the markup in my last comment attempt.

Hi Aaron,

Sorry. V.busy as warned. Just got around to looking at this again. Great testcase! It also reproduces with LLD.

so yes the system linker is perfectly legal to remove some parts of a COMDAT group from the final output.

That would seem to defeat the entire point of section groups - their whole purpose is to force certain sections to stay together.

The point of comdat groups is that only one of the groups of sections from each duplicate comdat set is in linked in from the input files. There are no comdat groups in the output of a normal link (excluding -r links).

Right, but in this case, if the function is removed then the !associated metadata will be dangling and should have no effect on the output object file, no?

The BFD linker will refuse to link if two combined sections (e.g. with the same name) have different SHF_LINKER_ORDER flags. When the !associated metadata is removed, LLVM wont emit the SHF_LINK_ORDER flag for one of the sections.

Thinking out loud a bit here.. consider these solutions to the problem:

  1. Remove the warning from gnu-ld and other linkers.
    • Seems problematic as it seems like there would be ambiguity as to how to order the sections of an output section where SHF_LINK_ORDER is set on some but not all input sections.
  1. Retain the comdats in the output (this patch).
    • Limits the optimization potential of LTO.
    • !associated metadata sections don't need to be in a comdats.
  1. Add GV's with !associated metadata to llvm.compiler.used.
    • Limits the optimization potential of LTO (but not as much as 2.).
  1. Teach globalDCE to understand the semantics of !associated metadata.
    • Allows the best potential for optimization in LTO, but; this goes against the current design of !associated metadata.

Given the above, I think that 3. would be the path of least resistance for now.

The point of comdat groups is that only one of the groups of sections from each duplicate comdat set is in linked in from the input files. There are no comdat groups in the output of a normal link (excluding -r links).

Even if the COMDAT data itself isn't included in the final object, LLVM should still respect the rules for removing COMDAT objects before it produces the final object. That is, even if the final object does not
show that the objects were once in a COMDAT, LLVM should ensure that the members are always kept or removed as a group.

1, Remove the warning from gnu-ld and other linkers.
Seems problematic as it seems like there would be ambiguity as to how to order the sections of an output section where SHF_LINK_ORDER is set on some but not all input sections.

I agree. This would also require downstream users (e.g. people building Rust projects) to update their linkers to solve a problem that LLVM could fix on its own.

  1. Retain the comdats in the output (this patch).

Limits the optimization potential of LTO.
!associated metadata sections don't need to be in a comdats.

I don't believe that this is the right way to think about this. This patch prevents LLVM from performing an invalid optimization - that is, selectively discarding global objects that the IR explicitly indicates should stay together.

Add GV's with !associated metadata to llvm.compiler.used.

Limits the optimization potential of LTO (but not as much as 2.).

LLVM already does this in SanitizerCoverage

Did you mean adding the target of the !associated metadata to llvm.compiler.used? If so, I don't believe that there's a difference between this option and option 2.

Consider the case of a function that's unused, except for the fact that it's targeted by a global via !associated metadata. If both share a COMDAT, LTO will still be able to remove them from the final object,
since neither is considered used.

  1. Retain the comdats in the output (this patch).

Limits the optimization potential of LTO.
!associated metadata sections don't need to be in a comdats.

I don't believe that this is the right way to think about this. This patch prevents LLVM from performing an invalid optimization - that is, selectively discarding global objects that the IR explicitly indicates should stay together.

We need to resolve this point of disagreement. IMO that optimization is only invalid in regular compilation. I think that for LTO this is a valid optimization. LTO has optimizations that would normally be invalid - for example the Internalize pass.

Add GV's with !associated metadata to llvm.compiler.used.

Limits the optimization potential of LTO (but not as much as 2.).

LLVM already does this in SanitizerCoverage

Did you mean adding the target of the !associated metadata to llvm.compiler.used? If so, I don't believe that there's a difference between this option and option 2.

Yes, I mean't adding the target of the !associated metadata to llvm.compiler.used.

Consider the case of a function that's unused, except for the fact that it's targeted by a global via !associated metadata. If both share a COMDAT, LTO will still be able to remove them from the final object,
since neither is considered used.

I don't think this will happen because the !associated metadata section will always be referenced from llvm.compiler.used which will prevent the COMDAT from being removed, no?
AFAIK !associated metadata sections are not required to be in a COMDAT; so, option 3. is a more general solution.

We need to resolve this point of disagreement. IMO that optimization is only invalid in regular compilation. I think that for LTO this is a valid optimization. LTO has optimizations that would normally be invalid - for example the Internalize pass.

I think the issue here is that an LLVM COMDAT has two different purposes:

  1. De-duplicate section groups
  2. Ensure that section group members are kept/removed as a unit

In ELF files, you can have non-COMDAT section groups, which only accomplish purpose 2.

The way that COMDATS are currently used within LLVM seems to suggest that keeping group members together is an important purpose.
For example, AddressSanitizer uses COMDATs to keep groups together, and explicitly tries to prevent the COMDAT name from duplicating another name. In this case, the de-duplication feature of COMDATS is completely irrelevant - a COMDAT is only being used to keep an instrumented function and its associated global together.

AFAIK !associated metadata sections are not required to be in a COMDAT

The language ref states that globals with !associated metadata should be in a COMDAT "for best compatibility".
While that's not the same as requiring it, all LLVM passes which insert !associated metadata (e.g. AddressSanitizer, SanitizerCoverage) always create/use a COMDAT.

My main concern with option 3 is that it doesn't fix the underlying issue with COMDAT-related optimizations. If COMDATs can't be relied on to keep their group members together, their only possible use is for de-duplication purposes.

Given the existing uses in AddressSanitizer and CoversageSanitizer, I think it makes the most sense for LLVM to never delete COMDATS in optimization passes. In the worst case, we miss an (ambiguous) optimization opportunity - i.e. we don't delete an individual COMDAT member when it might have been legal to do so. However, the current behavior (deleting individual COMDAT members) leads to behavior that is at best surprising, and at worst a violation of the ELF spec.

We need to resolve this point of disagreement. IMO that optimization is only invalid in regular compilation. I think that for LTO this is a valid optimization. LTO has optimizations that would normally be invalid - for example the Internalize pass.

I think the issue here is that an LLVM COMDAT has two different purposes:

  1. De-duplicate section groups
  2. Ensure that section group members are kept/removed as a unit

In ELF files, you can have non-COMDAT section groups, which only accomplish purpose 2.

Actually, I don't think that ELF has non-COMDAT groups. AFAIK COMDATs are the only, currently, legal type of group.

The way that COMDATS are currently used within LLVM seems to suggest that keeping group members together is an important purpose.
For example, AddressSanitizer uses COMDATs to keep groups together, and explicitly tries to prevent the COMDAT name from duplicating another name. In this case, the de-duplication feature of COMDATS is completely irrelevant - a COMDAT is only being used to keep an instrumented function and its associated global together.

AFAIK !associated metadata sections are not required to be in a COMDAT

The language ref states that globals with !associated metadata should be in a COMDAT "for best compatibility".
While that's not the same as requiring it, all LLVM passes which insert !associated metadata (e.g. AddressSanitizer, SanitizerCoverage) always create/use a COMDAT.

My reading of "for best compatibility" that is simply that system linker support for "associated" sections might not be implemented (actually, I think that this advice is becoming increasingly irrelevant). We shouldn't rely on this"best compatibility" guideline for correctness.

My main concern with option 3 is that it doesn't fix the underlying issue with COMDAT-related optimizations.

I don't think that there are any other problems with COMDAT-related optimizations; or, at least I don't think that there are any that can't be fixed without changing the current handling of COMDATs.

If COMDATs can't be relied on to keep their group members together, their only possible use is for de-duplication purposes.

They can be relied on to keep group memebers together "for as long as necessary". They also provide for group de-duplication.

Given the existing uses in AddressSanitizer and CoversageSanitizer, I think it makes the most sense for LLVM to never delete COMDATS in optimization passes. In the worst case, we miss an (ambiguous) optimization opportunity - i.e. we don't delete an individual COMDAT member when it might have been legal to do so. However, the current behavior (deleting individual COMDAT members) leads to behavior that is at best surprising, and at worst a violation of the ELF spec.

While this strategy works, it would go against the mental model of developers working on LLVM - I expect LTO optimizations to be able to remove COMDATs. Given the current use cases I don't think that it is a problem to implement option 3. Option 3 also has the benefit that we can add specific fixes to the code that is creating the "associated metadata" (with appropriate comments) which is easy for a developer to understand. Adjusting the global behavior of LLVM w.r.t COMDATs to work around issues with "associated metadata" is much more complicated.

There is a relevant discussion about non-COMDAT groups here: https://reviews.llvm.org/D56437.

MaskRay added subscribers: xur, MaskRay.EditedMay 22 2021, 4:01 PM

This change makes sense to me.
PE-COFF's IMAGE_COMDAT_SELECT_ASSOCIATIVE and ELF's section group have the "if one member is retained, other members are retained as well" semantics.

In the current PGO section design, a pair of __llvm_prf_cnts and __llvm_prf_data make use of the semantics for garbage collection.
Noticed by @xur: if internalizer makes __llvm_prf_data internal, we lose the semantics: if the paired __llvm_prf_cnts is retained, the __llvm_prf_data should be retained as well.
This patch (keeping COMDAT) will bring back the desired semantics.

The original patch has some size advantage: if a group has only one member (used for COMDAT deduplication)
After internalization of the only member, the group can be deleted.
With this patch, we will keep the group. This adds some size overhead (64+8 bytes).
We can do if (has just one member) setComdat(nullptr);

This behavior can end up causing issues involving the BFD linker and MD_Associated metadata, since the removed COMDAT allows GlobalDCE to delete the target of the MD_Associated metadata. Keeping the COMDAT causes the MD_Associated-annotated object and its target to be kept or removed as a unit, which is the correct behavior.

This (https://github.com/Aaron1011/llvm_lto_ir/blob/master/bad_ir.ll) is stale with binutils 2.36.
LLD used to have similar issues. LLD now supports sh_link=0 and mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER input sections ( D72904 D84001).
GNU ld had made similar changes (https://sourceware.org/bugzilla/show_bug.cgi?id=26256)

BTW: I cannot find a use case where associated should be used together with a COMDAT group.
With the COMDAT GC semantics, associated is not useful.
When used with functions (instead of variables), associated doesn't play well with inlining (D97430).

BTW: comdat any is suppored (in 2021) in ELF AsmPrinter, which uses a section group with flag 0.

MaskRay added inline comments.May 22 2021, 4:03 PM
test/Transforms/Internalize/preserve-comdat-members.ll
9 ↗(On Diff #169735)

The test case should be changed to use two functions in the same group, instead of using !associated.

The !associated use case is a weak justification.

I think simply dropping the code can cause some trouble to regular LTO: a relocatable link of a regular LTO may
leverage the suppressed deduplication so that the result object file cannot collide with a
subsequent link with a comdat with the same signature but non-internal.

Posted D103043

MaskRay commandeered this revision.May 26 2021, 12:09 PM
MaskRay abandoned this revision.
MaskRay added a reviewer: Aaron1011.