This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Remove values from non-prevailing comdats
ClosedPublic

Authored by tejohnson on Jun 28 2017, 8:14 PM.

Details

Summary

When linking a regular LTO module, if it has any non-prevailing values
(dropped to available_externally) in comdats, we need to do more than
just remove those values from their comdat. We also remove all values
from that comdat, so as to avoid leaving an incomplete comdat.

This is necessary in case we are compiling in mixed regular and ThinLTO
mode, since the resulting regularLTO native object is always linked into
the final binary first. We need to prevent the linker from selecting an
incomplete comdat that was not the prevailing copy.

Fixes PR32980.

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision.Jun 28 2017, 8:14 PM

Really sorry if I'm just missing context with a drive by comment ... but is this correct? What if two symbols in a comdat are actually required to be treated as a single comdat for correctness?

Really sorry if I'm just missing context with a drive by comment ... but is this correct? What if two symbols in a comdat are actually required to be treated as a single comdat for correctness?

Not sure what scenario you are referring to, but this is invoked if we have already decided, based on linker info, that the comdat contains a non-prevailing copy of a linkonce or weak symbol. So we would already have removed that non-prevailing symbol from the comdat, leaving it incomplete. Presumably if the linker did not select that copy of the linkonce or weak as prevailing, then it has not selected this coppy of the comdat, and so we remove everything else from the comdat so as to not leave behind an incomplete comdat. Since the regular LTO native object is passed to the final link before any native objects from ThinLTO backends, leaving an incomplete comdat there could cause it to be incorrectly selected in the final link, when it wasn't originally (which is what was happening in the associated bug).

This revision was automatically updated to reflect the committed changes.
ychen added a subscriber: ychen.Jun 2 2021, 11:52 PM
ychen added inline comments.
llvm/trunk/test/LTO/Resolution/X86/comdat-mixed-lto.ll
11

It seems PR49009 failed due to this patch.

Here both C@t1.o and testglobfunc@t2.o prevail however they are from COMDATs of the same key, I think this resolution is not possible from the linker's point of view?

Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2021, 11:52 PM
tejohnson added inline comments.Jun 3 2021, 8:19 AM
llvm/trunk/test/LTO/Resolution/X86/comdat-mixed-lto.ll
11

Reading through the bug this was fixing (PR32980) again, I think this is trying to simulate the scenario that occurred there. It sounds like it was a ThinLTO build where LTO splitting was in effect, so we also had a regular LTO module. When we are done with the LTO backends the regular LTO module is handed back to the linker first. The problem occurred because the comdat added to the regular LTO module was initially not prevailing, but since the regular LTO native object was handed back to the linker for the final native linker first, its (then incomplete) comdat and symbols were subsequently selected as prevailing, leading to the incomplete comdat issue. We avoided this by removing the comdat from the non-prevailing copies. I think the below test is trying to simulate that effect since here we only have a single link. I can add a comment to the test.

I looked at PR49009 but it isn't clear from what is written in the bug how this patch caused that failure. Can you elaborate as to what is happening to that symbol with and without this patch?

ychen added inline comments.Jun 3 2021, 10:10 AM
llvm/trunk/test/LTO/Resolution/X86/comdat-mixed-lto.ll
11

Thanks for taking a look, Teresa. I reduced PR49009 to a test case like this. D2 was dropped due to non-prevailing D0@a.ll, hence the D1 to D2 aliasing caused the assertion failure.

---- a.ll  (D2,px D0,)
$D5 = comdat any

@D1 = weak_odr unnamed_addr alias void (%"X"*), void (%"Y"*)* @D2

define weak_odr void @D2(%"Y"* %this) unnamed_addr #0 comdat($D5) align 2 {
entry:
  ret void
}

define weak_odr void @D0(%"Y"* %this) unnamed_addr #0 comdat($D5) align 2 {
entry:
  tail call void @llvm.trap()
  unreachable
}


---- b.ll  (D0,px)
$D0 = comdat any

define linkonce_odr void @D0(%"Y"* %this) unnamed_addr #0 comdat align 2 {
entry:
  tail call void @llvm.trap()
  unreachable
}
ychen added inline comments.Jun 3 2021, 11:08 AM
llvm/trunk/test/LTO/Resolution/X86/comdat-mixed-lto.ll
11

Maybe we should not drop the COMDAT when there are prevailing symbols in it which means the COMDAT has been chosen.

tejohnson added inline comments.
llvm/trunk/test/LTO/Resolution/X86/comdat-mixed-lto.ll
11

@MaskRay for input from the linker side

My understanding is that symbols in the same comdat should be kept or discarded as a group. In this case the linker has apparently decided that the copy of D0 from comdat $D5 in a.ll is not prevailing, despite the other symbol in that copy of the comdat D2 being selected as prevailing. Which breaks my understanding that the comdat should be kept or discarded as a whole. And the symbol D0 from comdat $D0 in b.ll is instead selected as the prevailing copy of D0. Part of the issue is that symbol D0 is in differently named comdats in a.ll and b.ll with different grouping of symbols, which seems unusual.

The problem in your test case presumably relates to the following bit of code from when we remove symbols from the comdat:

// Additionally need to drop externally visible global values from the comdat
// to available_externally, so that there aren't multiply defined linker
// errors.
if (!GV.hasLocalLinkage())
  GV.setLinkage(GlobalValue::AvailableExternallyLinkage);

I'm not 100% sure why I added that handling, since in the original bug PR32980 the other symbol in the comdat was already internal. Ah, ok I looked for and found the original review, which was off-phab since it was @respindola who didn't use phab for reviews. In fact here was his first reply:

From https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170626/465766.html:

I am probably missing something.

But if a symbol in a comdat is prevailing and another one is not, that is a bug in the linker, no?

Cheers,
Rafael

Note the comment about it being a bug in the linker if one symbol in the comdat is prevailing and one is not, which seems to be what is happening here.

In my reply (https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170626/465779.html) I pointed out that I am fixing an issue where the other symbol in the comdat is internal, so it isn't an issue with the linker:

This is just handling the rest of the symbols in the comdat (e.g. internals) that were being left in the comdat resulting in an incomplete comdat.

But I go on to say:

I suppose I could change this to simply removal all non-prevailing symbols from comdats in addRegularLTO (rather than just keeping track of which had a weak/linkonce removed from the comdat and fixing them up later).

And I'm not sure from the rest of what I wrote what triggered my thinking on that. But regardless, I think the idea per Rafael and my own understanding is that the comdat selection should result in all or nothing selection of prevailingness of externally visible symbols in a comdat, which seems to be what is not the case in the bug you are looking at. If that is expected, then I think removing the few lines I show above about dropping those to available_externally would fix this. But I do believe we still need to remove from the comdat, since that comdat would be incomplete.

MaskRay added inline comments.Jun 5 2021, 1:24 PM
llvm/trunk/test/LTO/Resolution/X86/comdat-mixed-lto.ll
11

I agree that comdat members should be retained or discarded as a unit.
The object files should ensure that the situation (one member is prevailing while another is not) does not happen.
I'll consider such cases erroneous input.

In the b.ll (D0,px) example, the issue is that @D0 should not be in two comdats ($D5 and $D0).
I can understand D0/D1/D5 (-mconstructor-aliaes) in $D5 is justified, but why is only D0 in $D0 in b.ll?

ychen added inline comments.Jun 5 2021, 3:13 PM
llvm/trunk/test/LTO/Resolution/X86/comdat-mixed-lto.ll
11

@MaskRay Thanks for chiming in.

I agree that comdat members should be retained or discarded as a unit.

Agreed. I wonder how does this works with symbol resolution?

The object files should ensure that the situation (one member is prevailing while another is not) does not happen.

This is kinda surprising to me. Does this imply that COMDAT choosing decides symbol resolution?

@respindola mentioned here (https://bugs.llvm.org/show_bug.cgi?id=27866#c18) that, COMDAT choosing happens before symbol resolution which, from my limited linker knowledge, suggests that COMDAT choosing is independent of symbol resolution? I think I misunderstand something here.

I'll consider such cases erroneous input.

In the b.ll (D0,px) example, the issue is that @D0 should not be in two comdats ($D5 and $D0).
I can understand D0/D1/D5 (-mconstructor-aliaes) in $D5 is justified, but why is only D0 in $D0 in b.ll?

D5 comdat is produced from explicit instantiation (https://github.com/weidai11/cryptopp/blob/1124a3d1fe8ac0c59acaf75f087ee4bd44a8b0bf/iterhash.cpp#L193).
D0 comdat is produced from implicit instantiation (SHA512 in a different TU: https://github.com/weidai11/cryptopp/blob/1124a3d1fe8ac0c59acaf75f087ee4bd44a8b0bf/donna_64.cpp#L845 ) triggered vtable emission which refers to the D0. I'm not sure if it correct to put D0 in its own comdat though.